Contents:

Agilio Open vSwitch
  • Docs »
  • Agilio Open vSwitch documentation

Agilio OVS with TC Offload User’s Guide

The Agilio SmartNIC Architecture

./Conceptual_architecture.png

The conceptual architecture of the Agilio SmartNIC

The Agilio CX SmartNICs are based on the NFP-4000 and are available in low profile PCIe and OCM v2 NIC form factors suitable for use in COTS servers. This is a 60 core processor with eight cooperatively multithreaded threads per core. The flow processing cores have an instruction set that is optimized for networking. This ensures an unrivaled level of flexibility within the data plane while maintaining performance. The OVS datapath can also be enabled without a server reboot.

Further extensions such as BPF offload, SR-IOV or custom offloads can be added without any hardware modifications or even server reboot. These extensions are not covered by this guide, which deals with the basic and OVS-TC offload firmware only.

The basic firmware offers a wide variety of features including RSS (Receive Side Scaling), Checksum Offload (IPv4/IPv6, TCP, UDP, tx/rx), LSO (Large Segmentation Offload), IEEE 802.3ad, Link flow control, 802.1AZ Link Aggregation, etc. For more details regarding currently supported features, refer to Basic Firmware Features.

Hardware Installation

This user guide focuses on x86 deployments of Open vSwitch hardware acceleration in supported versions of Ubuntu 18.04, Red Hat Enterprise Linux (RHEL) 7.5+, and CentOS 7.5. As detailed in Validating the Driver, Netronome’s Agilio SmartNIC firmware is now upstreamed with the latest supported kernel versions of Ubuntu and RHEL/CentOS. Whilst out-of-tree driver source files are available and installation instructions are included in Appendix C: Installing the Out-of-Tree NFP Driver, it is highly recommended, where possible, to make use of the upstreamed drivers. Wherever applicable, separate instructions for RHEL/CentOS and Ubuntu are provided.

Note

All commands in this guide are assumed to be run as root

Identification

The serial number is printed beside a bar code on the outside of the card and is of the form SMAAMDAXXXX-XXXXXXXXXXXX. The AMDAXXXX section denotes the assembly ID. In a running system, the assembly ID and serial number of a PCI device may be determined by using the ethtool debug interface, if the netdev associated with the SmartNIC is known, by passing it as the first argument to the following script:

1
2
3
4
5
6
7
8
#!/bin/bash
DEVICE=$1
ethtool -W ${DEVICE} 0
DEBUG=$(ethtool -w ${DEVICE} data /dev/stdout | strings)
SERIAL=$(echo "${DEBUG}" | grep "^SN:")
ASSY=$(echo ${SERIAL} | grep -oE AMDA[0-9]{4})
echo ${SERIAL}
echo Assembly: ${ASSY}

Consult SmartNIC Netdev Interfaces for methods identifying the netdev.

Note

The strings command is commonly provided by the binutils package. This can be installed by yum install binutils or apt-get install binutils, depending on your distribution.

Physical installation

Physically install the SmartNIC in the host server and ensure proper cooling e.g. airflow over card. Ensure the PCI slot is at least Gen3 x8 (can be placed in Gen3 x16 slot). Once installed, power up the server and open a terminal. Further details and support about the hardware installation process can be reviewed in the Hardware User Manual available from Netronome’s support site.

Validation

Use the following command to validate that the SmartNIC is being correctly detected by the host server and to identify its PCI address:

# lspci -Dnnd 19ee:4000; lspci -Dnnd 19ee:6000
0000:02:00.0 Ethernet controller [0200]: Netronome Systems, Inc. Device    [19ee:4000]

Note

The lspci command is commonly provided by the pciutils package. This can be installed by yum install pciutils or apt-get install pciutils, depending on your distribution.

Validating the Driver

The Netronome SmartNIC physical function driver with support for OVS-TC offload is included in Linux 4.13 and later kernels. The list of minimum required operating system distributions and their respective kernels which include the nfp driver are as follows:

Operating System Kernel package version
RHEL/CentOS 7.5 3.10.0-862.el7
RHEL/CentOS 7.6 3.10.0-957.el7
RHEL 8.0 4.18.0-80.el8
Ubuntu 18.04 LTS 4.15.0-20.21

Note

Only the x86_64 architecture has been verified, if support for other architectures is required please contact Netronome support: Contact Us.

Confirm Upstreamed NFP Driver

To confirm that your current operating system supplies the upstreamed nfp module, you can use the modinfo command:

# modinfo nfp | head -3
filename:
/lib/modules/3.10.0-862.el7/kernel/drivers/net/ethernet/netronome/nfp/nfp.ko.xz
description:    The Netronome Flow Processor (NFP) driver.
license:        GPL

Note

If the module is not found in your current kernel, refer to Appendix C: Installing the Out-of-Tree NFP Driver for more instructions, or upgrade your distributions and kernel version to a version that includes the upstreamed drivers.

Confirm that the NFP Driver is Loaded

Use lsmod to list the loaded driver modules and look for the nfp string:

# lsmod | grep nfp
nfp                   161364  0

If the NFP driver is not loaded, the following command loads it manually:

# modprobe nfp

Validating the Firmware

Netronome SmartNICs are fully programmable devices and depend on the driver to load firmware onto the device at runtime. It is important to note that the functionality of the SmartNIC significantly depends on the firmware loaded. The firmware files should be present in the following directory (contents may vary depending on the installed firmware and distribution layout):

# ls -ogR --time-style="+" /lib/firmware/netronome/
/lib/firmware/netronome/:
total 8
drwxr-xr-x. 2 4096  flower
drwxr-xr-x. 2 4096  nic
lrwxrwxrwx  1   31  nic_AMDA0081-0001_1x40.nffw -> nic/nic_AMDA0081-0001_1x40.nffw
lrwxrwxrwx  1   31  nic_AMDA0081-0001_4x10.nffw -> nic/nic_AMDA0081-0001_4x10.nffw
lrwxrwxrwx  1   31  nic_AMDA0096-0001_2x10.nffw -> nic/nic_AMDA0096-0001_2x10.nffw
lrwxrwxrwx  1   31  nic_AMDA0097-0001_2x40.nffw -> nic/nic_AMDA0097-0001_2x40.nffw
lrwxrwxrwx  1   36  nic_AMDA0097-0001_4x10_1x40.nffw -> nic/nic_AMDA0097-0001_4x10_1x40.nffw
lrwxrwxrwx  1   31  nic_AMDA0097-0001_8x10.nffw -> nic/nic_AMDA0097-0001_8x10.nffw
lrwxrwxrwx  1   36  nic_AMDA0099-0001_1x10_1x25.nffw -> nic/nic_AMDA0099-0001_1x10_1x25.nffw
lrwxrwxrwx  1   31  nic_AMDA0099-0001_2x10.nffw -> nic/nic_AMDA0099-0001_2x10.nffw
lrwxrwxrwx  1   31  nic_AMDA0099-0001_2x25.nffw -> nic/nic_AMDA0099-0001_2x25.nffw
lrwxrwxrwx  1   34  pci-0000:04:00.0.nffw -> flower/nic_AMDA0097-0001_2x40.nffw
lrwxrwxrwx  1   34  pci-0000:06:00.0.nffw -> flower/nic_AMDA0096-0001_2x10.nffw

/lib/firmware/netronome/flower:
total 11692
lrwxrwxrwx. 1      17  nic_AMDA0081-0001_1x40.nffw -> nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0081-0001_4x10.nffw -> nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0096-0001_2x10.nffw -> nic_AMDA0096.nffw
-rw-r--r--. 1 3987240  nic_AMDA0096.nffw
lrwxrwxrwx. 1      17  nic_AMDA0097-0001_2x40.nffw -> nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0097-0001_4x10_1x40.nffw -> nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0097-0001_8x10.nffw -> nic_AMDA0097.nffw
-rw-r--r--. 1 3988184  nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0099-0001_2x10.nffw -> nic_AMDA0099.nffw
lrwxrwxrwx. 1      17  nic_AMDA0099-0001_2x25.nffw -> nic_AMDA0099.nffw
-rw-r--r--. 1 3990552  nic_AMDA0099.nffw

/lib/firmware/netronome/nic:
total 12220
-rw-r--r--. 1 1380496  nic_AMDA0081-0001_1x40.nffw
-rw-r--r--. 1 1389760  nic_AMDA0081-0001_4x10.nffw
-rw-r--r--. 1 1385608  nic_AMDA0096-0001_2x10.nffw
-rw-r--r--. 1 1385664  nic_AMDA0097-0001_2x40.nffw
-rw-r--r--. 1 1391944  nic_AMDA0097-0001_4x10_1x40.nffw
-rw-r--r--. 1 1397880  nic_AMDA0097-0001_8x10.nffw
-rw-r--r--. 1 1386616  nic_AMDA0099-0001_1x10_1x25.nffw
-rw-r--r--. 1 1385608  nic_AMDA0099-0001_2x10.nffw
-rw-r--r--. 1 1386368  nic_AMDA0099-0001_2x25.nffw

If netronome/flower is not present, the linux-firmware package on the system is probably outdated and does not contain the upstreamed OVS-TC firmware. Refer to Appendix H: Upgrading TC Firmware for upgrade instructions. The NFP driver will search for firmware in /lib/firmware/netronome in the following order:

1: serial-_SERIAL_.nffw
2: pci-_PCI_ADDRESS_.nffw
3: nic-_ASSEMBLY-TYPE___BREAKOUTxMODE_.nffw

This search is logged by the kernel when the driver is loaded. For example:

# dmesg | grep -A 4 nfp.*firmware
[  3.260788] nfp 0000:04:00.0: nfp: Looking for firmware file in order of priority:
[  3.260810] nfp 0000:04:00.0: nfp:   netronome/serial-00-15-4d-13-51-0c-10-ff.nffw: not found
[  3.260820] nfp 0000:04:00.0: nfp:   netronome/pci-0000:04:00.0.nffw: not found
[  3.262138] nfp 0000:04:00.0: nfp:   netronome/nic_AMDA0097-0001_2x40.nffw: found, loading...

The version of the loaded firmware for a particular netdev interface, as found in SmartNIC Netdev Interfaces (for example enp4s0), or a physical port representor (for example, enp4s0np0) can be displayed with the ethtool command:

# ethtool -i enp4s0np0
driver: nfp
version: 3.10.0-862.el7.x86_64 SMP mod_u
firmware-version: 0.0.3.5 0.20 nic-2.0.7 nic
expansion-rom-version:
bus-info: 0000:04:00.0

Firmware versions are displayed in order: NFD version, NSP version, APP FW version, driver APP. The specific output above shows that basic NIC firmware is running on the card, as indicated by nic in the firmware-version field.

Selecting the TC Offload Firmware

In order to initialize the SmartNIC with the TC offload firmware, a symbolic link based on the PCI address of the SmartNIC should be created to the desired firmware. When the kernel module is loaded, it will load the specified firmware instead of the default CoreNIC firmware. The TC offloaded firmware is located in the netronome/flower directory in lib/firmware.

Review SmartNIC Netdev Interfaces to identify the SmartNIC’s netdev. The script in Identification details how to identify the SmartNIC’s assembly.

The following script extract illustrates how to create and persist this symbolic link:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#!/bin/bash
DEVICE=${1}
DEFAULT_ASSY=scan
ASSY=${2:-${DEFAULT_ASSY}}
APP=${3:-flower}

if [ "x${DEVICE}" = "x" -o ! -e /sys/class/net/${DEVICE} ]; then
    echo Syntax: ${0} device [ASSY] [APP]
    echo
    echo This script associates the TC Offload firmware
    echo with a Netronome SmartNIC.
    echo
    echo device: is the network device associated with the SmartNIC
    echo ASSY: defaults to ${DEFAULT_ASSY}
    echo APP: defaults to flower. flower-next is supported if updated
    echo      firmware has been installed.
    exit 1
fi

# It is recommended that the assembly be determined by inspection
# The following code determines the value via the debug interface
if [ "${ASSY}x" = "scanx" ]; then
    ethtool -W ${DEVICE} 0
    DEBUG=$(ethtool -w ${DEVICE} data /dev/stdout | strings)
    SERIAL=$(echo "${DEBUG}" | grep "^SN:")
    ASSY=$(echo ${SERIAL} | grep -oE AMDA[0-9]{4})
fi

PCIADDR=$(basename $(readlink -e /sys/class/net/${DEVICE}/device))
FWDIR="/lib/firmware/netronome"

# AMDA0081 and AMDA0097 uses the same firmware
if [ "${ASSY}" = "AMDA0081" ]; then
    if [ ! -e ${FWDIR}/${APP}/nic_AMDA0081.nffw ]; then
       ln -sf nic_AMDA0097.nffw ${FWDIR}/${APP}/nic_AMDA0081.nffw
   fi
fi

FW="${FWDIR}/pci-${PCIADDR}.nffw"
ln -sf "${APP}/nic_${ASSY}.nffw" "${FW}"

# insert distro-specific initramfs section here...

For RHEL 7.5+ and CentOS 7.5 systems, it is recommended to append the following snippet:

42
43
44
45
# RHEL 7.5+ and CentOS 7.5 distro-specific initramfs section
DRACUT_CONF=/etc/dracut.conf.d/98-nfp-firmware.conf
echo "install_items+=\" ${FW} \"" > "${DRACUT_CONF}"
dracut -f

This adds the symlink and firmware to the initramfs. Alternatively, for Ubuntu 18.04 systems, append the following snippet, instead:

42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
# Ubuntu 18.04 distro-specific initramfs section
HOOK=/etc/initramfs-tools/hooks/agilio_firmware
cat >${HOOK} << EOF
#!/bin/sh
PREREQ=""
prereqs()
{
    echo "\$PREREQ"
}
case "\$1" in
prereqs)
    prereqs
    exit 0
    ;;
esac
. /usr/share/initramfs-tools/hook-functions
cp "${FW}" "\${DESTDIR}${FW}"
if have_module nfp ; then
    manual_add_modules nfp
fi
exit 0
EOF
chmod a+x "${HOOK}"
update-initramfs -u

As an example:

  • The script has been assembled into ./agilio-tc-fw-select.sh
  • A netdev associated with the SmartNIC is p5p1
  • The user wishes to auto-detect the Assembly ID
# ./agilio-tc-fw-select.sh p5p1 scan
# rmmod nfp
# modprobe nfp

If the out-of-tree firmware repository has been installed (as described in Appendix H: Upgrading TC Firmware) and the user wishes to select that instead:

# ./agilio-tc-fw-select.sh p5p1 scan flower-next
# rmmod nfp
# modprobe nfp

Verify Firmware is Loaded

The firmware should indicate that it has the FLOWER capability. This can be confirmed by inspecting the kernel message buffer using dmesg:

# dmesg | grep nfp
[ 3131.714215] nfp 0000:04:00.0 eth4: Netronome NFP-6xxx Netdev: TxQs=8/8 RxQs=8/8
[ 3131.714221] nfp 0000:04:00.0 eth4: VER: 0.0.5.5, Maximum supported MTU: 9420
[ 3131.714227] nfp 0000:04:00.0 eth4: CAP: 0x20140673 PROMISC RXCSUM TXCSUM RXVLAN GATHER TSO1 RSS2 AUTOMASK IRQMOD FLOWER

Loading of flower firmware may also be confirmed using ethtool. AOTC indicates that OVS-TC firmware was loaded, as does flow. e.g.:

# ethtool -i ens3np0
driver: nfp
version: 3.10.0-862.el7.x86_64 SMP mod_u
firmware-version: 0.0.5.5 0.22 0AOTC28A.5642 flow
expansion-rom-version:
bus-info: 0000:04:00.0

SmartNIC Netdev Interfaces

Representors

Representor netdevs, or representors, are netdevs created to represent the switch-side of a port. When Flower firmware for Agilio CX SmartNIC is loaded the following netdevs are created:

  • A netdev for the PCI physical function (PF) to represent the PCI connection between the host and the card.
  • Representor netdevs for each physical port (MAC) of the card. These are used to allow configuration, for example of link state, of the port, to access statistics of the port and to carry fallback traffic. Fallback traffic are packets which are not handled by the datapath on the SmartNIC, usually because there is no matching rule present, and thus sent to the host for processing.
  • A representor netdev for the PF. This is not currently used in an OVS-TC system.

When SR-IOV VFs (virtual functions) are instantiated, a representor netdev is created for each VF. Like representors for physical ports, these are used for configuration, statistics and fallback packets. When using OVS-TC it is the physical port representor netdevs, and VF representor netdevs that are attached to OVS which then allow OVS to configure the associated ports and VFs to send and receive fallback packets.

Identification

To identify the Agilio NIC interfaces, begin by identifying the physical function and physical port representor names. This may be determined by examining the netdevs of the PF PCI devices for the Agilio NIC. These PCI devices may be determined using the lspci tool to list devices with Netronome vendor:device tuples (19ee:4000 and 19ee:6000). The netdevs associated with these devices may be determined by examining sysfs:

1
2
3
#!/bin/bash
BDFS=$({ lspci -Dmmd 19ee:4000; lspci -Dmmd 19ee:6000; } | cut -f 1 -d " ")
for i in $BDFS; do ls /sys/bus/pci/drivers/nfp/$i/net/; done

An example output of this would be:

enp4s0np0  enp4s0np1  p6p1

Where enp4s0np0 and enp4s0np1 are the physical port representors and p6p1 is the physical function netdev:

The naming scheme for each port and physical function is dependent on the motherboard and the PCI slot into which the NFP is installed. The PF name should be that associated with the PCI slot and the physical port representor names should be the PF name with np[x] appended.

Note

Platform and BIOS configuration as well as enabling biosdevname can affect the port naming policies.

To confirm that the representor enp4s0np0 is a physical port, verify the contents of the following file in sysfs:

# cat /sys/class/net/enp4s0np0/phys_port_name
p0

The physical ports will report the physical port name, while the physical function (in this case p6p1) will report an error.

# cat /sys/class/net/p6p1/phys_port_name
cat /sys/class/net/p6p1/phys_port_name: Operation not supported

Once a physical port name has been determined, it is possible to determine the phys_switch_id of the the NFP. This is required to determine the names of the VF representors when multiple NFPs are installed in a host. If an NFP has more than one physical port, both ports will share the same phys_switch_id. The PF will report an error when its phys_switch_id is queried. For example, the phys_switch_id of the device for which enp4s0np0 is a physical port, is:

# cat /sys/class/net/enp4s0np0/phys_switch_id
00154d13510c

Please refer to the section Configuring SR-IOV for information on how to instantiate VFs.

To identify VF representors, query the devices listed in /sys/class/net for phys_port_name and phys_switch_id. VFs will share the switch id and report their individual VF number in the form p0vf[x]. To the following script creates a translation variable in bash that translates from VF index to interface name:

1
2
3
4
5
6
7
8
9
#!/bin/bash
declare -A vf_repr_ifname
for ifname in $(ls /sys/class/net); do
    pn=$(cat /sys/class/net/${ifname}/phys_port_name 2> /dev/null)
    [ "x${pn}" != "x" ] || continue
    vfidx=$(echo "${pn}" | sed -rn 's/pf0vf([0-9]+)$/\1/p')
    [ "x${vfidx}" != "x" ] || continue
    vf_repr_ifname[${vfidx}]="${ifname}"
done

Note

This operation is not atomic and so any other subsystem that renames the network devices may invalidate this table.

The virtual functions associated with a PF PCI address are symlinked into the sysfs directory associated with the PF PCI device. For example, if the PF is located at 0000:04:00.0, VF1 would be at 0000:04:08.1, and VF9 would be at 000:04:09.1. In /sys/bus/pci/devices/0000:04:00.0/ virtfn0 and virtfn9 would link to those addresses:

# ls -og --time-style="+" /sys/bus/pci/drivers/nfp/0000:04:00.0/virtfn[19]
lrwxrwxrwx 1 0  /sys/bus/pci/drivers/nfp/0000:04:00.0/virtfn1 -> ../0000:04:08.1
lrwxrwxrwx 1 0  /sys/bus/pci/drivers/nfp/0000:04:00.0/virtfn9 -> ../0000:04:09.1

Support for biosdevname

Netronome NICs support biosdevname netdev naming with recent versions of the utility, circa December 2018, e.g. RHEL 8.0 and up. Furthermore, biosdevname will only be supported on kernel v4.19+. There are some notable points to be aware of:

  • Whenever an unsupported netdev is considered for naming, the biosdevname naming will be skipped and the next inline naming scheme will take preference, e.g. the systemd naming policies.
  • Netdevs in breakout mode are not supported for naming.
  • VF netdevs will still be subject to biosdevname naming irrespective of the breakout mode of other netdevs.
  • Physical function netdevs are not supported for naming.
  • PF and VF representor netdevs are not supported for naming.
  • When using an older version of the biosdevname utility or an older kernel, users will observe inconsistent naming of netdevs.

To disable biosdevname users can add biosdevname=0 to the kernel command line.

Refer to the online biosdevname documentation for more details about the naming policy convention that will be applied.

Install Open vSwitch

Installation From a Recent Distribution

The preferred method of installing and upgrading Open vSwitch is through the distribution repositories. The minimum recommended versions are those provided in supported releases of distributions. As a guide they are as follows:

Operating System Version
RHEL 7.5 2.9.0-19.el7fdp
CentOS 7.5+ 2.9.0-3.el7
RHEL 7.6 2.9.0-55.el7fdp
RHEL 8.0 2.11.0-9.el8fdp
Ubuntu 18.04 LTS 2.9.0-0ubuntu1

RHEL 7.5+

Please refer to Appendix B: Red Hat Repositories for information on configuring Red Hat repositories. Once the repositories are configured, install Open vSwitch using yum:

# yum install openvswitch

CentOS 7.5+

For CentOS it is recommended to add OpenStack repositories from RDO. This can be achieved by using yum directly. First install yum-utils to get the yum-config-manger utility, then install the repository:

# yum install yum-utils
# yum install centos-release-openstack-rocky

It is recommended to disable this repository by default and only enable it for the Open vSwitch install:

# yum-config-manager --disable centos-openstack*

Install Open vSwitch by temporarily enabling the repository for the specific yum call:

# yum install --enablerepo centos-openstack-rocky openvswitch

At the time of writing this will install openvswitch-2.10.1-3.

Ubuntu 18.04 LTS

In Ubuntu, Open vSwitch can be installed using apt-get:

# apt-get update
# apt-get install openvswitch-switch

Check OVS Install

If the installation procedure completed successfully, systemctl status openvswitch.service should return the service status. More information on using Open vSwitch is provided later in Using Open vSwitch.

Installation from Source

Installing Open vSwitch from source is only recommended for developers, the official Open vSwitch Git Repository contains instructions on how to build the packages from the source for each supported operating system.

Using the Linux Driver

Configuring SR-IOV

To configure SR-IOV virtual functions, ensure that SR-IOV is enabled in the BIOS of the host machine. If SR-IOV is disabled or unsupported by the motherboard/chipset being used, the kernel message log will contain a PCI SR-IOV:-12 error when trying to create a VF. This can be queried using the dmesg tool. The number of supported virtual functions on a netdev is exposed by sriov_totalvfs in sysfs. For example, if ens3 is the interface associated with the SmartNIC’s physical function, the following command will return the total supported number of VF’s:

# cat /sys/class/net/ens3/device/sriov_totalvfs
55

Virtual functions can be allocated to an network interface by writing an integer to the sysfs file. For example, to allocate 16 virtual functions to ens3:

# echo 16 > /sys/class/net/ens3/device/sriov_numvfs

Note

The current Netronome cards supporting TC offload only have a single PF. This means that, even though the SR-IOV interfaces are exposed on the PF netdev and the physical port representors, they refer to the same underlying physical function. It is therefore an error to attempt to allocate VF’s to multiple physical port representors.

See Open vSwitch Hardware Offload Example for a practical application. SR-IOV Virtual functions cannot be re-allocated dynamically. In order to change the number of allocated virtual functions, existing functions must first be deallocated by writing a 0 to the sysfs file. Otherwise, the system will return a device or resource busy error:

# echo 0 > /sys/class/net/ens3/device/sriov_numvfs

Note

Ensure any VMs are shut down and applications that may be using the VFs are stopped before deallocation.

In order to persist the virtual functions on the system, it is suggested that the system networking scripts be updated to manage them. The following snippet illustrates how to implement such a configuration with NetworkManager for the physical function ens3:

1
2
3
4
5
6
7
8
cat >/etc/NetworkManager/conf.d/nfp.conf << EOF
[keyfile]
unmanaged-devices=driver:nfp,driver:nfp_netvf,except:interface-name=ens3
[device]
match-device=interface-name:ens3
sriov-num-vfs=4
EOF
systemctl restart NetworkManager

This will setup NetworkManager to create 4 VF interfaces connected to the PF on ens3.

Note

It is recommended to prevent NetworkManager from managing all NFP interfaces other than the PF. Having NetworkManager manage the representor interfaces can interfere with the operation of OVS-TC.

In Ubuntu systems, networkd-dispatcher can be used in place of NetworkManager, as demonstrated below:

1
2
3
4
5
6
7
8
#!/bin/sh
cat > /usr/lib/networkd-dispatcher/routable.d/50-ifup-noaddr << 'EOF'
#!/bin/sh
ip link set mtu 9420 dev ens3
ip link set up dev ens3
cat /sys/class/net/ens3/device/sriov_totalvfs > /sys/class/net/ens3/device/sriov_numvfs
EOF
chmod u+x /usr/lib/networkd-dispatcher/routable.d/50-ifup-noaddr

Configuring Interface Media Mode

This section details the configuration of the SmartNIC physical interfaces.

Note

For older kernels that do not support the configuration methods outlined below, please refer to Appendix D: Working with Board Support Package on how to make use of the BSP toolkit to configure interfaces.

Configuring interface Maximum Transmission Unit (MTU)

The MTU of interfaces can temporarily be set using the iproute2 or ifconfig tools. Note that this change will not persist. Setting this via Network Manager, or other appropriate OS configuration tool, is recommended.

Set interface ens3np0’s MTU to 9000 bytes:

# ip link set dev ens3np0 mtu 9000

It is the responsibility of the user or the orchestration layer to set appropriate MTU values when handling jumbo frames or utilizing tunnels. For example, if packets sent from a VM are to be encapsulated on the card and egress a physical port, then the MTU of the VF should be set to lower than that of the physical port to account for the extra bytes added by the additional header.

If a setup is expected to see fallback traffic between the SmartNIC and the kernel then the user should also ensure that the PF MTU is appropriately set to avoid unexpected drops on this path.

Configuring FEC modes

Agilio CX 2x25GbE SmartNICs support FEC mode configuration, e.g. Auto, Firecode BaseR, Reed Solomon and Off modes. Each physical port’s FEC mode can be set independently via the ethtool command. To view the currently supported FEC modes of the interface use the following:

# ethtool ens3np0
Settings for ens3np0:
    Supported ports: [ FIBRE ]
    Supported link modes:   Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: None BaseR RS
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: BaseR RS
    Speed: 25000Mb/s
    Duplex: Full
    Port: Direct Attach Copper
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Link detected: yes

The output above details which FEC modes are supported for this interface. Note that the Agilio CX 2x25GbE SmartNIC used for the example above only supports Firecode BaseR FEC mode on ports that are forced to 10G speed.

Note

ethtool FEC support is only available in kernel 4.14 and newer or RHEL 7.5+ CentOS 7.5, and equivalent distributions. The Netronome upstream kernel driver provides ethtool FEC support from kernel 4.15. Furthermore, the SmartNIC NVRAM version must be at least 020025.020025.02006e to support ethtool FEC get/set operations.

To determine your version of the current SmartNIC NVRAM, examine the kernel message buffer:

# dmesg | grep 'nfp.*BSP'
[2387.682046] nfp 0000:82:00.0: BSP: 020025.020025.020072

This example lists a version of 020025.020025.020072 which is sufficient to support ethtool FEC mode configuration. To update your SmartNIC NVRAM flash, refer to Appendix E: Updating NFP Flash or contact Netronome support.

If the SmartNIC NVRAM or the kernel does not support ethtool modification of FEC modes, no supported FEC modes will be listed in the ethtool output for the port. This could be because of an outdated kernel version or an unsupported distribution (e.g. Ubuntu 16.04, irrespective of the kernel version).

# ethtool enp130s0np0
Settings for enp130s0np0:
...
Supported FEC modes: None

To show the currently active FEC mode for either the netdev or the physical port representors:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: Auto

To force the FEC mode for a particular port, autonegotiation must be disabled with the following:

# ip link set enp130s0np0 down
# ethtool -s enp130s0np0 autoneg off
# ip link set enp130s0np0 up

Note

In order to change the autonegotiation configuration the port must be down.

Note

Changing the autonegotiation configuration will not affect the SmartNIC port speed. Please see Configuring interface link-speed to adjust this setting.

To modify the FEC mode to Firecode BaseR:

# ethtool --set-fec enp130s0np0 encoding baser

Verify the newly selected mode:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: BaseR

To modify the FEC mode to Reed Solomon:

# ethtool --set-fec enp130s0np0 encoding rs

Verify the newly selected mode:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: RS

To modify the FEC mode to Off:

# ethtool --set-fec enp130s0np0 encoding off

Verify the newly selected mode:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: Off

Revert back to the default Auto setting:

# ethtool --set-fec enp130s0np0 encoding auto

Verify the setting again:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: Auto

Note

FEC and auto negotiation settings are persisted on the SmartNIC across reboots.

Setting Interface Breakout Mode

The following commands only work on kernel versions 4.13 and later. If your kernel is older than 4.13 or you do not have devlink support enabled, refer to the following section on configuring interfaces: Configure Media Settings.

Note

Breakout mode settings are only applicable to Agilio CX 40GbE and CX 2x40GbE SmartNICs.

Determine the card’s PCI address:

# lspci -Dkd 19ee:4000
0000:04:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
    Subsystem: Netronome Systems, Inc. Device 4001
    Kernel driver in use: nfp
    Kernel modules: nfp

List the devices:

# devlink dev show
pci/0000:04:00.0

Split the first physical 40G port from 1x40G to 4x10G ports:

# devlink port split pci/0000:04:00.0/0 count 4

Split the second physical 40G port from 1x40G to 4x10G ports:

# devlink port split pci/0000:04:00.0/4 count 4

If the SmartNIC’s port is already configured in breakout mode (it has already been split) then devlink will respond with an argument error. Whenever change to the port configuration are made, the original netdevs associated with the port will be removed from the system.

# dmesg | tail
[ 5696.432306] nfp 0000:04:00.0: nfp: Port #0 config changed, unregistering. Driver reload required before port will be operational again.
[ 6270.553902] nfp 0000:04:00.0: nfp: Port #4 config changed, unregistering. Driver reload required before port will be operational again.

The driver needs to be reloaded for the changes to take effect. Older driver/SmartNIC NVRAM versions may require a system reboot for changes to take effect. The driver communicates events related to port split/unsplit in the system logs. The driver may be reloaded with the following command:

# rmmod nfp; modprobe nfp

After reloading the driver, the netdevs associated with the split ports will be available for use:

# ip link show
...
68: enp4s0np0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
69: enp4s0np0s1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
70: enp4s0np0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
71: enp4s0np0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
72: enp4s0np1s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
73: enp4s0np1s1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
74: enp4s0np1s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
75: enp4s0np1s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

Note

There is an ordering constraint to splitting and unsplitting the ports on Agilio CX 2x40GbE SmartNICs. The first physical 40G port cannot be split without the second physical port also being split, hence 1x40G + 4x10G is always invalid even if it’s only intended to be a transitional mode. The driver will reject such configurations.

Breakout mode persists on the SmartNIC across reboots. To revert back to the original 2x40G ports use the unsplit subcommand.

To unsplit port 1:

# devlink port unsplit pci/0000:04:00.0/4

To unsplit port 0:

# devlink port unsplit pci/0000:04:00.0/0

The NFP drivers will again have to be reloaded (rmmod nfp then modprobe nfp) for unsplit changes in the port configuration to take effect.

Confirming Connectivity

Allocating IP Addresses

Under RHEL 7.5+ and CentOS 7.5+, the network configuration is managed by default using NetworkManager. It is recommended to disable NetworkManager on the NFP interfaces when using OVS-TC, as it can interfere with the TC rules that get installed on the interfaces. The easiest way to achieve this is to configure NetworkManager to ignore interfaces which are bound to nfp drivers. The config file for this can be created by:

cat >/etc/NetworkManager/conf.d/nfp.conf << EOF
[keyfile]
unmanaged-devices=driver:nfp,driver:nfp_netvf,except:interface-name=ens1
EOF
systemctl restart NetworkManager

Verification can be done by looking at the output of nmcli d before and after the commands above. All the interfaces that are bound to the nfp or nfp_netvf driver, except the PF ens1, should now be in the unmanaged state.

Use iproute2 to configure an IP on the port for a quick connectivity test. Remember to also make sure that the PF is up, ens1 in the example below:

# ip address add 10.0.0.2/24 dev ens1np0
# ip link set ens1np0 up
# ip link set ens1 up

Pinging interfaces

After you have successfully assigned IP addresses to the NFP interfaces, perform a ping to another address on the same subnet to test to confirm connectivity:

# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.067 ms
64 bytes from 10.0.0.2: icmp_seq=4 ttl=64 time=0.062 ms

Basic Firmware Features

In this section ethtool will be used to view and configure SmartNIC interface parameters.

View Interface Parameters

The -k flag can be used to view current interface configurations. For example, using a Agilio CX 1x40GbE NIC with a physical port representor enp4s0np0:

# ethtool -k enp4s0np0
Features for enp4s0np0:
rx-checksumming: off [fixed]
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off [fixed]
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
busy-poll: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
rx-udp_tunnel-port-offload: off [fixed]

Setting Interface Settings

Unless otherwise stated, changing the interface settings detailed below will not require reloading of the NFP drivers for changes to take effect, unlike the interface breakouts described in Configuring Interface Media Mode. If the interface has more than one physical port, changes must be applied to the physical function netdev and those settings will reflect on both ports. Unlike the basic CoreNIC firmware, each physical port on the interface cannot be configured independently and attempting to do so will produce an error. In this section, ens3 will be used as an example of a physical function netdev.

Receive Checksum Offload

When enabled, checksum calculation and error checking comparison for received packets is offloaded to the NFP SmartNIC’s flow processor rather than the host CPU.

To enable receive checksum offload:

# ethtool -K ens3 rx on

To disable receive checksum offload:

# ethtool -K ens3 rx off

Transmit Checksum Offload

When enabled, checksum calculation for outgoing packets is offloaded to the NFP SmartNIC’s flow processor rather than the host’s CPU.

To enable transmit checksum offload:

# ethtool -K ens3 tx on

To disable transmit checksum offload:

# ethtool -K ens3 tx off

Scatter/Gather

When enabled the NFP will use scatter/gather I/O, also known as Vectored I/O, which allows a single procedure call to sequentially read data from multiple buffers and write it to a single data stream. Only changes to the scatter-gather interface settings (from on to off or off to on) will produce a terminal output as shown below:

# ethtool -K ens3 sg on
Actual changes:
scatter-gather: on
        tx-scatter-gather: on
generic-segmentation-offload: on

# ethtool -K ens3 sg off
Actual changes:
scatter-gather: on
        tx-scatter-gather: on
generic-segmentation-offload: on

TCP Segmentation Offload (TSO)

When enabled, this parameter causes all functions related to the segmentation of TCP packets at egress to be offloaded to the NFP.

To enable TCP segmentation offload:

# ethtool -K ens3 tso on

To disable TCP segmentation offload:

# ethtool -K ens3 tso off

Generic Segmentation Offload (GSO)

This parameter offloads segmentation for transport layer protocol data units other than segments and datagrams for TCP/UDP respectively to the NFP. GSO operates at packet egress.

To enable generic segmentation offload:

# ethtool -K ens3 gso on

To disable generic segmentation offload:

# ethtool -K ens3 gso off

Generic Receive Offload (GRO)

This parameter enables software implementation of Large Receive Offload (LRO), which aggregates multiple packets at ingress into a large buffer before they are passed higher up the networking stack.

To enable generic receive offload:

# ethtool -K ens3 gro on

To disable generic receive offload:

# ethtool -K ens3 gro off

Note

Take note that scripts that use ethtool -i ${INTERFACE} to get bus-info will not work on representors as this information is not populated for representor devices.

Using Open vSwitch

Running Open vSwitch

RHEL 7.5+ and CentOS 7.5

Start Open vSwitch:

# systemctl start openvswitch

Check status of Open vSwitch:

# systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
      Active: active (exited) since Mon 2018-05-07 11:18:16 SAST; 2min 13s ago
    Process: 130744 ExecStop=/bin/true (code=exited, status=0/SUCCESS)
    Process: 131101 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 131101 (code=exited, status=0/SUCCESS)

  May 07 11:18:16 r730-dev-51 systemd[1]: Starting Open vSwitch...
  May 07 11:18:16 r730-dev-51 systemd[1]: Started Open vSwitch.

The openvswitch service controls the ovsdb-server and ovs-vswitchd services. Check their status too:

# systemctl status ovsdb-server
● ovsdb-server.service - Open vSwitch Database Unit
   Loaded: loaded (/usr/lib/systemd/system/ovsdb-server.service; static;
                      vendor preset: disabled)
   Active: active (running) since Mon 2018-05-07 11:18:16 SAST; 5min ago
 Process: 130869 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd
   stop (code=exited, status=0/SUCCESS)
 Process: 130898 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd
   --no-monitor --system-id=random --ovs-user=#{OVS_USER_ID} start $OPTIONS
 (code=exited, status=0/SUCCESS)
   Process: 130895 ExecStartPre=/usr/bin/chown #{OVS_USER_ID} /var/run/openvswitch
 (code=exited, status=0/SUCCESS)
Main PID: 130939 (ovsdb-server)
   Tasks: 1
  CGroup: /system.slice/ovsdb-server.service
             `-130939 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer
             -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock...


             May 07 11:18:16 r730-dev-51 systemd[1]: Starting Open vSwitch Database Unit...
             May 07 11:18:16 r730-dev-51 runuser[130932]: pam_unix(runuser:session): sess...)
             May 07 11:18:16 r730-dev-51 runuser[130932]: pam_unix(runuser:session): sess...h
             May 07 11:18:16 r730-dev-51 runuser[130936]: pam_unix(runuser:session): sess...)
             May 07 11:18:16 r730-dev-51 runuser[130936]: pam_unix(runuser:session): sess...h
             May 07 11:18:16 r730-dev-51 ovs-ctl[130898]: Starting ovsdb-server [  OK  ]
             May 07 11:18:16 r730-dev-51 ovs-vsctl[130940]: ovs|00001|vsctl|INFO|Called a...0
             May 07 11:18:16 r730-dev-51 ovs-vsctl[130946]: ovs|00001|vsctl|INFO|Called as...
             May 07 11:18:16 r730-dev-51 ovs-ctl[130898]: Configuring Open vSwitch system...]
             May 07 11:18:16 r730-dev-51 systemd[1]: Started Open vSwitch Database Unit.
             Hint: Some lines were ellipsized, use -l to show in full.
# systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static;
                      vendor preset: disabled)
     Active: active (running) since Mon 2018-05-07 11:18:16 SAST; 6min ago
   Process: 130747 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
       (code=exited, status=0/SUCCESS)
   Process: 130955 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server
       --no-monitor --system-id=random --ovs-user=#{OVS_USER_ID} start $OPTIONS
           (code=exited, status=0/SUCCESS)
   Process: 130952 ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages
       (code=exited, status=0/SUCCESS)
   Process: 130949 ExecStartPre=/usr/bin/chown :hugetlbfs /dev/hugepages
       (code=exited, status=0/SUCCESS)
 Main PID: 130987 (ovs-vswitchd)
     Tasks: 50
   CGroup: /system.slice/ovs-vswitchd.service
              `-130987 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer
              -vsyslog:err -vfile:info --mlockall --user openvswitch:huge...


              May 07 11:18:16 r730-dev-51 systemd[1]: Starting Open vSwitch Forwarding Unit...
              May 07 11:18:16 r730-dev-51 ovs-ctl[130955]: Starting ovs-vswitchd [  OK  ]
              May 07 11:18:16 r730-dev-51 ovs-ctl[130955]: Enabling remote OVSDB managers ...]
              May 07 11:18:16 r730-dev-51 systemd[1]: Started Open vSwitch Forwarding Unit.
              Hint: Some lines were ellipsized, use -l to show in full

Enable Open vSwitch so that it will run on reboot:

# systemctl enable openvswitch

Ubuntu 18.04 LTS

Start Open vSwitch:

# systemctl start openvswitch-switch

Check status of Open vSwitch:

# systemctl status openvswitch-switch
● openvswitch-switch.service - Open vSwitch
   Loaded: loaded (/lib/systemd/system/openvswitch-switch.service; enabled; vend
      Active: active (exited) since Wed 2018-05-09 08:35:44 UTC; 20s ago
   Main PID: 1824 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 1153)
    CGroup: /system.slice/openvswitch-switch.service

The openvswitch-vswitch service controls the ovsdb-server and ovs-vswitchd services. Check their status too:

# systemctl status ovsdb-server
● ovsdb-server.service - Open vSwitch Database Unit
   Loaded: loaded (/lib/systemd/system/ovsdb-server.service; static; vendor pres
   Active: active (running) since Wed 2018-05-09 08:35:44 UTC; 1min 38s ago
    Tasks: 1 (limit: 1153)
   CGroup: /system.slice/ovsdb-server.service
              └─1749 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:


# systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/lib/systemd/system/ovs-vswitchd.service; static; vendor pres
   Active: active (running) since Wed 2018-05-09 08:35:44 UTC; 2min 6s ago
    Main PID: 1813 (ovs-vswitchd)
   Tasks: 1 (limit: 1153)
      CGroup: /system.slice/ovs-vswitchd.service
                 └─1813 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer

Enable Open vSwitch so that it will run on reboot:

# systemctl enable openvswitch-switch
Synchronizing state of openvswitch-switch.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable openvswitch-switch

Configuring Open vSwitch Hardware Offload

To enable TC offloading in Open vSwitch, the hw-tc-offload flag for the representors of any ports that will send or receive offloaded traffic must be set to true. Unlike interface settings described in Setting Interface Settings hw-tc-offload flags must be set for each physical port representor. Hardware TC offload is enabled by default and can be verified for each port using ethtool. Mote that the PF interface won’t show the hw-tc-offload flag being set by default. For example:

# ethtool -k ens3 | grep hw-tc-offload
hw-tc-offload: on

The setting may be toggled for each port independently between on and off using ethtool:

# ethtool -K ens3np0 hw-tc-offload on

Note

Hardware offload changes won’t persist across reboots. The default setting for TC offloads when using the flower firmware is on.

Configure Open vSwitch hardware offload:

# ovs-vsctl set Open_vSwitch . other_config:hw-offload=true other_config:tc-policy=none

This change will persist across reboots. But, in the absence of a reboot, Open vSwitch must be restarted:

In RHEL 7.5+ and CentOS 7.5 this is performed by the command:

# systemctl restart openvswitch

In Ubuntu 18.04, the following command is used instead:

# systemctl restart openvswitch-switch

Open vSwitch Hardware Offload Example

Create an Open vSwitch bridge and add two interfaces; the representors of the first physical port and the VF. Please refer to section SmartNIC Netdev Interfaces for information on netdevs of the SmartNICs and Configuring SR-IOV for creating VFs associated with a physical interface. The following example requires at least one VF representor (in this case eth1) associated with the PF netdev.

Create an Open vSwitch bridge:

# ovs-vsctl add-br br0

Add representor netdev for the first physical port to the bridge:

# ovs-vsctl add-port br0 enp4s0np0

Add the representor netdev of the first VF to bridge:

# ovs-vsctl add-port br0 eth1

The ovs-vsctl show command can be used to verify the config of the bridge, and the kernel datapath can be verified with ovs-dpctl show:

# ovs-vsctl show
5e9b8d4b-4a29-41af-92f1-3d9f161aa176
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "enp4s0np0"
            Interface "enp4s0np0"
    ovs_version: "2.9.0"

# ovs-dpctl show
system@ovs-system:
  lookups: hit:19 missed:14 lost:0
  flows: 14
  masks: hit:84 total:5 hit/pkt:2.55
  port 0: ovs-system (internal)
  port 1: br0 (internal)
  port 2: enp4s0np0
  port 3: eth1

Packets should now be able to flow between the VF and the external port. The view of Open vSwitch for offloaded and non-offloaded flows can be seen listed using ovs-dpctl. The port numbers used for in_port and the (output) actions correspond to those listed by ovs-dpctl show as shown above.

View offloaded datapath flows:

# ovs-dpctl dump-flows type=offloaded
in_port(2),eth(src=00:15:4d:0e:08:a7,dst=66:11:3e:c9:cf:2f),eth_type(0x0806), packets:2, bytes:92, used:187.890s, actions:3
in_port(2),eth(src=00:15:4d:0e:08:a7,dst=66:11:3e:c9:cf:2f),eth_type(0x0800), packets:9, bytes:882, used:188.860s, actions:3
...

View non-offloaded datapath flows:

# ovs-dpctl dump-flows type=ovs
recirc_id(0),in_port(3),eth(src=66:11:3e:c9:cf:2f,dst=33:33:ff:c9:cf:2f),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:1,2
recirc_id(0),in_port(3),eth(src=66:11:3e:c9:cf:2f,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:140, used:1399.137s, actions:1,2
...

View both offloaded and non-offloaded datapath flows:

# ovs-dpctl dump-flows
in_port(2),eth(src=00:15:4d:0e:08:a7,dst=66:11:3e:c9:cf:2f),eth_type(0x0806), packets:2, bytes:92, used:187.890s, actions:3
in_port(2),eth(src=00:15:4d:0e:08:a7,dst=66:11:3e:c9:cf:2f),eth_type(0x0800), packets:9, bytes:882, used:188.860s, actions:3
...
recirc_id(0),in_port(3),eth(src=66:11:3e:c9:cf:2f,dst=33:33:ff:c9:cf:2f),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:1,2
recirc_id(0),in_port(3),eth(src=66:11:3e:c9:cf:2f,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:140, used:1399.137s, actions:1,2
...

Note

Note that type=offloaded is just an indication that a flow is handled by the TC datapath. This does not guarantee that it has been offloaded to the SmartNIC, the TC commands shown next provides a much better indication.

The non-offloaded flows are present in the Open vSwitch kernel datapath. The offloaded flows are present in hardware, and are configured by Open vSwitch via the Kernel’s TC subsystem. The kernel’s view of these flows may be observed using the tc command:

# tc -s filter show ingress dev enp4s0np0
filter protocol arp pref 1 flower
filter protocol arp pref 1 flower handle 0x1
  dst_mac 66:11:3e:c9:cf:2f
  src_mac 00:15:4d:0e:08:a7
  eth_type arp
  not_in_hw
      action order 1: mirred (Egress Redirect to device eth1) stolen
      index 1 ref 1 bind 1 installed 409 sec used 187 sec
      Action statistics:
      Sent 92 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
      backlog 0b 0p requeues 0
      cookie len 16 e053c4819648461a

filter protocol ip pref 2 flower
filter protocol ip pref 2 flower handle 0x1
  dst_mac 66:11:3e:c9:cf:2f
  src_mac 00:15:4d:0e:08:a7
  eth_type ipv4
  in_hw
      action order 1: mirred (Egress Redirect to device eth1) stolen
      index 4 ref 1 bind 1 installed 409 sec used 188 sec
      Action statistics:
      Sent 882 bytes 9 pkt (dropped 0, overlimits 0 requeues 0)
      backlog 0b 0p requeues 0
      cookie len 16 b68ca7de9c465000

# tc -s filter show ingress dev eth1
filter protocol arp pref 1 flower
filter protocol arp pref 1 flower handle 0x1
  dst_mac 00:15:4d:0e:08:a7
  src_mac 66:11:3e:c9:cf:2f
  eth_type arp
  not_in_hw
      action order 1: mirred (Egress Redirect to device enp4s0np0) stolen
      index 2 ref 1 bind 1 installed 409 sec used 187 sec
      Action statistics:
      Sent 56 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
      backlog 0b 0p requeues 0
      cookie len 16 5049f238734ef962

filter protocol ip pref 2 flower
filter protocol ip pref 2 flower handle 0x1
  dst_mac 00:15:4d:0e:08:a7
  src_mac 66:11:3e:c9:cf:2f
  eth_type ipv4
  in_hw
      action order 1: mirred (Egress Redirect to device enp4s0np0) stolen
      index 3 ref 1 bind 1 installed 409 sec used 188 sec
      Action statistics:
      Sent 882 bytes 9 pkt (dropped 0, overlimits 0 requeues 0)
      backlog 0b 0p requeues 0
      cookie len 16 3dae846e6b41a778

Appendix A: Netronome Repositories

All the software mentioned in this document can be obtained via the official Netronome repositories. Please find instructions below on how to enable access to the aforementioned repositories from your respective Linux distributions.

Configuring Repositories

For RHEL 7.5+ and CentOS 7.5:

1
2
3
4
5
6
7
8
cat << 'EOF' > /etc/yum.repos.d/netronome.repo
[netronome]
name=netronome
baseurl=https://rpm.netronome.com/repos/centos/
gpgcheck=0
enabled=1
EOF
yum makecache

For Ubuntu 18.04 LTS:

1
2
3
4
mkdir -p /etc/apt/sources.list.d/
echo "deb [trusted=yes] https://deb.netronome.com/apt stable main" > \
    /etc/apt/sources.list.d/netronome.list
apt-get update

Appendix B: Red Hat Repositories

TC offload is only available in Open vSwitch version 2.8, with additional offloads enabled in 2.9. The standard Red Hat Subscription only enables Open vSwitch version 2.5. For this reason, an additional subscription may be required to enable repositories that contain a newer version of Open vSwitch. Please consult with Red Hat directly to determine your subscription needs.

Note

The Red Hat documentation with regards to enabling the specific repositories is regarded to be authoritative. The steps below are for illustrative purposes only.

Register the system with subscription-manager:

# subscription-manager register

List all available pools:

# subscription-manager list --all --available

Identify the IDs of the license pools that provide the following products:

  • Red Hat Enterprise Linux
  • Red Hat Enterprise Linux Fast Datapath

This can be done by using the --matches flag:

# subscription-manager list --available --matches="Red Hat Enterprise Linux Fast Datapath"

Attach the system to these pools (by using the correct license pool IDs):

# subscription-manager attach --pool=${RHEL_PACKAGE_POOL_ID}

Enable the Fast Datapath repository for the relevant version of RHEL:

RHEL 7.5+:

# subscription-manager repos --enable rhel-7-fast-datapath-rpms

RHEL 8.0+:

# subscription-manager repos --enable fast-datapath-for-rhel-8-x86_64-rpms

Appendix C: Installing the Out-of-Tree NFP Driver

The nfp driver can be installed via the Netronome repository, or built from source, depending on your requirements.

Note

The Out-of-Tree driver currently does not provide support for TC firmware on RHEL/CentOS.

Install Driver via Netronome Repository

Please refer to Appendix A: Netronome Repositories on how to configure the Netronome repository applicable to your distribution. When the repository has been successfully added, install the nfp-driver package using the commands below.

Ubuntu 18.04 LTS

# apt-cache search nfp-driver
agilio-nfp-driver-dkms - agilio-nfp-driver driver in DKMS format.

# apt-get install agilio-nfp-driver-dkms

Kernel Changes

Take note that installing the dkms driver will only install it for the currently running kernel. When you upgrade the installed kernel it may not automatically update the the nfp module to use the version in the dkms package. In kernel versions older than v4.16 the MODULE_VERSION parameter of the in-tree module was not set, which causes dkms to pick the module with the highest srcversion hash (https://github.com/dell/dkms/issues/14). This is worked around by the package install step by adding --force to the dkms install, but this will not trigger on a kernel upgrade. To work around this issue, boot into the new kernel and then re-install the agilio-nfp-driver-dkms package.

This should not be a problem when upgrading from kernels v4.16 and newer as the MODULE_VERSION has been added and the dkms version check should work properly. It’s not possible to determine which nfp.ko file was loaded by only relying on information provided by the kernel. However, it’s possible to confirm that the binary signature of a file on disk and the module loaded in memory is the same.

To confirm that the module in memory is the same as the file on disk, compare the srcversion tag. The in-memory module’s tag is at /sys/module/nfp/srcversion. The default on-disk version can be queried with modinfo:

# cat /sys/module/nfp/srcversion # In-memory module
# modinfo nfp | grep "^srcversion:" # On-disk module

If these tags are in sync, the filename of the module provided by a modinfo query will identify the origin of the module:

# modinfo nfp | grep "^filename:"

If these tags are not in sync, there are likely conflicting copies of the module on the system: the initramfs may be out of sync or the module dependencies may be inconsistent.

The in-tree kernel module is usually located at the following path (please note, this module may be compressed with a .xz extension):

/lib/modules/$(uname -r)/kernel/drivers/net/ethernet/netronome/nfp/nfp.ko

The dkms module is usually located at the following path:

/lib/modules/$(uname -r)/updates/dkms/nfp.ko

To ensure that the out-of-tree driver is correctly loaded instead of the in-tree module, the following commands can be run:

1
2
3
4
5
mkdir -p /etc/depmod.d
echo "override nfp * extra" > /etc/depmod.d/netronome.conf
depmod -a
modprobe -r nfp; modprobe nfp
update-initramfs -u

Building from Source

Driver sources for Netronome Flow Processor devices, including the NFP-4000 and NFP-6000 models can be found at: https://github.com/Netronome/nfp-drv-kmods

Ubuntu 18.04 Dependencies

# apt-get update
# apt-get install -y linux-headers-$(uname -r) build-essential libelf-dev

Clone, Build and Install

1
2
3
4
5
git clone https://github.com/Netronome/nfp-drv-kmods.git
cd nfp-drv-kmods
make
make install
depmod -a

Appendix D: Working with Board Support Package

The NFP BSP provides infrastructure software and a development environment for managing NFP based platforms.

Install Software from Netronome Repository

Please refer to Appendix A: Netronome Repositories on how to configure the Netronome repository applicable to your distribution. When the repository has been successfully added install the BSP package using the commands below.

RHEL 7.5+ and CentOS 7.5

# yum list available | grep nfp-bsp
nfp-bsp-6000-b0.x86_64                   2018.10.31.1229-1          netronome

# yum install nfp-bsp-6000-b0 --nogpgcheck
# reboot

Ubuntu 18.04 LTS

# apt-cache search nfp-bsp
nfp-bsp-6000-b0 - Netronome NFP BSP

# apt-get install nfp-bsp-6000-b0

Install Software From deb/rpm Package

Obtain Software

The latest BSP packages can be obtained at the downloads area of the Netronome Support site (https://help.netronome.com).

Install the prerequisite dependencies

RHEL 7.5+ and CentOS 7.5 Dependencies

No dependency installation required

Ubuntu 18.04 LTS Dependencies

To install the BSP package dependencies on Ubuntu 18.04, run:

# apt-get install -y libjansson4

NFP BSP Package

Install the NFP BSP package provided by Netronome Support.

RHEL 7.5+ and CentOS 7.5 Install
# yum install -y nfp-bsp-6000-*.rpm --nogpgcheck
Ubuntu 18.04 LTS Install
# dpkg -i nfp-bsp-6000-*.deb

Using BSP tools

Enable CPP access

The NFP has an internal Command Push/Pull (CPP) bus that allows debug access to the SmartNIC internals. CPP access allows user space tools raw access to chip internals and is required to enable the use of most BSP tools. Only the out-of-tree (oot) driver allows CPP access.

Follow the steps from Install Driver via Netronome Repository to install the oot nfp driver. After the nfp module has been built load the driver with CPP access:

# depmod -a
# rmmod nfp
# modprobe nfp nfp_dev_cpp=1

To persist this option across reboots, a number of options are available; the distribution specific documentation will detail that process more thoroughly. Care must be taken that the settings are also applied to any initramfs images generated.

Configure Media Settings

Alternatively to the process described in Configuring Interface Media Mode, BSP tools can be used to configure the port speed of the SmartNIC use the following commands. Note, a reboot is still required for changes to take effect.

Agilio CX 2x25GbE - AMDA0099

To set the port speed of the CX 2x25GbE the following commands can be used:

Set port 0 and port 1 to 10G mode:

# nfp-media phy1=10G phy0=10G

Set port 1 to 25G mode:

# nfp-media phy1=25G+

To change the FEC settings of the 2x25GbE the following commands can be used:

# nfp-media --set-aneg=phy0=[S|A|I|C|F] --set-fec=phy0=[A|F|R|N]

Where the parameters for each argument are:

--set-aneg=:

S
search - Search through supported modes until link is found. Only one side should be doing this. It may result in a mode that can have physical layer errors depending on SFP type and what the other end wants. Long DAC cables with no FEC WILL have physical layer errors.
A
auto - Automatically choose mode based on speed and SFP type.
C
consortium - Consortium 25G auto-negotiation with link training.
I
IEEE - IEEE 10G or 25G auto-negotiation with link training.
F
forced - Mode is forced with no auto-negotiation or link training.

--set-fec=:

A
auto - Automatically choose FEC based on speed and SFP type.
F
Firecode - BASE-R Firecode FEC compatible with 10G.
R
Reed-Solomon - Reed-Solomon FEC new for 25G.
N
none - No FEC is used.
Agilio CX 1x40GbE - AMDA0081

Set port 0 to 40G mode:

# nfp-media phy0=40G

Set port 0 to 4x10G fanout mode:

# nfp-media phy0=4x10G
Agilio CX 2x40GbE - AMDA0097

Set port 0 and port 1 to 40G mode:

# nfp-media phy0=40G phy1=40G

Set port 0 to 4x10G fanout mode:

# nfp-media phy0=4x10G

For mixed configuration the highest port must be in 40G mode e.g.:

# nfp-media phy0=4x10G phy1=40G

Appendix E: Updating NFP Flash

The NVRAM flash software on the SmartNIC can be updated in one of two ways, either via ethtool or via the BSP userspace tools. In both cases, the BSP package needs to be installed to gain access to the intended flash image. After the flash has been updated, the system needs to be rebooted to take effect.

Note

The ethtool interface is only available for hosts running kernel 4.16 or higher when using the in-tree driver. Please use the out of tree driver to enable ethtool flashing on older kernels.

Note

Updating the flash via ethtool is only supported if the existing flash version is greater than 0028.0028.007c. Installed NVRAM flash version can be checked with the command dmesg | grep BSP. Cards running older versions of the NVRAM flash must be updated using the method in Update via BSP Userspace Tools

Refer to Appendix D: Working with Board Support Package to acquire the BSP tool package.

Update via Ethtool

To update the flash using ethtool, the reflashing utilities used in the Netronome directory in the system must first be relocated so that ethtool has access to them:

# cp /opt/netronome/flash/flash-nic.bin /lib/firmware
# cp /opt/netronome/flash/flash-one.bin /lib/firmware

Thereafter, ethtool can be used to reflash the software loaded onto the SmartNIC devices identified by either their PF netdev or their physical port representor:

# ethtool -f ens3 flash-nic.bin
# ethtool -f ens3 flash-one.bin
# reboot

Update via BSP Userspace Tools

Obtain Out of Tree NFP Driver

To update the flash using the BSP userspace tools, use the following steps. Refer to Appendix C: Installing the Out-of-Tree NFP Driver on installing the out of tree NFP driver and to load the driver with CPP access.

Flash the Card

The following commands may be executed for each card installed in the system using the PCI address of the particular card. In this section, the card’s PCI address is assumed to be 0000:04:00.0. First reload the NFP drivers with CPP access enabled:

# rmmod nfp
# modprobe nfp nfp_pf_netdev=0 nfp_dev_cpp=1

Then use the included Netronome flashing tools to reflash the card:

# /opt/netronome/bin/nfp-flash --preserve-media-overrides \
    -w /opt/netronome/flash/flash-nic.bin -Z 0000:04:00.0
# /opt/netronome/bin/nfp-flash -w /opt/netronome/flash/flash-one.bin \
    -Z 0000:04:00.0
# reboot

Appendix F: Upgrading the Kernel

RHEL 7.5+

Only kernel packages released by Red Hat and installable as part of the distribution installation and upgrade procedure are supported.

CentOS 7.5

The CentOS package installer yum will manage an update to the supported kernel version. The command yum install kernel-${VERSION} updates the kernel for CentOS. First search for available kernel packages then install the desired release:

# yum list --showduplicates kernel
kernel.x86_64
3.10.0-862.el7
base
kernel.x86_64
3.10.0-862.2.3.el7
updates
kernel.x86_64
3.10.0-862.3.2.el7
updates

# yum install kernel-3.10.0-862.el7

Ubuntu 18.04 LTS

If desired, alternative kernels may be installed. For example, at the time of writing, v4.18 is the newest stable kernel.

Acquire packages

1
2
3
4
5
6
BASE=http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18/
wget\
    $BASE/linux-headers-4.18.0-041800_4.18.0-041800.201808122131_all.deb \
    $BASE/linux-headers-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb \
    $BASE/linux-image-unsigned-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb \
    $BASE/linux-modules-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb

Install packages

1
2
3
4
5
dpkg -i \
    linux-headers-4.18.0-041800_4.18.0-041800.201808122131_all.deb \
    linux-headers-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb \
    linux-image-unsigned-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb \
    linux-modules-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb

Appendix G: Updating Kernel Boot Parameters

Note

In order to enable VFs to be bound to the vfio-pci driver such that they may be utilized by VMs, IOMMU must be enabled in both the BIOS of the host machines, as well as the kernel.

RHEL 7.5+ and CentOS 7.5 Grub Config

# grubby --update-kernel=ALL --args="intel_iommu=on"
# reboot

Ubuntu 18.04 LTS Grub Config

# sed -i 's/#*GRUB_CMDLINE_LINUX_DEFAULT.*/GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"/g' /etc/default/grub
# update-grub2
# reboot

Appendix H: Upgrading TC Firmware

The preferred method of installing and upgrading Agilio firmware is via the distribution repositories. The minimum recommended versions are those provided in GA releases of distributions. As a guide they are as follows:

Operating System Firmware package version
RHEL 7.5 20180220-62.git6d51311.el7
RHEL 7.6 20180911-68.git85c5d90.el7
RHEL 8.0 20190111-92.gitd9fb2ee6.el8
Ubuntu 18.04 LTS 1.173

Netronome provides firmware packages with newer features as out-of-tree repositories. The corresponding installation packages can be obtained from Netronome Support if access to the repositories is not available. (https://help.netronome.com).

Installing Updated TC Firmware via the Netronome Repository

Please refer to Appendix A: Netronome Repositories on how to configure the Netronome repository applicable to your distribution. When the repository has been successfully added install the agilio-flower-app-firmware package using the commands below.

In RHEL 7.5+ and CentOS 7.5:

# yum install agilio-flower-app-firmware

In Ubuntu 18.04 LTS:

# apt-get install agilio-flower-app-firmware

Installing Updated TC Firmware from Package Installations

The latest firmware can be obtained at the downloads area of the Netronome Support site (https://help.netronome.com). Install the packages provided by Netronome Support using the commands below.

In RHEL 7.5+ and CentOS 7.5:

# yum install -y agilio-flower-app-firmware-*.rpm

In Ubuntu 18.04 LTS:

# dpkg -i agilio-flower-app-firmware-*.deb

Select Updated TC Firmware

Once installed the updated TC firmware should be selected using the script described in section Selecting the TC Offload Firmware. To select the updated TC firmware it should be called with flower-next as its last parameter:

# ./agilio-tc-fw-select.sh p5p1 scan flower-next

Once selected the driver should be reloaded to use the new firmware:

# rmmod nfp; modprobe nfp

The initramfs image should also be update to ensure the correct firmware version is loaded at boot time.

In RHEL 7.5+ and CentOS 7.5 this is done using:

# dracut -f

In Ubuntu 18.04 LTS use the command:

# update-initramfs -u

Appendix I: Offloadable Flows

Flows may be offloaded to hardware if they meet the criteria described in this section.

Note

The maximum number of flows that can be offloaded in RHEL 7.5/7.6 and Ubuntu 18.04 is 128k. This has been increased to 480k in kernel 4.20 and has been backported to the 4.18-based kernel provided by RHEL 8.0.

Matches

A flow may be offloaded if it matches only on the following fields:

Metadata Input Port
Layer 2 Ethernet: Type, Addresses
VLAN: Outermost ID, Priority
Layer 3 IPv4: Addresses, Protocol, TTL, TOS, Frag
IPv6: Addresses, Protocol, Hop Limit, TOS, Frag
Layer 4 TCP: Ports, Flags
UDP: Ports
SCTP: Ports
Tunnel ID
IPv4: Outer Address
UDP: Outer Destination Port

Actions

A flow may be offloaded if:

  1. The input port of the flow is:

    1. A physical port or VF on an Agilio SmartNIC or;
    2. A supported tunnel vport whose ingress packets are received on a physical port on an Agilio SmartNIC and whose egress action is to a VF port on an Agilio SmartNIC.
  2. If present, the output actions output to:

    1. A physical port or VF on the same Agilio SmartNIC as the input port or;
    2. A tunnel vport whose egress packets are sent on a physical port of the same Agilio SmartNIC as the input port.
  3. Only the input port or output ports may be a tunnel vport, not both.

Supported tunnel vports:

  • VXLAN tunnel vports are supported as of upstream Linux kernel v4.16 and OVS v2.8, RHEL/CentOS 7.5 and Ubuntu 18.04 LTS.
  • Geneve tunnel vports are supported as of upstream Linux kernel v4.16 and OVS v2.8, RHEL/CentOS 7.6 and Ubuntu 18.10.
  • Support for Geneve options has been accepted for inclusion in upstream Linux kernel v4.19 and OVS v2.11. This is included in RHEL8+, support in CentOS and Ubuntu is pending.

UDP-based tunnel vports must use the default UDP port for the tunnel type:

  • VXLAN: port 4789.
  • Geneve: port 6801.

Offload of flows that output to more than one port is supported when using OVS v2.10+, as found in the Fast Datapath repository for RHEL 7. Otherwise only flows that output to at most one port may be offloaded.

Other than output and the implicit drop action, flows using the following actions may be offloaded:

  1. Push and Pop VLAN
  2. Masked and Unmasked Set

Flows that include a masked set of any of the following fields may be offloaded:

Layer 2 Ethernet: Type, Addresses
VLAN: ID, Priority
Layer 3 IPv4: Addresses
IPv6: Addresses
Layer 4 TCP: Ports
UDP: Ports

Flows that include an unmasked set of any of the following fields may be offloaded:

Tunnel ID
IPv4: Outer Address
UDP: Outer Destination Port

Bonds

Using native Open vSwitch bonds

Flows resulting from the following modes could be accelerated:

OVS Bonds Modes Active Backup
Balance SLB
Balance TCP

Configuring a bond in OVS in active-backup or balance-slb modes would result in flows that are offloadable. It should be noted that OVS sends packets to the LOCAL port for each bond. This results in flow rules that include actions with output to the LOCAL port. This cannot be accelerated by Agilio OVS. To prevent this from occurring, and to achieve acceleration, packets must not be sent to the LOCAL port, which can be achieved by:

# ovs-ofctl -O Openflow13 mod-port bondbr0 bondbr0 no-forward

Furthermore configuring a bond in balance-tcp mode could result in flows that are offloadable if recirculation has been disabled. This can be achieved using the following:

# ovs-appctl dpif/set-dp-features bondbr0 recirc false

It should be noted that turning off recirculation leads to exact match datapath entries (matching on L2, L3 and L4) being installed. e.g.

in_port(10),eth(src=12:23:34:45:56:67,dst=67:56:45:34:23:12),eth_type(0x0800),ipv4(src=10.10.10.10,dst=10.10.10.20,proto=6,frag=no),tcp(src=1000,dst=2000), packets:0, bytes:0, used:never, actions:6,7

This exact matching behavior leads to flow explosion, i.e. OVS will install an entry for every unique (L2, L3 or L4) packet. This in turn could lead to performance degradation, especially so when using many flows (100K and more).

Finally, OVS bonding is based on the NORMAL rule; links will not be aggregated when the bond bridge does not contain a NORMAL rule. Should match/actions be required, an additional bridge (named br0 in this example) is required on which the match/actions are performed, allowing the bond bridge to only have the NORMAL rule. This additional bridge can be connected to the bond bridge using a patch port.

Configuring Linux bonds

From RHEL 8.0+ it is possible to configure standard Linux bonds and add them to an Open vSwitch bridge for offloading. The process to create and use these bonds are shown next.

First create a bond:

# ip link add bond0 type bond

Add the mac-representor ports to the bond:

# ip link set dev ens1np0 master bond0
# ip link set dev ens1np1 master bond0

If they need to be removed from the bond:

# ip link set dev ens1np0 nomaster
# ip link set dev ens1np1 nomaster

Information about a Linux bond can be obtained by:

# cat /proc/net/bonding/bond0

Example of the output from the above command:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: ens1np0
MII Status: up
Speed: 40000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:15:4d:13:50:32
Slave queue ID: 0

Slave Interface: ens1np1
MII Status: up
Speed: 40000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:15:4d:13:50:33
Slave queue ID: 0

Not all bonding modes are supported for offloading. The currently supported modes are active-backup and balance-xor. See below for more info configuring each mode.

Note

All slaves needs to be removed from a bond before the mode can be changed.

active-backup

This mode will send traffic on only one of the ports that are aggregated in the bond. This mode is configured by executing:

# ip link set dev bond0 down
# ip link set dev ens1np0 nomaster bond0
# ip link set dev ens1np1 nomaster bond0
# ip link set dev bond0 type bond mode active-backup
# ip link set dev bond0 type bond miimon 100
# ip link set dev ens1np0 master bond0
# ip link set dev ens1np1 master bond0
# ip link set dev bond0 up

The miimon setting sets the interval on which the link state should be monitored in milliseconds. If a port down state is detected the bond will reconfigure itself to send the traffic out on one of the other ports in the bond.

balance-xor

This mode balances traffic across the aggregated ports using a hash method. To enable offloading the xmit_hash_policy value must be set to either layer3+4 or encap3+4. Other hashing methods will not be offloaded. Configuration is as follows:

# ip link set dev bond0 down
# ip link set dev ens1np0 nomaster
# ip link set dev ens1np1 nomaster
# ip link set dev bond0 type bond mode balance-xor
# ip link set dev bond0 type bond miimon 100

To use layer3+4 as hash:

# ip link set dev bond0 type bond xmit_hash_policy layer3+4

To use encap3+4 as hash:

# ip link set dev bond0 type bond xmit_hash_policy encap3+4

Add back the slave ports and up the bond:

# ip link set dev ens1np0 master bond0
# ip link set dev ens1np1 master bond0
# ip link set dev bond0 up

For more detailed information on the difference between the modes and the hash methods it is recommended to read the Linux kernel documentation on the subject.

Configuring Linux teaming

Another method of setting up link aggregated ports is to use Linux teaming. Teaming is controlled using the teamd and teamdctl utilities, as will be demonstrated below.

Creating a new team device for active-backup mode:

# teamd -t bond0 -d -c '{"runner": {"name": "activebackup"}}'

Creating a new team device for load balancing mode. The hashing method for teaming is not as well defined so for offloading to the NFP this will hash on L3 and L4:

# teamd -t bond0 -d -c '{"runner": {"name": "lacp"}}'

Ports are added using teamdctl:

# teamdctl bond0 port add ens6np0
# teamdctl bond0 port add ens6np1

The port config can be dumped using:

# teamdctl bond0 config dump

Example output:

{
    "device": "bond0",
    "ports": {
        "ens6np0": {
            "link_watch": {
                "name": "ethtool"
            }
        },
        "ens6np1": {
            "link_watch": {
                "name": "ethtool"
            }
        }
    },
    "runner": {
        "name": "lacp",
        "tx_hash": [
            "eth",
            "ipv4",
            "ipv6"
        ]
    }
}

For more usage instructions using teaming take a look at the man pages for teamd and teamdctl.

Using Linux bonds/teaming with Open vSwitch

Once the bond is configured as shown in section Configuring Linux bonds it is possible to use it with Open vSwitch, by adding the bond port to the bridge as with any other type of port. See the following example which adds a bridge, configures the bond port as well as a VF representor port and then adds two simple flow rules that forwards all traffic between the VF and the bond:

# ovs-vsctl add-br br0
# ovs-vsctl add-port br0 bond0
# ovs-vsctl add-port br0 vf0_repr
# ovs-ofctl add-flow br0 in_port=bond0,actions=output:vf0_repr
# ovs-ofctl add-flow br0 in_port=vf0_repr,actions=output:bond0

Teams are used with Open vSwitch in exactly the same way as bonds.

Note

Currently it is not possible to use bonding in conjunction with any of the tunneling protocols.

Contact Us

Netronome Systems, Inc.
2903 Bunker Hill Lane, Suite 150
Santa Clara, CA 95054
Tel: 408.496.0022 Fax: 408.586.0002
https://www.netronome.com help@netronome.com

© Copyright 2019, Netronome