Agilio SmartNIC Basic Firmware User Guide

Release Notes

The latest version of the Basic NIC Firmware is 2.1.16, released 2018/11/04.

Note

For optimal performance with the Basic NIC Firmware, use a kernel newer than 4.15 (or the RHEL equivalent), or the out-of-tree kmods driver. See Appendix B: Installing the Out-of-Tree NFP Driver
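A quick way to check the running kernel against that minimum is sketched below; `ver_ge` is a hypothetical helper using `sort -V`, and note that on RHEL the version number alone is not conclusive because of backports:

```shell
#!/bin/bash
# ver_ge: succeed if version $1 is at least version $2 (compared with sort -V).
ver_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

kernel=$(uname -r | cut -d- -f1)
if ver_ge "$kernel" 4.15; then
    echo "kernel $kernel: in-tree nfp driver should be suitable"
else
    echo "kernel $kernel: consider the out-of-tree kmods driver"
fi
```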

Note

Rapidly loading and unloading the kernel driver module can result in the card becoming unresponsive. Such a situation requires a host reboot to resolve.

Note

In this firmware release, tunnel offloading capabilities are disabled on VF ports.

Release History

2.1.16

  • New SRIOV capable firmware file
    • Support for 48 single-queue VFs on physical port 0
  • Tunnel inner header RSS support for VXLAN and Geneve
  • Improved performance

2.0.7

  • TCP (Large) Segment Offload (TSO / LSO)
  • TCP and UDP Receive Side Scaling (RSS) with CRC32 algorithm
  • Receive and Transmit Checksum Offload
  • Jumbo Frame Support up to 9216B Frames

The Agilio SmartNIC Architecture

./Conceptual_architecture.png

The conceptual architecture of the Agilio SmartNIC

The Agilio CX SmartNICs are based on the NFP-4000, a 60-core flow processor with up to eight cooperatively multithreaded threads per core, and are available in low-profile PCIe and OCP v2 NIC form factors suitable for use in COTS servers. The flow processing cores use an instruction set optimized for networking, ensuring an unrivalled level of flexibility within the data plane while maintaining performance. The OVS datapath can also be enabled without a server reboot.

Further extensions such as BPF offload, SR-IOV or custom offloads can be added without any hardware modifications or even server reboot. These extensions are not covered by this guide, which deals with the basic and OVS-TC offload firmware only.

The basic firmware offers a wide variety of features including RSS (Receive Side Scaling), Checksum Offload (IPv4/IPv6, TCP, UDP, Tx/Rx), LSO (Large Segment Offload), IEEE 802.3ad/802.1AX Link Aggregation, link flow control, etc. For more details regarding currently supported features refer to the section Basic Firmware Features.

Hardware Installation

This user guide focuses on x86 deployments of Agilio hardware. As detailed in Validating the Driver, Netronome’s Agilio SmartNIC firmware is now upstreamed with certain kernel versions of Ubuntu and RHEL/CentOS. Whilst out-of-tree driver source files are available and build/installation instructions are included in Appendix A: Netronome Repositories, it is highly recommended where possible to make use of the upstreamed drivers. Wherever applicable, separate instructions for RHEL/CentOS and Ubuntu are provided.

Identification

In a running system, the assembly ID and serial number of a PCI device may be determined using the ethtool debug interface. This requires knowledge of the physical function network device identifier, or <netdev>, assigned to the SmartNIC under consideration. Consult the section SmartNIC netdev interfaces for methods of determining this identifier; alternatively, the interface name <netdev> can be identified using the ip link command. The following shell snippet illustrates this method for a particular netdev whose name is passed as the argument $1:

#!/bin/bash
DEVICE=$1
ethtool -W ${DEVICE} 0
DEBUG=$(ethtool -w ${DEVICE} data /dev/stdout | strings)
SERIAL=$(echo "${DEBUG}" | grep "^SN:")
ASSY=$(echo ${SERIAL} | grep -oE AMDA[0-9]{4})
echo ${SERIAL}
echo Assembly: ${ASSY}

Note

The strings command is commonly provided by the binutils package. This can be installed by yum install binutils or apt-get install binutils, depending on your distribution.

Physical installation

Physically install the SmartNIC in the host server and ensure proper cooling e.g. airflow over card. Ensure the PCI slot is at least Gen3 x8 (can be placed in Gen3 x16 slot). Once installed, power up the server and open a terminal. Further details and support about the hardware installation process can be reviewed in the Hardware User Manual available from Netronome’s support site.

Validation

Use the following command to validate that the SmartNIC is correctly detected by the host server and to identify its PCI address (19ee is the Netronome-specific PCI vendor identifier):

# lspci -Dnnd 19ee:4000; lspci -Dnnd 19ee:6000
0000:02:00.0 Ethernet controller [0200]: Netronome Systems, Inc. Device    [19ee:4000]

Note

The lspci command is commonly provided by the pciutils package. This can be installed by yum install pciutils or apt-get install pciutils, depending on your distribution.

Validating the Driver

The Netronome SmartNIC physical function driver with support for OVS-TC offload is included in Linux 4.13 and later kernels. The list of minimum required operating system distributions and their respective kernels which include the nfp driver are as follows:

Operating System     Kernel package version
RHEL/CentOS 7.4+     default
Ubuntu 16.04.4 LTS   default

In order to upgrade Ubuntu 16.04.0 - 16.04.3 to a supported version, the following commands must be run:

# apt-get update
# apt-get upgrade
# apt-get dist-upgrade

Confirm Upstreamed NFP Driver

To confirm that your current Operating System contains the upstreamed nfp module:

# modinfo nfp | head -3
filename:
/lib/modules/<kernel package version>/kernel/drivers/net/ethernet/netronome/nfp/nfp.ko.xz
description:    The Netronome Flow Processor (NFP) driver.
license:        GPL

Note

If the module is not found in your current kernel, refer to Appendix B: Installing the Out-of-Tree NFP Driver for instructions on installing the out-of-tree NFP driver, or simply upgrade your distribution and kernel version to include the upstreamed drivers.

Confirm that the NFP Driver is Loaded

Use lsmod to list the loaded driver modules and use grep to match the expression for the NFP drivers:

# lsmod | grep nfp
nfp                   161364  0

If the NFP driver is not loaded, run the following command to load the module manually:

# modprobe nfp

SmartNIC netdev interfaces

The agilio-naming-policy package ensures consistent naming of Netronome SmartNIC network interfaces. Please note that this package is optional and not required if your distribution has a sufficiently new systemd installation.

Please refer to Appendix A: Netronome Repositories on how to configure the Netronome repository applicable to your distribution. When the repository has been successfully enabled, install the naming package using the commands below.

Ubuntu:

# apt-get install agilio-naming-policy

CentOS/RHEL:

# yum install agilio-naming-policy

At nfp driver initialization, new netdev interfaces will be created:

# ip link

4: enp6s0np0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:15:4d:13:01:db brd ff:ff:ff:ff:ff:ff
5: enp6s0np0s1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:15:4d:13:01:dd brd ff:ff:ff:ff:ff:ff
6: enp6s0np0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:15:4d:13:01:de brd ff:ff:ff:ff:ff:ff
7: enp6s0np0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:15:4d:13:01:df brd ff:ff:ff:ff:ff:ff
8: enp6s0np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:15:4d:13:01:dc brd ff:ff:ff:ff:ff:ff

Note

Netdev naming may vary depending on your Linux distribution and configuration, e.g. enpAsXnpYsZ, pXpY.

To confirm the names of the interfaces, view the contents of /sys/bus/pci/devices/<pci addr>/net, using the PCI address obtained in Hardware Installation e.g.

#!/bin/bash
for addr in $(lspci -d 19ee:4000 | awk '{print "0000:"$1}'); do
    ls "/sys/bus/pci/devices/${addr}/net"
done

The output of such a script would be similar to:

enp6s0np0s0  enp6s0np0s1  enp6s0np0s2  enp6s0np0s3  enp6s0np1

As a last resort, netdev names can also be discovered by reading the kernel logs.

Validating the Firmware

Netronome SmartNICs are fully programmable devices and thus depend on the driver to load firmware onto the device at runtime. It is important to note that the functionality of the SmartNIC significantly depends on the firmware loaded. The firmware files should be present in the following directory (contents may vary depending on the installed firmware):

# ls -ogR --time-style="+" /lib/firmware/netronome/
/lib/firmware/netronome/:
total 8
drwxr-xr-x. 2 4096  flower
drwxr-xr-x. 2 4096  nic
lrwxrwxrwx  1   31  nic_AMDA0081-0001_1x40.nffw -> nic/nic_AMDA0081-0001_1x40.nffw
lrwxrwxrwx  1   31  nic_AMDA0081-0001_4x10.nffw -> nic/nic_AMDA0081-0001_4x10.nffw
lrwxrwxrwx  1   31  nic_AMDA0096-0001_2x10.nffw -> nic/nic_AMDA0096-0001_2x10.nffw
lrwxrwxrwx  1   31  nic_AMDA0097-0001_2x40.nffw -> nic/nic_AMDA0097-0001_2x40.nffw
lrwxrwxrwx  1   36  nic_AMDA0097-0001_4x10_1x40.nffw -> nic/nic_AMDA0097-0001_4x10_1x40.nffw
lrwxrwxrwx  1   31  nic_AMDA0097-0001_8x10.nffw -> nic/nic_AMDA0097-0001_8x10.nffw
lrwxrwxrwx  1   36  nic_AMDA0099-0001_1x10_1x25.nffw -> nic/nic_AMDA0099-0001_1x10_1x25.nffw
lrwxrwxrwx  1   31  nic_AMDA0099-0001_2x10.nffw -> nic/nic_AMDA0099-0001_2x10.nffw
lrwxrwxrwx  1   31  nic_AMDA0099-0001_2x25.nffw -> nic/nic_AMDA0099-0001_2x25.nffw
lrwxrwxrwx  1   34  pci-0000:04:00.0.nffw -> flower/nic_AMDA0097-0001_2x40.nffw
lrwxrwxrwx  1   34  pci-0000:06:00.0.nffw -> flower/nic_AMDA0096-0001_2x10.nffw

/lib/firmware/netronome/flower:
total 11692
lrwxrwxrwx. 1      17  nic_AMDA0081-0001_1x40.nffw -> nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0081-0001_4x10.nffw -> nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0096-0001_2x10.nffw -> nic_AMDA0096.nffw
-rw-r--r--. 1 3987240  nic_AMDA0096.nffw
lrwxrwxrwx. 1      17  nic_AMDA0097-0001_2x40.nffw -> nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0097-0001_4x10_1x40.nffw -> nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0097-0001_8x10.nffw -> nic_AMDA0097.nffw
-rw-r--r--. 1 3988184  nic_AMDA0097.nffw
lrwxrwxrwx. 1      17  nic_AMDA0099-0001_2x10.nffw -> nic_AMDA0099.nffw
lrwxrwxrwx. 1      17  nic_AMDA0099-0001_2x25.nffw -> nic_AMDA0099.nffw
-rw-r--r--. 1 3990552  nic_AMDA0099.nffw

/lib/firmware/netronome/nic:
total 12220
-rw-r--r--. 1 1380496  nic_AMDA0081-0001_1x40.nffw
-rw-r--r--. 1 1389760  nic_AMDA0081-0001_4x10.nffw
-rw-r--r--. 1 1385608  nic_AMDA0096-0001_2x10.nffw
-rw-r--r--. 1 1385664  nic_AMDA0097-0001_2x40.nffw
-rw-r--r--. 1 1391944  nic_AMDA0097-0001_4x10_1x40.nffw
-rw-r--r--. 1 1397880  nic_AMDA0097-0001_8x10.nffw
-rw-r--r--. 1 1386616  nic_AMDA0099-0001_1x10_1x25.nffw
-rw-r--r--. 1 1385608  nic_AMDA0099-0001_2x10.nffw
-rw-r--r--. 1 1386368  nic_AMDA0099-0001_2x25.nffw

The NFP driver will search for firmware in /lib/firmware/netronome. Firmware is searched for in the following order and the first firmware to be successfully found and loaded is used by the driver:

1: serial-<SERIAL>.nffw
2: pci-<PCI_ADDRESS>.nffw
3: nic_<ASSEMBLY-TYPE>_<BREAKOUTxMODE>.nffw

This search is logged by the kernel when the driver is loaded. For example:

# dmesg | grep -A 4 nfp.*firmware
[  3.260788] nfp 0000:04:00.0: nfp: Looking for firmware file in order of priority:
[  3.260810] nfp 0000:04:00.0: nfp:   netronome/serial-00-15-4d-13-51-0c-10-ff.nffw: not found
[  3.260820] nfp 0000:04:00.0: nfp:   netronome/pci-0000:04:00.0.nffw: not found
[  3.262138] nfp 0000:04:00.0: nfp:   netronome/nic_AMDA0097-0001_2x40.nffw: found, loading...
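The per-device pci-<PCI_ADDRESS>.nffw entry, which the driver checks before the assembly-based name, can be used to pin a particular firmware to one card, as the symlinks in the directory listing above illustrate. A hypothetical sketch follows; a scratch directory stands in for the real /lib/firmware/netronome so it is safe to run anywhere:

```shell
#!/bin/bash
set -e
# Pin a specific firmware to the card at PCI address 0000:04:00.0 by
# creating a per-device symlink in the firmware search directory.
FWDIR=$(mktemp -d)            # on a real system: /lib/firmware/netronome
PCI=0000:04:00.0
ln -sf nic/nic_AMDA0097-0001_2x40.nffw "${FWDIR}/pci-${PCI}.nffw"
readlink "${FWDIR}/pci-${PCI}.nffw"   # -> nic/nic_AMDA0097-0001_2x40.nffw
# On a live system, reload the driver afterwards: rmmod nfp; modprobe nfp
```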

The version of the loaded firmware for a particular <netdev> interface, as found in SmartNIC netdev interfaces (for example enp4s0), or an interface’s port <netdev port> (e.g. enp4s0np0) can be displayed with the ethtool command:

# ethtool -i <netdev/netdev port>
driver: nfp
version: 3.10.0-862.el7.x86_64 SMP mod_u
firmware-version: 0.0.3.5 0.22 nic-2.0.4 nic
expansion-rom-version:
bus-info: 0000:04:00.0

Firmware versions are displayed in order: NFD version, NSP version, APP FW version, driver APP. The specific output above shows that basic NIC firmware is running on the card, as indicated by “nic” in the firmware-version field.
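Since the application firmware name is the last field of that line, it can also be checked programmatically. A small sketch, where app_fw_name is a hypothetical helper and the sample line is the one shown above:

```shell
#!/bin/bash
# Print the app firmware name: the last field of the firmware-version line.
app_fw_name() {
    awk '/^firmware-version:/ {print $NF}'
}

# Sample line taken from the ethtool -i output above:
sample='firmware-version: 0.0.3.5 0.22 nic-2.0.4 nic'
echo "$sample" | app_fw_name    # -> nic
# On a live system: ethtool -i <netdev> | app_fw_name
```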

Upgrading the firmware

The preferred method of upgrading the Agilio firmware is via the Netronome repositories; however, if this is not possible, the corresponding installation packages can be obtained from Netronome Support (https://help.netronome.com).

Upgrading firmware via the Netronome repository

Please refer to Appendix A: Netronome Repositories on how to configure the Netronome repository applicable to your distribution. When the repository has been successfully added, install the agilio-nic-firmware package using the commands below.

Ubuntu:

# apt-get install agilio-nic-firmware
# rmmod nfp; modprobe nfp
# update-initramfs -u

CentOS/RHEL:

# yum install agilio-nic-firmware
# rmmod nfp; modprobe nfp
# dracut -f

Upgrading firmware from package installations

The latest firmware can be obtained at the downloads area of the Netronome Support site (https://help.netronome.com).

Install the packages provided by Netronome Support using the commands below.

Ubuntu:

# dpkg -i agilio-nic-firmware-*.deb
# rmmod nfp; modprobe nfp
# update-initramfs -u

CentOS/RHEL:

# yum install -y agilio-nic-firmware-*.rpm
# rmmod nfp; modprobe nfp
# dracut -f

Using the Linux Driver

Configuring Interface Media Mode

The following sections detail the configuration of the SmartNIC netdev interfaces.

Note

For older kernels that do not support the configuration methods outlined below, please refer to Appendix C: Working with Board Support Package on how to make use of the BSP toolset to configure interfaces.

Configuring interface Maximum Transmission Unit (MTU)

The MTU of an interface can be set temporarily using the iproute2 or ifconfig tools; note that such a change will not persist across reboots. For persistence, set the MTU via NetworkManager or another appropriate OS configuration tool.

Set interface MTU to 9000 bytes:

# ip link set dev <netdev port> mtu 9000

It is the responsibility of the user or the orchestration layer to set appropriate MTU values when handling jumbo frames or utilizing tunnels. For example, if packets sent from a VM are to be encapsulated on the card and egress a physical port, then the MTU of the VF should be set lower than that of the physical port to account for the extra bytes added by the encapsulation header.

If a setup is expected to see fallback traffic between the SmartNIC and the kernel then the user should also ensure that the PF MTU is appropriately set to avoid unexpected drops on this path.
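As a concrete sketch of that arithmetic, assuming VXLAN encapsulation with its usual 50 bytes of overhead (outer Ethernet 14 + IPv4 20 + UDP 8 + VXLAN 8); the interface names in the comments are placeholders:

```shell
#!/bin/bash
# Derive a VF MTU that leaves room for VXLAN encapsulation overhead.
PHYS_MTU=9000
VXLAN_OVERHEAD=50   # outer Ethernet 14 + IPv4 20 + UDP 8 + VXLAN 8
VF_MTU=$((PHYS_MTU - VXLAN_OVERHEAD))
echo "physical port MTU: $PHYS_MTU, VF MTU: $VF_MTU"
# ip link set dev <netdev port> mtu $PHYS_MTU
# ip link set dev <vf netdev> mtu $VF_MTU
```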

Configuring FEC modes

Agilio CX 2x25GbE SmartNICs support FEC mode configuration, e.g. Auto, Firecode BaseR, Reed Solomon and Off modes. Each physical port’s FEC mode can be set independently via the ethtool command. To view the currently supported FEC modes of the interface use the following:

# ethtool <netdev>
Settings for <netdev>:
    Supported ports: [ FIBRE ]
    Supported link modes:   Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: None BaseR RS
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: BaseR RS
    Speed: 25000Mb/s
    Duplex: Full
    Port: Direct Attach Copper
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Link detected: yes

One can see above which FEC modes are supported for this interface. Note that the Agilio CX 2x25GbE SmartNIC used for the example above only supports Firecode BaseR FEC mode on ports that are forced to 10G speed.

Note

Ethtool FEC support is only available in kernel 4.14 and newer, or RHEL/CentOS 7.5 and equivalent distributions. The Netronome upstream kernel driver provides ethtool FEC support from kernel 4.15. Furthermore, the SmartNIC NVRAM version must be at least 020025.020025.02006e to support ethtool FEC get/set operations.

To determine your version of the current SmartNIC NVRAM, look at the following system log:

# dmesg | grep 'nfp.*BSP'
[2387.682046] nfp 0000:82:00.0: BSP: 020025.020025.020072

This example lists a version of 020025.020025.020072 which is sufficient to support ethtool FEC mode configuration. To update your SmartNIC NVRAM flash, refer to Appendix E: Updating NFP Flash or contact Netronome support.
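Because the fields of the BSP version string are hexadecimal, a plain string comparison is not reliable. The sketch below compares field by field; bsp_ge is a hypothetical helper that assumes three dotted hex fields:

```shell
#!/bin/bash
# bsp_ge: succeed if BSP version $1 is at least $2, comparing each of the
# three dotted fields as a hexadecimal number.
bsp_ge() {
    local IFS=.
    local -a a b
    read -ra a <<< "$1"
    read -ra b <<< "$2"
    local i
    for i in 0 1 2; do
        (( 16#${a[i]} > 16#${b[i]} )) && return 0
        (( 16#${a[i]} < 16#${b[i]} )) && return 1
    done
    return 0
}

bsp_ge 020025.020025.020072 020025.020025.02006e \
    && echo "NVRAM supports ethtool FEC get/set"
```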

If the SmartNIC NVRAM or the kernel does not support ethtool modification of FEC modes, no supported FEC modes will be listed in the ethtool output for the port. This could be because of an outdated kernel version or an unsupported distribution (e.g. Ubuntu 16.04 irrespective of the kernel version):

# ethtool enp130s0np0
Settings for enp130s0np0:
...
Supported FEC modes: None

To show the currently active FEC mode for either the <netdev> or its physical port(s) <netdev port>:

# ethtool --show-fec <netdev>/<netdev port>
FEC parameters for <netdev>:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: Auto

To force the FEC mode for a particular port, autonegotiation must be disabled with the following:

# ip link set enp130s0np0 down
# ethtool -s enp130s0np0 autoneg off
# ip link set enp130s0np0 up

Note

In order to change the autonegotiation configuration the port must be down.

Note

Changing the autonegotiation configuration will not affect the SmartNIC port speed. Please see Configuring interface link-speed to adjust this setting.

To modify the FEC mode to Firecode BaseR:

# ethtool --set-fec <netdev port> encoding baser

Verify the newly selected mode:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: BaseR

To modify the FEC mode to Reed Solomon:

# ethtool --set-fec enp130s0np0 encoding rs

Verify the newly selected mode:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: RS

To disable FEC:

# ethtool --set-fec enp130s0np0 encoding off

Verify the newly selected mode:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: Off

Revert back to the default Auto setting:

# ethtool --set-fec enp130s0np0 encoding auto

Finally verify the setting again:

# ethtool --show-fec enp130s0np0
FEC parameters for enp130s0np0:
Configured FEC encodings: Auto Off BaseR RS
Active FEC encoding: Auto

FEC and auto-negotiation settings are persisted on the SmartNIC across reboots.

Note

In this context, setting the interface mode to auto specifies that the encoding scheme should be automatically determined if possible. It does not enable auto-negotiation of link speed between 10Gbps and 25Gbps.

Setting Interface Breakout Mode

The following commands only work on kernel versions 4.13 and later. If your kernel is older than 4.13 or you do not have devlink support enabled refer to the following section on configuring interfaces: Configure Media Settings.

Note

Breakout mode settings are only applicable to Agilio CX 40GbE and CX 2x40GbE SmartNICs.

Determine the card’s PCI address:

# lspci -Dkd 19ee:4000
0000:04:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
    Subsystem: Netronome Systems, Inc. Device 4001
    Kernel driver in use: nfp
    Kernel modules: nfp

List the devices:

# devlink dev show
pci/0000:04:00.0

Split the first physical 40G port from 1x40G to 4x10G ports:

# devlink port split pci/0000:04:00.0/0 count 4

Split the second physical 40G port from 1x40G to 4x10G ports:

# devlink port split pci/0000:04:00.0/4 count 4

If the SmartNIC’s port is already configured in breakout mode (it has already been split), devlink will respond with an argument error. Whenever changes to the port configuration are made, the original netdev(s) associated with the port will be removed from the system:

# dmesg | tail
[ 5696.432306] nfp 0000:04:00.0: nfp: Port #0 config changed, unregistering. Driver reload required before port will be operational again.
[ 6270.553902] nfp 0000:04:00.0: nfp: Port #4 config changed, unregistering. Driver reload required before port will be operational again.

The driver needs to be reloaded for the changes to take effect. Older driver/SmartNIC NVRAM versions may require a system reboot for changes to take effect. The driver communicates events related to port split/unsplit in the system logs. The driver may be reloaded with the following command:

# rmmod nfp; modprobe nfp

After reloading the driver, the netdevs associated with the split ports will be available for use:

# ip link show
...
68: enp4s0np0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
69: enp4s0np0s1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
70: enp4s0np0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
71: enp4s0np0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
72: enp4s0np1s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
73: enp4s0np1s1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
74: enp4s0np1s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
75: enp4s0np1s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

Note

There is an ordering constraint to splitting and unsplitting the ports on Agilio CX 2x40GbE SmartNICs. The first physical 40G port cannot be split without the second physical port also being split, hence 1x40G + 4x10G is always invalid even if it’s only intended to be a transitional mode. The driver will reject such configurations.

Breakout mode persists on the SmartNIC across reboots. To revert back to the original 2x40G ports use the unsplit subcommand.

Unsplit Port 1:

# devlink port unsplit pci/0000:04:00.0/4

Unsplit Port 0:

# devlink port unsplit pci/0000:04:00.0/0

The NFP drivers will again have to be reloaded (rmmod nfp then modprobe nfp) for unsplit changes in the port configuration to take effect.

Confirming Connectivity

Allocating IP Addresses

Under RHEL/CentOS 7.5, the network configuration is managed by default using NetworkManager. The default configuration for unset interfaces is auto, which implies that an auto-configuration client is running on them; any manual configuration made using ifconfig or iproute2 will therefore be periodically erased.

Consult the NetworkManager documentation for detailed instructions. For example, if a connection is named ens1np0 (which corresponds to the physical port representor ens1np0 of the SmartNIC), the following commands will set the IPv4 address statically, configure the connection to autostart on boot, and bring the interface up:

# nmcli c m <netdev port> ipv4.method manual
# nmcli c m <netdev port> ipv4.addresses 10.0.0.2/24
# nmcli c m <netdev port> connection.autoconnect yes
# nmcli c u <netdev port>

Alternatively, if the interface is not under control of the distribution’s network management subsystem, iproute2 can be used to configure the port:

# assign IP address to interface
# ip address add 10.0.0.2/24 dev <netdev port>
# ip link set <netdev port> up

Pinging interfaces

After you have successfully assigned IP addresses to the NFP interfaces, perform a standard ping test to confirm connectivity:

# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.067 ms
64 bytes from 10.0.0.2: icmp_seq=4 ttl=64 time=0.062 ms

Basic Performance Test

iPerf is a basic traffic generator and network performance measuring tool that can be used to quickly determine the throughput achievable by a device.

Set IRQ affinity

Balance interrupts across available cores located on the NUMA node of the SmartNIC. A script to perform this action is available for download at https://raw.githubusercontent.com/Netronome/nfp-drv-kmods/master/tools/set_irq_affinity.sh

The source code of this script is also included at Appendix G: set_irq_affinity.sh Source

Example output:

# /nfp-drv-kmods/tools/set_irq_affinity.sh <netdev>

Device 0000:02:00.0 is on node 0 with cpus 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
IRQ 181 to CPU 0     (irq: 00,00000001 xps: 03,00030003)
IRQ 182 to CPU 1     (irq: 00,00000002 xps: 00,00000000)
IRQ 183 to CPU 2     (irq: 00,00000004 xps: 0c,000c000c)
IRQ 184 to CPU 3     (irq: 00,00000008 xps: 00,00000000)
IRQ 185 to CPU 4     (irq: 00,00000010 xps: 30,00300030)
IRQ 186 to CPU 5     (irq: 00,00000020 xps: 00,00000000)
IRQ 187 to CPU 6     (irq: 00,00000040 xps: c0,00c000c0)
IRQ 188 to CPU 7     (irq: 00,00000080 xps: 00,00000000)

Install iPerf

Ubuntu:

# apt-get install -y iperf

CentOS/RHEL:

# yum install -y iperf

Run iPerf Test

Server

Run iPerf on the server:

# ip address add 10.0.0.1/24 dev ens1np0
# iperf -s

Client

Allocate an IP address in the same range as used by the server, then execute the following on the client to connect to the server and start running the test:

# iperf -c 10.0.0.1 -P 4

Example output of 1x40G link:

# iperf -c 10.0.0.1 -P 4
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  5] local 10.0.0.2 port 56938 connected with 10.0.0.1 port 5001
[  3] local 10.0.0.2 port 56932 connected with 10.0.0.1 port 5001
[  4] local 10.0.0.2 port 56934 connected with 10.0.0.1 port 5001
[  6] local 10.0.0.2 port 56936 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  6]  0.0-10.0 sec  11.9 GBytes  10.3 Gbits/sec
[  3]  0.0-10.0 sec  9.85 GBytes  8.46 Gbits/sec
[  4]  0.0-10.0 sec  11.9 GBytes  10.2 Gbits/sec
[  5]  0.0-10.0 sec  10.2 GBytes  8.75 Gbits/sec
[SUM]  0.0-10.0 sec  43.8 GBytes  37.7 Gbits/sec

Using iPerf3

iPerf3 can also be used to measure performance; however, multiple instances have to be chained to properly create multiple threads:

On the server:

# iperf3 -s -p 5001 & iperf3 -s -p 5002 & iperf3 -s -p 5003 & iperf3 -s -p 5004 &

On the client:

# iperf3 -c 102.0.0.6 -i 30 -p 5001 & iperf3 -c 102.0.0.6 -i 30 -p 5002 & iperf3 -c 102.0.0.6 -i 30 -p 5003 & iperf3 -c 102.0.0.6 -i 30 -p 5004 &

Example output:

[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  9.39 GBytes  8.03 Gbits/sec                  receiver
[  5]  10.00-10.04  sec  33.1 MBytes  7.77 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  9.86 GBytes  8.44 Gbits/sec                  receiver
[  5]  10.00-10.04  sec  53.6 MBytes  11.8 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  11.9 GBytes  10.2 Gbits/sec                  receiver
[  5]  10.00-10.04  sec  42.1 MBytes  9.43 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  10.2 GBytes  8.70 Gbits/sec                  receiver

Total: 37.7 Gbits/sec

95.49% of 40GbE link

Basic Firmware Features

In this section ethtool will be used to view and configure SmartNIC interface parameters.

Setting Interface Settings

Unless otherwise stated, changing the interface settings detailed below will not require reloading of the NFP drivers for changes to take effect, unlike the interface breakouts described in Configuring Interface Media Mode.

Multiple Queues

The Physical Functions on a SmartNIC support multiple transmit and receive queues.

View current settings

The -l flag can be used to view the current queue/channel configuration, e.g.:

# ethtool -l ens1np0
Channel parameters for ens1np0:
Pre-set maximums:
RX:             20
TX:             20
Other:          2
Combined:       20
Current hardware settings:
RX:             0
TX:             12
Other:          2
Combined:       8

Configure Queues

The -L flag can be used to change the interface queue/channel configuration. The following parameters can be configured:

rx
    Receive ring interrupts
tx
    Transmit ring interrupts
combined
    Interrupts that service both RX and TX rings

Note

Having RXR-only and TXR-only interrupts is not allowed.

In practice use this formula to calculate parameters for the ethtool command: combined = min(RXR, TXR) ; rx = RXR - combined ; tx = TXR - combined

To configure 8 combined interrupt servicing:

# ethtool -L <intf> rx 0 tx 0 combined 8
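The formula above can be applied mechanically; a sketch with illustrative ring counts (RXR=8 and TXR=12 are example values, not hardware limits):

```shell
#!/bin/bash
# Apply combined = min(RXR, TXR); rx = RXR - combined; tx = TXR - combined.
RXR=8
TXR=12
COMBINED=$(( RXR < TXR ? RXR : TXR ))
RX=$(( RXR - COMBINED ))
TX=$(( TXR - COMBINED ))
echo "rx=$RX tx=$TX combined=$COMBINED"    # -> rx=0 tx=4 combined=8
# ethtool -L <netdev> rx $RX tx $TX combined $COMBINED
```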

Receive side scaling (RSS)

RSS is a technology that distributes received traffic across the RX queues available on a given network interface, based on a hash function.

View current hash parameters

The -n flag can be used to view the current RSS configuration; for example, by default:

# ethtool -n <netdev> rx-flow-hash tcp4
TCP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]

# ethtool -n <netdev> rx-flow-hash udp4
UDP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA

Set hash parameters

The -N flag can be used to change interface RSS configuration e.g:

# ethtool -N <netdev> rx-flow-hash tcp4 sdfn
# ethtool -N <netdev> rx-flow-hash udp4 sdfn

The ethtool man pages can be consulted for full details of which RSS flags may be set.

Configuring the key

The -x flag can be used to view the current RSS key and indirection table, and the -X flag to set them, for example:

# ethtool -x <intf>
# ethtool -X <intf> <hkey>

View Interface Parameters

The -k flag can be used to view the current interface configuration, for example using an Agilio CX 1x40GbE NIC which has the interface ID enp4s0np0:

# ethtool -k <netdev>
Features for enp4s0np0:
rx-checksumming: off [fixed]
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off [fixed]
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
busy-poll: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
rx-udp_tunnel-port-offload: off [fixed]

Receive Checksumming (rx-checksumming)

When enabled, checksum calculation and error checking for received packets are offloaded to the NFP SmartNIC’s flow processor rather than the host CPU.

To enable rx-checksumming:

# ethtool -K <netdev> rx on

To disable rx-checksumming:

# ethtool -K <netdev> rx off

Transmit Checksumming (tx-checksumming)

When enabled, checksum calculation for outgoing packets is offloaded to the NFP SmartNIC’s flow processor rather than the host’s CPU.

To enable tx-checksumming:

# ethtool -K <netdev> tx on

To disable tx-checksumming:

# ethtool -K <netdev> tx off
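Both checksum offloads can be toggled in a single ethtool invocation and the resulting state verified by filtering the feature listing. This is only a sketch; enp4s0np0 is an example interface name, and features reported as [fixed] cannot be changed.

```shell
# ethtool -K enp4s0np0 rx on tx on sg on
# ethtool -k enp4s0np0 | grep -E '^(rx|tx)-checksumming|^scatter-gather'
```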

Scatter and Gather (scatter-gather)

When enabled, the NFP uses scatter-gather I/O, also known as vectored I/O, which allows a single procedure call to read data sequentially from multiple buffers and write it to a single data stream. Note that ethtool only prints output when the setting actually changes (from on to off, or off to on), as shown below:

To enable scatter-gather:

# ethtool -K <netdev> sg on
Actual changes:
scatter-gather: on
        tx-scatter-gather: on
generic-segmentation-offload: on

To disable scatter-gather:

# ethtool -K <netdev> sg off
Actual changes:
scatter-gather: off
        tx-scatter-gather: off
generic-segmentation-offload: off

TCP Segmentation Offload (TSO)

When enabled, this parameter causes all functions related to the segmentation of TCP packets at egress to be offloaded to the NFP.

To enable tcp-segmentation-offload:

# ethtool -K <netdev> tso on

To disable tcp-segmentation-offload:

# ethtool -K <netdev> tso off

Generic Segmentation Offload (GSO)

This parameter offloads to the NFP the segmentation of transport-layer protocol data units other than TCP segments and UDP datagrams. GSO operates at packet egress.

To enable generic-segmentation-offload:

# ethtool -K <netdev> gso on

To disable generic-segmentation-offload:

# ethtool -K <netdev> gso off

Generic Receive Offload (GRO)

This parameter enables software implementation of Large Receive Offload (LRO), which aggregates multiple packets at ingress into a large buffer before they are passed higher up the networking stack.

To enable generic-receive-offload:

# ethtool -K <netdev> gro on

To disable generic-receive-offload:

# ethtool -K <netdev> gro off

Note

Note that scripts which use ethtool -i <interface> to obtain bus-info will not work on representors, as this information is not populated for representor devices.

Installing, Configuring and Using DPDK

Enabling IOMMU

In order to use the NFP device with DPDK applications, a userspace I/O driver module (vfio-pci or igb_uio) has to be loaded.

Firstly, the machine has to have IOMMU enabled. The following link: http://dpdk-guide.gitlab.io/dpdk-guide/setup/binding.html contains some generic information about binding devices including the possibility of using UIO instead of VFIO, and also mentions the VFIO no-IOMMU mode.

Although DPDK focuses on avoiding interrupts, a NAPI-like approach using RX interrupts is available. This is supported by the NFP PMD, and with VFIO it is possible to have one RX interrupt per queue (with UIO, only one interrupt per device). For this reason VFIO is the preferred option.

Edit grub configuration file

This change is required for working with VFIO. With kernels 4.5 and newer, however, VFIO can instead be used in no-IOMMU mode, enabled as follows:

# echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode

For kernels older than 4.5, working with VFIO requires the enabling of IOMMU in the kernel at boot time. Add the following kernel parameters to /etc/default/grub to enable IOMMU:

GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt intremap=on"

It is worth noting that iommu=pt is not required for DPDK if VFIO is used, but it does avoid a performance impact in host drivers, such as the NFP netdev driver, when intel_iommu=on is enabled.

Implement changes

Apply kernel parameters changes and reboot.

Ubuntu:

# update-grub2
# reboot

CentOS/RHEL:

# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot

DPDK sources with PF PMD support

PF PMD multiport support

The PMD can work with up to 8 ports on the same PF device. The number of available ports is firmware and hardware dependent, and the driver looks for a firmware symbol during initialization to know how many can be used.

DPDK apps work with ports, and a port is usually a PF or a VF PCI device. However, with the NFP PF multiport there is just one PF PCI device. Supporting this particular configuration requires the PMD to create ports in a special way, although once they are created, DPDK apps should be able to use them as normal PCI ports.

NFP ports belonging to same PF can be seen inside PMD initialization with a suffix added to the PCI ID: wwww:xx:yy.z_port_n. For example, a PF with PCI ID 0000:03:00.0 and four ports is seen by the PMD code as:

0000:03:00.0_port_0
0000:03:00.0_port_1
0000:03:00.0_port_2
0000:03:00.0_port_3

Note

There are some limitations with multiport support: RX interrupts and device hot-plugging are not supported.
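The per-port naming can be reproduced with a small loop. This is purely an illustration of the suffix scheme described above, using the example PCI address from the text:

```shell
# The PMD appends a _port_n suffix to the PF's PCI address; reproduce the
# names for a 4-port PF at 0000:03:00.0:
PCI=0000:03:00.0
for n in 0 1 2 3; do
    echo "${PCI}_port_${n}"
done
```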

Installing DPDK

Physical Function PMD support has been upstreamed into DPDK 17.11. If an earlier version of DPDK is required, please refer to Appendix D: Obtaining DPDK-ns.

Install prerequisites:

# apt-get -y install gcc libnuma-dev make

Obtain DPDK sources:

# cd /usr/src/
# wget http://fast.dpdk.org/rel/dpdk-17.11.tar.xz
# tar xf dpdk-17.11.tar.xz
# export DPDK_DIR=/usr/src/dpdk-17.11
# cd $DPDK_DIR

Configure and install DPDK:

# export DPDK_TARGET=x86_64-native-linuxapp-gcc
# export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
# make install T=$DPDK_TARGET DESTDIR=install
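When compiling DPDK applications out of tree against this build (the examples directory, for instance), the conventional RTE_SDK and RTE_TARGET variables point at the sources and build target. This is a sketch following the exports above; the fallback values mirror the paths used earlier in this section.

```shell
# Conventional environment for building DPDK 17.11 applications out of tree:
export RTE_SDK=${DPDK_DIR:-/usr/src/dpdk-17.11}
export RTE_TARGET=${DPDK_TARGET:-x86_64-native-linuxapp-gcc}
# e.g. make -C $RTE_SDK/examples/helloworld
```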

Binding DPDK PF driver

Note

This section details the binding of DPDK-enabled drivers to the Physical Functions.

Attaching vfio-pci driver

Load vfio-pci driver module:

# modprobe vfio-pci

Unbind current drivers:

# PCIA=0000:$(lspci -d 19ee:4000 | awk '{print $1}')
# echo $PCIA > /sys/bus/pci/devices/$PCIA/driver/unbind

Bind vfio-pci driver:

# echo 19ee 4000 > /sys/bus/pci/drivers/vfio-pci/new_id

Attaching igb-uio driver

Load igb-uio driver module:

# modprobe uio
# DRKO=$(find $DPDK_DIR -iname 'igb_uio.ko' | head -1 )
# insmod $DRKO

Unbind current drivers:

# PCIA=0000:$(lspci -d 19ee:4000 | awk '{print $1}')
# echo $PCIA > /sys/bus/pci/devices/$PCIA/driver/unbind

Bind igb_uio driver:

# echo 19ee 4000 > /sys/bus/pci/drivers/igb_uio/new_id

Confirm attached driver

Confirm that the driver has been attached:

# lspci -kd 19ee:

01:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: nfp
        Kernel modules: nfp
01:08.0 Ethernet controller: Netronome Systems, Inc. Device 6003
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: igb_uio
        Kernel modules: nfp

Unbind driver

Determine card address:

# PCIA=$(lspci -d 19ee: | awk '{print $1}')

Unbind vfio-pci driver:

# echo 0000:$PCIA > /sys/bus/pci/drivers/vfio-pci/unbind

Unbind igb_uio driver:

# echo 0000:$PCIA > /sys/bus/pci/drivers/igb_uio/unbind

Using DPDK PF driver

Using SR-IOV

SR-IOV is a PCI feature that allows virtual functions (VFs) to be created from a physical function (PF). The VFs thus share the resources of a PF, while VFs remain isolated from each other. The isolated VFs are typically assigned to virtual machines (VMs) on the host. In this way, the VFs allow the VMs to directly access the PCI device, thereby bypassing the host kernel.

Installing the SR-IOV capable firmware

Before installing the SR-IOV capable firmware, ensure that SR-IOV is enabled in the BIOS of the host machine. If SR-IOV is disabled or unsupported by the motherboard/chipset being used, the kernel message log will contain a PCI SR-IOV:-12 error when trying to create a VF at a later stage. This can be queried using the dmesg tool.

The firmware currently running on the SmartNIC can be determined by the ethtool command. As an example, Ubuntu 18.04 LTS contains the following upstreamed firmware:

# ethtool -i enp2s0np0 | head -3
driver: nfp
version: 4.15.0-20-generic SMP mod_unloa
firmware-version: 0.0.3.5 0.22 nic-2.0.4 nic

From the above output, the upstreamed firmware is nic-2.0.4. The prefix nic indicates that the firmware implements the basic NIC functionality. The suffix 2.0.4 indicates the firmware version.
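Given the field layout described above, the application firmware name can be extracted programmatically. This is a hedged one-liner; enp2s0np0 is an example interface name:

```shell
# The third value after the "firmware-version:" label is the application
# firmware (e.g. nic-2.0.4):
# ethtool -i enp2s0np0 | awk '/^firmware-version:/ {print $4}'
```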

Firmware sriov-2.1.x or greater provides SR-IOV capability. The firmware can be obtained in one of two ways: from the linux-firmware package or from the support site.

The linux-firmware package

The SR-IOV capable firmware has been upstreamed into the linux-firmware package. For rpm packages, this will be available from linux-firmware 20181008-88 onwards. As of Ubuntu 18.10, the linux-firmware Debian package does not yet contain SR-IOV capable firmware.

Ensure that the latest linux-firmware package is installed.

For RHEL / Fedora / CentOS:

# yum update linux-firmware

The linux-firmware package will store the Netronome firmware files in the /lib/firmware/netronome directory. This directory contains symbolic links which point to the actual firmware files. The actual firmware files will be located in subdirectories, with each subdirectory related to a different SmartNIC functionality. Consider the following tree structure:

# tree /lib/firmware/netronome
/lib/firmware/netronome/
├── flower
│   ├── nic_AMDA0081-0001_1x40.nffw -> nic_AMDA0081.nffw
│   ├── nic_AMDA0081-0001_4x10.nffw -> nic_AMDA0081.nffw
│   ├── ...
├── nic
│   ├── nic_AMDA0058-0011_2x40.nffw
│   ├── nic_AMDA0058-0012_2x40.nffw
│   ├── ...
├── nic-sriov
│   ├── nic_AMDA0058-0011_2x40.nffw
│   ├── nic_AMDA0058-0012_2x40.nffw
│   ├── ...
├── nic_AMDA0058-0011_2x40.nffw -> nic/nic_AMDA0058-0011_2x40.nffw
├── nic_AMDA0058-0012_2x40.nffw -> nic/nic_AMDA0058-0012_2x40.nffw
├── ...

As can be seen from the tree structure, three functionalities (flower, nic, nic-sriov) are supplied by the linux-firmware package. If nic-sriov is missing, follow The support site method instead. Point the symbolic links to the specific application required, in this case nic-sriov:

# ln -sf /lib/firmware/netronome/nic-sriov/* /lib/firmware/netronome/

The support site

The SR-IOV capable firmware can be obtained from the Netronome support site. Upon downloading the packaged firmware, install the firmware files.

For Debian / Ubuntu:

# dpkg -i agilio-sriov-firmware-2.1.x.deb

For RHEL / Fedora / CentOS:

# yum -y install agilio-sriov-firmware-2.1.x.rpm

The /lib/firmware/netronome directory contains symbolic links which point to the actual firmware files. When installing the above firmware package, the symbolic links are automatically updated to point to the new SR-IOV capable firmware files. This can be confirmed with:

# ls -og --time-style="+" /lib/firmware/netronome
...
lrwxrwxrwx 1   64  nic_AMDA0058-0011_2x40.nffw -> /opt/netronome/agilio-sriov-firmware/nic_AMDA0058-0011_2x40.nffw
...

Load firmware to SmartNIC

Remove and reload the driver. The driver will subsequently install the new firmware to the SmartNIC:

# modprobe -r nfp
# modprobe nfp

The ethtool command can be used to verify that the correct firmware has been loaded onto the SmartNIC:

# ethtool -i enp2s0np0 | head -3
driver: nfp
version: 4.15.0-20-generic SMP mod_unloa
firmware-version: 0.0.3.5 0.22 sriov-2.1.14 nic

Notice that the firmware has successfully changed from nic-2.0.4 to sriov-2.1.14.

Note

Because the /lib/firmware/netronome directory is managed by the linux-firmware package, an update to this package will cause the symbolic links to point back to the nic firmware files. If a system reboot or a driver reload occurs after the links were changed, the incorrect firmware will be loaded to the SmartNIC. In this event, repeat the Installing the SR-IOV capable firmware procedure to restore the desired functionality. A workaround is possible, but involves additional configuration of the initramfs file system. Customers interested in this workaround can Contact Us for more information.

Configuring SR-IOV

At this stage, there are still zero VFs, and only one PF (assuming only one Netronome SmartNIC is installed):

# lspci -kd 19ee:
02:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: nfp
        Kernel modules: nfp

The number of supported VFs on a netdev is exposed by sriov_totalvfs in sysfs. For example, if enp2s0np0 is the interface associated with the SmartNIC’s PF, the following command will return the total supported number of VFs:

# cat /sys/class/net/enp2s0np0/device/sriov_totalvfs
56

VFs can be allocated to a network interface by writing an integer to the sysfs file. For example, to allocate two VFs to enp2s0np0:

# echo 2 > /sys/class/net/enp2s0np0/device/sriov_numvfs

The new VFs, together with the PF, can be observed with the lspci command:

# lspci -kd 19ee:
02:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: nfp
        Kernel modules: nfp
02:08.0 Ethernet controller: Netronome Systems, Inc. Device 6003
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: nfp_netvf
        Kernel modules: nfp
02:08.1 Ethernet controller: Netronome Systems, Inc. Device 6003
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: nfp_netvf
        Kernel modules: nfp

In this example, the PF is located at PCI address 02:00.0. The two VFs are located at 02:08.0 and 02:08.1. Notice that the VFs are identified by Device 6003, and that they use the nfp_netvf kernel driver. For RHEL 7.x systems however, the VFs will use the nfp driver.

Note

If the SmartNIC has more than one physical port (phyport), the VFs will appear to be connected to all the phyports (as reported by the ip link command). This happens due to the PF being shared among all VFs. In reality, the VFs are only connected to phyport 0.

SR-IOV VFs cannot be reallocated dynamically. In order to change the number of allocated VFs, existing functions must first be deallocated by writing a 0 to the sysfs file. Otherwise, the system will return a device or resource busy error:

# echo 0 > /sys/class/net/enp2s0np0/device/sriov_numvfs

Note

Ensure any VMs are shut down and applications that may be using the VFs are stopped before deallocation.
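The allocation and deallocation steps can be wrapped in a small helper. This is a sketch, not vendor tooling: the set_vfs function and the SYSFS override are invented here for illustration. It clamps the request to the advertised sriov_totalvfs and resets the count to zero first, since VF counts cannot be changed in place.

```shell
#!/bin/sh
# Hypothetical helper: (re)allocate VFs on a PF netdev via sysfs.
# SYSFS is overridable so the logic can be exercised against a fake tree.
SYSFS=${SYSFS:-/sys/class/net}
set_vfs() {
    dev=$1; want=$2
    max=$(cat "$SYSFS/$dev/device/sriov_totalvfs")
    # Clamp the request to the advertised maximum.
    if [ "$want" -gt "$max" ]; then want=$max; fi
    # VF counts cannot be changed while VFs exist; reset to zero first.
    echo 0 > "$SYSFS/$dev/device/sriov_numvfs"
    echo "$want" > "$SYSFS/$dev/device/sriov_numvfs"
}
# Example (run on the host): set_vfs enp2s0np0 2
```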

In order to persist the VFs on the system, it is suggested that the system networking scripts be updated to manage them. The following snippet illustrates how to do this with NetworkManager for the PF enp2s0np0:

cat >/etc/NetworkManager/dispatcher.d/99-create-vfs << EOF
#!/bin/sh
# This is a NetworkManager script to persist the maximum number of VFs on a netdev
[ "enp2s0np0" = "\$1" -a "up" = "\$2" ] && \
    cat /sys/class/net/enp2s0np0/device/sriov_totalvfs > /sys/class/net/enp2s0np0/device/sriov_numvfs
exit
EOF
chmod 755 /etc/NetworkManager/dispatcher.d/99-create-vfs

In Ubuntu systems, networkd-dispatcher can be used in place of NetworkManager, using a similar approach to setting up the PF:

cat > /usr/lib/networkd-dispatcher/routable.d/50-ifup-noaddr << 'EOF'
#!/bin/sh
ip link set mtu 9216 dev enp2s0np0
ip link set up dev enp2s0np0
cat /sys/class/net/enp2s0np0/device/sriov_totalvfs > /sys/class/net/enp2s0np0/device/sriov_numvfs
EOF
chmod u+x /usr/lib/networkd-dispatcher/routable.d/50-ifup-noaddr

To enable PCI passthrough, edit the kernel command line at /etc/default/grub. Add the parameters intel_iommu=on iommu=pt to the existing command line:

GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0,115200 intel_iommu=on iommu=pt"

Then:

# update-grub

Ensure that the /boot/grub/grub.cfg file is updated with the aforementioned parameters:

# reboot

After reboot, confirm that the kernel has been started with the parameters:

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.15.0-20-generic root=UUID=179b45a3-def2-48b0-8f2f-7a5b6b3f913b ro console=tty1 console=ttyS0,115200 intel_iommu=on iommu=pt

Using virtio-forwarder

virtio-forwarder is a userspace networking application that forwards bi-directional traffic between SR-IOV VFs and virtio networking devices in QEMU virtual machines. virtio-forwarder implements a virtio backend driver using the DPDK’s vhost-user library and services designated VFs by means of the DPDK poll mode driver (PMD) mechanism.

The steps shown here closely correlate with the comprehensive virtio-forwarder docs. Ensure that the Requirements are met and that the setup of Using SR-IOV has been completed.

Installing virtio-forwarder

For Debian / Ubuntu:

# add-apt-repository ppa:netronome/virtio-forwarder
# apt-get update
# apt-get install virtio-forwarder

For RHEL / Fedora / CentOS:

# yum install yum-plugin-copr
# yum copr enable netronome/virtio-forwarder
# yum install virtio-forwarder

virtio-forwarder makes use of the DPDK library, therefore DPDK has to be installed. Carry out the instructions of Installing DPDK.

Configuring hugepages

For Ubuntu, modify libvirt’s apparmor permissions to allow read/write access to the hugepages directory and library files for QEMU. Add the following lines to the end of /etc/apparmor.d/abstractions/libvirt-qemu:

/tmp/virtio-forwarder/** rwmix,
# for latest QEMU
/usr/lib/x86_64-linux-gnu/qemu/* rmix,
# for access to hugepages
owner "/dev/hugepages/libvirt/qemu/**" rw,
owner "/dev/hugepages-1G/libvirt/qemu/**" rw,

Also edit the existing line, such that:

/tmp/{,**} r,

Restart the apparmor service:

# systemctl restart apparmor.service

For virtio-forwarder, 2M hugepages are required whereas QEMU/KVM performs better with 1G hugepages. It is recommended that at least 1375 pages of 2M be reserved for virtio-forwarder. The hugepages can be configured during boot time, for which the following should be added to the Linux kernel command line parameters:

hugepagesz=2M hugepages=1375 default_hugepagesz=1G hugepagesz=1G hugepages=8

Alternatively, hugepages can be configured manually after each boot. Reserve at least 1375 * 2M for virtio-forwarder:

# echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Reserve 8G for application hugepages (modify this as needed):

# echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

Since non-fragmented memory is required for hugepages, it is recommended that hugepages be configured during boot time.
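As a quick sanity check on the totals implied by the reservations above (at least 1375 pages of 2M plus 8 pages of 1G):

```shell
# Total memory consumed by the recommended hugepage pools:
echo "$((1375 * 2)) MiB for the 2M pool (virtio-forwarder)"
echo "$((8 * 1024)) MiB for the 1G pool (guests)"
# Current reservations can be inspected with: grep -i huge /proc/meminfo
```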

hugetlbfs needs to be mounted on the file system to allow applications to create and allocate handles to the mapped memory. The following lines mount the two types of hugepages on /dev/hugepages (2M) and /dev/hugepages-1G (1G):

# grep hugetlbfs /proc/mounts | grep -q "pagesize=2M" || \
( mkdir -p /dev/hugepages && mount nodev -t hugetlbfs -o rw,pagesize=2M /dev/hugepages/ )
# grep hugetlbfs /proc/mounts | grep -q "pagesize=1G" || \
( mkdir -p /dev/hugepages-1G && mount nodev -t hugetlbfs -o rw,pagesize=1G /dev/hugepages-1G/ )

Finally, libvirt requires a special directory inside the hugepages mounts with the correct permissions in order to create the necessary per-VM handles:

# mkdir /dev/hugepages-1G/libvirt
# mkdir /dev/hugepages/libvirt
# chown [libvirt-]qemu:kvm -R /dev/hugepages-1G/libvirt
# chown [libvirt-]qemu:kvm -R /dev/hugepages/libvirt

Note

Substitute /dev/hugepages[-1G] with your actual hugepage mount directory. A 2M hugepage mount location is created by default by some distributions.

Restart the libvirt daemon:

# systemctl restart libvirtd

To check that hugepages are correctly reserved for each page size, the hugeadm utility can be used:

# hugeadm --pool-list

      Size  Minimum  Current  Maximum  Default
   2097152     2048     2048     2048        *
1073741824        8        8        8

Binding to vfio-pci

Since the VFs need to communicate directly with virtio-forwarder, a pass-through style driver, such as vfio-pci is required. The vfio-pci module is the preferred driver, compared to uio_pci_generic and igb_uio, of which the former lacks SR-IOV compatibility whereas the latter is considered outdated.

First, unbind the VF PCI devices from their current drivers:

# lspci -Dd 19ee:6003 | awk '{print $1}' | xargs -I{} echo \
"echo {} > /sys/bus/pci/devices/{}/driver/unbind;" | bash
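An equivalent, more explicit form of the unbind one-liner above uses a loop instead of xargs. It performs the same sysfs writes; the function wrapper is just for illustration:

```shell
# Hypothetical wrapper: unbind every 19ee:6003 VF from its current driver.
unbind_vfs() {
    for dev in $(lspci -Dd 19ee:6003 | awk '{print $1}'); do
        echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
    done
}
# Run on the host: unbind_vfs
```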

The VFs which now have their drivers unbound, can be observed with the lspci command:

# lspci -kd 19ee:
02:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: nfp
        Kernel modules: nfp
02:08.0 Ethernet controller: Netronome Systems, Inc. Device 6003
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel modules: nfp
02:08.1 Ethernet controller: Netronome Systems, Inc. Device 6003
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel modules: nfp

Notice that the Kernel driver in use attribute was removed. To bind the vfio-pci driver to the VFs, first load the vfio-pci driver to the Linux kernel:

# modprobe vfio-pci

Then bind the driver to the VFs:

# echo 19ee 6003 > /sys/bus/pci/drivers/vfio-pci/new_id

The VFs are now bound to the vfio-pci driver:

# lspci -kd 19ee:
02:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: nfp
        Kernel modules: nfp
02:08.0 Ethernet controller: Netronome Systems, Inc. Device 6003
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: vfio-pci
        Kernel modules: nfp
02:08.1 Ethernet controller: Netronome Systems, Inc. Device 6003
        Subsystem: Netronome Systems, Inc. Device 4001
        Kernel driver in use: vfio-pci
        Kernel modules: nfp

Launching virtio-forwarder

In this guide, the use case will be virtio-forwarder acting as a server. This means virtio-forwarder will create and host the sockets to which VMs can connect at a later stage. To configure virtio-forwarder as the server, edit /etc/default/virtioforwarder so that VIRTIOFWD_VHOST_CLIENT is assigned a blank value:

# Non-blank enables vhostuser client mode (default: server mode)
VIRTIOFWD_VHOST_CLIENT=

The virtio-forwarder service can be configured to start during boot time:

# systemctl enable virtio-forwarder

To manually start the service after installation, run:

# systemctl start virtio-forwarder

Adding VF ports to virtio-forwarder

Modify socket permissions:

# chown -R libvirt-qemu:kvm /tmp/virtio-forwarder/

Dynamically map the PCI address of each VF to virtio-forwarder as follows:

# /usr/lib/virtio-forwarder/virtioforwarder_port_control.py add \
--virtio-id 1 --pci-addr 02:08.0
status: OK
# /usr/lib/virtio-forwarder/virtioforwarder_port_control.py add \
--virtio-id 2 --pci-addr 02:08.1
status: OK

The virtio-id parameter is compulsory and denotes the id of the relay through which traffic is routed. A relay can accept only a single PCI device and a single VM.

The VF ports added to virtio-forwarder can be confirmed with:

# /usr/lib/virtio-forwarder/virtioforwarder_stats.py \
--include-inactive | grep DPDK_ADDED
relay_1.vf_to_vm.internal_state=DPDK_ADDED
relay_2.vf_to_vm.internal_state=DPDK_ADDED

The VF ports can be removed in a similar fashion:

# /usr/lib/virtio-forwarder/virtioforwarder_port_control.py remove \
--virtio-id 1 --pci-addr 02:08.0
status: OK
# /usr/lib/virtio-forwarder/virtioforwarder_port_control.py remove \
--virtio-id 2 --pci-addr 02:08.1
status: OK

It is useful to watch the virtio-forwarder journal while adding or removing ports:

# journalctl -fu virtio-forwarder

The VF entries can also be modified statically within the /etc/default/virtioforwarder file. Consult the virtio-forwarder docs for more information.

Modify guest VM XML files

The snippets in this section should be inserted in each VM’s XML file.

The following snippet configures the connection between the VM and the virtio-forwarder service. Note that virtio-forwarder1.sock refers to virtio-id 1 and relay_1. The MAC address should be assigned the value of the specific VF to be paired with the VM. If left unassigned, libvirt will assign a random MAC address which will cause the VM’s traffic to be rejected by the SmartNIC. The PCI address is internal to the VM and can be chosen arbitrarily, but should be unique within the VM itself.

<devices>
<interface type='vhostuser'>
    <mac address='1e:a3:32:f8:3e:83'/>
    <source type='unix' path='/tmp/virtio-forwarder/virtio-forwarder1.sock' mode='client'/>
    <model type='virtio'/>
    <alias name='net1'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</interface>
</devices>

The VM also has to be configured to make use of the 1G hugepages that were reserved for this purpose:

<memoryBacking>
<hugepages>
    <page size='1048576' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>

Allocate CPUs and memory to the VM. It is especially important to specify memAccess='shared', as this allows the host and guest VM to share RAM. This is required by virtio-forwarder to write the packets to the VM.

<cpu mode='custom' match='exact'>
<model fallback='allow'>SandyBridge</model>
<feature policy='require' name='ssse3'/>
<numa>
    <cell id='0' cpus='0-1' memory='3670016' unit='KiB' memAccess='shared'/>
</numa>
</cpu>

The VMs can now be booted. Observing the host’s CPU usage (e.g. with htop) will show that some of the cores are utilized to the maximum by the polling mechanism. The default number of cores dedicated to virtio-forwarder is 2, and this can be adjusted in /etc/default/virtioforwarder by modifying the VIRTIOFWD_CPU_MASK value.
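For example, a hypothetical /etc/default/virtioforwarder excerpt pinning the relay workers to two specific cores. The core list 2,3 is an arbitrary example; check the exact value format against the comments in the shipped defaults file:

```shell
# CPUs on which virtio-forwarder relay threads run (illustrative value):
VIRTIOFWD_CPU_MASK=2,3
```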

Appendix A: Netronome Repositories

All the software mentioned in this document can be obtained via the official Netronome repositories. The instructions below describe how to enable access to these repositories from your respective Linux distribution.

Importing GPG-key

Download and Import GPG-key to your local machine:

For RHEL/Centos 7.5, download the public key:

# wget https://rpm.netronome.com/gpg/NetronomePublic.key

Import the public key:

# rpm --import NetronomePublic.key

For Ubuntu 18.04 LTS, download the public key:

# wget https://deb.netronome.com/gpg/NetronomePublic.key

Import the public key:

# apt-key add NetronomePublic.key

Configuring repositories

For RHEL/Centos 7.5, add Netronome’s repository:

# cat << 'EOF' > /etc/yum.repos.d/netronome.repo
[netronome]
name=netronome
baseurl=https://rpm.netronome.com/repos/centos/
gpgcheck=0
enabled=1
EOF
# yum makecache

For Ubuntu 18.04 LTS, add Netronome’s repository:

# mkdir -p /etc/apt/sources.list.d/
# echo "deb https://deb.netronome.com/apt stable main" > \
    /etc/apt/sources.list.d/netronome.list

Update repository lists:

# apt-get update

Appendix B: Installing the Out-of-Tree NFP Driver

The nfp driver can be installed via the Netronome repository or built from source depending on your requirement.

Install Driver via Netronome Repository

Please refer to Appendix A: Netronome Repositories on how to configure the Netronome repository applicable to your distribution. When the repository has been successfully added install the nfp-driver package using the commands below.

RHEL 7.5

First install the required dependencies for Red Hat; DKMS is required to install the out-of-tree driver:

# yum install -y kernel-devel-$(uname -r) elfutils-libelf-devel gcc
# wget http://fr2.rpmfind.net/linux/fedora/linux/updates/28/Everything/\
x86_64/Packages/d/dkms-2.6.1-1.fc28.noarch.rpm
# rpm -ivh dkms-2.6.1-1.fc28.noarch.rpm

Then install the NFP driver from the netronome repository added previously in Configuring repositories:

# yum list available | grep nfp-driver
agilio-nfp-driver-dkms.noarch            2017.12.18.2245.77334f7-1.el7   netronome

# yum install -y agilio-nfp-driver-dkms --nogpgcheck

RHEL/CentOS 7.5:

# yum install -y kernel-devel

# yum list available | grep nfp-driver
agilio-nfp-driver-dkms.noarch            2017.12.18.2245.77334f7-1.el7   netronome

# yum install agilio-nfp-driver-dkms --nogpgcheck

Ubuntu 18.04 LTS:

# apt-cache search nfp-driver
agilio-nfp-driver-dkms - agilio-nfp-driver driver in DKMS format.

# apt-get install agilio-nfp-driver-dkms

Kernel Changes

Take note that installing the DKMS driver will only install it for the currently running kernel. When you upgrade the installed kernel, it may not automatically update the nfp module to use the version in the DKMS package. In kernel versions older than v4.16 the MODULE_VERSION parameter of the in-tree module was not set, which causes DKMS to pick the module with the highest srcversion hash (https://github.com/dell/dkms/issues/14). This is worked around by the package install step adding --force to the DKMS install, but this will not trigger on a kernel upgrade. To work around this issue, boot into the new kernel and then re-install the agilio-nfp-driver-dkms package.

This should not be a problem when upgrading from kernels v4.16 and newer as the MODULE_VERSION has been added since this revision and the DKMS version check should work properly. It’s not possible to determine which nfp.ko file was loaded by only relying on information provided by the kernel. However, it’s possible to confirm that the binary signature of a file on disk and the module loaded in memory is the same.

To confirm that the module in memory is the same as the file on disk, compare the srcversion tag. The in-memory module’s tag is at /sys/module/nfp/srcversion. The default on-disk version can be queried with modinfo:

In-memory module:

# cat /sys/module/nfp/srcversion

On-disk module:

# modinfo nfp | grep "^srcversion:"

If these tags are in sync, the filename of the module provided by a modinfo query will identify the origin of the module:

# modinfo nfp | grep "^filename:"

If these tags are not in sync, there are likely conflicting copies of the module on the system: the initramfs may be out of sync or the module dependencies may be inconsistent.
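The two queries above can be combined into a single check. The function wrapper here is illustrative, not part of any vendor tooling:

```shell
# Hypothetical helper: compare in-memory and on-disk srcversion tags for nfp.
nfp_srcversion_check() {
    mem=$(cat /sys/module/nfp/srcversion)
    disk=$(modinfo nfp | awk '/^srcversion:/ {print $2}')
    if [ "$mem" = "$disk" ]; then
        echo "nfp module in sync with on-disk file"
    else
        echo "srcversion mismatch: loaded=$mem on-disk=$disk"
    fi
}
# Run on the host: nfp_srcversion_check
```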

The in-tree kernel module is usually located at the following path (please note, this module may be compressed with a .xz extension):

/lib/modules/$(uname -r)/kernel/drivers/net/ethernet/netronome/nfp/nfp.ko

The DKMS module is usually located at the following path:

/lib/modules/$(uname -r)/updates/dkms/nfp.ko

To ensure that the out-of-tree driver is correctly loaded instead of the in-tree module, the following commands can be run:

# mkdir -p /etc/depmod.d
# echo "override nfp * extra" > /etc/depmod.d/netronome.conf
# depmod -a
# modprobe -r nfp; modprobe nfp
# update-initramfs -u

Building from Source

Driver sources for Netronome Flow Processor devices, including the NFP-4000 and NFP-6000 models can be found at: https://github.com/Netronome/nfp-drv-kmods

RHEL/CentOS 7.5:

# yum install -y kernel-devel-$(uname -r) gcc git

Ubuntu 18.04:

# apt-get update
# apt-get install -y linux-headers-$(uname -r) build-essential libelf-dev

Clone, Build and Install

Finally, to clone, build and install the driver:

# git clone https://github.com/Netronome/nfp-drv-kmods.git
# cd nfp-drv-kmods
# make
# make install
# depmod -a

Appendix C: Working with Board Support Package

The NFP BSP provides infrastructure software and a development environment for managing NFP based platforms.

Install Software from Netronome Repository

Please refer to Appendix A: Netronome Repositories on how to configure the Netronome repository applicable to your distribution. When the repository has been successfully added install the BSP package using the commands below.

RHEL/CentOS 7.5:

# yum list available | grep nfp-bsp
nfp-bsp-6000-b0.x86_64                   2017.12.05.1404-1          netronome

# yum install nfp-bsp-6000-b0 --nogpgcheck
# reboot

Ubuntu 18.04 LTS:

# apt-cache search nfp-bsp
nfp-bsp-6000-b0 - Netronome NFP BSP

# apt-get install nfp-bsp-6000-b0

Install Software From deb/rpm Package

Obtain Software

The latest BSP packages can be obtained at the downloads area of the Netronome Support site (https://help.netronome.com).

Install the prerequisite dependencies

RHEL/CentOS 7.5 Dependencies

No dependency installation is required.

Ubuntu 18.04 LTS Dependencies

Install BSP dependencies:

# apt-get install -y libjansson4

NFP BSP Package

Install the NFP BSP package provided by Netronome Support.

RHEL/CentOS 7.5:

# yum install -y nfp-bsp-6000-*.rpm --nogpgcheck

Ubuntu 18.04 LTS:

# dpkg -i nfp-bsp-6000-*.deb

Using BSP tools

Enable CPP access

The NFP has an internal Command Push/Pull (CPP) bus that allows debug access to the SmartNIC internals. CPP access allows user space tools raw access to chip internals and is required to enable the use of most BSP tools. Only the out-of-tree (oot) driver allows CPP access.

Follow the steps from Install Driver via Netronome Repository to install the oot nfp driver. After the nfp module has been built, load the driver with CPP access:

# depmod -a
# rmmod nfp
# modprobe nfp nfp_dev_cpp=1 nfp_pf_netdev=0

To persist this option across reboots, a number of options are available; the distribution-specific documentation will detail that process more thoroughly. Care must be taken that the settings are also applied to any initramfs images generated.
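As one example, a modprobe.d fragment can carry the module options (the file name is arbitrary; the initramfs rebuild command differs per distribution):

```shell
# echo "options nfp nfp_dev_cpp=1 nfp_pf_netdev=0" > /etc/modprobe.d/nfp-cpp.conf
# update-initramfs -u        # Ubuntu; on RHEL/CentOS use: dracut -f
```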

Configure Media Settings

As an alternative to the process described in Configuring Interface Media Mode, the BSP tools can be used to configure the port speed of the SmartNIC using the following commands. Note that a reboot is still required for changes to take effect.

Agilio CX 2x25GbE - AMDA0099

To set the port speed of the CX 2x25GbE, the following commands can be used:

Set port 0 and port 1 to 10G mode:

# nfp-media phy1=10G phy0=10G

Set port 1 to 25G mode:

# nfp-media phy1=25G+

To change the FEC settings of the 2x25GbE, the following commands can be used:

# nfp-media --set-aneg=phy0=[S|A|I|C|F] --set-fec=phy0=[A|F|R|N]

Where the parameters for each argument are:

--set-aneg=:

S
search - Search through supported modes until link is found. Only one side should be doing this. It may result in a mode that can have physical layer errors depending on SFP type and what the other end wants. Long DAC cables with no FEC WILL have physical layer errors.
A
auto - Automatically choose mode based on speed and SFP type.
C
consortium - Consortium 25G auto-negotiation with link training.
I
IEEE - IEEE 10G or 25G auto-negotiation with link training.
F
forced - Mode is forced with no auto-negotiation or link training.

--set-fec=:

A
auto - Automatically choose FEC based on speed and SFP type.
F
Firecode - BASE-R Firecode FEC compatible with 10G.
R
Reed-Solomon - Reed-Solomon FEC new for 25G.
N
none - No FEC is used.
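The flag letters above can be combined into a single nfp-media invocation. A minimal bash sketch that validates the letters against the documented sets and composes the command (the helper build_media_cmd is illustrative, not part of the BSP tools):

```shell
# Compose an nfp-media autoneg/FEC command for one phy, rejecting
# letters outside the documented sets. Illustrative helper only.
build_media_cmd() {
    local phy=$1 aneg=$2 fec=$3
    case $aneg in
        S|A|I|C|F) ;;
        *) echo "bad aneg: $aneg" >&2; return 1 ;;
    esac
    case $fec in
        A|F|R|N) ;;
        *) echo "bad fec: $fec" >&2; return 1 ;;
    esac
    echo "nfp-media --set-aneg=${phy}=${aneg} --set-fec=${phy}=${fec}"
}

build_media_cmd phy0 I R   # prints: nfp-media --set-aneg=phy0=I --set-fec=phy0=R
```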

Agilio CX 1x40GbE - AMDA0081

Set port 0 to 40G mode:

# nfp-media phy0=40G

Set port 0 to 4x10G fanout mode:

# nfp-media phy0=4x10G

Agilio CX 2x40GbE - AMDA0097

Set port 0 and port 1 to 40G mode:

# nfp-media phy0=40G phy1=40G

Set port 0 to 4x10G fanout mode:

# nfp-media phy0=4x10G

For mixed configurations the highest port must be in 40G mode, e.g.:

# nfp-media phy0=4x10G phy1=40G
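The mixed-mode rule can be expressed as a quick check. A minimal bash sketch (the helper valid_mixed is illustrative, assuming phy1 is the highest port on the 2x40GbE):

```shell
# Check a 2x40GbE mode pair against the rule that in a mixed
# configuration the highest port (phy1) must stay in 40G mode.
valid_mixed() {
    local p0=$1 p1=$2
    [ "$p0" = "$p1" ] || [ "$p1" = "40G" ]
}

valid_mixed 4x10G 40G && echo "phy0=4x10G phy1=40G: supported"
valid_mixed 40G 4x10G || echo "phy0=40G phy1=4x10G: not supported"
```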

Appendix D: Obtaining DPDK-ns

Netronome specific DPDK sources can be acquired from the Official Netronome Support site (https://help.netronome.com). If you do not have an account already, you can request access by sending an email to help@netronome.com.

Download the dpdk-ns sources or deb/rpm package from the Netronome-Support site and perform the following steps to build or install DPDK.

Build DPDK-ns from sources

To build DPDK-ns from source, assuming the tarball has been downloaded to the /root directory:

# cd /root
# tar zxvf dpdk-ns.tar.gz
# cd dpdk-ns

# export RTE_SDK=/root/dpdk-ns
# export RTE_TARGET=x86_64-native-linuxapp-gcc
# make T=$RTE_TARGET install
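If the build completes, the applications are placed under the target directory following the classic DPDK make layout; for example, the testpmd binary can be checked with:

```shell
# ls $RTE_SDK/$RTE_TARGET/app/testpmd
```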

Install DPDK-ns from packages

Ubuntu:

# apt-get install -y netronome-dpdk*.deb

CentOS/RHEL:

# yum install -y netronome-dpdk*.rpm

Appendix E: Updating NFP Flash

The NVRAM flash software on the SmartNIC can be updated in one of two ways: via ethtool, or via the BSP userspace tools. In both cases the BSP package needs to be installed to gain access to the intended flash image. After the flash has been updated, the system must be rebooted for the change to take effect.

Note

The ethtool interface is only available for hosts running kernel 4.16 or higher when using the in-tree driver. Please use the out-of-tree driver to enable ethtool flashing on older kernels.

Warning

Updating the flash via ethtool is only supported if the existing flash version is greater than 0028.0028.007c. The installed NVRAM flash version can be checked with the command dmesg | grep BSP. Cards running older versions of the NVRAM flash must be updated using the method in Update via BSP Userspace Tools.

Refer to Appendix C: Working with Board Support Package to acquire the BSP tool package.

Update via Ethtool

To update the flash using ethtool, the flash image files must first be copied from the Netronome directory to /lib/firmware, where ethtool can access them:

# cp /opt/netronome/flash/flash-nic.bin /lib/firmware
# cp /opt/netronome/flash/flash-one.bin /lib/firmware

Thereafter, ethtool can be used to reflash the software loaded onto the SmartNIC devices identified by either their PF <netdev> or their physical ports <netdev port>:

# ethtool -f <netdev/netdev port> flash-nic.bin
# ethtool -f <netdev/netdev port> flash-one.bin
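If the netdev name of a particular card is not known, it can be looked up through sysfs using the card's PCIe address (the address below is an example):

```shell
# ls /sys/bus/pci/devices/0000:04:00.0/net/
```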

Update via BSP Userspace Tools

Obtain Out of Tree NFP Driver

To update the flash using the BSP userspace tools, use the following steps. Refer to Appendix B: Installing the Out-of-Tree NFP Driver for instructions on installing the out-of-tree NFP driver and loading it with CPP access.

Flash the Card

The following commands may be executed for each card installed in the system using the PCIe ID of the particular card. First reload the NFP drivers with CPP access enabled:

# rmmod nfp
# modprobe nfp nfp_pf_netdev=0 nfp_dev_cpp=1

Then use the included netronome flashing tools to reflash the card:

# /opt/netronome/bin/nfp-flash --preserve-media-overrides \
    -w /opt/netronome/flash/flash-nic.bin -Z <PCI ID, e.g. 04:00.0>
# /opt/netronome/bin/nfp-one -Z <PCI ID, e.g. 04:00.0>
# reboot

Appendix F: Upgrading the Kernel

RHEL 7.5

It is recommended to use only kernel packages released by Red Hat and installable as part of the distribution installation and upgrade procedure.

CentOS 7.5

The CentOS package installer yum will manage an update to the supported kernel version; the command yum install kernel-<version> updates the kernel for CentOS. First search for available kernel packages, then install the desired release:

# yum list --showduplicates kernel
kernel.x86_64    3.10.0-862.el7        base
kernel.x86_64    3.10.0-862.2.3.el7    updates
kernel.x86_64    3.10.0-862.3.2.el7    updates

# yum install kernel-3.10.0-862.el7

Ubuntu 18.04 LTS

If desired, alternative kernels may be installed. For example, at the time of writing, v4.18 is the newest stable kernel.

Acquire packages

To download the kernel packages from Ubuntu's mainline kernel PPA:

# BASE=http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18/
# wget \
    $BASE/linux-headers-4.18.0-041800_4.18.0-041800.201808122131_all.deb \
    $BASE/linux-headers-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb \
    $BASE/linux-image-unsigned-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb \
    $BASE/linux-modules-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb

Install packages

To install the packages:

# dpkg -i \
    linux-headers-4.18.0-041800_4.18.0-041800.201808122131_all.deb \
    linux-headers-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb \
    linux-image-unsigned-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb \
    linux-modules-4.18.0-041800-generic_4.18.0-041800.201808122131_amd64.deb

Appendix G: set_irq_affinity.sh Source

#!/bin/bash -e

# Copyright (C) 2018 Netronome Systems, Inc.
#
# This software is dual licensed under the GNU General Public License Version 2,
# June 1991 as shown in the file COPYING in the top-level directory of this
# source tree or the BSD 2-Clause License provided below.  You have the
# option to license this software under the complete terms of either license.
#
# The BSD 2-Clause License:
#
#     Redistribution and use in source and binary forms, with or
#     without modification, are permitted provided that the following
#     conditions are met:
#
#      1. Redistributions of source code must retain the above
#         copyright notice, this list of conditions and the following
#         disclaimer.
#
#      2. Redistributions in binary form must reproduce the above
#         copyright notice, this list of conditions and the following
#         disclaimer in the documentation and/or other materials
#         provided with the distribution.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

usage() {
    echo "Usage: $0 NETDEV"
    echo "          Optional env vars: IRQ_NAME_FMT"
    exit 1
}

[ $# -ne 1 ] && usage

[ "a$IRQ_NAME_FMT" == a ] && IRQ_NAME_FMT=$1-rxtx

DEV=$1
if ! [ -e /sys/bus/pci/devices/$DEV ]; then
    DEV=$(ethtool -i $1 | grep bus | awk '{print $2}')
    N_TX=$(ls /sys/class/net/$1/queues/ | grep tx | wc -l)
    N_CPUS=$(ls /sys/bus/cpu/devices/ | wc -l)
fi

[ "a$DEV" == a ] && usage

NODE=$(cat /sys/bus/pci/devices/$DEV/numa_node)
CPUL=$(cat /sys/bus/node/devices/node${NODE}/cpulist | tr ',' ' ')

N_NODES=$(ls /sys/bus/node/devices/ | wc -l)

for c in $CPUL; do
    # Convert "n-m" into "n n+1 n+2 ... m"
    [[ "$c" =~ '-' ]] && c=$(seq $(echo $c | tr '-' ' '))

    CPUS=(${CPUS[@]} $c)
done

echo Device $DEV is on node $NODE with cpus ${CPUS[@]}

IRQBAL=$(ps aux | grep irqbalance | wc -l)

[ $IRQBAL -ne 1 ] && echo Killing irqbalance && killall irqbalance

IRQS=$(ls /sys/bus/pci/devices/$DEV/msi_irqs/)


IRQS=($IRQS)

node_mask=$((~(~0 << N_NODES)))
node_shf=$((N_NODES - 1))
cpu_shf=$((N_TX << node_shf))

p_mask=0
id=0
for i in $(seq 0 $((${#IRQS[@]} - 1)))
do
    ! [ -e /proc/irq/${IRQS[i]} ] && continue

    name=$(basename /proc/irq/${IRQS[i]}/$IRQ_NAME_FMT*)
    ls /proc/irq/${IRQS[i]}/$IRQ_NAME_FMT* >>/dev/null 2>/dev/null || continue

    cpu=${CPUS[id % ${#CPUS[@]}]}

    m=0
    m_mask=node_mask
    if [ $N_TX -gt $((id + ${#CPUS[@]})) ]; then
        # Only take one CPU if there will be more rings on this CPU
        m_mask=1
    fi
    # Calc the masks we should cover
    for j in `seq 0 $cpu_shf $((N_CPUS - 1))`; do
        m=$((m << cpu_shf | (m_mask << ((cpu >> node_shf) << node_shf))))
        m=$((m & ~p_mask))
    done
    xps_mask=$(printf "%x" $((m % (1 << N_CPUS))))
    # Insert comma between low and hi 32 bits, if xps_mask is long enough
    xps_mask=`echo $xps_mask | sed 's/\(.\)\(.\{8\}$\)/\1,\2/'`
    p_mask=$((p_mask | m))

    echo $cpu > /proc/irq/${IRQS[i]}/smp_affinity_list
    irq_state="irq: $(cat /proc/irq/${IRQS[i]}/smp_affinity)"

    xps_state='xps: ---'
    xps_file=/sys/class/net/$1/queues/tx-$id/xps_cpus
    if [ -e $xps_file ]; then
        echo $xps_mask > $xps_file
        xps_state="xps: $(cat $xps_file)"
    fi

    echo -e "IRQ ${IRQS[i]} to CPU $cpu     ($irq_state $xps_state)"
    ((++id))
done
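Per the usage() function above, the script takes a single netdev argument, with IRQ_NAME_FMT available as an optional environment variable (the netdev name below is an example):

```shell
# ./set_irq_affinity.sh ens4np0
```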

Contact Us

Netronome Systems, Inc.
2903 Bunker Hill Lane, Suite 150
Santa Clara, CA 95054
Tel: 408.496.0022 Fax: 408.586.0002
https://www.netronome.com help@netronome.com