Workgroup:
Benchmarking Methodology Working Group
Internet-Draft:
draft-dcn-bmwg-containerized-infra-11
Published:
Intended Status:
Informational
Expires:
6 January 2024
Authors:
N. Tran
Soongsil University
S. Rao
The Linux Foundation
J. Lee
Soongsil University
Y. Kim
Soongsil University

Considerations for Benchmarking Network Performance in Containerized Infrastructures

Abstract

Recently, the Benchmarking Methodology Working Group has extended the laboratory characterization from physical network functions (PNFs) to virtual network functions (VNFs). Considering the network function implementation trend moving from virtual machine-based to container-based, system configurations and deployment scenarios for benchmarking will be partially changed by how the resource allocation and network technologies are specified for containerized VNFs. This draft describes additional considerations for benchmarking network performance when network functions are containerized and performed in general-purpose hardware.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 6 January 2024.

Table of Contents

1. Introduction

The Benchmarking Methodology Working Group (BMWG) has recently expanded its benchmarking scope from Physical Network Functions (PNFs) running on dedicated hardware systems to Network Function Virtualization (NFV) infrastructure and Virtualized Network Functions (VNFs). [RFC8172] describes considerations for configuring NFV infrastructure and benchmarking metrics, and [RFC8204] gives guidelines for benchmarking virtual switches that connect VNFs in the Open Platform for NFV (OPNFV).

Recently, NFV infrastructure has evolved to include a lightweight virtualized platform called containerized infrastructure, where network functions are virtualized using host operating system (OS) virtualization instead of the hardware virtualization used in hypervisor-based, virtual machine (VM)-based infrastructure. In comparison to VMs, containers do not have separate hardware and kernels. Containerized virtual network functions (C-VNFs) share the same kernel space on the same host, while their resources are logically isolated in different namespaces. Considering this architectural difference between container-based and VM-based NFV systems, containerized NFV network performance benchmarking might have different System Under Test (SUT) and Device Under Test (DUT) configurations compared with both black-box benchmarking and VM-based NFV infrastructure as described in [RFC8172].

In terms of networking, a container network plugin is required to route traffic between containers that are isolated in different network namespaces. This network plugin creates the network interface inside the container and connects it to the host network via a Linux bridge, a virtual switch (vSwitch), or directly to a Network Interface Card (NIC), depending on the chosen networking technique. These techniques include several packet acceleration solutions that have recently been applied in containerized infrastructure to enhance container network throughput and achieve line-rate transmission speed. The architectural differences among these acceleration solutions lead to different containerized networking models that should be taken into account when benchmarking containerized network performance. In addition, the unique architecture of container networking may introduce additional resource configuration considerations.

This draft aims to provide additional considerations to guide containerized infrastructure benchmarking, complementing the previous benchmarking methodology for common NFV infrastructure. These considerations include an investigation of multiple networking models based on the usage of different packet acceleration techniques, and an investigation of several resource configurations that might impact containerized network performance, such as CPU isolation, hugepages, CPU core and memory allocation, and service function chaining. Benchmarking experiences for these considerations are also presented in this draft as references. Note that, although the detailed configurations of the two infrastructures differ, the benchmarks and metrics defined in [RFC8172] and [RFC8204] can be equally applied to containerized infrastructure from a generic NFV point of view; therefore, defining additional evaluation metrics or methodologies is out of scope.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. This document uses the terminology described in [RFC8172], [RFC8204], and [ETSI-TST-009].

3. Containerized Infrastructure Overview

With the proliferation and popularity of Kubernetes, in a common containerized infrastructure a pod is defined as the basic unit for orchestration and management that can host multiple containers, with shared storage and network resources. Kubernetes supports several container runtime options such as Docker, CRI-O, and containerd. In this document, the terms container and pod are used interchangeably, and Kubernetes concepts are used for general containerized infrastructure.

For benchmarking of the containerized infrastructure, as mentioned in [RFC8172], the basic approach is to reuse existing benchmarking methods developed within the BMWG. The various network function specifications defined in the BMWG should still be applied to containerized VNFs (C-VNFs) for performance comparison with physical network functions and VM-based VNFs. A major distinction of the containerized infrastructure from the VM-based infrastructure is the absence of a hypervisor. Without a hypervisor, all C-VNFs share the same host and kernel space. Storage, computing, and networking resources are logically isolated between containers via different namespaces.

Container networking is provided by Container Network Interface (CNI) plugins. CNI plugins create the network link between containers and the host's external (real) interfaces. Different kinds of CNI plugins leverage different networking technologies and solutions to create this link. These include moving a host network device into the container namespace, or creating network interface pairs with one end attached to the container network namespace and the other attached to the host network namespace, either as a direct point-to-point link or via a bridge/switching function. To support packet acceleration techniques such as user-space networking, SR-IOV, or eBPF, specific CNI plugins are required. The architectural differences among these CNIs bring additional considerations when benchmarking network performance in containerized infrastructure.

4. Benchmarking Considerations

4.1. Networking Models

Container networking services in Kubernetes are provided by CNI plugins, which describe the network configuration in JSON format. Initially, when a pod or container is first instantiated, it has no network. A CNI plugin inserts a network interface into the isolated container network namespace and performs other necessary tasks to connect the host and container network namespaces. It then allocates an IP address to the interface and configures routing consistent with the IP address management plugin. Different CNIs use different networking technologies to implement this connection. Based on the chosen networking technologies, and on how packets are processed and/or accelerated in the kernel space and/or the user space of the host, these CNIs can be categorized into different container networking models. The chosen networking model and its corresponding CNIs can affect container networking performance.
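
As an illustration of the JSON network configuration that CNI plugins consume, the following minimal Python sketch prints a hypothetical bridge-type configuration. The network name, bridge name, and subnet are placeholder values, not taken from this document.

   import json

   # Illustrative CNI network configuration (placeholder values).  A
   # kernel-space CNI such as the bridge plugin reads a file like this
   # from /etc/cni/net.d/ and creates the corresponding interfaces.
   cni_config = {
       "cniVersion": "0.4.0",
       "name": "example-net",
       "type": "bridge",          # CNI plugin binary to invoke
       "bridge": "cni0",          # Linux bridge on the host side
       "isGateway": True,
       "ipam": {                  # IP address management (IPAM) plugin
           "type": "host-local",
           "subnet": "10.244.0.0/24"
       }
   }

   print(json.dumps(cni_config, indent=2))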

4.1.1. Kernel-space non-Acceleration Model

  +------------------------------------------------------------------+
  | User Space                                                       |
  |   +-----------+                                  +-----------+   |
  |   |   C-VNF   |                                  |   C-VNF   |   |
  |   | +-------+ |                                  | +-------+ |   |
  |   +-|  eth  |-+                                  +-|  eth  |-+   |
  |     +---^---+                                      +---^---+     |
  |         |                                              |         |
  |         |     +----------------------------------+     |         |
  |         |     |                                  |     |         |
  |         |     |  Networking Controller / Agent   |     |         |
  |         |     |                                  |     |         |
  |         |     +-----------------^^---------------+     |         |
  ----------|-----------------------||---------------------|----------
  |     +---v---+                   ||                 +---v---+     |
  |  +--|  veth |-------------------vv-----------------|  veth |--+  |
  |  |  +-------+     Switching/Routing Component      +-------+  |  |
  |  |         (Kernel Routing Table, OVS Kernel Datapath,        |  |
  |  |         Linux Bridge, MACVLAN/IPVLAN sub-interfaces)       |  |
  |  |                                                            |  |
  |  +-------------------------------^----------------------------+  |
  |                                  |                               |
  | Kernel Space         +-----------v----------+                    |
  +----------------------|          NIC         |--------------------+
                         +----------------------+

Figure 1: Example architecture of the Kernel-Space non-Acceleration Model

Figure 1 shows the kernel-space non-acceleration model. In this model, the virtual Ethernet (veth) interface on the host side can be attached to different switching/routing components depending on the chosen CNI. In the case of Calico, it is a direct point-to-point attachment to the host namespace, and the kernel routing table is used for routing between containers. For Flannel, it is the Linux bridge. In the case of MACVLAN/IPVLAN, it is the corresponding virtual sub-interface. For dynamic networking configuration, the forwarding policy can be pushed by the controller/agent located in user space. In the case of Open vSwitch (OVS) [OVS] configured with the kernel datapath, the first packet of a 'non-matching' flow can be sent to the user-space networking controller/agent (ovs-vswitchd) for a dynamic forwarding decision.

In general, the switching/routing component runs in kernel space, so data packets are processed in the network stack of the host kernel before they are transferred to the C-VNF running in user space. Not only pod-to-external but also pod-to-pod traffic is processed in kernel space. This design makes networking performance worse than in the other networking models, which utilize the packet acceleration techniques described in the sections below. Kernel-space networking models are listed below, and a sketch of the typical veth/bridge plumbing they perform follows the list:

o Docker Network [Docker-network], Flannel Network [Flannel], Calico [Calico], OVS (Open vSwitch) [OVS], OVN (Open Virtual Network) [OVN], MACVLAN, IPVLAN
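
The following Python sketch illustrates, with standard iproute2 commands, the kind of kernel-space plumbing a bridge-based CNI performs: creating a veth pair, moving one end into a container network namespace, and attaching the other end to a Linux bridge. The namespace, interface, and bridge names as well as the address are hypothetical, and the commands require root privileges on a Linux host.

   import subprocess

   NETNS = "pod1"      # hypothetical container network namespace
   BRIDGE = "cni0"     # hypothetical Linux bridge on the host

   def run(cmd):
       # Helper: run an iproute2 command and fail loudly on error.
       subprocess.run(cmd.split(), check=True)

   # Create the namespace and the bridge (a CNI would reuse existing ones).
   run(f"ip netns add {NETNS}")
   run(f"ip link add name {BRIDGE} type bridge")
   run(f"ip link set {BRIDGE} up")

   # Create a veth pair: one end stays on the host, the other goes to the pod.
   run("ip link add veth-host type veth peer name veth-pod")
   run(f"ip link set veth-pod netns {NETNS}")
   run(f"ip link set veth-host master {BRIDGE}")
   run("ip link set veth-host up")

   # Bring up the container-side interface and assign an address inside the
   # namespace (an IPAM plugin would normally choose the address).
   run(f"ip netns exec {NETNS} ip link set veth-pod up")
   run(f"ip netns exec {NETNS} ip addr add 10.244.0.10/24 dev veth-pod")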

4.1.2. User-space Acceleration Model

  +------------------------------------------------------------------+
  | User Space                                                       |
  |   +---------------+                          +---------------+   |
  |   |     C-VNF     |                          |     C-VNF     |   |
  |   | +-----------+ |    +-----------------+   | +-----------+ |   |
  |   | |  virtio   | |    |    Networking   |   | |  virtio   |-|   |
  |   +-|  /memif   |-+    | Controller/Agent|   +-|  /memif   |-+   |
  |     +-----^-----+      +-------^^--------+     +-----^-----+     |
  |           |                    ||                    |           |
  |           |                    ||                    |           |
  |     +-----v-----+              ||              +-----v-----+     |
  |     | vhost-user|              ||              | vhost-user|     |
  |  +--|  / memif  |--------------vv--------------|  / memif  |--+  |
  |  |  +-----------+                              +-----------+  |  |
  |  |                          vSwitch                           |  |
  |  |                      +--------------+                      |  |
  |  +----------------------|      PMD     |----------------------+  |
  |                         |              |                         |
  |                         +-------^------+                         |
  ----------------------------------|---------------------------------
  |                                 |                                |
  |                                 |                                |
  |                                 |                                |
  | Kernel Space         +----------V-----------+                    |
  +----------------------|          NIC         |--------------------+
                         +----------------------+

Figure 2: Example architecture of the User-Space Acceleration Model

Figure 2 shows the user-space acceleration model, in which data packets from the physical network port bypass kernel processing and are delivered directly to the vSwitch running in user space. This model is commonly considered a Data Plane Acceleration (DPA) technology since it can achieve a higher packet processing rate than kernel-space networking, whose packet throughput is limited. To bypass the kernel and transfer packets directly to the vSwitch, the Data Plane Development Kit (DPDK) is essentially required. With DPDK, an additional driver called the Poll Mode Driver (PMD) is created on the vSwitch. A PMD must be created for each NIC separately. The Userspace CNI [userspace-cni] is required to create the user-space network interface (virtio or memif) in each container. User-space vSwitch models are listed below; a sketch of the typical host prerequisites for this model follows the list:

o OVS-DPDK [ovs-dpdk], VPP [vpp]
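
The user-space acceleration model typically requires hugepages and a user-space I/O driver (for example vfio-pci) bound to the data-plane NIC. The following Python sketch checks these two prerequisites from procfs/sysfs; the PCI address is a placeholder for the benchmarked NIC.

   import os

   PCI_ADDR = "0000:18:00.0"   # placeholder: PCI address of the data-plane NIC

   def hugetlbfs_mounted():
       # DPDK's memory pools are backed by hugepages, so a hugetlbfs
       # mount must be present on the host.
       with open("/proc/mounts") as mounts:
           return any(line.split()[2] == "hugetlbfs" for line in mounts)

   def nic_driver(pci_addr):
       # The 'driver' symlink shows which kernel driver owns the device,
       # e.g. a kernel network driver or vfio-pci for a user-space datapath.
       link = f"/sys/bus/pci/devices/{pci_addr}/driver"
       return os.path.basename(os.readlink(link)) if os.path.islink(link) else None

   print("hugetlbfs mounted:", hugetlbfs_mounted())
   print("NIC driver:", nic_driver(PCI_ADDR))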

4.1.3. eBPF Acceleration Model

  +------------------------------------------------------------------+
  | User Space                                                       |
  |    +----------------+                     +----------------+     |
  |    |      C-VNF     |                     |      C-VNF     |     |
  |    | +------------+ |                     | +------------+ |     |
  |    +-|     eth    |-+                     +-|     eth    |-+     |
  |      +-----^------+                         +------^-----+       |
  |            |                                       |             |
  -------------|---------------------------------------|--------------
  |      +-----v-------+                        +-----v-------+      |
  |      |  +------+   |                        |  +------+   |      |
  |      |  | eBPF |   |                        |  | eBPF |   |      |
  |      |  +------+   |                        |  +------+   |      |
  |      | veth tc hook|                        | veth tc hook|      |
  |      +-----^-------+                        +------^------+      |
  |            |                                       |             |
  |            |   +-------------------------------+   |             |
  |            |   |                               |   |             |
  |            |   |       Networking Stack        |   |             |
  |            |   |                               |   |             |
  |            |   +-------------------------------+   |             |
  |      +-----v-------+                        +-----v-------+      |
  |      |  +------+   |                        |  +------+   |      |
  |      |  | eBPF |   |                        |  | eBPF |   |      |
  |      |  +------+   |                        |  +------+   |      |
  |      | veth tc hook|                        | veth tc hook|      |
  |      +-------------+                        +-------------+      |
  |      |     OR      |                        |     OR      |      |
  |    +-|-------------|------------------------|-------------|--+   |
  |    | +-------------+                        +-------------+  |   |
  |    | |  +------+   |                        |  +------+   |  |   |
  |    | |  | eBPF |   |         NIC Driver     |  | eBPF |   |  |   |
  |    | |  +------+   |                        |  +------+   |  |   |
  |    | |  XDP hook   |                        |  XDP hook   |  |   |
  |    | +-------------+                        +------------ +  |   |
  |    +---------------------------^-----------------------------+   |
  |                                |                                 |
  | Kernel Space          +--------v--------+                        |
  +-----------------------|       NIC       |------------------------+
                          +-----------------+
Figure 3: Example architecture of the eBPF Acceleration Model - non-AFXDP
  +------------------------------------------------------------------+
  | User Space                                                       |
  |    +-----------------+                    +-----------------+    |
  |    |      C-VNF      |                    |      C-VNF      |    |
  |    | +-------------+ |  +--------------+  | +-------------+ |    |
  |    +-|     eth     |-+  |   CNDP APIs  |  +-|     eth     |-+    |
  |      +-----^-------+    +--------------+    +------^------+      |
  |            |                                       |             |
  |      +-----v-------+                        +------v------+      |
  -------|    AFXDP    |------------------------|    AFXDP    |------|
  |      |    socket   |                        |    socket   |      |
  |      +-----^-------+                        +-----^-------+      |
  |            |                                       |             |
  |            |   +-------------------------------+   |             |
  |            |   |                               |   |             |
  |            |   |       Networking Stack        |   |             |
  |            |   |                               |   |             |
  |            |   +-------------------------------+   |             |
  |            |                                       |             |
  |    +-------|---------------------------------------|--------+    |
  |    | +-----|------+                           +----|-------+|    |
  |    | |  +--v---+  |                           |  +-v----+  ||    |
  |    | |  | eBPF |  |         NIC Driver        |  | eBPF |  ||    |
  |    | |  +------+  |                           |  +------+  ||    |
  |    | |  XDP hook  |                           |  XDP hook  ||    |
  |    | +-----^------+                           +----^-------+|    |
  |    +-------|-------------------^-------------------|--------+    |
  |            |                                       |             |
  -------------|---------------------------------------|--------------
  |            +---------+                   +---------+             |
  |               +------|-------------------|----------+            |
  |               | +----v-------+       +----v-------+ |            |
  |               | |   netdev   |       |   netdev   | |            |
  |               | |     OR     |       |     OR     | |            |
  |               | | sub/virtual|       | sub/virtual| |            |
  |               | |  function  |       |  function  | |            |
  | Kernel Space  | +------------+  NIC  +------------+ |            |
  +---------------|                                     |------------+
                  +-------------------------------------+

Figure 4: Example architecture of the eBPF Acceleration Model - using AFXDP supported CNI
  +------------------------------------------------------------------+
  | User Space                                                       |
  |   +---------------+                          +---------------+   |
  |   |     C-VNF     |                          |     C-VNF     |   |
  |   | +-----------+ |    +-----------------+   | +-----------+ |   |
  |   | |  virtio   | |    |    Networking   |   | |  virtio   |-|   |
  |   +-|  /memif   |-+    | Controller/Agent|   +-|  /memif   |-+   |
  |     +-----^-----+      +-------^^--------+     +-----^-----+     |
  |           |                    ||                    |           |
  |           |                    ||                    |           |
  |     +-----v-----+              ||              +-----v-----+     |
  |     | vhost-user|              ||              | vhost-user|     |
  |  +--|  / memif  |--------------vv--------------|  / memif  |--+  |
  |  |  +-----^-----+                              +-----^-----+  |  |
  |  |        |                 vSwitch                  |        |  |
  |  |  +-----v-----+                              +-----v-----+  |  |
  |  +--| AFXDP PMD |------------------------------| AFXDP PMD |--+  |
  |     +-----^-----+                              +-----^-----+     |
  |           |                                          |           |
  |     +-----v-----+                              +-----v-----+     |
  ------|   AFXDP   |------------------------------|   AFXDP   |-----|
  |     |   socket  |                              |   socket  |     |
  |     +-----^----+                               +-----^-----+     |
  |           |                                          |           |
  |           |    +-------------------------------+     |           |
  |           |    |                               |     |           |
  |           |    |       Networking Stack        |     |           |
  |           |    |                               |     |           |
  |           |    +-------------------------------+     |           |
  |           |                                          |           |
  |    +------|------------------------------------------|--------+  |
  |    | +----|-------+                           +------|-----+  |  |
  |    | |  +-v----+  |                           |  +---v--+  |  |  |
  |    | |  | eBPF |  |         NIC Driver        |  | eBPF |  |  |  |
  |    | |  +------+  |                           |  +------+  |  |  |
  |    | |  XDP hook  |                           |  XDP hook  |  |  |
  |    | +------------+                           +------------+  |  |
  |    +----------------------------^-----------------------------+  |
  |                                 |                                |
  ----------------------------------|---------------------------------
  |                                 |                                |
  | Kernel Space         +----------v-----------+                    |
  +----------------------|          NIC         |--------------------+
                         +----------------------+
Figure 5: Example architecture of the eBPF Acceleration Model - using user-space vSwitch which support AFXDP PMD

The eBPF acceleration model leverages the extended Berkeley Packet Filter (eBPF) technology [eBPF] to achieve high-performance packet processing. eBPF enables the execution of sandboxed programs inside abstract virtual machines within the Linux kernel without changing the kernel source code or loading kernel modules. To accelerate data plane performance, eBPF programs are attached to different BPF hooks inside the Linux kernel stack.

One type of BPF hook is the eXpress Data Path (XDP) at the networking driver. It is the first hook that triggers an eBPF program upon packet reception from the external network. The other type of BPF hook is the Traffic Control ingress/egress eBPF hook (tc eBPF). An eBPF program running at the tc hook enforces policy on all traffic exiting the pod, while an eBPF program running at the XDP hook enforces policy on all traffic coming from the NIC.

On the egress datapath side, whenever a packet exits the pod, it first goes through the pod's veth interface. The destination that receives the packet then depends on the CNI plugin chosen to create the container networking. If the chosen CNI plugin is a non-AFXDP-based CNI, the packet is received by the eBPF program running at the veth interface tc hook. If the chosen CNI plugin is an AFXDP-supported CNI, the packet is received by the AFXDP socket [AFXDP]. The AFXDP socket is a Linux socket type that provides a fast packet delivery tunnel between itself and the XDP hook at the networking driver. This tunnel bypasses the kernel-space network stack to provide high-performance raw packet networking. Packets are transmitted between user space and the AFXDP socket via a shared memory buffer. Once the egress packet has arrived at the AFXDP socket or tc hook, it is forwarded directly to the NIC.

On the ingress datapath side, eBPF programs at the XDP hook or tc hook pick up packets from the NIC network devices (NIC ports). In the case of the AFXDP CNI plugin [afxdp-cni], there are two operation modes: "primary" and "cdq". In "primary" mode, NIC network devices can be directly allocated to pods. Meanwhile, in "cdq" mode, NIC network devices can be efficiently partitioned into subfunctions or SR-IOV virtual functions, which enables multiple pods to share a primary network device. Then, from the network devices, packets are delivered directly to the veth interface pair or to the AFXDP socket (whether or not they pass through the AFXDP socket depends on the chosen CNI), bypassing kernel network layer processing such as iptables. In the case of the Cilium CNI [Cilium], the context switch to the pod network namespace can also be bypassed.

Notable eBPF acceleration models can be classified into the three categories below. Their corresponding model architectures are shown in Figure 3, Figure 4, and Figure 5.

o non-AFXDP: eBPF-supported CNIs such as Calico [Calico] and Cilium [Cilium]

o using an AFXDP-supported CNI: the AFXDP K8s plugin [afxdp-cni] used by the Cloud Native Data Plane project [CNDP]

o using a user-space vSwitch that supports an AFXDP PMD: OVS-DPDK [ovs-dpdk] and VPP [vpp] are vSwitches with AFXDP device driver support. The Userspace CNI [userspace-cni] is used to enable container networking via these vSwitches.

The container network performance of the Cilium project is reported by the project itself in [cilium-benchmark]. Meanwhile, AFXDP performance and its comparison against DPDK are reported in [intel-AFXDP] and [LPC18-DPDK-AFXDP], respectively.
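
When benchmarking an eBPF acceleration model, it can be useful to confirm which XDP or tc eBPF programs are actually attached to the interfaces under test. The Python sketch below shells out to bpftool, assuming the utility is installed on the worker node and the script runs with sufficient privileges; the interface name is a placeholder.

   import subprocess

   # 'bpftool net show' lists the XDP and tc eBPF programs attached to
   # network devices on this host.
   result = subprocess.run(["bpftool", "net", "show"],
                           capture_output=True, text=True, check=True)
   print(result.stdout)

   # Rough check for one interface of interest (placeholder name).
   IFACE = "eth0"
   attached = [line for line in result.stdout.splitlines() if IFACE in line]
   print(f"entries mentioning {IFACE}:", attached or "none")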

4.1.4. Smart-NIC Acceleration Model

  +------------------------------------------------------------------+
  | User Space                                                       |
  |    +-----------------+                    +-----------------+    |
  |    |      C-VNF      |                    |      C-VNF      |    |
  |    | +-------------+ |                    | +-------------+ |    |
  |    +-|  vf driver  |-+                    +-|  vf driver  |-+    |
  |      +-----^-------+                        +------^------+      |
  |            |                                       |             |
  -------------|---------------------------------------|--------------
  |            +---------+                   +---------+             |
  |               +------|-------------------|------+                |
  |               | +----v-----+       +-----v----+ |                |
  |               | | virtual  |       | virtual  | |                |
  |               | | function |       | function | |                |
  | Kernel Space  | +----^-----+  NIC  +-----^----+ |                |
  +---------------|      |                   |      |----------------+
                  | +----v-------------------v----+ |
                  | |      Classify and Queue     | |
                  | +-----------------------------+ |
                  +---------------------------------+
Figure 6: Examples of Smart-NIC Acceleration Model

Figure 6 shows the Smart-NIC acceleration model, which does not use a vSwitch component. This model can be divided into two technologies.

One is Single Root I/O Virtualization (SR-IOV), an extension of the PCIe specification that enables multiple partitions running simultaneously within a system to share PCIe devices. In the NIC, there are virtual replicas of PCI functions known as virtual functions (VFs), and each of them is directly connected to a container's network interface. Using SR-IOV, data packets from the external network bypass both kernel and user space and are forwarded directly to the container's virtual network interface. The SR-IOV network device plugin for Kubernetes [SR-IOV] is recommended to create a special interface in each container, controlled by the VF driver.
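
The number of VFs that a physical NIC port exposes is controlled through sysfs. The following Python sketch reads the maximum supported VF count and enables a given number of VFs on a physical port; the interface name and VF count are placeholders, writing to sysfs requires root privileges, and the sketch assumes no VFs are currently enabled.

   from pathlib import Path

   PF_IFACE = "ens1f0"   # placeholder: SR-IOV capable physical port
   NUM_VFS = 4           # placeholder: number of VFs to enable

   dev = Path(f"/sys/class/net/{PF_IFACE}/device")

   # Maximum number of VFs the NIC/driver supports for this port.
   total_vfs = int((dev / "sriov_totalvfs").read_text())
   print("sriov_totalvfs:", total_vfs)

   # Enable the VFs; the SR-IOV device plugin would then advertise them
   # to Kubernetes as allocatable resources.
   (dev / "sriov_numvfs").write_text(str(min(NUM_VFS, total_vfs)))

   # Each enabled VF appears as a 'virtfnN' symlink to its PCI device.
   for vf in sorted(dev.glob("virtfn*")):
       print(vf.name, "->", vf.resolve().name)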

The other technology is offloading eBPF/XDP programs to the Smart-NIC card, as mentioned in the previous section. It enables generic acceleration of eBPF: eBPF programs are attached to XDP and run on the Smart-NIC card, which allows server CPUs to perform more application-level work. However, not all Smart-NIC cards provide eBPF/XDP offloading support.

4.1.5. Model Combination

  +-------------------------------------------------------+
  | User Space                                            |
  | +--------------------+         +--------------------+ |
  | |        C-VNF       |         |        C-VNF       | |
  | | +------+  +------+ |         | +------+  +------+ | |
  | +-|  eth |--|  eth |-+         +-|  eth |--|  eth |-+ |
  |   +---^--+  +---^--+             +--^---+  +---^--+   |
  |       |         |                   |          |      |
  |       |         |                   |          |      |
  |       |     +---v--------+  +-------v----+     |      |
  |       |     | vhost-user |  | vhost-user |     |      |
  |       |  +--|  / memif   |--|  / memif   |--+  |      |
  |       |  |  +------------+  +------------+  |  |      |
  |       |  |             vSwitch              |  |      |
  |       |  +----------------------------------+  |      |
  |       |                                        |      |
  --------|----------------------------------------|-------
  |       +-----------+              +-------------+      |
  |              +----|--------------|---+                |
  |              |+---v--+       +---v--+|                |
  |              ||  vf  |       |  vf  ||                |
  |              |+------+       +------+|                |
  | Kernel Space |                       |                |
  +--------------|           NIC         |----------------+
                 +-----------------------+
Figure 7: Examples of Model Combination deployment

Figure 7 shows the networking model that combines the user-space vSwitch model and the Smart-NIC acceleration model. This model is frequently considered in service function chaining scenarios when two different types of traffic flows are present: North/South traffic and East/West traffic.

North/South traffic is traffic in which packets are received from other servers and routed through a VNF. For this traffic type, a Smart-NIC model such as SR-IOV is preferred because packets always have to pass through the NIC; involving a user-space vSwitch in North/South traffic would create additional bottlenecks. On the other hand, East/West traffic is traffic sent and received between containers deployed on the same server, and it can pass through multiple containers. For this type, user-space vSwitch models such as OVS-DPDK and VPP are preferred because packets are routed within user space only and do not pass through the NIC.

The throughput advantages of these different networking models in the different traffic direction cases are reported in [Intel-SRIOV-NFV].

4.2. Resources Configuration

The resource configuration considerations listed here apply not only to the C-VNF but also to the other components of a containerized SUT. A containerized SUT is composed of NICs, possible cables between hosts, the kernel and/or vSwitch, and C-VNFs.

4.2.1. CPU Isolation / NUMA Affinity

CPU pinning enables benefits such as maximizing cache utilization, eliminating operating system thread scheduling overhead, and coordinating network I/O by guaranteeing resources. One example of CPU pinning technology in containerized infrastructure is the CPU Manager for Kubernetes (CMK) [CMK]. This technology proved effective in avoiding the "noisy neighbor" problem, as shown in an existing experience report [Intel-EPA]. The benefits of CPU isolation techniques are not limited to the "noisy neighbor" problem: different VNFs also neighbor each other, and they neighbor the vSwitch if one is used.
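
At the OS level, CPU pinning amounts to restricting a process (or container) to a fixed set of CPU cores. The following Python sketch is a minimal illustration of that effect rather than a description of how CMK itself is implemented; the chosen core numbers are placeholders.

   import os

   PINNED_CORES = {2, 3}   # placeholder: dedicated cores for this workload

   # Restrict the current process (PID 0 means "self") to the chosen cores.
   # Tools such as CMK or the kubelet's static CPU Manager policy achieve a
   # similar effect for containers via cgroup cpusets.
   os.sched_setaffinity(0, PINNED_CORES)

   print("effective CPU affinity:", sorted(os.sched_getaffinity(0)))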

NUMA affects the speed at which different CPU cores access different memory regions. CPU cores in the same NUMA node can locally access the shared memory in that node, which is faster than remotely accessing memory in a different NUMA node. In containerized networking, packet forwarding is processed through the NIC, the VNF, and possibly a vSwitch, depending on the chosen networking model. The NIC's NUMA node alignment can be checked via the PCI device's node affinity. Meanwhile, specific CPU cores can be directly assigned to the VNF and the vSwitch via their configuration settings. Network performance can change depending on whether the physical network interface, the vSwitch, and the VNF are attached to the same NUMA node. There is benchmarking experience on cross-NUMA performance impacts [cross-NUMA-vineperf]. Those tests cover cross-NUMA performance in three scenarios depending on the location of the traffic generator and the traffic endpoint. The results verified that:

o The performance degradation caused by a single NUMA node serving multiple interfaces is worse than the degradation caused by crossing NUMA nodes

o Performance is worse when the VNF shares CPUs across NUMA nodes

Note that CPU pinning and NUMA affinity configuration considerations may also apply to VM-based VNFs. As mentioned above, dedicated CPU cores of a specific NUMA node can be assigned to the VNF and the vSwitch via their own running configurations. The NIC's NUMA node can be checked from the PCI device's information. A host's NUMA nodes can be scheduled to virtual machines by specifying the chosen nodes in their settings.
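
The following Python sketch reads the NUMA node of a NIC and the CPU list of each NUMA node from sysfs, which can be used to verify that the NIC, the vSwitch cores, and the VNF cores are aligned; the interface name is a placeholder.

   from pathlib import Path

   NIC_IFACE = "ens1f0"   # placeholder: data-plane NIC under test

   # NUMA node to which the NIC's PCI device is attached (-1 means unknown).
   nic_node = int(Path(f"/sys/class/net/{NIC_IFACE}/device/numa_node").read_text())
   print(f"{NIC_IFACE} is attached to NUMA node {nic_node}")

   # CPU cores belonging to each NUMA node; for the aligned scenarios, the
   # vSwitch and VNF cores should be picked from the same node as the NIC.
   for node_dir in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
       cpulist = (node_dir / "cpulist").read_text().strip()
       print(f"{node_dir.name}: CPUs {cpulist}")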

4.2.2. Pod Hugepages

Hugepages configure a large memory page size to reduce the Translation Lookaside Buffer (TLB) miss rate and increase application performance. This improves the performance of logical/virtual-to-physical address lookups performed by the CPU's memory management unit, and thus overall system performance. In containerized infrastructure, the container is isolated at the application level, and administrators can set hugepages at a more granular level (e.g., Kubernetes allows the use of 2 MB or 1 GB hugepages for a container). Moreover, these pages are dedicated to the application and not shared with other processes, so the application uses them more efficiently. From a network benchmarking point of view, however, the impact on general packet processing can be relatively negligible, and it may be necessary to consider the application level to measure the impact together. In the case of a DPDK application, as reported in [Intel-EPA], hugepages were verified to improve network performance because packet handling processes run inside the application.
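
The Python sketch below reads the host's pre-allocated and free hugepages per page size and per NUMA node from sysfs, which is useful for confirming that enough hugepages are available to the vSwitch and the pods before a test run.

   from pathlib import Path

   def hugepage_pools(base):
       # Each 'hugepages-<size>kB' directory describes one hugepage pool.
       for pool in sorted(Path(base).glob("hugepages-*")):
           total = int((pool / "nr_hugepages").read_text())
           free = int((pool / "free_hugepages").read_text())
           yield pool.name, total, free

   # Host-wide hugepage pools (e.g. 2048kB and 1048576kB page sizes).
   for name, total, free in hugepage_pools("/sys/kernel/mm/hugepages"):
       print(f"{name}: total={total} free={free}")

   # Per-NUMA-node hugepage pools, relevant for NUMA-aligned setups.
   for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
       for name, total, free in hugepage_pools(node / "hugepages"):
           print(f"{node.name} {name}: total={total} free={free}")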

4.2.3. Pod CPU Cores and Memory Allocation

Different resource allocation choices may impact container network performance. These include different CPU core and RAM allocations to pods, and different CPU core allocations to the Poll Mode Driver and the vSwitch. Benchmarking experience from [ViNePERF], published in [GLOBECOM-21-benchmarking-kubernetes], verified that:

o 2 CPUs per pod is insufficient for all packet frame sizes. With large frame sizes (over 1024 bytes), increasing the CPUs per pod significantly increases the throughput. Different RAM allocations to pods also cause different throughput results

o Not assigning dedicated CPU cores to the DPDK PMD causes significant performance drops

o Increasing the CPU core allocation to the OVS-DPDK vSwitch does not affect its performance. However, increasing the CPU core allocation to the VPP vSwitch results in better latency.

Besides, in the user-space acceleration model, which uses a PMD to poll packets into the user-space vSwitch, assigning dedicated CPU cores to the PMD's Rx queues might improve network performance.
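
To verify that a vSwitch, PMD, or C-VNF process is pinned as intended, the per-thread CPU affinity can be read from procfs. The following Python sketch prints the Cpus_allowed_list of every thread of a given process; the PID is a placeholder for the process under test.

   from pathlib import Path

   TARGET_PID = 12345   # placeholder: PID of the vSwitch / PMD / C-VNF process

   task_dir = Path(f"/proc/{TARGET_PID}/task")
   for tid in sorted(task_dir.iterdir(), key=lambda p: int(p.name)):
       comm = (tid / "comm").read_text().strip()
       status = (tid / "status").read_text()
       # Cpus_allowed_list shows which cores this thread may run on.
       allowed = next(line.split(":", 1)[1].strip()
                      for line in status.splitlines()
                      if line.startswith("Cpus_allowed_list"))
       print(f"tid={tid.name} comm={comm} cpus_allowed={allowed}")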

4.2.4. Service Function Chaining

When considering benchmarking for containerized and VM-based infrastructure and network functions, benchmarking scenarios may contain various operational use cases. Traditional black-box benchmarking focuses on measuring the in-out performance of packets from physical network ports, since the hardware is tightly coupled with its function and only a single function runs on its dedicated hardware. However, in the NFV environment, a physical network port is commonly connected to multiple VNFs (i.e., the multiple PVP test setup architectures described in [ETSI-TST-009]) rather than dedicated to a single VNF. This scenario is called service function chaining. Therefore, benchmarking scenarios should reflect operational considerations such as the number of VNFs or the network services defined by a set of VNFs in a single host. [service-density] proposed a way of measuring the performance of multiple NFV service instances at a varied service density on a single host, which is one example of these operational benchmarking aspects. Another aspect that should be considered in benchmarking service function chaining scenarios is the different network acceleration technologies: network performance differences may occur because of the different traffic patterns resulting from the provided acceleration method.

4.2.5. Additional Considerations

Apart from the single-host test scenario, the multi-host scenario should also be considered in container network benchmarking, where container services are deployed across different servers. To provide network connectivity for container-based VNFs between different server nodes, inter-node networking is required. According to [ETSI-NFV-IFA-038], there are several technologies to enable inter-node networking: overlay technologies using a tunnel endpoint (e.g., VXLAN, IP-in-IP), routing using the Border Gateway Protocol (BGP), a layer 2 underlay, a direct network using a dedicated NIC for each pod, or a load balancer using the LoadBalancer service type in Kubernetes. The different protocols used by these technologies may cause performance differences in container networking.
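
As an illustration of the overlay option, the following Python sketch creates a VXLAN tunnel endpoint with iproute2, similar to what overlay-based CNIs configure between worker nodes. The interface name, VXLAN network identifier, peer address, and overlay subnet are placeholders, and the commands require root privileges.

   import subprocess

   UNDERLAY_IF = "ens1f0"     # placeholder: inter-node (underlay) NIC
   PEER_NODE = "192.0.2.20"   # placeholder: remote worker node address
   VNI = 100                  # placeholder: VXLAN network identifier

   def run(cmd):
       subprocess.run(cmd.split(), check=True)

   # Create a VXLAN device carrying pod traffic over the node network
   # (UDP port 4789 is the IANA-assigned VXLAN port).
   run(f"ip link add vxlan{VNI} type vxlan id {VNI} dev {UNDERLAY_IF} "
       f"remote {PEER_NODE} dstport 4789")
   run(f"ip addr add 10.244.1.1/24 dev vxlan{VNI}")
   run(f"ip link set vxlan{VNI} up")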

5. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization of a Device Under Test/System Under Test (DUT/SUT) using controlled stimuli in a laboratory environment with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis and relies solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for benchmarking purposes. Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks.

6. References

6.1. Informative References

[AFXDP]
"AF_XDP", <https://www.kernel.org/doc/html/v4.19/networking/af_xdp.html>.
[afxdp-cni]
"AF_XDP Plugins for Kubernetes", <https://github.com/intel/afxdp-plugins-for-kubernetes>.
[Calico]
"Project Calico", <https://docs.projectcalico.org/>.
[Cilium]
"Cilium Documentation", <https://docs.cilium.io/en/stable/>.
[cilium-benchmark]
Cilium, "CNI Benchmark: Understanding Cilium Network Performance", <https://cilium.io/blog/2021/05/11/cni-benchmark>.
[CMK]
Intel, "CPU Manager for Kubernetes", <https://github.com/intel/CPU-Manager-for-Kubernetes>.
[CNDP]
"CNDP - Cloud Native Data Plane", <https://cndp.io/>.
[cross-NUMA-vineperf]
Anuket Project, "Cross-NUMA performance measurements with VSPERF", <https://wiki.anuket.io/display/HOME/Cross-NUMA+performance+measurements+with+VSPERF>.
[Docker-network]
"Docker, Libnetwork design", <https://github.com/docker/libnetwork/>.
[eBPF]
"eBPF, extended Berkeley Packet Filter", <https://www.iovisor.org/technology/ebpf>.
[ETSI-NFV-IFA-038]
"Network Functions Virtualisation (NFV) Release 4; Architectural Framework; Report on network connectivity for container-based VNF".
[ETSI-TST-009]
"Network Functions Virtualisation (NFV) Release 3; Testing; Specification of Networking Benchmarks and Measurement Methods for NFVI".
[Flannel]
"flannel 0.10.0 Documentation", <https://coreos.com/flannel/>.
[GLOBECOM-21-benchmarking-kubernetes]
Sridhar, R., Paganelli, F., and A. Morton, "Benchmarking Kubernetes Container-Networking for Telco Usecases".
[intel-AFXDP]
Karlsson, M., "AF_XDP Sockets: High Performance Networking for Cloud-Native Networking Technology Guide".
[Intel-EPA]
Intel, "Enhanced Platform Awareness in Kubernetes", <https://builders.intel.com/docs/networkbuilders/enhanced-platform-awareness-feature-brief.pdf>.
[Intel-SRIOV-NFV]
Patrick, K. and J. Brian, "SR-IOV for NFV Solutions Practical Considerations and Thoughts".
[LPC18-DPDK-AFXDP]
Karlsson, M. and B. Topel, "The Path to DPDK Speeds for AF_XDP".
[OVN]
"How to use Open Virtual Networking with Kubernetes", <https://github.com/ovn-org/ovn-kubernetes>.
[OVS]
"Open vSwitch", <https://www.openvswitch.org/>.
[ovs-dpdk]
"Open vSwitch with DPDK", <http://docs.openvswitch.org/en/latest/intro/install/dpdk/>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC2544]
Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, <https://www.rfc-editor.org/rfc/rfc2544>.
[RFC8172]
Morton, A., "Considerations for Benchmarking Virtual Network Functions and Their Infrastructure", RFC 8172, <https://www.rfc-editor.org/rfc/rfc8172>.
[RFC8204]
Tahhan, M., O'Mahony, B., and A. Morton, "Benchmarking Virtual Switches in the Open Platform for NFV (OPNFV)", RFC 8204, <https://www.rfc-editor.org/rfc/rfc8204>.
[service-density]
Konstantynowicz, M. and P. Mikus, "NFV Service Density Benchmarking", <https://tools.ietf.org/html/draft-mkonstan-nf-service-density-00>.
[SR-IOV]
"SRIOV for Container-networking", <https://github.com/intel/sriov-cni>.
[userspace-cni]
Intel, "Userspace CNI Network Plugin", <https://github.com/intel/userspace-cni-network-plugin>.
[ViNePERF]
"Project: Virtual Network Performance for Telco NFV", <https://wiki.anuket.io/display/HOME/ViNePERF>.
[vpp]
"VPP with Containers", <https://fdio-vpp.readthedocs.io/en/latest/usecases/containers.html>.

Appendix A. Benchmarking Experience (Networking Models)

A.1. Benchmarking Environment

This appendix describes our IETF Hackathon proof-of-concept tests for the different networking model benchmarking considerations. This appendix can be removed if the document is approved.

In this test, our purpose is to compare the performance of different containerized networking acceleration models: user-space, eBPF, and Smart-NIC. The selected solutions for each model are VPP, AFXDP (both the OVS AFXDP PMD case and the AFXDP CNI plugin case), and SR-IOV, respectively. The test is set up as described below.

o Benchmarking physical servers' specifications

+-------------------+-------------------------+-------------------------+
|     Node Name     |    Specification        |      Description        |
+-------------------+-------------------------+-------------------------+
| Master Node       |- Intel(R) Xeon(R)       | Container Deployment    |
|                   |  Gold 5220R @ 2.4Ghz    |and Network Allocation   |
|                   |  (10 Cores)             |- Centos 7.7             |
|                   |- MEM 128GB              |- Kubernetes Master      |
|                   |- DISK 500GB             |- MULTUS CNI             |
|                   |- Control plane : 1G     |  Userspace CNI          |
|                   |                         |  Kubernetes SRIOV plugin|
|                   |                         |  Kubernetes AFXDP plugin|
+-------------------+-------------------------+-------------------------+
| Worker Node       |- Intel(R) Xeon(R)       | Container Service       |
|                   |  Gold 5220R @ 2.4Ghz    |- Ubuntu 22.04           |
|                   |  (80 Cores)             |  (18.04 for VPP test)   |
|                   |- MEM 256G               |- Kubernetes Worker      |
|                   |- DISK 2T                |- Layer 2 Forwarding     |
|                   |- Control plane : 1G     |  DPDK application       |
|                   |- Data plane : XL710-qda2|- MULTUS CNI             |
|                   |  (1NIC 2PORT- 40Gb)     |  Userspace CNI          |
|                   |                         |  Kubernetes SRIOV plugin|
|                   |                         |  Kubernetes AFXDP plugin|
+-------------------+-------------------------+-------------------------+
| Packet Generation |- Intel(R) Xeon(R)       | Packet Generator        |
| Node              |  Gold 6148 @ 2.4Ghz     |- CentOS 7.7             |
|                   |  (2Socket X 20Core)     |- installed Trex 2.4     |
|                   |- MEM 128G               |                         |
|                   |- DISK 2T                | Benchmarking Application|
|                   |- Control plane : 1G     |- T-Rex Non Drop Rate    |
|                   |- Data plane : XL710-qda2|                         |
|                   |  (1NIC 2PORT- 40Gb)     |                         |
+-------------------+-------------------------+-------------------------+
Figure 8: Test Environment-Server Specification

o Benchmarking general architecture


+-------------------------------------------------------------------+
|              Containerized Infrastructure Worker Node             |
| +--------------------------------------------------+              |
| |           POD - Multus CNI                       |              |
| |              (l2fwd)                             |              |
| |         +---------------+                        |              |
| |         |               |                        |              |
| |   +-----v------+    +---v--------+  +----------+ |              |
| |   | Userspace/ |    | Userspace/ |  | Flannel  | |              |
| |   | SRIOV/AFXDP|    | SRIOV/AFXDP|  |          | |              |
| |   |    eth1    |    |    eth2    |  |   eth0   | |              |
| |   +-----^------+    +----^-------+  +----------+ |              |
| +---------|----------------|-----------------------+              |
|           |                |                                      |
| +---------v----------------v-----------------------+              |
| |          Different Acceleration Options          |              |
| |       +-------------+      +-------------+       |              |
| |  +----| vhost/memif |------| vhost/memif |-----+ |              |
| |  |    +-------------+      +-------------+     | |              |
| |  |              OVS/VPP vSwitch                | |              |
| |  |                                             | |              |
| |  |    +-------------+      +-------------+     | |              |
| |  +----|  DPDK PMD   |------|  DPDK PMD   |-----+ |              |
| |       +-------------+      +-------------+       |   User Space |
+-|--------------------------------------------------|--------------+
| |                                                  |              |
| |                                                  | Kernel Space |
+-|--------------------------------------------------|--------------+
| |  +---------------------------------------------+ |              |
| |  | +-----------------+     +-----------------+ | |              |
| |  | | VF (SRIOV case) |     | VF (SRIOV case) | | |              |
| |  | +-----------------+     +-----------------+ | |              |
| |  | +-----------------+     +-----------------+ | |              |
| |  | | XDP (AFXDP case)|     | XDP (AFXDP case)| | |              |
| |  | +-----------------+     +-----------------+ | |              |
| |  +---------------------------------------------+ |              |
| +-------^----------------------^-------------------+              |
|         |                      |                       NIC Driver |
+---+ +---v----+            +----v---+ +----------------------------+
    | | PORT 0 |  40G NIC   | PORT 1 | |
    | +---^----+            +----^---+ |
    +-----|----------------------|-----+
    +-----|----------------------|-----+
+---| +---V----+            +----v---+ |----------------------------+
|   | | PORT 0 |  40G NIC   | PORT 1 | |   Packet Generator (Trex)  |
|   | +--------+            +--------+ |                            |
|   +----------------------------------+                            |
+-------------------------------------------------------------------+
Figure 9: Networking Model Test Architecture

Multus CNI is set up to enable attaching multiple network interfaces to a pod. The Flannel CNI is used for control-plane networking between the Kubernetes master and worker nodes. For the user-space networking model, the Userspace CNI is used for packet forwarding between the pod's interfaces and the VPP user-space vSwitch. For the eBPF networking model, the Kubernetes AFXDP plugin is used for the AFXDP CNI plugin case, and the Userspace CNI is used for the case of the OVS vSwitch with AFXDP PMD support. For the Smart-NIC networking model, the Kubernetes SR-IOV plugin is used for packet forwarding between the pod's interfaces and the NIC's SR-IOV virtual functions.

Detailed packet flows for each networking model can be found in Section 4.1.

A.2. Benchmarking Results

Figure 10 shows our zero packet loss throughput test results with different packet frame sizes, as specified in [RFC2544]. The results show different throughput performance for the different networking models. SR-IOV and eBPF using the AFXDP CNI have the best performance, followed by the VPP vSwitch and eBPF using OVS-AFXDP. The performance gap between the two eBPF model variations might be caused by the limited performance of the vhost-user interface of the OVS vSwitch, as mentioned in [GLOBECOM-21-benchmarking-kubernetes].

       +------------+---------------------------------------------------+
       |            |                 Model                             |
       | Frame Size +---------------------------------------------------+
       |            |  Userspace  |   eBPF    |   eBPF    |  Smart-NIC  |
       |            |    (VPP)    |(OVS-AFXDP)|(AFXDP CNI)|  (SR-IOV)   |
       +------------+-------------+-----------+-----------+-------------+
       |    64      |    7.25     |   1.64    |    4.32   |    10.48    |
       +------------+-------------+-----------+-----------+-------------+
       |    128     |    13.32    |   2.69    |    8.32   |    25.37    |
       +------------+-------------+-----------+-----------+-------------+
       |    256     |    19.26    |   3.54    |    14.47  |    30.38    |
       +------------+-------------+-----------+-----------+-------------+
       |    512     |    25.62    |   7.32    |    27.13  |    37.11    |
       +------------+-------------+-----------+-----------+-------------+
       |    1024    |    30.12    |   13.42   |    37.16  |    39.10    |
       +------------+-------------+-----------+-----------+-------------+
       |    1280    |    31.23    |   17.83   |    39.23  |    39.23    |
       +------------+-------------+-----------+-----------+-------------+
       |    1518    |    31.26    |   21.37   |    39.25  |    39.28    |
       +------------+-------------+-----------+-----------+-------------+
Figure 10: Different Networking Models Zero Packet Loss Throughput Test Results (Gbps)

Appendix B. Benchmarking Experience (Resources Configuration in Single Pod Scenario)

This appendix describes our IETF Hackathon proof-of-concept tests for the resource configuration benchmarking considerations. This appendix can be removed if the document is approved.

B.1. Benchmarking Environment

In this test, we evaluated different NUMA and CPU pinning configurations on the VPP user-space networking model. For the CPU pinning configuration test, we deployed a noisy-neighbor pod alongside the layer 2 forwarding C-VNF. The noisy neighbor is implemented using a CPU stress application. Both pods are managed by the CPU Manager for Kubernetes, a command-line program that enables CPU core pinning for container-based workloads. For the NUMA configuration test, we aligned the C-VNF, the vSwitch, and the NIC interface across two NUMA nodes. The different CPU pinning and NUMA alignment configuration scenarios are described below.

o Benchmarking physical servers' Specifications: Same as Appendix A

o Benchmarking Architecture

+---------------------------------------------------------------+
|                    Align Pod, vSwitch, NIC in                 |
|                    NUMA node 0 or NUMA node 1                 |
+---------------------------------------------------------------+
|           Containerized Infrastructure Worker Node            |
|        +--------------------------+       +-----------------+ |
|        |         Pod (l2fwd)*     |       | Noisy neighbor* | |
|        |    +-------------+       |       |                 | |
|        |    |             |       |       |                 | |
|        | +--v----+    +---v---+   |       |                 | |
|        | |  eth1 |    |  eth2 |   |       |    Stress-ng    | |
|        | +--^----+    +---^---+   |       |                 | |
|        +----|-------------|-------+       +-----------------+ |
|             |             |                                   |
|        +----v--+      +---v---+                               |
|   +----| memif |------| memif |------+                        |
|   |    +-------+      +-------+      |                        |
|   |            VPP vSwitch           |                        |
|   |                                  |                        |
|   |    +--------+     +-------+      |                        |
|   +----|  PMD   |-----|  PMD  |------+                        |
|        +--^-----+     +-----^-+                    User Space |
+-----------|-----------------|---------------------------------+
|           |                 |                                 |
|           |                 |                    Kernel Space |
+-----------|-----------------|---------------------------------+
|           |                 |                             NIC |
+-----+ +---v----+          +-v------+ +------------------------+
      | | PORT 0 |  40G NIC | PORT 1 | |
      | +---^----+          +-^------+ |
      +-----|-----------------|--------+
      +-----|-----------------|--------+
+-----| +---V----+          +-v------+ |------------------------+
|     | | PORT 0 |  40G NIC | PORT 1 | | Packet Generator (Trex)|
|     | +--------+          +--------+ |                        |
|     +--------------------------------+                        |
+---------------------------------------------------------------+

*- CPU Manager for Kubernetes configured
Figure 11: Resource Configuration Test Architecture in Single Pod scenario

o CPU Pinning Scenarios

Both the C-VNF pod and the noisy-neighbor pod are configured with three CMK modes: Disabled (no CPU pinning), Shared (both pods share the same assigned CPU cores), and Exclusive (dedicated CPU cores for each pod).

o NUMA Alignment Scenarios

        +------------+---------+---------+---------+
        |  Scenario  |   NIC   | vSwitch |   pod   |
        +------------+---------+---------+---------+
        |     s1     |  NUMA0  |  NUMA0  |  NUMA0  |
        +------------+---------+---------+---------+
        |     s2     |  NUMA0  |  NUMA0  |  NUMA1  |
        +------------+---------+---------+---------+
        |     s3     |  NUMA0  |  NUMA1  |  NUMA1  |
        +------------+---------+---------+---------+
        |     s4     |  NUMA0  |  NUMA1  |  NUMA0  |
        +------------+---------+---------+---------+
Figure 12: NUMA Alignment Scenarios in Single-Pod scenario

B.2. Benchmarking Results

For the CPU Pinning test, in shared mode, we assigned two CPUs for several PODs. In exclusive mode, we dedicated one CPU for one POD, independently. First, the test was conducted to figure out the line rate of the VPP switch, and the basic Kubernetes performance when CMK is disabled. After that, CMK-Shared mode and CMK-Exclusive mode were applied. During each CPU Pinning scenario test, 4 different NUMA alignment were also applied. The result is shown at Figure 13.

The test results confirm that CPU pinning can mitigate the effect of the noisy neighbor. Exclusive mode performed better than Shared mode because of its dedicated CPU core assignment. Regarding NUMA alignment, placing the C-VNF, the vSwitch, and the NIC interface on the same NUMA node optimizes network performance (scenario 1), while placing the vSwitch and the C-VNF on different NUMA nodes causes significant throughput degradation (scenarios 2 and 4).

       +--------------------+-------------------------------------------+
       |     CPU Pinning    |         NUMA Alignment Scenarios          |
       |                    +----------+----------+----------+----------+
       |      Scenarios     |    s1    |    s2    |    s3    |    s4    |
       +--------------------+----------+----------+----------+----------+
       |    Without CMK     |   4.78   |   2.34   |   4.39   |   2.41   |
       +--------------------+----------+----------+----------+----------+
       | CMK-Exclusive Mode |   15.63  |   7.67   |   14.33  |   7.84   |
       +--------------------+----------+----------+----------+----------+
       |  CMK-Shared Mode   |   11.16  |   5.47   |   10.23  |   5.52   |
       +--------------------+----------+----------+----------+----------+
Figure 13: Zero packet loss throughput (Gbps) of 1518-byte packets under different resource configurations in the single-pod scenario

Appendix C. Benchmarking Experience (Networking Model Combination and Resources Configuration in Multi-Pod Scenario)

This appendix describes our IETF Hackathon proof-of-concept tests for the networking model combination and resources configuration benchmarking considerations. This appendix can be removed if the document is approved.

C.1. Benchmarking Environment

The main goal of this experience was to benchmark the multi-pod scenario, in which packets traverse two pods. We conducted two experiments. First, we compared the networking performance of the model combination (SR-IOV-VPP) against VPP only. Second, we evaluated different NUMA alignment configurations in the multi-pod case. As there are two pods in this case, the NUMA alignment scenarios differ from the single-pod case. Because the CPU pinning scenarios are unchanged, we did not re-evaluate CPU pinning here. Figure 14 shows the benchmarking architecture of this test: two pods ran on the same host, the vSwitch delivered packets between the two pods, and SR-IOV VFs handled input/output packets of the worker node. For the VPP-only case, the VPP vSwitch handled all packet forwarding, as illustrated in the User-space Acceleration model section.
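
Before benchmarking the combined model, the SR-IOV side of the setup can be checked from standard sysfs entries on the worker node. The sketch below is illustrative only; the physical function name "ens801f0" is a placeholder for the 40G NIC port shown in Figure 14, and the entries exist only when the NIC driver has SR-IOV enabled.

   # Summarize the SR-IOV configuration of a physical function (PF):
   # number of VFs, their PCI addresses, and the PF's NUMA node.
   from pathlib import Path

   def sriov_summary(pf: str) -> dict:
       dev = Path(f"/sys/class/net/{pf}/device")
       vfs = {link.name: link.resolve().name for link in dev.glob("virtfn*")}
       return {
           "pf": pf,
           "numa_node": int((dev / "numa_node").read_text()),
           "configured_vfs": int((dev / "sriov_numvfs").read_text()),
           "vf_pci_addresses": vfs,
       }

   if __name__ == "__main__":
       print(sriov_summary("ens801f0"))  # placeholder PF name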

o Benchmarking physical server specifications: same as Appendix A

o Benchmarking Architecture

+---------------------------------------------------------------+
|                Align Pod1, Pod2, vSwitch, NIC in              |
|                    NUMA node 0 or NUMA node 1                 |
+---------------------------------------------------------------+
|             Containerized Infrastructure Worker Node          |
|  +--------------------------+    +--------------------------+ |
|  |      Pod1 (l2fwd)        |    |       Pod2 (l2fwd)       | |
|  |    +-------------+       |    |    +-------------+       | |
|  |    |             |       |    |    |             |       | |
|  | +--v----+    +---v---+   |    | +--v----+    +---v---+   | |
|  | |  eth1 |    |  eth2 |   |    | |  eth1 |    |  eth2 |   | |
|  | +--^----+    +---^---+   |    | +--^----+    +---^---+   | |
|  +----|-------------|-------+    +----|-------------|-------+ |
|       |             |                 |             |         |
|       |         +---v---+         +---v---+         |         |
|       |    +----| memif |---------| memif |---+     |         |
|       |    |    +-------+         +-------+   |     |         |
|       |    |            VPP vSwitch           |     |         |
|       |    +----------------------------------+     |    User |
|       |                                             |   Space |
+-------|---------------------------------------------|---------+
|       |                                             |   Kernel|
|       |                                             |   Space |
+-------|---------------------------------------------|---------+
|    +--v---+                                     +---v--+      |
|    | VF0  |             NIC Driver              | VF1  |      |
|    +--|---+                                     +---|--+      |
+-+ +---v----+                                   +----v---+ +---+
  | | PORT 0 |             40G NIC               | PORT 1 | |
  | +---^----+                                   +----^---+ |
  +-----|---------------------------------------------|-----+
  +-----|---------------------------------------------|-----+
+-| +---V----+                                   +----v---+ |---+
| | | PORT 0 |             40G NIC               | PORT 1 | |   |
| | +--------+                                   +--------+ |   |
| +---------------------------------------------------------+   |
|                Packet Generator (T-Rex)                       |
+---------------------------------------------------------------+
Figure 14: Multi-pod Benchmarking Scenario

o NUMA Alignment Scenarios

Based on the results from the single-pod case, aligning the pod and the vSwitch on different NUMA nodes can degrade performance. Hence, in the multi-pod case, we did not consider configurations in which the vSwitch and both pods are placed on different NUMA nodes.

     +------------+---------+---------+---------+---------+
     |  Scenario  |   NIC   | vSwitch |  pod1   |  pod2   |
     +------------+---------+---------+---------+---------+
     |     s1     |  NUMA0  |  NUMA0  |  NUMA0  |  NUMA0  |
     +------------+---------+---------+---------+---------+
     |     s2     |  NUMA0  |  NUMA0  |  NUMA0  |  NUMA1  |
     +------------+---------+---------+---------+---------+
     |     s3     |  NUMA0  |  NUMA0  |  NUMA1  |  NUMA0  |
     +------------+---------+---------+---------+---------+
     |     s4     |  NUMA0  |  NUMA1  |  NUMA1  |  NUMA1  |
     +------------+---------+---------+---------+---------+
     |     s5     |  NUMA0  |  NUMA1  |  NUMA1  |  NUMA0  |
     +------------+---------+---------+---------+---------+
     |     s6     |  NUMA0  |  NUMA1  |  NUMA0  |  NUMA1  |
     +------------+---------+---------+---------+---------+
Figure 15: NUMA Alignment Scenarios in Multi-Pod scenario

C.2. Benchmarking Results

o Networking Model Combination Performance

The results in Figure 16 confirm that combining the Smart-NIC model (SR-IOV) and the user-space acceleration model (VPP) can enhance network throughput. SR-IOV improves on VPP for north-south traffic because it forwards traffic directly between the NIC and the pod interface, while VPP is better for east-west traffic between pods because packet forwarding is handled entirely in user space.

       +------------+-------------------------+
       |            |          Model          |
       | Frame Size +-------------------------+
       |   (bytes)  |  Userspace  | Combined  |
       |            |    (VPP)    |(SRIOV-VPP)|
       +------------+-------------+-----------+
       |    64      |    7.23     |   9.62    |
       +------------+-------------+-----------+
       |    128     |    13.38    |   15.71   |
       +------------+-------------+-----------+
       |    256     |    19.23    |   23.91   |
       +------------+-------------+-----------+
       |    512     |    25.58    |   31.76   |
       +------------+-------------+-----------+
       |    1024    |    30.07    |   39.15   |
       +------------+-------------+-----------+
       |    1280    |    31.16    |   39.33   |
       +------------+-------------+-----------+
       |    1518    |    31.25    |   39.32   |
       +------------+-------------+-----------+
Figure 16: Networking Model Combination Zero Packet Loss Throughput Test Results (Gbps)

o Different NUMA Alignments Performance in Multi-pod scenario

The results in Figure 17 show that aligning both pods, the vSwitch, and the NIC to the same NUMA node optimizes network performance (scenario 1). Splitting the pods across different NUMA nodes can degrade performance (scenarios 2, 3, 5, and 6). In addition, aligning the vSwitch and the pod that forwards packets out of the worker node to the same NUMA node yields better performance (scenarios 3 and 6).

       +-------------+-----------------------------------------------+
       |             |             NUMA Alignment Scenarios          |
       |             +-------+-------+-------+-------+-------+-------+
       |             |  s1   |  s2   |  s3   |  s4   |  s5   |  s6   |
       +-------------+-------+-------+-------+-------+-------+-------+
       |  Throughput | 39.31 | 23.67 | 29.23 | 37.25 | 23.58 | 29.36 |
       +-------------+-------+-------+-------+-------+-------+-------+

Figure 17: Zero packet loss throughput (Gbps) of 1518-byte packets under different NUMA alignment scenarios in the multi-pod scenario

Appendix D. Change Log (to be removed by RFC Editor before publication)

D.1. Since draft-dcn-bmwg-containerized-infra-10

Updated the Benchmarking Experience appendixes with the latest results from IETF Hackathon events.

Re-organized the Benchmarking Experience appendixes to match the proposed benchmarking considerations inside the draft (Networking Models and Resources Configuration).

Minor enhancements to the Introduction and Resources Configuration consideration sections, such as a general description of the container network plugin and a note on which resource configurations can also be applied to VM-based VNFs.

D.2. Since draft-dcn-bmwg-containerized-infra-09

Removed Additional Deployment Scenarios (section 4.1 of version 09). We agreed with reviews from VinePerf that the performance difference between with-VM and without-VM scenarios is negligible.

Removed Additional Configuration Parameters (section 4.2 of version 09). We agreed with reviews from VinePerf that these parameters are explained in the Performance Impacts/Resources Configuration section.

Following VinePerf's suggestion to categorize the networking models based on how they accelerate network performance, renamed the titles of sections 4.3.1 and 4.3.2 of version 09 from Kernel-space vSwitch model and User-space vSwitch model to Kernel-space non-Acceleration model and User-space Acceleration model. Updated the corresponding explanation of the Kernel-space non-Acceleration model.

VinePerf suggested replacing the general architecture of the eBPF Acceleration model with three separate architectures for three different eBPF Acceleration models: non-AFXDP, using an AFXDP-supported CNI, and using a user-space vSwitch that supports the AFXDP PMD. Updated the corresponding explanation of the eBPF Acceleration model.

Renamed Performance Impacts section (section 4.4 of version 09) to Resources Configuration.

We agreed with VinePerf reviews to add the "CPU Cores and Memory Allocation" consideration to the Resources Configuration section.

D.3. Since draft-dcn-bmwg-containerized-infra-08

Added new Section 4. Benchmarking Considerations. Previous Section 4. Networking Models in Containerized Infrastructure was moved into this new Section 4 as a subsection

Re-organized the Additional Deployment Scenarios for containerized network benchmarking contents from Section 3. Containerized Infrastructure Overview to new Section 4. Benchmarking Considerations as the Additional Deployment Scenarios subsection

Added new Additional Configuration Parameters subsection to new Section 4. Benchmarking Considerations

Moved previous Section 5. Performance Impacts into new Section 4. Benchmarking Considerations as the Deployment settings impact on network performance section

Updated eBPF Acceleration Model with AFXDP deployment option

Enhanced the Abstract's and Introduction's descriptions of the draft's motivation and contribution.

D.4. Since draft-dcn-bmwg-containerized-infra-07

Added eBPF Acceleration Model in Section 4. Networking Models in Containerized Infrastructure

Added Model Combination in Section 4. Networking Models in Containerized Infrastructure

Added Service Function Chaining in Section 5. Performance Impacts

Added Troubleshooting and Results for SRIOV-DPDK Benchmarking Experience

D.5. Since draft-dcn-bmwg-containerized-infra-06

Added Benchmarking Experience of Multi-pod Test

D.6. Since draft-dcn-bmwg-containerized-infra-05

Removed Section 3. Benchmarking Considerations, Removed Section 4. Benchmarking Scenarios for the Containerized Infrastructure

Added new Section 3. Containerized Infrastructure Overview, Added new Section 4. Networking Models in Containerized Infrastructure. Added new Section 5. Performance Impacts

Re-organized Subsection Comparison with the VM-based Infrastructure of previous Section 3. Benchmarking Considerations and previous Section 4. Benchmarking Scenarios for the Containerized Infrastructure to new Section 3. Containerized Infrastructure Overview

Re-organized Subsection Container Networking Classification of previous Section 3. Benchmarking Considerations to new Section 4. Networking Models in Containerized Infrastructure. Kernel-space vSwitch models and User-space vSwitch models were presented as separate subsections in this new Section 4.

Re-organized Subsection Resource Considerations of previous Section 3. Benchmarking Considerations to new Section 5. Performance Impacts as two separate subsections, CPU Isolation / NUMA Affinity and Hugepages. Previous Section 5. Additional Considerations was moved into this new Section 5 as the Additional Considerations subsection.

Moved Benchmarking Experience contents to Appendix

D.7. Since draft-dcn-bmwg-containerized-infra-04

Added Benchmarking Experience of SRIOV-DPDK.

D.8. Since draft-dcn-bmwg-containerized-infra-03

Added Benchmarking Experience of Contiv-VPP.

D.9. Since draft-dcn-bmwg-containerized-infra-02

Editorial changes only.

D.10. Since draft-dcn-bmwg-containerized-infra-01

Editorial changes only.

D.11. Since draft-dcn-bmwg-containerized-infra-00

Added Container Networking Classification in Section 3. Benchmarking Considerations (Kernel Space network model and User Space network model).

Added Resource Considerations in Section 3. Benchmarking Considerations (Hugepage, NUMA, RX/TX Multiple-Queue).

Renamed Section 4. Test Scenarios to Benchmarking Scenarios for the Containerized Infrastructure, added 2 additional scenarios, BMP2VMP and VMP2VMP.

Added Additional Consideration as new Section 5.

Contributors

Kyoungjae Sun - ETRI - Republic of Korea

Email: kjsun@etri.re.kr

Hyunsik Yang - KT - Republic of Korea

Email: yangun@dcn.ssu.ac.kr

Acknowledgments

The authors would like to thank Al Morton for their valuable ideas and comments for this work.

Authors' Addresses

Tran Minh Ngoc
Soongsil University
369, Sangdo-ro, Dongjak-gu
Seoul
06978
Republic of Korea
Phone: +82 28200841
Sridhar Rao
The Linux Foundation
B801, Renaissance Temple Bells, Yeshwantpur
Bangalore 560022
India
Jangwon Lee
Soongsil University
369, Sangdo-ro, Dongjak-gu
Seoul
06978
Republic of Korea
Younghan Kim
Soongsil University
369, Sangdo-ro, Dongjak-gu
Seoul
06978
Republic of Korea