Internet-Draft | bier-rbs | October 2022 |
Eckert, et al. | Expires 27 April 2023 | [Page] |
This memo introduces a compact data-structure representation of multicast trees called "Recursive Bitstring Structure" (RBS) and its use for (stateless) forwarding of packets that include this structure in their header. It is intended as an alternative to "flat" bitstring addresses as used in BIER and BIER-TE or possible forwarding plane variations such as MSR6. RBS aims to improve performance and scalability over flat bitstrings and simplify operations.¶
Operations is simplified because RBS does not require the use, management and optimization of network-wide configured address spaces BIFT-ID and SI and because one common RBS mechanism can replace flat bitstring addresses for both shortest-path forwarding and tree engineered forwarding. It intends to improve performance by allowing multicast to sparse set of receivers in larger networks with fewer packets and it intends to improve scalability by requiring less BIFT state on routers.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 27 April 2023.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This memo introduces a compact data-structure representation of multicast trees called "Recursive Bitstring Structure" (RBS) and its use for (stateless) forwarding of packets that include this structure in their header. It is intended as an alternative to "flat" bitstring addresses in BIER and BIER-TE or their possible variations such as MSR6. RBS aims to improve performance and scalability over flat bitstrings and simplify operations.¶
Operations is simplified because RBS does not require the use, management and optimization of network-wide configured address spaces BIFT-ID and SI and because one common RBS mechanism can replace flat bitstring addresses for both shortest-path forwarding and tree engineered forwarding.¶
This document calls the bitstring addressing used today in BIER and BIER-TE "flat" solely as simple to remember distinction to the "recursive" bitstring addressing used by RBS.¶
The document is structured as follows:¶
The introduction reviews the aspect of BIER and BIER-TE that RBS intends to improve on to then give an introduction to RBS.¶
The architecture section describes the models how RBS can integrate into comprehensive forwarding architectures such as those defined by BIER/BIER-TE.¶
The Overview section explains RBS address encoding and forwarding based on an example¶
The Specification section defines normative requirements of RBS including forwarding Pseudocode.¶
The section on using RBS with BIER and RFC8296 encapsulation describes proposed normative aspects when applying RBS to BIER.¶
Appendices discuss High-speed implementation considerations and current insight into how well RBS can help to reducing the number of of packet required to be sent with RBS.¶
In BIER, each bit of a bitstring indicates a BIER egress router (BFER). When using [RFC8296] encapsulation, BitString can have a BitStringLength (BSL) of 2^N, N=6...12. Routers may not be able to support up to 2^12=496 long BitStrings. The most common BSL assumed to be supported is 256.¶
When a network has a number M of BFER, M >> BSL support by routers in the network, it it necessary to use multiple "sets" of BitStrings across the network to address all BFER. Each set has a SetIdentifier (SI). BFER are identified in BIER via their BFR-id which is (SI << N | 2^BP ), where BP is the BitPosition, the bit in the BitString used for this BFER: the lower N bits of the BFR-id are the BP of the BFER and the high order bits the SI. In [RFC8279] this is also shown as (SI:BitString), where the BitString has only the BP of the BFER set.¶
When a network requires k SI to address all BFER, then a message that needs to be sent to k arbitrary BFER in the network may require to send as many as k BIER packets - when each of the k BFER has a different SI. The total number of packets required for any possible set of receiver BFER is a stochastic matter. At best, BIER can reduce the number of packet required to reach M BFER to M/BSL.¶
Intelligent allocation of BFR-id can lead to a more efficient delivery of BIER traffic. Consider a network requiring k SI and random allocation of BFR-id so that every edge area of the network has at least one BFR in each of the k BFR. This makes it more likely that up to k BIER packets need to be sent into such an area to reach subsets of BFR in it. Compare this to an allocation that attempts to minimize the number of SI used by BFR in each individual area. This will result in fewer BIER packets required to reach subsets of BFR in such an area.¶
Whereas BIER relies on hop-by-hop routing to direct its traffic, Tree Engineering for BIER (BIER-TE, [RFC9262]) is based on explicit source routing by encoding the whole delivery tree in a BitString. This is done to support similar type of requirements as those that require explicit source routing in IP unicast, also called traffic steering, such as SRH in IPv6, but also multicast specific ones, such as lowest cost trees (so-called Steiner trees).¶
BIER-TE was designed to reuse the packet encodings of BIER and as much as feasible of the BIER forwarding plane. It therefore relies on the same type of flat BitStrings (and their addressing via SI) as BIER. In BIER-TE, each bit of a BitString indicates an adjacency. In the most simple case those adjacencies are subnet adjacent BFR for the edges of the multicast tree which are called forward_connected() in BIER-TE, and local_decap() adjacencies for the leaves of the tree - effectively its BFER.¶
Because BIER-TE needs to represent the whole delivery tree and not only its leaves/BFER in the BitString, intelligent and efficient allocation of BP is even more important than in BIER, and a significant number of BP in every SI must be allocated to transit hops of the network to allow defining BIER-TE trees across those transit hops. In large networks this may also lead to the need to allocate BP across multiple SI for the same transit hops and thus a much larger total number of BP required to represent a smaller number of BFER and transit hop adjacencies - and in result also more BIER-TE packets required for the same message to send to the larger number of different SI required.¶
One way to understand the Recursive BitString Structure address is to think of it as an evolution of BIER-TE flat bitstrings. Every single BFR processing a BIER-TE bitstring only needs to look at a small subset of the BP in it: those BP that indicate adjacencies of this BFR. All the other bits are ignored because they are adjacencies on other BFR.¶
Consider we decompose a BIER-TE BitString into separate smaller bitstrings - one each for every BFR on the multicast tree that we want to encode. The BitString for each BFR now only needs to have a BP for each subnet adjacent (neighbor) BFR. And an equivalent to the local_decap() BP to indicate to the BFR itself to be a leaf/BFER on the multicast tree itself.¶
With this step, RBS eliminates the complex optimization problems resulting from the flat BitStrings: There is no need anymore for a network wide SI address space and optimizing which adjacencies and BFR-id to put into which SI. There is no hard partitioning by SI: A tree can span an arbitrary subset of BFR. Just the total encoded size of the tree needs to fit into an appropriate maximum header field size. And if the target tree is too large, then it can arbitrarily be broken up into overlapping subtrees - all originating at the sender, but each only delivering to a subset of BFER small enough so the encoded tree fits into the target packet header size. And none of these optimization have to happen at network configuration time by seeding optimized BIFT, but it happens when building an RBS address on on ingress router or with the help of a controller.¶
The RBS encoding is called recursive, because it consists of such a local BitString for the first BFR of the tree (BFIR), followed by a sequence of RBS sub-trees, one for each adjacent BFR whose BP is set in the first BFR BitString. Whenever a packet is forwarded to such an adjacent BFR, the RBS addressing information is appropriately updated such that that BFR will only have to examine the local BitString for that BFR. In result, every BFR in RBS only has to examine - like in BIER and BIER-TE a single BitString. What really changes is that instead of clearing bits in a flat bitstring as done in BIER/BIER-TE, every hop advances the decoding offset into the RBS address structure to look at a different, small local BitString.¶
RBS may simply use the same network architecture as BIER-TE as shown in Figure 1, and operations of the Controller is significantly simplified because the complex allocation of BP across SI, especially the allocation of BP for transit adjacencies is eliminated.¶
Instead of a controller centric network architecture, RBS also lends itself to a distributed/decentralized model, similar to the typical deployment model of BIER, with extensions to a link-state IGP routing protocol.¶
Every BFR can automatically allocate its BFR neighbors and derive the size of its local BitString and allocation of BP to each neighbor from it. This is then signaled via appropriate link-state IGP extensions.¶
BFIR can derive not only the hop-by-hop paths towards BFER from this IGP information, but also the necessary local BitString for each BFR on a tree. In the most simple csae, these paths are the shorted-paths normally calculated by link-state IGP, but for traffic-engineering purposes, this can easily include all type of constrained path calculations.¶
It is this model that would be attractive, when there are no tree engineering / traffic engineering requirements, but RBS is simply used to replace flat bitstrings for BIER to simplify its operations and (depending on size / topology of network) improve its scale / performance.¶
To eliminate the need for an IP Multicast flow overlays and allow utilization of benefits of bitstring addressing at the application level (e.g.: eliminating group membership management for the network), the RBS domain may extend all the way into Sender/Receiver hosts. This is possible with BIER/BIER-TE as well, but when the total number of sender/receiver hosts is for example a factor 10 larger than the number of BFR in BIER, then the elimination of the network wide SI/BP allocation issue of BIER/BIER-TE could help to make this model easier deployable with RBS than with BIER-TE/BIER.¶
To avoid dependencies against initial operating system level network stack upgrades on hosts, such deployment option could for example be introduced by tunneling the RBS packets over UDP to first-hop BFIR/BFER as well as appropriate routing / controller plane extensions.¶
This section gives a more thorough run run-through of the life of a packet forwarded with an RBS address.¶
Figure 2 shows the example network topology. R1 has network connections to R2, R3, R4, R5 (not shown) and R6. R3 and R4 have connections to R1, R7, R8, R9 and R10. R9 has connections to R3, R4, and further, not shown routers. For the purpose of explaining RBS, it is irrelevant whether those connections are separate L2 point-to-point links or adjacencies on shared LANs.¶
The example multicast tree encoded as an RBS address utilizing topology information as explained in Section 2 is R1 -> (R2, R3 -> (R7), R4 -> (R9 -> (...), R10), R6): The packet originates in the trees root R1, which needs to form the appropriate packet header with this trees RBS address and replicate the packet to R2, R3, R4 and R6. R2, R4 and R6 should receive the packet as domain egress routers. R3 should only replicate the packet to R7, and R7 should replicate the packet to R9 and R10. R10 should only receive the packet, R9 should receive the packet and further replicate it to further routers not shown.¶
R7, R10 and some MSER behind R9. Given how R7, R8, R8, R10 and the router behind R9 can be reached via both either R3 and R4, this tree involves an explicit packet steering and replication (tree engineering) choice of using R3 instead of R4 to reach R7, and R4 instead of R3 to reach R9, R10 (and routers below R9).¶
Every router has an RBS "Bit Index Forwarding Table" (RBS-BIFT) that defines which BitPosition (BP) (1..N) indicates which adjacency. Figure 3, shows the example RBS-BIFT for R1.¶
The "receive" adjacency is the BP indicating that the packet is to be received by the router itself. The (R)ecursive flag indicates whether the adjacency when set in the BitString of an RBS address will have a subtree (Recursive Unit, see below) associated with it.¶
The absence of the R flag allows for more compact RBS encodings or adjacencies that for the purpose of RBS are not used for transit. In the example, R2, R5 and R6 are connected to R1 but also leaf router in the topology. Hence they have R=0 in the RBS-BIFT of R1.¶
The RBS address as shown in Figure 4 consists of RU-Length, RU-Offset and RecursiveUnit0. Depending on packet header encoding, these fields do not need to be encoded sequentially.¶
A RecursiveUnit (RU) is the unit of data processed by a particular router Rx on the multicast tree encoded by the RBS address. RU0 is the RU processed by the root of the tree. An RU consists of the BitString whose length is the length of the RBS-BIFT of Rx, followed by (N-1) AddressFields and N RUs. N is the number of BP set in BitString with R=1 flag set - e.g. which do need an RU in the RBS address. Each AddressField indicates the length of one RU. There are only N-1 AF for N RU because the length of the N'th RU can be derived by calculation, saving for every router on the tree one AF field, and therefore resulting in a more compact encoding.¶
RU-Offset indicates the offset of the current RU from the start of RU0. '$' in Figure 4 is the first bit of RU0, and a value of RU-Offset=0 indicates that the RU starts at '$' - and is therefore RU0.¶
For every copy of an RBS packet made by a router, RU-Offset and RU-Length are updated. None of the other fields of the RBS-Address are modified for RBS forwarding.¶
In the example, the root of the tree is is R1, so the BitString of RU0 is as shown in Figure 5¶
When RBS forwarding in a router processes the RBS address, the length of the BitString is known from the length of the RBS-BIFT. In the case of R1 it is therefore known to be 6 bits long.¶
Two of the BP set in the BitString, BP3 for R3 and for R4 have R=1 in the RBS-BIFT of R1, therefore (N-1)=1 AF field must follow and N=2 RU must follow in the RBS address for RU0 - one for R3, one for R4.¶
When R1 creates packet copies to R3 and R4, it will rewrite RU-Length and RU-Offset accordingly, so that RU-offset will point to RU1 for the packet towards R3 and to RU2 for the packet towards R4, and RU-Length indicates the length of RU1 or RU2.¶
This forwarding process repeats on every hop along the tree. When a packet copy is made on a BP with R=0, RU-Length is set to 0. When such a packet copy is received, it indicates that no further RU lookup is required, and the packet is only received - same as processing for a receive BU.¶
Any RBS router MUST support to parse its RU with AF entries that are 8 bit in size. Any RBS routers SHOULD support to decode a variable length AF encoding, where 0XXXXXXX (8-bit length AF field) is used to encode a 7-bit XXXXXXX (0..127) values, and where 1XXXXXXXXXXXX is used to encode an 12-bit value XXXXXXXXXXX. All values indicate the size of an RU in bits, therefore allowing up to 4096 bit long RU.¶
An RBS router MUST support processing the BitString size of its configured RBS-BIFT (see Section 4.2).¶
RBS routers MUST support RU-Length and RU-Offset encodings of 12 bits.¶
An Router must support for its RBS-BIFT to be configured with a number of entries ranging between min(k,1024), where k is an implementation specific number, no less than the number of physical interfaces on the router.¶
The leftmost bit in an RBS RU Bitstrings is RBS-BIFT entry 1.¶
The type of adjacencies to be supported depend on the encapsulation and are out of scope.¶
Upon creation of the RBS header with an RBS-Address, RU-Length MUST be set to the length of RU0 and RU-offset is set to 0.¶
Whether a header with an RBS address is created/imposed on the root of an RBS tree or received from another RBS router, encapsulation independent processing of the packet by RBS forwarding is the same.¶
Parsing RBS-Address, generating copies and rewriting RU-Length and RU-Offset for each copy is formally described in Section 4.7¶
When a packet copy is received with RU-Length=0, the packet is "received" - it is passed up the stack to an appropriate receiving entity based on the encapsulation parameters.¶
When a packet copy is made for a receive BP, its RU-Length is set to 0 and the packet is processed as if it was received with RU-Length=0.¶
The total length of an RBS address is not included in the definition of an RBS address here. This length is assumed to be provided by some other packet header field, because it is not required to parse an RBS address itself, but is only required to parse beyond an RBS address in a packet header by an RBS unaware parser. The field that carries RU0 may be larger (for example due to padding) than RU0 itself without problems for the RBS parsing/processing described here.¶
Additional forwarding rules may be established by specific encapsulations such as BIER OAM processing steps when using BIER with RFC8296 encapsulation.¶
Given how the processing of the RBS address describes a naturally loop-free rewrite operation, no further loop-prevention mechanism is required in packet processing with RBS addresses, but no harm is done if this is still performed (on appropriate header TTL fields independent of RBS).¶
The following RBS forwarding (derived from C language) pseudocode assumes all pointers (and de-referencing them) are using bit-accurate addresses so that of calculation of starting bit addresses of address fields and RU in RU0 can be shown with as simple code as if byte addressing for pointers was used. byte addressing of pointers was used. This is NOT supported by C language!¶
ForwardRBS(Packet) processes the RBS address independent of its encapsulation. ParseRBSAddress(Packet) parses the header of Packet to create a list of bit-accurate pointers to the elements of an RBS address: RBS.{RULength,RUOffset,RU0}.¶
BitStringA is the address of the RBS address BitString in Packet. Other variables use names matching those from the packet header figures (without " -_").¶
The BFR local BIFT has a total number of BIFT.entries addressable BP 1...BIFTentries. The BitString therefore has BIFT.entries bits.¶
BIFT.RecursiveBits is a BitString pre-filled by the control plane with all the BP with the recursive flag set. This is constructed from the Recursive flag setting of the BP of the BIFT. The code starting at [1] therefore counts the number of recursive BP in the packets BitString.¶
Because the AddressingField does not have an entry for the last (potentially only) RecursiveUnit, its length has to be calculated By subtracting the length of the prior N-1 RecursiveUnits from RULength as received. This is done via variable RemainLength.¶
For every PacketCopy that is to be forwarded, the RU-Length and RU-Offset fields are updated. SendTo(packet,adjacency) takes care of any encapsulation/architecture specific further rewrites of the packet headers based on the adjcency of the BP.¶
RBS can be used in a BIER domain by introducing as a per-subdomain mode of forwarding, exactly the same as [RFC8279] (non-TE) BIER and [RFC9262] BIER-TE can co-exist in a BIER.¶
In BIER deployments, RBS can most easily replace BIER-TE, and using a centralized controller and therefore simplify and easier scale deployment of tree engineering. RBS should also be able to replace BIER in networks with link-state routing protocols and reduce the number of replicated packets in large networks. This requires as aforementioned the change from hop-by-hop routing to source-routing.¶
When using BIER, RBS routers are BFR, RBS ingress routers are BFIR, RBS egress routers are BFER. Routers may support multiple RBS-BIFT through different BIFT-ID or SI. This may be useful when specific constructs such as slices of the network are only allowed to use a subset of the adjacencies of the network.¶
The RBS address is encoded as a binary string concatenating {RULenth,RUOffset,RU0} into the BitString field in [RFC8296] packet headers. Without changes to [RFC8296], the length of this field has to be a power of 2 sized. The RBS address SHOULD be zero-padded to the size used.¶
In BIER, the BitStringLength (BSL) expects to indicate different BIFT. When using RBS addresses, it SHOULD be possible for all supported BSL to refer to the same RBS-BIFT, so that upon imposition of an RBS-Address the smallest power of 2 BitString size can be used without duplication of BIFT state on routers.¶
SendTo() of RBS forwarding pseudocode Section 4.7 needs to take care of any BIER specific rewrites of the [RFC8296] BIER header, specifically the BIFT-id derived from the RBS-BIFT adjacency.¶
TBD: This description does not outline, how much of existing BIER IGP extensions could be reused with RBS and how.¶
Solutions for stateless IP multicast have been proposed to the IETF under the name Multicast Source Routing for IPv6 (MSR6). They are based on putting the stateless multicast structures into an IPv6 routing extension headers and using per-steering-hop rules according to or derived from [RFC8200], Section 4.4 rules. IPv6 forwarding") for source routing.¶
SendTo() of RBS forwarding pseudocode Section 4.7 needs to take care of any IPv6 specific rewrites of the IPv6 header (IPv6 Destination address) and IPv6 routing extension header.¶
For complete support of IPv6 multicast with [RFC8200] compliant source routing, it is necessary for the IPv6 routing extension header to not only carry the RBS address information but also an IPv6 multicast destination address field.¶
[I-D.eckert-msr6-rbs] is a proposed IPv6 extension header design for MSR6 using RBS that is supporting IPv6 multicast.¶
This work is based on the design published by Sheng Jiang, Xu Bing, Yan Shen, Meng Rui, Wan Junjie and Wang Chuang {jiangsheng|bing.xu|yanshen|mengrui|wanjunjie2|wangchuang}@huawei.com, see [CGM2Design]. Many thanks for Bing Xu (bing.xu@huawei.com) for editorial work on the prior variation of this work [I-D.xu-msr6-rbs].¶
[RFC-editor: please remove]¶
This document is written in https://github.com/cabo/kramdown-rfc2629 markup language. This documents source is maintained at https://github.com/toerless/multicast-rbs, please provide feedback to the bier@ietf.org and/or msr6@ietf.org mailing list and submit an Issue to the GitHub.¶
This draft is derived from and supercedes [I-D.eckert-bier-cgm2-rbs] as follows:¶
RBS was designed with high-speed, low-cost forwarding hardware and possible backward compatibility with potentially existing flat-bitstring look-up and replication hardware in mind.¶
Because RBS requires to only perform replication on each router on a single bitstring, it could be possible to reuse existing bitstring replication hardware, or design future hardware such that it supports BIER, BIER-TE and RBS bitstrings.¶
The calculations required to process an RBS header are the added complexity of processing RBS packets are the additional new cost of RBS. It has to be seen whether / how these are feasible at the low-end of high-speed forwarding plane hardware, especially P4. Further optimizations to reduce calculations are possible, but at the expense of compression of the RBS address.¶
RBS also minimizes write-cycles to packet memory by only requiring per-packet-copy rewrites of the RU-Length and RU-Offset fields. With mandatory encoding of 12 bits each, these are 24 bits to rewrite and should therefore be causing minimal cost with today's high-speed forwarding hardware.¶
TBD: Need to rewrite more elaborate multi-hop example from [I-D.eckert-bier-cgm2-rbs] with RU-Offset, RU-Length.¶
This section discusses in more detail the number of packets required to reach a set of receivers when using flat bitstrings vs. RBS addresses. The first sub-section gives a hopefully simple to understand theoretical topology example, the second sub-section presents initial results of a real-world, large-network simulation.¶
If the total size of an RBS encoded delivery tree is larger than a supported maximum RBS header size, then the controller simply needs to divide the tree into multiple subtrees, each only addressing a part of the BFER (leaves) of the target tree and pruning any unnecessary branches.¶
Consider the simple topology in Figure 7 and a multicast packet that needs to reach all BFER B7...B300. Assume that the desired maximum RBM header size is such that a RBS address size of <= 256 bits is desired. The controller could create an RBS address B1=>B2=>B4=>(B7..B99), for a first packet, an RBS address B1=>B3=>B5=>(B100..B200) for a second packet and a third RBS address B1=>B3=>B6=>B201...B300.¶
The elimination of larger BIFT state in BFR through multiple SI in BIER/BIER-TE does come at the expense of replicating initial hops of a tree in RBS addresses, such as in the example the encoding of B1=>B3 in the example.¶
Consider that the assignment of BFIR-ids with BIER in the above example is not carefully engineered. It is then easily possible that the BFR-ids for B7..B99 are not sequentially, but split over a larger BFIR-id space. If the same is true for all BFER, then it is possible that each of the three BFR B4,B5 and B6 has attached BFER from three different SI and one may need to send for example three multiple packets to B7 to address all BFER B7..B99 or to B5 to address all B100..B200 or B6 to address all B201...B300. These unnecessary duplicate packets across B4, B5 or B6 are because of the addressing principle in BIER and are not necessary in RBS, as long as the total length of an RBS address does not require it.¶
TBD: Comparison of number of packets/header sizes required in large real-world operator topology between BIER/BIER-TE and RBS. Analysis: Gain in dense topology¶
Topology description: 1. Typical topology of Beijing Mobile in China. 2. All zones dual homing access to backbone. 3. Core layer: 4 nodes full mesh connected 4. Aggregation layer: 8 nodes are divided into two layers, with full interconnection between the layers and dual homing access to the core layer on the upper layer. 5. Aggregation rings: 8 rings, 6 nodes per ring 6. Access rings: 3600 nodes, 18 nodes per ring¶
Comparison notes:¶
Results are shown in the following image: https://user-images.githubusercontent.com/92767820/153332926-defe38e4-1b63-4b16-852f-feaae487d307.png¶
Conclusions:¶