Internet-Draft | mvpn-evpn-aggregation-label | October 2023 |
Zhang, et al. | Expires 6 April 2024 |
The MVPN specifications allow a single Point-to-Multipoint (P2MP) tunnel to carry traffic of multiple IP VPNs (abbreviated as VPNs). The EVPN specifications allow a single P2MP tunnel to carry traffic of multiple Broadcast Domains (BDs). These features require the ingress router of the P2MP tunnel to allocate an upstream-assigned MPLS label for each VPN or for each BD. A packet sent on a P2MP tunnel then carries the label that is mapped to its VPN or BD. (In some cases, a distinct upstream-assigned label is needed for each flow.) Since each ingress router allocates labels independently, with no coordination among the ingress routers, the egress routers may need to keep track of a large number of labels. The number of labels may need to be as large as (or larger than) the product of the number of ingress routers and the number of VPNs or BDs. However, the number of labels can be greatly reduced if the association between a label and a VPN or BD is made by provisioning, so that all ingress routers assign the same label to a particular VPN or BD. New procedures are needed in order to take advantage of such provisioned labels. These new procedures also apply to Multipoint-to-Multipoint (MP2MP) tunnels. This document updates RFCs 6514, 7432, and 7582 by specifying the necessary procedures.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 6 April 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Familiarity with MVPN/EVPN protocols and procedures is assumed. Some terms are listed below for convenience.¶
MVPN can use P2MP tunnels (set up by RSVP-TE, mLDP, or PIM) to transport customer multicast traffic across a service provider's backbone network. Often, a given P2MP tunnel carries the traffic of only a single VPN. However, procedures have been defined that allow a single P2MP tunnel to carry traffic of multiple VPNs. In this case, the P2MP tunnel is called an "aggregate tunnel". The PE router that is the ingress node of an aggregate P2MP tunnel allocates an "upstream-assigned MPLS label" [RFC5331] for each VPN, and each packet sent on the P2MP tunnel carries the upstream-assigned MPLS label that the ingress PE has bound to the packet's VPN.¶
Similarly, EVPN can use P2MP tunnels (set up by RSVP-TE, mLDP, or PIM) to transport BUM traffic (Broadcast traffic, Unicast traffic with an Unknown address, or Multicast traffic) across the provider network. Often, a P2MP tunnel carries the traffic of only a single BD. However, there are procedures defined that allow a single P2MP tunnel to be an "aggregate tunnel" that carries traffic of multiple BDs. The procedures are analogous to the MVPN procedures -- the PE router that is the ingress node of an aggregate P2MP tunnel allocates an upstream-assigned MPLS label for each BD, and each packet sent on the P2MP tunnel carries the upstream-assigned MPLS label that the ingress PE has bound to the packet's BD.¶
MVPN and EVPN can also use BIER [RFC8279] to transmit VPN multicast traffic or EVPN BUM traffic [RFC8556] [I-D.ietf-bier-evpn]. Although BIER does not explicitly set up P2MP tunnels, from the perspective of MVPN/EVPN, the use of BIER transport is very similar to the use of aggregate P2MP tunnels. When BIER is used, the PE transmitting a packet (the "BFIR" [RFC8279]) must allocate an upstream-assigned MPLS label for each VPN or BD, and the packets transmitted using BIER transport always carry the label that identifies their VPN or BD. (See [RFC8556] and [I-D.ietf-bier-evpn] for the details.) In the remainder of this document, we will use the term "aggregate tunnels" to include both P2MP tunnels and BIER transport.¶
When an egress PE receives a packet from an aggregate tunnel, it must look at the upstream-assigned label carried by the packet, and must interpret that label in the context of the ingress PE. Essentially, for each ingress PE, the egress PE has a context-specific label space [RFC5331] that matches the default label space from which the ingress PE assigns the upstream-assigned labels. When an egress PE looks up the upstream-assigned label carried by a given packet, it looks it up in the context-specific label space for the ingress PE of the packet. How an egress PE identifies the ingress PE of a given packet depends on the tunnel type.¶
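The following Python sketch models this per-ingress interpretation on an egress PE. It assumes that the egress PE has already identified the ingress PE of the packet (which, as noted above, depends on the tunnel type); the PE names, labels, and function names are hypothetical and only illustrate that the same label value can mean different things in different context-specific label spaces.¶

```python
from typing import Dict

# Hypothetical egress-PE state: ingress PE -> {upstream-assigned label: VPN/BD}.
ContextTables = Dict[str, Dict[int, str]]


def install_upstream_label(tables: ContextTables, ingress_pe: str,
                           label: int, service: str) -> None:
    """Record the label that ingress_pe has bound to one of its VPNs/BDs."""
    tables.setdefault(ingress_pe, {})[label] = service


def lookup_upstream_label(tables: ContextTables, ingress_pe: str, label: int) -> str:
    """Interpret label in the context-specific label space of ingress_pe.

    Identifying ingress_pe is tunnel-type specific and is assumed to have
    been done by the caller.
    """
    try:
        return tables[ingress_pe][label]
    except KeyError:
        raise LookupError(f"no binding for label {label} from {ingress_pe}") from None


tables: ContextTables = {}
install_upstream_label(tables, "PE1", 100, "VPN-A")
install_upstream_label(tables, "PE2", 100, "VPN-B")  # same value, different label space
print(lookup_upstream_label(tables, "PE1", 100))  # VPN-A
print(lookup_upstream_label(tables, "PE2", 100))  # VPN-B
```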
Note that the upstream-assigned label procedures may require a very large number of labels. Suppose an MVPN or EVPN deployment has 1001 PEs, each hosting 1000 VPN/BDs. Each ingress PE has to assign 1000 labels, and each egress PE has to be prepared to interpret 1000 labels from each of the ingress PEs. Since each ingress PE allocates labels from its own label space and does not coordinate label assignments with others, each egress PE must be prepared to interpret 1,000,000 upstream-assigned labels (across 1000 context-specific label spaces - one for each ingress PE). This is an evident scaling problem.¶
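The arithmetic of this example can be written out as a small Python sketch. It assumes, as the example does, that every PE hosts every VPN/BD, so each of the other PEs acts as an ingress PE with its own label for each VPN/BD; the function names are hypothetical. For comparison, the second function gives the count if all PEs shared one label per VPN/BD.¶

```python
def upstream_assigned_labels_per_egress(num_pes: int, services: int) -> int:
    """Upstream-assigned labels one egress PE must be able to interpret.

    Each of the other num_pes - 1 PEs independently binds its own label to
    each of the `services` VPNs/BDs, giving num_pes - 1 context-specific
    label spaces with `services` entries each.
    """
    return (num_pes - 1) * services


def common_labels_per_egress(services: int) -> int:
    """Same quantity if every PE used one shared label per VPN/BD."""
    return services


print(upstream_assigned_labels_per_egress(1001, 1000))  # 1000000
print(common_labels_per_egress(1000))                    # 1000
```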
So far, few if any MVPN/EVPN deployments use aggregate tunnels, so this problem has not surfaced. However, the use of aggregate tunnels is likely to increase due to the following two factors:¶
A similar problem also exists with EVPN ESI labels used for multi-homing. A PE attached to a multi-homed Ethernet Segment (ES) advertises an ESI label in its Ethernet A-D per Ethernet Segment route. The PE imposes the label when it sends frames received from the ES to other PEs via a P2MP/BIER tunnel. A receiving PE that is attached to the source ES will know from the ESI label that the packet originated on the source ES, and thus will not transmit the packet on its local attachment circuit to that ES. From the receiving PE's point of view, the ESI label is (upstream-)assigned from the source PE's label space, so the receiving PE needs to maintain context-specific label tables, one for each source PE, just like the VPN/BD label case above. If there are 1,001 PEs, each attached to 1,000 ESes, this can require each PE to understand 1,000,000 ESI labels. Notice that the issue exists even when no P2MP tunnel aggregation (i.e., one tunnel used for multiple BDs) is used.¶
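The split-horizon check described above can be sketched in Python as follows. This is a minimal model under stated assumptions: the receiving PE keeps one ESI-label table per source PE (the upstream-assigned case), and the class, method, PE, and ES names are hypothetical. Note that the same label value from a different source PE is looked up in a different table.¶

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ReceivingPE:
    """Hypothetical receiving-PE state for upstream-assigned ESI labels."""
    local_ess: Set[str]
    # source PE -> {ESI label advertised by that PE: ES it identifies}
    esi_tables: Dict[str, Dict[int, str]] = field(default_factory=dict)

    def learn_esi_label(self, source_pe: str, esi_label: int, es: str) -> None:
        """Record the ESI label from source_pe's Ethernet A-D per ES route."""
        self.esi_tables.setdefault(source_pe, {})[esi_label] = es

    def may_send_on(self, source_pe: str, esi_label: Optional[int], local_es: str) -> bool:
        """Split-horizon check before sending a BUM frame on the AC to local_es."""
        if local_es not in self.local_ess:
            raise ValueError(f"{local_es} is not attached to this PE")
        if esi_label is None:
            return True  # the frame carries no ESI label
        origin_es = self.esi_tables.get(source_pe, {}).get(esi_label)
        return origin_es != local_es


pe = ReceivingPE(local_ess={"ES1", "ES2"})
pe.learn_esi_label("PE1", 200, "ES1")
print(pe.may_send_on("PE1", 200, "ES1"))  # False: frame came from ES1
print(pe.may_send_on("PE2", 200, "ES1"))  # True: label 200 from PE2 is a different binding
```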
The number of labels could be greatly reduced if a central entity in the provider network assigned a label to each VPN, BD, or ES, and if all PEs used that same label to represent a given VPN, BD, or ES. Then the number of labels needed would just be the sum of the numbers of VPNs, BDs, and/or ESes.¶
One method of achieving this is to reserve a portion of the default label space for assignment by a central entity. We refer to this reserved portion as the "Domain-wide Common Block" (DCB) of labels. This is analogous to using an identical "Segment Routing Global Block" (SRGB) on all nodes, as described in [RFC8402]. A PE that is attached to VPNs, BDs, or ESes (via L3VPN VRF interfaces or EVPN Access Circuits) would know by provisioning which label from the DCB corresponds to which of its locally attached VPNs, BDs, or ESes.¶
For example, all PEs could reserve the DCB [1000, 2000] and all be provisioned so that label 1000 maps to VPN 0, label 1001 to VPN 1, and so forth. Then only 1000 labels, instead of 1,000,000, are needed for 1000 VPNs.¶
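The provisioned mapping in this example can be sketched in Python as below. The rule "VPN n uses label 1000 + n" is the example's own; the function names are hypothetical. Because every PE derives the same table, an egress PE needs a single entry per VPN no matter which ingress PE sent the packet.¶

```python
DCB_FIRST, DCB_LAST = 1000, 2000  # the example DCB [1000, 2000]


def dcb_label_for_vpn(vpn: int) -> int:
    """Provisioned rule from the example: VPN n uses label 1000 + n."""
    label = DCB_FIRST + vpn
    if vpn < 0 or label > DCB_LAST:
        raise ValueError(f"VPN {vpn} has no label in the DCB")
    return label


def vpn_for_dcb_label(label: int) -> int:
    """Inverse mapping, as an egress PE would use in its default label table."""
    if not DCB_FIRST <= label <= DCB_LAST:
        raise ValueError(f"label {label} is outside the DCB")
    return label - DCB_FIRST


# Every PE derives the same table: one entry per VPN, shared by all ingress PEs.
table = {dcb_label_for_vpn(n): n for n in range(1000)}
print(len(table))               # 1000
print(vpn_for_dcb_label(1001))  # 1
```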
The definition of "domain" is loose - it simply includes all the routers that share the same DCB. In this document, it only needs to include all PEs of an MVPN/EVPN network.¶
The "domain" could also include all routers in the provider network, making it not much different from a common SRGB across all the routers. However, that is not necessary as the labels used by PEs for the purposes defined in this document will only rise to the top of the label stack when traffic arrives at the PEs. Therefore, it is better to not include internal P routers in the "domain". That way they do not have to set aside the same DCB used for the purposes in this document.¶
In some deployments, it may be impractical to allocate a DCB that is large enough to contain labels for all the VPNs/BDs/ESes. In this case, it may be necessary to allocate those labels from one or a few separate context-specific label spaces that are not associated with any particular PE. For example, if it is too difficult to have a DCB of 10,000 labels across all PEs for all the VPNs/BDs/ESes that need to be supported, a separate context-specific label space can be dedicated to those 10,000 labels. Each separate context-specific label space is identified in the forwarding plane by a label from the DCB (which does not need to be large). Each PE is provisioned with the label-space-identifying DCB label and the common VPN/BD/ES labels allocated from that context-specific label space. When sending traffic, an ingress PE imposes all necessary service labels (for the VPN/BD/ES) first, then imposes the label-space-identifying DCB label. From the label-space-identifying DCB label, an egress PE can determine the label space in which the inner VPN/BD/ES label is looked up.¶
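The imposition order and the resulting two-step lookup can be sketched in Python as follows. Label stacks are written top-of-stack first, the transport tunnel encapsulation is not modeled, and all labels, table names, and function names are hypothetical (1001 reuses the earlier DCB example, 1500 is an arbitrary DCB label chosen to identify a context-specific label space).¶

```python
from typing import Dict, List, Tuple

# Hypothetical egress-PE state; labels and names are illustrative only.
# Default MPLS table: a DCB label either identifies a service directly
# ("service") or identifies a context-specific label space ("context").
DEFAULT_TABLE: Dict[int, Tuple[str, str]] = {
    1001: ("service", "VPN 1"),
    1500: ("context", "CTX-1"),
}
# Context-specific label spaces, keyed by the name used in DEFAULT_TABLE.
CONTEXT_TABLES: Dict[str, Dict[int, str]] = {
    "CTX-1": {7: "BD 7", 8: "BD 8"},
}


def impose(service_labels: List[int], space_label: int) -> List[int]:
    """Ingress PE: service labels first, then the label-space-identifying DCB label."""
    return [space_label] + service_labels


def lookup(stack: List[int]) -> str:
    """Egress PE: look up the top label in the default table and, if it
    identifies a context-specific label space, the next label in that space."""
    kind, value = DEFAULT_TABLE[stack[0]]
    if kind == "service":
        return value
    return CONTEXT_TABLES[value][stack[1]]


print(lookup(impose([7], 1500)))  # BD 7
print(lookup([1001]))             # VPN 1
```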
The MVPN/EVPN signaling defined in [RFC6514] and [RFC7432] assumes that certain MPLS labels are allocated from a context-specific label space for a particular ingress PE. In this document, we augment the signaling procedures so that it is possible to signal that a particular label is from the DCB, rather than from a context-specific label space for an ingress PE. We also augment the signaling so that it is possible to indicate that a particular label is from an identified context-specific label space that is not for an ingress PE.¶
Notice that the VPN/BD/ES-identifying labels from the DCB or from those few context-specific label spaces are very similar to VNIs in VXLAN. Allocating a label from the DCB or from a context-specific label space and communicating it to all PEs is no different from allocating VNIs, and is feasible, especially with controllers.¶
MP2MP tunnels present the same problem (Section 2.1) that can be solved the same way (Section 2.2), with the following additional requirement.¶
Per RFC 7582 ("MVPN: Using Bidirectional P-tunnels"), when MP2MP tunnels are used for MVPN, the root of the MP2MP tunnel may need to allocate and advertise "PE Distinguisher Labels" (Section 4 of [RFC6513]). These labels are assigned from the label space used by the root node for its upstream-assigned labels.¶
It is REQUIRED by this document that the PE Distinguisher labels allocated by a particular node come from the same label space that the node uses to allocate its VPN-identifying labels.¶
There are some additional issues to be considered when MVPN or EVPN is using "tunnel segmentation" (see [RFC6514], [RFC7524], and [I-D.ietf-bess-evpn-bum-procedure-updates] Sections 5 and 6).¶
For "selective tunnels" that instantiate S-PMSIs (see [RFC6513] Sections 2.1.1 and 3.2.1, and [I-D.ietf-bess-evpn-bum-procedure-updates] Section 4), the procedures outlined above work only if tunnel segmentation is not used.¶
A selective tunnel carries one or more particular sets of flows to a particular subset of the PEs that attach to a given VPN or BD. Each set of flows is identified by a Selective PMSI A-D route [RFC6514]. The PMSI Tunnel Attribute (PTA) of the S-PMSI route identifies the tunnel used to carry the corresponding set of flows. Multiple S-PMSI routes can identify the same tunnel.¶
When tunnel segmentation is applied to an S-PMSI, certain nodes are "segmentation points". A segmentation point is a node at the boundary between two "segmentation regions". Let's call these "region A" and "region B". A segmentation point is an egress node for one or more selective tunnels in region A, and an ingress node for one or more selective tunnels in region B. A given segmentation point must be able to receive traffic on a selective tunnel from region A, and label switch the traffic to the proper selective tunnel in region B.¶
Suppose one selective tunnel (call it T1) in region A is carrying two flows, Flow-1 and Flow-2, identified by S-PMSI routes Route-1 and Route-2, respectively. It is possible that, in region B, Flow-1 is not carried by the same selective tunnel that carries Flow-2. Let's suppose that in region B, Flow-1 is carried by tunnel T2 and Flow-2 by tunnel T3. Then, when the segmentation point receives traffic from T1, it must be able to label switch Flow-1 from T1 to T2, while also label switching Flow-2 from T1 to T3. This implies that Route-1 and Route-2 must signal different labels in the PTA. By comparison, when segmentation is not used, both routes can use the common per-VPN/BD DCB label.¶
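The reasoning above can be sketched in Python as a segmentation point's switching table keyed by (incoming tunnel, PTA label). This is a simplified model: it assumes the label is carried unchanged into region B (that is, neither segment uses Ingress Replication), and the labels and function name are hypothetical. With distinct per-route labels the table is unambiguous; with one shared label, the two downstream tunnels collide.¶

```python
from typing import Dict, Iterable, Tuple

# (tunnel in region A, PTA label of the S-PMSI route, tunnel in region B)
Entry = Tuple[str, int, str]


def build_switch_table(entries: Iterable[Entry]) -> Dict[Tuple[str, int], str]:
    """Build a segmentation point's (incoming tunnel, label) -> outgoing tunnel map."""
    table: Dict[Tuple[str, int], str] = {}
    for in_tunnel, label, out_tunnel in entries:
        key = (in_tunnel, label)
        if table.get(key, out_tunnel) != out_tunnel:
            raise ValueError(
                f"label {label} on {in_tunnel} would need to reach "
                f"both {table[key]} and {out_tunnel}")
        table[key] = out_tunnel
    return table


# Route-1 and Route-2 signal different labels: Flow-1 -> T2, Flow-2 -> T3.
per_route = build_switch_table([("T1", 201, "T2"), ("T1", 202, "T3")])
print(per_route[("T1", 201)], per_route[("T1", 202)])  # T2 T3

# A single shared per-VPN/BD label cannot steer the two flows apart.
try:
    build_switch_table([("T1", 1001, "T2"), ("T1", 1001, "T3")])
except ValueError as err:
    print(err)
```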
In this case, it is not practical to have a central entity assign domain-wide unique labels to individual S-PMSI routes. To address this problem, all PEs can be assigned disjoint label blocks in those few context-specific label spaces, and each PE independently allocates labels for its segmented S-PMSI routes from its own assigned block, which does not overlap with any other PE's block. For example, PE1 allocates from label block [101~200], PE2 allocates from label block [201~300], and so on.¶
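A minimal Python sketch of this block assignment, reproducing the example blocks [101~200] and [201~300], follows. It assumes equally sized, consecutive blocks and a simple allocate-in-order policy with no label release; the class and function names are hypothetical.¶

```python
from typing import Dict, List


class LabelBlock:
    """One PE's block [first, last] within a shared context-specific label space."""

    def __init__(self, first: int, last: int) -> None:
        if first > last:
            raise ValueError("empty block")
        self.first, self.last = first, last
        self._next = first

    def allocate(self) -> int:
        """Hand out the next unused label, e.g. for a segmented S-PMSI route."""
        if self._next > self.last:
            raise RuntimeError(f"block [{self.first}~{self.last}] exhausted")
        label = self._next
        self._next += 1
        return label


def assign_blocks(pes: List[str], start: int, size: int) -> Dict[str, LabelBlock]:
    """Give each PE a block of `size` labels that overlaps no other PE's block."""
    return {pe: LabelBlock(start + i * size, start + (i + 1) * size - 1)
            for i, pe in enumerate(pes)}


blocks = assign_blocks(["PE1", "PE2", "PE3"], start=101, size=100)
print([(b.first, b.last) for b in blocks.values()])  # [(101, 200), (201, 300), (301, 400)]
print(blocks["PE1"].allocate(), blocks["PE2"].allocate())  # 101 201
```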
Disjoint label blocks can be used for VPN/BD/ES labels as well, though this does not address the original scaling issue, because there would be one million labels allocated from those few context label spaces in the original example, instead of just one thousand common labels.¶
Similarly, for segmented per-PE (MVPN (C-*,C-*) S-PMSI or EVPN IMET) or per-AS/region (MVPN Inter-AS I-PMSI or EVPN per-Region I-PMSI) tunnels ([RFC6514] [RFC7432] [I-D.ietf-bess-evpn-bum-procedure-updates]), labels need to be allocated per PMSI route. In the case of a per-PE PMSI route, the labels should be allocated from the label block allocated to the advertising PE. In the case of a per-AS/region PMSI route, different ASBRs/RBRs (Regional Border Routers) attached to the same source AS/region will advertise the same PMSI route. The same label could be used when the same route is advertised by different ASBRs/RBRs, but that requires coordination; a simpler way is for each ASBR/RBR to allocate a label from the label block allocated to it (see Section 2.2.2.1).¶
In the rest of the document, we call the label allocated for a particular PMSI a (per-)PMSI label, just as we have (per-)VPN/BD/ES labels. Notice that using per-PMSI labels for per-PE PMSIs still has the original scaling issue associated with upstream-assigned labels, so per-region PMSIs are preferred. Within each AS/region, per-PE PMSIs are still used, and, since they do not cross the border, per-VPN/BD labels can still be used with them.¶
Note that, when a segmentation point re-advertises a PMSI route to the next segment, it does not need to re-advertise a new label unless the upstream or downstream segment uses Ingress Replication.¶
The per-PMSI label allocation in the case of segmentation, whether for S-PMSIs or for per-PE/region I-PMSIs, exists so that the segmentation points can label switch traffic without having to do IP or MAC lookups in VRFs (the segmentation points typically do not have those VRFs at all). If label scaling becomes a concern, the segmentation points could instead use (C-S,C-G) lookups in VRFs for flows identified by the S-PMSIs. This allows the S-PMSIs for the same VPN/BD to share a VPN/BD-identifying label that leads to a lookup in the VRFs. That label needs to be different from the label used in the per-PE/region I-PMSIs, though, so that the segmentation points can label switch other traffic (not identified by those S-PMSIs). However, this moves the scaling problem from the number of labels to the number of (C-S/*,C-G) routes in VRFs on the segmentation points.¶
In summary, labels can be allocated and advertised in the following ways:¶
Option 1 is simplest, but it requires that all the PEs set aside a common label block for the DCB that is large enough for all the VPNs/BDs/ESes combined. Option 3 is needed only for segmented selective tunnels that are set up dynamically. Multiple options could be used in any combination depending on the deployment situation.¶
The Context-Specific Label Space ID Extended Community (EC) is a new Transitive Opaque EC with the following structure:¶
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0x03 or 0x43  |       8       |            ID-Type            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           ID-Value                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+¶
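The layout above can be encoded and decoded with Python's standard struct module, as in the following sketch: one octet of type, one octet of sub-type (8), a 2-octet ID-Type, and a 4-octet ID-Value. The field-by-field semantics are not restated here, so one point is an explicit assumption: the 20-bit DCB label is carried right-aligned in ID-Value. The function names are hypothetical.¶

```python
import struct
from typing import Tuple

CONTEXT_LABEL_SPACE_ID_SUBTYPE = 0x08  # sub-type shown in the figure and IANA section
ID_TYPE_MPLS_LABEL = 0                 # initial ID-Type registry entry
EC_FORMAT = "!BBHI"                    # type, sub-type, 2-octet ID-Type, 4-octet ID-Value


def encode_ec(dcb_label: int, ec_type: int = 0x03) -> bytes:
    """Build the 8-octet EC whose ID-Value is a DCB label (ID-Type 0).

    Assumption: the 20-bit label value is carried right-aligned in ID-Value.
    """
    if ec_type not in (0x03, 0x43):
        raise ValueError("type must be 0x03 or 0x43")
    if not 0 <= dcb_label < 1 << 20:
        raise ValueError("an MPLS label value has 20 bits")
    return struct.pack(EC_FORMAT, ec_type, CONTEXT_LABEL_SPACE_ID_SUBTYPE,
                       ID_TYPE_MPLS_LABEL, dcb_label)


def decode_ec(ec: bytes) -> Tuple[int, int]:
    """Return (ID-Type, ID-Value) from an encoded EC."""
    ec_type, subtype, id_type, id_value = struct.unpack(EC_FORMAT, ec)
    if ec_type not in (0x03, 0x43) or subtype != CONTEXT_LABEL_SPACE_ID_SUBTYPE:
        raise ValueError("not a Context-Specific Label Space ID EC")
    return id_type, id_value


ec = encode_ec(1500)
print(ec.hex())       # 03080000000005dc
print(decode_ec(ec))  # (0, 1500)
```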
This document introduces a DCB flag (Bit 47 as assigned by IANA) in the "Additional PMSI Tunnel Attribute Flags" BGP Extended Community [RFC7902].¶
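The following Python sketch shows one way to set and test the DCB flag in the 6-octet flags field of this EC. It rests on two assumptions taken from RFC 7902 as we understand it and labeled as such in the code: the EC is Transitive Opaque with sub-type 0x07, and flag bits are numbered 0 (high-order) through 47 (low-order), which puts bit 47 in the least significant position. The route-level conditions for "carries DCB-flag" are not modeled, and the function names are hypothetical.¶

```python
ADDITIONAL_PTA_FLAGS_SUBTYPE = 0x07  # assumption: sub-type registered by RFC 7902
DCB_FLAG_BIT = 47                    # bit assigned to the DCB flag in this document


def flag_mask(bit: int) -> int:
    """Mask for a flag in the 48-bit flags field.

    Assumption: bit 0 is the high-order bit and bit 47 the low-order bit.
    """
    if not 0 <= bit <= 47:
        raise ValueError("flag bits are numbered 0-47")
    return 1 << (47 - bit)


def encode_flags_ec(flags: int) -> bytes:
    """Transitive Opaque type 0x03, the sub-type, then the 6-octet flags field."""
    return bytes([0x03, ADDITIONAL_PTA_FLAGS_SUBTYPE]) + flags.to_bytes(6, "big")


def has_dcb_flag(ec: bytes) -> bool:
    """True if ec is an Additional PMSI Tunnel Attribute Flags EC with DCB set."""
    if len(ec) != 8 or ec[1] != ADDITIONAL_PTA_FLAGS_SUBTYPE:
        return False
    return bool(int.from_bytes(ec[2:], "big") & flag_mask(DCB_FLAG_BIT))


ec = encode_flags_ec(flag_mask(DCB_FLAG_BIT))
print(ec.hex())          # 0307000000000001
print(has_dcb_flag(ec))  # True
```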
In the remainder of the document, when we say a BGP-MVPN/EVPN A-D route "carries DCB-flag" or "has DCB-flag attached" we mean the following:¶
The protocol and procedures specified in this section MAY be used when BIER or P2MP/MP2MP tunnel aggregation is used for MVPN/EVPN, or when BIER/P2MP/MP2MP tunnels are used with EVPN multi-homing. When these procedures are used, all PE routers and segmentation points MUST support the procedures. It is outside the scope of this document how that is ensured.¶
By means outside the scope of this document, each VPN/BD/ES is assigned a label from the DCB or one of those few context-specific label spaces, and every PE that is part of the VPN/BD/ES is aware of the assignment. The ES label and the BD label MUST be assigned from the same label space. If PE Distinguisher labels are used [RFC7582], they MUST be allocated from the same label space as well.¶
In case of tunnel segmentation, each PE is also assigned a disjoint label block from one of those few context-specific label spaces and it allocates labels for its segmented PMSI routes from its assigned label block.¶
When a PE originates/re-advertises an x-PMSI/IMET route, the route MUST carry a DCB-flag if and only if the label in its PTA is assigned from the DCB.¶
If the VPN/BD/ES/PMSI label is assigned from one of those few context-specific label spaces, a Context-Specific Label Space ID Extended Community MUST be attached to the route. The ID-Type in the EC is set to 0, and the ID-Value is set to a label that is allocated from the DCB and identifies the context-specific label space. When an ingress PE sends traffic, it first imposes the label for the VPN/BD (advertised in the Label field of the PTA in the x-PMSI/IMET route) and/or the label for the ESI (advertised in the ESI Label EC), then imposes the DCB label that identifies the context-specific label space, and then imposes the encapsulation for the transport tunnel.¶
When a PE receives an x-PMSI/IMET route with the Context-Specific Label Space ID EC, it MUST place an entry in its default MPLS forwarding table to map the label in the EC to a corresponding context-specific label table. That table is used for the next label lookup for incoming data traffic with the label signaled in the EC.¶
Then, the receiving PE MUST place an entry for the label in the PTA or ESI Label EC into either the default MPLS forwarding table (if the route carries the DCB-flag) or the context-specific label table (if the Context-Specific Label Space ID EC is present) according to the x-PMSI/IMET route.¶
An x-PMSI/IMET route MUST NOT carry both the DCB-flag and the Context-Specific Label Space ID EC. A received route with both the DCB-flag set and the Context-Specific Label Space ID EC attached MUST be treated as withdrawn. If neither the DCB-flag nor the Context-Specific Label Space ID EC is attached, the label in the PTA or ESI Label EC MUST be treated as upstream-assigned from the label space of the source PE, and the procedures in [RFC6514] [RFC7432] MUST be followed.¶
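The receive-side rules above can be summarized in a short Python sketch. The route and table models are hypothetical simplifications: only the PTA label is handled (an ESI label would be placed the same way), and treating a route as withdrawn is reduced to not installing it, whereas a full implementation would also remove any state installed from earlier versions of the route.¶

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Route:
    """Hypothetical, simplified view of a received x-PMSI/IMET route."""
    source_pe: str
    pta_label: int
    dcb_flag: bool = False
    context_id: Optional[int] = None  # ID-Value of a Context-Specific Label Space ID EC


@dataclass
class Fib:
    default: Dict[int, Tuple[str, object]] = field(default_factory=dict)
    context: Dict[int, Dict[int, str]] = field(default_factory=dict)  # keyed by DCB label from the EC
    per_pe: Dict[str, Dict[int, str]] = field(default_factory=dict)   # RFC 6514/7432 upstream-assigned


def install(route: Route, service: str, fib: Fib) -> bool:
    """Install the PTA label; return False if the route must be treated as withdrawn."""
    if route.dcb_flag and route.context_id is not None:
        return False  # both markers present: treat as withdrawn
    if route.dcb_flag:
        fib.default[route.pta_label] = ("service", service)
    elif route.context_id is not None:
        # The DCB label from the EC points to a context-specific label table ...
        fib.default[route.context_id] = ("context", route.context_id)
        # ... in which the PTA label is then installed.
        fib.context.setdefault(route.context_id, {})[route.pta_label] = service
    else:
        # Neither marker: upstream-assigned from the source PE's label space.
        fib.per_pe.setdefault(route.source_pe, {})[route.pta_label] = service
    return True
```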
If a PE originates two x-PMSI/IMET routes with the same tunnel, it MUST ensure one of the following so that the PE receiving the routes can correctly interpret the label that follows the tunnel encapsulation of data packets arriving via the tunnel.¶
Otherwise, a receiving PE MUST treat the routes as if they were withdrawn.¶
This document allows three methods (Section 2.2.3) of label allocation for MVPN [RFC6514] or EVPN [RFC7432] PEs and specifies the corresponding signaling and procedures. The first method is the equivalent of using common SRGBs [RFC8402] from the regular per-platform label space. The second one is the equivalent of using common SRGBs from a third-party label space [RFC5331]. The third method is a variation of the second, in that the third-party label space is divided into disjoint blocks for use by different PEs, which use labels from their respective blocks to send traffic. In all cases, a receiving PE is able to identify one of a few label forwarding tables to forward incoming labeled traffic.¶
None of the [RFC6514], [RFC7432], [RFC8402] and [RFC5331] specifications lists any security concerns related to label allocation methods, and this document does not introduce new security concerns either.¶
IANA has made the following assignments:¶
Bit 47 (DCB) from the "Additional PMSI Tunnel Attribute Flags" registry¶
   Bit        Flag Name                Reference
   --------   ----------------------   -------------
   47         DCB (temporary)          This document¶
Sub-type 0x08 for "Context-Specific Label Space ID Extended Community" from the "Transitive Opaque Extended Community Sub-Types" registry¶
   Sub-Type Value   Name                              Reference
   --------------   -------------------------------   -------------
   0x08             Context-Specific Label Space ID   This document
                    Extended Community¶
IANA is requested to create a "Context-Specific Label Space ID Type" registry in the "Border Gateway Protocol (BGP) Extended Communities" group. The registration procedure is First Come First Served. The initial assignment is as follows:¶
   ID Type   Name                     Reference
   -------   ----------------------   -------------
   0         MPLS Label               This document
   1-254     unassigned
   255       reserved¶
The authors thank Stephane Litkowski, Ali Sajassi and Jingrong Xie for their review of, comments on and suggestions for this document.¶
The following also contributed to this document.¶
Selvakumar Sivaraj Juniper Networks Email: ssivaraj@juniper.net¶