Internet-Draft | MP-BGP Extension for 4map6 Advertisement | September 2023 |
Xie, et al. | Expires 29 March 2024 | [Page] |
This document defines an MP-BGP extension and the procedures for IPv4 service delivery in multi-domain IPv6-only underlay networks. It defines a new BGP path attribute, known as "4map6", to be used in conjunction with the existing AFI/SAFI for IPv4 and IPv6. This attribute is associated with an IPv4/IPv6 address mapping rule that allows IPv4 traffic to cross IPv6-only domains. The behavior of each type of network (IPv4 and IPv6) is also illustrated.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 29 March 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The document [I-D./draft-ietf-v6ops-framework-md-ipv6only-underlay] proposes a framework for deploying IPv6-only as the underlay in multi-domain networks, in which IPv4 packets are statelessly translated or encapsulated into IPv6 packets for transmission across IPv6-only underlay domains. To achieve this goal, the framework introduces a specific data structure, called the IPv4/IPv6 address mapping rule, to support stateless IPv4-IPv6 packet conversion at the edge of the network. For brevity, the rest of this document refers to the IPv4/IPv6 address mapping rule as the mapping rule. For an incoming IPv4 packet, the ingress PE uses the mapping rules to generate the corresponding IPv6 source and destination addresses from the IPv4 source and destination addresses of the original packet, and vice versa. Because the mapping rule for the destination IPv4 address identifies the right egress PE by providing its IPv6 mapping prefix, it gives the direction of IPv4 service data transmission throughout the IPv6-only network. The mapping rule corresponding to the destination IPv4 address of a packet should therefore be exchanged before IPv4 data transmission starts in the IPv6-only network; otherwise, data originating from the IPv4 network will be dropped because the IPv6 mapping prefix corresponding to its destination address is absent.¶
When an ingress PE processes incoming IPv4 packets, the mapping rule for the source address can be obtained locally. The mapping rule for the destination address, however, is not generated locally by the ingress PE and has to be obtained remotely. This document defines an MP-BGP extension in which the BGP UPDATE message carries the mapping rule for IPv4 service delivery. The extension includes a new BGP path attribute, known as "4map6", that corresponds to the NLRI, and a set of related procedures.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
In the context of this document, multi-domain underlay networks refer to a network system composed of multiple interconnected autonomous systems (ASes), each of which can serve different scenarios. Multi-domain networks can be operated by one or more network operators. The network shown in Figure 1 is a typical multi-domain IPv6-only underlay network; it is used as the basic scenario to illustrate the MP-BGP extension and its related procedures in this document. The whole network comprises AS1, AS2 and AS3. It provides IPv4 service communication between IPv4 network N1 and IPv4 network N2, which have IPv4 address blocks A1 and A2 respectively. This is consistent with Section 6 of [I-D.ietf-v6ops-framework-md-ipv6only-underlay].¶
  IPv4 A1             AS1             AS2              AS3             IPv4 A2
+---------+   +-----------------+   +------+   +-----------------+   +---------+
|  IPv4   |   | +-----+  +----+ |   | +--+ |   | +----+  +-----+ |   |  IPv4   |
| network |---|-| PE1 |--| P1 |-|---|-|P2|-|---|-| P3 |--| PE2 |-|---| network |
|   N1    |   | +-----+  +----+ |   | +--+ |   | +----+  +-----+ |   |   N2    |
+---------+   +-----------------+   +------+   +-----------------+   +---------+
        Figure 1: Topology of Typical Multi-domain IPv6-only Networks¶
PE and P routers are network devices which constitute the IPv6-only underlay. The definition of PE and P is consistent with that in draft [I-D.ietf-v6ops-framework-md-ipv6only-underlay]. It should be noted that in multi-domain networks, some ASBRs are not at the edge of the network. In this case, they run as P routers. On each PE router that the IPv4 address prefix is reachable through, there is a locally configured IPv6 virtual interface (VIF) address. The VIF address, as an ordinary global IPv6 /128 address, must also be injected into the IPv6 IGP so that it is reachable across the multi-domain transit core.¶
The following term is used in this document:¶
• Distance metric: the distance to the egress PE, expressed as a number of ASes.¶
The MP-BGP extension for mapping rule processing and transmission across domains involves both PE and P routers. Each PE or P router maintains a Mapping rule Database (MD), as depicted in Figure 2. Each entry in the MD database consists of an IPv4 address prefix, the IPv4 address prefix length, the IPv6 mapping prefix of the PE, the IPv6 mapping prefix length, and the distance to the egress. Note that the database shown here is only an example; developers can design the database structure according to their actual situation.¶
+----------+----------------+----------+---------------+------------+
| IPv4     | IPv4           | IPv6     | IPv6          | Distance   |
| Address  | Address        | Mapping  | Mapping       | to the     |
| Prefix   | Prefix Length  | Prefix   | Prefix Length | Egress     |
+----------+----------------+----------+---------------+------------+
             Figure 2: Entry of Mapping Rule Database¶
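As an illustration only, an MD entry with the fields of Figure 2 could be modelled as shown below. This is a minimal Python sketch and not part of the specification: the names MDEntry and lookup are hypothetical, the two prefix lengths are carried inside the prefix objects rather than as separate fields, and the longest-prefix-match lookup is an assumption about how an ingress PE might select the rule for a destination address.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MDEntry:
    """One mapping rule; hypothetical layout mirroring Figure 2."""
    ipv4_prefix: ipaddress.IPv4Network      # IPv4 address prefix and its length
    mapping_prefix: ipaddress.IPv6Network   # IPv6 mapping prefix and its length
    distance: int                           # distance to the egress, in ASes


def lookup(md: Dict[ipaddress.IPv4Network, MDEntry], dst: str) -> Optional[MDEntry]:
    """Return the most specific entry covering dst, or None (assumed policy)."""
    addr = ipaddress.IPv4Address(dst)
    matches = [e for e in md.values() if addr in e.ipv4_prefix]
    return max(matches, key=lambda e: e.ipv4_prefix.prefixlen, default=None)


# Example database using documentation address ranges.
md: Dict[ipaddress.IPv4Network, MDEntry] = {}
for v4, v6, dist in [("198.51.100.0/24", "2001:db8:100::/48", 2),
                     ("198.51.0.0/16", "2001:db8:200::/48", 3)]:
    entry = MDEntry(ipaddress.IPv4Network(v4), ipaddress.IPv6Network(v6), dist)
    md[entry.ipv4_prefix] = entry
```

Keying the dictionary by IPv4 prefix matches the procedures later in this document, which look entries up by the IPv4 address prefix extracted from the NLRI.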
An IPv4 packet sent from IPv4 network N1 traverses the IPv6-only network and reaches the destination network, i.e., IPv4 network N2. Its source and destination addresses are within IPv4 address blocks A1 and A2 respectively. Its ingress to the IPv6-only network is PE1 and its egress is PE2. Before the data packet is transmitted, the address mapping rule corresponding to IPv4 address block A2 should be transmitted from PE2 to PE1. During announcement and transmission, the mapping rule may pass through intermediate nodes, such as P3, P2 and P1, before finally reaching PE1. A given intermediate P node may receive advertisement messages for this mapping rule from multiple upstream intermediate nodes. To reduce the overall number of advertisement messages, the node selects among them, updates its local MD database, generates new advertisement messages based on the selected mapping rule information, and transmits them to downstream intermediate nodes or PE routers.¶
This mechanism is also in line with the requirements of emerging scenarios such as DCN for AI infra fabric, as described in Appendix A.¶
This document specifies a way in which BGP can be used by a given PE to tell other PEs, "If you need to send IPv4 packets whose destination address is within a given IPv4 address block, please send them to me; here is the information you need to properly transform the IPv4 packets into IPv6 ones". Multiprotocol BGP (MP-BGP) [RFC4760] specifies that the set of usable next-hop address families is determined by the Address Family Identifier (AFI) and the Subsequent Address Family Identifier (SAFI). [RFC8950] specifies the extensions that allow advertisement of IPv4 NLRI or VPN-IPv4 NLRI with a next-hop address that belongs to the IPv6 protocol. This document specifies the extensions necessary to support the transmission of mapping rules from any egress PE to any ingress PE within and across domains. Because it is based on the IPv6-only routing paradigm, it uses the AFI/SAFI combination with values 2 and 1 respectively, which identifies Network Layer Reachability Information (NLRI) used for unicast forwarding in an IPv6 network. In addition, to identify that a BGP UPDATE message is used for the transmission of a mapping rule, the message needs to contain a newly defined BGP path attribute, the 4map6 attribute. With this attribute, the IPv6 mapping prefix and the IPv4 address block can be identified from the NLRI, and the other information needed to properly transform the IPv4 packets can also be obtained. A BGP UPDATE whose MP_REACH_NLRI attribute contains the AFI/SAFI combination above, and which carries the 4map6 path attribute, is called 4map6 routing information. The use and meaning of the fields of MP_REACH_NLRI in this case are as follows:¶
– AFI = 2 (IPv6)¶
– SAFI = 1 (Unicast)¶
– Length of Next Hop¶
– Network Address of Next Hop = the address of the advertising BGP speaker. When a BGP speaker advertises the 4map6 NLRI via BGP, it uses its own address as the BGP next hop in the MP_REACH_NLRI.¶
– NLRI = Composite IPv6 address prefix, composed of an IPv6 mapping prefix followed by the original IPv4 address prefix, with the remaining bits set to zero.¶
The NLRI field is encoded as shown in Figure 3:¶
+----------------------------+
|   Length       1 octet     |
+----------------------------+
|   Prefix       variable    |
+----------------------------+
  Figure 3: Format of NLRI Field¶
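The Length/Prefix layout of Figure 3 appears to follow the usual MP-BGP convention of [RFC4760]: the Length octet gives the prefix length in bits, and the Prefix field carries the minimum number of octets needed to hold those bits. A minimal Python sketch under that assumption is given below; the function names encode_nlri and decode_nlri are illustrative only.

```python
import ipaddress


def encode_nlri(prefix: str) -> bytes:
    """Encode an IPv6 prefix as <length in bits, prefix octets>."""
    net = ipaddress.IPv6Network(prefix)
    n_octets = (net.prefixlen + 7) // 8          # minimum octets for the prefix bits
    return bytes([net.prefixlen]) + net.network_address.packed[:n_octets]


def decode_nlri(data: bytes) -> ipaddress.IPv6Network:
    """Decode a single <length, prefix> tuple back into an IPv6 prefix."""
    if not data:
        raise ValueError("empty NLRI")
    length = data[0]
    n_octets = (length + 7) // 8
    if length > 128 or len(data) < 1 + n_octets:
        raise ValueError("malformed NLRI")
    packed = data[1:1 + n_octets].ljust(16, b"\x00")   # pad to a full 128-bit address
    # strict=False masks any trailing padding bits beyond the prefix length.
    return ipaddress.IPv6Network((packed, length), strict=False)


# A 72-bit composite prefix occupies one length octet plus nine prefix octets.
wire = encode_nlri("2001:db8:100:c633:6400::/72")
```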
The 4map6 attribute, a new BGP path attribute defined in this document, is optional and transitive. It requires IANA to assign a new BGP path attribute value. The attribute is composed of the following fields:¶
+---------------------------------------------------+
| Length of IPv6 Mapping Prefix (1 octet)           |
+---------------------------------------------------+
| Forwarding Type (1 octet)                         |
+---------------------------------------------------+
| Address Origin Type (1 octet)                     |
+---------------------------------------------------+
| IPv4 Original ASN (4 octets)                      |
+---------------------------------------------------+
      Figure 4: Encoding of the 4map6 attribute¶
The use and meaning of these fields are as follows:¶
a) Length of IPv6 Mapping Prefix¶
This is a 1-octet field whose value indicates the length of IPv6 mapping prefix.¶
b) Forwarding Type¶
This field identifies the IPv4/IPv6 forwarding capability of the egress PE. The data octet can assume the following values:¶
Value Meaning¶
0 Translation and encapsulation¶
1 Encapsulation¶
2 Translation¶
c) Address Origin Type¶
This field identifies the origin of the IPv4 address prefix. The data octet can assume the following values:¶
Value Meaning¶
0 Local¶
1 Relay¶
d) IPv4 Original ASN¶
This field is a copy of the Origin AS number in the BGP UPDATE message received from the IPv4 domain. It carries a value only when "Address Origin Type" is 1; otherwise it is NULL.¶
In addition, when the value of IPv4 Original ASN is set, the ATTR_SET attribute (type code 128), defined in [RFC6368], can be used to transfer the routing information of the IPv4 network in multi-domain IPv6-only networks.¶
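The field layout of Figure 4 can be illustrated with the following Python sketch, which packs and unpacks only the attribute value (7 octets). It is not a normative encoding: the attribute type code is not yet assigned and is therefore omitted, the names are hypothetical, and representing a NULL IPv4 Original ASN as zero is an assumption made here for illustration.

```python
import struct
from enum import IntEnum


class ForwardingType(IntEnum):
    TRANSLATION_AND_ENCAPSULATION = 0
    ENCAPSULATION = 1
    TRANSLATION = 2


class AddressOriginType(IntEnum):
    LOCAL = 0
    RELAY = 1


# Network byte order, no padding: 1 + 1 + 1 + 4 = 7 octets.
_FMT = "!BBBI"


def pack_4map6(l2: int, fwd: ForwardingType, origin: AddressOriginType,
               original_asn: int = 0) -> bytes:
    """Pack the 4map6 attribute value (type code and flags not included)."""
    if not 0 <= l2 <= 128:
        raise ValueError("IPv6 mapping prefix length out of range")
    # Assumption: NULL is encoded as 0 when the origin type is not Relay.
    asn = original_asn if origin == AddressOriginType.RELAY else 0
    return struct.pack(_FMT, l2, fwd, origin, asn)


def unpack_4map6(value: bytes):
    """Return (L2, ForwardingType, AddressOriginType, ASN) from the value."""
    if len(value) != struct.calcsize(_FMT):
        raise ValueError("unexpected 4map6 attribute length")
    l2, fwd, origin, asn = struct.unpack(_FMT, value)
    return l2, ForwardingType(fwd), AddressOriginType(origin), asn
```

An unknown Forwarding Type or Address Origin Type makes unpack_4map6 raise ValueError, which a caller could map onto the error handling described later in this document.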
When a PE ceases to provide egress service for a given IPv4 address block, it may explicitly withdraw the mapping rules associated with it. Suppose a PE has announced, on a given BGP session, the mapping rule of a given IPv4 address prefix and it now wishes to withdraw that mapping rule. To do so, it may send a BGP UPDATE message with an MP_UNREACH_NLRI attribute.¶
This encoding of the MP_UNREACH_NLRI attribute is used to explicitly withdraw the mapping rule for a given IPv4 prefix (on a given BGP session). Note that IPv4 address prefix/IPv6 mapping prefix bindings that were not advertised on the given session cannot be withdrawn by this method.¶
When using an MP_UNREACH_NLRI attribute to withdraw an IPv4 route whose NLRI was previously specified in an MP_REACH_NLRI attribute, the lengths and values of the respective prefixes must match, and the respective AFI/SAFIs must match. An explicit withdrawal in an AFI/SAFI UPDATE on a given BGP session not only withdraws the binding between the IPv4 address prefix and the IPv6 mapping prefix, it also withdraws the path to that prefix that was previously advertised in an UPDATE on that session.¶
When a PE router learns IPv4 routing information from the locally attached IPv4 access networks, the control plane of the PE should process the information as follows:¶
1. Install and maintain local IPv4 routing information in the IPv4 routing database.¶
2. Install and maintain new entries in the MD database. Each entry should consist of the IPv4 address prefix and the local IPv6 mapping prefix.¶
3. Advertise the content of each entry in the local MD database in the form of BGP update advertisement to IPv6 peer routers. The process to generate IPv6 route advertisement with 4map6 attribute based on IPv4 route advertisement messages is as follows:¶
a) Set the values of AFI and SAFI in MP_REACH_NLRI to 2 and 1 respectively;¶
b) Splice the IPv4 address block from the IPv4 route advertisement onto the IPv6 mapping prefix of the egress PE to form a composite IPv6 address prefix, whose length is denoted by L1. Copy the composite IPv6 address prefix to the prefix field of the NLRI structure in the MP_REACH_NLRI, and set the length field of the NLRI to L1. The structure of the composite IPv6 address prefix in the NLRI is shown in Figure 5. L2 denotes the length of the IPv6 mapping prefix of PE2, i.e., Pref6-2. When the value of L2 is available, set the Length of IPv6 Mapping Prefix field in the 4map6 attribute to L2.¶
c) Copy the value of Origin ASN in the original IPv4 route advertisement to the IPv4 Original ASN field of the 4map6 attribute, and copy the values of Length of AS_Path and AS_Path to the corresponding fields of the ATTR_SET attribute.¶
|--------L2--------|
+------------------+------------------+-------------+
|   IPv6 Mapping   |       IPv4       | ...0000...  |
|   Prefix of PE2  |  address prefix  |             |
+------------------+------------------+-------------+
|-----------------L1------------------|
     Figure 5: Structure of IPv6 prefix in NLRI¶
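The splicing in step b) can be sketched in Python as follows. The sketch assumes, based on Figure 5, that only the significant bits of the IPv4 address prefix are appended, so that L1 equals L2 plus the IPv4 prefix length; the function name compose_prefix and the example prefixes (taken from documentation address ranges) are illustrative only.

```python
import ipaddress


def compose_prefix(mapping_prefix: str, ipv4_prefix: str) -> ipaddress.IPv6Network:
    """Splice an IPv4 prefix onto an IPv6 mapping prefix (Figure 5 layout)."""
    pref6 = ipaddress.IPv6Network(mapping_prefix)
    pref4 = ipaddress.IPv4Network(ipv4_prefix)
    l2 = pref6.prefixlen                  # length of the IPv6 mapping prefix
    l1 = l2 + pref4.prefixlen             # length of the composite prefix
    if l1 > 128:
        raise ValueError("composite prefix longer than 128 bits")
    # Keep only the significant (network) bits of the IPv4 prefix.
    v4_bits = int(pref4.network_address) >> (32 - pref4.prefixlen)
    # Place them directly after the mapping prefix; remaining bits stay zero.
    composite = int(pref6.network_address) | (v4_bits << (128 - l1))
    return ipaddress.IPv6Network((composite, l1))


# Pref6-2 = 2001:db8:100::/48 (L2 = 48), IPv4 block 198.51.100.0/24 -> L1 = 72.
nlri_prefix = compose_prefix("2001:db8:100::/48", "198.51.100.0/24")
```

In this example the IPv4 octets 198.51.100 (hexadecimal c6 33 64) occupy bits 48 to 71 of the composite prefix, and L2 = 48 would be carried in the 4map6 attribute.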
When a P router receives a BGP update advertisement from neighboring P or PE routers and uses that information to populate its local MD database, the following procedures are used to update the MD database and to send the mapping rule advertisement to the next router:¶
1. Validate the received BGP update advertisement as 4map6 routing information by finding the 4map6 attribute.¶
2. Extract the IPv4 address prefix, which is encoded in bit positions L2 to L1-1 of the NLRI field, and look it up in the local MD database. If an entry that matches the IPv4 address prefix is found, then:¶
– Compare the distance metric in the 4map6 attribute of the BGP advertisement with that of the entry found. If the former is less than the latter, then:¶
• Update the entry found in the MD database with the attributes of the BGP advertisement: extract the IPv6 mapping prefix and store it in the entry indexed by the IPv4 address prefix.¶
• Advertise the updated content of the entry to IPv6 peer routers in the form of MP_REACH_NLRI update information.¶
– Otherwise (the advertised distance is not less than the stored one):¶
• Keep the entry in the MD database unchanged.¶
• Advertise the content of the entry found to IPv6 peer routers in the form of a BGP update advertisement.¶
If no matching entry is found, then:¶
– Install and maintain a new entry in the MD database with the extracted IPv4 prefix, its corresponding IPv6 mapping prefix and the distance metric to the egress.¶
– Advertise the content of the new entry to IPv6 peer routers in the form of a BGP update advertisement.¶
It should be noted that this process does not change or affect the IPv6 FIB table of the P router.¶
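The selection logic of step 2 can be summarized by the following Python sketch. It is a simplified illustration, not an implementation: the MD database is a plain dictionary keyed by the IPv4 prefix string, the names Entry and p_router_process are hypothetical, and the distance metric is passed in as an argument because this sketch does not model how it is carried in the advertisement.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    mapping_prefix: str   # IPv6 mapping prefix of the egress PE
    distance: int         # distance to the egress, in ASes


def p_router_process(md: Dict[str, Entry], ipv4_prefix: str,
                     mapping_prefix: str, distance: int) -> Entry:
    """Update the MD database and return the entry to advertise to IPv6 peers."""
    found = md.get(ipv4_prefix)
    if found is None:
        # No matching entry: install a new one.
        md[ipv4_prefix] = Entry(mapping_prefix, distance)
    elif distance < found.distance:
        # Strictly shorter distance: replace the stored mapping rule.
        found.mapping_prefix = mapping_prefix
        found.distance = distance
    # Otherwise the stored entry is kept unchanged.
    return md[ipv4_prefix]
```

In every branch the function returns the content of the MD entry, which is what the procedure above advertises to the IPv6 peer routers; the IPv6 FIB is never touched.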
When a PE router receives a BGP advertisement from neighboring P or PE routers and uses that information to populate its local MD database and BGP routing database, the following procedures are used to update the MD database and to send IPv4 routing information to its IPv4 peers:¶
1. Validate the received BGP update advertisement as 4map6 routing information by finding the 4map6 attribute.¶
2. Extract the IPv6 mapping prefix, which is encoded in bit positions 0 to L2-1 of the NLRI field, and compare it with the PE's own IPv6 mapping prefix. If the two match, proceed to the next step. Otherwise, announce this update to the PE's other BGP peers.¶
3. Extract the IPv4 address prefix, which is encoded in bit positions L2 to L1-1 of the NLRI field, and look it up in the MD database. If an entry that matches the IPv4 address prefix is found, then:¶
– Compare the distance metric in the BGP advertisement with that of the entry found. If the former is less than the latter, then:¶
• Update the entry found in the MD database with the 4map6 attributes of the BGP advertisement: extract the IPv6 mapping prefix and store it in the entry indexed by the IPv4 address prefix.¶
• Redistribute the new 4map6 routing information to the local IPv4 routing table. Set the destination network prefix to the extracted IPv4 address prefix, set the Next Hop to Null, and set the OUTPUT Interface to the 4map6 VIF on the local PE router.¶
– Otherwise (the advertised distance is not less than the stored one):¶
• Keep the entry in the MD database unchanged.¶
If no matching entry is found, then:¶
– Install and maintain a new entry in the MD database with the extracted IPv4 prefix, its corresponding IPv6 mapping prefix and the distance metric to the egress.¶
– Redistribute the new 4map6 routing information to the local IPv4 routing table. Set the destination network prefix to the extracted IPv4 address prefix, set the Next Hop to Null, and set the OUTPUT Interface to the 4map6 VIF on the local PE router.¶
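The prefix extraction and step 3 can be illustrated with the Python sketch below. It covers only the extraction of the two prefixes from the composite NLRI and the MD/IPv4 routing table update; the comparison in step 2 is deliberately left out. All names (split_composite, pe_process, VIF_NAME) and the dictionary-based tables are hypothetical, and the distance metric is passed in as an argument.

```python
import ipaddress

VIF_NAME = "4map6-vif0"   # hypothetical name of the local 4map6 virtual interface


def split_composite(nlri_prefix: str, l2: int):
    """Split a composite prefix into (IPv6 mapping prefix, IPv4 prefix)."""
    net = ipaddress.IPv6Network(nlri_prefix)
    l1 = net.prefixlen
    v4_len = l1 - l2                      # IPv4 prefix occupies bits L2 .. L1-1
    if not 0 <= v4_len <= 32:
        raise ValueError("L1 and L2 are inconsistent")
    value = int(net.network_address)
    # Bits 0 .. L2-1: IPv6 mapping prefix of the egress PE.
    mapping = ipaddress.IPv6Network(((value >> (128 - l2)) << (128 - l2), l2))
    # Bits L2 .. L1-1: significant bits of the IPv4 address prefix.
    v4_bits = (value >> (128 - l1)) & ((1 << v4_len) - 1)
    ipv4 = ipaddress.IPv4Network((v4_bits << (32 - v4_len), v4_len))
    return mapping, ipv4


def pe_process(md: dict, rib: dict, nlri_prefix: str, l2: int, distance: int) -> bool:
    """Apply step 3; return True if the MD database and IPv4 table changed."""
    mapping, ipv4 = split_composite(nlri_prefix, l2)
    found = md.get(ipv4)
    if found is not None and distance >= found[1]:
        return False                      # keep the existing entry unchanged
    md[ipv4] = (mapping, distance)
    # Redistribute into the IPv4 routing table: Next Hop Null, output via the VIF.
    rib[ipv4] = {"next_hop": None, "out_if": VIF_NAME}
    return True
```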
As mentioned in [I-D./draft-ietf-v6ops-framework-md-ipv6only-underlay], multi-domain IPv6-only networks support both translation and encapsulation technologies for IPv4 data delivery at the data forwarding layer. Taking encapsulation as an example, the reachability of the tunnel's egress endpoint may change over time, directly impacting the feasibility of IPv4 service delivery. A tunnel that is not feasible at some moment may become feasible later, when its egress endpoint address becomes reachable. The router may then start using the newly feasible tunnel instead of an existing one. The same may happen for a translation-based data path. How this decision is made is outside the scope of this document.¶
[RFC5492] defines a Capabilities Optional Parameter and its processing rules. The Capabilities Optional Parameter is a triple that includes a one-octet Capability Code, a one-octet Capability Length, and a variable-length Capability Value. A BGP speaker can include a Capabilities Optional Parameter in a BGP OPEN message to communicate its capabilities. A PE or P router that wishes to exchange mapping rule information must use the Multiprotocol Extensions Capability Code, as defined in [RFC4760], to advertise the corresponding (AFI, SAFI) pair.¶
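For illustration, the Multiprotocol Extensions capability for the (AFI, SAFI) pair used in this document can be built as shown in the Python sketch below. The layout (Capability Code 1, Capability Length 4, then a 2-octet AFI, a reserved octet and a 1-octet SAFI, carried in Optional Parameter type 2) is taken from [RFC4760] and [RFC5492]; the function names are hypothetical.

```python
import struct

OPT_PARAM_CAPABILITIES = 2   # Optional Parameter type for Capabilities [RFC5492]
CAP_MULTIPROTOCOL = 1        # Multiprotocol Extensions Capability Code [RFC4760]


def mp_capability(afi: int, safi: int) -> bytes:
    """Build one <code, length, value> triple for an (AFI, SAFI) pair."""
    value = struct.pack("!HBB", afi, 0, safi)      # AFI, reserved octet, SAFI
    return struct.pack("!BB", CAP_MULTIPROTOCOL, len(value)) + value


def capabilities_parameter(capabilities: bytes) -> bytes:
    """Wrap capability triples in a Capabilities Optional Parameter."""
    return struct.pack("!BB", OPT_PARAM_CAPABILITIES, len(capabilities)) + capabilities


# AFI = 2 (IPv6), SAFI = 1 (Unicast), as used for 4map6 routing information.
open_param = capabilities_parameter(mp_capability(2, 1))
```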
When a BGP speaker encounters an error while parsing the 4map6 path attribute, the speaker must treat the update as a withdrawal of existing routes to the included 4map6 SAFI NLRIs, or discard the update if no such routes exist. A log entry should be raised for local analysis.¶
With this document, IANA is requested to allocate the following codes:¶
1) A code for the 4map6 path attribute in the "BGP Path Attributes" registry¶
2) Value xx for 4map6 in the BGP "Capability Codes" registry¶
All the codes above use this document as the reference.¶
This extension to MP-BGP does not change the underlying security issues inherent in the existing MP-BGP.¶
There is enormous "East-West" traffic inside the data center network (DCN), that is, flows between DC devices and applications. Upgrading the DCN first to dual-stack and then to IPv6-only is nontrivial. One example is building an AI infrastructure fabric on an IPv6-only fabric, which reduces data plane encapsulation overhead, simplifies the features required of the forwarding chip, and improves data plane performance.¶
When a DCN plans to transition from dual-stack to IPv6-only, the change cannot be completed overnight. Considerations and plans should be made to support legacy IPv4 servers and applications once the DCN is IPv6-only. The IPv6-only framework proposed in this memo keeps IPv4 service available when the underlay networks are upgraded to IPv6-only.¶
As shown in Figure 6, Host 1 (H1) and Host 2 (H2) are legacy servers with only IPv4 capability. Traffic between Host 1 and Host 2 is carried by the IPv6 network in the DCN. The access switch (ASW) implements the ADPT function, which learns IPv4/IPv6 mapping rules and delivers the IPv4 service in the IPv6-only network.¶
                Internet
                    ^
                    |                         ^
   +----------------+------------------+      |
   |        Data Center Network        |      |
   +----+-------------------------+----+      |
        |                         |           |
   +----+-------------------------+----+      |
   |                                   | IPv6-only
   |              PSW/R1               | AS2  |
   +----+-------------------------+----+      |
        |                         |           |
        |                         |           v
   +----+---+                +----+---+    -------
   |        |                |        |       ^
   |ASW/PE1 | AS1            |ASW/PE2 | AS1   |
   +----+---+                +----+---+\  dualstack
        |                         |     \     |
      +-+-+                     +-+-+  +---+  v
      | H1| IPv4           IPv4 | H2|  | H3| IPv6
      +---+                     +---+  +---+
      Figure 6: IPv6-only DCN for AI infra fabric¶