Internet-Draft | BGP-SPF Selection Rules | October 2023 |
Dong, et al. | Expires 25 April 2024 |
For network scenarios such as Massively Scaled Data Centers (MSDCs), BGP has been extended to distribute Link-State (LS) information and to compute routes with the Shortest Path First (SPF) algorithm. BGP-LS-SPF leverages the mechanisms of both the BGP protocol and the BGP-LS protocol extensions, with new selection rules defined for BGP-LS-SPF NLRI. This document proposes updates to the BGP-LS-SPF NLRI selection rules to ensure a deterministic selection result. The proposed updates can also help mitigate some issues in BGP-LS-SPF route convergence. This document updates the NLRI selection rules in I-D.ietf-lsvr-bgp-spf.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 25 April 2024.
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
For network scenarios such as Massively Scaled Data Centers (MSDCs), BGP has been extended to distribute Link-State (LS) information and to compute routes with the Shortest Path First (SPF) algorithm. BGP-LS-SPF leverages the mechanisms of both the BGP protocol and the BGP-LS protocol extensions, with new selection rules for BGP-LS-SPF NLRI defined in [I-D.ietf-lsvr-bgp-spf]. For all BGP-LS-SPF NLRIs, the NLRI selection rules are defined as follows:
1. NLRI originated by directly connected BGP SPF peers are preferred.
2. The NLRI with the most recent Sequence Number TLV, i.e., the highest sequence number, is selected.
3. The NLRI received from the BGP SPF speaker with the numerically larger BGP Identifier is preferred.
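The ordered comparison in the list above can be sketched as a sort key. This is a minimal Python illustration and is not taken from [I-D.ietf-lsvr-bgp-spf]: the `Candidate` type and its field names are hypothetical, and the first rule is modeled as a boolean that is true when the copy was received directly from the peer that originated it.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One received copy of a BGP-LS-SPF NLRI (hypothetical model)."""
    from_originator: bool  # received directly from the peer that originated it
    sequence: int          # value carried in the Sequence Number TLV
    peer_bgp_id: int       # BGP Identifier of the advertising speaker


def select(candidates):
    """Pick the preferred copy: fields are compared in rule order, larger wins."""
    return max(
        candidates,
        key=lambda c: (c.from_originator, c.sequence, c.peer_bgp_id),
    )


# Illustrative values only.
copies = [
    Candidate(from_originator=False, sequence=5, peer_bgp_id=30),
    Candidate(from_originator=False, sequence=6, peer_bgp_id=10),
    Candidate(from_originator=False, sequence=6, peer_bgp_id=20),
]
best = select(copies)
print(best)
```

Here the two copies with sequence number 6 beat the older one, and the larger BGP Identifier breaks the remaining tie.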
In some cases, these rules are not sufficient to produce a deterministic selection result. In some failure cases, they can also delay the distribution of the latest link-state information, which in turn delays route convergence in the network.
This document first describes the network scenarios in which the existing NLRI selection rules are insufficient, and then proposes updates to the BGP-LS-SPF NLRI selection rules.
Section 6.5.2 of [I-D.ietf-lsvr-bgp-spf] describes NLRI advertisement in the case of node failures. In some cases, however, the current NLRI selection rules can delay route convergence.

+-----+         +-----+  link down  +-----+        +-----+
| R1  +---------+ R2  +------X------+ R3  +--------+ R5  |
+-----+         +--\--+             +--/--+        +-----+
                    \                 /
R1-R2: down to up    \               /
                      \             /
                       \           /
                        \         /
                         \+-----+/
                          | R4  |
                          +--+--+
                             |
                             |
                             |
                             |
                          +--+--+
                          | R6  |
                          +-----+

                         Figure 1

As shown in the example in Figure 1, R3 detects a failure of the BGP session between R2 and R3, using either BFD or another detection mechanism. R3 cannot distinguish a node failure of R2 from a link failure of R2-R3. To avoid unnecessary route flaps, and as described in Section 6.5.2 of [I-D.ietf-lsvr-bgp-spf], R3 holds all the NLRIs received from R2 for the period of NLRIImplicitWithdrawalDelay. If the state of link R1-R2 changes from down to up during this period, R2 originates an updated link NLRI for R1-R2 with a greater sequence number and advertises it to its neighboring nodes. Because of the R2-R3 failure, R3 cannot receive the updated link NLRI directly from R2, but it can receive it from R4. However, according to the NLRI selection rules, R3 prefers the link NLRI of R1-R2 that it received directly from R2, so it does not treat the copy received from R4 as the latest one. Consequently, R3 neither uses the latest link NLRI of R1-R2 for SPF computation nor advertises it to its neighbors, which delays convergence of the network.
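The effect of the rule ordering in this scenario can be shown with a small Python sketch of R3's choice. The sequence numbers 10 and 11 are made-up values, and each copy is reduced to the two properties that matter here: whether it was received directly from the originating peer, and its sequence number.

```python
# Each copy is (from_originator, sequence_number, description).
# The sequence numbers 10 and 11 are made-up values for illustration.
held_from_r2 = (True, 10, "copy held from R2 before the session failed")
relayed_by_r4 = (False, 11, "updated copy relayed by R4")


def existing_rules(copy):
    # The directly-connected-originator check comes first,
    # the sequence number is only compared afterwards.
    return (copy[0], copy[1])


selected = max([held_from_r2, relayed_by_r4], key=existing_rules)
newest = max([held_from_r2, relayed_by_r4], key=lambda copy: copy[1])

print("selected:", selected[2])
print("newest:  ", newest[2])
```

Because the first rule is evaluated before the sequence number, the held copy wins even though the copy relayed by R4 is newer.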
According to the rules in [I-D.ietf-lsvr-bgp-spf], among BGP-LS-SPF NLRIs with the same sequence number, the NLRI received from the speaker with the numerically larger BGP Identifier is preferred. In some cases, this can cause redundant advertisement of the same NLRI.

+----+  new  +----+         +----+       +----+
| R6 +-------+ R1 +---------+ R2 +-------+ R5 |
+----+       +-+--+         +-+--+       +----+
               |              |
               |              |
               |              |
               |              |
               |              |
             +-+--+         +-+--+
             | R3 +---------+ R4 |
             +----+         +----+

                    Figure 2

As shown in the example in Figure 2, a new BGP session is established between R1 and R6, and R1 advertises the link NLRI of R1-R6 to its neighboring nodes (R2 and R3). R2 first receives the link NLRI of R1-R6 directly from R1 and advertises it to its neighbors (R4 and R5). R4 receives the link NLRI of R1-R6 with the same sequence number from both R3 and R2. Following the rule of the numerically larger BGP Identifier, R4 prefers the NLRI received from R3 and then advertises this link NLRI of R1-R6 to R2. R2 in turn prefers the NLRI received from R4 under the same rule and advertises it to R5, which duplicates its previous advertisement of the same link NLRI.
In some scenarios, the BGP single-hop peering model is used between directly connected BGP nodes. When two or more parallel links exist between the BGP nodes, multiple BGP sessions are established between the peering nodes, and each session is used for the distribution of BGP-LS-SPF NLRIs.
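The redundant advertisement at R2 can be reproduced with a short Python sketch. It models only the sequence number and BGP Identifier comparison that the paragraph above relies on, and it assumes that a speaker re-advertises whenever its preferred copy changes. The `advertisements` helper, the dictionary layout, and the BGP Identifier values (chosen so that R4's is larger than R1's) are all hypothetical.

```python
def advertisements(arrivals, key):
    """Return, in order, the peers whose copy triggered an advertisement."""
    best = None
    sent = []
    for copy in arrivals:
        # A strictly better copy replaces the current one and is re-advertised.
        if best is None or key(copy) > key(best):
            best = copy
            sent.append(copy["peer"])
    return sent


# Copies of the link NLRI R1-R6 in the order R2 receives them.
# Identical sequence number; BGP Identifiers are assumed values.
arrivals_at_r2 = [
    {"peer": "R1", "sequence": 1, "bgp_id": 1},
    {"peer": "R4", "sequence": 1, "bgp_id": 4},
]


def tie_break(copy):
    return (copy["sequence"], copy["bgp_id"])


triggered = advertisements(arrivals_at_r2, tie_break)
print(triggered)
```

Under these assumptions R2 advertises twice toward R5, once for each copy, although both carry the same link-state information.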

             parallel BGP sessions
+----+       +----+         +----+       +----+
|    |       |    +---------+    |       |    |
| R3 +-------+ R1 +---------+ R2 +-------+ R4 |
+----+       +-+--+         +-+--+       +----+

                    Figure 3

As shown in the example of Figure 3, there are two parallel links between R1 and R2, and a separate BGP session is established on each link. Under the existing BGP-LS-SPF NLRI selection rules, from R2's perspective, for the same NLRI with the same sequence number, either the route received from peer R1.1 or the route received from peer R1.2 may be selected as the best. To facilitate network operation and troubleshooting, NLRI selection should be deterministic once the network reaches a relatively stable state. Rules for selecting the preferred NLRI among parallel peering sessions are therefore needed.
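The ambiguity can be made concrete with a minimal Python sketch, assuming the two sessions are distinguished only by peer address. The addresses, sequence number, and BGP Identifier below are made-up values, and the dictionary layout is hypothetical. Both copies compare equal under the existing rules, so the outcome depends on which copy happens to be examined first.

```python
def existing_key(copy):
    # Existing rules: directly connected originator, sequence number, BGP Identifier.
    return (copy["from_originator"], copy["sequence"], copy["bgp_id"])


# The same NLRI received from R1 over two parallel sessions (assumed values).
session_1 = {"peer_address": "192.0.2.1", "from_originator": True,
             "sequence": 7, "bgp_id": 1}
session_2 = {"peer_address": "192.0.2.5", "from_originator": True,
             "sequence": 7, "bgp_id": 1}

# max() keeps the first of several equal maxima, so arrival order decides.
first = max([session_1, session_2], key=existing_key)
second = max([session_2, session_1], key=existing_key)

print(first["peer_address"], second["peer_address"])
```

The two calls return different sessions for identical input, which is the non-determinism described above.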
This document proposes to update the selection rules for all BGP-LS-SPF NLRI as follows:
1. NLRI originated by directly connected BGP SPF peers SHOULD be preferred.
2. The NLRI with the most recent Sequence Number TLV, i.e., the highest sequence number, SHOULD be selected.
3. For NLRIs received from EBGP peers, the NLRI with the smaller number of AS numbers in the AS_PATH attribute SHOULD be preferred.
4. For NLRIs received from IBGP peers, the NLRI with the smaller number of Cluster IDs in the CLUSTER_LIST attribute SHOULD be preferred.
5. The NLRI received from the BGP SPF speaker with the numerically larger BGP Identifier SHOULD be preferred.
6. NLRI received from the BGP SPF peer with the smaller peer address SHOULD be preferred.
New rules 3 and 4 address the duplicated advertisement problem described in Section 2.2. New rule 6 addresses the non-deterministic selection problem described in Section 2.3.
There are several options for solving the problem illustrated in Section 2.1. The details will be discussed further and documented in a future version of this document.
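As an illustration only, the six proposed rules can be sketched as a single Python sort key. The `Candidate` type and its field names are hypothetical. The sketch assumes that all candidates being compared arrive over the same session type (all EBGP or all IBGP), so one `path_length` field stands for the AS_PATH length or the CLUSTER_LIST length, and that all peer addresses belong to the same address family.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One received copy of a BGP-LS-SPF NLRI (hypothetical model)."""
    from_originator: bool  # rule 1: received directly from the originating peer
    sequence: int          # rule 2: Sequence Number TLV value
    path_length: int       # rules 3 and 4: AS_PATH or CLUSTER_LIST length
    peer_bgp_id: int       # rule 5: BGP Identifier of the advertising speaker
    peer_address: str      # rule 6: address of the peering session


def proposed_key(c):
    # Larger tuples win, so "smaller is preferred" fields are negated.
    return (
        c.from_originator,
        c.sequence,
        -c.path_length,
        c.peer_bgp_id,
        -int(ipaddress.ip_address(c.peer_address)),
    )


def select(candidates):
    return max(candidates, key=proposed_key)


# Parallel sessions: identical except for the peer address (assumed values).
link_a = Candidate(True, 7, 1, 1, "192.0.2.5")
link_b = Candidate(True, 7, 1, 1, "192.0.2.1")
print(select([link_a, link_b]).peer_address)
print(select([link_b, link_a]).peer_address)

# Equal sequence numbers: the shorter path wins despite the smaller BGP Identifier.
short_path = Candidate(False, 1, 1, 1, "192.0.2.9")
long_path = Candidate(False, 1, 3, 4, "192.0.2.13")
print(select([long_path, short_path]).path_length)
```

In this sketch the copy with the smaller peer address is chosen regardless of arrival order, and a copy that has travelled a longer path cannot displace an equally recent copy with a shorter one.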
This document makes no request of IANA.
The mechanism described in this document updates the NLRI selection rules for BGP-LS-SPF. It does not introduce any security considerations beyond those described in [RFC4271] and [RFC4272].
The authors would like to thank Haibo Wang, Jun Ge and Li Zhang for the valuable discussion and suggestions.