Internet-Draft | NNHN | December 2023 |
Wang & Haas | Expires 16 June 2024 | [Page] |
BGP speakers learn the next hop address for NLRI from the NEXT_HOP field in [RFC4271] and from the "Network Address of Next Hop" field in [RFC4760]. Under certain circumstances, it might be desirable for a BGP speaker to know both the next hops and the next-next hops of NLRI to make optimal forwarding decisions. One such example is global load balancing (GLB) in a Clos network.¶
[I-D.ietf-idr-entropy-label] defines the "Next Hop Dependent Capabilities Attribute" (NHC) which allows a BGP speaker to signal the forwarding capabilities associated with a given next hop.¶
This document defines a new NHC capability, the Next-next Hop Nodes (NNHN) capability, which can be used to advertise the next-next hop nodes associated with a given next hop.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 16 June 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
BGP speakers learn the next hop address for NLRI from the NEXT_HOP field in [RFC4271] and from the "Network Address of Next Hop" field in [RFC4760]. Under certain circumstances, it might be desirable for a BGP speaker to know both the next hops and the next-next hops of NLRI to make optimal forwarding decisions. One such example is global load balancing (GLB) in a Clos network.¶
When a route's ECMP has multiple next hops, packets forwarded using that ECMP are hashed to the member next hops for load balancing purposes. If one of the member next hop links is congested due to uneven hashing, dynamic load balancing (DLB) allows the node to adjust the hashing so that the congestion on that link can be mitigated. When all next hop links are congested, DLB on the local node cannot mitigate the congestion. Such a node requires help from the previous hop(s) to shift traffic towards alternative nodes. This process is called global load balancing.¶
In a Clos network, a congested link will affect the load balancing decisions of the previous layer nodes equally. Because of this, the previous-previous layer nodes do not need to change their load balancing decisions towards the previous layer nodes to mitigate this link congestion. This means we only need to know the link congestion status of the next-next hops of a given BGP route in order to make GLB decisions. The combined link quality of each next hop and its corresponding next-next hops can be used as the feedback for DLB.¶
The purpose of this document is to provide a method for BGP to learn the next-next hops, or more specifically, the next-next hop nodes. When a next hop node has more than one next-next hop towards a next-next hop node, DLB helps to balance the load between the multiple next-next hops by locally adjusting the volume of traffic hashed over a given ECMP member link. Thus, only the overall link congestion between the next hop node and the next-next hop node is important for GLB.¶
Note that the mechanism for detecting link congestion and communicating it to the previous hop nodes is out of the scope of this document.¶
This document defines a new NHC capability, the Next-next Hop Nodes (NNHN) capability, for the BGP Next Hop Dependent Capabilities Attribute (NHC) defined in [I-D.ietf-idr-entropy-label]. A downstream BGP speaker can use the NNHN to advertise the next-next hop nodes corresponding to the next hop of an NLRI. This allows the upstream BGP speaker to learn the next-next hop nodes corresponding to each of its next hop nodes.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
[I-D.ietf-idr-entropy-label] defines NHC as a container for capability TLVs. Next-next Hop Nodes is one such capability. It specifies the next-next hop nodes corresponding to the next hop field in the NHC.¶
The NNHN TLV has the NHC capability code TBD. The NHC capability length specifies the remaining number of octets in the NNHN TLV. The NNHN capability format is shown in Figure 1:¶
All procedures from Section 2.2 of [I-D.ietf-idr-entropy-label] apply.¶
When a BGP speaker S has a BGP route R it wishes to advertise with next hop self to its peer, it MAY choose to originate an NNHN capability. The "Next-hop BGP ID" field MUST be set to the BGP Identifier this BGP speaker uses with the peer.¶
For all the ECMP paths of route R which are used for forwarding, the BGP Identifiers of those BGP peers MUST be encoded as the "Next-next-hop BGP IDs". When more than one path is from the same BGP peer, the capability MUST have only one BGP Identifier of that peer.¶
When there is more than one "Next-next-hop BGP ID" in the capability, they MUST be encoded in numerically ascending order, treating each BGP Identifier as a 32-bit unsigned integer encoded in network byte order.¶
An NNHN with no "Next-next-hop BGP IDs" MUST NOT be sent.¶
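The origination rules above can be sketched in Python. This is a minimal illustration, not a definitive encoder. Figure 1 is not reproduced here, so the layout is an assumption: one 4-octet Next-hop BGP ID followed by one or more 4-octet Next-next-hop BGP IDs, which is consistent with the length rules stated elsewhere in this document. The function name `build_nnhn_value` and the use of dotted-quad strings for BGP Identifiers are hypothetical, and the NHC capability code and length header are deliberately left out.

```python
import ipaddress
import struct


def build_nnhn_value(next_hop_bgp_id: str, ecmp_peer_bgp_ids: list[str]) -> bytes:
    """Encode the value part of an NNHN capability (hypothetical helper).

    Assumed layout: a 4-octet Next-hop BGP ID followed by one or more
    4-octet Next-next-hop BGP IDs. The NHC code/length header is not built.
    """
    # Several ECMP paths from the same peer collapse into one Identifier;
    # the result is sorted as unsigned 32-bit integers.
    nnh_ids = sorted({int(ipaddress.IPv4Address(i)) for i in ecmp_peer_bgp_ids})
    if not nnh_ids:
        # An NNHN with no Next-next-hop BGP IDs must not be sent.
        raise ValueError("no Next-next-hop BGP IDs to advertise")
    next_hop = int(ipaddress.IPv4Address(next_hop_bgp_id))
    # "!" selects network byte order; each "I" is one 32-bit unsigned integer.
    return struct.pack(f"!I{len(nnh_ids)}I", next_hop, *nnh_ids)
```

For example, calling `build_nnhn_value("10.0.0.1", ["10.0.0.9", "10.0.0.3", "10.0.0.9"])` yields 12 octets: the speaker's own Identifier, then 10.0.0.3 and 10.0.0.9 in ascending order with the duplicate removed.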
When a BGP speaker S has a BGP route R it wishes to advertise with next hop self to its peer, it MUST NOT forward the NNHN capability received from downstream peers. It either originates its own NNHN capability as described above or does not send one.¶
When a BGP speaker S has a BGP route R it wishes to advertise with the next hop that has not been set to self, it MUST NOT originate an NNHN capability. However, if an NNHN capability has been received for route R and passed the NHC validation as defined in [I-D.ietf-idr-entropy-label], the NNHN capability SHOULD be forwarded.¶
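The sending rules for the two next hop cases can be summarized as a small decision function. This is a sketch under stated assumptions: the function name `nnhn_to_advertise` and its parameters are hypothetical, `received_valid_nnhn` is assumed to be `None` unless a received capability already passed NHC validation, and the SHOULD for forwarding is modeled as always forwarding.

```python
from typing import Optional


def nnhn_to_advertise(
    next_hop_self: bool,
    originate: bool,
    own_nnhn: Optional[bytes],
    received_valid_nnhn: Optional[bytes],
) -> Optional[bytes]:
    """Pick the NNHN capability to attach to an advertisement, if any."""
    if next_hop_self:
        # Next hop set to self: a received NNHN is never forwarded.
        # The speaker either originates its own (MAY) or sends none.
        return own_nnhn if originate else None
    # Next hop not set to self: originating is not allowed, but a
    # validated received NNHN is forwarded unchanged.
    return received_valid_nnhn
```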
All procedures from Section 2.3 of [I-D.ietf-idr-entropy-label] apply.¶
When a BGP speaker wishes to enforce hop-by-hop eBGP propagation of the NNHN and the received NNHN capability's Next-hop BGP Identifier does not match the BGP Identifier of the BGP speaker from which the UPDATE was received, the NNHN capability MUST be ignored and discarded.¶
The receiver of the NNHN capability MUST be able to handle any order of the "Next-next-hop BGP IDs".¶
Duplicate BGP Identifiers in the "Next-next-hop BGP IDs" MUST be silently ignored.¶
The details of the use of the NNHN capability for global load balancing are out of the scope of this document.¶
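The receiver checks above might look like the following sketch. It assumes the capability value has already passed length validation and uses the same assumed layout as the rest of this document's rules imply (a 4-octet Next-hop BGP ID followed by 4-octet Next-next-hop BGP IDs). The name `accept_nnhn` and the `enforce_hop_by_hop` flag are hypothetical.

```python
import struct
from typing import Optional


def accept_nnhn(
    value: bytes, peer_bgp_id: int, enforce_hop_by_hop: bool
) -> Optional[list[int]]:
    """Return the unique Next-next-hop BGP IDs, or None if the NNHN is ignored.

    Assumes len(value) is at least 8 and divisible by 4.
    """
    # Decode every 4-octet field as a network byte order unsigned integer.
    next_hop_id, *nnh_ids = struct.unpack(f"!{len(value) // 4}I", value)
    if enforce_hop_by_hop and next_hop_id != peer_bgp_id:
        # The capability was not originated by the peer that sent the UPDATE.
        return None
    # Accept any order and silently drop duplicates.
    return sorted(set(nnh_ids))
```

Normalizing to a sorted list of unique values is a design choice of this sketch; it makes the result independent of the order in which the sender encoded the Identifiers.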
The NNHN capability length MUST be at least 8 and MUST be divisible by 4, otherwise it is malformed. Malformed NNHN capabilities MUST be discarded and SHOULD be logged.¶
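The length check can be expressed directly. In this sketch, `value` is assumed to hold exactly the octets counted by the NHC capability length, and the function name `is_wellformed_nnhn` is hypothetical. Logging through the standard `logging` module stands in for whatever logging facility an implementation uses.

```python
import logging

log = logging.getLogger("nnhn")


def is_wellformed_nnhn(value: bytes) -> bool:
    """Check the NNHN capability length rules: at least 8, divisible by 4."""
    length = len(value)
    if length < 8 or length % 4 != 0:
        # Malformed capabilities are discarded by the caller and logged here.
        log.warning("discarding malformed NNHN capability, length %d", length)
        return False
    return True
```

A length of 8 corresponds to the smallest valid capability under the assumed layout: one Next-hop BGP ID and one Next-next-hop BGP ID.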
Since BGP Identifiers are used to identify the next-next hop nodes, they need to be unique across the network where the NNHN capability is sent.¶
A new capability code, TBD, will be requested from the "BGP Next Hop Dependent Capability Codes" registry of the Border Gateway Protocol (BGP) Parameters group for the NNHN capability defined in this document.¶
Insertion of a syntactically valid but bogus NNHN capability by an attacker could potentially make the forwarding behavior of the route non-optimal.¶
An alternative way to carry next-next hops is via a separate path attribute. We evaluated both approaches and chose the NNHN capability approach for several reasons:¶
TBD.¶
TBD.¶