Internet-Draft | BGP Route Broker | April 2024 |
Xu, et al. | Expires 27 October 2024 | [Page] |
This document describes an optimized BGP route reflector mechanism, referred to as a BGP route broker, that allows BGP-based IP VPN to be used as an overlay routing protocol in a scalable way in hyperscale data center network virtualization environments, also known as Software-Defined Networking (SDN) environments.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 27 October 2024.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
BGP/MPLS IP VPN has been deployed successfully in service provider networks worldwide for two decades, which demonstrates that it scales well in large networks. In this document, "BGP/MPLS IP VPN" refers to both BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659]. In addition, BGP/MPLS IP VPN-based data center network virtualization approaches as described in [RFC7814], especially the virtual PE model described in [I-D.ietf-bess-virtual-pe], have been widely deployed in small and medium-sized data centers for network virtualization, also known as Software-Defined Networking (SDN). Examples include, but are not limited to, OpenContrail.¶
Hyperscale cloud data centers typically contain tens of thousands of servers, which are virtualized into Virtual Machines (VMs) or containers. Assuming the virtual PE model is used, this implies at least tens of thousands of virtual PEs, millions of VPNs, and tens of millions of VPN routes from the network virtualization perspective. Such a scale poses a significant challenge to the BGP session capacity and the VPN routing table capacity of any single BGP router.¶
Route reflection (RR) is crucial for addressing BGP scaling issues. If a one-level route reflector architecture is used, the VPN routes supported by a data center can be divided among multiple route reflectors by preconfiguring each route reflector with a block of route targets associated with a subset of the VPNs. As a result, no single route reflector needs to maintain all the VPN routes supported by the data center. For redundancy, more than one route reflector should be preconfigured with the same block of route targets to form an RR cluster.¶
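The partitioning described above can be sketched as a lookup from a route target to the RR cluster preconfigured with it. This is a minimal, non-authoritative Python illustration: the cluster names, route target values, the `RR_CLUSTERS` table, and both helper functions are hypothetical, and each VPN route is assumed to carry a single export route target.

```python
"""Sketch: partitioning VPN route targets among RR clusters.

Hypothetical preconfiguration; each VPN route is assumed to carry a
single export route target.
"""
from typing import Optional

# Hypothetical blocks: every RR in a cluster is preconfigured with the
# same block of route targets, which provides redundancy.
RR_CLUSTERS: dict[str, set[str]] = {
    "rr-cluster-a": {"64500:1", "64500:2", "64500:3"},
    "rr-cluster-b": {"64500:4", "64500:5", "64500:6"},
}


def cluster_for_route_target(rt: str) -> Optional[str]:
    """Return the RR cluster whose preconfigured block contains rt."""
    for cluster, block in RR_CLUSTERS.items():
        if rt in block:
            return cluster
    return None  # route target not assigned to any cluster


def routes_kept_by(cluster: str,
                   routes: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the (prefix, route_target) pairs this cluster stores."""
    block = RR_CLUSTERS[cluster]
    return [(prefix, rt) for prefix, rt in routes if rt in block]


vpn_routes = [("10.0.0.0/24", "64500:1"), ("10.0.1.0/24", "64500:5")]
print(cluster_for_route_target("64500:5"))         # rr-cluster-b
print(routes_kept_by("rr-cluster-a", vpn_routes))  # [('10.0.0.0/24', '64500:1')]
```

Each cluster stores only the routes whose route target falls in its block, so the routing table load is divided among clusters while every member of a cluster holds the same subset.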
However, if every virtual PE is attached to at least one VPN served by a given route reflector, that route reflector has to establish BGP sessions with all virtual PEs, which places a heavy BGP session load on the route reflectors. To solve this scaling issue, another level (i.e., the bottom level) of route reflectors can be introduced between the existing (i.e., top-level) route reflectors and the virtual PEs. Each top-level route reflector then establishes BGP sessions with all bottom-level route reflectors rather than with all virtual PEs, and each bottom-level route reflector only needs to establish BGP sessions with a subset of the virtual PEs. Partitioning the virtual PEs among bottom-level route reflectors in this way therefore solves the BGP session capacity issue mentioned above.¶
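A back-of-envelope calculation illustrates why the bottom level helps. The sketch below assumes, hypothetically, that each virtual PE peers with exactly one bottom-level route reflector (redundancy is ignored) and that virtual PEs are spread evenly across bottom-level route reflectors; the input numbers are illustrative only, not measurements.

```python
"""Back-of-envelope BGP session counts (hypothetical inputs).

Assumes each virtual PE peers with exactly one bottom-level RR
(redundancy ignored) and virtual PEs are spread evenly over them.
"""
import math


def one_level_sessions_per_rr(num_vpes: int) -> int:
    # Worst case from the text: every virtual PE is attached to at least
    # one VPN served by this RR, so the RR must peer with all of them.
    return num_vpes


def two_level_sessions(num_vpes: int, num_top: int,
                       num_bottom: int) -> tuple[int, int]:
    """Return (sessions per top-level RR, sessions per bottom-level RR)."""
    per_top = num_bottom  # one session to every bottom-level RR
    # Client sessions to its share of virtual PEs, plus one session
    # to every top-level RR.
    per_bottom = math.ceil(num_vpes / num_bottom) + num_top
    return per_top, per_bottom


print(one_level_sessions_per_rr(40_000))   # 40000
print(two_level_sessions(40_000, 8, 100))  # (100, 408)
```

With these illustrative inputs, each top-level route reflector peers with 100 routers instead of 40,000, and each bottom-level route reflector with 408.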
In a two-level RR hierarchy within hyperscale data centers, using the Route Target Constraint (RTC) mechanism [RFC4684] has two potential drawbacks. First, it can be difficult to partition all the VPN routes supported by the data center among multiple top-level RRs. Second, virtual PEs may have to receive RT membership NLRIs for all route targets supported by the data center, which unnecessarily consumes the CPU and memory resources of virtual PEs.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The bottom-level route reflectors, referred to as route brokers, are modeled on high-performance message queuing systems such as RabbitMQ. Route brokers maintain the route target membership information of their iBGP peers and reflect VPN routes among them on demand. In essence, route brokers act as the message brokers (or exchanges) of a message queuing system, whereas the top-level route reflectors, referred to as route collection servers, and the virtual PEs, referred to as route broker clients, act both as message publishers (producers) and as message subscribers (consumers).¶
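The publish/subscribe analogy can be illustrated with a small in-memory model in which route target membership acts as a subscription and a VPN route is delivered to every peer subscribed to its route target. This is a hypothetical Python sketch: it does not speak BGP, does not use the RabbitMQ API, ignores which kind of peer sent the route, and the `RouteBroker` class and peer names are invented for illustration.

```python
"""Sketch: a route broker modelled as an exchange keyed by route target.

Illustrative in-memory model only: no BGP, no RabbitMQ API.
"""
from collections import defaultdict


class RouteBroker:
    """Hypothetical broker: RT membership acts as a subscription."""

    def __init__(self) -> None:
        # route target -> peers that advertised membership for it
        self.subscribers: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, peer: str, route_targets: set[str]) -> None:
        """Record a peer's route target membership."""
        for rt in route_targets:
            self.subscribers[rt].add(peer)

    def publish(self, sender: str, route_target: str) -> set[str]:
        """Return the peers that should receive a route with this RT.

        The sender is excluded so a route is never reflected back to it.
        """
        return self.subscribers.get(route_target, set()) - {sender}


broker = RouteBroker()
broker.subscribe("vpe-1", {"64500:1"})
broker.subscribe("vpe-2", {"64500:1", "64500:2"})
broker.subscribe("rcs-1", {"64500:1", "64500:2"})
print(sorted(broker.publish("vpe-1", "64500:1")))  # ['rcs-1', 'vpe-2']
```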
Each route collection server advertises route target membership information for the block of route targets preconfigured on it. As a result, route brokers learn which VPNs have been assigned to each route collection server.¶
Route brokers advertise default route target membership information to their route broker clients in order to collect all VPN routes from those clients and then reflect them to the route collection servers.¶
Route broker clients advertise route target membership information for the block of route targets dynamically configured on each of them. Upon receiving such an advertisement, route brokers dispatch the received route target membership information to the route collection servers whose preconfigured blocks of route targets cover the advertised route targets.¶
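The dispatching step can be sketched as a set intersection between the route targets a client advertises and the blocks learned from each route collection server. In this hypothetical sketch, the server names, route targets, and the `dispatch_membership` function are assumptions; the Route Target ORF encoding is not modelled, and each advertisement is dispatched independently rather than aggregated across clients.

```python
"""Sketch: dispatching a client's RT membership to collection servers.

Hypothetical model; the Route Target ORF encoding is not modelled.
"""

# Blocks learned from each route collection server's own RT membership
# advertisement (hypothetical values).
COLLECTION_SERVER_BLOCKS: dict[str, set[str]] = {
    "rcs-1": {"64500:1", "64500:2"},
    "rcs-2": {"64500:3", "64500:4"},
}


def dispatch_membership(client_rts: set[str]) -> dict[str, set[str]]:
    """Map each collection server to the subset of client RTs it covers.

    Servers covering none of the advertised RTs are omitted, so they
    are not sent anything for this advertisement.
    """
    plan: dict[str, set[str]] = {}
    for server, block in COLLECTION_SERVER_BLOCKS.items():
        covered = client_rts & block
        if covered:
            plan[server] = covered
    return plan


print(dispatch_membership({"64500:1", "64500:3"}))
```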
The advertisement of route target membership information is based on Route Target Outbound Route Filtering (ORF) as defined in [I-D.xu-idr-route-target-orf].¶
Upon receiving from a route collection server a route update message that contains VPN routes for a given VPN, if those VPN routes are selected as best routes, route brokers store them in their local RIBs and then reflect them to those of their route broker clients that are associated with that VPN. When reflecting these VPN routes, route brokers SHOULD prepend their cluster ID to the CLUSTER_LIST attribute.¶
Upon receiving from a route broker client a route update message that contains VPN routes for a given VPN, if those VPN routes are selected as best routes, route brokers store them in their local RIBs and then reflect them to the other iBGP peers (including route collection servers and other route broker clients) that are associated with that VPN. When reflecting these VPN routes, route brokers SHOULD prepend their cluster ID to the CLUSTER_LIST attribute.¶
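The two reflection rules differ only in which peers are eligible receivers. The following hypothetical Python sketch models them; it assumes best-path selection has already accepted the route, that each route carries one export route target, and that peers are tagged as `"server"` (route collection server) or `"client"` (route broker client). The `Broker` and `VpnRoute` names are invented, and the cluster list is kept with the most recently prepended ID first.

```python
"""Sketch of a route broker's reflection rules (hypothetical model)."""
from dataclasses import dataclass, field


@dataclass
class VpnRoute:
    prefix: str
    route_target: str  # assumed single export RT per route
    cluster_list: list[int] = field(default_factory=list)


@dataclass
class Broker:
    cluster_id: int
    roles: dict[str, str]            # peer -> "server" or "client"
    membership: dict[str, set[str]]  # peer -> RTs the peer advertised
    rib: list[tuple[str, VpnRoute]] = field(default_factory=list)

    def reflect(self, sender: str,
                route: VpnRoute) -> list[tuple[str, VpnRoute]]:
        """Store an already-selected best route; return (peer, copy) pairs."""
        self.rib.append((sender, route))
        from_server = self.roles[sender] == "server"
        out: list[tuple[str, VpnRoute]] = []
        for peer, rts in self.membership.items():
            if peer == sender or route.route_target not in rts:
                continue  # never echo back; skip peers not in this VPN
            if from_server and self.roles[peer] != "client":
                continue  # server-learned routes go to clients only
            # Prepend this broker's cluster ID before reflecting.
            copy = VpnRoute(route.prefix, route.route_target,
                            [self.cluster_id] + route.cluster_list)
            out.append((peer, copy))
        return out


broker = Broker(
    cluster_id=7,
    roles={"rcs-1": "server", "vpe-1": "client", "vpe-2": "client"},
    membership={"rcs-1": {"64500:1"}, "vpe-1": {"64500:1"},
                "vpe-2": {"64500:1"}},
)
for peer, r in broker.reflect("vpe-1", VpnRoute("10.0.0.0/24", "64500:1")):
    print(peer, r.cluster_list)  # rcs-1 [7], then vpe-2 [7]
```

A route learned from a client reaches both the collection server and the other client in its VPN, whereas a route learned from a collection server would reach only clients.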
Upon receiving from a route broker client an implicit request for all the VPN routes of one or more VPNs (conveyed by its route target membership advertisement), route brokers SHOULD respond by sending the corresponding VPN routes stored in their local RIBs to that route broker client.¶
Upon receiving from a route collection server an implicit request for all the VPN routes of one or more VPNs (conveyed by its route target membership advertisement), route brokers SHOULD respond by sending that route collection server the corresponding VPN routes stored in their local RIBs that were learned from their own route broker clients.¶
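The difference between the two implicit requests can be sketched as a filter over the local RIB. In this hypothetical model, each RIB entry records whether the route was learned from a route broker client or from a route collection server; the `answer_membership` function and the tuple layout are assumptions made for illustration.

```python
"""Sketch: answering the implicit request in an RT membership advertisement."""

# Hypothetical RIB entry: (learned_from_role, prefix, route_target), where
# learned_from_role is "client" or "server".
RibEntry = tuple[str, str, str]


def answer_membership(requester_role: str, requested_rts: set[str],
                      rib: list[RibEntry]) -> list[tuple[str, str]]:
    """Return the (prefix, route_target) pairs to send to the requester.

    A route broker client gets every stored route for the requested RTs;
    a route collection server gets only routes learned from clients.
    """
    replies: list[tuple[str, str]] = []
    for learned_from, prefix, rt in rib:
        if rt not in requested_rts:
            continue
        if requester_role == "server" and learned_from != "client":
            continue
        replies.append((prefix, rt))
    return replies


rib = [("client", "10.0.0.0/24", "64500:1"),
       ("server", "10.0.1.0/24", "64500:1")]
print(answer_membership("client", {"64500:1"}, rib))  # both routes
print(answer_membership("server", {"64500:1"}, rib))  # [('10.0.0.0/24', '64500:1')]
```

Restricting the reply to client-learned routes avoids sending a route collection server routes that originated from the collection servers themselves.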
To simplify control of VPN route distribution, each VPN SHOULD be assigned a globally unique export route target value.¶
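Since data center SDN environments require the advertisement of multiple paths for a given VPN prefix, virtual PEs SHOULD be assigned different Route Distinguishers (RDs), so that routes for the same prefix from different virtual PEs remain distinct VPN routes rather than being reduced to a single best path by the route reflectors.¶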
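To see why distinct RDs matter, note that a route reflector keeps one best path per VPN NLRI, that is, per (RD, prefix) pair. The sketch below uses a toy tie-breaker (the lexicographically smallest next hop) as a hypothetical stand-in for real BGP best-path selection; the RD values, prefix, and next hops are illustrative documentation values.

```python
"""Sketch: why distinct RDs per virtual PE preserve multiple paths.

A reflector keeps one best path per VPN NLRI, i.e. per (RD, prefix).
The tie-breaker below is a toy stand-in for BGP best-path selection.
"""


def best_paths(advertisements: list[tuple[str, str, str]]) -> dict[tuple[str, str], str]:
    """advertisements: (rd, prefix, next_hop). Return best next hop per NLRI."""
    best: dict[tuple[str, str], str] = {}
    for rd, prefix, next_hop in advertisements:
        key = (rd, prefix)
        # Toy tie-breaker: keep the lexicographically smallest next hop.
        if key not in best or next_hop < best[key]:
            best[key] = next_hop
    return best


shared_rd = [("64500:100", "10.1.1.0/24", "192.0.2.1"),
             ("64500:100", "10.1.1.0/24", "192.0.2.2")]
distinct_rd = [("192.0.2.1:100", "10.1.1.0/24", "192.0.2.1"),
               ("192.0.2.2:100", "10.1.1.0/24", "192.0.2.2")]
print(len(best_paths(shared_rd)))    # 1: only one path survives
print(len(best_paths(distinct_rd)))  # 2: both paths are kept
```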
Virtual PEs SHOULD NOT establish BGP sessions with more than one cluster of route brokers configured with the same cluster ID.¶
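For context, cluster IDs are what standard BGP route reflection uses for loop prevention: a reflector prepends its cluster ID to the CLUSTER_LIST when reflecting a route and ignores any received route whose CLUSTER_LIST already contains its own cluster ID. The minimal sketch below shows only that check, with cluster IDs represented as plain integers; the function names are hypothetical and the ORIGINATOR_ID check is omitted.

```python
"""Sketch of CLUSTER_LIST loop prevention in BGP route reflection.

Cluster IDs are plain integers here; ORIGINATOR_ID checks are omitted.
"""


def prepend_cluster_id(local_cluster_id: int,
                       cluster_list: list[int]) -> list[int]:
    """Return the CLUSTER_LIST a reflector sends after reflecting a route."""
    return [local_cluster_id] + cluster_list


def accepts_route(local_cluster_id: int, cluster_list: list[int]) -> bool:
    """A reflector ignores a route that already carries its own cluster ID."""
    return local_cluster_id not in cluster_list


# A route reflected first by cluster 10 and then by cluster 20:
path = prepend_cluster_id(20, prepend_cluster_id(10, []))
print(path)                     # [20, 10]
print(accepts_route(30, path))  # True
print(accepts_route(10, path))  # False: loop detected
```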
This document has no IANA actions.¶
The authors would like to thank Robert Raszuk for their valuable comments and suggestions on this document.¶