Workgroup: Network Working Group
Internet-Draft: draft-xu-idr-bgp-route-broker-00
Published: June 2023
Intended Status: Standards Track
Expires: 1 January 2024
Author: X. Xu, China Mobile

BGP Route Broker for Hyper-scale SDN

Abstract

This document describes an optimized BGP route reflector mechanism, referred to as a BGP route broker, that allows BGP-based IP VPN to be used as an overlay routing protocol in hyper-scale data center network virtualization environments, also known as Software-Defined Networking (SDN) environments.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 1 January 2024.

1. Problem Statement

BGP/MPLS IP VPN has been deployed in service provider networks worldwide for two decades and has proven scalable in large networks. Here, BGP/MPLS IP VPN means both BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659]. In addition, the BGP/MPLS IP VPN-based data center network virtualization approaches described in [RFC7814], and especially the virtual PE model described in [I-D.ietf-bess-virtual-pe], have been widely deployed in small to medium-sized data centers for network virtualization, also known as Software-Defined Networking (SDN). Examples include, but are not limited to, OpenContrail.

Hyperscale cloud data centers typically house tens of thousands of servers, which in turn host Virtual Machines (VMs) or containers. If the virtual PE model mentioned above (i.e., a host-based network virtualization model) is used, such a data center would have at least tens of thousands of virtual PEs, millions of VPNs, and tens of millions of VPN routes. This poses a significant challenge to both the BGP session capacity and the VPN routing table capacity of any given BGP router.

The route reflection mechanism is the obvious candidate for addressing the BGP scaling issues described above. Assuming a typical one-level route reflector architecture, it is straightforward to partition all the VPNs supported by a data center among multiple route reflectors, with each route reflector preconfigured with a block of route targets associated with a subset of the VPNs. In other words, no single route reflector needs to maintain the VPN routes for all the VPNs supported by the data center. For redundancy, more than one route reflector may be preconfigured with the same block of route targets.
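
As a rough illustration of the partitioning described above, the following Python sketch splits a list of route targets into contiguous blocks and gives every route reflector in a redundancy group the same block. The function name, the reflector names, the route target values, and the contiguous-block policy are all assumptions made for illustration; this document does not specify how blocks are chosen.

```python
from typing import Dict, List


def assign_rt_blocks(route_targets: List[str],
                     reflector_groups: List[List[str]]) -> Dict[str, List[str]]:
    """Split route_targets into one contiguous block per reflector group.

    Every reflector in a group is preconfigured with the same block,
    which models the redundancy described in the text.
    """
    if not reflector_groups:
        raise ValueError("at least one reflector group is required")
    n = len(reflector_groups)
    # Ceiling division so that every route target lands in some block.
    block_size = -(-len(route_targets) // n)
    config: Dict[str, List[str]] = {}
    for i, group in enumerate(reflector_groups):
        block = route_targets[i * block_size:(i + 1) * block_size]
        for reflector in group:
            config[reflector] = list(block)
    return config


# Hypothetical example: four route targets, two redundant pairs.
rts = ["65000:1", "65000:2", "65000:3", "65000:4"]
groups = [["rr1a", "rr1b"], ["rr2a", "rr2b"]]
config = assign_rt_blocks(rts, groups)
```

In this example, rr1a and rr1b both hold the first two route targets, and rr2a and rr2b both hold the last two, so no single reflector holds routes for every VPN.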

If every virtual PE is attached to at least one VPN handled by a given route reflector, that route reflector has to establish BGP sessions with all virtual PEs, which places a heavy BGP session load on it. Now assume that another level (bottom-level) of route reflectors is introduced between the existing level (top-level) of route reflectors and the virtual PEs. Each top-level route reflector establishes BGP sessions with all bottom-level route reflectors rather than with all virtual PEs, and each bottom-level route reflector only needs to establish BGP sessions with a subset of the virtual PEs. As a result, this partitioning resolves the BGP session capacity issue.

However, if the VPNs attached to the route reflector clients (i.e., virtual PEs) of a given bottom-level route reflector together cover all the VPNs supported by the data center, that bottom-level route reflector has to hold all the VPNs and all the VPN routes, which is a heavy burden on that route reflector.
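
The effect of the second level on session counts can be checked with a small calculation. The sketch below is illustrative only: the figures (50,000 virtual PEs, 10 top-level and 100 bottom-level route reflectors) are hypothetical, and it assumes that virtual PEs are spread evenly over the bottom-level route reflectors and that every bottom-level route reflector peers with every top-level one.

```python
from typing import Dict


def one_level_sessions(num_vpes: int) -> int:
    """Worst case: a route reflector peers with every virtual PE."""
    return num_vpes


def two_level_sessions(num_vpes: int, num_top: int,
                       num_bottom: int) -> Dict[str, int]:
    """Per-router session counts once a bottom level is inserted."""
    # Ceiling division: the busiest bottom-level reflector's share.
    vpes_per_bottom = -(-num_vpes // num_bottom)
    return {
        # A top-level reflector peers only with bottom-level reflectors.
        "top": num_bottom,
        # A bottom-level reflector peers with all top-level reflectors
        # plus its own subset of virtual PEs.
        "bottom": num_top + vpes_per_bottom,
    }


# Hypothetical figures, not taken from any real deployment.
flat = one_level_sessions(50_000)
tiered = two_level_sessions(50_000, num_top=10, num_bottom=100)
```

Under these assumptions the per-router session count drops from 50,000 to 100 on a top-level route reflector and 510 on a bottom-level one. The calculation says nothing about route table size, which is the remaining problem.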

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Solution Overview

Assume that the number of BGP sessions on each bottom-level route reflector cannot be reduced further for some reason (e.g., managing too many route reflectors becomes unacceptable). In that case, the number of VPN routes maintained on each bottom-level route reflector has to be reduced by some other means.

Borrowing from message queue systems (e.g., RabbitMQ and RocketMQ), the bottom-level route reflectors, referred to as route brokers in the rest of this document, work as follows: they only need to maintain the route target membership information of their BGP peers and reflect VPN routes on demand, without having to store VPN routes permanently.

3. Route Target Membership Advertisement Process

Top-level route reflectors, referred to as route servers, advertise route target membership information according to their preconfigured blocks of Route Targets. As such, route brokers know which VPNs are associated with each route server. The route target membership information received from route servers SHOULD NOT be reflected by route brokers to any other iBGP peers.

Virtual PEs, referred to as route broker clients, advertise route target membership information according to their dynamically configured Route Targets. Route brokers treat the route target membership information received from a route broker client as an implicit request for all the VPN routes of the VPNs associated with the advertised route targets. This information only needs to be reflected to the route servers that are associated with those VPNs.
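
The two membership rules in this section can be sketched as a small table kept by a route broker. The class and method names and the peer identifiers below are hypothetical; this is one possible way to model the behavior, not a prescribed data structure.

```python
from collections import defaultdict
from typing import Dict, Set


class MembershipTable:
    """Route target membership as seen by one route broker (sketch)."""

    def __init__(self) -> None:
        self.servers_by_rt: Dict[str, Set[str]] = defaultdict(set)
        self.clients_by_rt: Dict[str, Set[str]] = defaultdict(set)

    def on_server_membership(self, server: str, rt: str) -> Set[str]:
        """Record a route server's advertisement.

        Returns the peers to reflect it to, which is always empty:
        membership from route servers is not reflected further.
        """
        self.servers_by_rt[rt].add(server)
        return set()

    def on_client_membership(self, client: str, rt: str) -> Set[str]:
        """Record a client's advertisement (an implicit route request).

        Returns only the route servers associated with that route target.
        """
        self.clients_by_rt[rt].add(client)
        return set(self.servers_by_rt.get(rt, set()))


# Hypothetical peers and route targets.
table = MembershipTable()
table.on_server_membership("rs1", "65000:1")
table.on_server_membership("rs2", "65000:2")
targets = table.on_client_membership("vpe7", "65000:1")
```

Here the advertisement from vpe7 for route target 65000:1 is reflected to rs1 only, because rs2 is not associated with that route target.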

4. Proactive Route Distribution Process

Upon receiving a route update message containing VPN routes for a given VPN from an iBGP peer (e.g., a route server or a route broker client), a route broker reflects the received routes to the other iBGP peers that are associated with that VPN. Once the reflection is complete, the route broker deletes those routes.
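
The stateless reflection described above can be sketched as a pure function: it computes what to send and keeps nothing afterwards. The function name, the peers_by_rt mapping, and the peer identifiers are assumptions for illustration, and the sketch identifies a VPN by a single route target. Ordinary route reflection rules, such as loop prevention, are not modeled.

```python
from typing import Dict, List, Set, Tuple


def reflect_update(sender: str, rt: str, routes: List[str],
                   peers_by_rt: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return the (peer, route) pairs to send for one update.

    The routes are never stored: once the returned messages are sent,
    the broker holds no copy of them.
    """
    # Every peer associated with the route target, except the sender.
    receivers = sorted(peers_by_rt.get(rt, set()) - {sender})
    return [(peer, route) for peer in receivers for route in routes]


# Hypothetical membership: one route server and two clients share an RT.
peers_by_rt = {"65000:1": {"rs1", "vpe1", "vpe2"}}
sent = reflect_update("vpe1", "65000:1", ["10.0.0.0/24"], peers_by_rt)
```

In this example the update from vpe1 is reflected to rs1 and vpe2, and the only state the broker keeps is the membership mapping.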

5. Route Request and Response Process

Upon receiving an implicit route request for all the VPN routes of one or more VPNs (conveyed by a route target membership advertisement) from a route broker client, a route broker SHOULD reflect that request to the route servers that are associated with the VPNs corresponding to the advertised route targets.

Upon receiving the implicit route request reflected by the route broker, a route server SHOULD respond with the corresponding VPN routes to that route broker, which in turn reflects the received VPN routes to the route broker client. Once the reflection is complete, the route broker deletes the received VPN routes.

To reduce the route request processing load on route servers, route brokers MAY cache the VPN routes returned by route servers in response to an implicit route request for a configurable period of time. The cached routes can then be used directly to answer subsequent requests for those routes.
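
The optional caching behavior can be sketched as a time-limited cache keyed by route target. The class name, the fetch callback (standing in for a request to a route server), and the 30-second lifetime are assumptions for illustration. This document does not define cache invalidation beyond a configurable lifetime.

```python
import time
from typing import Callable, Dict, List, Tuple


class RouteCache:
    """Time-limited cache of VPN routes returned by route servers."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # rt -> (expiry time, cached routes)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get_routes(self, rt: str,
                   fetch: Callable[[str], List[str]]) -> List[str]:
        """Answer from the cache if fresh, otherwise ask the route server."""
        now = self.clock()
        entry = self._entries.get(rt)
        if entry is not None and now < entry[0]:
            return list(entry[1])
        routes = fetch(rt)
        self._entries[rt] = (now + self.ttl, list(routes))
        return list(routes)


# Usage with a fake clock so the expiry is easy to follow.
now = [0.0]
calls: List[str] = []


def fetch_from_server(rt: str) -> List[str]:
    calls.append(rt)  # count how often the route server is asked
    return ["10.0.0.0/24"]


cache = RouteCache(ttl_seconds=30.0, clock=lambda: now[0])
first = cache.get_routes("65000:1", fetch_from_server)
second = cache.get_routes("65000:1", fetch_from_server)  # served from cache
now[0] = 31.0
third = cache.get_routes("65000:1", fetch_from_server)   # expired, fetched again
```

The route server is asked twice across the three requests: once for the first request and once after the cached entry has expired.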

6. BGP Session Failure Notification

When a route broker loses the BGP connection with a given route broker client, it SHOULD send a Notification message towards all route servers to indicate the failure of the BGP connection with that route broker client.

Upon receiving the above Notification message, route servers withdraw all VPN routes whose BGP next-hop address is that of the failed route broker client.

The BGP router ID of the failed route broker client could be carried in a TLV, which in turn is carried in a Notification message with an error code that is TBD.
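
A minimal sketch of the failure handling follows. The TLV layout (a 1-octet type, a 1-octet length, and a 4-octet router ID) and the type value are hypothetical, since this document leaves the encoding and code points TBD. The sketch also assumes that the failed client's BGP next-hop address equals its router ID, which this document implies but does not state.

```python
import ipaddress
import struct
from typing import Dict, List

# Hypothetical TLV type; the actual code points are TBD in this document.
FAILED_CLIENT_TLV_TYPE = 1


def encode_failed_client_tlv(router_id: str) -> bytes:
    """Encode a router ID as type (1 octet), length (1 octet), value."""
    value = ipaddress.IPv4Address(router_id).packed
    return struct.pack("!BB", FAILED_CLIENT_TLV_TYPE, len(value)) + value


def decode_failed_client_tlv(data: bytes) -> str:
    """Decode the TLV produced by encode_failed_client_tlv."""
    if len(data) != 6:
        raise ValueError("unexpected TLV size")
    tlv_type, length = struct.unpack("!BB", data[:2])
    if tlv_type != FAILED_CLIENT_TLV_TYPE or length != 4:
        raise ValueError("unexpected TLV type or length")
    return str(ipaddress.IPv4Address(data[2:6]))


def withdraw_by_next_hop(rib: Dict[str, str], failed: str) -> List[str]:
    """Remove and return the prefixes whose next hop is the failed client."""
    withdrawn = sorted(p for p, nh in rib.items() if nh == failed)
    for prefix in withdrawn:
        del rib[prefix]
    return withdrawn


# Hypothetical route server state: prefix -> BGP next hop.
rib = {"10.0.0.0/24": "192.0.2.1", "10.0.1.0/24": "192.0.2.2"}
tlv = encode_failed_client_tlv("192.0.2.1")
withdrawn = withdraw_by_next_hop(rib, decode_failed_client_tlv(tlv))
```

After the failure of 192.0.2.1 is signaled, only the route via 192.0.2.2 remains on the route server.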

7. IANA Considerations

TBD

8. Security Considerations

TBD

9. Acknowledgements

The author would like to thank Jie Dong for discussing and reviewing this document.

10. Normative References

[I-D.ietf-bess-virtual-pe]
Fang, L., Fernando, R., Napierala, M., Bitar, N. N., and B. Rijsman, "BGP/MPLS VPN Virtual PE", Work in Progress, Internet-Draft, draft-ietf-bess-virtual-pe-00, <https://datatracker.ietf.org/doc/html/draft-ietf-bess-virtual-pe-00>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4364]
Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, <https://www.rfc-editor.org/info/rfc4364>.
[RFC4659]
De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur, "BGP-MPLS IP Virtual Private Network (VPN) Extension for IPv6 VPN", RFC 4659, DOI 10.17487/RFC4659, <https://www.rfc-editor.org/info/rfc4659>.
[RFC7814]
Xu, X., Jacquenet, C., Raszuk, R., Boyes, T., and B. Fee, "Virtual Subnet: A BGP/MPLS IP VPN-Based Subnet Extension Solution", RFC 7814, DOI 10.17487/RFC7814, <https://www.rfc-editor.org/info/rfc7814>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.

Author's Address

Xiaohu Xu
China Mobile
China