Internet-Draft | Hierarchical RR RT-Constraints | November 2023 |
Mohanty, et al. | Expires 13 May 2024 | [Page] |
Route Target Constraints (RTC) is used to build a VPN route distribution graph such that routers only receive VPN routes corresponding to the route-targets (RTs) that they are interested in. This is done by exchanging the route-targets as routes in the RTC address-family; a corresponding "RT filter" is then installed that influences the VPN route advertisement. In networks employing hierarchical Route Reflectors (RR), the use of RTC can lead to incorrect VPN route distribution and loss of connectivity, as detailed in an earlier draft. That draft provided two solutions to overcome the problem.¶
This draft presents a method with suggested modifications to the RTC RFC in order to solve the hierarchical RR RTC problem in an efficient manner.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 13 May 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].¶
Hierarchical RR [RFC4456] deployments with VPN [RFC4364] working in conjunction with RTC [RFC4684] may result in sub-optimal and incorrect VPN route distribution, as described in [I-D.ietf-idr-rtc-hierarchical-rr]. The root cause is the way the RR rules for RTC are defined in [RFC4684]. The authors of [I-D.ietf-idr-rtc-hierarchical-rr] furnish two solutions for the problem, one based on add-paths and the other based on diverse-path constructs. In this memo, we present another solution to the very same problem.¶
When advertising RT membership NLRI to a route-reflector client, Section 3.2 of [RFC4684] specifies that the advertising RR sets the ORIGINATOR_ID attribute [RFC4456] to its own router-id and the Next-hop attribute to the local address for that session. However, this creates an issue in hierarchical RR setups, as explained in [I-D.ietf-idr-rtc-hierarchical-rr]. Figure 1 reproduces the figure from [I-D.ietf-idr-rtc-hierarchical-rr]. When RR-2 and RR-3 advertise RT-1 to RR-1, the latter will choose one of the routes as best and will advertise it to RR-2 and RR-3 after setting the ORIGINATOR_ID and next-hop to itself. Note that RR-1 will also add its own CLUSTER_ID [RFC4456] to the CLUSTER_LIST but, importantly, will not overwrite the CLUSTER_ID of the sender. This leads to the issue explained in [I-D.ietf-idr-rtc-hierarchical-rr].¶
In Figure 1, when RR-1 chooses the route from RR-2 as the best route, formats the next-hop and ORIGINATOR_ID as explained above, and then advertises the route to RR-2, RR-2 will drop the route reflected from RR-1 because of the CLUSTER_ID check.¶
RR-2 will therefore not form the outbound filter of RT-1 towards RR-1 which means that after convergence RR-2 will not advertise VPN routes to RR-1 anymore. This leads to an incorrect VPN route distribution across the network.¶
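The failure described above can be reproduced with a minimal sketch. It assumes a simplified route model; the `RtcRoute` type, the function names, the router-ids, the addresses and the cluster-id `Clu-3` for RR-3 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class RtcRoute:
    # Simplified, hypothetical model of an RT membership route.
    route_target: str
    next_hop: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def reflect_to_client_rfc4684(route, router_id, cluster_id, local_address):
    """Reflect toward a client: ORIGINATOR_ID and NEXT_HOP are set to self and
    the local CLUSTER_ID is prepended; existing CLUSTER_IDs are left untouched."""
    return replace(
        route,
        originator_id=router_id,
        next_hop=local_address,
        cluster_list=(cluster_id,) + route.cluster_list,
    )


def passes_loop_checks(route, router_id, cluster_id):
    """Receive-side RR loop checks on ORIGINATOR_ID and CLUSTER_LIST."""
    if route.originator_id == router_id:
        return False
    return cluster_id not in route.cluster_list


# RT-1 as RR-1 received it from RR-2 (RR-2 already prepended Clu-2).
best_at_rr1 = RtcRoute("RT-1", next_hop="10.0.0.2",
                       originator_id="pe-1", cluster_list=("Clu-2",))
back_to_clients = reflect_to_client_rfc4684(best_at_rr1, "rr-1", "Clu-1", "10.0.0.1")

rr2_accepts = passes_loop_checks(back_to_clients, "rr-2", "Clu-2")  # own CLUSTER_ID present
rr3_accepts = passes_loop_checks(back_to_clients, "rr-3", "Clu-3")
print(back_to_clients.cluster_list, rr2_accepts, rr3_accepts)
```

In this sketch RR-3 accepts the reflected route while RR-2 rejects it, so only RR-3 would install an RT-1 filter towards RR-1.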
In the scenario of Figure 2, CE-1 is multi-homed to PE-1 and PE-2 and wants to communicate with CE-2, which is behind PE-4. As explained earlier, because RR-1 chooses the RR-2 path as best in the RTC family, RR-1 only receives the VPN route from RR-3 (and not from RR-2) in the steady state.¶
Notice that even when the link between RR-3 and RR-1 comes down, the RR-2 path still remains best in the RTC address-family at RR-1, and the VPN route advertisements from RR-2 to RR-1 continue to be blocked. Thus, even though there is alternative connectivity from CE-1 to PE-4 via PE-1, RR-2 and RR-1, the BGP VPN routes cannot be sent. In fact, CE-1 is completely cut off from the rest of the network. Generalizing, this means that for a hierarchical RR with only a single first-level RR as its client, the solution is completely broken. Notice that without RTC, RR-1 would have both VPN paths, and the loss of connectivity to RR-3 would just result in local convergence at RR-1, subject to the time it takes for the path from RR-2 to become best.¶
The solutions presented in [I-D.ietf-idr-rtc-hierarchical-rr] are based on:¶
1. Add-path: RR-1 will advertise both the path from RR-2 and the path from RR-3 to RR-2 and RR-3, so that each of the first-level RRs will accept at least one of the routes and install the filter.¶
2. Diverse path: when RR-1 advertises the best path to a client or non-client speaker, and that speaker is the one whose path is the best, the advertising router will use the most "diverse" path (different next-hop and ORIGINATOR_ID than the best path) to accomplish the same goal, i.e., the path will be accepted at the receiving speaker.¶
Solution 1 imposes a higher management burden (the higher-level RRs need to be identified and add-paths needs to be configured) and increases the number of paths to be advertised. Deciding which paths to advertise adds to that burden: one extra path, as suggested, may not be enough, because there are scenarios where the CLUSTER_LIST of the second-best path contains the cluster-id of the peer. Even when all the paths are advertised, a NPR scheme cannot be guaranteed, as can be inferred from some of the examples presented below.¶
For solution 2, a measure of how disjoint the paths are is not well defined, and the solution suffers from the same problems as solution 1. In addition, it introduces a new requirement to send a different update to every client. This effectively breaks the shared peer update-formatting implementation that most vendors use.¶
In the next section, we provide a solution that does not require add-path and also improves upon [RFC4684] while solving this hierarchical RR issue in RTC.¶
By the rules of [RFC4456], route-reflector client is a property that a given BGP speaker assigns to each of its peering sessions, independently of whether the BGP peer assigns it as well. This flexible definition can be used to configure non-canonical RR networks (for instance, two BGP peers defining each other as route-reflector clients). Whether or not such non-canonical networks are recommended, they can be used in an RR network without loss of connectivity.¶
Within the scope of RTC, only canonical RR networks are supported. A canonical RR network is defined as a network where each speaker has the role of a given level within the hierarchy (e.g., first-tier RR, second-tier RR, client), and a speaker of a higher level can only have speakers of a lower level as clients. In a canonical RR network, a speaker advertising a route to a client will never receive this route back. The requirement for a canonical network to propagate RTC routes is implicit in [RFC4684], but is hereby formalized.¶
As an additional consideration, illustrated by some of the examples below, it is also desirable for VPN routes to fully propagate (i.e., as they would if there were no RTC routes at all).¶
To solve the problem described, a given client needs to use the RTC route to create a VPN filter towards the RR, even when the RR is sending back the RTC route advertised by that client. [RFC4684] avoids loop prevention by overwriting the attributes that could trigger it. But, as described, this overwriting is only effective when there is a single level of RRs.¶
Two solutions are proposed: one for the sender of RTC routes, which generalizes [RFC4684], and one for the receiver of RTC routes, which uses a different paradigm from the one described in [RFC4684]. Only one needs to be implemented. Implementing both, one at the receiver and one at the sender, allows easier interoperability with non-compliant implementations. If the sender option is implemented, it takes precedence over the receiver option (which then becomes a no-op).¶
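The canonical-network definition above can be expressed as a simple check over the client sessions. The following is only a sketch: the function name `is_canonical`, the numeric level encoding (higher number means higher tier) and the sample topology are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def is_canonical(level: Dict[str, int],
                 client_sessions: Iterable[Tuple[str, str]]) -> bool:
    """Return True if every (reflector, client) session points from a
    higher level to a strictly lower level of the hierarchy."""
    return all(level[rr] > level[client] for rr, client in client_sessions)


# Hypothetical two-tier hierarchy: RR-1 on top, RR-2/RR-3 below, PEs as clients.
levels = {"RR-1": 2, "RR-2": 1, "RR-3": 1, "PE-1": 0, "PE-2": 0}
hierarchy = [("RR-1", "RR-2"), ("RR-1", "RR-3"), ("RR-2", "PE-1"), ("RR-3", "PE-2")]

# Non-canonical case from the text: two peers define each other as clients.
mutual_clients = hierarchy + [("RR-2", "RR-3"), ("RR-3", "RR-2")]

print(is_canonical(levels, hierarchy))       # True
print(is_canonical(levels, mutual_clients))  # False
```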
This rule is to be used by the sender of RTC routes.¶
When an RR reflects an RTC route from RR client to RR client, some attributes of the route may be overwritten when advertising the best RTC route. This overwrite is specific to the RTC address family and does not happen for other address families. It disables loop detection via those attributes when the best RTC route is advertised back to its originator. This is needed in case there are other non-best RTC routes; it allows the originator of the best RTC route to receive an RTC route for the route-target of interest and to create its own VPN RT filter towards the RR.¶
The above is described in [RFC4684], which overwrites the ORIGINATOR_ID and NEXT_HOP attributes (Section 3.2, rule (i)). The proposed new rules generalize this concept by overwriting the CLUSTER_LIST as well. This new behavior allows the correct propagation of RTC routes at higher-level RRs.¶
When reflecting the (best-path) RTC route from RR client to RR client, the following rules will apply:¶
When the RTC route has a CLUSTER_LIST, overwrite every CLUSTER_ID in the CLUSTER_LIST with the local CLUSTER_ID. Note that when advertising that RTC route, the local CLUSTER_ID will still be prepended per the usual rules.¶
ORIGINATOR_ID is set or overwritten with local router-id.¶
NEXT_HOP is overwritten with local peering address (next-hop-self).¶
An RTC route will always be advertised to the client it was received from.¶
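The sender-side rules above can be sketched as a single attribute rewrite. This is an illustration under a simplified route model, not a definitive implementation: the `RtcRoute` type, the function name and the sample addresses are hypothetical, and the input CLUSTER_LIST is the one implied by the three-level propagation PE-4->RR-4->RR-2->RR-1.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class RtcRoute:
    # Simplified, hypothetical model of an RT membership route.
    route_target: str
    next_hop: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def reflect_client_to_client(route, router_id, cluster_id, peering_address):
    """Apply the sender rules when reflecting the best RTC route from an RR
    client to an RR client."""
    # Every received CLUSTER_ID is overwritten with the local CLUSTER_ID ...
    overwritten = tuple(cluster_id for _ in route.cluster_list)
    return replace(
        route,
        # ... and the local CLUSTER_ID is still prepended as usual.
        cluster_list=(cluster_id,) + overwritten,
        originator_id=router_id,    # set or overwritten with the local router-id
        next_hop=peering_address,   # next-hop-self
    )
    # The caller must also advertise the result to the client the route was
    # received from (the last rule); that decision is outside this function.


received = RtcRoute("RT-1", next_hop="10.0.0.2",
                    originator_id="pe-4", cluster_list=("Clu-2", "Clu-4"))
advertised = reflect_client_to_client(received, "rr-1", "Clu-1", "10.0.0.1")
print(advertised.cluster_list)  # ('Clu-1', 'Clu-1', 'Clu-1')
```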
Note that the rules above only expose RTC routes to routing loops (by overwriting attributes) in the client-to-client (top-down) direction. Thus, this draft restricts [RFC4684] by disallowing attribute overwrite in the non-client to client direction.¶
In Figure 3, consider a case similar to the one in Figure 1 but with three levels of RRs. Assume there is one physical link for each BGP peering, each with the same IGP cost. Both PE-4 and PE-5 originate an RTC route. Propagation of RTC routes is PE-4->RR-4->RR-2->RR-1 and PE-5->RR-5->RR-3->RR-1. RR-1 chooses the RTC route from RR-2 as best. It reflects it back to RR-2 and RR-3 with ORIGINATOR_ID = router-id-of-RR-1 and CLUSTER_LIST = {Clu-1, Clu-1, Clu-1}. RR-2 still prefers the route from RR-4, but it accepts the route received from RR-1. Thus RR-2 creates a VPN filter towards RR-1 to propagate the VPN route. In this case, the RTC route received from RR-1 stops at RR-2, so only the overwriting of the first cluster-id of the CLUSTER_LIST was strictly necessary.¶
Consider a similar scenario in Figure 4. In this case, the tier II RRs share one cluster-id and the tier III RRs share another. IGP costs are not exactly defined, but assume that they cause the route propagation that follows. Both PE-4 and PE-5 originate an RTC route. One propagation is PE-5->RR-5->RR-3->RR-1. The IGP costs are such that RR-2 prefers the route received from RR-1. RR-2 reflects the route from RR-1 to RR-4, and RR-4 accepts it because it receives CLUSTER_LIST = {Clu-2, Clu-1, Clu-1, Clu-1} (after RR-1 overwrote and RR-2 prepended). Similarly, RR-3 reflects the route received from RR-5 to RR-4, and RR-4 accepts it because it receives CLUSTER_LIST = {Clu-2, Clu-2} (after RR-3 overwrote it).¶
Consider now a different rule: only the first cluster-id of the CLUSTER_LIST is overwritten. In this case, RR-4 would have received CLUSTER_LIST = {Clu-2, Clu-1, Clu-1, Clu-3} and would have discarded the update. The end result is that RR-4 would not install the VPN filter towards RR-2 and would not advertise VPN routes towards RR-2. This becomes a network where the VPN routes are not fully propagated (i.e., the propagation of VPN routes differs from what it would be if there were no RTC routes at all). In this kind of network, VPN routes still reach PE-6. However, if RR-3/RR-5 went down, VPN routes would not immediately reach RT-1. RTC routes would have to reconverge, and only then would a filter be installed to allow RR-4 to advertise routes to RR-2. Thus, convergence would suffer.¶
It can be seen that for the general case it is necessary to overwrite all the cluster-ids of the CLUSTER_LIST.¶
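The two CLUSTER_LIST values discussed for the Figure 4 scenario can be reproduced with a short sketch. It assumes, as the lists in the text imply, that RR-1 uses Clu-1, the tier II RRs use Clu-2 and the tier III RRs use Clu-3; the function names are hypothetical.

```python
from typing import List


def rr1_reflect(cluster_list: List[str], overwrite_all: bool) -> List[str]:
    """Rewrite the CLUSTER_LIST at RR-1 (Clu-1), then prepend Clu-1."""
    local = "Clu-1"
    if overwrite_all:
        rewritten = [local] * len(cluster_list)
    elif cluster_list:
        rewritten = [local] + cluster_list[1:]  # only the first entry is overwritten
    else:
        rewritten = []
    return [local] + rewritten


def seen_at_rr4(cluster_list_from_rr1: List[str]) -> List[str]:
    """RR-2 reflects non-client to client: no overwrite, only prepends Clu-2."""
    return ["Clu-2"] + cluster_list_from_rr1


# RTC route as received by RR-1 via PE-5->RR-5->RR-3->RR-1.
at_rr1 = ["Clu-2", "Clu-3"]

all_overwritten = seen_at_rr4(rr1_reflect(at_rr1, overwrite_all=True))
first_only = seen_at_rr4(rr1_reflect(at_rr1, overwrite_all=False))

# RR-4 (Clu-3) discards any route whose CLUSTER_LIST contains its own CLUSTER_ID.
print(all_overwritten, "Clu-3" not in all_overwritten)
print(first_only, "Clu-3" not in first_only)
```

With every cluster-id overwritten, RR-4 accepts the route; with only the first one overwritten, its own Clu-3 survives in the list and the route is discarded.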
RFC4684 is not explicit about it, but the underlying assumption is that a route received from a route-reflector-client MUST be reflected back to that client. Hereby, this is made explicit.¶
The following recommended (NEXT_HOP-IGNORE) rules can be implemented:¶
When reflecting an RTC route, NEXT_HOP overwrite is disabled.¶
When receiving an RTC route, it is not discarded even if the received NEXT_HOP is one of the IP addresses of the speaker.¶
The NEXT_HOP-IGNORE rules effectively allow using the same NEXT_HOP across the network. They are a change with respect to [RFC4684] even for a single level of RR. Note that disabling the NEXT_HOP check does not create any additional loop conditions in a canonical network.¶
An advantage of using the NEXT_HOP-IGNORE rules is that the selection of the best-path RTC route is then determined by the IGP cost to the original next-hop. Otherwise, propagation of RTC routes is less predictable, since it depends on the IGP costs towards the peering address of each individual peer.¶
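The two NEXT_HOP-IGNORE rules can be sketched as a pair of small decisions, one on the send side and one on the receive side. The function names, the `next_hop_ignore` flag and the sample addresses are hypothetical and used only to illustrate the behavior described above.

```python
from typing import Set


def next_hop_to_advertise(received_next_hop: str,
                          local_peering_address: str,
                          next_hop_ignore: bool) -> str:
    """Send side: with NEXT_HOP-IGNORE the original NEXT_HOP is preserved;
    otherwise it is overwritten with the local peering address."""
    return received_next_hop if next_hop_ignore else local_peering_address


def next_hop_acceptable(next_hop: str,
                        local_addresses: Set[str],
                        next_hop_ignore: bool) -> bool:
    """Receive side: with NEXT_HOP-IGNORE the route is kept even when the
    NEXT_HOP is one of the speaker's own addresses."""
    if next_hop_ignore:
        return True
    return next_hop not in local_addresses


# A client receiving its own RTC route back with the original NEXT_HOP.
own_addresses = {"192.0.2.4"}
print(next_hop_to_advertise("192.0.2.4", "192.0.2.1", next_hop_ignore=True))
print(next_hop_acceptable("192.0.2.4", own_addresses, next_hop_ignore=True))
print(next_hop_acceptable("192.0.2.4", own_addresses, next_hop_ignore=False))
```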
This rule is to be used by the receiver of RTC routes.¶
When receiving an RTC route, the following rules will apply:¶
The CLUSTER_ID, ORIGINATOR_ID and NEXT_HOP checks will still be performed, but a route that fails them is not discarded; instead, it is kept in the Adj-RIB-In as a Received-only route.¶
A route in Received-only state will not be considered for best-path selection nor advertised to any peer.¶
A route in Received-only state will be considered to install a VPN filter.¶
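The receiver-side rules above can be sketched as a classification step followed by three simple predicates. The `RouteState` enum, the function names and the sample identifiers are hypothetical; the sample route carries the attributes a first-level RR would see on its own RTC route reflected back by the top-level RR.

```python
from enum import Enum
from typing import Optional, Sequence, Set


class RouteState(Enum):
    ELIGIBLE = "eligible"
    RECEIVED_ONLY = "received-only"


def classify_received_rtc_route(originator_id: Optional[str],
                                cluster_list: Sequence[str],
                                next_hop: str,
                                *,
                                router_id: str,
                                cluster_id: str,
                                local_addresses: Set[str]) -> RouteState:
    """Run the usual checks, but keep a failing route as Received-only
    instead of discarding it."""
    failed = (originator_id == router_id
              or cluster_id in cluster_list
              or next_hop in local_addresses)
    return RouteState.RECEIVED_ONLY if failed else RouteState.ELIGIBLE


def eligible_for_best_path(state: RouteState) -> bool:
    return state is RouteState.ELIGIBLE


def may_be_advertised(state: RouteState) -> bool:
    return state is RouteState.ELIGIBLE


def installs_vpn_filter(state: RouteState) -> bool:
    # Both eligible and Received-only routes count towards the VPN filter.
    return state in (RouteState.ELIGIBLE, RouteState.RECEIVED_ONLY)


state = classify_received_rtc_route(
    "rr-1", ("Clu-1", "Clu-2"), "10.0.0.1",
    router_id="rr-2", cluster_id="Clu-2", local_addresses={"10.0.0.2"},
)
print(state, eligible_for_best_path(state), installs_vpn_filter(state))
```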
The rules above also apply to a single level of RR, and they are a solution not contemplated in [RFC4684].¶
The rules above propagate RTC routes differently from the sender option rules (with the sender option, non-client to client propagation is not stopped). But the creation of VPN filters will be the same in a standard RR topology.¶
An additional optional rule is defined to avoid unnecessary propagation of RTC routes.¶
When reflecting the (best-path) RTC route from RR client to RR client, the following rule will apply:¶
When the RR best RTC route is from a client and that RTC route is not being received from any other peer, the RR MAY skip the advertisement towards that client.¶
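The optional rule above amounts to a single predicate evaluated per target peer. The following sketch assumes the RR tracks, per route-target, the set of peers from which the RTC route is currently received; the function and parameter names are hypothetical.

```python
from typing import Set


def may_skip_advertisement(target_peer: str,
                           best_path_peer: str,
                           best_path_peer_is_client: bool,
                           peers_advertising_route: Set[str]) -> bool:
    """Return True when the RR MAY skip advertising the best RTC route to
    target_peer: the best route came from that client and no other peer is
    advertising the same RTC route."""
    return (best_path_peer_is_client
            and target_peer == best_path_peer
            and peers_advertising_route == {best_path_peer})


# Only RR-2 advertises RT-1: advertising it back to RR-2 may be skipped.
print(may_skip_advertisement("RR-2", "RR-2", True, {"RR-2"}))          # True
# RR-3 advertises RT-1 too: RR-2 must still receive the route back.
print(may_skip_advertisement("RR-2", "RR-2", True, {"RR-2", "RR-3"}))  # False
```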
The rule above can be used as an optimization even if only the receiver rule is implemented.¶
With these procedures, it is not necessary for the RR to know at which level it is operating. The above rules are compatible. The best path is always advertised under either rule, and it is easily seen that RR-2 will accept the RT Constraint path advertised from RR-1. Since the path is accepted, the RT filter at RR-2 will pass the VPN routes, and the problem scenarios are resolved accordingly.¶
With this specification in the RT-Constraint address-family, both the incorrect and the sub-optimal route distribution issues mentioned above are solved. There is no need for add-paths. It is also possible to optimize over [RFC4684] on RTC advertisements based on the diversity of ORIGINATOR_ID and CLUSTER_ID, so that a higher-level RR does not have to be populated with VPN routes with a specific RT if that RT is not present in other clusters.¶
None.¶
This document raises no new security issues for RT Constraints.¶
The authors would like to thank Swadesh Agrawal and M. Mirza for useful discussions related to hierarchical RR RTC.¶