Internet-Draft | Preference-based EVPN DF Election | July 2023 |
Rabadan, et al. | Expires 7 January 2024 |
The Designated Forwarder (DF) in Ethernet Virtual Private Networks (EVPN) is defined as the PE responsible for sending Broadcast, Unknown unicast and Multicast traffic (BUM) to a multi-homed device/network in the case of an all-active multi-homing Ethernet Segment (ES), or BUM and unicast in the case of single-active multi-homing. The Designated Forwarder is selected out of a candidate list of PEs that advertise the same Ethernet Segment Identifier (ESI) to the EVPN network, according to the Default Designated Forwarder Election algorithm. While the Default Algorithm provides an efficient and automated way of selecting the Designated Forwarder across different Ethernet Tags in the Ethernet Segment, there are some use cases where a more 'deterministic' and user-controlled method is required. At the same time, Service Providers require an easy way to force an on-demand Designated Forwarder switchover in order to carry out some maintenance tasks on the existing Designated Forwarder or control whether a new active PE can preempt the existing Designated Forwarder PE.¶
This document proposes a Designated Forwarder Election algorithm that meets the requirements of determinism and operation control.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 7 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
[RFC7432] defines the Designated Forwarder (DF) in EVPN networks as the PE responsible for sending Broadcast, Multicast and Unknown unicast traffic (BUM) to a multi-homed device/network in the case of an all-active multi-homing Ethernet Segment or BUM and unicast traffic to a multi-homed device or network in case of single-active multi-homing. The Designated Forwarder is selected out of a candidate list of PEs that advertise the Ethernet Segment Identifier (ESI) to the EVPN network and according to the Designated Forwarder Election Algorithm, or DF Alg as per [RFC8584].¶
While the Default Designated Forwarder Algorithm [RFC7432] and the Highest Random Weight algorithm (HRW) [RFC8584] provide an efficient and automated way of selecting the Designated Forwarder across different Ethernet Tags in the Ethernet Segment, there are some use cases where a more 'deterministic' and user-controlled method is required. At the same time, Service Providers require an easy way to force an on-demand Designated Forwarder switchover in order to carry out some maintenance tasks on the existing Designated Forwarder or control whether a new active PE can preempt the existing Designated Forwarder PE.¶
This document proposes two new DF Algorithms (Highest-Preference and Lowest-Preference) which provide the deterministic Designated Forwarder method required, as well as the "Don't Preempt" capability to address the need to control whether a PE can take over an existing Designated Forwarder PE.¶
The procedures described in this document meet the following requirements:¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This solution reuses and extends the Designated Forwarder Election Extended Community defined in [RFC8584] that is advertised along with the Ethernet Segment route. It does so by replacing the last two reserved octets of the DF Election Extended Community when the DF Algorithm is set to Highest-Preference or Lowest-Preference. This document also defines a new capability referred to as "Don't Preempt" capability, that MAY be used with DF Algorithms Highest-Preference or Lowest-Preference. The format of the DF Election Extended Community that is used in this document follows:¶
Where the above fields are defined as follows:¶
DF Algorithm can have the following values:¶
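As a rough illustration of the extended community described above, the following Python sketch packs and unpacks the 8 octets. The field diagram is not reproduced in this text, so the layout is an assumption: it takes the [RFC8584] layout (Type 0x06, Sub-Type 0x06, a 5-bit DF Alg, a 2-octet Bitmap) and places a 2-octet Preference in the last two octets after one remaining reserved octet. It also assumes that bit 0 of the Bitmap is the most significant bit. The function and constant names are hypothetical.

```python
import struct

EC_TYPE_EVPN = 0x06            # EVPN extended community type
EC_SUBTYPE_DF_ELECTION = 0x06  # DF Election sub-type (RFC 8584)
DP_BIT_MASK = 0x8000           # assumption: Bitmap bit 0 is the most significant bit
DF_ALG_HIGHEST_PREF = 2        # value requested for Highest-Preference in this document

# Assumed layout: Type(1) Sub-Type(1) RSV|DF Alg(1) Bitmap(2) Reserved(1) Preference(2)
_LAYOUT = "!BBBHBH"


def encode_df_election_ec(df_alg, bitmap, preference):
    """Pack the 8-octet DF Election Extended Community (sketch)."""
    if not 0 <= df_alg <= 0x1F:
        raise ValueError("DF Alg is a 5-bit field")
    if not 0 <= bitmap <= 0xFFFF:
        raise ValueError("Bitmap is a 2-octet field")
    if not 0 <= preference <= 0xFFFF:
        raise ValueError("Preference is a 2-octet field")
    return struct.pack(_LAYOUT, EC_TYPE_EVPN, EC_SUBTYPE_DF_ELECTION,
                       df_alg, bitmap, 0, preference)


def decode_df_election_ec(data):
    """Unpack the community into DF Alg, DP flag and Preference (sketch)."""
    ec_type, subtype, alg_octet, bitmap, _reserved, preference = struct.unpack(_LAYOUT, data)
    if (ec_type, subtype) != (EC_TYPE_EVPN, EC_SUBTYPE_DF_ELECTION):
        raise ValueError("not a DF Election Extended Community")
    return {
        "df_alg": alg_octet & 0x1F,        # low 5 bits carry the DF Alg
        "dp": bool(bitmap & DP_BIT_MASK),  # Don't Preempt capability
        "preference": preference,
    }


community = encode_df_election_ec(DF_ALG_HIGHEST_PREF, DP_BIT_MASK, 300)
fields = decode_df_election_ec(community)
```

Under these assumptions, a PE signaling Highest-Preference with Preference 300 and the DP bit set would advertise the octets `06 06 02 80 00 00 01 2c`.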
Figure 3 illustrates an example that will be used in the description of the solution.¶
Figure 3 shows three PEs that are connecting EVCs coming from the Aggregation Network to their EVIs in the EVPN network. CE1 is connected to vES1 - that spans PE1 and PE2 - and CE2 is connected to vES2, that is attached to PE1, PE2 and PE3.¶
If the algorithm chosen for vES1 and vES2 is DF Algorithm Highest-Preference or Lowest-Preference, the PEs may become Designated Forwarder irrespective of their IP address and based on the administrative Preference value. The following sections provide some examples of the procedures and how they are applied in the use-case of Figure 3.¶
Assuming the operator wants to control - in a flexible way - what PE becomes the Designated Forwarder for a given virtual Ethernet Segment and the order in which the PEs become Designated Forwarder in case of multiple failures, the following procedure may be used:¶
According to [RFC8584], each PE will run the Designated Forwarder election algorithm upon expiration of the DF Wait timer. Each PE runs the Highest-Preference or Lowest-Preference DF Algorithm for each Ethernet Segment as follows:¶
In case of equal Preference in two or more PEs in the Ethernet Segment, the DP bit and the numerically lowest IP address of the candidate PEs are used as tie-breakers. If more than one PE is advertising itself as the preferred Designated Forwarder, an implementation MUST first select the PE advertising the DP bit set, and then select the PE with the lowest IP address (if the DP bit selection does not yield a unique candidate). The PE's IP address is the address used in the candidate list and it is derived from the Originating Router's IP address of the Ethernet Segment route. In case PEs use Originating Router's IP addresses of different families, an IPv4 address is always considered numerically lower than an IPv6 address. Some examples of the use of the DP bit and IP address tie-breakers follow:¶
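The election and its tie-breakers can be sketched in Python as shown below. This is a minimal illustration and not a normative implementation. The `Candidate` type, the `elect_df` function, the addresses and the Preference values are hypothetical. The ordering follows the text: Preference first (highest or lowest, depending on the DF Algorithm), then the DP bit, then the numerically lowest IP address, with IPv4 always sorting before IPv6.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    ip: str            # Originating Router's IP address of the ES route
    preference: int    # Preference signaled in the DF Election Extended Community
    dp: bool = False   # Don't Preempt bit


def _ip_key(ip):
    addr = ipaddress.ip_address(ip)
    # Version 4 sorts before version 6, so IPv4 is always "lower" than IPv6.
    return (addr.version, int(addr))


def elect_df(candidates, highest=True):
    """Return the elected DF among the candidates (sketch)."""
    def key(c):
        pref = -c.preference if highest else c.preference
        # A candidate with DP set (not c.dp == False) sorts first among equal Preferences.
        return (pref, not c.dp, _ip_key(c.ip))
    return min(candidates, key=key)


pe1 = Candidate("192.0.2.1", 500)
pe2 = Candidate("192.0.2.2", 100)
by_highest = elect_df([pe1, pe2], highest=True)    # pe1
by_lowest = elect_df([pe1, pe2], highest=False)    # pe2

# Equal Preference: the DP bit wins even though the IP address is higher.
tie_dp = elect_df([Candidate("192.0.2.1", 200), Candidate("192.0.2.2", 200, dp=True)])

# Equal Preference and DP: an IPv4 address is considered lower than an IPv6 address.
tie_ip = elect_df([Candidate("2001:db8::1", 200), Candidate("192.0.2.9", 200)])
```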
The Preference is an administrative option that MUST be configured on a per-Ethernet Segment basis, and it is normally configured from the management plane. The Preference value MAY also be dynamically changed based on the use of local policies that react to events on the PE. The following examples illustrate the use of local policy to change the Preference value in a dynamic way.¶
The Highest-Preference and Lowest-Preference Algorithms MAY be used along with the AC-DF capability. Assuming all the PEs in the Ethernet Segment are configured consistently with the Highest-Preference or Lowest-Preference Algorithm and the AC-DF capability, a given PE in the Ethernet Segment is not considered a candidate for Designated Forwarder Election until its corresponding Ethernet A-D per ES and Ethernet A-D per EVI routes are received, as described in [RFC8584].¶
The Highest-Preference and Lowest-Preference DF Algorithms can be used in different virtual Ethernet Segments on the same PE. For instance, PE1 and PE2 can use Highest-Preference for vES1, while PE1, PE2 and PE3 use Lowest-Preference for vES2. The use of one DF Algorithm over the other is the operator's choice. The existence of both provides flexibility and full control to the operator.¶
The procedures in this document can be used in [RFC7432] based Ethernet Segment or virtual Ethernet Segment as in [I-D.ietf-bess-evpn-virtual-eth-segment], and including EVPN networks as in [RFC8214], [RFC7623] or [RFC8365].¶
While the Highest-Preference or Lowest-Preference DF Algorithm described in Section 4.1 is typically used in virtual Ethernet Segment scenarios where there is normally an individual Ethernet Tag per virtual Ethernet Segment, the existing [RFC7432] definition of an Ethernet Segment allows potentially up to thousands of Ethernet Tags on the same Ethernet Segment. In that case, if the Highest-Preference or Lowest-Preference Algorithm is configured in all the PEs of the Ethernet Segment, the same PE will be the elected Designated Forwarder for all the Ethernet Tags of the Ethernet Segment. A potential way to achieve more granular load balancing is described below.¶
The Ethernet Segment is configured with an administrative Preference value and an administrative DF Algorithm, i.e., Highest-Preference or Lowest-Preference Algorithm. However, the administrative DF Algorithm (which is used to signal the DF Algorithm for the Ethernet Segment) MAY be overridden to a different operational DF Algorithm for a range of Ethernet Tags. With this option, the PE builds a list of candidate PEs ordered by Preference; however, the Designated Forwarder for a given Ethernet Tag will be determined by the local overridden DF Algorithm.¶
For instance:¶
For Ethernet Segments attached to three or more PEs, any other logic that provides a fair distribution of the Designated Forwarder function among the PEs is valid, as long as that logic is consistent in all the PEs in the Ethernet Segment. It is important to note that, when a local policy overrides the Highest-Preference or Lowest-Preference signaled by all the PEs in the Ethernet Segment, this local policy MUST be consistent in all the PEs of the Ethernet Segment. If the local policy is inconsistent for a given Ethernet Tag in the Ethernet Segment, black-holes or packet duplication may occur on that Ethernet Tag.¶
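The per-Ethernet-Tag override can be sketched as follows. This is a hedged illustration only: the tag ranges, Preference values and function names are hypothetical, and Preferences are assumed to be distinct so that no tie-breaker is needed. The signaled (administrative) algorithm is Highest-Preference, and a local policy, which must be identical on every PE of the Ethernet Segment, maps some Ethernet Tag ranges to Lowest-Preference.

```python
# Hypothetical local policy: (first_tag, last_tag, operational DF Algorithm).
# It MUST be configured identically on all PEs of the Ethernet Segment.
LOCAL_POLICY = [
    (1, 2000, "highest"),
    (2001, 4000, "lowest"),
]


def operational_alg(tag, admin_alg="highest", policy=LOCAL_POLICY):
    """Return the DF Algorithm used for this Ethernet Tag (sketch)."""
    for first, last, alg in policy:
        if first <= tag <= last:
            return alg
    return admin_alg  # no override: use the administrative (signaled) algorithm


def df_for_tag(tag, candidates, admin_alg="highest", policy=LOCAL_POLICY):
    """candidates: dict of PE name -> Preference (assumed distinct)."""
    ordered = sorted(candidates, key=candidates.get, reverse=True)  # by Preference
    if operational_alg(tag, admin_alg, policy) == "highest":
        return ordered[0]
    return ordered[-1]


es_candidates = {"PE1": 500, "PE2": 100}
df_tag_10 = df_for_tag(10, es_candidates)      # "PE1"
df_tag_3000 = df_for_tag(3000, es_candidates)  # "PE2"
df_tag_4500 = df_for_tag(4500, es_candidates)  # "PE1", no override applies
```

With two PEs, this splits the Designated Forwarder role between them by tag range while both still signal the same DF Algorithm in their Ethernet Segment routes.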
As discussed in Section 1.2 (d), a capability to NOT preempt the existing Designated Forwarder (for all the Ethernet Tags in the Ethernet Segment) is required and therefore added to the Designated Forwarder Election extended community. This option allows a non-revertive behavior in the Designated Forwarder election.¶
Note that, when a given PE in an Ethernet Segment is taken down for maintenance operations, before bringing it back, the Preference may be changed in order to provide a non-revertive behavior. The DP bit and the mechanism explained in this section will be used for those cases when a former Designated Forwarder comes back up without any controlled maintenance operation, and the non-revertive option is desired in order to avoid service impact.¶
In Figure 3, we assume that based on the Highest-Preference Algorithm, PE3 is the Designated Forwarder for ESI2.¶
If PE3 has a link, EVC or node failure, PE2 would take over as Designated Forwarder. If/when PE3 comes back up again, PE3 will take over, causing some unnecessary packet loss in the Ethernet Segment.¶
The following procedure avoids preemption upon failure recovery (please refer to Figure 3). The procedure supports a non-revertive mode that can be used along with:¶
The procedure is described assuming Highest-Preference Algorithm in the Ethernet Segment, where local policy overrides the tie-breaker for a given Ethernet Tag, since this is the most complex case. The other two cases above are a sub-set of this one and the differences will be explained later.¶
When PE3's vES2 comes back up, PE3 will start a boot-timer (if booting up) or hold-timer (if the port or EVC recovers). That timer will allow some time for PE3 to receive the Ethernet Segment routes from PE1 and PE2. This timer is applied between the INIT and the DF_WAIT states in the Designated Forwarder Election Finite State Machine described in [RFC8584]. PE3 will then:¶
Select two "reference-PEs" among the Ethernet Segment routes in the virtual Ethernet Segment, the "Highest-PE" and the "Lowest-PE":¶
Check its own administrative Pref and compare it with the one of the Highest-PE and Lowest-PE that have the DP capability set in their Ethernet Segment routes. Depending on this comparison PE3 will send the Ethernet Segment route with a (Pref,DP) that may be different from its administrative (Pref,DP):¶
For any subsequent received update/withdraw in the Ethernet Segment, the PEs will go through the process described in (5) to select Highest and Lowest-PEs, now considering themselves as candidates. For instance, if PE2 fails, upon receiving PE2's Ethernet Segment route withdrawal, PE3 and PE1 will go through the selection of new Highest and Lowest-PEs (considering their own active Ethernet Segment route) and then they will run the Designated Forwarder Election.¶
If the Ethernet Segment uses Highest-Preference Algorithm (for all the Ethernet Tags, no local policy), the PEs only need to select the "Highest-PE" as the "reference-PE" (i.e., no need to select the "Lowest-PE"). If the Ethernet Segment uses Lowest-Preference Algorithm for all the Ethernet Tags, the PEs only need to select the "Lowest-PE" as the "reference-PE". The rest of the procedure remains the same.¶
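For the simpler case where Highest-Preference is used for all the Ethernet Tags, the decision taken by a recovering PE can be sketched as below. The exact comparison rules are not reproduced in this text, so the rule used here is an assumption: if the recovering PE's administrative Preference is not lower than that of the "Highest-PE" (chosen among the received routes with DP set), the PE advertises the Highest-PE's Preference with DP cleared, so that the DP tie-breaker keeps the current Designated Forwarder in place. The function name and values are hypothetical.

```python
def advertised_pref_dp(admin_pref, admin_dp, received):
    """Return the (Pref, DP) a recovering PE would advertise (sketch).

    received: list of (pref, dp) tuples taken from the Ethernet Segment
    routes of the other PEs, collected while the boot/hold timer runs.
    """
    dp_prefs = [pref for pref, dp in received if dp]
    # Assumption: without local DP, or without any DP-capable peer,
    # the PE simply advertises its administrative values.
    if not admin_dp or not dp_prefs:
        return admin_pref, admin_dp
    highest_pref = max(dp_prefs)  # Preference of the "Highest-PE"
    if admin_pref >= highest_pref:
        # Assumption: match the Highest-PE's Preference and clear DP,
        # so the existing DF wins the DP tie-breaker.
        return highest_pref, False
    return admin_pref, admin_dp


# Hypothetical values: PE3 (admin Pref 300, DP set) recovers while
# PE1 advertises (100, DP=1) and PE2 advertises (200, DP=1).
pe3_advertises = advertised_pref_dp(300, True, [(100, True), (200, True)])
```

In this hypothetical case PE3 would advertise (200, DP=0). PE2, with (200, DP=1), wins the DP tie-breaker and remains Designated Forwarder.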
Note that, irrespective of the DP bit, when a PE or Ethernet Segment comes back and the PE advertises a Designated Forwarder Election Algorithm different than the one configured in the rest of the PEs in the Ethernet Segment, all the PEs in the Ethernet Segment MUST fall back to the Default [RFC7432] Algorithm.¶
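The fallback rule can be sketched as follows. This is an illustration under stated assumptions: DF Alg value 0 denotes the Default Algorithm as registered by [RFC8584], and the Default Algorithm of [RFC7432] orders the candidate PEs by ascending IP address and picks the PE at index (Ethernet Tag mod N). The function names and addresses are hypothetical.

```python
import ipaddress

DF_ALG_DEFAULT = 0  # Default (modulo-based) Algorithm in the "DF Alg" registry


def select_algorithm(advertised_algs, local_alg):
    """Return the DF Alg to run given the DF Alg values of all ES routes (sketch)."""
    if all(alg == local_alg for alg in advertised_algs):
        return local_alg
    return DF_ALG_DEFAULT  # any mismatch: every PE falls back to the Default Algorithm


def default_df(candidate_ips, ethernet_tag):
    """RFC 7432 default election: ordered list of PEs, index = tag mod N (sketch)."""
    def ip_key(ip):
        addr = ipaddress.ip_address(ip)
        return (addr.version, int(addr))
    ordered = sorted(candidate_ips, key=ip_key)
    return ordered[ethernet_tag % len(ordered)]


consistent = select_algorithm([2, 2], local_alg=2)   # 2: all PEs agree
mismatched = select_algorithm([2, 1], local_alg=2)   # 0: fall back to Default
fallback_df = default_df(["192.0.2.3", "192.0.2.1", "192.0.2.2"], ethernet_tag=4)
```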
This document does not modify the use of the P and B bits in the Ethernet A-D per EVI routes [RFC8214] advertised by the PEs in the Ethernet Segment after running the Designated Forwarder Election, irrespective of the revertive or non-revertive behavior in the PE.¶
This document describes a Designated Forwarder Election Algorithm that provides absolute control (by configuration) over what PE is the Designated Forwarder for a given Ethernet Tag. While this control is desired in many situations, a malicious user that gets access to the configuration of a PE in the Ethernet Segment may change the behavior of the network. In other DF Algorithms such as HRW, the Designated Forwarder Election is more automated and cannot be determined by configuration. With Highest-Preference or Lowest-Preference as DF Algorithm, an attacker may change the configuration of the Preference value on a PE and Ethernet Segment, and impact the traffic going through that PE and Ethernet Segment.¶
The non-revertive capability described in this document may be seen as a security improvement over the regular EVPN revertive Designated Forwarder Election: an intentional link (or node) "flapping" on a PE will only cause service disruption once, when the PE goes to Non-Designated Forwarder state.¶
The document also describes how a local policy can override the Highest-Preference Algorithm for a range of Ethernet Tags in the Ethernet Segment. If the local policy is not consistent across all PEs in the Ethernet Segment and there is an Ethernet Tag that ends up with an inconsistent use of Highest-Preference or Lowest-Preference in different PEs, black-holing or packet duplication may occur for that Ethernet Tag.¶
This document solicits:¶
The allocation of two new values in the "DF Alg" registry created by [RFC8584] as follows:¶
Alg   Name                           Reference
----  -----------------------------  -------------
2     Highest-Preference Algorithm   This document
TBD   Lowest-Preference Algorithm    This document
The allocation of a new value in the "DF Election Capabilities" registry created by [RFC8584] for the 2-octet Bitmap field in the DF Election Extended Community (Border Gateway Protocol (BGP) Extended Communities registry), as follows:¶
Bit   Name                           Reference
----  -----------------------------  -------------
0     D (Don't Preempt) Capability   This document
The authors would like to thank Kishore Tiruveedhula and Sasha Vainshtein for their review and comments. Also thank you to Luc Andre Burdet and Stephane Litkowski for their thorough review and suggestions for a new DF Algorithm for lowest-preference.¶
In addition to the authors listed, the following individuals also contributed to this document:¶
Tony Przygienda, Juniper¶
Satya Mohanty, Cisco¶
Kiran Nagaraj, Nokia¶
Vinod Prabhu, Nokia¶
Selvakumar Sivaraj, Juniper¶
Sami Boutros, VMWare¶