Internet-Draft | CS-SR Policies | October 2023 |
Schmutzer, et al. | Expires 25 April 2024 | [Page] |
This document describes how Segment Routing (SR) policies can be used to satisfy the requirements for bandwidth, end-to-end recovery and persistent paths within a segment routing network. SR policies satisfying these requirements are called "circuit-style" SR policies (CS-SR policies).¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 25 April 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Segment routing does allow for a single network to carry both typical IP (connection-less) services and connection-oriented transport services commonly referred to as "private lines". IP services typically require ECMP and TI-LFA, while transport services delivered via pseudowires (defined by the PWE3 and PALS workgroups) do require:¶
Persistent end-to-end traffic engineered paths that provide predictable and identical latency in both directions¶
A requested amount of bandwidth per path to ensure no impact on the Service Level Agreement (SLA) due to changing network load from other services¶
Fast end-to-end protection and restoration mechanisms¶
Monitoring and maintenance of path integrity¶
Data plane remaining up while control plane is down¶
Such a "transport centric" behavior is referred to as "circuit-style" in this document.¶
This document describes how SR policies [I-D.ietf-spring-segment-routing-policy] and the use of adjacency-SIDs defined in the SR architecture [RFC8402] together with a stateful Path Computation Element (PCE) [RFC8231] can be used to satisfy those requirements. It includes how end-to-end recovery and path integrity monitoring can be implemented.¶
SR policies that satisfy those requirements are called "circuit-style" SR policies (CS-SR policies).¶
BSID : Binding Segment Identifier¶
CS-SR : Circuit-Style Segment Routing¶
ID : Identifier¶
LSP : Label Switched Path¶
LSPA : LSP attributes¶
OAM : Operations, Administration and Maintenance¶
OF : Objective Function¶
PCE : Path Computation Element¶
PCEP : Path Computation Element Communication Protocol¶
PT : Protection Type¶
SID : Segment Identifier¶
SLA : Service Level Agreement¶
SR : Segment Routing¶
STAMP : Simple Two-Way Active Measurement Protocol¶
TI-LFA : Topology Independent Loop Free Alternate¶
TLV : Type Length Value¶
The reference model for CS-SR policies is following the Segment Routing Architecture [RFC8402] and SR Policy Architecture [I-D.ietf-spring-segment-routing-policy] and is depicted in Figure 1.¶
By nature of CS-SR policies, paths will be computed and maintained by a centralized entity providing a consistent simple mechanism for initializing the co-routed bidirectional end to end paths, performing bandwidth allocation control, as well as monitoring facilities to ensure SLA compliance for the live of the CS-SR Policy.¶
When using PCEP as the communication protocol on the endpoints, the centralized entity is a stateful PCE defined in [RFC8231]. When using a MPLS data plane [RFC8660], PCEP extensions defined in [RFC8664] will be used. When using a SRv6 data plane [RFC8754], PCEP extensions defined in [I-D.ietf-pce-segment-routing-ipv6] will be used.¶
When using BGP as the communication protocol on the endpoints, the centralized entity does perform the same role. BGP extensions defined in [I-D.ietf-idr-segment-routing-te-policy] will be used.¶
In order to satisfy the requirements of CS-SR policies, each link in the topology MUST have:¶
An adjacency-SID which is:¶
The bandwidth available for CS-SR policies specified¶
A per-hop behavior ([RFC3246] or [RFC2597]) that ensures that the specified bandwidth is available to CS-SR policies at all times independent of any other traffic¶
When using a MPLS data plane [RFC8660] existing IGP extensions defined in [RFC8667] and [RFC8665] and BGP-LS defined in [RFC9085] can be used to distribute the topology information including those persistent and unprotected adjacency-SIDs.¶
When using a SRv6 data plane [RFC8754] the IGP extensions defined in [I-D.ietf-lsr-isis-srv6-extensions] and [I-D.ietf-lsr-ospfv3-srv6-extensions] and BGP-LS extensions in [I-D.ietf-idr-bgpls-srv6-ext] apply.¶
In a network, resources are represented by links of certain bandwidth. In a circuit switched network such as SONET/SDH, OTN or DWDM resources (timeslots or a wavelength) are allocated for a provisioned connection at the time of reservation even if no communication is present. In a packet switched network resources are only allocated when communication is present, i.e. packets are to be sent. This allows for the total reservations to exceed the link bandwidth as well in general for link congestion.¶
To satisfy the bandwidth requirement for CS-SR policies it must be ensured that packets carried by CS-SR policies can be at all times sent up to the reserved bandwidth on each hop along the path. This is done by:¶
Firstly, CS-SR policy bandwidth reservations per link must be limited to euqal or less than the physical link bandwidth.¶
Secondly, ensuring traffic for each CS-SR policy is limited to the bandwidth reserved for that CS-SR policy by traffic policing or shaping¶
Thirdly, ensuring that during times of link congestion only non-CS-SR policy traffic is being buffered or dropped.¶
For the later several approaches can be considered:¶
Allocate a dedicated physical link of bandwidth P to CS-SR policies and allow CS-SR reservations up to bandwidth C. Consider bandwidth N allocated for network control, ensure that P - N >= C¶
Allocate a dedicate logical link (i.e. 801.q VLAN on ethernet) to CS-SR policies on a physical link of bandwidth P. Limit the total utilization across all other logical links to bandwidth O by traffic policing or shaping and ensure that P - N - O >= C¶
Allocate a dedicated Diffserv codepoint and queue to CS-SR policies and limit the total utilization across all other queues to bandwidth O by traffic policing or shaping and ensure that P -N - O >= C¶
Allocate a dedicate Diffserv codepoint and strict priority queue to CS-SR policies and limit the total utilization across all priority queues of higher or equal priority to bandwidth O by traffic policing or shaping and ensure that P - N - O >= C¶
Allocate a dedicate Diffserv codepoint and a strict priority queue with a priority higher than all other queues to CS-SR policies and limit the utilization of that priority queue by traffic policing to C <= P - N¶
In addition CS-SR policy telemetry collection can be used to raise alarms when bandwidth utilization thresholds are passed or to request the reserved bandwidth to be adjusted.¶
A CS-SR policy has the following characteristics:¶
Requested bandwidth : bandwidth to be reserved for the CS-SR policy¶
Bidirectional co-routed : a CS-SR policy between A and Z is an association of an SR-Policy from A to Z and an SR-Policy from Z to A following the same path(s)¶
Deterministic and persistent paths : segment lists with strict hops using unprotected adjacency-SIDs¶
Not automatically recomputed or reoptimized : the SID list of a candidate path must not change automatically to a SID list representing a different path (for example upon topology change)¶
Multiple candidate paths in case of protection/restoration:¶
Connectivity verification and performance measurement is activated on each candidate path (Section 7)¶
Considering the scenario illustrated in Figure 1 a CS-SR policy between A and Z is configured both on A (with Z as endpoint) and Z (with A as endpoint).¶
Both nodes A and Z act as PCC and delegate path computation to the PCE using PCEP with the extensions defined in [RFC8664] and the procedure described in Section 5.7.1 of [RFC8231]. The PCRpt message sent from the headends to the PCE contains the following parameters:¶
BANDWIDTH object (Section 7.7 of [RFC5440]) : to indicate the requested bandwidth¶
LSPA object (section 7.11 of [RFC5440]) : to indicate that no local protection requirements¶
L flag set to 0 : no local protection¶
E flag set to 1 : protection enforcement (section 5 of [I-D.ietf-pce-local-protection-enforcement])¶
If the SR-policies are configured with more than one candidate path, a PCEP request is sent per candidate path. Each PCEP request does include the "SR Policy Association" object (type 6) as defined in [I-D.ietf-pce-segment-routing-policy-cp] to make the PCE aware of the candidate path belonging to the same policy.¶
The signaling extensions described in [I-D.sidor-pce-circuit-style-pcep-extensions] are used to ensure that¶
Path determinism is achieved by the PCE only using segment lists representing a strict hop by hop path using unprotected adjacency-SIDs.¶
Path persistency across node reloads in the network is achieved by the PCE only including manually configured adjacency-SIDs in its path computation response.¶
Persistency across network changes is achieved by the PCE not performing periodic nor network event triggered re-optimization.¶
Bandwidth adjustment can be requested after initial creation by signaling both requested and operational bandwidth in the BANDWIDTH object but the PCE is not allowed to respond with a changed path.¶
As discussed in section 3.2 of [I-D.ietf-pce-multipath] it may be necessary to use load-balancing across multiple paths to satisfy the bandwidth requirement of a candidate path. In such a case the PCE will notify the PCC to install multiple segment lists using the signaling procedures described in section 5.3 of [I-D.ietf-pce-multipath].¶
Again considering the scenario illustrated in Figure 1, there is no CS-SR policy configuration required on A nor Z in order to create the CS-SR policy between A and Z.¶
The centralized controller is instructed (i.e. by an application via a API call) to create the CS-SR policy, for which the controller does perform path computation and is requesting A via BGP to instante a SR-policy (with Z as endpoint) and requesting Z via BGP to instantiate a SR-policy (with Z as endpoint).¶
To instantiate the SR-policies in A and Z the BGP extensions defined in [I-D.ietf-idr-segment-routing-te-policy] are used.¶
No signaling extensions are required for the following:¶
Path determinism is achieved by the controller only using segment lists representing a strict hop by hop path using unprotected adjacency-SIDs.¶
Path persistency across node reloads in the network is achieved by the controller only including manually configured adjacency-SIDs in its path computation response.¶
Persistency across network changes is achieved by the controller not performing periodic nor network event triggered re-optimization.¶
If there are more than one candidate paths per SR-policy required, multiple NLRIs with different distinguisher values (see section 2.1 of [I-D.ietf-idr-segment-routing-te-policy]) have to be included in the BGP UPDATE message.¶
To achieve load-balancing across multiple paths to satisfy the bandwidth requirement of a candidate path, multiple Segment List Sub-TLVs have to be included in the SR Policy Sub-TLV. See section 2.1 of [I-D.ietf-idr-segment-routing-te-policy]¶
The endpoints A and Z report the SR-policy states back to the centralized controller via BGP-LS using the extension defined in [I-D.ietf-idr-bgp-ls-sr-policy].¶
A Segment Routed path defined by a segment list is constrained by maximum segment depth (MSD), which is the maximum number of segments a router can impose onto a packet. [RFC8491], [RFC8476], [RFC8814] and [RFC8664] provide the necessary capabilities for a PCE to determine the MSD capability of a router. The MSD constraint is typically resolved by leveraging a label stack reduction technique, such as using Node SIDs and/or BSIDs (SR architecture [RFC8402]) in a segment list, which represents one or many hops in a given path.¶
As described in Section 4, adjacency-SIDs without local protection are to be used for CS-SR policies to ensure no ECMP, no rerouting due to topological changes nor localized protection is being invoked on the traffic, as the alternate path may not be providing the desired SLA.¶
If a CS-SR Policy path requires SID List reduction, a Node SID cannot be utilized as it is eligible for traffic rerouting following IGP re-convergence. However, a BSID can be programmed to a transit node, if the following requirements are met:¶
The BSID is unprotected, hence only has one candidate path¶
The BSID follows the rerouting and optimization characteristics defined in Section 4 which implies the SID list of the candidate path MUST only use unprotected adjacency-SIDs.¶
This ensures that any CS-SR policies in which the BSID provides transit for do not get rerouted due to topological changes or protected due to failures. A BSID may be pre-programmed in the network or automatically injected in the network by a PCE.¶
Various protection and restoration schemes can be implemented. The terms "protection" and "restoration" are used with the same subtle distinctions outlined in section 1 of [RFC4872], [RFC4427] and [RFC3386] respectively.¶
Protection : another candidate path is computed and fully established in the data plane and ready to carry traffic¶
Restoration : a candidate path may be computed and may be partially established but is not ready to carry traffic¶
The term "failure" is used to represent both "hard failures" such complete loss of connectivity detected by Section 7.1 or degradation, a packet loss ratio, beyond a configured acceptable threshold.¶
In the most basic scenario no protection nor restoration is required. The CS-SR policy has only one candidate path configured. This candidate path is established, activated and is carrying traffic.¶
When using PCEP, a PCRpt message is sent from the PCC to the PCE with the O field in the LSP object set to 2.¶
When using BGP, a BGP-LS update with a SP Policy Candidate Path NLRI is sent from the endpoint to the centralized controller having¶
C flag set to 1 to indicate the candidate path was provisioned by the controller¶
A flag set to 1 to indicate the candidate path is active and carrying traffic¶
In case of a failure along the path the CS-SR policy will go down and traffic will not be recovered.¶
Typically two CS-SR policies are deployed either within the same network with disjoint paths or in two completely separate networks and the overlay service is responsible for traffic recovery.¶
For fast recovery against failures the CS-SR policy has two candidate paths. Both paths are established but only the candidate with higher preference is activated and is carrying traffic.¶
When using PCEP, the PCRpt message for the candidate path with higher preference will have the O field in the LSP object set to 2. For the candidate path with the lower preference the O field in the LSP object is set to 1.¶
Appropriate routing of the protect path diverse from the working path can be requested from the PCE by using the "Disjointness Association" object (type 2) defined in [RFC8800] in the PCRpt messages. The disjoint requirements are communicated in the "DISJOINTNESS-CONFIGURATION TLV"¶
L bit set to 1 for link diversity¶
N bit set to 1 for node diversity¶
S bit set to 1 for SRLG diversity¶
T bit set to enforce strict diversity¶
The P bit may be set for first candidate path to allow for finding the best working path that does satisfy all constraints without considering diversity to the protect path.¶
The "Objective Function (OF) TLV" as defined in section 5.3 of [RFC8800] may also be added to minimize the common shared resources.¶
When using BGP, the controller is already aware of the disjoint requirements and does consider them while computing both paths. Two NLRIs with different distinguisher values and different preference values are included in the BGP UPDATE sent to the headend routers.¶
A BGP-LS update is sent to the controller with a SP Policy Candidate Path NLRI for the candidate path with higher preference with¶
C flag set to 1 to indicate that candidate path was provisioned by the controller¶
A flag set to 1 to indicate the candidate path is active and carrying traffic¶
and another SP Policy Candidate Path NLRI for the candidate path with lower preference with¶
C flag set to 1 to indicate the candidate path was provisioned by the controller¶
B flag set to 1 to indicate the role of backup path¶
Upon a failure impacting the candidate path with higher preference carrying traffic, the candidate path with lower preference is activated immediately and traffic is now sent across it.¶
Protection switching is bidirectional. As described in Section 7.1, both headends will generate and receive their own loopback mode test packets, hence even a unidirectional failure will always be detected by both headends without protection switch coordination required.¶
Two cases are to be considered when the failure impacting the candidate path with higher preference is cleared:¶
Revertive switching : re-activate the candidate path, change O field from 0 to 2 and start sending traffic over it¶
Non-revertive switching : do not activate the candidate path, change O field from 0 to 1, keep the second candidate path active with O field set to 2 and continue sending traffic over it¶
Compared to 1:1 protection described in Section 6.2, this restoration scheme avoids pre-allocating protection bandwidth in steady state, while still being able to recover traffic flow in case of a network failure in a deterministic way (maintain required bandwidth commitment)¶
When using PCEP, the CS-SR policy is configured with two candidate paths. The candidate path with higher preference is established, activated (O field in LSP object is set to 2) and is carrying traffic.¶
The second candidate path with lower preference is only established and activated (O field in LSP object is set to 2) upon a failure impacting the first candidate path in order to send traffic over an alternate path through the network around the failure with potentially relaxed constraints but still satisfying the bandwidth commitment.¶
The second candidate path is generally only requested from the PCE and activated after a failure, but may also be requested and pre-established during CS-SR policy creation with the downside of bandwidth being set aside ahead of time.¶
As soon as failure(s) that brought the first candidate path down are cleared, the second candidate path is getting deactivated (O field in LSP object is set to 1) or torn down. The first candidate path is activated (O field in LSP object is set to 2) and traffic sent across it.¶
When using BGP, the controller does compute one path and does include one NLRI in the BGP UPDATE message sent to the headend routers to instantiate the CS-SR policy with one candidate path active and carrying traffic.¶
A BGP-LS update with a SP Policy Candidate Path NLRI is sent to the controller with¶
C flag set to 1 to indicate the candidate path was provisioned by the controller¶
A flag set to 1 to indicate the candidate path is active and carrying traffic¶
Upon the controller detecting the failure of the CS-SR policy's candidate path, another path is computed and added as second candidate path to the CS-SR policy by sending a BGP UPDATE message to the headend routers with a NLRI distinguisher value being different and preference being lower compared to the first candidate path.¶
A BGP-LS update with a SP Policy Candidate Path NLRI for the candidate path with higher preference is sent to the controller with¶
A flag is cleared to indicate the candidate path is no longer active and not carrying traffic anymore¶
and another SP Policy Candidate Path NLRI for the candidate path with lower preference with¶
C flag set to 1 to indicate the candidate path was provisioned by the controller¶
A flag set to 1 to indicate the candidate path is active and carrying traffic¶
The second candidate path is generally only instantiated by the controller and activated after a failure, but may also be instantiated and pre-established during CS-SR policy creation with the downside of bandwidth being set aside ahead of time. If so, a BGP-LS update with a SP Policy Candidate Path NLRI is sent to the controller with¶
C flag set to 1 to indicate the candidate path was provisioned by the controller¶
B flag set to 1 to indicate the role of backup path¶
Once the controller has detected the failure(s) that brought the first candidate path down are cleared, the second candidate path is getting removed by a BGP UPDATE message withdrawing the NLRI of the second candidate path.¶
A BGP-LS update with a SP Policy Candidate Path NLRI for the remaining candidate path is sent to the controller with¶
A flag set to 1 to indicate the candidate path became active and is carrying traffic again¶
Restoration and reversion behavior is bidirectional. As described in Section 7.1, both headends use connectivity verification in loopback mode and therefore even in case of unidirectional failures both headends will detect the failure or clearance of the failure and switch traffic away from the failed or to the recovered candidate path.¶
For further resiliency in case of multiple concurrent failures that could affect both candidate paths of 1:1 protection described in Section 6.2, a third candidate path with a preference lower than the other two candidate paths is added to the CS-SR policy to enable restoration.¶
When using PCEP, the third candidate path will generally only be established, activated (O field in LSP object is set to 2) and carry traffic after failure(s) have impacted both the candidate path with highest and second highest preference.¶
The third candidate path may also be requested and pre-computed already whenever either the first or second candidate path went down due to a failure with the downside of bandwidth being set aside ahead of time.¶
As soon as failure(s) that brought either the first or second candidate path down are cleared the third candidate path is getting deactivated (O field in LSP object is set to 1), the candidate path that recovered is activated (O field in LSP object is set to 2) and traffic sent across it.¶
When using BGP, the third candidate path will generally only be instantiated by the controller and activated after failure(s) have impacted both the candidate path with highest and second highest preference, but may also be instantiated and pre-established during CS-SR policy creation with the downside of bandwidth being set aside ahead of time.¶
Assuming the case where both candidate paths are down, a BGP-LS update is sent with SP Policy Candidate Path NLRIs for the first and second candidate path with¶
A flag cleared¶
and a SP Policy Candidate Path NLRI for the third candidate path with¶
C flag set to 1 to indicate the candidate path was provisioned by the controller¶
A flag set to 1 to indicate the candidate path is active and carrying traffic¶
Assuming the case where only one candidate path is down, a BGP-LS update is sent with a SP Policy Candidate Path NLRI for the failed candidate path with¶
A flag cleared¶
a SP Policy Candidate Path NLRI for the second candidate path with¶
A flag set to 1 to indicate it is active and carrying traffic network¶
and another SP Policy Candidate Path NLRI for the newly installed third candidate path with¶
C flag set to 1 to indicate the candidate path was provisioned by the controller¶
B flag set to 1 to indicate the role of backup path¶
Again restoration and reversion behavior is bidirectional. As described in Section 7.1, both headends use connectivity verification in loopback mode and therefore even in case of unidirectional failures both headends will detect the failure or clearance of the failure and switch traffic away from the failed or to the recovered candidate path.¶
The proper operation of each segment list is validated by both headends using STAMP in loopback measurement mode as described in section 4.2.3 of [I-D.ietf-spring-stamp-srpm].¶
As the STAMP test packets are including both the segment list of the forward and reverse path, standard segment routing data plane operations will make those packets get switched along the forward path to the tailend and along the reverse path back to the headend.¶
When using PCEP, the headend forms the bidirectional SR Policy association using the procedure described in [I-D.ietf-pce-sr-bidir-path] and receives the information about the reverse segment list from the PCE as described in section 4.5 of [I-D.ietf-pce-multipath]¶
When using BGP, the controller does inform the headend routers about the reverse segment list using the Reverse Segment List Sub-TLV defined in section 4.1 of [I-D.ietf-idr-sr-policy-path-segment].¶
The same STAMP session is used to estimate round-trip loss as described in section 5 of [I-D.ietf-spring-stamp-srpm].¶
The same STAMP session used for connectivity verification can be used to measure delay. As loopback mode is used only round-trip delay is measured and one-way has to be derived by dividing the round-trip delay by two.¶
A stateful PCE/controller is in sync with the network topology and the CS-SR Policies provisioned on the headend routers. As described in Section 4 a path must not be automatically recomputed after or optimized for topology changes. However there may be a requirement for the stateful PCE/controller to tear down a path if the path no longer satisfies the original requirements, detected by stateful PCE/controller, such as insufficient bandwidth, diversity constraint no longer met or latency constraint exceeded.¶
The headend may measure the actual bandwidth utilization of a CS-SR policy to take local action and/or report it as requested bandwidth via PCEP or BGP-LS to the stateful PCE/controller. Typical actions are raising alarms or adjusting the reserved bandwidth.¶
For a CS-SR policy configured with multiple candidate paths, a headend may switch to another candidate path if the stateful PCE/controller decided to tear down the active candidate path.¶
It is very common to allow operators to trigger a switch between candidate paths even if no failure is present. I.e. to proactively drain a resource for maintenance purposes. Operator triggered switching between candidate paths is unidirectional and has to be requested on both headends.¶
While no automatic re-optimization or pre-computation of CS-SR policy candidate paths is allowed as specified in Section 4, network operators trying to optimize network utilization may explicitly request a candidate path to be re-computed at a certain point in time.¶
TO BE ADDED¶
This document has no IANA actions.¶
The author's want to thank Samuel Sidor, Mike Koldychev, Rakesh Gandhi and Tarek Saad for providing their review comments and all contributors for their inputs and support.¶