Internet-Draft | CS-SR Policies | May 2023 |
Schmutzer, et al. | Expires 17 November 2023 |
This document describes how Segment Routing (SR) policies can be used to satisfy the requirements for strict bandwidth guarantees, end-to-end recovery and persistent paths within a segment routing network. SR policies satisfying these requirements are called "circuit-style" SR policies (CS-SR policies).¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 17 November 2023.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Segment routing allows a single network to carry both typical IP (connection-less) services and connection-oriented transport services, commonly referred to as "private lines". IP services typically require ECMP and TI-LFA, while transport services that are normally delivered via dedicated circuit-switched SONET/SDH or OTN networks require:¶
Such "transport-centric" behavior is referred to as "circuit-style" in this document.¶
This document describes how SR policies [I-D.ietf-spring-segment-routing-policy] and the use of adjacency-SIDs defined in the SR architecture [RFC8402], together with a stateful Path Computation Element (PCE) [RFC8231], can be used to satisfy those requirements. It also describes how end-to-end recovery and path integrity monitoring can be implemented.¶
SR policies that satisfy those requirements are called "circuit-style" SR policies (CS-SR policies).¶
The reference model for CS-SR policies follows the Segment Routing Architecture [RFC8402] and the SR Policy Architecture [I-D.ietf-spring-segment-routing-policy] and is depicted in Figure 1.¶
By nature of CS-SR policies, paths will be computed and maintained by a stateful PCE as defined in [RFC8231]. The stateful PCE provides a consistent, simple mechanism for initializing the co-routed bidirectional end-to-end paths and performing bandwidth allocation control, as well as monitoring facilities to ensure SLA compliance for the life of the CS-SR Policy. When using an MPLS data plane [RFC8660], PCEP extensions defined in [RFC8664] will be used. When using an SRv6 data plane [RFC8754], PCEP extensions defined in [I-D.ietf-pce-segment-routing-ipv6] will be used.¶
In order to satisfy the requirements of CS-SR policies, each link in the topology MUST have:¶
An adjacency-SID which is:¶
When using an MPLS data plane [RFC8660], existing IGP extensions defined in [RFC8667] and [RFC8665] and BGP-LS defined in [RFC9085] can be used to distribute the topology information, including those persistent and unprotected adjacency-SIDs.¶
When using an SRv6 data plane [RFC8754], the IGP extensions defined in [I-D.ietf-lsr-isis-srv6-extensions] and [I-D.ietf-lsr-ospfv3-srv6-extensions] and the BGP-LS extensions in [I-D.ietf-idr-bgpls-srv6-ext] apply.¶
In a network, resources are represented by links of certain bandwidth. In a circuit-switched network such as SONET/SDH, OTN or DWDM, resources (timeslots or a wavelength) are allocated for a provisioned connection at the time of reservation, even if no communication is present. In a packet-switched network, resources are only allocated when communication is present, i.e., when packets are to be sent. This allows the total reservations to exceed the link bandwidth and, more generally, allows links to become congested.¶
To satisfy the strict bandwidth commitment for CS-SR policies, it must be ensured that packets carried by CS-SR policies can at all times be sent up to the reserved bandwidth on each hop along the path. This is done by:¶
For the latter, several approaches can be considered:¶
In addition, CS-SR policy telemetry collection can be used to raise alarms when bandwidth utilization thresholds are crossed or to request that the reserved bandwidth be adjusted.¶
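The bandwidth commitment described above implies that reservations for CS-SR policies on a link never exceed the bandwidth available to them. A minimal sketch of such an admission check follows. It assumes a hypothetical in-memory table kept by the PCE; the names `LinkState` and `admit`, and the idea of a per-link capacity set aside for CS-SR policies, are illustrative assumptions and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Hypothetical per-link bookkeeping kept by a PCE (not a PCEP structure)."""
    capacity_mbps: float        # bandwidth assumed usable by CS-SR policies
    reserved_mbps: float = 0.0  # sum of bandwidth already committed


def admit(path, links, demand_mbps):
    """Reserve demand_mbps on every link of a loop-free path, or on none."""
    # First pass: every hop must carry the demand without oversubscription.
    for link_id in path:
        link = links[link_id]
        if link.reserved_mbps + demand_mbps > link.capacity_mbps:
            return False
    # Second pass: commit the reservation on each hop.
    for link_id in path:
        links[link_id].reserved_mbps += demand_mbps
    return True
```

Checking all hops before committing any reservation keeps the table consistent when a request is rejected part-way along the path.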
A CS-SR policy has the following characteristics:¶
Multiple candidate paths in case of protection/restoration:¶
A CS-SR policy between A and Z is configured both on A (with Z as endpoint) and Z (with A as endpoint) as shown in Figure 1.¶
Both nodes A and Z act as PCC and delegate path computation to the PCE using the extensions defined in [RFC8664]. The PCRpt message sent from the headends to the PCE contains the following parameters:¶
LSPA object (section 7.11 of [RFC5440]): to indicate that there are no local protection requirements¶
If the SR policies are configured with more than one candidate path, a PCEP request is sent per candidate path. Each PCEP request includes the "SR Policy Association" object (type 6) as defined in [I-D.ietf-pce-segment-routing-policy-cp] to make the PCE aware that the candidate paths belong to the same policy.¶
The signaling extensions described in [I-D.sidor-pce-circuit-style-pcep-extensions] are used to ensure that¶
Bandwidth adjustment can be requested after initial creation by signaling both the requested and the operational bandwidth in the BANDWIDTH object, but the PCE is not allowed to respond with a changed path.¶
As discussed in section 3.2 of [I-D.ietf-pce-multipath] it may be necessary to use load-balancing across multiple paths to satisfy the bandwidth requirement of a candidate path. In such a case the PCE will notify the PCC to install multiple segment lists using the signaling procedures described in section 5.3 of [I-D.ietf-pce-multipath].¶
A segment routed path defined by a segment list is constrained by the Maximum SID Depth (MSD), which is the maximum number of segments a router can impose onto a packet. [RFC8491], [RFC8476], [RFC8814] and [RFC8664] provide the necessary capabilities for a PCE to determine the MSD capability of a router. The MSD constraint is typically resolved by leveraging a label stack reduction technique, such as using Node SIDs and/or BSIDs (SR architecture [RFC8402]) in a segment list, each of which can represent one or many hops in a given path.¶
As described in Section 4, adjacency-SIDs without local protection are to be used for CS-SR policies. This ensures that no ECMP, no rerouting due to topological changes and no localized protection is invoked on the traffic, as the alternate path may not provide the desired SLA.¶
If a CS-SR Policy path requires SID list reduction, a Node SID cannot be used, as traffic sent to it is eligible for rerouting following IGP re-convergence. However, a BSID can be programmed on a transit node if the following requirements are met:¶
This ensures that any CS-SR policies for which the BSID provides transit are not rerouted due to topological changes or locally protected due to failures. A BSID may be pre-programmed in the network or automatically injected into the network by a PCE.¶
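The constraints in this section can be illustrated with a small validation sketch. It assumes a hypothetical representation of a segment list as `(kind, sid)` tuples and assumes that any BSID in the list already meets the requirements of this section; the kind labels and the function name are illustrative only.

```python
# Segment kinds assumed acceptable for a CS-SR policy in this sketch.
ALLOWED_KINDS = {"adj-unprotected", "bsid"}


def validate_segment_list(segments, headend_msd):
    """Return a list of problems; an empty list means the list is acceptable."""
    problems = []
    # The headend cannot impose more segments than its MSD allows.
    if len(segments) > headend_msd:
        problems.append(f"{len(segments)} segments exceed MSD {headend_msd}")
    # Node SIDs and protected adjacency-SIDs may be rerouted or locally protected.
    for kind, sid in segments:
        if kind not in ALLOWED_KINDS:
            problems.append(f"SID {sid} of kind '{kind}' is not allowed")
    return problems
```

For example, a list that is too long could be shortened by replacing several adjacency-SIDs with one BSID, whereas replacing them with a Node SID would be reported as a problem.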
Various protection and restoration schemes can be implemented. The terms "protection" and "restoration" are used with the same subtle distinctions outlined in section 1 of [RFC4872], [RFC4427] and [RFC3386].¶
The term "failure" is used to represent both "hard failures", such as complete loss of connectivity detected by the mechanisms described in Section 7.1, and degradation, i.e., a packet loss ratio beyond a configured acceptable threshold.¶
In the most basic scenario, neither protection nor restoration is required. The CS-SR policy has only one candidate path configured. This candidate path is established, activated (O field in LSP object is set to 2) and is carrying traffic.¶
In case of a failure, the CS-SR policy will go down and traffic will not be recovered.¶
Typically, two CS-SR policies are deployed either within the same network with disjoint paths or in two completely separate networks, and the overlay service is responsible for traffic recovery.¶
For fast recovery against failures the CS-SR policy is configured with two candidate paths. Both paths are established but only the candidate with higher preference is activated (O field in LSP object is set to 2) and is carrying traffic. The candidate path with lower preference has its O field in LSP object set to 1.¶
Appropriate routing of the protect path diverse from the working path can be requested from the PCE by using the “Disjointness Association” object (type 2) defined in [RFC8800] in the PCRpt messages. The disjoint requirements are communicated in the “DISJOINTNESS-CONFIGURATION TLV”¶
The P bit may be set for the first candidate path to allow for finding the best working path that satisfies all constraints without considering diversity to the protect path.¶
The "Objective Function (OF) TLV" as defined in section 5.3 of [RFC8800] may also be added to minimize the common shared resources.¶
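What "common shared resources" means for a working and a protect path can be sketched as follows. This is an illustration only, assuming each path is given as a list of node names from headend to tailend and that there is at most one link between any two nodes; the function name is hypothetical, and the actual computation is performed by the PCE.

```python
def shared_resources(working, protect):
    """Return (shared links, shared transit nodes) of two paths."""
    def links(path):
        # A link is modeled as an unordered pair of adjacent nodes.
        return {frozenset(pair) for pair in zip(path, path[1:])}

    def transit(path):
        # The headend and tailend are necessarily common to both paths.
        return set(path[1:-1])

    return links(working) & links(protect), transit(working) & transit(protect)
```

Two empty sets indicate link and node diversity; a PCE minimizing shared resources would prefer the protect path for which these sets are smallest.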
Upon a failure impacting the candidate path with higher preference carrying traffic, the candidate path with lower preference is activated immediately and traffic is now sent across it.¶
Protection switching is bidirectional. As described in Section 7.1, both headends generate and receive their own loopback mode test packets. Hence, even a unidirectional failure will always be detected by both headends without requiring protection switch coordination.¶
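The switch-over upon failure can be sketched as a selection of the candidate path with the highest preference among those not affected by a failure. The class and function names below are hypothetical. The O field values 1 and 2 are the ones used in the text; the value 0 for a path that is down is taken from [RFC8231] and is not mentioned in this section. The sketch covers only the switch upon failure, not the behavior once the failure clears.

```python
from dataclasses import dataclass

O_DOWN, O_UP, O_ACTIVE = 0, 1, 2  # O field values of the LSP object


@dataclass
class CandidatePath:
    name: str
    preference: int
    failed: bool = False


def select_active(paths):
    """Return the healthy candidate path with the highest preference, if any."""
    healthy = [p for p in paths if not p.failed]
    return max(healthy, key=lambda p: p.preference) if healthy else None


def o_field(path, active):
    """Return the O field value a headend would report for a candidate path."""
    if path.failed:
        return O_DOWN
    return O_ACTIVE if path is active else O_UP
```

Because each headend runs the same selection on its own loopback measurements, both ends arrive at the same active path without exchanging coordination messages.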
Two cases are to be considered when the failure impacting the candidate path with higher preference is cleared:¶
Compared to 1:1 protection described in Section 6.2, this restoration scheme avoids pre-allocating protection bandwidth in steady state, while still being able to recover traffic flow in case of a network failure in a deterministic way (i.e., while maintaining the required bandwidth commitment).¶
The CS-SR policy is configured with two candidate paths. The candidate path with higher preference is established, activated (O field in LSP object is set to 2) and is carrying traffic.¶
The second candidate path with lower preference is only established and activated (O field in LSP object is set to 2) upon a failure impacting the first candidate path in order to send traffic over an alternate path through the network around the failure with potentially relaxed constraints but still satisfying the bandwidth commitment.¶
The second candidate path is generally only requested from the PCE and activated after a failure, but may also be requested and pre-established during CS-SR policy creation with the downside of bandwidth being set aside ahead of time.¶
As soon as the failure(s) that brought the first candidate path down are cleared, the second candidate path is deactivated (O field in LSP object is set to 1) or torn down. The first candidate path is activated (O field in LSP object is set to 2) and traffic is sent across it.¶
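The restoration and reversion steps described above can be summarized as a small event-to-action mapping. This is a sketch under stated assumptions: the event names, the action strings and the ordering of the two reversion actions are illustrative and not mandated by this document.

```python
def restoration_actions(event, second_path_established=False):
    """Return the ordered actions a headend takes for an event on the first path."""
    if event == "failure":
        actions = []
        if not second_path_established:
            # General case in the text: the path is only requested after a failure.
            actions.append("request second candidate path from PCE")
        actions.append("activate second candidate path (O=2)")
        return actions
    if event == "cleared":
        # Assumed order: revert traffic first, then release the second path.
        return [
            "activate first candidate path (O=2)",
            "deactivate (O=1) or tear down second candidate path",
        ]
    raise ValueError(f"unknown event: {event!r}")
```

Passing `second_path_established=True` models the pre-established variant, which skips the request to the PCE at the cost of bandwidth being set aside ahead of time.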
Restoration and reversion behavior is bidirectional. As described in Section 7.1, both headends use connectivity verification in loopback mode and therefore even in case of unidirectional failures both headends will detect the failure or clearance of the failure and switch traffic away from the failed or to the recovered candidate path.¶
For further resiliency in case of multiple concurrent failures that could affect both candidate paths of 1:1 protection described in Section 6.2, a third candidate path with a preference lower than the other two candidate paths is added to the CS-SR policy.¶
The third candidate path enables restoration and will generally only be established, activated (O field in LSP object is set to 2) and carry traffic after failure(s) have impacted both the candidate paths with the highest and second-highest preference.¶
The third candidate path may also be requested and pre-computed as soon as either the first or second candidate path goes down due to a failure, with the downside of bandwidth being set aside ahead of time.¶
As soon as the failure(s) that brought either the first or second candidate path down are cleared, the third candidate path is deactivated (O field in LSP object is set to 1), the candidate path that recovered is activated (O field in LSP object is set to 2) and traffic is sent across it.¶
Again restoration and reversion behavior is bidirectional. As described in Section 7.1, both headends use connectivity verification in loopback mode and therefore even in case of unidirectional failures both headends will detect the failure or clearance of the failure and switch traffic away from the failed or to the recovered candidate path.¶
The proper operation of each segment list is validated by both headends using STAMP in loopback measurement mode as described in section 4.2.3 of [I-D.ietf-spring-stamp-srpm].¶
Because the STAMP test packets include the segment lists of both the forward and the reverse path, standard segment routing data plane operations cause those packets to be switched along the forward path to the tailend and along the reverse path back to the headend.¶
The headend forms the bidirectional SR Policy association using the procedure described in [I-D.ietf-pce-sr-bidir-path] and receives the information about the reverse segment list from the PCE as described in section 4.5 of [I-D.ietf-pce-multipath].¶
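The effect of carrying both segment lists in a test packet can be illustrated by walking the concatenated list through a toy topology. The sketch models only adjacency-SID forwarding; the `adjacency` table, the SID values and the function name are hypothetical, and the actual packet encoding is defined in [I-D.ietf-spring-stamp-srpm].

```python
def walk(start, segment_list, adjacency):
    """Follow adjacency-SIDs hop by hop and return the nodes visited.

    adjacency maps (node, adjacency_sid) to the neighbor reached over that SID.
    """
    node = start
    visited = [node]
    for sid in segment_list:
        node = adjacency[(node, sid)]
        visited.append(node)
    return visited


# Toy topology A - B - Z with one adjacency-SID per direction of each link.
adjacency = {("A", 101): "B", ("B", 102): "Z", ("Z", 201): "B", ("B", 202): "A"}
forward, reverse = [101, 102], [201, 202]
loopback_path = walk("A", forward + reverse, adjacency)
```

With these assumed SIDs, the test packet leaves headend A, reaches tailend Z over the forward list and returns to A over the reverse list, so a failure in either direction prevents the packet from coming back.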
The same STAMP session is used to estimate round-trip loss as described in section 5 of [I-D.ietf-spring-stamp-srpm].¶
The same STAMP session used for connectivity verification can be used to measure delay. As loopback mode is used, only round-trip delay is measured, and one-way delay has to be derived by dividing the round-trip delay by two.¶
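The loss and delay estimates obtained from the loopback session reduce to simple arithmetic, sketched below. The function names, the counter and timestamp inputs and the microsecond unit are assumptions made for illustration. Halving the round-trip delay yields the one-way delay only if both directions contribute equally, which the co-routed bidirectional paths of a CS-SR policy are intended to approximate.

```python
def round_trip_loss_ratio(sent, received):
    """Fraction of test packets that did not return to the headend."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent


def is_degraded(sent, received, threshold):
    """True if the loss ratio is beyond the configured acceptable threshold."""
    return round_trip_loss_ratio(sent, received) > threshold


def one_way_delay_estimate(tx_time_us, rx_time_us):
    """Estimate one-way delay as half of the measured round-trip delay."""
    return (rx_time_us - tx_time_us) / 2
```

A headend could treat `is_degraded` returning true in the same way as a loss of connectivity, in line with the definition of "failure" used in this document.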
A stateful PCE is in sync with the network topology and the CS-SR policies provisioned on the headend routers. As described in Section 4, a path must not be automatically recomputed or re-optimized after topology changes. However, there may be a requirement for a PCE to tear down a path if the PCE detects that the path no longer satisfies the original requirements, for example because of insufficient bandwidth, a diversity constraint that is no longer met or a latency constraint that is exceeded.¶
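The check a PCE might run against an established path can be sketched as follows. The data classes, field names and units are hypothetical; how the PCE obtains the current state of a path is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    bandwidth_mbps: float
    max_latency_us: float
    require_disjoint: bool


@dataclass
class PathState:
    available_mbps: float  # bandwidth the path can currently guarantee
    latency_us: float      # current latency of the path
    disjoint: bool         # whether it is still diverse from its peer path


def violations(state, req):
    """Return the original requirements that the path no longer satisfies."""
    found = []
    if state.available_mbps < req.bandwidth_mbps:
        found.append("insufficient bandwidth")
    if state.latency_us > req.max_latency_us:
        found.append("latency constraint exceeded")
    if req.require_disjoint and not state.disjoint:
        found.append("diversity constraint no longer met")
    return found
```

A non-empty result would be a reason to tear the path down; it is not a trigger for automatic recomputation.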
The PCC may measure the actual bandwidth utilization of a CS-SR policy to take local action and/or report it to the PCE. Typical actions are raising alarms or adjusting the reserved bandwidth.¶
For a CS-SR policy configured with multiple candidate paths, a PCC may switch to another candidate path if the PCE decides to tear down the active candidate path.¶
It is very common to allow operators to trigger a switch between candidate paths even if no failure is present, for example to proactively drain a resource for maintenance purposes. Operator-triggered switching between candidate paths is unidirectional and has to be requested on both headends.¶
While no automatic re-optimization or pre-computation of CS-SR policy candidate paths is allowed as specified in Section 4, network operators trying to optimize network utilization may explicitly request a candidate path to be re-computed at a certain point in time.¶
TO BE ADDED¶
This document has no IANA actions.¶
The authors want to thank Samuel Sidor, Mike Koldychev, Rakesh Gandhi and Tarek Saad for providing their review comments and all contributors for their inputs and support.¶