TOC |
|
By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 22, 2008.
Where the payload carried over a pseudowire carries a number of identifiable flows it can in some circumstances be desirable to carry those flows over the equal cost multiple paths that exist in the packet switched network. This draft describes two methods of achieving that, the one by including an additional label in the label stack, the other by using a block of alternative pseudowire labels.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 (Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” March 1997.) [1].
1.
Introduction
1.1.
ECMP in Label Switched Routers
1.2.
Load Balance Label
1.3.
Pseudowire Label Block
2.
Native Service Processing Function
3.
Pseudowire Forwarder
3.1.
Encapsulation when using LB Label
4.
Load Balance Signaling
4.1.
Signaling Load Balance Label
4.1.1.
Structure of Load Balance Label TLV
4.2.
Signalling Label Block
4.2.1.
Structure of Multiple VC TLV
5.
OAM
6.
Applicability
7.
Comparision of the Approaches
8.
Security Considerations
9.
IANA Considerations
10.
Congestion Considerations
11.
Acknowledgements
12.
References
12.1.
Normative References
12.2.
Informative References
§
Authors' Addresses
§
Intellectual Property and Copyright Statements
TOC |
A pseudowire is defined as a mechanism that carries the essential elements of an emulated service from one provider edge (PE) to one or more other PEs over a packet switched network (PSN) [10] (Bryant, S. and P. Pate, “Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture,” March 2005.).
A pseudowire is normally transported over one single network path, even if multiple ECMP paths exit between the ingress and egress PEs[2] (Bryant, S., Swallow, G., Martini, L., and D. McPherson, “Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for Use over an MPLS PSN,” February 2006.) [3] (Swallow, G., Bryant, S., and L. Andersson, “Avoiding Equal Cost Multipath Treatment in MPLS Networks,” June 2007.). This is required to preserve the characteristics of the emulated service (e.g. avoid misordering for example for SAToP pseudowire’s [4] (Vainshtein, A. and YJ. Stein, “Structure-Agnostic Time Division Multiplexing (TDM) over Packet (SAToP),” June 2006.)). Except in the extreme case described in Section 6, the new capability proposed in this draft does not change this default property of pseudowires.
Some pseudowire's are used to transport large volumes of IP traffic between routers at two locations. One example of this is the use of an Ethernet pseudowire to create a virtual direct link between a pair of routers. Such pseudowire’s may carry from hundred’s of Mbps to Gbps of traffic. Such pseudowire’s do not require ordering to be preserved between packets of the pseudowire. They only require ordering to be preserved within the context of each individual transported IP flow. Some operators have requested the ability to explicitly configure such a pseudowire to leverage the availability of multiple ECMP paths. This allows for better capacity planning as the statistical multiplexing of a larger number of smaller flows is more efficient than with a smaller set of larger flows. Although Ethernet is used as an example above, the mechanisms decsribed in this draft are general mechanisms that may be applied to any pseudowire type in which there are identifable flows, and in which the there is no requirement to preserve the order between flows.
TOC |
Label switched routers commonly hash the label stack or some elements of the label stack as a method of discrminating between flows, in order to distribute those flows over the available equal cost multiple paths that exist in the network. Since the label at the bottom of stack is usually the label most closely associated with the flow, this normally provides the greatest entropy and hence is normally included in the hash. The describes two methods of setting bottom of stack label in order to facilitate the load balancing of the flows with a pseudowire over the available ECMPs.
The load balance label mechanism is the more general and more powerful method. The pseudowire label block is an OPTINAL method that may be negotiated by PEs unable to support the additional label needed by the load balance label method.
TOC |
In this approach an additional label is interposed between the pseudowire label and the control word, or if the control word is not present, between the pseudowire label and the pseudowire payload. This additional label is called the pseudowire load balancing label (LB label). Indivisible flows within the pseudowire MUST be mapped to the same pseudowire LB label by the ingress PE. The pseudowire load balancing label stimulates the correct ECMP load balancing behaviour in the PSN. On receipt of the pseudowire packet at the egress PE (which knows this additional label is present) the label is discarded without processing.
Note that there is no protocol constraint on the value of a LB label.
TOC |
In this approach a contiguous block of pseudowire labels are allocated to each pseudowire by the egress PE. Flows carried by the pseudowire labels are spread over the set of pseudowire labels, such that Indivisible flows within the pseudowire MUST be mapped to the same pseudowire label by the ingress PE. The use of a multiplicity of alternate pseudowire labels stimulates the correct ECMP load balancing behaviour in the PSN.
Note that the pseudowire labels MUST be allocated as a contiguous block.
Support for this method is OPTIONAL.
TOC |
The Native Service Processing (NSP) function is a component of a PE that has knowledge of the structure of the emulated service and is able to take action on the service outside the scope of the pseudowire. In this case it is required that the NSP in the ingress PE identify flows, or groups of flows within the service, and indicate the flow (group) identity of each packet as it is passed to the pseudowire forwarder. Since this is an NSP function, by definition, the method used to identify a flow is outside the scope of the pseudowire design. Similarly, since the NSP is internal to the PE, the method of flow indication to the pseudowire forwarder is outside the scope of this document
TOC |
The pseudowire forwarder must be provided with a method of mapping flows to load balanced paths. Where the method chosen is the label block method, the forwarder uses the flow information provided by the NSP to allocate a flow to one of the VC labels in the load balancing label block. In all other respects forwarder operation is identical to the normal single VC label case.
When the load balance label method is used the forwarder must generate a label for the flow or group of flows. How the load balance label values are determined is outside the scope of this document, however the load balance label allocated to a flow SHOULD remain constant. It is recommended that the method chosen to generate the load balancing labels introduces a high degree of entropy in their values, to maximise the entropy presented to the ECMP path selection mechanism in the LSRs in the PSN, and hence distribute the flows as evenly as possible over the available PSN ECMP paths. The forwarder at the ingress PE prepends the pseudowire control word (if applicable), then prepends either the pseudowire load balancing label, followed by the pseudowire label. Alternatively it prepends the pseudowire control word (if applicable), then selects and appends one of the allocated pseudowire labels.
The forwarder at the egress PE uses the pseudowire label to identify the pseudowire. If the label block approach is used operation is identical to the current non-load balanced case. Alternatively, from the pseudowire context, the egress PE can determine whether a pseudowire load balancing label is present, and if one is present, the label is discarded.
All other pseudowire forwarding operations are unmodified by the inclusion of the pseudowire load balancing label.
TOC |
The PWE3 Protocol Stack Reference Model modified to include pseudowire LB label is shown in Figure 1 (PWE3 Protocol Stack Reference Model) below
+-------------+ +-------------+ | Emulated | | Emulated | | Ethernet | | Ethernet | | (including | Emulated Service | (including | | VLAN) |<==============================>| VLAN) | | Services | | Services | +-------------+ +-------------+ | Load balance| | Load balance| +-------------+ Pseudowire +-------------+ |Demultiplexer|<==============================>|Demultiplexer| +-------------+ +-------------+ | PSN | PSN Tunnel | PSN | | MPLS |<==============================>| MPLS | +-------------+ +-------------+ | Physical | | Physical | +-----+-------+ +-----+-------+
Figure 1: PWE3 Protocol Stack Reference Model |
The encapsulation of a pseudowire with a pseudowire LB label is shown in Figure 2 (Encapsulation of a pseudowire with a pseudowire load balancing label) below
+-------------------------------+ | MPLS Tunnel label(s) | n*4 octets (four octets per label) +-------------------------------+ | PW label | 4 octets +-------------------------------+ | Load Balance label | 4 octets +-------------------------------+ | Optional Control Word | 4 octets +-------------------------------+ | Payload | | | | | n octets | | +-------------------------------+
Figure 2: Encapsulation of a pseudowire with a pseudowire load balancing label |
TOC |
This section describes the signalling procedures when [5] (Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron, “Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP),” April 2006.) is used.
TOC |
When using the signalling procedures in [5] (Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron, “Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP),” April 2006.), there is a Pseudowire Interface Parameter Sub-TLV type used to signal the desire to include the load balance label when advertising a VC label.
The presence of this parameter indicates that the egress PE requests that the ingress PE place a load balance label between the pseudowire label and the control word (or is the control word is not present between the pseudowire label and the pseudowire payload).
If the ingress PE recognises load balance label indicator parameter but does not wish to include the load balance label, it need only issue its own label mapping message for the opposite direction without including the load balance label Indicator. This will prevent inclusion of the load balance label in either direction.
If PWE3 signalling [5] (Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron, “Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP),” April 2006.) is not in use for a pseudowire, then whether the load balance label is used MUST be identically provisioned in both PEs at the pseudowire endpoints. If there is no provisioning support for this option, the default behaviour is not to include the load balance label.
Note that what is signaled is the desire to include the load balance lable in the label stack. The value of the label is a local matter for the ingress PE, and the label value itself is not signaled.
TOC |
The structure of the load balance label TLV is shown in Figure 3 (Multiple VC TLV).
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | LBL | Length | must be zero | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: Multiple VC TLV |
Where:
TOC |
When using the signalling procedures in [5] (Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron, “Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP),” April 2006.), there is a Pseudowire Interface load balance sub-TLV type used to signal the desire load balance the pseudowire over a block of pseudowire VC labels.
The presence of this TLV indicates that the egress PE requests that the ingress PE distribute the ingress flows present in the pseudowire over the block of VC labels sent in this TLV.
This mechanism is fully backwards compatible. If the ingress PE does not recognise the load balance TLV or does not wish to use it, it simply ignores this TLV.
TOC |
The field structure is defined in Figure 4 (Multiple VC TLV).
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MultipleVC | Length=12 | must be zero | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | First Label | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Last Label | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Multiple VC TLV |
Where:
First Label SHOULD be the normal VC for this pseudowire.
TOC |
The following OAM considerations apply to both methods of load balancing.
Where the OAM is only to be used to perform a basic test that the pseudowires have been configured at the PEs, VCCV (Nadeau, T. and C. Pignataro, “Pseudowire Virtual Circuit Connectivity Verification (VCCV) A Control Channel for Pseudowires,” September 2007.) [11] messages may be sent using any load balance pseudowire path, i.e. over any of the multiple pseudowire labels, or using any pseudowire load balance label.
Where it is required to verify that a pseudowire is fully functional for all flows, VCCV (Nadeau, T. and C. Pignataro, “Pseudowire Virtual Circuit Connectivity Verification (VCCV) A Control Channel for Pseudowires,” September 2007.) [11] connection verification message MUST be sent over each ECMP path to the pseudowire egress PE. This problem is difficult to solve and scales poorly. . We believe that this problem is addressed by the following two methods:
To troubleshoot the MPLS PSN, including multiple paths, the techniques described in [7] (Allan, D. and T. Nadeau, “A Framework for Multi-Protocol Label Switching (MPLS) Operations and Management (OAM),” February 2006.) and [8] (Kompella, K. and G. Swallow, “Detecting Multi-Protocol Label Switched (MPLS) Data Plane Failures,” February 2006.) can be used.
TOC |
The requirement to load-balance over multiple PSN paths occurs when the ratio between the PW access speed and the PSN’s core link bandwidth is large (e.g. >= 0.1). ATM and FR are unlikely to meet this property. Ethernet does and this is the reason why this document focuses on ethernet. Applications for other high-access-bandwidth PW’s (fiber-channel) may be defined in the future.
This design applies to MPLS pseudowires where it is meaningful to deconstruct the packets presented to the ingress PE into flows. The mechanism described in this document promotes the distribution of flows within the pseudowire over different network paths. This in turn means that whilst packets within a flow are delivered in order (subject to normal IP delivery perturbations due to topology variation), order is not maintained amongst packets of different flows. It is not proposed to associate a different sequence number with each flow. If sequence number support is required this mechanism is not applicable.
Where it is known that the traffic carried by the ethernet pseudowire is IP the method of identifying the flows are well known and can be applied. Such methods typically include hashing on the source and destination addresses, the protocol ID and higher-layer flow-dependent fields such as TCP/UDP ports, L2TPv3 Session ID’s etc.
Where it is known that the traffic carried by the ethernet pseudowire is non-IP, techniques used for link bundling between Ethernet switches may be reused. In this case however the latency distribution would be larger than is found in the link bundle case. The acceptability of the increased latency is for further study. Of particular importance the Ethernet control frames SHOULD always be mapped to the same PSN path to ensure in-order delivery.
If the payload of an Ethernet PW is made of a single inner flow (i.e. an encrypted connection between two routers), then the functionality described in this document does not give any benefits, though neither does it give any drawbacks. This is unlikely to be a showstopper for two reasons:
A node within the PSN is not able to perform deep-packet-inspection (DPI) of the PW as the PW technology is not self-describing: the structure of the PW payload is only known to the ingress and egress PE devices. The two methods proposed in this document solve this limitation.
The methods describe in this document are transparent to the PSN and as such do not require any new capability from the PSN.
TOC |
There are a number of advantages and disadvantages to each approach:
It is desirable that based on an analysis of the two load balance mechanisms, one will chosen as the method to persue. If consensus is reached that only one method is needed this document will be simplified by removing the dataplane and signaling test that describes the alternative solution.
TOC |
The pseudowire generic security considerations described in [10] (Bryant, S. and P. Pate, “Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture,” March 2005.) and the security considerations applicable to a specific pseudowire type (for example, in the case of an Ethernet pseudowire [9] (Martini, L., Rosen, E., El-Aawar, N., and G. Heron, “Encapsulation Methods for Transport of Ethernet over MPLS Networks,” April 2006.) apply.
There are no additional security risks introduced by this design.
TOC |
IANA is requested to allocate the next available values from the IETF Consensus range in the Pseudowire Interface Parameters Sub-TLV type Registry as a Load Balance Label indicator.
Parameter Length Description TBD 4 Load Balancing Label TBD 12 Load Balance VC Block
TOC |
The congestion considerations applicable to pseudowires as described in [10] (Bryant, S. and P. Pate, “Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture,” March 2005.) and any additional congestion considerations developed at the time of publication apply to this design.
The ability to explicitly configure a PW to leverage the availability of multiple ECMP paths is beneficial to capacity planning as, all other parameters being constant, the statistical multiplexing of a larger number of smaller flows is more efficient than with a smaller number of larger flows.
Note that if the classification into flows is only performed on IP packets the behaviour of those flows in the face of congestion will be as already defined by the IETF for packets of that type and no additional congestion processing is required.
Where flows that are not IP are classified pseudowire congestion avoidance must be applied to each non-IP load balance group.
TOC |
The authors wish to thank Joerg Kuechemann, Wilfried Maas, Luca Martini and Mark Townsley for valuable comments and contributions to this design.
TOC |
TOC |
[1] | Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997 (TXT, HTML, XML). |
[2] | Bryant, S., Swallow, G., Martini, L., and D. McPherson, “Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for Use over an MPLS PSN,” RFC 4385, February 2006 (TXT). |
[3] | Swallow, G., Bryant, S., and L. Andersson, “Avoiding Equal Cost Multipath Treatment in MPLS Networks,” BCP 128, RFC 4928, June 2007 (TXT). |
[4] | Vainshtein, A. and YJ. Stein, “Structure-Agnostic Time Division Multiplexing (TDM) over Packet (SAToP),” RFC 4553, June 2006 (TXT). |
[5] | Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron, “Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP),” RFC 4447, April 2006 (TXT). |
[6] | Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci, D., Li, T., and A. Conta, “MPLS Label Stack Encoding,” RFC 3032, January 2001 (TXT). |
[7] | Allan, D. and T. Nadeau, “A Framework for Multi-Protocol Label Switching (MPLS) Operations and Management (OAM),” RFC 4378, February 2006 (TXT). |
[8] | Kompella, K. and G. Swallow, “Detecting Multi-Protocol Label Switched (MPLS) Data Plane Failures,” RFC 4379, February 2006 (TXT). |
[9] | Martini, L., Rosen, E., El-Aawar, N., and G. Heron, “Encapsulation Methods for Transport of Ethernet over MPLS Networks,” RFC 4448, April 2006 (TXT). |
TOC |
[10] | Bryant, S. and P. Pate, “Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture,” RFC 3985, March 2005 (TXT). |
[11] | Nadeau, T. and C. Pignataro, “Pseudowire Virtual Circuit Connectivity Verification (VCCV) A Control Channel for Pseudowires,” draft-ietf-pwe3-vccv-15 (work in progress), September 2007 (TXT). |
[12] | Swallow, G., “Label Switching Router Self-Test,” draft-ietf-mpls-lsr-self-test-07 (work in progress), May 2007 (TXT). |
TOC |
Stewart Bryant | |
Cisco Systems | |
250 Longwater Ave | |
Reading RG2 6GB | |
United Kingdom | |
Phone: | +44-208-824-8828 |
Email: | stbryant@cisco.com |
Clarence Filsfils | |
Cisco Systems | |
Brussels | |
Belgium | |
Email: | cfilsfil@cisco.com |
Ulrich Drafz | |
Deutsche Telekom | |
Muenster, | |
Germany | |
Phone: | |
Fax: | |
Email: | Ulrich.Drafz@t-com.net |
URI: |
TOC |
Copyright © The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an “AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.