Internet-Draft | VPN+ Framework | March 2022 |
Dong, et al. | Expires 8 September 2022 | [Page] |
This document describes the framework for Enhanced Virtual Private Network (VPN+) services. The purpose of enhanced VPNs is to support the needs of new applications, particularly applications that are associated with 5G services, by utilizing an approach that is based on the VPN and Traffic Engineering (TE) technologies and adds characteristics that specific services require over those provided by traditional VPNs.¶
Typically, VPN+ will be used to underpin network slicing, but could also be of use in its own right providing enhanced connectivity services between customer sites.¶
It is envisaged that enhanced VPNs will be delivered using a combination of existing, modified, and new networking technologies. This document provides an overview of relevant technologies and identifies some areas for potential new work.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 8 September 2022.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Virtual private networks (VPNs) have served the industry well as a means of providing different groups of users with logically isolated connectivity over a common network. The common or base network that is used to provide the VPNs is often referred to as the underlay, and the VPN is often called an overlay.¶
Customers of a network operator may request connectivity services with advanced characteristics such as low latency guarantees, bounded jitter, or isolation from other services or customers so that changes in some other service (such as changes in network load, or events such as congestion or outages) have no or only acceptable effect on the throughput or latency of the services provided to the customer. These services are referred to as "enhanced VPNs" (known as VPN+) in that they are similar to VPN services providing the customer with the required connectivity, but in addition they have enhanced characteristics.¶
The concept of network slicing has gained traction driven largely by needs surfacing from 5G [NGMN-NS-Concept] [TS23501] [TS28530]. According to [TS28530], a 5G end-to-end network slice consists of three major types of network segments: Radio Access Network (RAN), Transport Network (TN), and Mobile Core Network (CN). The transport network provides the connectivity between different entities in RAN and CN segments of a 5G end-to-end network slice, with specific performance commitment.¶
[I-D.ietf-teas-ietf-network-slices] defines the terminology and the characteristics of IETF network slices. It also discusses the general framework, the components, and the interfaces for requesting and operating IETF network slices. An IETF Network Slice Service enables connectivity between a set of CEs with specific Service Level Objectives (SLOs) and Service Level Expectations (SLEs) over a common underlay network. An IETF Network Slice can be realized as a logical network connecting a number of endpoints and is associated with a set of shared or dedicated network resources that are used to satisfy the SLO and SLE requirements. In this document (which is solely about IETF technologies) we refer to an "IETF network slice" simply as a "network slice": a network slice is considered one possible use case of an enhanced VPN.¶
A network slice could span multiple technologies (such as IP or Optical) and multiple administrative domains. Depending on the customer's requirement, a network slice could be isolated from other network slices in terms of data plane, control plane, and management plane resources.¶
Network slicing builds on the concepts of resource management, network virtualization, and abstraction to provide performance assurance, flexibility, programmability, and modularity. It may use techniques such as Software Defined Networking (SDN) [RFC7149], network abstraction [RFC7926] and Network Function Virtualization (NFV) [RFC8172] [RFC8568] to create multiple logical (virtual) networks, each tailored for use by a set of services or by a particular tenant or a group of tenants that share the same or similar requirements. These logical networks are created on top of a common underlay network. How the network slices are engineered can be deployment-specific.¶
The requirements of enhanced VPN services cannot be met by simple overlay networks, as these services require tighter coordination and integration between the underlay and the overlay network. VPN+ is built from a VPN overlay and an underlying Virtual Transport Network (VTN) which has a customized network topology and a set of dedicated or shared resources in the underlay network. The enhanced VPN may also include a set of invoked service functions located within the underlay network. Thus, an enhanced VPN can achieve greater isolation with strict performance guarantees. These new properties, which have general applicability, are also of interest as part of a network slicing solution.¶
VPN+ can be used to instantiate a network slice service, and the technique can also be of use in general cases to provide enhanced connectivity services between customer sites or service end points. [I-D.ietf-teas-ietf-network-slices] introduces the concept of a Network Resource Partition (NRP) as a set of network resources that are available to carry traffic and meet the SLOs and SLEs. An NRP is associated with a network topology to define the set of links and nodes. Thus, VTN and NRP are considered similar concepts, and an NRP can be seen as an instantiation of a VTN in the context of network slicing.¶
It is not envisaged that VPN+ services will replace traditional VPN services. Traditional VPN services will continue to be delivered using pre-existing mechanisms and can co-exist with VPN+ services.¶
This document describes a framework for using existing, modified, and potential new technologies as components to provide a VPN+ service. Specifically, we are concerned with:¶
The required layered network structure to achieve this is shown in Section 4.1.¶
In this document, the relationships among the four terms "VPN", "VPN+", "VTN", and "Network Slice" are as follows:¶
The term "tenant" is used in this document to refer to the customers and all of their associated enhanced VPNs.¶
The following terms are also used in this document. Some of them are newly defined, some others reference existing definitions.¶
This section provides an overview of the requirements of an enhanced VPN service.¶
Performance guarantees are made by network operators to their customers in relation to the services provided to the customers. They are usually expressed in SLAs as a set of SLOs.¶
There are several kinds of performance guarantee, including guaranteed maximum packet loss, guaranteed maximum delay, and guaranteed delay variation. Note that these guarantees apply to conformant traffic; out-of-profile traffic will be handled according to a separate agreement with the customer.¶
Guaranteed maximum packet loss is usually addressed by setting packet priorities, queue size, and discard policy. However, this becomes more difficult when the requirement is combined with latency requirements. The limiting case is zero congestion loss, and that is the goal of DetNet [DETNET] and TSN [TSN]. In modern optical networks, loss due to transmission errors already approaches zero, but there is the possibility of failure of the interface or the fiber itself. This type of fault can only be addressed by some form of signal duplication and transmission over diverse paths.¶
Guaranteed maximum latency is required by a number of applications, particularly real-time control applications and some types of virtual reality applications. DetNet [DETNET] is relevant; however, additional methods of enhancing the underlay to better support the delay guarantees may be needed, and these methods will need to be integrated with the overall service provisioning mechanisms.¶
Guaranteed maximum delay variation is a performance guarantee that may also be needed. [RFC8578] calls up a number of cases that need this guarantee, for example in electrical utilities. Time transfer is an example service that needs a performance guarantee, although it is in the nature of time that the service might be delivered by the underlay as a shared service and not provided through different enhanced VPNs. Alternatively, a dedicated enhanced VPN might be used to provide this as a shared service.¶
This suggests that a spectrum of service guarantees needs to be considered when deploying an enhanced VPN. As a guide to understanding the design requirements we can consider four types of service:¶
The best effort service is the basic service as provided by current VPNs.¶
An assured bandwidth service is one in which the bandwidth over some period of time is assured. This can be achieved either simply based on a best effort service with over-capacity provisioning, or it can be based on MPLS traffic engineered label switched paths (TE-LSPs) with bandwidth reservations. Depending on the technique used, however, the bandwidth is not necessarily assured at any instant. Providing assured bandwidth to VPNs, for example by using per-VPN TE-LSPs, is not widely deployed, at least partially due to scalability concerns. VPN+ aims to provide a more scalable approach for such services.¶
A guaranteed latency service has an upper bound to edge-to-edge latency. Assuring the upper bound is sometimes more important than minimizing latency. There are several new technologies that provide some assistance with this performance guarantee. Firstly, the IEEE TSN project [TSN] introduces the concept of scheduling of delay- and loss-sensitive packets. The DetNet work [DETNET] is also of relevance in assuring an upper bound of end-to-end packet latency. FlexE [FLEXE] is also useful to help provide these guarantees. The use of such underlying technologies to deliver VPN+ services needs to be considered.¶
An enhanced delivery service is one in which the underlay network (at Layer 3) attempts to deliver the packet through multiple paths in the hope of eliminating packet loss due to equipment or media failures. Such a mechanism may need to be used for VPN+ service.¶
One element of the SLA demanded for an enhanced VPN may be a guarantee that the service offered to the customer will not be affected by any other traffic flows in the network. This is termed "isolation" and a customer may express the requirement for isolation as an SLE [I-D.ietf-teas-ietf-network-slices].¶
One way for a network operator to meet the requirement for isolation is simply by setting and conforming to all the SLOs. For example, traffic congestion (interference from other services) might impact on the latency experienced by a VPN+ customer. Thus, in this example, conformance to a latency SLO would be the primary requirement for delivery of the VPN+ service, and isolation from other services might be only a means to that end.¶
Another way for a service provider to meet this SLE is to control the degree to which traffic from one service is isolated from other services in the network.¶
There is a fine distinction between how isolation is requested by a customer and how it is delivered by the service provider. In general, the customer is interested in service performance and not how it is delivered. Thus, for example, the customer wants specific quality guarantees and is not concerned about how the service provider delivers them. However, it should be noted that some aspects of isolation might be directly measurable by a customer if they have information about the traffic patterns on a number of services supported by the same service provider. Furthermore, a customer may be nervous about disruption caused by other services, contamination by other traffic, or delivery of their traffic to the wrong destinations. In this way, the customer may want to specify (and pay for) the level of isolation provided by the service provider.¶
Isolation is achieved in the realization of a VPN+ through existing technologies that may be supplemented by new mechanisms. The service provider chooses which processes to use to meet this SLE just as they choose how to meet all other SLOs and SLEs. Isolation may be achieved in the network by various forms of resource partitioning ranging from simple separation of service traffic on delivery (ensuring that traffic is not delivered to the wrong customer), through sharing of resources with some form of safeguards, to dedicated allocation of resources for a specific enhanced VPN. For example, interference avoidance may be achieved by network capacity planning, allocating dedicated network resources, traffic policing or shaping, prioritizing in using shared network resources, etc.¶
The terms hard and soft isolation are used to indicate different levels of isolation. A service has soft isolation if the traffic of one service cannot be received by the customers of another service. The existing IP and MPLS VPNs are examples of services with soft isolation: the network delivers the traffic only to the required customer endpoints. However, with soft isolation, as the network resources are shared, traffic from some services may congest the network, resulting in packet loss and delay for other services. The ability for a service or a group of services to be sheltered from this effect is called hard isolation. Hard isolation may be needed so that applications with exacting requirements can function correctly, despite other demands (perhaps a burst of traffic in another service) competing for the underlying resources. A customer may request different degrees of isolation ranging from soft isolation to hard isolation. In practice isolation may be delivered on a spectrum between soft and hard, and in some cases soft and hard isolation may be used in a hierarchical manner with one enhanced VPN being built on another.¶
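The difference between the two ends of this spectrum can be illustrated with a minimal Python sketch. It is not a model of any real scheduler: the function names are hypothetical, and the assumption that an overloaded shared link scales every service down in proportion to its offered load is a deliberate simplification.

```python
def shared_throughput(capacity, offered):
    """Soft isolation (assumed model): all services share one pool of capacity.

    When the total offered load exceeds the capacity, every service is
    scaled down in proportion to its offered load.
    """
    total = sum(offered.values())
    if total <= capacity:
        return dict(offered)
    scale = capacity / total
    return {service: load * scale for service, load in offered.items()}


def partitioned_throughput(allocation, offered):
    """Hard isolation (assumed model): each service is limited only by its
    own allocation and is unaffected by the load of other services."""
    return {service: min(load, allocation[service])
            for service, load in offered.items()}


# Service B bursts to 16 units on a 10-unit link while A offers 4 units.
offered = {"A": 4, "B": 16}
soft = shared_throughput(10, offered)                      # A is squeezed by B's burst
hard = partitioned_throughput({"A": 4, "B": 6}, offered)   # A keeps its 4 units
```

In this example service A loses half of its throughput under the shared model, while the partitioned model confines the effect of the burst to service B.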
To provide the required level of isolation, resources may need to be reserved in the data plane of the underlay network and dedicated to traffic from a specific enhanced VPN or a specific group of enhanced VPNs. This may introduce scalability concerns both in the implementation (as each enhanced VPN would need to be tracked in the network) and in how many resources need to be reserved and may be under-used (see Section 4.4). Thus, some trade-off needs to be considered to provide the isolation between enhanced VPNs while still allowing reasonable resource sharing.¶
An optical underlay can offer a high degree of isolation, at the cost of allocating resources on a long-term and end-to-end basis. On the other hand, where adequate isolation can be achieved at the packet layer, this permits the resources to be shared amongst a group of services and only dedicated to a service on a temporary basis.¶
The next section explores a pragmatic approach to isolation in packet networks.¶
A key question is whether it is possible to achieve hard isolation in packet networks that were designed to provide statistical multiplexing through sharing of data plane resources, a significant economic advantage when compared to a dedicated, or a Time Division Multiplexing (TDM) network. Clearly, there is no need to provide more isolation than is required by the applications, and an approximation to full hard isolation is sufficient in most cases. For example, pseudowires [RFC3985] emulate services that would have had hard isolation in their native form.¶
Figure 1 shows a spectrum of isolation that may be delivered by a network. At one end of the spectrum, we see statistical multiplexing technologies that support traditional VPNs. This is a service type that has served the industry well and will continue to do so. At the opposite end of the spectrum, we have the absolute isolation provided by dedicated transport networks. The goal of enhanced VPNs is "pragmatic isolation". This is isolation that is better than what is obtainable from pure statistical multiplexing, more cost effective and flexible than a dedicated network, but is a practical solution that is good enough for the majority of applications. Mechanisms for both soft isolation and hard isolation are needed to meet different levels of service requirement.¶
The way to achieve the characteristics demanded by an enhanced VPN (such as guaranteed or predictable performance) is by integrating the overlay VPN with a particular set of resources in the underlay network which are allocated to meet the service requirement. This needs to be done in a flexible and scalable way so that it can be widely deployed in operators' networks to support a reasonable number of enhanced VPN customers.¶
Taking mobile networks and in particular 5G into consideration, the integration of the network with service functions is likely a requirement. The IETF's work on service function chaining (SFC) [SFC] provides a foundation for this. Service functions can be considered as part of enhanced VPN services. The detailed mechanisms about the integration between service functions and enhanced VPNs are out of the scope of this document.¶
Integration of the overlay VPN and the underlay network resources does not always need to be a direct mapping. As described in [RFC7926], abstraction is the process of applying policy to a set of information about a traffic engineered (TE) network to produce selective information that represents the potential ability to connect across the network. The process of abstraction presents the connectivity graph in a way that is independent of the underlying network technologies, capabilities, and topology so that the graph can be used to plan and deliver network services in a uniform way.¶
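As a rough illustration of abstraction, the following Python sketch reduces a detailed topology to one abstract link per pair of border nodes. The policy used here (expose only the minimum achievable delay between border nodes and hide interior nodes) is an assumption chosen for the example, as are the node names, the delay values, and the function names; [RFC7926] does not prescribe this particular policy.

```python
import heapq


def min_delay(topology, src, dst):
    """Dijkstra over {node: {neighbor: delay_ms}}; returns None if unreachable."""
    best = {src: 0}
    queue = [(0, src)]
    while queue:
        delay, node = heapq.heappop(queue)
        if node == dst:
            return delay
        if delay > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, link_delay in topology.get(node, {}).items():
            candidate = delay + link_delay
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


def abstract_topology(topology, border_nodes):
    """Expose one abstract link per ordered pair of border nodes.

    Interior nodes and the actual paths are hidden from the consumer.
    """
    abstract = {}
    for src in border_nodes:
        for dst in border_nodes:
            if src != dst:
                delay = min_delay(topology, src, dst)
                if delay is not None:
                    abstract[(src, dst)] = delay
    return abstract


# Hypothetical underlay: two edge nodes joined through two interior nodes.
detailed = {
    "PE1": {"P1": 2, "P2": 1},
    "P1": {"PE1": 2, "PE2": 3},
    "P2": {"PE1": 1, "PE2": 5},
    "PE2": {"P1": 3, "P2": 5},
}
abstract = abstract_topology(detailed, ["PE1", "PE2"])
```

The resulting graph contains only the border nodes and a delay figure for each abstract link, which is the kind of technology-independent view that can be used to plan services.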
Virtual networks can be built on top of an abstracted topology that represents the connectivity capabilities of the underlay TE based network as described in the framework for Abstraction and Control of TE Networks (ACTN) [RFC8453] as discussed further in Section 5.5. [I-D.ietf-teas-applicability-actn-slicing] describes the applicability of ACTN to network slicing and is, therefore, relevant to the consideration of using ACTN to enable enhanced VPNs.¶
Enhanced VPNs need to be created, modified, and removed from the network according to service demands. An enhanced VPN that requires hard isolation (Section 3.2) must not be disrupted by the instantiation or modification of another enhanced VPN. Determining whether modification of an enhanced VPN can be disruptive to that VPN, and whether the traffic in flight will be disrupted, can be a difficult problem.¶
The data plane aspects of this problem are discussed further in Section 5.1, Section 5.2, and Section 5.3.¶
The control plane aspects of this problem are discussed further in Section 5.4.¶
The management plane aspects of this problem are discussed further in Section 5.5.¶
Dynamic changes both to the enhanced VPN and to the underlay network need to be managed to avoid disruption to services that are sensitive to changes in network performance.¶
In addition to non-disruptively managing the network during changes such as the inclusion of a new VPN endpoint or a change to a link, VPN traffic might need to be moved because of changes to traffic patterns and volumes.¶
In many cases, customers are provided with enhanced VPN services without being given information about the underlying VTNs. However, depending on the agreement between the operator and the customer, in some cases the customer may also be provided with some information about the underlying VTNs. Such information can be filtered or aggregated according to the operator's policy. This allows the customer of the enhanced VPN to have some visibility and even control over how the underlying topology and resources of the VTN are used. For example, the customers may be able to specify the service paths within the VTN for specific traffic flows of their enhanced VPNs. Depending on the requirements, an enhanced VPN customer may have its own network controller, which may be provided with an interface to the control or management system run by the network operator. Note that such control is within the scope of the customer's enhanced VPN; any additional changes beyond this would require some intervention by the network operator.¶
The control plane aspects of this feature are discussed further in Section 5.4. A description of the management plane aspects of this feature can be found in Section 5.5.¶
The concept of enhanced VPN can be applied to any existing and future multi-tenancy overlay technologies including but not limited to:¶
Where such VPN service types need enhanced isolation and delivery characteristics, the technologies described in Section 5 can be used to provide an underlay with the required enhanced performance.¶
In some scenarios, an enhanced VPN service may span multiple network domains. A domain is considered to be any collection of network elements within a common realm of address space or path computation responsibility [RFC5151], for example, an Autonomous System. In some domains the network operator may manage a multi-layered network, for example, a packet network over an optical network. When VPN+ services are provisioned in such network scenarios, the technologies used in different network planes (data plane, control plane, and management plane) need to provide mechanisms to support multi-domain and multi-layer coordination and integration, so as to provide the required service characteristics for different enhanced VPNs, and improve network efficiency and operational simplicity.¶
A number of VPN+ services will typically be provided by a common network infrastructure. Each VPN+ service is provisioned with an overlay VPN and a corresponding VTN, which has a specific set of network resources and functions allocated in the underlay to satisfy the needs of the customer. One VTN may support one or more VPN+ services. The integration between the overlay connectivity and the underlay resources ensures the required isolation between different VPN+ services, and achieves the guaranteed performance for different customers.¶
The VPN+ architecture needs to be designed with consideration given to:¶
These topics are expanded below.¶
The enhanced data plane:¶
The control plane:¶
The management plane:¶
Operations, Administration, and Maintenance (OAM)¶
Telemetry¶
Provides the mechanisms to collect network information about the operation of the data plane, control plane, and management plane. More specifically, telemetry provides the mechanisms to collect network data:¶
The layered architecture of VPN+ is shown in Figure 2.¶
Underpinning everything is the physical network infrastructure layer, which provides the underlying resources used to provision the separated VTNs. This layer is responsible for the partitioning of link and/or node resources for different VTNs. Each subset of link or node resources can be considered as a virtual link or virtual node used to build the VTNs.¶
Various components and techniques discussed in Section 5 can be used to enable resource partitioning, such as FlexE, TSN, DetNet, dedicated queues, etc. These partitions may be physical or virtual so long as the SLA required by the higher layers is met.¶
Based on the network resource partitions provided by the physical network infrastructure, multiple VTNs can be created. Each VTN has a set of dedicated or shared network resources allocated from the physical underlay network and is associated with a customized logical network topology, so as to meet the requirements of different VPN+ services or different groups of VPN+ services. According to the associated logical network topology, each VTN needs to be instantiated on the set of network nodes and links which are involved in the logical topology. On each node or link, each VTN is associated with a set of local resources which are allocated for the processing of traffic in the VTN. The VTN provides the integration between the virtual network topology and the required underlying network resources.¶
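The pairing of a logical topology with per-link local resources can be sketched as a small data model. The Python below is illustrative only: the `Underlay` and `VTN` classes, the `instantiate_vtn` helper, and the use of bandwidth in Mb/s as the only resource type are assumptions made for this example, and only dedicated (not shared) allocations are modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Underlay:
    """Physical links and the bandwidth already partitioned out of each."""
    link_capacity_mbps: dict
    allocated_mbps: dict = field(default_factory=dict)

    def free(self, link):
        return self.link_capacity_mbps[link] - self.allocated_mbps.get(link, 0)


@dataclass
class VTN:
    """A logical topology plus the local resources held on each of its links."""
    vtn_id: int
    link_resources_mbps: dict = field(default_factory=dict)

    def nodes(self):
        return {node for link in self.link_resources_mbps for node in link}


def instantiate_vtn(underlay, vtn_id, requested):
    """Allocate the requested bandwidth on every link, or fail without side effects."""
    for link, mbps in requested.items():
        if link not in underlay.link_capacity_mbps:
            raise KeyError(f"link {link} is not part of the underlay")
        if underlay.free(link) < mbps:
            raise ValueError(f"insufficient free bandwidth on {link}")
    for link, mbps in requested.items():
        underlay.allocated_mbps[link] = underlay.allocated_mbps.get(link, 0) + mbps
    return VTN(vtn_id, dict(requested))


# Hypothetical three-node underlay with 10 Gb/s links.
underlay = Underlay({("A", "B"): 10_000, ("B", "C"): 10_000, ("A", "C"): 10_000})
vtn1 = instantiate_vtn(underlay, 1, {("A", "B"): 4_000, ("B", "C"): 4_000})
vtn2 = instantiate_vtn(underlay, 2, {("A", "B"): 3_000, ("A", "C"): 2_000})
```

Each VTN in this sketch sees only its own links and its own share of each link, while the underlay tracks how much of every physical link remains unallocated.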
According to the service requirements on connectivity, performance and isolation, etc., VPN services can be mapped to the appropriate VTNs in the network. Different VPN services can be mapped to different VTNs, while it is also possible that multiple VPNs are mapped to the same VTN. Thus VTN is an essential scaling technique, as it has the potential of eliminating per-path state from the network. In addition, when a group of VPN+ services are mapped to a single VTN, only the network state of the single VTN needs to be maintained in the network (see Section 4.4 for more information).¶
The centralized network controller is responsible for creating a VTN, instructing the involved network nodes to allocate network resources to the VTN, and provisioning the VPN services on the VTN. A distributed control plane may be used for distributing the VTN resource and topology attributes among nodes in the VTN.¶
The process used to create VTNs and to allocate network resources for use by the VTNs needs to take a holistic view of the needs of all of its customers and to partition the resources accordingly. However, within a VTN these resources can, if required, be managed via a dynamic control plane. This provides the required scalability and isolation with some flexibility.¶
At the VPN service level, the required connectivity for an MP2MP VPN service is usually full or partial mesh. To support such VPN services, the corresponding VTN also needs to provide MP2MP connectivity among the end points.¶
Other service requirements may be expressed at different granularities, some of which can be applicable to the whole service, while some others may only be applicable to some pairs of end points. For example, when a particular level of performance guarantee is required, the point-to-point path through the underlying VTN of the VPN+ service may need to be specifically engineered to meet the required performance guarantee.¶
Although a lot of the traffic that will be carried over VPN+ will likely be IP based, the design must be capable of carrying other traffic types, in particular Ethernet traffic. This is easily accomplished through the various pseudowire (PW) techniques [RFC3985]. Where the underlay is MPLS, Ethernet traffic can be carried over VPN+ encapsulated according to the method specified in [RFC4448]. Where the underlay is IP, Layer Two Tunneling Protocol - Version 3 (L2TPv3) [RFC3931] can be used with Ethernet traffic carried according to [RFC4719]. Encapsulations have been defined for most of the common Layer-2 types for both PW over MPLS and for L2TPv3.¶
VPNs are instantiated as overlays on top of an operator's network and offered as services to the operator's customers. An important feature of overlays is that they can deliver services without placing per-service state in the core of the underlay network.¶
VPN+ services may need to install some additional state within the network to achieve the features that they require. Solutions must consider minimizing and controlling the scale of such state, and deployment architectures should constrain the number of VPN+ services so that the additional state introduced to the network is acceptable and under control. It is expected that the number of VPN+ services will be small at the beginning, and even in future the number of VPN+ services will be fewer than traditional VPNs because pre-existing VPN techniques are good enough to meet the needs of most existing VPN-type services.¶
In general, it is not required that the state in the network be maintained in a 1:1 relationship with the VPN+ services. It will usually be possible to aggregate a set or group of VPN+ services so that they share the same VTN and the same set of network resources (much in the same way that current VPNs are aggregated over transport tunnels) so that collections of VPN+ services that require the same behavior from the network in terms of resource reservation, latency bounds, resiliency, etc. can be grouped together. This is an important feature to assist with the scaling characteristics of VPN+ deployments.¶
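The aggregation described above can be sketched as grouping services by the behavior they require from the network. In the Python example below, the `Profile` fields (a latency bound and a protection flag), the service names, and the `group_by_profile` function are hypothetical; a real deployment would use whatever attributes the operator considers relevant.

```python
from collections import defaultdict
from typing import NamedTuple


class Profile(NamedTuple):
    """Network behavior required by a VPN+ service (illustrative fields)."""
    latency_bound_ms: int
    protected: bool


def group_by_profile(services):
    """Group services that require identical behavior so they can share a VTN."""
    groups = defaultdict(list)
    for name, profile in services.items():
        groups[profile].append(name)
    return dict(groups)


services = {
    "vpn-a": Profile(10, True),
    "vpn-b": Profile(10, True),
    "vpn-c": Profile(50, False),
    "vpn-d": Profile(50, False),
    "vpn-e": Profile(10, True),
}
groups = group_by_profile(services)
```

Here five VPN+ services map onto two groups, so the network needs to maintain state for two VTNs rather than for five individual services.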
[I-D.dong-teas-nrp-scalability] provides more details of scalability considerations for the network resource partitions used to instantiate VTNs, and Section 7 includes a greater discussion of scalability considerations.¶
A VPN is a network created by applying a demultiplexing technique to the underlying network (the underlay) to distinguish the traffic of one VPN from that of another. A VPN path that travels by other than the shortest path through the underlay normally requires state in the underlay to specify that path. State is normally applied to the underlay through the use of the RSVP-TE signaling protocol, or directly through the use of an SDN controller, although other techniques may emerge as this problem is studied. This state gets harder to manage as the number of VPN paths increases. Furthermore, as we increase the coupling between the underlay and the overlay to support the VPN+ service, this state will increase further. Thus, a VPN+ solution needs tighter coupling with the underlay than is the case with existing VPN techniques. We cannot, for example, share the network resource between VPN+ services which require hard isolation.¶
In a VPN+ solution, different subsets of the underlay resources can be dedicated to different VPN+ services or different groups of VPN+ services through the use of VTNs.¶
Several candidate Layer 2 packet- or frame-based data plane solutions which provide the required isolation and guarantees are described in the following sections.¶
FlexE [FLEXE] provides the ability to multiplex channels over an Ethernet link to create point-to-point fixed-bandwidth connections in a way that provides hard isolation. FlexE also supports bonding links to create larger links out of multiple low capacity links.¶
However, FlexE is only a link level technology. When packets are received by the downstream node, they need to be processed in a way that preserves that isolation in the downstream node. This in turn requires a queuing and forwarding implementation that preserves the end-to-end isolation.¶
If different FlexE channels are used for different services, then no sharing is possible between the FlexE channels. This means that it may be difficult to dynamically redistribute unused bandwidth to lower priority services in another FlexE channel. If one FlexE channel is used by one customer, the customer can use some methods to manage the relative priority of their own traffic in the FlexE channel.¶
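The fixed-bandwidth nature of FlexE channels, and the resulting lack of sharing between them, can be illustrated with a simplified slot calendar. In the Python sketch below, the `FlexEGroup` class is hypothetical, and the slot size of 5 Gb/s with 20 slots per 100G PHY is used as an illustrative parameter set rather than as a statement about any particular FlexE version.

```python
SLOT_GBPS = 5        # assumed calendar slot granularity
SLOTS_PER_PHY = 20   # assumed number of slots per bonded 100G PHY


class FlexEGroup:
    """A group of bonded PHYs whose capacity is divided into fixed slots."""

    def __init__(self, num_phys):
        # One calendar spanning all bonded PHYs; None marks a free slot.
        self.calendar = [None] * (num_phys * SLOTS_PER_PHY)

    def add_client(self, name, rate_gbps):
        """Assign whole slots to a client; return False if too few are free."""
        if rate_gbps % SLOT_GBPS:
            raise ValueError("rate must be a multiple of the slot size")
        needed = rate_gbps // SLOT_GBPS
        free = [i for i, owner in enumerate(self.calendar) if owner is None]
        if len(free) < needed:
            return False
        for i in free[:needed]:
            self.calendar[i] = name
        return True

    def client_rate(self, name):
        return self.calendar.count(name) * SLOT_GBPS


# Two bonded PHYs give 200 Gb/s, so one client can exceed a single PHY.
group = FlexEGroup(num_phys=2)
first = group.add_client("vtn-1", 150)
second = group.add_client("vtn-2", 50)
# All slots are now owned. A third client is refused even if vtn-1 is idle,
# because slots are not shared between channels.
third = group.add_client("vtn-3", 10)
```

The refusal of the third client reflects the trade-off noted above: slot ownership gives hard isolation, but unused bandwidth in one channel is not available to another.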
DiffServ based queuing systems are described in [RFC2475] and [RFC4594]. This approach is not sufficient to provide isolation for VPN+ services because DiffServ does not provide enough markers to differentiate between traffic of a large number of VPN+ services. Nor does DiffServ offer the range of service classes that each VPN+ service needs to provide to its tenants. This problem is particularly acute with an MPLS underlay, because MPLS only provides eight traffic classes.¶
In addition, DiffServ, as currently implemented, mainly provides per-hop priority-based scheduling, and it is difficult to use it to achieve quantitative resource reservation for different VPN+ services.¶
To address these problems and to reduce the potential interference between VPN+ services, it would be necessary to steer traffic to dedicated input and output queues per VPN+ service or per group of VPN+ services: some routers have a large number of queues and sophisticated queuing systems which could support this, while some routers may struggle to provide the granularity and level of isolation required by the applications of VPN+.¶
Time Sensitive Networking (TSN) [TSN] is an IEEE project to provide a method of carrying time sensitive information over Ethernet. It introduces the concept of packet scheduling where a packet stream may be given a time slot guaranteeing that it experiences no queuing delay or increase in latency beyond the very small scheduling delay. The mechanisms defined in TSN can be used to meet the requirements of the time sensitive traffic flows of VPN+ services.¶
Ethernet can be emulated over a Layer 3 network using an IP or MPLS pseudowire. However, a TSN Ethernet payload would be opaque to the underlay and thus not treated specifically as time sensitive data. The preferred method of carrying TSN over a Layer 3 network is through the use of deterministic networking as explained in Section 5.2.1.¶
This section considers the problem of VPN+ service differentiation and the representation of underlying network resources in the network layer. More specifically, it describes the possible data plane mechanisms to determine the network resources and the logical network topology or paths associated with a VTN.¶
Deterministic Networking (DetNet) [RFC8655] is a technique being developed in the IETF to enhance the ability of Layer-3 networks to deliver packets more reliably and with greater control over the delay. The design cannot use retransmission techniques such as those of TCP, since retransmission can exceed the delay tolerated by the applications. Even the delay improvements that are achieved with Stream Control Transmission Protocol Partial Reliability Extension (SCTP-PR) [RFC3758] may not meet the bounds set by application demands. DetNet pre-emptively sends copies of the packet over various paths to minimize the chance of all copies of a packet being lost. It also seeks to set an upper bound on latency, but the goal is not to minimize latency. DetNet can be realized over an IP data plane [RFC8939] or an MPLS data plane [RFC8964], and may be used to provide a Virtual Transport Path (VTP) for VPN+ services.¶
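The replication of packets over several paths, and the discarding of the copies that arrive later, can be sketched as follows. This is a minimal illustration and not the DetNet data plane itself: the class and function names, the use of a plain integer sequence number, and the bounded history of 1024 entries are all assumptions made for the example.

```python
from collections import deque


def replicate(seq: int, paths: list[str]) -> list[tuple[str, int]]:
    """Send one copy of the packet with sequence number `seq` on every path."""
    return [(path, seq) for path in paths]


class PacketEliminator:
    """Deliver the first copy of each sequence number and drop later copies.

    Hypothetical helper: a bounded history keeps memory use fixed.
    """

    def __init__(self, history: int = 1024) -> None:
        self.history = history
        self.seen: set[int] = set()
        self.order: deque[int] = deque()

    def accept(self, seq: int) -> bool:
        if seq in self.seen:
            return False  # a copy already arrived over another path
        self.seen.add(seq)
        self.order.append(seq)
        if len(self.order) > self.history:
            # Forget the oldest sequence number to bound the state.
            self.seen.discard(self.order.popleft())
        return True


if __name__ == "__main__":
    eliminator = PacketEliminator()
    arrivals = [1, 1, 2, 3, 2]  # copies from two paths, interleaved
    print([seq for seq in arrivals if eliminator.accept(seq)])
```

The loss of a single copy is invisible to the receiver as long as at least one copy of each packet reaches the elimination point.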
MPLS-TE [RFC2702][RFC3209] introduces the concept of reserving end-to-end bandwidth for a TE-LSP, which can be used to provide a point-to-point Virtual Transport Path (VTP) across the underlay network to support VPN services. VPN traffic can be carried over dedicated TE-LSPs to provide reserved bandwidth for each specific connection in a VPN, and VPNs with similar behavior requirements may be multiplexed onto the same TE-LSPs. Some network operators have concerns about the scalability and management overhead of MPLS-TE systems, especially with regard to those systems that use an active control plane, and this has led them to consider other solutions for traffic engineering in their networks.¶
Segment Routing (SR) [RFC8402] is a method that prepends instructions to packets at the head-end of a path. These instructions are used to specify the nodes and links to be traversed, and allow the packets to be routed on paths other than the shortest path. By encoding the state in the packet, per-path state is transitioned out of the network.¶
An SR traffic engineered path operates with a granularity of a link. Hints about priority are provided using the Traffic Class (TC) or Differentiated Services Code Point (DSCP) field in the packet header. However, to achieve the performance and isolation characteristics that are sought by VPN+ customers, it will be necessary to steer packets through specific virtual links and/or queues on the same link and direct them to use specific resources. With SR, it is possible to introduce such fine-grained packet steering by specifying the queues and the associated resources through an SR instruction list.¶
Note that the concept of a queue is a useful abstraction for different types of underlay mechanism that may be used to provide enhanced isolation and performance support. How the queue satisfies the requirement is implementation specific and is transparent to the layer-3 data plane and control plane mechanisms used.¶
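The idea of steering packets onto specific queues through an SR instruction list can be sketched as a lookup from each segment identifier to a link and a queue on that link. The SID values, link names, queue numbers, and the `ResourceAwareSegment` type below are hypothetical and chosen only for illustration; they are not taken from any SR specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceAwareSegment:
    link: str   # outgoing link the segment steers onto
    queue: int  # abstract queue (resource partition) on that link


# Hypothetical SID table: two SIDs share link A-B but select different queues.
SID_TABLE = {
    16001: ResourceAwareSegment("A-B", 3),
    16002: ResourceAwareSegment("B-C", 3),
    17001: ResourceAwareSegment("A-B", 5),
}


def steer(sid_list: list[int]) -> list[tuple[str, int]]:
    """Resolve an SR instruction list into (link, queue) hops."""
    return [(SID_TABLE[sid].link, SID_TABLE[sid].queue) for sid in sid_list]


if __name__ == "__main__":
    print(steer([16001, 16002]))
    print(steer([17001]))
```

In this sketch, two services that traverse the same link are kept apart simply by using different SIDs for that link, which matches the treatment of the queue as an abstraction over the underlay mechanism.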
With Segment Routing, the SR instruction list could be used to build a P2P path, and a group of SR SIDs could also be used to represent an MP2MP network. Thus, the SR based mechanism could be used to provide both a Virtual Transport Path (VTP) and a Virtual Transport Network (VTN) for VPN+ services.¶
Non-packet underlay data plane technologies often have TE properties and behaviors, and meet many of the key requirements in particular for bandwidth guarantees, traffic isolation (with physical isolation often being an integral part of the technology), highly predictable latency and jitter characteristics, measurable loss characteristics, and ease of identification of flows. The cost is that the resources are allocated on a long-term and end-to-end basis. Such an arrangement means that the full cost of the resources has to be borne by the service that is allocated with the resources.¶
The control plane of VPN+ would likely be based on a hybrid control mechanism that takes advantage of a logically centralized controller for on-demand provisioning and global optimization, whilst still relying on a distributed control plane to provide scalability, high reliability, fast reaction, automatic failure recovery, etc. Extension to and optimization of the centralized and distributed control plane is needed to support the enhanced properties of VPN+.¶
As described in section 4, the VPN+ control plane needs to provide the following functions:¶
The collection of underlying network topology and resource information can be done using the existing IGP and BGP-LS based mechanisms. The creation of VTNs and the distribution of VTN attributes may need further control protocol extensions. The computation of VTPs based on the attributes and constraints of the VTN can be performed either by the headend node of the path or by a centralized Path Computation Element (PCE).¶
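Whether it runs on the headend or on a PCE, the computation of a VTP under VTN constraints can be illustrated with a constrained shortest path calculation: links that cannot satisfy the constraint are pruned, and a shortest path is computed over what remains. The sketch below assumes a single bandwidth constraint, bidirectional links, and a hypothetical tuple layout for the link database; a real computation would consider many more attributes.

```python
import heapq


def cspf(links, src, dst, min_bw):
    """Shortest path from src to dst using only links with enough bandwidth.

    links: iterable of (node_a, node_b, metric, available_bw), bidirectional.
    Returns the list of nodes on the path, or None if no path qualifies.
    """
    adjacency = {}
    for a, b, metric, bw in links:
        if bw >= min_bw:  # prune links that violate the constraint
            adjacency.setdefault(a, []).append((b, metric))
            adjacency.setdefault(b, []).append((a, metric))

    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, metric in adjacency.get(node, []):
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))

    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    vtn_links = [("A", "B", 1, 10), ("B", "C", 1, 2),
                 ("A", "D", 2, 10), ("D", "C", 2, 10)]
    print(cspf(vtn_links, "A", "C", min_bw=1))
    print(cspf(vtn_links, "A", "C", min_bw=5))
```

With a small bandwidth request the path follows the lowest metric; with a larger request the link B-C is pruned and the path moves to the longer route.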
There are two candidate mechanisms for the setup of VTPs in the VTN: RSVP-TE and Segment Routing (SR).¶
According to the service requirements on connectivity, performance and isolation, one VPN+ service may be mapped to a dedicated VTN, or a group of VPN+ services may be mapped to the same VTN. The mapping of VPN+ services to VTNs can be achieved using existing control mechanisms with possible extensions, and it can be based on either the characteristics of the data packet or the attributes of the VPN service routes.¶
The management plane provides the interface between the VPN+ service provider and the customers for life-cycle management of the VPN+ service (i.e., creation, modification, assurance/monitoring, and decommissioning). It relies on a set of service data models for the description of the information and operations needed on the interface.¶
As an example, in the context of 5G end-to-end network slicing [TS28530], the management of VPN+ services is considered as the management of the transport network segment of the 5G end-to-end network slice. The 3GPP management system may provide the connectivity and performance related parameters as requirements to the management plane of the transport network. It may also require the transport network to expose the capabilities and status of the network slice. Thus, an interface between the VPN+ management plane and the 5G network slice management system, and relevant service data models are needed for the coordination of 5G end-to-end network slice management.¶
The management plane interface and data models for VPN+ services can be based on the service models described in Section 5.6.¶
It is important that the management life-cycle supports in-place modification of VPN+ services. That is, it should be possible to add and remove end points, as well as to change the requested characteristics of the service that is delivered. The management system needs to be able to assess the revised VPN+ requests and determine whether they can be provided by the existing VTNs or whether changes must be made, and it will additionally need to determine whether those changes to the VTN are possible. If not, then the customer's modification request may be rejected.¶
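The assessment of a modification request described above has three possible outcomes: the existing VTN can absorb the change, the VTN must itself be changed, or the request is rejected. A minimal sketch of that decision follows, assuming that bandwidth is the only characteristic being modified; the function name, the parameter names, and the outcome strings are hypothetical.

```python
def assess_modification(vtn_capacity, allocated, service, new_bw, underlay_spare):
    """Classify a request to change one service's bandwidth.

    vtn_capacity:   bandwidth currently assigned to the VTN
    allocated:      mapping of service name to bandwidth within the VTN
    service:        the service being modified
    new_bw:         the bandwidth now requested for that service
    underlay_spare: bandwidth the underlay could still add to the VTN
    """
    others = sum(bw for name, bw in allocated.items() if name != service)
    needed = others + new_bw
    if needed <= vtn_capacity:
        return "fits-existing-vtn"
    if needed - vtn_capacity <= underlay_spare:
        return "requires-vtn-change"
    return "reject"


if __name__ == "__main__":
    current = {"svc-a": 40, "svc-b": 30}
    for requested in (60, 80, 200):
        print(requested, assess_modification(100, current, "svc-a", requested, 20))
```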
When the modification of a VPN+ service is possible, the management system should make every effort to make the changes in a non-disruptive way. That is, the modification of the VPN+ service or the underlying VTN should not perturb traffic on the VPN+ service in a way that causes the service level to drop below the agreed levels. Furthermore, in the spirit of isolation, changes to one VPN+ service should not cause disruption to other VPN+ services.¶
The network operator for the underlay network (i.e., the provider of the VPN+ service) may delegate some operational aspects of the overlay VPN and the underlying VTN to the customer. In this way, the VPN+ is presented to the customer as a virtual network, and the customer can choose how to use that network. The customer cannot exceed the capabilities of the virtual links and nodes, but can decide how to load traffic onto the network, for example, by assigning different metrics to the virtual links so that the customer can control how traffic is routed through the virtual network. This approach requires a management system for the virtual network, but does not necessarily require any coordination between the management systems of the virtual network and the physical network, except that the virtual network management system might notice when the VTN is close to capacity or considerably under-used and automatically request changes in the service provided by the underlay network.¶
This section describes the applicability of the existing and in-progress service data models to VPN+. [RFC8309] describes the scope and purpose of service models and shows where a service model might fit into an SDN-based network management architecture. New service models may also be introduced for some of the required management functions.¶
Service data models are used to represent, monitor, and manage the virtual networks and services enabled by VPN+. The VPN customer service models (e.g., the layer 3 VPN service model (L3SM) [RFC8299] and the layer 2 VPN service model (L2SM) [RFC8466]), or the ACTN Virtual Network (VN) model [I-D.ietf-teas-actn-vn-yang], are service models which can provide the customer's view of the VPN+ service. The layer 3 VPN network model (L3NM) [I-D.ietf-opsawg-l3sm-l3nm] and the layer 2 VPN network model (L2NM) [I-D.ietf-opsawg-l2nm] provide the operator's view of the managed infrastructure as a set of virtual networks and the associated resources. The NRP model [I-D.wd-teas-nrp-yang] further provides the management of the virtual underlay network topology and resources both in the controller and in the network devices to instantiate the VTNs needed for the VPN+ services.¶
The ACTN framework [RFC8453] supports operators in viewing and controlling different domains and presenting virtualized networks to their customers. [I-D.ietf-teas-applicability-actn-slicing] discusses the applicability of the ACTN approach in the context of network slicing. Since there is a strong correlation between network slices and enhanced VPNs, that document also gives guidance on how ACTN can be applied to enhanced VPNs.¶
One of the typical use cases of enhanced VPN is to deliver IETF network slice services. This section describes the applicability of enhanced VPN to network slice realization.¶
In order to provide IETF network slices to customers, a technology-agnostic network slice service Northbound Interface (NBI) data model [I-D.ietf-teas-ietf-network-slice-nbi-yang] is needed for the customers to communicate the requirements of IETF network slices (end points, connectivity, SLOs, and SLEs). These requirements may be realized using technology specified in this document to instruct the network to instantiate a VPN+ service to meet the requirements of the IETF network slice customers.¶
According to the network operators' network resource planning policy, or based on the requirement of one or a group of customers or services, a VTN may need to be created. One of the basic requirements for a VTN is to provide a set of dedicated network resources to avoid unexpected interference from other services in the same network. Other possible requirements may include the required topology and connectivity, bandwidth, latency, reliability, etc.¶
A centralized network controller can be responsible for calculating a subset of the underlay network topology (which is called a logical topology) to support the VTN requirement. On the network nodes and links within the logical topology, the controller can also determine the set of network resources to be allocated to the VTN. Normally such calculation needs to take the underlay network connectivity information and the available network resource information of the underlay network into consideration. The network controller may also take the status of the existing VTNs into consideration in the planning and calculation of a new VTN.¶
According to the result of the VTN planning, the network nodes and links involved in the logical topology of the VTN are instructed to allocate the required set of network resources for the VTN. One or more of the mechanisms specified in Section 5.1 can be used to partition the forwarding plane network resources and allocate different subsets of resources to different VTNs. In addition, the data plane identifiers which are used to identify the set of network resources allocated to the VTN are also provisioned on the network nodes. Depending on the data plane technologies used, the set of network resources of a VTN can be identified using either resource-aware SR segments as specified in [I-D.ietf-spring-resource-aware-segments] or a dedicated VTN resource ID as specified in [I-D.dong-6man-enhanced-vpn-vtn-id]. The network nodes involved in a VTN may distribute the logical topology information, the VTN specific network resource information and the VTN resource identifiers using the control plane. Such information could be used by the controller and the network nodes to compute the TE or shortest paths within the VTN, and to install the VTN specific forwarding entries on the network nodes.¶
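The partitioning of a link's resources among VTNs, keyed by a data plane identifier, can be sketched as a small per-link reservation table. The sketch assumes bandwidth in Mb/s as the only resource and integer VTN resource IDs; the `LinkPartitioner` class and its methods are hypothetical and do not correspond to any defined device interface.

```python
class LinkPartitioner:
    """Track the bandwidth reserved on one link for each VTN resource ID."""

    def __init__(self, capacity_mbps: int) -> None:
        self.capacity = capacity_mbps
        self.reserved: dict[int, int] = {}  # VTN resource ID -> Mb/s

    def allocate(self, vtn_id: int, mbps: int) -> bool:
        """Set the reservation for vtn_id; refuse if the link would be oversubscribed."""
        if mbps <= 0:
            raise ValueError("reservation must be positive")
        in_use_by_others = sum(self.reserved.values()) - self.reserved.get(vtn_id, 0)
        if in_use_by_others + mbps > self.capacity:
            return False
        self.reserved[vtn_id] = mbps
        return True

    def unreserved(self) -> int:
        """Bandwidth not dedicated to any VTN."""
        return self.capacity - sum(self.reserved.values())


if __name__ == "__main__":
    link = LinkPartitioner(1000)
    print(link.allocate(1, 400), link.allocate(2, 500), link.allocate(3, 200))
    print(link.unreserved())
```

Refusing a reservation that would oversubscribe the link is what keeps the subsets disjoint, so that traffic in one VTN cannot consume resources dedicated to another.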
According to the connectivity requirements of an IETF network slice service, an overlay VPN can be created using the existing or future multi-tenancy overlay technologies as described in Section 3.6.¶
Then, according to the SLO and SLE requirements of the network slice, the overlay VPN is mapped to an appropriate VTN as the virtual underlay. The integration of the overlay VPN and the underlay VTN provides an enhanced VPN service which can meet the network slice service requirements.¶
At the edge of the operator's network, traffic of IETF network slices can be classified based on the rules defined by the operator's policy, so that the traffic is treated as a specific VPN+ service, which is further mapped to an underlay VTN. Packets belonging to the VPN+ service will be processed and forwarded by network nodes based on the TE or shortest path forwarding entries and the set of network resources of the corresponding VTN.¶
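The two-step mapping at the network edge, from packet to VPN+ service and from VPN+ service to VTN, can be sketched as an ordered rule table followed by a lookup. The match fields (source prefix and DSCP), the service names, and the VTN numbers are hypothetical; the operator's policy could equally classify on other fields.

```python
import ipaddress

# Hypothetical operator policy: first matching rule wins.
# Each rule is (source prefix, DSCP to match or None for any, VPN+ service).
RULES = [
    (ipaddress.ip_network("10.1.0.0/16"), 46, "vpn-plus-low-latency"),
    (ipaddress.ip_network("10.1.0.0/16"), None, "vpn-plus-default"),
]

# Hypothetical mapping of VPN+ services to underlay VTNs.
SERVICE_TO_VTN = {"vpn-plus-low-latency": 1, "vpn-plus-default": 2}


def classify(src_ip: str, dscp: int):
    """Return (VPN+ service, VTN) for a packet, or None if no rule matches."""
    addr = ipaddress.ip_address(src_ip)
    for prefix, rule_dscp, service in RULES:
        if addr in prefix and (rule_dscp is None or rule_dscp == dscp):
            return service, SERVICE_TO_VTN[service]
    return None


if __name__ == "__main__":
    print(classify("10.1.2.3", 46))
    print(classify("10.1.2.3", 0))
    print(classify("192.0.2.1", 46))
```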
VPN+ provides performance guaranteed services in packet networks, but with the potential cost of introducing additional state into the network. There are at least three ways that this additional state might be brought into the network:¶
Reducing the state in the network is important to VPN+, as it requires the overlay to be more closely integrated with the underlay than with traditional VPNs. This tighter coupling would normally mean that more state needs to be created and maintained in the network, as the state about fine granularity processing would need to be loaded and maintained in the routers. However, an SR approach allows much of this state to be spread amongst the network ingress nodes, and transiently carried in the packets as SIDs.¶
Further discussion of the scalability considerations of the underlying network resource partitions of VPN+ can be found in [I-D.dong-teas-nrp-scalability].¶
One of the challenges with SR is the stack depth that nodes are able to impose on packets [RFC8491]. This leads to a difficult balance between adding state to the network and minimizing stack depth, or minimizing state and increasing the stack depth.¶
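The balance between stack depth and network state can be made concrete with a sketch that splits a long SID list into stacks that each fit within a maximum imposable depth. Every split point needs a placeholder that some node in the network expands into the next stack, and each such placeholder stands for one extra piece of state held in the network. The function name, the placeholder labels, and the splitting rule are assumptions for illustration only.

```python
def split_for_msd(sids, msd):
    """Split a SID list into stacks of at most `msd` entries.

    Every stack except the last ends with a hypothetical placeholder
    ("BSID-n") standing for state that a node must hold in order to
    push the next stack.
    """
    if msd < 1:
        raise ValueError("msd must be at least 1")
    if msd < 2 and len(sids) > msd:
        raise ValueError("cannot split: no room for a placeholder")
    stacks = []
    remaining = list(sids)
    while len(remaining) > msd:
        stacks.append(remaining[:msd - 1] + [f"BSID-{len(stacks) + 1}"])
        remaining = remaining[msd - 1:]
    stacks.append(remaining)
    return stacks


if __name__ == "__main__":
    path = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
    for depth in (10, 3):
        stacks = split_for_msd(path, depth)
        print(depth, stacks, "extra state entries:", len(stacks) - 1)
```

A generous depth limit leaves the whole path in the packet with no extra state, while a small limit trades packet overhead for state in the network.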
The traditional method of creating a resource allocated path through an MPLS network is to use the RSVP-TE protocol. However, there have been concerns that this requires significant continuous state maintenance in the network. Work to improve the scalability of RSVP-TE LSPs in the control plane can be found in [RFC8370].¶
There is also concern about the scalability of the forwarder footprint of RSVP-TE as the number of paths through a label switching router (LSR) grows. [RFC8577] addresses this by employing SR within a tunnel established by RSVP-TE.¶
The centralized approach of SDN requires state to be stored in the network, but does not have the overhead of also requiring control plane state to be maintained. Each individual network node may need to maintain a communication channel with the SDN controller, but that compares favorably with the need for a control plane to maintain communication with all neighbors.¶
However, SDN may transfer some of the scalability concerns from the network to the centralized controller. In particular, there may be a heavy processing burden at the controller, and a heavy load in the network surrounding the controller. A centralized controller also presents a single point of failure within the network.¶
The design of OAM for VPN+ services needs to consider the following requirements:¶
A study of OAM in SR networks has been documented in [RFC8403].¶
Network visibility is essential for network operation. Network telemetry has been considered as an ideal means to gain sufficient network visibility with better flexibility, scalability, accuracy, coverage, and performance than conventional OAM technologies.¶
As defined in [I-D.ietf-opsawg-ntf], the objective of Network Telemetry is to acquire network data remotely for network monitoring and operation. It is a general term for a large set of network visibility techniques and protocols. Network telemetry addresses the current network operation issues and enables smooth evolution toward intent-driven autonomous networks. Telemetry can be applied on the forwarding plane, the control plane, and the management plane in a network.¶
How the telemetry mechanisms could be used or extended for the VPN+ service is out of the scope of this document.¶
Each VPN+ service has a life cycle, and may need modification during deployment as the needs of its tenant change. This is discussed in Section 5.5. Additionally, as the network evolves, there may need to be garbage collection performed to consolidate resources into usable quanta.¶
Systems in which the path is imposed, such as SR or some form of explicit routing, tend to do well in these applications, because it is possible to perform an atomic transition from one path to another. That is, a single action by the head-end changes the path without the need for coordinated action by the routers along the path. However, implementations and the monitoring protocols need to make sure that the new path is operational and meets the required SLA before traffic is transitioned to it. It is possible for deadlocks to arise as a result of the network becoming fragmented over time, such that it is impossible to create a new path or to modify an existing path without impacting the SLA of other paths. Resolution of this situation is as much a commercial issue as it is a technical issue and is outside the scope of this document.¶
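The requirement to verify a new path before moving traffic onto it can be sketched as a guarded switch at the head-end. The sketch assumes latency is the only SLA parameter being checked and that a measurement callback returns None when the path is not operational; the function and parameter names are hypothetical.

```python
def select_path(current, candidate, measure_latency_ms, sla_ms):
    """Return the path the head-end should use after evaluating the candidate.

    measure_latency_ms: callable taking a path and returning its measured
    latency in milliseconds, or None if the path is not operational.
    """
    latency = measure_latency_ms(candidate)
    if latency is not None and latency <= sla_ms:
        return candidate  # single head-end action: switch to the new path
    return current        # keep traffic on the path that is known to work


if __name__ == "__main__":
    measurements = {("A", "D", "C"): 8.0, ("A", "E", "C"): None}
    probe = measurements.get
    old = ("A", "B", "C")
    print(select_path(old, ("A", "D", "C"), probe, sla_ms=10.0))
    print(select_path(old, ("A", "E", "C"), probe, sla_ms=10.0))
```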
There are, however, two manifestations of the latency problem that are for further study in any of these approaches:¶
There is also the matter of what happens during failure in the underlay infrastructure. Fast reroute is one approach, but that still produces a transient loss with a normal goal of rectifying this within 50 ms [RFC5654]. An alternative is some form of N+1 delivery such as has been used for many years to support protection from service disruption. This may be taken to a different level using the techniques of DetNet with multiple in-network replication and the culling of later packets [RFC8655].¶
In addition to the approach used to protect high priority packets, consideration should be given to the impact of best effort traffic on the high priority packets during a transition. Specifically, if a conventional re-convergence process is used there will inevitably be micro-loops and whilst some form of explicit routing will protect the high priority traffic, lower priority traffic on best effort shortest paths will micro-loop without the use of a loop prevention technology. To provide the highest quality of service to high priority traffic, either this traffic must be shielded from the micro-loops, or micro-loops must be prevented completely.¶
It is likely that VPN+ services will be introduced in networks which already have traditional VPN services deployed. Depending on service requirements, the tenants or the operator may choose to use a traditional VPN or an enhanced VPN to fulfill a service requirement. The information and parameters to assist such a decision need to be reflected on the management interface between the tenant and the operator.¶
All types of virtual network require special consideration to be given to the isolation of traffic belonging to different tenants. That is, traffic belonging to one VPN must not be delivered to end points outside that VPN. In this regard, VPN+ neither introduces nor experiences greater security risks than other VPNs.¶
However, in a VPN+ service the additional service requirements need to be considered. For example, if a service requires a specific upper bound to latency then it can be damaged by simply delaying the packets through the activities of another tenant, i.e., by introducing bursts of traffic for other services. In some respects this makes the enhanced VPN more susceptible to attacks since the SLA may be broken. But another view is that the operator must, in any case, perform monitoring of the enhanced VPN to ensure that the SLA is met, and this means that the operator may be more likely to spot the early onset of a security attack and be able to take pre-emptive protective action.¶
The measures to address these dynamic security risks must be specified as part of the specific solution and form part of the isolation requirements of a service.¶
While a VPN+ service may be sold as offering encryption and other security features as part of the service, customers would be well advised to take responsibility for their own security requirements themselves possibly by encrypting traffic before handing it off to the service provider.¶
The privacy of VPN+ service customers must be preserved. It should not be possible for one customer to discover the existence of another customer, nor should the sites that are members of a VPN+ be externally visible.¶
There are no requested IANA actions.¶
Daniel King Email: daniel@olddog.co.uk
Adrian Farrel Email: adrian@olddog.co.uk
Jeff Tansura Email: jefftant.ietf@gmail.com
Zhenbin Li Email: lizhenbin@huawei.com
Qin Wu Email: bill.wu@huawei.com
Bo Wu Email: lana.wubo@huawei.com
Daniele Ceccarelli Email: daniele.ceccarelli@ericsson.com
Mohamed Boucadair Email: mohamed.boucadair@orange.com
Sergio Belotti Email: sergio.belotti@nokia.com
Haomian Zheng Email: zhenghaomian@huawei.com¶
The authors would like to thank Charlie Perkins, James N Guichard, John E Drake, Shunsuke Homma, and Luis M. Contreras for their review and valuable comments.¶
This work was supported in part by the European Commission funded H2020-ICT-2016-2 METRO-HAUL project (G.A. 761727).¶