Internet-Draft ALTO Multi-Domain July 2023
Lachos, et al. Expires 11 January 2024
Workgroup:
ALTO Working Group
Internet-Draft:
draft-yang-alto-multi-domain-02
Published:
Intended Status:
Standards Track
Expires:
11 January 2024
Authors:
D. Lachos
Benocs
I. Poese
Benocs
M. Lassnig
CERN
A. Gu
Yale University
Y. Yang
Yale University
J. Ros Giralt
Qualcomm

ALTO Multi-Domain Use Cases and Services

Abstract

Application-Layer Traffic Optimization (ALTO) provides means for network applications to obtain network information. Although ALTO is inherently multi-domain, in that the ALTO server representing the network and the ALTO client requesting the network information belong to different trust domains, there are more general cases where the path between the source and the destination spans multiple autonomous networks; we call these multi-domain settings. This document first presents three multi-domain use cases and the challenges they raise. It then gives a brief update on the implementation solutions that we have explored to address these challenges.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 11 January 2024.


1. Introduction

Application-Layer Traffic Optimization (ALTO) provides means for network applications to obtain network information. For example, the Endpoint Cost Service (ECS) and the Cost Map Service (CMS) defined in [RFC7285] can provide network-path properties (e.g., routing cost or relative ranking) of data transmissions from a set of source network locations to a set of destination network locations. The Path Vector Service allows ALTO to provide bandwidth availability (bottlenecks) for a given set of flows, each defined by a source-destination pair.

Although such services are valuable when a single network (a single domain) provides them, in many use cases multiple networks (multiple domains) lie on the path from a given source to a given destination. [RFC7971] states that "The ALTO protocol is designed for use cases where the ALTO server and client can be located in different organizations or trust domains. ALTO is inherently designed for use in multi-domain environments. Most importantly, ALTO is designed to enable deployments in which the ALTO server and the ALTO client are not located within the same administrative domain." This document addresses the more general setting in which the network path itself spans multiple autonomous networks.

This document first specifies three multi-domain use cases. It then discusses two challenges in deploying ALTO in multi-domain settings. Finally, it provides initial designs, based on current implementation experience, to start the design conversation.

2. Multi-domain Use Cases

To be concrete, this document draws three use cases from data-intensive science at CERN and the Worldwide LHC Computing Grid (WLCG).

2.1. Multi-domain Path Distance/Ranking for Rucio

Rucio, the data orchestration system of CERN/WLCG, selects a source for a given downloading client when multiple sources hold the requested dataset. Rucio also performs destination selection when multiple destinations can be the target locations for replication.

The main mechanism of source/destination selection in Rucio is distance, which is a natural match for ALTO cost services; see the appendix for a review of Rucio's full source/destination selection mechanism. Specifically, consider the following ECS query, which Rucio could issue for destination selection. The source is located at CERN, and the destination candidates are at multiple locations in the LHCONE network (for example, BNL, Caltech, and KIT).

  POST /endpointcost/lookup HTTP/1.1
  Host: alto.example.com
  Content-Length: 248
  Content-Type: application/alto-endpointcostparams+json
  Accept:
     application/alto-endpointcost+json,application/alto-error+json

  {
    "cost-type": {"cost-mode" : "numerical",
                  "cost-metric" : "routingcost"},
    "endpoints" : {
      "srcs": [ "ipv4:128.141.201.74" ],
      "dsts": [
        "ipv4:130.199.4.27",
        "ipv4:104.18.24.74",
        "ipv4:141.3.128.6"
      ]
    }
  }


  HTTP/1.1 200 OK
  Content-Length: 274
  Content-Type: application/alto-endpointcost+json

  {
    "meta" : {
      "cost-type": {"cost-mode" : "numerical",
                    "cost-metric" : "routingcost"
      }
    },
    "endpoint-cost-map" : {
      "ipv4:128.141.201.74": {
        "ipv4:130.199.4.27" : 20,
        "ipv4:104.18.24.74" : 30,
        "ipv4:141.3.128.6"  : 10
      }
    }
  }
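
A response like this maps directly onto destination selection. The following minimal Python sketch, assuming the client has already received the response body shown above, orders the candidate destinations by routingcost. The function name rank_destinations is hypothetical and is not part of Rucio or of [RFC7285].

```python
import json

# Body of the ECS response shown above.
ECS_RESPONSE = """{
  "meta": {"cost-type": {"cost-mode": "numerical",
                         "cost-metric": "routingcost"}},
  "endpoint-cost-map": {
    "ipv4:128.141.201.74": {
      "ipv4:130.199.4.27": 20,
      "ipv4:104.18.24.74": 30,
      "ipv4:141.3.128.6": 10
    }
  }
}"""


def rank_destinations(body: str, src: str) -> list[tuple[str, float]]:
    """Return (destination, cost) pairs for src, lowest routing cost first."""
    doc = json.loads(body)
    metric = doc["meta"]["cost-type"]["cost-metric"]
    if metric != "routingcost":
        raise ValueError(f"unexpected cost metric: {metric}")
    costs = doc["endpoint-cost-map"][src]
    # A lower routingcost means "closer", so sort ascending by cost.
    return sorted(costs.items(), key=lambda item: item[1])


ranking = rank_destinations(ECS_RESPONSE, "ipv4:128.141.201.74")
best_destination = ranking[0][0]
print(ranking[0])  # ('ipv4:141.3.128.6', 10)
```

Sorting ascending puts the preferred destination first, here ipv4:141.3.128.6 with cost 10.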

This use case is an example of a multi-domain setting, because the source and the destinations are located in different networks and the network paths span multiple autonomous networks. Figure 1 below illustrates a network path from src to dst spanning four networks.

      AS S              AS A        AS B         AS D
+-------------+se1  +---------+   +-----+   +------------+
| src       --|-----|ai1   ae1|---|     |---|di1     dst |
|+--+    --/  |     |         |   |     |   |       +--+ |
||  | --/     |     |         |   |     |   |       |  | |
|+--+    \    |se2  |         |   |     |   |       +--+ |
|         \__ |_____|ai2   ae2|---|     |---|di2         |
+-------------+     +---------+   +-----+   +------------+

              Figure 1. Multi-domain Network.

2.2. Multi-domain Network Paths for FTS

FTS (File Transfer Service), the data transfer scheduling system of CERN/WLCG, schedules given transfers, where each transfer has a fixed dataset, a source, and a destination (for example, chosen by Rucio).

One main function of FTS is to enforce operator resource constraints. For example, FTS may schedule concurrent transfers from CERN to Caltech and to KIT under the constraint that the total transmission rate on a given link must stay within a threshold.

Consider Figure 1, and assume that S is CERN and D is Caltech. The network path from CERN to Caltech uses the upper inter-AS link (se1->ai1), while the network path from CERN to KIT uses the lower inter-AS link (se2->ai2). To know how much demand it places on each of these two links, and hence to keep each link within its constraint, FTS needs to know the network paths, and these paths span multiple autonomous networks.
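
Once each transfer is mapped to the links it traverses, checking the constraint is a simple aggregation, as the following minimal Python sketch shows. The transfer rates, thresholds, and function names are hypothetical; the hard part in practice, obtaining the list of links for a multi-domain path, is exactly the problem this document discusses.

```python
from collections import defaultdict

# Hypothetical scheduled transfers: rate in Gbps and the inter-AS links
# that each transfer's network path traverses (see Figure 1).
transfers = [
    {"dst": "Caltech", "rate": 40.0, "links": [("se1", "ai1")]},
    {"dst": "KIT", "rate": 30.0, "links": [("se2", "ai2")]},
    {"dst": "Caltech", "rate": 25.0, "links": [("se1", "ai1")]},
]
# Hypothetical per-link thresholds set by the operator, in Gbps.
thresholds = {("se1", "ai1"): 50.0, ("se2", "ai2"): 50.0}


def link_loads(transfers):
    """Sum the rates of all transfers that traverse each link."""
    load = defaultdict(float)
    for t in transfers:
        for link in t["links"]:
            load[link] += t["rate"]
    return dict(load)


def violations(transfers, thresholds):
    """Return the links whose aggregate load exceeds their threshold."""
    return {link: total for link, total in link_loads(transfers).items()
            if total > thresholds.get(link, float("inf"))}


print(violations(transfers, thresholds))  # {('se1', 'ai1'): 65.0}
```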

2.3. Multi-domain Resource Discovery for NOTED/SENSE

An emerging capability for LHCONE/WLCG is the creation of new paths to obtain additional capacity when there is high demand from a given source storage element (SE) to a destination storage element (SE). In particular, the NOTED project monitors the backlog at Rucio/FTS for each (source SE, destination SE) pair and may decide to create a path when a pair has a large backlog.

To identify available network capacity, the ALTO Path Vector Service (PVS) is a natural fit. However, as Figure 1 shows, the setting can be multi-domain.
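
The following minimal Python sketch illustrates what a path-vector answer offers beyond per-pair costs. It assumes the client has already decoded the response into, for each flow, the ordered list of abstract network elements (ANEs) on its path, plus an available-bandwidth value per ANE. The ANE names, bandwidth values, and flows are hypothetical, and the plain dictionaries stand in for the actual PVS message format.

```python
# Hypothetical decoded path-vector result: each flow (source SE, destination SE)
# maps to the ordered abstract network elements (ANEs) on its path.
path_vectors = {
    ("CERN", "Caltech"): ["ane:S-se1", "ane:A-ai1-ae1", "ane:B", "ane:D-di1"],
    ("CERN", "KIT"): ["ane:S-se2", "ane:A-ai2-ae2", "ane:B", "ane:KIT"],
}
# Hypothetical available bandwidth per ANE, in Gbps.
available_bw = {
    "ane:S-se1": 100.0, "ane:A-ai1-ae1": 40.0, "ane:B": 60.0, "ane:D-di1": 80.0,
    "ane:S-se2": 100.0, "ane:A-ai2-ae2": 70.0, "ane:KIT": 90.0,
}


def bottleneck(flow):
    """Capacity available to one flow in isolation: the minimum over its ANEs."""
    return min(available_bw[ane] for ane in path_vectors[flow])


def shared_anes():
    """ANEs traversed by more than one flow; these couple the flows' capacity."""
    users = {}
    for flow, anes in path_vectors.items():
        for ane in anes:
            users.setdefault(ane, []).append(flow)
    return {ane: flows for ane, flows in users.items() if len(flows) > 1}


print(bottleneck(("CERN", "Caltech")), bottleneck(("CERN", "KIT")))  # 40.0 60.0
print(shared_anes())  # {'ane:B': [('CERN', 'Caltech'), ('CERN', 'KIT')]}
```

Individually, the two flows could reach 40 and 60 Gbps, but both traverse ane:B, which has only 60 Gbps available, so they cannot reach those rates at the same time. Exposing such shared bottlenecks is what makes a path vector useful for capacity decisions such as those NOTED makes.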

3. Multi-domain Challenges

Although implementing ECS/CMS/PVS is relatively straightforward in a single domain, implementing these services in multi-domain settings raises the following challenges.

3.1. Challenge: Distributed Information

Consider computing the total routing cost of the path from src to dst in Figure 1. The routing cost consists of multiple segments spanning four autonomous networks: (1) from src to the link connecting the egress of AS S (se1) and the ingress of AS A (ai1); (2) from the ingress of A to the ingress of B; (3) from the ingress of B to the ingress of D; and (4) from the ingress of D to dst.
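
The segment decomposition can be sketched in Python as follows, assuming that routing cost is additive across segments (this document does not define how costs compose across domains) and naming the unlabeled ingress of AS B "bi1" for illustration. Each Segment is known only to the AS that owns it, which is why no single party can fill in the whole list; the costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One per-domain piece of an end-to-end path (hypothetical model)."""
    asn: str      # autonomous network that knows this segment's cost
    start: str    # src, or the ingress point where the segment begins
    end: str      # next AS's ingress point, or dst for the last segment
    cost: float   # routing cost of the segment, as computed by that AS


def total_cost(segments: list[Segment], src: str, dst: str) -> float:
    """Sum per-domain costs after checking that the segments chain src to dst."""
    if not segments or segments[0].start != src or segments[-1].end != dst:
        raise ValueError("segments do not span src to dst")
    for prev, nxt in zip(segments, segments[1:]):
        if prev.end != nxt.start:
            raise ValueError(f"gap between AS {prev.asn} and AS {nxt.asn}")
    return sum(s.cost for s in segments)


# The four segments of Figure 1; "bi1" names AS B's unlabeled ingress.
path = [
    Segment("S", "src", "ai1", 3.0),
    Segment("A", "ai1", "bi1", 5.0),
    Segment("B", "bi1", "di1", 4.0),
    Segment("D", "di1", "dst", 2.0),
]
print(total_cost(path, "src", "dst"))  # 14.0
```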

One might expect BGP to supply this information, since BGP propagates routes backward from the destination across multiple autonomous networks and hence appears to include information about all four segments. However, BGP information is distributed, coarse-grained, and incomplete, as the following two perspectives show.

Source AS: The BGP router at AS S knows that the path from src to dst traverses ASes S, A, B, and D (it receives the AS-PATH [A B D] from AS A). Combining BGP and intradomain routing, AS S also knows which of its two egress routers (se1, se2) it will use to forward traffic to dst. However, AS S does not know downstream details: for example, it does not know whether packets will leave AS A through egress router ae1 or ae2 to enter AS B, nor does it know the internal routing inside AS A. Hence, an ALTO server provided by AS S cannot provide all of the information needed for the example ECS query.

Non-Source AS: A non-source AS knows the AS-PATH from itself to dst, but it may not know the ingress point. For example, AS A does not know whether packets will enter through ai1 or ai2. Hence, an ALTO server provided by AS A may consider the example ECS query ambiguous: the query gives only the source (src) and the destination (dst), and AS A does not in general know the ingress point.

3.2. Challenge: Partial Deployment

It is possible to design protocol extensions that collect the distributed information described above and thereby provide complete information (see Section 4). However, such extensions will likely be deployed incrementally, so the system must operate while only some domains support them.

4. Solution Space Exploration

While integrating ALTO to support the Rucio and FTS use cases, we implemented multiple potential solutions. We discuss them briefly below to motivate further discussion.

4.1. Solution Space to Obtain Forwarding Information Bases (FIBs)

Obtaining FIBs is a basic building block for implementing ALTO route-related visibility services. At a high level, FIBs have the following life cycle:

FIB-Configure/input:
Consider a network as a distributed system whose components are its routers. The input to the system is then the input to the routers: each router's configuration, plus external input such as BGP routes learned from peers outside the network.
FIB-Compute:
The distributed system computes the FIBs using a distributed protocol, or the FIBs are configured by a logically centralized control plane. During this computation, vendor-specific configuration may be translated into a standard format, such as an IGP information model.
FIB-Apply:
The computed FIBs are then applied to packets. Their application to data packets may be observed through NetFlow/sFlow or similar capture capabilities; FIBs are also applied to the probes of control mechanisms such as traceroute, whose results can be observed.
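
As an illustration of what a FIB provides once obtained, the following minimal Python sketch performs a longest-prefix-match lookup over a hypothetical FIB of a router in AS S of Figure 1, returning the next hop and the local egress router for a destination address. The prefixes and entries are invented for illustration; real FIBs carry more state, such as interfaces and multiple next hops.

```python
import ipaddress

# Hypothetical FIB of a router in AS S: prefix -> (next hop, local egress router).
FIB = {
    "0.0.0.0/0": ("ai1", "se1"),
    "130.199.0.0/16": ("ai2", "se2"),
    "130.199.4.0/24": ("ai1", "se1"),
    "141.3.0.0/16": ("ai2", "se2"),
}


def lookup(fib: dict[str, tuple[str, str]], addr: str) -> tuple[str, str]:
    """Return the entry of the longest (most specific) prefix containing addr."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in map(ipaddress.ip_network, fib) if ip in net]
    if not matches:
        raise KeyError(f"no route to {addr}")
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[str(best)]


print(lookup(FIB, "130.199.4.27"))  # ('ai1', 'se1'): the /24 beats the /16
print(lookup(FIB, "130.199.9.9"))   # ('ai2', 'se2')
```

This is the kind of knowledge that lets AS S determine which of its egress routers serves a destination, while saying nothing about forwarding beyond its border.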

4.2. FIB-Compute Based Design

This type of solution extends FIB-Compute so that all needed network information is computed at a single autonomous network, addressing the distributed-information challenge. Assume that an ALTO server at the source network abstracts and exposes this information. One natural candidate is to extend the routing control plane itself: BGP can be extended to collect the needed information and propagate it upstream. For example, when a BGP router at AS A (e.g., ai1) propagates BGP information to its peer at AS S (se1), it includes not only the AS-PATH [A, B, D] but also additional information from which the upstream AS can construct the complete path cost (distance) metric.
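
The back-propagation idea can be sketched in Python as follows. The path_cost field is a hypothetical attribute, not an existing BGP path attribute, and the sketch assumes that every AS reports its segment cost in a common, additive unit, which a real extension would have to agree on. Each AS prepends its own ASN, as BGP does for the AS-PATH, and adds its segment cost before propagating upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    """A route announcement carrying a hypothetical cost attribute."""
    as_path: tuple[str, ...]
    path_cost: float  # hypothetical: cost from the sender's ingress to dst


def originate(asn: str, ingress_to_dst: float) -> Announcement:
    """The destination AS announces its prefix with its own internal cost."""
    return Announcement((asn,), ingress_to_dst)


def propagate(asn: str, received: Announcement, segment_cost: float) -> Announcement:
    """A transit AS prepends its ASN and adds the cost from its ingress
    to the next AS's ingress."""
    return Announcement((asn,) + received.as_path,
                        received.path_cost + segment_cost)


# Back propagation along Figure 1 (D -> B -> A -> S); costs are hypothetical.
from_d = originate("D", 2.0)
from_b = propagate("B", from_d, 4.0)
from_a = propagate("A", from_b, 5.0)  # what se1 receives from ai1
# AS S adds its own segment (src to the se1-ai1 link) for the end-to-end cost.
cost_at_s = 3.0 + from_a.path_cost
print(from_a.as_path, cost_at_s)  # ('A', 'B', 'D') 14.0
```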

The upside of this design is that it integrates with the routing system and hence may even extend routing capabilities. However, routing protocol extensions can be complex to deploy. Further, this design implies a different trust model. The original ALTO model is a star trust model, with the application (e.g., Rucio/FTS) at the hub, and each AS needs to trust the application. The BGP extension model instead requires each AS to trust its peers and, recursively, their peers (BGP communities may be used to impose policies).

4.3. FIB-Apply Based Design

This type of solution lets the data path reveal control-plane information. For example, a traceroute-based system called perfSONAR is widely deployed in our setting. It is also natural to use NetFlow/sFlow to identify ingress points. Such systems can also naturally collect other network information, such as delay and loss, as measurements.

However, this type of solution has several issues. For example, traceroute suffers from anonymous routers, uncertain router IP addresses that result in node aliasing, and load-balanced routing that results in link aliasing.
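
The anonymous-router problem can be seen in a minimal Python sketch that collapses traceroute hops into an AS-level path, assuming a hypothetical prefix-to-AS table (for example, derived from BGP data). Node and link aliasing are not modeled; the sketch only shows how silent hops drop out.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-AS table, e.g. derived from BGP data.
PREFIX_TO_AS = {
    ipaddress.ip_network("10.1.0.0/16"): "S",
    ipaddress.ip_network("10.2.0.0/16"): "A",
    ipaddress.ip_network("10.3.0.0/16"): "B",
    ipaddress.ip_network("10.4.0.0/16"): "D",
}


def ip_to_as(addr: str) -> Optional[str]:
    """Map a router address to its AS, or None if no prefix covers it."""
    ip = ipaddress.ip_address(addr)
    for net, asn in PREFIX_TO_AS.items():
        if ip in net:
            return asn
    return None


def as_path_from_traceroute(hops: list[Optional[str]]) -> list[str]:
    """Collapse router-level hops into an AS-level path.

    Each element is the reply address for one TTL, or None for an anonymous
    hop ("*"). Anonymous and unmapped hops are skipped, and consecutive hops
    in the same AS are merged.
    """
    path: list[str] = []
    for hop in hops:
        asn = ip_to_as(hop) if hop is not None else None
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path


full = ["10.1.0.1", "10.1.0.9", "10.2.0.5", None, "10.3.7.7", None, "10.4.0.1"]
silent_b = ["10.1.0.1", "10.2.0.5", None, None, "10.4.0.1"]
print(as_path_from_traceroute(full))      # ['S', 'A', 'B', 'D']
print(as_path_from_traceroute(silent_b))  # ['S', 'A', 'D']
```

In the second trace every router in AS B is anonymous, so B vanishes from the inferred path even though packets cross it.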

4.4. Multi-Domain Cascading ALTO

This model was presented by Ingmar Poese as Cascading ALTO at IETF 116; see his IETF 116 slides for details.

In the ALTO base model, a network is a container, which we call a big switch, with endpoints attached to it. In the multi-domain model, each network (represented by an ALTO server) has a set of ingress points (in-1 to in-m) and a set of egress points (e-1 to e-n), as illustrated below. An endpoint belonging to the network is attached to an ingress point and an egress point. Hence, a single-domain ALTO query specifies endpoints that are directly attached to an ingress point and an egress point. For a destination outside its network, however, a source network returns only the egress point; for a source in a different network, a destination network needs an ingress point. A general transit network needs an ingress point and returns an egress point. For consistency, a returned egress point must be a valid ingress point of the peer, represented by a unique address.



            in-1   +-------------+  e-1
               ----|             |----
                   |             |
               ----|             |----
                   |             |
               ----|             |----
                   |             |
               ----|             |----
            in-m   +-------------+  e-n

ALTO Server Multi-domain Query Model: If the src of an ECS query is not in the home domain of the ALTO server, the query should include an ingress point, namely the one returned by the ALTO server of the previous domain. If the ALTO server's domain is not the home domain of the destination, the server should return the egress point of its own domain and the corresponding ingress point of the next domain.

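As an optional feature, a query should be able to indicate whether it is iterative (the client queries the ALTO server of each domain in turn) or recursive (each ALTO server queries the next domain's server on the client's behalf).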
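
The iterative form of the query model can be sketched in Python as follows: the client asks the ALTO server of each domain in turn, passing along the ingress point returned by the previous domain, and sums the per-domain costs. The DomainServer class, the Answer fields, the port names bi1 and be1 for the unlabeled ports of AS B, and all costs are hypothetical, and the sketch assumes additive costs. A recursive variant would instead have each server forward the query to the next domain's server; it is omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    """One domain's reply (hypothetical message shape)."""
    cost: float                  # cost from the entry point across this domain
    next_ingress: Optional[str]  # ingress of the next domain; None if dst is local
    next_domain: Optional[str]   # which ALTO server to ask next


class DomainServer:
    """Toy ALTO server for a single domain in the cascading model."""

    def __init__(self, name, local_hosts, costs, exits):
        self.name = name
        self.local_hosts = local_hosts  # endpoints homed in this domain
        self.costs = costs              # (entry point, exit point) -> internal cost
        self.exits = exits              # dst -> (egress, peer ingress, peer domain)

    def query(self, src, dst, ingress=None):
        # A source outside the home domain requires an ingress point.
        entry = src if src in self.local_hosts else ingress
        if entry is None:
            raise ValueError(f"{self.name}: ingress point required")
        if dst in self.local_hosts:
            return Answer(self.costs[(entry, dst)], None, None)
        egress, peer_ingress, peer_domain = self.exits[dst]
        return Answer(self.costs[(entry, egress)], peer_ingress, peer_domain)


def iterative_lookup(servers, first_domain, src, dst):
    """Query each domain in turn, carrying the returned ingress forward."""
    total, domain, ingress = 0.0, first_domain, None
    while True:
        answer = servers[domain].query(src, dst, ingress)
        total += answer.cost
        if answer.next_ingress is None:
            return total
        domain, ingress = answer.next_domain, answer.next_ingress


# Figure 1 with hypothetical costs; bi1/be1 name the unlabeled ports of AS B.
servers = {
    "S": DomainServer("S", {"src"}, {("src", "se1"): 3.0}, {"dst": ("se1", "ai1", "A")}),
    "A": DomainServer("A", set(), {("ai1", "ae1"): 5.0}, {"dst": ("ae1", "bi1", "B")}),
    "B": DomainServer("B", set(), {("bi1", "be1"): 4.0}, {"dst": ("be1", "di1", "D")}),
    "D": DomainServer("D", {"dst"}, {("di1", "dst"): 2.0}, {}),
}
print(iterative_lookup(servers, "S", "src", "dst"))  # 14.0
```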

To support incremental deployment, an ALTO server may respond to a query that does not specify an ingress point even when the source is not in the server's domain. In this case, the server returns results for each potential ingress point and, for each ingress point, indicates information about the previous hop (e.g., peer AS number and potential address).
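
This incremental-deployment behavior can be sketched in Python from the perspective of AS A in Figure 1. When the query carries no ingress point, the server returns one result per candidate ingress, tagged with previous-hop information; a client that knows, for example from AS S, which egress router its traffic uses can then pick the matching result. The result field names and costs are hypothetical, not a defined message format.

```python
# Hypothetical view held by AS A's ALTO server in Figure 1: for each ingress
# point, the peer that feeds it and the internal cost from there toward dst.
INGRESS_INFO = {
    "ai1": {"peer-as": "S", "peer-address": "se1"},
    "ai2": {"peer-as": "S", "peer-address": "se2"},
}
COST_FROM_INGRESS = {("ai1", "dst"): 5.0, ("ai2", "dst"): 6.0}


def answer(dst, ingress=None):
    """One result if the ingress is known, else one per candidate ingress."""
    candidates = [ingress] if ingress is not None else sorted(INGRESS_INFO)
    return [
        {"ingress": i,
         "cost": COST_FROM_INGRESS[(i, dst)],
         "previous-hop": INGRESS_INFO[i]}
        for i in candidates
    ]


def pick(results, previous_hop_address):
    """Client side: keep the result whose previous hop matches what it knows."""
    for result in results:
        if result["previous-hop"]["peer-address"] == previous_hop_address:
            return result
    return None


results = answer("dst")        # no ingress given: two candidate results
chosen = pick(results, "se1")  # AS S reported that traffic exits via se1
print(chosen["ingress"], chosen["cost"])  # ai1 5.0
```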

4.5. General-Path Model Supporting Partial Deployment

Complementing the cascading model, we introduced a general-path model at ALTO clients, so that clients can use acquired information to refine their network information gradually.

In particular, the model represents the path from a src to a dst as a directed acyclic graph (DAG) with the following components:

A set of nodes, each with a type and attributes. The type can be (1) host, such as src/dst, with attributes such as IP address; (2) AS, which is a group of nodes (i.e., a subgraph), with attributes including ASN; or (3) router, with subtypes such as BGP router, with attributes such as IP address.

A set of links, where each link has a head and a tail; hence the link types are the combinations of head type x tail type. A link can also have attributes.
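
The general-path representation can be sketched as a small Python data structure. Class and field names are hypothetical; node types follow the three listed above (with router subtypes stored as attributes), a link's type is derived from the types of its two end nodes, and the standard graphlib module is used to check that the graph is acyclic.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter

NODE_TYPES = {"host", "AS", "router"}


@dataclass
class Node:
    name: str
    kind: str                                    # node type: host, AS, or router
    attrs: dict = field(default_factory=dict)    # e.g. IP address, ASN, router subtype
    members: list = field(default_factory=list)  # AS nodes only: names of member nodes


@dataclass
class Link:
    tail: str                                    # the link points from tail ...
    head: str                                    # ... to head
    attrs: dict = field(default_factory=dict)


@dataclass
class GeneralPath:
    nodes: dict = field(default_factory=dict)    # name -> Node
    links: list = field(default_factory=list)    # list of Link

    def add_node(self, name, kind, members=(), **attrs):
        if kind not in NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.nodes[name] = Node(name, kind, attrs, list(members))

    def add_link(self, tail, head, **attrs):
        self.links.append(Link(tail, head, attrs))

    def link_type(self, link):
        """A link's type is the (tail type, head type) combination."""
        return (self.nodes[link.tail].kind, self.nodes[link.head].kind)

    def is_dag(self):
        """True if the links form no directed cycle."""
        preds = {name: set() for name in self.nodes}
        for link in self.links:
            preds[link.head].add(link.tail)
        try:
            list(TopologicalSorter(preds).static_order())
        except CycleError:
            return False
        return True


p = GeneralPath()
p.add_node("src", "host", ip="128.141.201.74")
p.add_node("lg1", "router", subtype="BGP-router")
p.add_node("dst", "host", ip="141.3.128.6")
p.add_node("AS-S", "AS", members=["src", "lg1"], asn="S")
p.add_link("src", "lg1")
p.add_link("lg1", "dst")
print(p.is_dag(), p.link_type(p.links[0]))  # True ('host', 'router')
```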

The following are examples of this representation in our deployment use case:

For the geo-distance ALTO cost derived from GeoIP, the src and the dst are both hosts, connected directly, and the metric is the geographic distance.

For the CERN looking-glass ALTO server, consider a path from a src host at CERN to a dst host in another network, say KIT. The src host has two links, one to each of CERN's two looking-glass BGP routers; each of these BGP routers links to its BGP peer, and each such peer links to the next AS in the AS-PATH exposed by CERN.
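
The following Python sketch shows how a client might replace the coarse geo-distance path (a single src-to-dst link) with the more detailed looking-glass structure just described, storing the DAG as successor lists. The router and peer names are hypothetical, and 64500 and 64501 are documentation ASNs standing in for the AS-PATH actually exposed by CERN.

```python
def looking_glass_path(src, dst, sessions, as_path):
    """Build (node types, successor lists) for a src->dst general path.

    sessions: (looking-glass BGP router, its BGP peer) pairs at the source network.
    as_path: non-empty list, the AS-PATH exposed by the looking glass toward dst.
    """
    kinds = {src: "host", dst: "host"}
    succ = {src: [], dst: []}
    for router, peer in sessions:
        kinds[router] = kinds[peer] = "router"
        succ[src].append(router)   # src host -> looking-glass BGP router
        succ[router] = [peer]      # BGP router -> its BGP peer
        succ[peer] = [as_path[0]]  # BGP peer -> next AS in the AS-PATH
    for here, nxt in zip(as_path, as_path[1:] + [dst]):
        kinds[here] = "AS"
        succ[here] = [nxt]         # AS-level hops, ending at dst
    return kinds, succ


def count_paths(succ, node, dst):
    """Count distinct node->dst paths by DFS (fine for small DAGs)."""
    if node == dst:
        return 1
    return sum(count_paths(succ, nxt, dst) for nxt in succ[node])


# Coarse geo-distance view: a single host-to-host link.
geo = {"src": ["dst"], "dst": []}
# Refined looking-glass view; names and ASNs are placeholders.
kinds, succ = looking_glass_path(
    "src", "dst",
    sessions=[("cern-lg-1", "peer-1"), ("cern-lg-2", "peer-2")],
    as_path=["AS64500", "AS64501"],
)
print(count_paths(geo, "src", "dst"), count_paths(succ, "src", "dst"))  # 1 2
```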

5. IANA Considerations

Some of the solutions will need IANA registrations.

6. Acknowledgments

The authors would like to thank the many people who provided reviews and comments.

7. References

7.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC7285]
Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S., Previdi, S., Roome, W., Shalunov, S., and R. Woundy, "Application-Layer Traffic Optimization (ALTO) Protocol", RFC 7285, DOI 10.17487/RFC7285, September 2014, <https://www.rfc-editor.org/info/rfc7285>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

7.2. Informative References

[RFC7971]
Stiemerling, M., Kiesel, S., Scharf, M., Seidel, H., and S. Previdi, "Application-Layer Traffic Optimization (ALTO) Deployment Considerations", RFC 7971, DOI 10.17487/RFC7971, October 2016, <https://www.rfc-editor.org/info/rfc7971>.

Authors' Addresses

Danny Lachos
Benocs
Berlin
Germany
Ingmar Poese
Benocs
Berlin
Germany
Mario Lassnig
CERN
CH-1211 Geneva 23
Switzerland
Annie Gu
Yale University
51 Prospect St
New Haven, CT 06520
United States of America
Y. Richard Yang
Yale University
51 Prospect St
New Haven, CT 06520
United States of America
Jordi Ros Giralt
Qualcomm
Madrid
Spain