Workgroup:
Network Working Group
Internet-Draft:
draft-xiong-uc-problem-req-hp-wan-00
Published:
August 2024
Intended Status:
Informational
Expires:
1 March 2025
Authors:
Q. Xiong
ZTE Corporation
K. Yao
China Mobile
C. Huang
China Telecom
Z. Han
China Unicom

Use Cases, Problems and Requirements for High Performance Wide Area Network

Abstract

High Performance Wide Area Network (HP-WAN) is designed for applications such as scientific research, education, and other data-intensive services that demand massive data transmission. It needs to ensure data integrity and provide stable and efficient transmission services.

This document describes the use cases, analyses the problems, and outlines the requirements for HP-WANs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 1 March 2025.


1. Introduction

Data is fundamental for many fields of scientific research, including biology, astronomy, and artificial intelligence (AI). Within these areas, many applications generate huge volumes of data using advanced instruments and high-end computing devices. For data sharing and data backup, these applications usually require massive data transmission over long distances, for example, sharing data between research institutes thousands of kilometers apart. Such applications include High Performance Computing (HPC) for scientific research, cloud storage and backup of industrial internet data, distributed training, and so on. The network needs to ensure data integrity and provide stable and efficient transmission services in Wide Area Networks (WANs), which connect research institutions, universities, and data centers across large geographical areas.

Traditional data migration solutions include manual transportation of hard disks, which not only incurs labor costs but also lacks safety, and high-speed dedicated connectivity (e.g., direct optical connection), which is expensive. Moreover, the applications may demand periodic and temporary migration, require task-based data transmission with low real-time requirements, and have variable transmission frequency, all of which lead to low network utilization and poor cost-effectiveness.

The massive data may be transmitted over non-dedicated WANs, and the network demands high performance, such as high-throughput data transmission, which depends on transport layer protocols such as the Transmission Control Protocol (TCP), Quick UDP Internet Connections (QUIC), Remote Direct Memory Access (RDMA), and so on. However, the performance of TCP is impacted by its packet loss retransmission techniques. RDMA has three main implementations: InfiniBand (IB), a high-performance dedicated network technology that requires specific InfiniBand hardware support; the Internet Wide Area RDMA Protocol (iWARP), which is based on TCP/IP but whose transmission performance may be affected by TCP's congestion control and flow control; and RDMA over Converged Ethernet (RoCE), which allows the execution of RDMA over Ethernet but has applicability issues over WANs.

Moreover, long-distance connections and massive data transmission between two or more sites have become key factors affecting performance. For instance, long-distance networks have more uncertainties, such as routing changes, network congestion, packet loss, and link quality fluctuations, all of which may have a negative impact on performance. The services are massive and concurrent, with multiple types and different traffic models, such as elephant flows with short inter-arrival times, high speed, and large data scale, which may occupy a large amount of network resources and affect performance.

High Performance Wide Area Network (HP-WAN) is designed specifically to meet the high-speed, low-latency, and high-capacity needs of massive data set applications, which puts forward higher performance requirements such as ultra-high goodput, high bandwidth utilization, ultra-low packet loss ratio, and resilience to ensure effective high-throughput transmission.

This document describes the use cases, analyses the problems, and outlines the requirements for HP-WANs.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Terminology

The terminology is defined as follows.

High Performance Wide Area Networks (HP-WANs): indicate the networks designed specifically to meet the high-speed, low-latency, and high-capacity needs of scientific research, education, and data-intensive applications. The primary goal of HP-WAN is to achieve massive data transmission, which puts forward higher performance requirements such as ultra-high goodput, high bandwidth utilization, ultra-low packet loss ratio, and resilience to ensure effective high-throughput transmission.

This document also makes use of the following abbreviations:

DC:
Data Center
DCI:
Data Center Interconnection
HPC:
High Performance Computing
WAN:
Wide Area Network
MAN:
Metropolitan Area Network
PFC:
Priority Flow Control
ECN:
Explicit Congestion Notification
ECMP:
Equal-Cost Multipath
RTT:
Round-Trip Time
TCP:
Transmission Control Protocol
RDMA:
Remote Direct Memory Access
QUIC:
Quick UDP Internet Connections

3. Use Cases

Several use cases are documented for scenarios requiring high-performance data transmission over WANs.

3.1. High Performance Computing (HPC)

High Performance Computing (HPC) uses computing clusters to perform complex scientific computing and data analysis tasks. HPC is a critical component in solving complex problems in various fields such as scientific research, engineering, finance, and data analysis.

For example, research data from large science and engineering projects conducted in cooperation with many research institutions requires long-term archiving of about 50~300 PB of data every year. The PSII protein imaging process generates 30 to 120 high-resolution images per second during experiments, resulting in 60~100 GB of data every five minutes that must be transmitted from one laboratory to another for analysis. Another example is the Five-hundred-meter Aperture Spherical radio Telescope (FAST): astronomical data processing involves over 200 observations for each project, a single project generates observation data at the TB~PB scale, and the annual data production is about 15 PB.
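As a rough illustration of the sustained bandwidth these figures imply, the following back-of-the-envelope sketch converts the volumes cited above (100 GB per five minutes; 15 PB per year) into average rates; it is illustrative only, not part of the cited measurements:

```python
# Back-of-the-envelope bandwidth estimates for the HPC figures above.
GB = 1e9   # bytes
PB = 1e15  # bytes

# 100 GB generated every five minutes -> sustained rate in Gbps
rate_psii_gbps = (100 * GB * 8) / (5 * 60) / 1e9
print(f"imaging pipeline: {rate_psii_gbps:.2f} Gbps sustained")

# 15 PB of FAST data per year -> average archival rate in Gbps
seconds_per_year = 365 * 24 * 3600
rate_fast_gbps = (15 * PB * 8) / seconds_per_year / 1e9
print(f"FAST archive: {rate_fast_gbps:.2f} Gbps average")
```

Even as long-term averages, these workloads occupy a multi-gigabit share of a WAN path continuously, before accounting for burstiness or concurrent projects.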

HPC requires high bandwidth and high-speed network to facilitate the rapid data exchange between processing units. It also requires high-capacity and high-throughput storage solutions to handle the vast amounts of data generated by simulations and computations. It is necessary to support large-scale parallel processing, high-speed data transmission, and low latency communication to achieve effective collaboration between computing nodes.

3.2. Backup and Disaster Recovery

With the development of the cloud computing industry, cloud data centers are bearing a large number of various enterprise IT services. The storage, transmission, and protection of this massively growing data bring new challenges.

For instance, disaster recovery of core application data is required to ensure the enterprise data security and the service continuity. In the scenario of disaster recovery of the operator's traffic data, the daily data backup volume of a single IT cloud resource pool is at the TB level. The primary and backup data centers are normally built in different locations with long data transmission distances. However, they do not have strict requirements for data transmission time. By utilizing the tidal effect of the network, the idle bandwidth at night can be utilized for the transmission, so as to improve the data transmission efficiency and reduce the data transmission cost.
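The tidal-effect strategy can be sized with simple arithmetic. The sketch below checks whether an assumed nightly backup volume fits in an assumed idle window; the 10 TB volume, 8-hour window, and 10 Gbps idle bandwidth are hypothetical figures for illustration, not values from the operator scenario above:

```python
# Hypothetical sizing of a nightly backup window (illustrative numbers only).
TB = 1e12
backup_bytes = 10 * TB      # assumed nightly backup volume
idle_window_s = 8 * 3600    # assumed idle window: 8 hours at night
link_gbps = 10              # assumed idle bandwidth available overnight

transfer_s = backup_bytes * 8 / (link_gbps * 1e9)
fits = "fits" if transfer_s <= idle_window_s else "does not fit"
print(f"10 TB at {link_gbps} Gbps takes {transfer_s / 3600:.1f} h ({fits} in the window)")
```

Because the transfer has no strict deadline, the scheduler only needs the product of idle bandwidth and window length to exceed the backup volume, which is what makes tidal scheduling cost-effective.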

3.3. Multimedia Content Production

Multimedia Content Production refers to the process of creating and editing content that combines different media forms such as text, audio, images, animations, and video. This field is characterized by the use of digital technology to produce engaging and dynamic content for various platforms, including film, television, the internet, and mobile devices. It requires processing a large amount of data, including raw video materials, special effects, and rendering results.

For example, in film and video production, the raw material data of a large-scale variety show or film and television program is at the PB level, with a single transmission in the range of 10 TB to 100 TB. With the development of new media such as 4K/8K, 5G, AI, VR/AR, and short video, large amounts of audio and video data need to be transmitted between data centers or different storage sites over long distances. For AR/VR videos, outputting 1080p image quality at the terminal requires about 40 Mbps per user. This demands data transmission with traffic characteristics such as massive data scale and large bursts.

3.4. AI Training

With the increasing demand for computing power in AI large-scale model training, the scale of a single data center is limited by factors such as power supply. AI training clusters are therefore expanding from a single data center to multiple DCs. Collaborative training across multiple DCs typically refers to distributed machine learning training across multiple data centers, which can improve computational efficiency, accelerate model training, and utilize more data resources.

For example, in the training process of deep learning, training data sets have reached 3.05 TB. Uploading large model training templates requires uploading TB- to PB-level data to the data center. Each training session has fewer data flows but larger per-flow bandwidth, and 20% of the current network's services account for 80% of the traffic, resulting in elephant flows. Compared with traditional DCI scenarios, parameter exchange significantly increases the amount of data transmitted across DCs, typically from tens to hundreds of TB. The network should provide sufficient bandwidth, low latency, and high reliability for data center communications.

4. Problem Statements

Challenges to effective high-performance transmission in HP-WANs come from massive concurrent services as well as long-distance delays and packet loss. Existing network technologies have various problems and cannot meet these demands. This document outlines the problems for HP-WANs.

4.1. Challenges with Low Bandwidth Utilization of a Single Elephant Flow

In HP-WAN applications, a large amount of data is transmitted at a time; for example, a single flow may carry TB~PB of data. Such elephant flows last for a long time with short inter-arrival times, high speed, and large data scale, and may occupy a large amount of network resources and affect performance. Network congestion and load imbalance make low bandwidth utilization a challenge.

When transmitting massive data, the traffic is dominated by elephant flows and network resources in WANs are insufficient. An uneven network load leads to decreased network throughput and low link utilization. Load balancing refers to methods for allocating load (traffic) across multiple links for forwarding [RFC7424]. For example, hash conflicts and poor balancing become challenging with massive elephant flows when flow-based ECMP distributes several elephant flows onto the same link, resulting in congestion and packet loss.
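The hash-collision risk can be illustrated with a toy model of flow-based ECMP. The sketch below is an assumption-laden simplification (CRC32 of the flow tuple stands in for a router's proprietary hash; addresses and ports are made up): it estimates how often at least two of four elephant flows land on the same link out of four equal-cost links.

```python
import random
import zlib

def ecmp_link(flow_tuple, n_links):
    # Deterministic stand-in for a router's 5-tuple hash (illustrative only).
    return zlib.crc32(repr(flow_tuple).encode()) % n_links

random.seed(7)
n_links, trials, collisions = 4, 10_000, 0
for _ in range(trials):
    # Four elephant flows with random (hypothetical) sources toward one sink.
    flows = [(f"10.0.{random.randrange(256)}.{random.randrange(256)}",
              "10.1.0.1", random.randrange(1024, 65536), 443) for _ in range(4)]
    links = [ecmp_link(f, n_links) for f in flows]
    if len(set(links)) < len(links):  # some link carries >= 2 elephants
        collisions += 1

# Theory for uniform hashing: 1 - 4!/4**4 ~ 0.91
print(f"P(two elephants share a link) ~ {collisions / trials:.2f}")
```

Under uniform hashing, roughly nine times out of ten at least one link carries two or more elephant flows, which is why flow-based ECMP alone cannot balance a small number of very large flows.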

4.2. Challenges with Massive Flows with Large Bursts

Massive data transfers with large bursts may cause instantaneous congestion and packet loss within network device queues in WANs. There is more aggregation at the edge of WANs, and bursts may accumulate as flows traverse, join, and separate over hops. Congestion control and bandwidth guarantees for such bursty traffic are challenging.

Moreover, the applications may have multiple concurrent services coexisting with existing dynamic flows. Considering the multiple services with various types and different traffic requirements, the traffic needs to be scheduled onto multiple paths with fine-grained network resources to achieve high utilization and QoS guarantees. Traffic scheduling is especially challenging when the Traffic Specification (T-SPEC) of the flows cannot be obtained.
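A T-SPEC is conventionally expressed as token-bucket parameters (rate r, bucket depth b): conforming traffic never exceeds r*t + b bytes over any interval t. The following minimal sketch, with illustrative numbers of my choosing, shows why a burst larger than the bucket depth must be queued or dropped even when the average rate is within spec:

```python
# Minimal token-bucket conformance check. Traffic conforming to T-SPEC
# (rate_Bps, bucket_B) sends at most rate_Bps*t + bucket_B bytes in any
# interval of length t. Numbers below are illustrative assumptions.
def conforms(burst_bytes, interval_s, rate_Bps, bucket_B):
    return burst_bytes <= rate_Bps * interval_s + bucket_B

r = 1.25e9  # 10 Gbps expressed in bytes/s
b = 10e6    # 10 MB bucket depth

print(conforms(5e6, 0.0, r, b))   # 5 MB instantaneous burst: within the bucket
print(conforms(50e6, 0.0, r, b))  # 50 MB instantaneous burst: exceeds the T-SPEC
```

Without a declared T-SPEC the network cannot run such a check at admission time, which is exactly what makes scheduling unknown bursty flows hard.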

4.3. Challenges with Long-distance Delay and Slow Feedback

In HP-WAN scenarios, flow control is challenging due to long-distance links and transmission delay. Flow control refers to methods for ensuring that data is transmitted efficiently and reliably by controlling the rate of data transmission, preventing a fast sender from overwhelming a slow receiver and preventing packet loss in congested situations. Reasonable thresholds and larger buffers are required to achieve effective throughput without packet loss under long-distance delays.

Congestion control in WANs, which keeps the total amount of data entering the network at an acceptable level, is also challenging. Long-distance transmission over thousands of kilometers results in extremely long link propagation delays, which delay network state feedback. For example, as per [RFC3168], Explicit Congestion Notification (ECN) defines an end-to-end congestion notification mechanism based on the IP and transport layers. When congestion occurs, a network device marks packets, the receiver echoes the congestion information back to the sender, and the sender adjusts its transmission rate to achieve congestion control. Long distances delay this notification and slow the feedback, resulting in untimely adjustment.
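The feedback delay scales directly with distance. Assuming a typical one-way propagation delay in optical fiber of about 5 microseconds per kilometer (light travels roughly 200,000 km/s in fiber), a sketch of the minimum ECN reaction time:

```python
# An ECN mark set at a congested hop reaches the sender only after the
# packet arrives at the receiver and the echo returns, so reaction time
# is at least on the order of one round trip. Fiber propagation assumed
# at ~5 us/km one way (approximate; ignores queueing and processing).
FIBER_US_PER_KM = 5.0

def min_feedback_ms(distance_km):
    one_way_ms = distance_km * FIBER_US_PER_KM / 1000
    return 2 * one_way_ms  # forward to the receiver + echo back to the sender

for km in (100, 2000, 5000):
    print(f"{km:>5} km: ECN feedback delay >= {min_feedback_ms(km):.0f} ms")
```

At 5000 km the sender learns of congestion no sooner than ~50 ms after it began, during which a 10 Gbps sender has already injected tens of megabytes into the path.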

Moreover, slow feedback may impact some congestion control algorithms. For example, Bottleneck Bandwidth and Round-trip propagation time (BBR) is a congestion-based congestion control algorithm for TCP, which actively measures the bottleneck bandwidth (BtlBw) and round-trip propagation time (RTprop), calculates the bandwidth-delay product (BDP) from this model, and then adjusts the transmission rate to maximize throughput and minimize latency. But BBR relies on real-time measurement of these parameters, which may vary greatly and feed back slowly, thereby affecting the control precision of BBR in long-distance networks. Moreover, Data Center Quantized Congestion Notification (DCQCN) and High Precision Congestion Control (HPCC++) do not tolerate long feedback loops. The stability and adaptability of congestion control algorithms may be challenging in HP-WAN scenarios.
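The scale of the problem follows from the BDP itself. The sketch below computes BDP = BtlBw * RTprop for an intra-DC path versus a long-haul WAN path; the specific rates and RTTs are illustrative assumptions, not measurements:

```python
# BBR caps in-flight data near BDP = BtlBw * RTprop. Over a long WAN
# path the measured inputs are stale by at least one RTT, so the
# operating point can drift far from the true BDP before BBR reacts.
def bdp_bytes(btlbw_gbps, rtprop_ms):
    return btlbw_gbps * 1e9 / 8 * rtprop_ms / 1e3

dc  = bdp_bytes(100, 0.1)  # assumed intra-DC path: 100 Gbps, 0.1 ms RTprop
wan = bdp_bytes(10, 100)   # assumed long-haul WAN path: 10 Gbps, 100 ms RTprop
print(f"intra-DC BDP: {dc / 1e6:.2f} MB, WAN BDP: {wan / 1e6:.0f} MB")
```

A hundredfold-larger in-flight volume means every stale measurement or mis-estimate is amplified by the same factor, which is why control precision degrades on long-distance paths.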

4.4. Challenges with Packet Loss Impacting Transport Protocols

Packet loss has a significant impact on the throughput of some transport protocols, especially in HP-WAN scenarios. For example, RDMA is designed for high performance and low latency, which gives it strict requirements for the network: the network should provide ultra-low packet loss, otherwise performance degradation is significant. This poses greater challenges to the underlying network hardware and also limits the network size of RDMA. RDMA relies on a go-back-N retransmission mechanism, and its throughput dramatically decreases with packet loss rates greater than 0.1%; a 2% packet loss rate effectively reduces throughput to zero.
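The sensitivity to loss can be seen in a simplified analytical model (an assumption of this sketch, not the RoCE specification): under go-back-N, each lost packet forces retransmission of up to a window W of in-flight packets, so the expected number of sends per delivered packet is roughly 1 + p*W and goodput efficiency is about 1/(1 + p*W).

```python
# Simplified go-back-N goodput model (illustrative assumption, not the
# exact RoCE behavior): each loss retransmits ~W in-flight packets, so
# efficiency ~ 1 / (1 + p * W) for loss probability p and window W.
def gbn_efficiency(p, window_pkts):
    return 1.0 / (1.0 + p * window_pkts)

# ~BDP-sized window for a 10 Gbps x 100 ms RTT path at 1500-byte packets.
W = 83_000
for p in (1e-5, 1e-3, 2e-2):
    print(f"loss {p:.0e}: efficiency ~ {gbn_efficiency(p, W):.1%}")
```

With WAN-sized windows, even 0.1% loss costs roughly two orders of magnitude of goodput in this model, consistent with the qualitative behavior described above.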

For TCP and QUIC, CUBIC [RFC9438] is a widely deployed congestion control algorithm that uses a more aggressive window increase function suitable for high-speed and long-distance networks. When packet loss occurs, CUBIC reduces the congestion window by its multiplicative window decrease factor, which slows the convergence speed; it therefore requires low network packet loss. As per [RFC9438], Section 5.2, a packet loss rate of 2.9e-8 is required to achieve a throughput of 10 Gbps. The throughput dramatically decreases when the packet loss ratio exceeds a threshold value.
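The cited figure can be checked against CUBIC's deterministic-loss response function, which for the default constants C = 0.4 and beta = 0.7 gives an average window of roughly 1.054 * (RTT/p)^(3/4) segments. The sketch below solves that relation for p, assuming the usual accompanying parameters of 1500-byte segments and a 100 ms RTT:

```python
# CUBIC average congestion window under a deterministic loss model:
#   W_avg = (C*(3+beta) / (4*(1-beta)))**0.25 * (RTT/p)**0.75   [segments]
# Solving for p gives the loss rate needed to sustain a target rate.
# Default constants C = 0.4, beta = 0.7 per RFC 9438.
C, BETA = 0.4, 0.7

def required_loss_rate(target_gbps, rtt_s, mss_bytes=1500):
    # Target average window, in segments, for the desired rate.
    w_target = target_gbps * 1e9 / 8 / mss_bytes * rtt_s
    k = (C * (3 + BETA) / (4 * (1 - BETA))) ** 0.25
    return rtt_s / (w_target / k) ** (4 / 3)

p = required_loss_rate(10, 0.1)
print(f"p ~ {p:.1e} for 10 Gbps at 100 ms RTT")  # ~2.9e-8, matching the cited value
```

That one packet in roughly 34 million may be lost shows how far below typical WAN loss rates the network must operate for CUBIC alone to sustain 10 Gbps.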

5. Requirements

5.1. Service Requirements

The characteristics of the above use cases and problems include massive elephant flow data with large bursts, multiple concurrent services coexisting with dynamic flows, and long distances between sites. This document outlines the service requirements from users as follows.

5.2. Performance Requirements

This document outlines the requirements for effective high-throughput data transmission in HP-WANs, with performance indicators such as ultra-high bandwidth utilization, ultra-low packet loss ratio, and low latency, as follows.

6. Security Considerations

This document covers a number of representative applications and network scenarios that are expected to make use of HP-WAN technologies. The use cases themselves do not raise new security concerns or issues, but each may have security considerations from both the use-specific perspective and the technology-specific perspective.

7. IANA Considerations

This document makes no requests for IANA action.

8. Acknowledgements

The authors would like to acknowledge Zheng Zhang, Yao Liu and Guangping Huang for their thorough review and very helpful comments.

9. References

9.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC3168]
Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001, <https://www.rfc-editor.org/info/rfc3168>.
[RFC7424]
Krishnan, R., Yong, L., Ghanwani, A., So, N., and B. Khasnabish, "Mechanisms for Optimizing Link Aggregation Group (LAG) and Equal-Cost Multipath (ECMP) Component Link Utilization in Networks", RFC 7424, DOI 10.17487/RFC7424, January 2015, <https://www.rfc-editor.org/info/rfc7424>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8664]
Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W., and J. Hardwick, "Path Computation Element Communication Protocol (PCEP) Extensions for Segment Routing", RFC 8664, DOI 10.17487/RFC8664, December 2019, <https://www.rfc-editor.org/info/rfc8664>.
[RFC9232]
Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, May 2022, <https://www.rfc-editor.org/info/rfc9232>.
[RFC9438]
Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., "CUBIC for Fast and Long-Distance Networks", RFC 9438, DOI 10.17487/RFC9438, August 2023, <https://www.rfc-editor.org/info/rfc9438>.

Authors' Addresses

Quan Xiong
ZTE Corporation
China
Kehan Yao
China Mobile
China
Cancan Huang
China Telecom
China
Zhengxin Han
China Unicom
China