Network Working Group                                           Q. Xiong
Internet-Draft                                           ZTE Corporation
Intended status: Informational                                    K. Yao
Expires: 1 March 2025                                       China Mobile
                                                                C. Huang
                                                           China Telecom
                                                                  Z. Han
                                                            China Unicom
                                                          28 August 2024

Use Cases, Problems and Requirements for High Performance Wide Area Network
draft-xiong-uc-problem-req-hp-wan-00

Abstract

High Performance Wide Area Network (HP-WAN) is designed for many applications such as scientific research, education, and other data-intensive applications which demand massive data transmission, and it needs to ensure data integrity and provide stable and efficient transmission services. This document describes the use cases, analyses the problems, and outlines the requirements for HP-WANs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 1 March 2025.

Copyright Notice

Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.

Xiong, et al.             Expires 1 March 2025                  [Page 1]
Internet-Draft   Use Cases, Problems and Requirements for    August 2024

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

1. Introduction
  1.1. Requirements Language
2. Terminology
3. Use Cases
  3.1. High Performance Computing (HPC)
  3.2. Backup and Disaster Recovery
  3.3. Multimedia Content Production
  3.4. AI Training
4. Problem Statements
  4.1. Challenging with Low Bandwidth Utilization of a Single Elephant Flow
  4.2. Challenging with Massive Flows Data with Large Burst
  4.3. Challenging with Long-distance Delay and Slow Feedback
  4.4. Challenging with Packet Loss Impacting Transport Protocols
5. Requirements
  5.1. Service Requirements
  5.2. Performance Requirements
6. Security Considerations
7. IANA Considerations
8. Acknowledgements
9. References
  9.1. Normative References
Authors' Addresses

1. Introduction

Data is fundamental to many fields of scientific research, including biology, astronomy, and artificial intelligence (AI).
Within these areas, many applications generate huge volumes of data using advanced instruments and high-end computing devices. For data sharing and data backup, these applications usually require massive data transmission over long distances, for example, sharing data between research institutes thousands of kilometers apart. These applications include High Performance Computing (HPC) for scientific research, cloud storage and backup of industrial internet data, distributed training, and so on. They need Wide Area Networks (WANs) that ensure data integrity and provide stable and efficient transmission services. These WANs need to connect research institutions, universities, and data centers across large geographical areas.

Traditional data migration solutions include manual transportation of hard copies, which incurs higher labor costs and is less safe, and high-speed dedicated connectivity (e.g., direct optical connections), which is expensive. Moreover, the applications may demand periodic or temporary migration, require task-based data transmission with low real-time requirements, and transmit at a variable frequency, all of which lead to low network utilization and low cost-effectiveness.

The massive data may be transmitted over non-dedicated WANs, and the network requirements demand high performance, such as high-throughput data transmission, which depends on transport layer protocols such as Transmission Control Protocol (TCP), Quick UDP Internet Connections (QUIC), Remote Direct Memory Access (RDMA), and so on. However, the performance of TCP is degraded by packet loss and the retransmissions it triggers.
For RDMA, there are three main implementations: InfiniBand (IB), a high-performance dedicated network technology that requires specific InfiniBand hardware support; Internet Wide Area RDMA Protocol (iWARP), which is based on the TCP/IP protocol suite, so its transmission performance may be affected by the congestion control and flow control of TCP; and RDMA over Converged Ethernet (RoCE), which allows RDMA to run over Ethernet but has applicability issues over WANs.

Moreover, the long-distance connection and massive data transmission between two or more sites have become key factors affecting performance. For instance, long-distance networks may have more uncertainties, such as routing changes, network congestion, packet loss, and link quality fluctuations, all of which may have a negative impact on performance. The services are massive and concurrent, with multiple types and different traffic models, such as elephant flows with short intervals, high speed, and large data scale, which may occupy a large amount of network resources and affect performance.

High Performance Wide Area Network (HP-WAN) is designed specifically to meet the high-speed, low-latency, and high-capacity needs of massive data set applications, which puts forward higher performance requirements such as ultra-high goodput, high bandwidth utilization, ultra-low packet loss ratio, and resilience to ensure effective high-throughput transmission.

This document describes the use cases, analyses the problems, and outlines the requirements for HP-WANs.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Terminology

The terminology is defined as follows.

High Performance Wide Area Networks (HP-WANs): networks designed specifically to meet the high-speed, low-latency, and high-capacity needs of scientific research, education, and data-intensive applications. The primary goal of HP-WAN is to achieve massive data transmission, which puts forward higher performance requirements such as ultra-high goodput, high bandwidth utilization, ultra-low packet loss ratio, and resilience to ensure effective high-throughput transmission.

This document also makes use of the following abbreviations and definitions:

DC: Data Center

DCI: Data Center Interconnection

HPC: High Performance Computing

WAN: Wide Area Network

MAN: Metropolitan Area Network

PFC: Priority Flow Control

ECN: Explicit Congestion Notification

ECMP: Equal-Cost Multipath

RTT: Round-Trip Time

TCP: Transmission Control Protocol

RDMA: Remote Direct Memory Access

QUIC: Quick UDP Internet Connections

3. Use Cases

Several use cases are documented for scenarios requiring high-performance data transmission over WANs.

3.1. High Performance Computing (HPC)

High Performance Computing (HPC) uses computing clusters to perform complex scientific computing and data analysis tasks. HPC is a critical component in solving complex problems in various fields such as scientific research, engineering, finance, and data analysis.
For example, the research data of large science and engineering projects carried out in cooperation among many research institutions requires long-term archiving of about 50~300 PB of data every year. The PSII protein process generates 30 to 120 high-resolution images per second during experiments. This results in 60~100 GB of data every five minutes, which must be transmitted from one laboratory to another for analysis. Another example is the Five-hundred-meter Aperture Spherical radio Telescope (FAST), where astronomical data calculation involves over 200 observations for each project, a single project generates TB~PB of observation data, and annual data production is about 15 PB.

HPC requires a high-bandwidth, high-speed network to facilitate rapid data exchange between processing units. It also requires high-capacity and high-throughput storage solutions to handle the vast amounts of data generated by simulations and computations. It is necessary to support large-scale parallel processing, high-speed data transmission, and low-latency communication to achieve effective collaboration between computing nodes.

3.2. Backup and Disaster Recovery

With the development of the cloud computing industry, cloud data centers are carrying a large number of enterprise IT services of various kinds. The storage, transmission, and protection of the massively growing data bring new challenges. For instance, disaster recovery of core application data is required to ensure enterprise data security and service continuity. In the scenario of disaster recovery of an operator's traffic data, the daily data backup volume of a single IT cloud resource pool is at the TB level.

The primary and backup data centers are normally built in different locations with long data transmission distances. However, they do not have strict requirements for data transmission time. By utilizing the tidal effect of the network, the idle bandwidth at night can be used for the transmission, which improves data transmission efficiency and reduces data transmission cost.

3.3. Multimedia Content Production

Multimedia Content Production refers to the process of creating and editing content that combines different media forms such as text, audio, images, animations, and video. This field is characterized by the use of digital technology to produce engaging and dynamic content for various platforms, including film, television, the internet, and mobile devices. It requires processing a large amount of data, including raw video materials, special effects, and rendering results.

For example, in film and video production, the raw material data of a large-scale variety show or a film and television program is at the PB level, with a single transmission carrying 10 TB to 100 TB of data. With the development of new media such as 4K/8K, 5G, AI, VR/AR, and short video, a large amount of audio and video data needs to be transmitted between data centers or different storage sites across long distances. For AR/VR video, outputting 1080P image quality at the terminal requires 40M per user. This demands data transmission with traffic characteristics such as massive data scale and large bursts.

3.4. AI Training

With the increasing demand for computing power in large-scale AI model training, the scale of a single data center is limited by factors such as power supply. AI training clusters therefore expand from a single data center to multiple DCs. Collaborative training across multiple DCs typically refers to the process of distributed machine learning training across multiple data centers, which can improve computational efficiency, accelerate model training, and utilize more data resources. For example, it is used for the training process of deep learning, where the training data has reached 3.05 TB.
Uploading large model training templates requires uploading TB/PB-level data to the data center. Each training session has fewer data flows with larger bandwidth. In addition, 20% of the current network's services account for 80% of the traffic, which results in elephant flows. Compared with traditional DCI scenarios, parameter exchange significantly increases the amount of data transmitted across DCs, typically to tens to hundreds of TB. The network should provide sufficient bandwidth, low latency, and high reliability for communication between data centers.

4. Problem Statements

The challenges to effective high-performance transmission in HP-WAN come from massive concurrent services, long-distance delay, and packet loss. Existing network technologies have various problems and cannot meet the demands. This document outlines the problems for HP-WANs.

4.1. Challenging with Low Bandwidth Utilization of a Single Elephant Flow

In HP-WAN applications, a large amount of data is transmitted at a time; for example, a single flow carries TB~PB of data. Such traffic may consist of elephant flows, which last for a long time with short intervals, high speed, and large data scale, and which may occupy a large amount of network resources and affect performance.

Network congestion and load imbalance may lead to low bandwidth utilization. When massive data is transmitted, the traffic is mainly elephant flows and the network resources in WANs are insufficient. Uneven network load leads to a decrease in network throughput and low link utilization. Load balancing refers to a method for allocating load (traffic) to multiple links for forwarding.
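As a rough illustration of flow-based load balancing across equal-cost links, the sketch below hashes each flow's 5-tuple to pick a link and sums the offered rate per link. It is a minimal sketch under stated assumptions: CRC32 stands in for whatever hash a real device uses, and the function names, addresses, ports, and 40 Gbit/s rates are hypothetical values chosen for illustration.

```python
import zlib

# Illustrative only: CRC32 is a stand-in for a device-specific ECMP hash.


def ecmp_link(five_tuple: tuple, n_links: int) -> int:
    """Map a flow 5-tuple to one of n_links equal-cost links."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % n_links


def link_loads(flows: list, n_links: int) -> list:
    """Sum the offered rate (Gbit/s) placed on each link.

    flows is a list of (five_tuple, rate_gbps) pairs.
    """
    loads = [0.0] * n_links
    for five_tuple, rate_gbps in flows:
        loads[ecmp_link(five_tuple, n_links)] += rate_gbps
    return loads


if __name__ == "__main__":
    # Five hypothetical 40 Gbit/s elephant flows over four links.
    flows = [
        (("10.0.0.1", "10.1.0.1", 6, 40000 + i, 443), 40.0)
        for i in range(5)
    ]
    for link, load in enumerate(link_loads(flows, n_links=4)):
        print(f"link {link}: {load:.0f} Gbit/s offered")
```

Because every packet of a flow hashes to the same link, the hash cannot split an elephant flow. With five such flows over four links, at least one link must carry two of them regardless of the hash function, while other links may carry one or none.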
For example, when flow-based ECMP hashes several elephant flows onto the same link, the resulting hash collisions and poor load balancing cause congestion and packet loss.

4.2. Challenging with Massive Flows Data with Large Burst

Massive data flows with large bursts may cause instantaneous congestion and packet loss in network device queues in WANs. There will be more aggregation at the edge of WANs, and bursts may accumulate as the flows traverse, join, and separate over hops. Congestion control and bandwidth guarantees for bursty traffic will be challenging.

Moreover, the applications may have multiple concurrent services coexisting with existing dynamic flows. Considering the multiple services with various types and different traffic requirements, the traffic needs to be scheduled onto multiple paths and fine-grained network resources to achieve high utilization and QoS guarantees. Traffic scheduling will be challenging, especially when the Traffic Specification (T-SPEC) of the flows cannot be obtained.

4.3. Challenging with Long-distance Delay and Slow Feedback

In HP-WAN scenarios, flow control will be challenging due to the long-distance links and transmission delay. Flow control refers to a method for ensuring that data is transmitted efficiently and reliably by controlling the rate of data transmission, to prevent a fast sender from overwhelming a slow receiver and to prevent packet loss under congestion. To sustain effective throughput without packet loss over the long-distance delay, reasonable thresholds must be configured and buffers must be increased.

Congestion control, which controls the total amount of data entering the network to keep the traffic at an acceptable level, will also be challenging in WANs.
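To make the buffer and threshold sizing concern above more concrete, the following sketch estimates the propagation round-trip time and the bandwidth-delay product (BDP) of a long-distance path. It is only an illustration under stated assumptions: 5 microseconds per kilometer is a common approximation for one-way propagation in optical fiber, the 100 Gbit/s link rate is a hypothetical value, the function names are introduced here, and queuing and processing delays are ignored.

```python
# Illustrative only: propagation-delay and BDP arithmetic for a long path.
FIBER_DELAY_S_PER_KM = 5e-6  # assumption: ~5 microseconds per km, one way


def propagation_rtt_s(distance_km: float) -> float:
    """Round-trip propagation delay in seconds (no queuing or processing)."""
    return 2 * distance_km * FIBER_DELAY_S_PER_KM


def bdp_bytes(link_rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return link_rate_bps * rtt_s / 8


if __name__ == "__main__":
    for km in (1000, 3000):
        rtt = propagation_rtt_s(km)
        bdp = bdp_bytes(100e9, rtt)  # hypothetical 100 Gbit/s link
        print(f"{km} km: RTT {rtt * 1e3:.0f} ms, BDP {bdp / 1e6:.0f} MB")
```

Under these assumptions, a 1000 km path has a propagation RTT of about 10 ms, so a single 100 Gbit/s flow needs roughly 125 MB in flight to fill the path, and any rate adjustment takes at least one such RTT to take effect.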
Long-distance transmission over thousands of kilometers results in extremely long link transmission delays, which delay the network state feedback. For example, as per [RFC3168], Explicit Congestion Notification (ECN) defines an end-to-end congestion notification mechanism based on the IP and transport layers. When congestion occurs, a network device marks packets, the receiver echoes the congestion indication back to the sender, and the sender then adjusts its transmission rate to achieve congestion control. The long distance delays the notification and slows the feedback, which results in untimely adjustment.

Moreover, slow feedback may affect some congestion control algorithms. For example, Bottleneck Bandwidth and Round-trip propagation time (BBR) is a congestion-based congestion control algorithm for TCP, which actively measures bottleneck bandwidth (BtlBw) and round-trip propagation time (RTprop), uses this model to calculate the bandwidth-delay product (BDP), and then adjusts the transmission rate to maximize throughput and minimize latency. However, BBR relies on real-time measurement of parameters that may vary greatly and are fed back slowly, which affects the control precision of BBR in long-distance networks. Moreover, Data Center Quantized Congestion Notification (DCQCN) and High Precision Congestion Control (HPCC++) would not tolerate the long feedback loop. The stability and adaptability of congestion control algorithms may be challenging in HP-WAN scenarios.

4.4. Challenging with Packet Loss Impacting Transport Protocols

Packet loss has a significant impact on the throughput of some transport protocols, especially in HP-WAN scenarios.
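As a rough illustration of how strongly loss-based congestion control depends on the packet loss rate, the sketch below inverts two steady-state average-window models: the classic square-root approximation for Reno-style TCP and the CUBIC model given in [RFC9438], Section 5.2. The constants C = 0.4 and beta = 0.7 are the RFC 9438 values. The function names and the example path (10 Gbit/s, 1500-byte packets, 0.1 s RTT) are choices made for this illustration, and real transfers will deviate from these idealized models.

```python
# Illustrative steady-state models; real transfers deviate from these.
C_CUBIC = 0.4      # CUBIC scaling constant (RFC 9438)
BETA_CUBIC = 0.7   # CUBIC multiplicative decrease factor (RFC 9438)


def window_packets(rate_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Average congestion window (packets) needed to sustain rate_bps."""
    return rate_bps * rtt_s / (packet_bytes * 8)


def reno_required_loss(window: float) -> float:
    """Invert AVG_W ~= sqrt(3/2) / sqrt(p) for Reno-style TCP."""
    return 1.5 / window ** 2


def cubic_required_loss(window: float, rtt_s: float) -> float:
    """Invert AVG_W = k * RTT^0.75 / p^0.75 (RFC 9438, Section 5.2)."""
    k = (C_CUBIC * (3 + BETA_CUBIC) / (4 * (1 - BETA_CUBIC))) ** 0.25
    return (k * rtt_s ** 0.75 / window) ** (4 / 3)


if __name__ == "__main__":
    w = window_packets(10e9, 0.1, 1500)
    print(f"window: {w:.0f} packets")
    print(f"Reno  needs p <= {reno_required_loss(w):.1e}")
    print(f"CUBIC needs p <= {cubic_required_loss(w, 0.1):.1e}")
```

For the example path, the CUBIC model yields a required loss rate of about 2.9e-8, and the Reno approximation is roughly two orders of magnitude stricter. Both models tie the sustainable window to the RTT and the loss rate, which is why long-distance paths are so sensitive to even rare packet loss.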
For example, RDMA is designed for high performance and low latency, which places strict requirements on the network: the network needs to provide ultra-low packet loss, otherwise the performance degradation will be significant. This poses greater challenges to the underlying network hardware and also limits the network size of RDMA. RDMA relies on a Go-Back-N retransmission mechanism; throughput decreases dramatically with packet loss rates greater than 0.1%, and a 2% packet loss rate effectively reduces throughput to zero.

For TCP and QUIC, CUBIC is a widely used congestion control algorithm, as per [RFC9438]. It uses a cubic window increase function, which is more aggressive than a linear one and is suitable for high-speed and long-distance networks. When packet loss occurs, CUBIC reduces the congestion window by its multiplicative decrease factor, which slows convergence. It therefore requires a low network packet loss rate. As per [RFC9438], Section 5.2, a packet loss rate of 2.9e-8 is required to achieve a throughput of 10 Gbps (with 1500-byte packets and an RTT of 0.1 seconds). The throughput decreases dramatically when the packet loss ratio exceeds a threshold value.

5. Requirements

5.1. Service Requirements

The use cases and problems above are characterized by massive elephant flows with large bursts, multiple concurrent services coexisting with dynamic flows, and long distances between sites. This document outlines the service requirements from users as follows.

* Massive data transmission, e.g., a single flow carries TB~PB of data.

* Task-based data transmission with variable frequency, e.g., periodic or temporary migration.

* Long-distance transmission between two or more sites or DCs, e.g., more than 1000 km.
* Instant transmission: the data needs to be transmitted immediately or at a specific time.

* Timely transmission: the transfer has a required completion time but no real-time transmission requirements.

* Low cost.

* Data security and integrity.

* Compatibility and complementarity with dedicated networks such as Research and Education Networks. For example, switching with a fine-grained mapping between private networks and WANs is required to achieve optimal operating and consumption costs.

5.2. Performance Requirements

This document outlines the requirements for effective high-throughput data transmission in HP-WAN in terms of performance indicators such as ultra-high bandwidth utilization, ultra-low packet loss ratio, and low latency, as follows.

* Ultra-low Packet Loss Ratio: packet loss is one of the performance indicators that determine throughput, and it correlates negatively with throughput. The lower the packet loss rate, the higher the throughput. It is important to ensure an ultra-low packet loss ratio to achieve high-throughput data transmission in HP-WAN.

* Ultra-high Bandwidth Utilization: refers to the efficient use of available network capacity to maximize data transfer rates and minimize latency. Bandwidth utilization needs to be improved to achieve high-throughput data transmission for multiple concurrent services in HP-WAN.

* Low Latency: RTT is another performance indicator that correlates negatively with throughput. The lower the RTT, the higher the throughput. Low delay over long distances needs to be guaranteed to achieve high-throughput data transmission in HP-WAN.

6. Security Considerations

This document covers a number of representative applications and network scenarios that are expected to make use of HP-WAN technologies.
None of the potential use cases raises security concerns or issues by itself, but each may have security considerations from both the use-specific perspective and the technology-specific perspective.

7. IANA Considerations

This document makes no requests for IANA action.

8. Acknowledgements

The authors would like to acknowledge Zheng Zhang, Yao Liu and Guangping Huang for their thorough review and very helpful comments.

9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001.

[RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., and B. Khasnabish, "Mechanisms for Optimizing Link Aggregation Group (LAG) and Equal-Cost Multipath (ECMP) Component Link Utilization in Networks", RFC 7424, DOI 10.17487/RFC7424, January 2015.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[RFC8664] Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W., and J. Hardwick, "Path Computation Element Communication Protocol (PCEP) Extensions for Segment Routing", RFC 8664, DOI 10.17487/RFC8664, December 2019.

[RFC9232] Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, May 2022.

[RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., "CUBIC for Fast and Long-Distance Networks", RFC 9438, DOI 10.17487/RFC9438, August 2023.

Authors' Addresses

Quan Xiong
ZTE Corporation
China
Email: xiong.quan@zte.com.cn

Kehan Yao
China Mobile
China
Email: yaokehan@chinamobile.com

Cancan Huang
China Telecom
China
Email: huangcanc@chinatelecom.cn

Zhengxin Han
China Unicom
China
Email: hanzx21@chinaunicom.cn