Network Working Group                                            D. King
Internet-Draft                                      Lancaster University
Intended status: Informational                                  T. Chown
Expires: 1 March 2025                                               Jisc
                                                               C. Rapier
                                        Pittsburgh Supercomputing Center
                                                                D. Huang
                                                         ZTE Corporation
                                                          28 August 2024

    Current State of the Art for High Performance Wide Area Networks
                    draft-kcrh-state-of-art-hp-wan-00

Abstract

   High Performance Wide Area Networks (HP-WANs) represent a critical infrastructure for the modern global research and education community, facilitating collaboration across national and international boundaries. These networks, such as Janet, ESnet, GÉANT, Internet2, CANARIE, and others, are designed to support not only the general needs of the research and education users they serve but also the transmission of vast amounts of data generated by scientific research, high-performance computing, distributed AI training and large-scale simulations.

   This document provides an overview of the terminology and techniques used for existing HP-WANs. It also explores the technological advancements, operational tools, and future directions for HP-WANs, emphasising their role in enabling cutting-edge scientific research, big data analysis, AI training and massive industrial data analysis.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 1 March 2025.

King, et al.              Expires 1 March 2025                  [Page 1]

Internet-Draft            HP-WAN STATE OF ART                August 2024

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Background
   2.  Terminology
   3.  Example Use Cases for HP-WANs
   4.  Current Technologies Used in HP-WANs: Key Components
     4.1.  Topology
     4.2.  Bandwidth and Latency
     4.3.  Data Movement Protocols
     4.4.  Forwarding Optimisation
     4.5.  Reliability
     4.6.  Quality of Service
     4.7.  Performance Monitoring
     4.8.  Scalability
     4.9.  Resource Scheduling
   5.  Examples of HP-WANs
     5.1.  GÉANT
     5.2.  Janet
     5.3.  Energy Sciences Network
     5.4.  Internet2
     5.5.  CANARIE
     5.6.  Asia-Pacific Advanced Network
   6.  Emerging Trends and Future Directions
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Normative References
   11. Informative References
   Authors' Addresses

1.  Introduction

   High Performance Wide Area Networks (HP-WANs) are the backbone of global research and education infrastructure, enabling the seamless transfer of vast amounts of data and supporting advanced scientific collaborations worldwide. These networks are designed to meet the demanding requirements of data-intensive research fields, including high-energy physics, climate modelling, genomics, and artificial intelligence.

   The evolution of HP-WANs is deeply intertwined with the growing need for advanced scientific research and the increasing globalisation of collaboration. Traditional WANs, which were sufficient for general business and communication needs, quickly became inadequate for the specialised requirements of research institutions. As scientific endeavours began to generate larger datasets, ranging from terabytes to petabytes, there arose a need for networks capable of transferring these massive volumes of data reliably and securely across large distances.

   The first HP-WANs emerged as specialised research networks, such as ESnet in the United States, Janet in the UK, and GÉANT in Europe, developed to support the unique needs of the scientific community.
   These networks were designed to provide high bandwidth, low latency, high reliability, and robust security, all of which are critical for applications like real-time data analysis, distributed computing, and remote instrumentation.

   Today, HP-WANs are foundational to the research community and are leading the way in demonstrating how advanced networking technologies can be applied to other sectors. They serve as testbeds for innovations in networking that eventually trickle down to broader commercial applications. As we look toward the future, HP-WANs will continue to play a critical role in enabling scientific discoveries and fostering international collaboration, particularly as emerging technologies such as quantum computing and the Internet of Things (IoT) push the boundaries of what these networks must support.

   This document explores the current state of the art in HP-WANs, examining the technological advancements, operational challenges, and emerging trends shaping the future of networks built for research, education, massive data analysis and collaborative AI training at scale and speed. Through this exploration, we aim to provide a better understanding of how high performance computing is currently supported across wide area networks.

1.1.  Background

   [Editor's note - to add a description of the historical development of HP-WANs.]

   [Editor's note - to add a description of the role of HP-WANs in supporting scientific research and education.]

2.  Terminology

   This document uses the following terminology relating to high performance WANs.

   CERN:  The European Organization for Nuclear Research, which houses the Large Hadron Collider (LHC).

   High Performance Computing (HPC):  A general term for computing with a high level of performance.
      The term often refers specifically to running highly parallel jobs, frequently across hundreds or even thousands of cores.

   High Performance Wide Area Network (HP-WAN):  A type of Wide Area Network (WAN) designed specifically to meet the high-speed, low-latency, and high-capacity needs of scientific research, education, and data-intensive applications. These networks connect research institutions, universities, and data centres across large geographical areas.

   InfiniBand:  Traditionally, a localised data interconnect used by many high performance computing (HPC) systems, providing high bandwidth and low latency.

   National Research and Education Network (NREN):  A specialised network supporting the research and education community within a specific country or region. NRENs provide high-speed connectivity and other services tailored to the needs of academic and research institutions.

   Remote Direct Memory Access (RDMA):  Enables one networked node to access another networked node's memory without involving either node's operating system or interrupting either node's processing. This helps minimise latency and maximise throughput, reducing memory bandwidth bottlenecks.

   RDMA over Converged Ethernet (RoCE):  Traditionally, a network protocol which allows remote direct memory access (RDMA) over a local Ethernet network. There are multiple RoCE versions. RoCE v1 is an Ethernet link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is carried over UDP/IP, which means that RoCE v2 packets can be routed.

   Worldwide LHC Computing Grid (WLCG):  A global network of over 170 computing centres across more than 40 countries, designed to process, store, and analyse the vast amounts of data generated by the Large Hadron Collider (LHC) at CERN.
   Performance Service Oriented Network monitoring Architecture (PerfSONAR):  A network performance monitoring toolkit designed to provide end-to-end performance measurement and monitoring across multi-domain network infrastructures.

   Science DMZ:  A model for the deployment of infrastructure at a site (campus) to optimise the performance of data transfers in and out of data transfer nodes (DTNs) at the site (see https://fasterdata.es.net/science-dmz/). Elements of the model include the local network architecture, tuning of DTNs, choice of data transfer software, efficient security policy implementation and persistent monitoring.

3.  Example Use Cases for HP-WANs

   HP-WAN applications have become synonymous with large-scale research and experimentation, big data, and AI. HPC, and therefore HP-WAN, is driving continuous innovation in use cases across the following fields and industries:

   *  High-Energy Physics Research, e.g., the Large Hadron Collider (LHC)

   *  Climate Modelling

   *  Radio Astronomy, e.g., the Square Kilometre Array (SKA) project

   *  Healthcare, Genomics and Life Sciences

   *  AI training

   *  Media Content Creation

   *  Government and Defence

   The data rates required by HPC applications vary significantly based on the application type and data scale. Scientific simulations, such as climate modelling and molecular dynamics, typically demand data rates from 10 Gbps to over 100 Gbps due to the large volumes of data processed and moved between nodes and storage systems. In high-energy physics, such as experiments at CERN, data rates can reach hundreds of gigabits per second, with aggregate peaks between sites currently exceeding 1 Tbps, and predicted to rise to 10 Tbps, during intensive data processing. Healthcare, genomics, and life sciences applications typically operate at rates between 1 Gbps and 40 Gbps.
   These applications require high throughput to handle large datasets efficiently, often through parallel data streams. AI training and related tasks, particularly those involving deep learning, require data rates ranging from 10 Gbps to 100 Gbps to ensure efficient data movement, keeping GPUs and other accelerators fully utilised.

   These varying data rates underscore the high demands of HPC applications, which are expected to grow as the field evolves and datasets become larger.

4.  Current Technologies Used in HP-WANs: Key Components

   High Performance Computing (HPC) networks are specialised networks designed to connect supercomputers and other high-performance computing resources, enabling them to collaborate on computational tasks that require significant processing power, memory, and data storage. These networks are essential for facilitating large-scale scientific research, complex simulations, and data-intensive tasks beyond the capabilities of standard computing systems.

   The following sub-sections outline typical characteristics and requirements for HP-WANs. These technical requirements ensure that wide-area interconnects can meet the demanding needs of distributed HPC environments, enabling researchers and scientists to collaborate effectively across the globe.

4.1.  Topology

   HPC networks can be broadly categorised into intra-site networks, which connect components within a single HPC site, such as a data centre, and inter-site networks, which link multiple HPC sites across different geographical locations. Intra-site networks typically use high-speed, low-latency interconnects like InfiniBand or high-speed Ethernet. In contrast, inter-site networks rely on dedicated high-capacity wide area networks (WANs) to facilitate distributed computing and data sharing on a regional and global scale.
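
   As a rough illustration of why inter-site links need the capacities quoted in Section 3, the following sketch estimates how long a bulk transfer takes at a given sustained rate. It is a back-of-the-envelope calculation only. The dataset size (1 PB, decimal), the function name and the "efficiency" factor are assumptions introduced for this example and do not come from any measured network.

```python
"""Rough transfer-time arithmetic for bulk data movement (illustrative only)."""

PETABYTE = 1e15  # bytes, decimal (SI) definition; an assumption of this sketch


def transfer_time_seconds(dataset_bytes: float, rate_gbps: float,
                          efficiency: float = 1.0) -> float:
    """Return the seconds needed to move dataset_bytes at a sustained rate.

    rate_gbps is the nominal path rate in gigabits per second. efficiency
    (0 < efficiency <= 1) is a hypothetical factor standing in for protocol
    overhead and imperfect link utilisation.
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if rate_gbps <= 0:
        raise ValueError("rate_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (dataset_bytes * 8) / (rate_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    # Rates chosen from the ranges quoted in Section 3.
    for rate in (1, 10, 40, 100):
        hours = transfer_time_seconds(PETABYTE, rate) / 3600
        print(f"1 PB at {rate:>3} Gbps: {hours:7.1f} hours")
```

   Under these assumptions, moving 1 PB takes about 22 hours at a sustained 100 Gbps and roughly 93 days at 1 Gbps, before any allowance for protocol overhead or competing traffic.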

   Each NREN operator, e.g., Jisc in the case of Janet in the UK, builds and operates the NREN infrastructure for its research and education users. This typically takes the form of a well-provisioned backbone, with regional access networks extending to the end sites (campuses, research organisations, etc.). The NREN demarcation is typically at the campus edge. In some countries the regional networks are separately operated.

   NRENs typically interconnect with other NRENs, forming a worldwide research and education (RE) network infrastructure. In Europe, GÉANT provides connectivity between the European NRENs and onward connectivity to the rest of the world. NRENs also have interconnects to non-RE networks, e.g., via one or more national Internet exchanges (IXs), direct peerings with content providers (including the large cloud providers), and "catch-all" commodity connectivity via one or more Tier 1 ISPs.

   Dedicated infrastructure is commonly used in HPC environments where performance, security, and reliability are paramount. In these cases, the network infrastructure is built exclusively for HPC applications, including dedicated fibre-optic connections, private data centres, and specialised network transports such as RDMA over Converged Ethernet (RoCE) and InfiniBand. The primary benefits of dedicated infrastructure are its ability to provide optimised performance for HPC tasks, ensure high levels of security by preventing unauthorised access, and maintain consistent reliability by avoiding congestion or performance issues caused by other network traffic.

   Usually, the responsibility for networking within an end site or campus lies with that organisation, e.g., a university IT department, while the operation of an HPC facility may have dedicated (separate) staff. With the additional administrative domains of the NRENs and inter-NREN backbones like GÉANT, end-to-end traffic may pass through many networks operated by different organisations.
   To achieve optimal end-to-end (e2e) performance, every network along the path needs to implement best practice.

4.2.  Bandwidth and Latency

   The technical requirements for wide area interconnects between HPC sites are stringent, given the unique demands of distributed high-performance computing. High bandwidth is a primary requirement, as these interconnects must support the rapid transfer of large datasets between sites, ensuring that data movement does not become a bottleneck in computational workflows. HPC data flows might typically consume from 1 Gbit/s to beyond 400 Gbit/s.

   Low latency is equally critical for many HPC applications. Latency requirements between data centre (DC) locations are typically in the low-millisecond range. This low latency is essential for applications that require real-time or near-real-time data processing.

4.3.  Data Movement Protocols

   Network-intensive applications like networked storage or cluster computing need a network infrastructure with high bandwidth and low latency. These interconnects may need to support specialised communication protocols designed for HPC environments, such as Remote Direct Memory Access (RDMA), which optimises the performance of distributed HPC applications by reducing overhead and improving data transfer efficiency.

   InfiniBand (IB) is another computer networking communications standard used in high-performance computing that features very high throughput and very low latency. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems.

   The advantages of RDMA and IB over other network application programming interfaces are lower latency, lower CPU load, and higher bandwidth. The downside of these specialised protocols is that all interfaces and nodes on the end-to-end path need to support the technique.

   [Editor's Note - Do we need to discuss iWARP?]

4.4.  Forwarding Optimisation

   The scaling of HPC applications, especially across a WAN between multiple sites, requires the ability to route very large volumes of traffic. Specifically, this requires the network infrastructure to handle the traffic characteristics and forwarding features detailed below.

   *  Low entropy: Compared to traditional data centre workloads, the number and diversity of flows are small, and flow patterns are usually repetitive and predictable.

   *  Burstiness: Flows usually exhibit an "on and off" nature at a time granularity of milliseconds.

   *  Jumbo frames: Ethernet frames larger than the standard maximum transmission unit (MTU) size of 1,500 bytes, typically carrying payloads of up to 9,000 bytes. Using jumbo frames can significantly enhance network efficiency and reduce CPU overhead.

   *  Elephant flows: For each burst, the intensity of each flow could reach up to the line rate of the NICs.

   It should be noted that efficiently handling these elephant flows is crucial in HPC, as they can otherwise saturate network links, leading to congestion and reduced performance for other network traffic. Strategies to manage elephant flows effectively, such as prioritising these flows or segmenting network traffic, help maintain overall network performance and ensure that large data transfers do not hinder the execution of other critical tasks within the HPC environment.

   HPC transport options include UDP and TCP over IP, and emerging mechanisms such as QUIC. Each transport technology has its own strengths and weaknesses. In all cases, the primary goal is to ensure the effective transmission of massive data sets with high throughput, low latency and jitter, and a low packet loss ratio.

4.5.  Reliability

   Reliability and redundancy are essential to prevent data loss and ensure continuous operation, especially given that HPC tasks often run for extended periods and involve critical research.
   These networks must also incorporate advanced security measures, including encryption and secure access controls, to protect the often sensitive or classified data being transmitted.

4.6.  Quality of Service

   The network should support Quality of Service (QoS) mechanisms to prioritise traffic, ensuring that critical HPC tasks receive the necessary bandwidth and low-latency performance.

   Congestion control mechanisms ensure that data transfers between nodes and across networks are efficient and do not overwhelm the HPC network infrastructure. By managing and regulating the flow of data, congestion control mechanisms help prevent bottlenecks, reduce latency, and maintain high throughput, which are essential for the performance and reliability of HPC applications that require the rapid movement of large volumes of data across distributed systems.

   Depending on the transport technology used in the HPC environment, several congestion control schemes may be used:

   *  InfiniBand Congestion Control

   *  RDMA-based Data Center Quantized Congestion Notification (DCQCN)

   *  TCP-based Bottleneck Bandwidth and Round-trip propagation time (BBR), e.g., BBRv3

   *  eXplicit Control Protocol (XCP)

4.7.  Performance Monitoring

   End-to-end performance measurement and monitoring across multiple domains and network infrastructures are important in HPC environments. They provide a method to diagnose and troubleshoot network performance issues that can affect data-intensive applications and distributed computing tasks commonly found in HPC.

   PerfSONAR is a commonly used network measurement toolkit. It is designed to provide federated coverage of network paths. It provides an interface that allows for the scheduling of measurements, the storage of data, and the generation of visualisations.

4.8.  Scalability

   Scalability is another crucial aspect, allowing the network to expand efficiently as computational needs grow, accommodating additional sites or increased capacity without significant reconfiguration. Interoperability is also necessary, ensuring that the network can communicate seamlessly across the different types of hardware, software, and protocols used at various HPC sites.

4.9.  Resource Scheduling

   [Editor's Note - Do we need to discuss service and resource scheduling?]

5.  Examples of HP-WANs

   The following sub-sections highlight examples of HP-WANs and their technical specifications.

5.1.  GÉANT

   The GÉANT network is a pan-European data network dedicated to research and education, providing high-speed, high-capacity connectivity across Europe, between European NRENs and to other worldwide NRENs. It is an essential infrastructure for HPC applications, enabling collaboration and data sharing among research institutions, universities, and HPC centres across the continent and beyond.

   The core of GÉANT operates at speeds of up to 600 Gbps, using Dense Wavelength Division Multiplexing (DWDM) technology. This provides connectivity suitable for HPC applications, particularly those involving large-scale simulations, scientific research, and real-time data processing.

   Reliability is provided by using multiple optical underlay paths for data to travel between GÉANT nodes. This design ensures high availability and reliability, which is crucial for the continuous operation of HPC environments.

   The GÉANT network integrates PerfSONAR for real-time network performance monitoring, allowing HPC users to detect and troubleshoot potential issues that could impact data transfer and overall performance. This ensures that the high-performance requirements of HPC applications are met consistently across the network.
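
   To illustrate why continuous loss monitoring of this kind matters on long paths, the sketch below applies the widely cited Mathis et al. approximation, which bounds the throughput of a single loss-based (Reno-style) TCP flow by (MSS / RTT) * (C / sqrt(p)). This model is not referenced by this document and is used here only as an assumption. It does not describe BBR or RDMA-based transports. The RTT, loss rate and MSS values are hypothetical, and the MSS values assume 40 bytes of IPv4 and TCP headers with no TCP options.

```python
"""Illustrative use of the Mathis et al. TCP throughput approximation."""
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate upper bound, in bit/s, for one loss-based TCP flow.

    mss_bytes: maximum segment size in bytes
    rtt_s:     round-trip time in seconds
    loss_rate: packet loss probability, 0 < loss_rate <= 1
    c:         model constant; sqrt(3/2) is the commonly quoted value
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    rtt = 0.100   # hypothetical 100 ms intercontinental round-trip time
    loss = 1e-4   # hypothetical 0.01% packet loss
    # 1460 bytes assumes a 1500-byte MTU; 8960 bytes assumes a 9000-byte MTU.
    for label, mss in (("1500-byte MTU", 1460), ("9000-byte MTU", 8960)):
        mbps = mathis_throughput_bps(mss, rtt, loss) / 1e6
        print(f"{label}: about {mbps:.1f} Mbit/s per flow")
```

   With these assumed values, the bound is roughly 14 Mbit/s per flow with a 1460-byte MSS and roughly 88 Mbit/s with an 8960-byte MSS. Under this model, even very small loss rates dominate achievable throughput on long-RTT paths, which is why small amounts of loss are worth detecting and locating.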

   GÉANT provides specialised services for specific HPC projects, such as the LHC Optical Private Network (LHCOPN) and LHC Open Network Environment (LHCONE), which are critical for supporting the data-intensive needs of the Large Hadron Collider (LHC) at CERN. These services offer dedicated, high-bandwidth connections that are optimised for the massive data flows generated by LHC experiments.

   The GÉANT network connects over 50 million users across more than 10,000 institutions in 40 countries. This extensive reach supports a wide range of HPC applications by enabling seamless collaboration between geographically dispersed research facilities.

   Beyond Europe, GÉANT connects to other major research and education networks, including Internet2 in the United States and CANARIE in Canada, allowing for global HPC collaborations and data exchanges.

5.2.  Janet

   The Janet network is the UK NREN, operated by Jisc. It was first established in 1984. Backbone links now run at up to 800 Gbps, with a growing number of sites connected at 100 Gbps, in some cases with multiple 100 Gbps links. A typical university site will have multiple 10 Gbps links.

   Janet connects to other RE networks via a resilient 400 Gbps link to GÉANT. It has a presence in multiple IXs, predominantly LINX, connects or peers directly with many content and cloud providers, and has commodity connectivity via Tier 1 ISPs. The total aggregate external capacity is around 4-5 Tbit/s.

   Some private, dedicated optical links are used by Janet sites, e.g., the CERN to RAL (UK Tier 1 site) LHCOPN link, which is a 200 Gbps path.

5.3.  Energy Sciences Network

   TBA

5.4.  Internet2

   TBA

5.5.  CANARIE

   TBA

5.6.  Asia-Pacific Advanced Network

   TBA

6.  Emerging Trends and Future Directions

   To be discussed.

7.  IANA Considerations

   This document makes no requests for action by IANA.

8.  Security Considerations

   The security requirements for HPC networks, particularly in inter-data-centre scenarios, are crucial to ensuring the integrity, confidentiality, and availability of sensitive data and computational resources. These requirements are stringent due to the high-value and often sensitive nature of the data processed within HPC systems, such as research data in fields like national defence, pharmaceuticals, and climate science.

9.  Acknowledgements

   This document was in part motivated by the discussion occurring on the IETF hp-wan@ietf.org mailing list.

10.  Normative References

11.  Informative References

Authors' Addresses

   Daniel King
   Lancaster University
   Email: d.king@lancaster.ac.uk

   Tim Chown
   Jisc
   Email: tim.chown@jisc.ac.uk

   Chris Rapier
   Pittsburgh Supercomputing Center
   Email: rapier@psc.edu

   Daniel Huang
   ZTE Corporation
   Email: huang.guangping@zte.com.cn