Network Working Group                                            D. King
Internet-Draft                                      Lancaster University
Intended status: Informational                                  T. Chown
Expires: 1 March 2025                                               Jisc
                                                               C. Rapier
                                        Pittsburgh Supercomputing Center
                                                                D. Huang
                                                         ZTE Corporation
                                                          28 August 2024


    Current State of the Art for High Performance Wide Area Networks
                   draft-kcrh-state-of-art-hp-wan-00

Abstract

   High Performance Wide Area Networks (HP-WANs) represent a critical
   infrastructure for the modern global research and education
   community, facilitating collaboration across national and
   international boundaries.  These networks, such as Janet, ESnet,
   GÉANT, Internet2, CANARIE, and others, are designed to support not
   only the general needs of the research and education users they
   serve but also the transmission of vast amounts of data generated by
   scientific research, high-performance computing, distributed AI
   training, and large-scale simulations.

   This document provides an overview of the terminology and techniques
   used for existing HP-WANs.  It also explores the technological
   advancements, operational tools, and future directions for HP-WANs,
   emphasising their role in enabling cutting-edge scientific research,
   big data analysis, AI training and massive industrial data analysis.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 1 March 2025.


King, et al.              Expires 1 March 2025                  [Page 1]

Internet-Draft             HP-WAN STATE OF ART               August 2024


Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Background  . . . . . . . . . . . . . . . . . . . . . . .   4
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  Example Use Cases for HP-WANs . . . . . . . . . . . . . . . .   5
   4.  Current Technologies Used in HP-WANs: Key Components  . . . .   6
     4.1.  Topology  . . . . . . . . . . . . . . . . . . . . . . . .   7
     4.2.  Bandwidth and Latency . . . . . . . . . . . . . . . . . .   8
     4.3.  Data Movement Protocols . . . . . . . . . . . . . . . . .   8
     4.4.  Forwarding Optimisation . . . . . . . . . . . . . . . . .   8
     4.5.  Reliability . . . . . . . . . . . . . . . . . . . . . . .   9
     4.6.  Quality of Service  . . . . . . . . . . . . . . . . . . .   9
     4.7.  Performance Monitoring  . . . . . . . . . . . . . . . . .  10
     4.8.  Scalability . . . . . . . . . . . . . . . . . . . . . . .  10
     4.9.  Resource Scheduling . . . . . . . . . . . . . . . . . . .  10
   5.  Examples of HP-WANs . . . . . . . . . . . . . . . . . . . . .  10
     5.1.  GÉANT . . . . . . . . . . . . . . . . . . . . . . . . . .  11
     5.2.  Janet . . . . . . . . . . . . . . . . . . . . . . . . . .  11
     5.3.  Energy Sciences Network . . . . . . . . . . . . . . . . .  12
     5.4.  Internet2 . . . . . . . . . . . . . . . . . . . . . . . .  12
     5.5.  CANARIE . . . . . . . . . . . . . . . . . . . . . . . . .  12
     5.6.  Asia-Pacific Advanced Network . . . . . . . . . . . . . .  12
   6.  Emerging Trends and Future Directions . . . . . . . . . . . .  12
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  12
   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  12
   9.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  12
   10. Normative References  . . . . . . . . . . . . . . . . . . . .  13
   11. Informative References  . . . . . . . . . . . . . . . . . . .  13
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  13


1.  Introduction

   High Performance Wide Area Networks (HP-WANs) are the backbone of
   global research and education infrastructure, enabling the seamless
   transfer of vast amounts of data and supporting advanced scientific
   collaborations worldwide.  These networks are designed to meet the
   demanding requirements of data-intensive research fields, including
   high-energy physics, climate modeling, genomics, and artificial
   intelligence.

   The evolution of HP-WANs is deeply intertwined with the growing need
   for advanced scientific research and the increasing globalisation of
   collaboration.  Traditional WANs, which were sufficient for general
   business and communication needs, quickly became inadequate for the
   specialised requirements of research institutions.  As scientific
   endeavours began to generate larger datasets, ranging from terabytes
   to petabytes, there arose a need for networks capable of transferring
   these massive volumes of data reliably and securely across large
   distances.

   The first HP-WANs emerged as specialised research networks, such as
   ESnet in the United States, Janet in the UK, and GÉANT in Europe,
   developed to support the unique needs of the scientific community.
   These networks were designed to provide high bandwidth and ensure low
   latency, high reliability, and robust security, which are critical
   for applications like real-time data analysis, distributed computing,
   and remote instrumentation.

   Today, HP-WANs are foundational to the research community and are
   leading the way in demonstrating how advanced networking technologies
   can be applied to other sectors.  They serve as testbeds for
   innovations in networking that eventually trickle down to broader
   commercial applications.  As we look toward the future, HP-WANs will
   continue to play a critical role in enabling scientific discoveries
   and fostering international collaboration, particularly as emerging
   technologies such as quantum computing and the Internet of Things
   (IoT) push the boundaries of what these networks must support.

   This document explores the current state of the art in HP-WANs,
   examining the technological advancements, operational challenges, and
   emerging trends shaping the future of networks built for research,
   education, massive data analysis and collaborative AI training at
   scale and speed.  Through this exploration, we aim to provide a
   better understanding of how high performance computing is supported
   across wide area networks.


1.1.  Background

   [Editor's note - to add a historical development of HP-WANs
   description.]

   [Editor's note - to add description of the role of HP-WANs in
   supporting scientific research and education.]

2.  Terminology

   This section provides a lexicon of terminology relating to high
   performance WANs.

   CERN:  The European Organization for Nuclear Research, housing the
      Large Hadron Collider (LHC).

   High Performance Computing (HPC):  A general term for computing
      with a high level of performance.  Often, high performance
      computing refers specifically to running highly parallel jobs,
      frequently on hundreds or even thousands of cores.

   High Performance Wide Area Network (HP-WAN):  A type of Wide Area
      Network (WAN) designed specifically to meet the high-speed, low-
      latency, and high-capacity needs of scientific research,
      education, and data-intensive applications.  These networks
      connect research institutions, universities, and data centers
      across large geographical areas.

   InfiniBand:  Traditionally, a localised data interconnect used by
      many high performance computing (HPC) systems providing high
      bandwidth and low latency.

   National Research and Education Network (NREN):  A specialised
      network supporting the research and education community within a
      specific country or region.  NRENs provide high-speed connectivity
      and other services tailored to the needs of academic and research
      institutions.

   Remote direct memory access (RDMA):  Enables one networked node to
      access another networked node's memory without involving either
      computer's operating system or interrupting either node's
      processing.  This helps minimise latency and maximise throughput,
      reducing memory bandwidth bottlenecks.

   RDMA over Converged Ethernet (RoCE):  Traditionally, a network
      protocol which allows remote direct memory access (RDMA) over a
      local Ethernet network.  There are multiple RoCE versions.  RoCE
      v1 is an Ethernet link layer protocol and hence allows
      communication between any two hosts in the same Ethernet broadcast
      domain.  RoCE v2 is an internet layer protocol which means that
      RoCE v2 packets can be routed.

   Worldwide LHC Computing Grid (WLCG):  A global network of over 170
      computing centres across more than 40 countries, designed to
      process, store, and analyse the vast amounts of data generated by
      the Large Hadron Collider (LHC) at CERN.

   Performance Service Oriented Network monitoring Architecture
   (PerfSONAR):  A network performance monitoring toolkit designed to
      provide end-to-end performance measurement and monitoring across
      multi-domain network infrastructures.

   Science DMZ:  A model for deployment of infrastructure at a site
      (campus) to optimise the performance of data transfers in and out
      of data transfer nodes (DTNs) at the site – see
      https://fasterdata.es.net/science-dmz/. Elements of the model
      include the local network architecture, tuning of DTNs, choice of
      data transfer software, efficient security policy implementation
      and persistent monitoring.

 
3.  Example Use Cases for HP-WANs

   HP-WAN applications have become synonymous with large scale research
   and experimentation, big data, and AI.  HPC, and therefore HP-WAN, is
   driving continuous innovation in use cases across the following
   industries.

   *  High-Energy Physics Research, e.g., the Large Hadron Collider
      (LHC)

   *  Climate Modeling

   *  Radio astronomy, e.g., the Square Kilometre Array (SKA) project

   *  Healthcare, Genomics and Life Sciences

   *  AI training

   *  Media Content Creation

   *  Government and Defence


   The data rates required by HPC applications vary significantly based
   on the application type and data scale.

   Scientific simulations, such as climate modeling and molecular
   dynamics, typically demand data rates from 10 Gbps to over 100 Gbps
   due to the large volumes of data processed and moved between nodes
   and storage systems.

   In high-energy physics, such as experiments at CERN, data rates can
   reach hundreds of gigabits per second, with aggregate peaks between
   sites currently exceeding 1 Tbps during intensive data processing
   and predicted to rise to 10 Tbps.

   Healthcare, Genomics, and Life Sciences might typically operate at
   rates between 1 Gbps and 40 Gbps.  These applications require high
   throughput to handle large datasets efficiently, often through
   parallel data streams.

   AI training tasks, particularly those involving deep learning,
   require data rates ranging from 10 Gbps to 100 Gbps to ensure
   efficient data movement, keeping GPUs and other accelerators fully
   utilised.

   These varying data rates underscore the high demands of HPC
   applications, which are expected to grow as the field evolves and
   datasets become larger.
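
   As a rough illustration of what these rates mean in practice, the
   following Python sketch estimates how long a bulk transfer takes at
   a given sustained rate.  The function name, the 1 PB example
   dataset, and the "efficiency" parameter are assumptions made purely
   for illustration; they are not taken from any HP-WAN tool.

```python
def transfer_time_seconds(dataset_bytes: float, rate_bps: float,
                          efficiency: float = 1.0) -> float:
    """Return the time in seconds to move dataset_bytes at rate_bps.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the
    nominal rate that the transfer actually sustains.
    """
    if rate_bps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be positive and efficiency in (0, 1]")
    return (dataset_bytes * 8) / (rate_bps * efficiency)


PETABYTE = 10**15  # decimal petabyte, in bytes
GBPS = 10**9       # one gigabit per second, in bits per second

if __name__ == "__main__":
    # Rates chosen to span the ranges quoted in this section.
    for rate in (1, 10, 40, 100):
        hours = transfer_time_seconds(PETABYTE, rate * GBPS) / 3600
        print(f"1 PB at {rate:>3} Gbps: {hours:,.1f} hours")
```

   Under these assumptions, moving 1 PB takes roughly 22 hours at a
   sustained 100 Gbps and more than 90 days at 1 Gbps, which is why
   sustained (rather than peak) throughput matters for these use cases.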

4.  Current Technologies Used in HP-WANs: Key Components

   High Performance Computing (HPC) networks are specialised networks
   designed to connect supercomputers and other high-performance
   computing resources, enabling them to collaborate on computational
   tasks that require significant processing power, memory, and data
   storage.  These networks are essential for facilitating large-scale
   scientific research, complex simulations, and data-intensive tasks
   beyond standard computing systems' capabilities.

   The following sub-sections outline typical characteristics and
   requirements for HP-WANs.  These technical requirements ensure that
   wide-area interconnects can meet the demanding needs of distributed
   HPC environments, enabling researchers and scientists to collaborate
   effectively across the globe.


4.1.  Topology

   HPC networks can be broadly categorised into intra-site networks,
   which connect components within a single HPC site, such as a data
   centre, and inter-site networks, which link multiple HPC sites across
   different geographical locations.  Intra-site networks typically use
   high-speed, low-latency interconnects like InfiniBand or high-speed
   Ethernet.  In contrast, inter-site networks rely on dedicated high-
   capacity wide area networks (WANs) to facilitate distributed
   computing and data sharing on a regional and global scale.

   Each NREN operator, e.g., Jisc in the case of Janet in the UK, will
   build and operate the NREN infrastructure for its research and
   education users.  This may typically take the form of a well-
   provisioned backbone, with regional access networks extending to the
   end sites (campuses, research organisations, etc).  The NREN
   demarcation is typically at the campus edge.  In some countries the
   regional networks are separately operated.

   The NRENs then typically have interconnects to other NRENs, forming a
   worldwide RE network infrastructure.  In Europe, GÉANT provides
   connectivity between the European NRENs and then wider connectivity
   to the rest of the world.  NRENs also have other interconnects to
   non-RE networks, e.g., via one or more national IXs, direct peerings
   to content providers (including the big cloud providers), and
   "catch-all" commodity connectivity via one or more Tier 1 ISPs.

   Dedicated infrastructure is commonly used in HPC environments where
   performance, security, and reliability are paramount.  In these
   cases, the network infrastructure is built exclusively for HPC
   applications, including dedicated fibre-optic connections, private
   data centres, and specialised network transport like RDMA over
   Converged Ethernet (RoCE) and InfiniBand nodes.  The primary benefits
   of dedicated infrastructure are its ability to provide optimised
   performance for HPC tasks, ensure high levels of security by
   preventing unauthorised access, and maintain consistent reliability
   by avoiding congestion or performance issues caused by other network
   traffic.

   Usually, the responsibility for networking within an end site or
   campus lies with that organisation, e.g., a university IT department,
   while the operation of an HPC facility may have dedicated (separate)
   staff.  With the additional administrative domains of the NRENs and
   inter-NREN backbones like GÉANT, end-to-end traffic may pass through
   many networks operated by different organisations.  To achieve
   optimal end-to-end performance, every operator along the path needs
   to implement best practice.


4.2.  Bandwidth and Latency

   The technical requirements for wide area interconnects between HPC
   sites are stringent, given the unique demands of distributed high-
   performance computing.  High bandwidth is a primary requirement, as
   these interconnects must support the rapid transfer of large datasets
   between sites, ensuring that data movement does not become a
   bottleneck in computational workflows.  HPC data flows might
   typically consume from 1 Gbit/s to beyond 400 Gbit/s.

   Low latency is equally critical for many HPC applications.  Latency
   requirements for inter-DC locations will be in the low-millisecond
   range.  This low latency is essential for applications that require
   real-time or near-real-time data processing.
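
   Bandwidth and latency interact through the bandwidth-delay product
   (BDP), the amount of data that must be in flight to keep a path
   full.  The following Python sketch computes it; the function name
   and the example rates and round-trip times (RTTs) are hypothetical
   values chosen only for illustration.

```python
def bandwidth_delay_product_bytes(rate_bps: float,
                                  rtt_seconds: float) -> float:
    """Bytes that must be in flight to fill a path of this rate and RTT."""
    if rate_bps < 0 or rtt_seconds < 0:
        raise ValueError("rate and RTT must be non-negative")
    return rate_bps * rtt_seconds / 8


if __name__ == "__main__":
    # Hypothetical paths: (label, rate in bits per second, RTT in seconds).
    examples = [
        ("10 Gbps, 5 ms RTT", 10e9, 0.005),
        ("100 Gbps, 20 ms RTT", 100e9, 0.020),
        ("400 Gbps, 150 ms RTT", 400e9, 0.150),
    ]
    for label, rate, rtt in examples:
        megabytes = bandwidth_delay_product_bytes(rate, rtt) / 1e6
        print(f"{label}: BDP is about {megabytes:,.1f} MB")
```

   For example, a 100 Gbps path with a 20 ms RTT has a BDP of about
   250 MB.  A single TCP flow generally needs send and receive buffers
   of at least this size to fill such a path, which is one reason host
   tuning matters as much as link capacity.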

4.3.  Data Movement Protocols

   Network-intensive applications like networked storage or cluster
   computing need a network infrastructure with high bandwidth and low
   latency.

   These interconnects may need to support specialised communication
   protocols designed for HPC environments, such as Remote Direct Memory
   Access (RDMA), which optimises the performance of distributed HPC
   applications by reducing overhead and improving data transfer
   efficiency.

   InfiniBand (IB) is another computer networking communications
   standard used in high-performance computing that features very high
   throughput and very low latency.  InfiniBand is also used as either a
   direct or switched interconnect between servers and storage systems,
   as well as an interconnect between storage systems.

   The advantages of RDMA and IB over other network application
   programming interfaces are lower latency, lower CPU load, and higher
   bandwidth.  The downside of these specialised protocols is the need
   for all interfaces and nodes on the end-to-end path to support the
   technique.

   [Editor's note - Do we need to discuss iWARP?]

4.4.  Forwarding Optimisation

   The scaling of HPC applications, especially across a WAN between
   multiple sites, requires the ability to route massive volumes of
   traffic.  Specifically, the network infrastructure must accommodate
   several traffic and forwarding characteristics, detailed below.


   *  Low entropy: Compared to traditional data center workloads, the
      number and diversity of flows are low, and flow patterns are
      usually repetitive and predictable.

   *  Burstiness: Flows usually exhibit an "on and off" nature at a
      time granularity of milliseconds.

   *  Jumbo frames: Ethernet frames larger than the standard maximum
      transmission unit (MTU) size of 1,500 bytes, typically carrying
      payloads of up to 9,000 bytes.  Using jumbo frames can
      significantly enhance network efficiency and reduce CPU overhead.

   *  Elephant flows: For each burst, the intensity of each flow could
      reach up to the line rate of NICs.

   It should be noted that efficiently handling these elephant flows is
   crucial in HPC as they can otherwise saturate network links, leading
   to congestion and reduced performance for other network traffic.
   Strategies to manage elephant flows effectively, such as prioritising
   these flows or segmenting network traffic, help maintain overall
   network performance and ensure that large data transfers do not
   hinder the execution of other critical tasks within the HPC
   environment.
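
   The efficiency gain from jumbo frames noted in the list above can be
   estimated with simple arithmetic.  The following Python sketch
   compares the frame rate and payload efficiency of 1,500-byte and
   9,000-byte MTUs at line rate.  The overhead constants assume
   untagged Ethernet carrying TCP over IPv4 without options, and the
   100 Gbps rate is an example value; all names are illustrative.

```python
# Per-frame Ethernet overhead in bytes that is not part of the MTU:
# preamble and start delimiter (8), header (14), frame check
# sequence (4), and inter-frame gap (12).  No VLAN tag is assumed.
ETHERNET_OVERHEAD = 38
# IPv4 (20) plus TCP (20) header bytes, assuming no options.
IP_TCP_HEADERS = 40


def frames_per_second(rate_bps: float, mtu: int) -> float:
    """Full-size frames per second needed to fill a link at rate_bps."""
    return rate_bps / ((mtu + ETHERNET_OVERHEAD) * 8)


def payload_efficiency(mtu: int) -> float:
    """Fraction of on-the-wire bytes that carry TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        fps = frames_per_second(100e9, mtu)
        eff = payload_efficiency(mtu)
        print(f"MTU {mtu}: {fps / 1e6:.2f} M frames/s, "
              f"{eff:.1%} payload efficiency")
```

   Under these assumptions, filling a 100 Gbps link takes about 8.1
   million frames per second with a 1,500-byte MTU but only about 1.4
   million with a 9,000-byte MTU.  The reduction in per-frame
   processing is where most of the CPU saving comes from; the gain in
   payload efficiency (roughly 94.9% to 99.1%) is comparatively small.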

   HPC transport options include IP (both UDP and TCP), and emerging
   mechanisms such as QUIC.  However, each transport technology has its
   own strengths and weaknesses.  In all cases, the primary goal is to
   ensure the effective transmission of massive data sets with high
   throughput, low latency and jitter, and a low packet loss ratio.
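
   The importance of a low packet loss ratio on long paths can be
   illustrated with the widely cited approximation by Mathis et al. for
   the steady-state throughput of a single loss-based (Reno-style) TCP
   flow: rate = (MSS / RTT) * (C / sqrt(p)), where p is the loss ratio.
   The following Python sketch is illustrative only.  The constant
   C = sqrt(3/2), the MSS, and the example RTT and loss values are
   assumptions, and the model does not describe other congestion
   control algorithms such as BBR.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_seconds: float,
                          loss_ratio: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate upper bound on one Reno-style TCP flow, in bits/s."""
    if mss_bytes <= 0 or rtt_seconds <= 0 or not 0 < loss_ratio <= 1:
        raise ValueError("MSS and RTT must be positive, loss in (0, 1]")
    return (mss_bytes * 8 / rtt_seconds) * (c / math.sqrt(loss_ratio))


if __name__ == "__main__":
    mss = 1460    # bytes; assumes a 1,500-byte MTU
    loss = 1e-4   # hypothetical loss ratio of 0.01%
    for rtt_ms in (1, 10, 100):
        rate = mathis_throughput_bps(mss, rtt_ms / 1000, loss)
        print(f"RTT {rtt_ms:>3} ms: about {rate / 1e6:,.0f} Mbps")
```

   Under these assumptions, a loss ratio of 0.01% limits a single flow
   to roughly 1.4 Gbps on a 1 ms path but only about 14 Mbps on a
   100 ms path.  Loss that is barely noticeable within a campus can
   therefore be crippling for an intercontinental transfer.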

4.5.  Reliability

   Reliability and redundancy are essential to prevent data loss and
   ensure continuous operation, especially given that HPC tasks often
   run for extended periods and involve critical research.  These
   networks must also incorporate advanced security measures, including
   encryption and secure access controls, to protect the often sensitive
   or classified data being transmitted.

4.6.  Quality of Service

   The network should support Quality of Service (QoS) mechanisms to
   prioritise traffic, ensuring that critical HPC tasks receive the
   necessary bandwidth and low-latency performance.
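
   One common building block for such prioritisation is marking
   packets with a Differentiated Services Code Point (DSCP) so that
   networks along the path can classify them.  The following Python
   sketch marks a UDP socket using the standard IP_TOS socket option.
   The helper names and the choice of DSCP 46 (Expedited Forwarding)
   are assumptions for illustration, and the availability and effect
   of IP_TOS depend on the host operating system.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper bits of the 8-bit TOS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2  # the low two bits are reserved for ECN


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Return an IPv4 UDP socket whose packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    EXAMPLE_DSCP = 46  # hypothetical choice, not a recommendation
    sock = open_marked_udp_socket(EXAMPLE_DSCP)
    try:
        tos = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
        print(f"TOS byte requested for DSCP {EXAMPLE_DSCP}: {tos}")
    finally:
        sock.close()
```

   Marking only expresses a request.  Whether the mark is preserved
   and acted upon depends on the policy of every network on the
   end-to-end path, and marks are often reset at domain boundaries.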

   Congestion control mechanisms ensure that data transfers between
   nodes and across networks are efficient and do not overwhelm the HPC
   network infrastructure.  By managing and regulating the flow of data,
   congestion control mechanisms help prevent bottlenecks, reduce
   latency, and maintain high throughput, which are essential for the
   performance and reliability of HPC applications that require the
   rapid movement of large volumes of data across distributed systems.

   Depending on the transport technology used in the HPC environment,
   several congestion control schemes may be used:

   *  InfiniBand Congestion Control

   *  RDMA-based Data Center Quantized Congestion Notification (DCQCN)

   *  TCP-based Bottleneck Bandwidth and Round-Trip Time (BBRv3)

   *  eXplicit Control Protocol (XCP)

4.7.  Performance Monitoring

   End-to-end performance measurement and monitoring across multiple
   domains and network infrastructures are important in HPC
   environments.  They provide a method to diagnose and troubleshoot
   network performance issues that can affect data-intensive
   applications and distributed computing tasks commonly found in HPC.

   PerfSONAR is a commonly used network measurement toolkit.  It is
   designed to provide federated coverage of network paths.  It provides
   an interface that allows for the scheduling of measurements, the
   storage of data, and the generation of visualisations.

4.8.  Scalability

   Scalability is another crucial aspect, allowing the network to expand
   efficiently as computational needs grow, accommodating additional
   sites or increased capacity without significant reconfiguration.
   Interoperability is also necessary, ensuring that the network can
   communicate seamlessly across different types of hardware, software,
   and protocols used at various HPC sites.

4.9.  Resource Scheduling

   [Editor's Note - Do we need to discuss service and resource
   scheduling?]

5.  Examples of HP-WANs

   The following sub-sections highlight examples of HP-WANs and their
   technical specifications.


5.1.  GÉANT

   The GÉANT network is a pan-European data network dedicated to
   research and education, providing high-speed, high-capacity
   connectivity across Europe, between European NRENs and to other
   worldwide NRENs.  It is an essential infrastructure for HPC
   applications, enabling collaboration and data sharing among research
   institutions, universities, and HPC centers across the continent and
   beyond.

   The core of GÉANT operates at speeds of up to 600 Gbps, using Dense
   Wavelength Division Multiplexing (DWDM) technology.  This provides
   connectivity suitable for HPC applications, particularly those
   involving large-scale simulations, scientific research, and real-time
   data processing.  Reliability is provided by using multiple optical
   underlay paths for data to travel between GÉANT nodes.  This design
   ensures high availability and reliability, which is crucial for the
   continuous operation of HPC environments.
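
   The benefit of multiple underlay paths can be approximated with a
   simple availability calculation.  The following Python sketch
   assumes that paths fail independently, which is optimistic when
   fibres share ducts or landing points.  The function name and the
   99.9% per-path figure are hypothetical and are not GÉANT
   measurements.

```python
from typing import Iterable


def parallel_availability(path_availabilities: Iterable[float]) -> float:
    """Availability of a connection that works if any one path is up.

    Assumes path failures are statistically independent.
    """
    all_down = 1.0
    for availability in path_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - availability
    return 1.0 - all_down


if __name__ == "__main__":
    per_path = 0.999  # hypothetical availability of a single path
    for paths in (1, 2, 3):
        combined = parallel_availability([per_path] * paths)
        print(f"{paths} path(s): {combined:.9f}")
```

   Under the independence assumption, two paths that are each
   available 99.9% of the time yield a combined availability of about
   99.9999%.  Real deployments fall short of this figure to the extent
   that the paths share physical infrastructure.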

   The GÉANT network integrates PerfSONAR for real-time network
   performance monitoring, allowing HPC users to detect and troubleshoot
   potential issues that could impact data transfer and overall
   performance.  This ensures that the high-performance requirements of
   HPC applications are met consistently across the network.

   GÉANT provides specialised services for specific HPC projects, such
   as the LHC Optical Private Network (LHCOPN) and LHC Open Network
   Environment (LHCONE), which are critical for supporting the data-
   intensive needs of the Large Hadron Collider (LHC) at CERN.  These
   services offer dedicated, high-bandwidth connections that are
   optimised for the massive data flows generated by LHC experiments.

   The GÉANT network connects over 50 million users across more than
   10,000 institutions in 40 countries.  This extensive reach supports a
   wide range of HPC applications by enabling seamless collaboration
   between geographically dispersed research facilities.  Beyond Europe,
   GÉANT connects to other major research and education networks,
   including Internet2 in the United States and CANARIE in Canada,
   allowing for global HPC collaborations and data exchanges.

5.2.  Janet

   The Janet network is the UK NREN, operated by Jisc.  First
   established in 1984, Janet now has backbone links running at up to
   800 Gbps, with a growing number of sites connected at 100 Gbps, in
   some cases with multiple 100G links.  A typical university site will
   have multiple 10G links.


   Janet connects to other RE networks via a 400G resilient link to
   GÉANT.  It has a presence in multiple IXs, predominantly LINX,
   connects/peers directly to many content and cloud providers, and has
   commodity connectivity via Tier 1 ISPs.  The total aggregate external
   capacity is around 4-5 Tbit/s.

   Some private, dedicated optical links are used by Janet sites, e.g.,
   the CERN to RAL (UK Tier 1 site) LHCOPN link, which is a 200G path.

5.3.  Energy Sciences Network

   TBA

5.4.  Internet2

   TBA

5.5.  CANARIE

   TBA

5.6.  Asia-Pacific Advanced Network

   TBA

6.  Emerging Trends and Future Directions

   To be discussed.

7.  IANA Considerations

   This document makes no requests for action by IANA.

8.  Security Considerations

   The security requirements for HPC networks, particularly in inter-
   data center scenarios, are crucial to ensuring the integrity,
   confidentiality, and availability of sensitive data and computational
   resources.  These requirements are stringent due to the high-value
   and often sensitive nature of the data processed within HPC systems,
   such as research data in fields like national defense,
   pharmaceuticals, and climate science.

9.  Acknowledgements

   This document was in part motivated by the discussion occurring on
   the IETF hp-wan@ietf.org mailing list.


10.  Normative References

11.  Informative References

Authors' Addresses

   Daniel King
   Lancaster University
   Email: d.king@lancaster.ac.uk


   Tim Chown
   Jisc
   Email: tim.chown@jisc.ac.uk


   Chris Rapier
   Pittsburgh Supercomputing Center
   Email: rapier@psc.edu


   Daniel Huang
   ZTE Corporation
   Email: huang.guangping@zte.com.cn

