Internet-Draft | Network Performance Digital Twin | October 2023 |
Cabellos & Janz | Expires 25 April 2024 | [Page] |
This draft introduces the concept of a Network Digital Twin (NDT), including its architecture and interfaces. Two specific instances of the NDT are then introduced. The first, for packet networks, produces performance estimates (delay, jitter, loss) for a packet network with a specified topology, traffic demand, and routing and scheduling configuration. The second, for optical networks, produces transmission performance estimates for an optical network with specified optical service topologies and network equipment types, topology and status.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 25 April 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
A Digital Twin for computer networks is a virtual replica of an existing network with a behavior equivalent to that of the real one. The key advantage of a Network Digital Twin (NDT) is the ability to recreate the complexities and particularities of the network infrastructure without the deployment cost of a real network. Hence, network administrators can test, deploy and modify network configurations safely, without worrying about the impact on the real network. Once the administrator has found a configuration that fulfills the expected objectives, it can be deployed to the real network. The information provided by the NDT can also be used as part of a closed loop-based automated process. In addition, using an NDT is faster, safer and more cost-effective than interacting with the physical network. All these characteristics make the NDT useful for different network management tasks, ranging from network planning and troubleshooting to optimization.¶
The concept of an NDT has been proposed in different contexts: 5G networks [digital-twin-5G], vehicular networks [digital-twin-vanets], artificial intelligence [digital-twin-AI], and Industry 4.0 [digital-twin-industry], among others.¶
This draft introduces the Network Digital Twin (NDT), defining an NDT architecture as well as its interfaces and its relationship with other network components.¶
The usefulness of this architecture is tested by introducing two particular instances of the general NDT paradigm. The first is the Packet Network Digital Twin (PNDT) which, given a set of input parameters (topology, traffic matrix, etc.), predicts network performance metrics such as delay (per path or per link), jitter, or loss. The second is the Optical Network Digital Twin (ONDT) which, given specified optical service topologies and network equipment types, topology and status, estimates transmission performance metrics, such as optical channel terminal powers and margins, in the face of channel transmission noise and impairments [ONDT].¶
For both instances of the NDT, the draft defines its associated interfaces to other modules in the network management or control plane as well as specific use-cases.¶
This draft further discusses possible implementation options for the NDT, with a special emphasis on those based on Machine Learning. The aim of Section 8 (Implementation Challenges) is, in part, to describe the advantages and limitations of these techniques. For example, most Machine Learning technologies rely heavily on large amounts of data to achieve acceptable accuracy. Other considerations include adjusting the architecture of the Neural Network to successfully understand the structure of the input data. Challenges particular to ONDT implementation are also discussed.¶
Figure 1 presents an overview of the generic architecture of a Network Digital Twin (NDT).¶
                       |Administrator Interfaces,
                       |Service Demand Interfaces,
                       |Intent-Based Interfaces,
                       |Associated Application Interfaces, etc.
                       |
   +-------------------------------------------+
   |                                           |
   |                                           |
   |                                           |         +-------------+
   |                                           |   DT    |             |
   |                                           |Interface|             |
   |             Management Plane              |<------->|   Network   |
   |                                           |         |   Digital   |
   |          + Embedded Applications          |-------->|    Twin     |
   |                                           | Network |             |
   |                                           | Inform. +-------------+
   |                                           |Interface
   |                                           |
   +-------------------------------------------+
           |                         |
Measurement|                         | Configuration
Interface  |                         | Interface
           |                         |
      +--------------------------------------+
      |                                      |
      |           Physical Network           |
      |                                      |
      +--------------------------------------+¶
Individual elements are discussed further in following sections of this draft. But overall, the NDT represented in Figure 1 is an information-oriented "utility". It works in concert with a Management Plane and the latter's "embedded" applications or components (i.e. applications or components comprising a part of the Management Plane implementation), as well as any "associated" applications (i.e. applications working in interaction with the Management Plane implementation).¶
The essential function of the NDT is to estimate - and then to present at the Digital Twin (DT) interface - particular, targeted performance-oriented behaviours of the physical network, assessed in scenarios that are defined by information presented at the DT Interface. Optionally, complementary information used by the behavioural models may be provided by the Network Information Interface. The DT Interface is a sort of "run-time" interface while the Network Information Interface, if present, serves as a conduit for e.g. measurement data streamed from the physical network to the management plane, or network or service topology information generated within the management plane itself. As a rule, the Network Information Interface, if present, provides to the NDT the broad set of changing information necessary to construct an accurate and up-to-date replica of the physical network for behavioural modeling purposes.¶
The Management Plane may have interfaces to administrator, service demand and/or intent generating systems, etc. It also has configuration or control interfaces to the physical network. Configuration or control inputs do not flow directly from the NDT to the network: rather, the NDT provides information to the Management Plane, and the latter's embedded and associated applications, components and processes - potentially including closed loop-based automated processes - may generate network-facing configuration or control outputs. The information provided by the NDT may thus be used in evaluation, decision, etc. procedures that serve to optimize operational outcomes, whether or not such procedures form part of closed loop-based automated processes. The particular way in which NDT outputs are used in such operations, decisions, etc. is what defines and distinguishes the various use cases. Examples of use cases are considered in Section 4.¶
This interface can be a simple CLI or a state-of-the-art GUI, depending on the final product. In summary, it has to offer the network administrator the following options/features:¶
This interface is used to configure the Physical Network with the configuration parameters obtained from the optimizer. It can be composed of one or more IETF protocols for network configuration. A non-exhaustive list includes NETCONF [RFC6241], RESTCONF/YANG [RFC8040], PCE [RFC4655], OVSDB [RFC7047], and LISP [RFC6830]. It is also possible to use other standards defined outside the IETF that allow the configuration of elements in the forwarding plane, e.g. OpenFlow [OFspec] or P4 Runtime [P4Rspec].¶
This interface can be defined with any widespread data format, such as CSV files or JSON objects. The goal of this interface is to express:¶
Inputs: data sent to the NDT to calculate the performance estimates. This can represent the network topology, physical aspects of the network devices, traffic demands, etc.¶
Outputs: performance estimates of the NDT. This represents the performance of the network when configured by the input parameters.¶
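As a minimal sketch of how such inputs and outputs could be carried as JSON objects, the following Python fragment builds a hypothetical request and response. All field names (`topology`, `traffic_demands`, `routing`, `estimates`, etc.) and the helper `validate_request` are assumptions made for illustration only; this draft does not define a schema.

```python
import json

# Hypothetical DT Interface input. Field names are illustrative assumptions,
# not a schema defined by this draft.
request = {
    "topology": {
        "nodes": ["A", "B", "C"],
        "links": [
            {"src": "A", "dst": "B", "capacity_bps": 10_000_000_000},
            {"src": "B", "dst": "C", "capacity_bps": 10_000_000_000},
        ],
    },
    "traffic_demands": [
        {"src": "A", "dst": "C", "rate_bps": 2_000_000_000},
    ],
    "routing": {"A->C": ["A", "B", "C"]},
}

# Hypothetical DT Interface output. Values are left empty on purpose:
# they would be filled in by the NDT, not by this sketch.
response = {
    "estimates": [
        {"path": "A->C", "delay_ms": None, "jitter_ms": None, "loss_ratio": None},
    ]
}


def validate_request(req):
    """Check that every demand is routed over links present in the topology."""
    links = {(link["src"], link["dst"]) for link in req["topology"]["links"]}
    for demand in req["traffic_demands"]:
        key = f'{demand["src"]}->{demand["dst"]}'
        path = req["routing"].get(key)
        if path is None:
            return False  # demand without a routed path
        if any(hop not in links for hop in zip(path, path[1:])):
            return False  # path uses a link that is not in the topology
    return True


# Serialize and parse the request as it would travel over the interface.
encoded = json.dumps(request)
decoded = json.loads(encoded)
```

Checking the consistency of inputs before they reach the NDT, as `validate_request` does here, is one possible design choice; the same data could equally be expressed as CSV files.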
The NDT is a unique tool for performing what-if analysis; that is, for analyzing the impact of potential scenarios and configurations safely without any impact on the real network. In this context, the NDT acts as a safe sandbox in which different configurations and scenarios may be presented to the NDT in order to understand their prospective impacts on physical network behaviours. It might be said that the role of the NDT is always to perform a what-if behavioural analysis, as its intrinsic function is to assess aspects of network or service performance in scenarios that may be partly or entirely hypothetical. Nevertheless, a variety of uses of NDTs beyond the specific use cases already covered, and which fit the what-if scenario analysis description, may be imagined. Some examples of such cases include:¶
There are many factors that cause network failures (e.g., invalid network configurations, unexpected protocol interactions). Debugging modern networks is complex and time consuming. Currently, troubleshooting is typically done by human experts with years of experience using networking tools.¶
Network operators can leverage an NDT to reproduce previous network failures, in order to find the source of service disruptions. Specifically, network operators can replicate past network failure scenarios and analyze their impact on network performance, making it easier to find specific configuration errors. In addition, the NDT helps in finding more robust network configurations that prevent service disruptions in the future.¶
Since the NDT models the behaviour of a real-world network, network operators have access to an estimation of the expected network behaviour. When the real-world network behaviour deviates from the NDT's behaviour, the deviation can act as an indicator of an anomaly in the real-world network. Such anomalies can appear at different places in a network (e.g. core, edge, IoT), and different data sources can be used to detect them. Further, trial-and-error modifications to the scenario assessed by the NDT (e.g. to network or component configuration or status) can assist with anomaly troubleshooting if convergence between predicted and observed performance can thus be obtained. The detailed differences between the scenario found to produce such convergence and the actual physical network scenario may help to indicate the source of the anomaly.¶
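The comparison between predicted and observed behaviour can be sketched as follows. This is an illustrative fragment, not a defined procedure: the function name `find_anomalies`, the use of a relative tolerance, the 20% default and the sample delay values are all assumptions.

```python
def find_anomalies(predicted, measured, rel_tolerance=0.2):
    """Return paths whose measurement deviates from the NDT estimate.

    predicted / measured map a path name to a metric value (e.g. delay in ms).
    rel_tolerance is a hypothetical threshold; a deployment would tune it.
    """
    anomalies = {}
    for path, expected in predicted.items():
        observed = measured.get(path)
        if observed is None or expected <= 0:
            continue  # nothing to compare against for this path
        deviation = abs(observed - expected) / expected
        if deviation > rel_tolerance:
            anomalies[path] = deviation
    return anomalies


# Made-up sample values, for illustration only.
predicted_delay_ms = {"A->C": 10.0, "A->D": 20.0}
measured_delay_ms = {"A->C": 10.5, "A->D": 30.0}
anomalous = find_anomalies(predicted_delay_ms, measured_delay_ms)
```

With these sample values only the path "A->D" is reported, since its measured delay is 50% above the estimate while "A->C" is within tolerance.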
As discussed before, the NDT can be understood as a safe playground where misconfigurations do not affect the real-world system performance. In this context, the NDT can play an important role in improving the education and certification process of network professionals, both in basic networking training and advanced scenarios. For example:¶
Figure 2 presents an overview of the architecture of a Packet Network Digital Twin (PNDT). This PNDT is an instance of the NDT used in the context of IP networks.¶
                       Administrator Intent
                                |
                                |
                                |Intent-Based Interface
                                |
                                |
   +-------------+-----------------------------+
   |             |                             |
   |             |                             |
   |      Intent-Based          Optimizer      |
   |        Renderer                           |         +-------------+
   |                                           |   DTI   |   Packet    |
   |                Management                 |Interface|   Network   |
   |                   Plane                   |<------->|   Digital   |
   |                                           |         |    Twin     |
   |                                           |         |             |
   |    Measure                  Configure     |         +-------------+
   |       |                         |         |
   +-------------+-----------------------------+
           |                         |
Measurement|                         | Configuration
Interface  |                         | Interface
           |                         |
      +--------------------------------------+
      |                                      |
      |       Physical Packet Network        |
      |                                      |
      +--------------------------------------+¶
Each element is defined as:¶
And the functions of each interface are:¶
The size and traffic of networks have doubled every year [network-capacity]. To accommodate this growth in users and network applications, networks need periodic upgrades. For example, ISPs may want to increase certain link capacities or add new connections to alleviate the burden on the existing infrastructure. This is typically a cumbersome process that relies on expert knowledge. Furthermore, modern networks are becoming larger and more complex, which makes it even harder for existing solutions to scale [planning-scalability].¶
Since the PNDT models large infrastructures and can produce accurate and fast performance estimates, it can help in different tasks related to network capacity and planning:¶
Since the PNDT can provide performance estimates in short timescales, it is possible to pair it with a network optimizer (Figure 3). The network administrator defines one or more optimization objectives, e.g. a maximum average delay for all paths in the network. The optimizer can be implemented with a classical optimization algorithm, such as Constraint Programming [DEFO] or Local Search [LS], or with a Machine Learning one, such as Deep Neural Networks [DNN-TM] or Multi-Agent Reinforcement Learning [MARL-TE]. Regardless of the implementation, the optimizer tests various configurations to find the network configuration parameters that satisfy the optimization objectives. To learn the performance of a specific network configuration, the optimizer sends that configuration to the PNDT, which predicts its performance metrics.¶
                   +------------+     Candidate      +-------------+
                   |            |  Network Config.   |   Packet    |
Optimization---->  |  Network   |------------------->|   Network   |
objectives         | Optimizer  |                    |   Digital   |
                   |            |<-------------------|    Twin     |
                   +------------+     Estimated      +-------------+
                         |           Performance
                         |
                         |
                         v
                 Optimized Network
                   Configuration¶
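The loop depicted in Figure 3 can be sketched as follows. This is only an illustration under stated assumptions: `stub_pndt` is a made-up stand-in for a PNDT (a toy delay formula over two parallel paths, not a real model), the optimizer is a plain exhaustive search rather than any of the cited algorithms, and the `split` parameter, capacities, demand and delay bound are all hypothetical.

```python
import math


def stub_pndt(config, demand=6.0, capacities=(10.0, 5.0)):
    """Toy stand-in for a PNDT (assumption, not a real model).

    The config splits one demand over two parallel paths; the estimate is a
    traffic-weighted delay that grows as each path approaches its capacity.
    """
    split = config["split"]
    shares = (split, 1.0 - split)
    delay = 0.0
    for share, capacity in zip(shares, capacities):
        load = demand * share
        if load >= capacity:
            return math.inf  # overloaded path: objective cannot be met
        delay += share / (capacity - load)
    return delay


def optimize(candidates, estimate, max_delay):
    """Send each candidate configuration to the twin and keep the best one."""
    best_config, best_delay = None, math.inf
    for config in candidates:
        delay = estimate(config)  # "Estimated Performance" arrow in Figure 3
        if delay < best_delay:
            best_config, best_delay = config, delay
    if best_delay > max_delay:
        return None, best_delay  # no candidate satisfies the objective
    return best_config, best_delay


# Candidate network configurations: send 0%, 10%, ..., 100% over path 1.
candidates = [{"split": i / 10} for i in range(11)]
best_config, best_delay = optimize(candidates, stub_pndt, max_delay=0.25)
```

Because the optimizer only calls `estimate(config)`, the stub could be replaced by any PNDT implementation without changing the search logic, which is the separation of roles that Figure 3 suggests.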
An example of an optimization use case is a multi-objective optimization scenario: commonly, the network administrator defines a set of optimization goals that must be concurrently met [DEFO], for example:¶
Figure 4 presents an overview of the Optical Network Digital Twin (ONDT).¶
                       |Service Demand Interfaces,
                       |Intent-Based Interface,
                       |Associated Application Interfaces, etc.
                       |
   +-------------------------------------------+
   |                                           |
   |                                           |   DT    +-------------+
   |             Management Plane              |Interface|             |
   |                                           |<------->|   Optical   |
   |          + Embedded Applications          |         |   Network   |
   |                                           |         |   Digital   |
   |                                           |-------->|    Twin     |
   |                                           | Network |             |
   |                                           | Inform. +-------------+
   |                                           |Interface
   +-------------------------------------------+
           |                         |
           |                         |
Measurement|                         | Configuration
Interface  |                         | Interface
           |                         |
      +--------------------------------------+
      |                                      |
      |       Physical Optical Network       |
      |                                      |
      +--------------------------------------+¶
The elements shown are defined as follows:¶
The functions of the respective interfaces are:¶
The combination of the NII and a flexibly-defined DTI enables optical transmission performance to be assessed for any scenario, ranging from one wholly defined by the actual (or some historical) state and status of the physical network and services, to an entirely hypothetical network and service state and status, or any scenario between these extremes. This enables the NDT to be used in a wide variety of use cases, which are discussed in detail in Section 4.¶
Per Figure 4 in Section 6, an Optical Network Digital Twin (ONDT) interfaces with a Management Plane to obtain the information related to the physical network and the scenario for which optical service performance information is sought. Again, such information is either sent to the ONDT at scenario assessment run-time through the DT Interface (DTI), or is made available to the ONDT through the Network Information Interface (NII) as stable (if evolving) configuration, state, instrumentation and other data. As discussed in Section 4, the best partitioning of information presentation between DTI and NII is to some degree use case-specific. Categories of such information include:¶
A good part of the interface specifications needed to support these information categories are available today in the IETF. For example, the ACTN framework defines a hierarchical control framework, which coupled with the various models defined in the TEAS, CCAMP, OPSAWG and other working groups, can provide the configuration and state data related to TE topology and services. The following is at least a partial list of available models applicable to the DTI and NII interfaces:¶
Data models for management configuration and optical device configurations, on the other hand, are mostly not available and need to be developed in the IETF. As a starting point, the following draft could potentially be extended to support ONDT functional requirements:¶
Management models developed in other standards organizations, such as TM Forum and OpenConfig, might also be used by the ONDT. Applicable instrumentation and measurement telemetry models are for further study.¶
In an optical service planning exercise, the optical network topology and equipment map are presumed fixed while the set of provisioned optical services may be altered. In an optical network planning exercise, not only the optical service map (or some component of it) but also the network topology and deployed equipment map may be augmented or changed. Optical network planning may thus be viewed as a superset of the steps and processes associated with optical service planning. The goal of optical network planning is usually to accommodate some particular set of services or, more fundamentally, some particular transmitted traffic requirements, using the least amount or least (total or incremental) cost of equipment. Optimization is thus generally a process component of optical network planning; again, this is discussed in Section 6.2.2.¶
In general, the role of an ONDT in support of optical network planning is the same as the one described in Section 6.2.4 for optical services planning: to verify that all postulated optical services would operate within acceptable performance bounds when deployed on a postulated new network topology and detailed equipment and fibre map. As in the optical services planning case, beyond identifying scenarios in which one or more optical services would fail to operate, the identification of undesirably low or unnecessarily high optical service margins could serve as a trigger to explore alternative conjoint optical network and service plans.¶
For this use case, the appropriate input DT Interface specifies the topology and other characteristics of postulated optical services, and also the postulated new or modified network topology and map of deployed equipment. Specific such postulated scenarios are conceived by the Management Plane-embedded or -associated applications and processes that are responsible for optical network planning. These same applications and processes make use of the performance information provided by the ONDT.¶
In the case of brown field optical network planning, the physical network "twinned" is partly real and partly hypothetical. In the case of green field planning the physical network is entirely hypothetical. In practice, this means that in optical network planning, the Network Information Interface can supply to the ONDT's behavioural models only part - at best - of the information it would supply in other cases. Such information "gaps" must be filled by other means, e.g. using generic rather than specific equipment- and fibre-characterizing information.¶
As suggested in Section 6.2.4 and Section 6.2.1, optimization – finding the "best" solution as determined by some quantitative criterion that is assessed for each candidate solution – is generally an intrinsic component of optical network planning. This is because such planning usually involves trying to find e.g. a lowest total or incremental cost-of-equipment network plan. Where optical services planning considers new service demands in batches or permits re-configuration of some or all existing services, optimization of new and/or modified batches of optical services may be sought as part of the planning solution. Such optimization could involve, e.g. seeking to maximize the overall spectral efficiency of the total optical services. Such an optimization maximizes, in effect, the unused optical network capacity that remains available for further service deployment.¶
The functions of the ONDT in these cases are essentially those described in Section 6.2.4 and Section 6.2.1. First, the ONDT is used to verify that all postulated optical services would operate within acceptable performance bounds, when deployed on the existing or on a postulated new network topology and detailed equipment map. Second, the optical service margin information generated by the ONDT may flag candidate solutions that feature a large number of unnecessarily high optical service margins. Such findings reflect a general inefficiency of the candidate solutions and may be used to indicate that e.g. more spectrally efficient solutions are available and should be sought.¶
The use of the ONDT with and within optimization process architectures may be represented in ways qualitatively similar to what Figure 3 depicts in respect of NDTs in packet network optimization use cases.¶
Optical service (re-)provisioning presents operational challenges and risks. Optical service power levels and - by extension - their optical noise and other impairment characteristics, are coupled by optical amplifiers, which act collectively on transiting optical services. Optical service add, drop and change operations can thus have deleterious and non-obvious impacts across optical services, particularly in ring and mesh optical network topologies and potentially resulting in failure of added, changed or unchanged optical services.¶
An ONDT can be used to assess the optical service performances that would result from prospective optical service (re-)provisioning operations. Such information could then be used by Management Plane-embedded or -associated applications seeking to e.g. optimize add/drop/change batching and sequencing operations, or to determine optimized optical service launch powers.¶
For this use case, the appropriate input DT Interface specifies the topology of optical services postulated for the post-(re)provisioning scenario, as well as the launch powers and other characteristics (modulation, coding, spectral characteristics, etc.) defining those services. Specific scenarios thus postulated for performance assessment are conceived and determined by the applications referred to above.¶
Before optical service provisioning is attempted, proposed routes (topology) and other characteristics - launch powers, spectral allocations, modulation, coding, baud rates, etc. - must be planned for new optical services to accommodate new traffic, as must any changes to or deletions of existing optical services that may be suggested by shifting transmission traffic loads.¶
An ONDT, per the description in Section 6.2.3, can be used directly in support of an optical service planning application, which is presumed to operate as part of, or in conjunction with, the Management Plane. For example, prospective new optical service plans can be validated as functional - i.e. that all services would operate within acceptable performance bounds - by the ONDT. Beyond identifying scenarios in which one or more optical services would fail to operate, the identification of undesirably low or unnecessarily high optical service margins could serve as a trigger to explore alternative plans. This suggests a linkage to optimization processes, which are discussed in Section 6.2.2.¶
The ONDT can be used to assess in advance the impacts on optical services that can be expected should various “risk” scenarios materialize [ONDT]. For example, the ONDT may be used to predict the impacts on transmission performances of optical services that would be indirectly affected by particular fibre cuts. This is important because, while services that transit a cut fibre link will be interrupted, other optical services - those that co-transited uncut links along with the services that are interrupted by the cut(s) - may experience changes in terminal powers and margins due to amplifier-based coupling. Where unacceptable event-driven risks to optical service performances are identified by ONDT-based analysis, solutions may be proactively sought. For example, optical service planning may be undertaken to find more resilient optical service solutions for the at-risk service instances identified.¶
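A risk-mapping pass of this kind can be sketched as follows. The fragment is purely illustrative: `stub_ondt` is a made-up stand-in for an ONDT that applies a fixed, hypothetical margin penalty to services sharing an uncut link with an interrupted service; real amplifier-based coupling is far more complex. The service names, routes, margins and the 0 dB threshold are likewise assumptions.

```python
# Hypothetical optical services: route as a list of fibre links, margin in dB.
services = {
    "S1": {"route": ["L1", "L2"], "margin_db": 3.0},
    "S2": {"route": ["L2", "L3"], "margin_db": 1.0},
    "S3": {"route": ["L4"], "margin_db": 2.0},
}


def stub_ondt(services, cut_links, coupling_penalty_db=1.5):
    """Toy stand-in for an ONDT (assumption, not a physical model).

    Services crossing a cut link are interrupted. Surviving services that
    shared an uncut link with an interrupted one lose a fixed margin penalty.
    Returns the estimated margin of each surviving service.
    """
    cut = set(cut_links)
    interrupted = {n for n, s in services.items() if cut & set(s["route"])}
    coupled_links = set()
    for name in interrupted:
        coupled_links |= set(services[name]["route"]) - cut
    margins = {}
    for name, svc in services.items():
        if name in interrupted:
            continue
        affected = bool(coupled_links & set(svc["route"]))
        margins[name] = svc["margin_db"] - (coupling_penalty_db if affected else 0.0)
    return margins


def risk_map(services, scenarios, estimate, min_margin_db=0.0):
    """Map each failure scenario to the surviving services left below margin."""
    at_risk = {}
    for cut_links in scenarios:
        margins = estimate(services, cut_links)
        failing = sorted(n for n, m in margins.items() if m < min_margin_db)
        if failing:
            at_risk[tuple(cut_links)] = failing
    return at_risk


single_cuts = [["L1"], ["L2"], ["L3"], ["L4"]]
risks = risk_map(services, single_cuts, stub_ondt)
```

In this toy data set, a cut of L1 interrupts S1 and leaves S2, which shares L2 with S1, below the margin threshold; S2 is then a candidate for the proactive re-planning described above.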
An important specific use of risk mapping is in the assessment of optical service dynamic restoration solutions [ONDT]. Dynamic restoration involves pre-computing a set of failure scenario-based restoration responses. Resources are not reserved a priori for each active service; rather, restoration services are delivered as needed from a "pool" of resources. This is a more efficient restoration modality than dedicated protection, as the size of the resource pool is limited by an assumption that only a limited number of failures may happen at once. However, dynamic restoration requires ongoing re-planning of restoration solutions, as optical service maps and equipment and fibre conditions may both evolve over time, affecting restoration service performance. Risk mapping as described in Section 6.2.5 may be used on an ongoing basis to identify service risks corresponding to planned failure-restoration scenarios. When such performance risks are found, a search for new dynamic restoration plans may be triggered, with new candidate restoration solutions checked for predicted performance integrity using the ONDT.¶
Significant challenges to ONDT implementation, deployment and use relate to e.g. models and instrumentation.¶
Optical transmission performance is difficult to model accurately because the different impairments and other factors that determine performance are not easy to model with high accuracy and system specificity. For example, optical amplifiers are a key determinant of transmission behaviours and limits, but their gain and noise characteristics are complicated functions of optical service input power and spectral profiles, operational set points, etc. In addition, they vary considerably among amplifier designs, types and even instances. Yet, transmission systems may contain long chains of amplifiers, so that accurate end-to-end service modeling requires highly accurate individual amplifier models. The development of models that deliver sufficiently accurate performance predictions across operational circumstances and potentially also amplifier vendors, types etc., represents a significant challenge. Nonetheless, promising solution paths have been developed [EDFA1], [EDFA2].¶
Transmission performance prediction accuracy may be improved when the necessary scope of modeling can be reduced through enhancements in direct measurement of relevant parameters on the physical network. For example, if optical signal-to-noise ratio and other impairments can be measured directly on operating services, the available margins on those optical services are obtained directly. Although significant advances have been made in this area, it will take time before such improved instrumentation features become widely deployed, and both usable and susceptible to standardization (e.g. of Measurement Interfaces).¶
NDTs should be understood as analytical utilities providing information to use-case-oriented applications; these applications drive decision-making operations in respect of particular objectives: e.g. resource optimization, fault classification, service planning, etc. Insofar as multiple such applications may make use of similar analyses and, therefore, of similar models and data, they may be served by a shared NDT. However, different applications generally require, at any point in time, analyses corresponding to scenarios that differ in detail, with respect to traffic, services, network composition or condition, or other factors. An NDT concurrently serving multiple applications requires careful handling of data and models to accommodate such differences in scenario details.¶
Concurrent servicing of multiple scenario-based analyses requires, in effect, the management and execution of concurrent computational sessions. Such sessions require managed life cycles. For example, sessions may be short-lived, supporting one-time “what-if” scenario-based analyses. Alternatively, they may be long-lived, providing ongoing analysis output as the circumstances of, and on, the physical network may evolve. Applications making use of NDTs must drive these session life cycles.¶
Depending on the details of scenarios postulated for analysis by applications, NDT computational sessions may use data that corresponds fully, partly or, in some cases, not at all to data representing the current state of the physical network. However, the latter data must remain accessible to every session and must not be corrupted when scenario-driven data is substituted for it in individual computational sessions, even as it is continuously updated and kept current with data from the real network, management and control systems, etc.¶
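One possible way to meet this requirement is to layer per-session scenario data over a read-only view of the live data, as in the following sketch. The class name `ScenarioSession`, its methods and the key naming scheme are assumptions for illustration; the draft does not prescribe any particular data handling mechanism.

```python
from collections import ChainMap
from types import MappingProxyType


class ScenarioSession:
    """One NDT computational session: scenario overrides over live data."""

    def __init__(self, live_state, overrides=None):
        self._overrides = dict(overrides or {})
        # The read-only proxy lets the session see live updates but never
        # write to the shared data; lookups consult the overrides first.
        self._view = ChainMap(self._overrides, MappingProxyType(live_state))

    def get(self, key):
        return self._view[key]

    def postulate(self, key, value):
        """Record a hypothetical value for this session only."""
        self._overrides[key] = value


# Shared data kept current from the real network (hypothetical keys).
live = {"link:A-B:status": "up", "link:A-B:capacity_gbps": 100}

what_if = ScenarioSession(live, {"link:A-B:status": "down"})  # short-lived
monitor = ScenarioSession(live)                               # long-lived

# A later update from the real network is visible to both sessions.
live["link:A-B:capacity_gbps"] = 400
```

Here the what-if session sees the link as down while the monitoring session and the shared data still report it as up, and both sessions observe the updated capacity.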
NDT computational sessions may also involve different detailed functional models. For example, scenarios may involve modifications to the inventory and configuration of deployed equipment or other elements. Models in respect of particular network behaviours may be generated by composition from individual equipment or element-level models, thus driving a requirement for different network model compositions across scenario-based sessions. Further, the particular functional models invoked may differ depending on whether the analysed network is fully real – i.e. is fully deployed and potentially operating – or is partly or fully hypothetical. Better behavioural models may be available corresponding to specific instances of deployed equipment than are available when only equipment or element type or generic class or function may be specified.¶
NDTs usually (i.e. except in applications such as green field planning) operate in conjunction with real, operating networks, and real networks may vary and evolve over time in a number of ways, including scale, specific equipment or network element types included, available instrumentation or other data sources, and variations in generated data types and quality. These variations may affect both available behavioural models and the data available to feed such models. The best available models and data should always be used, in optimum combinations, to generate the best quality analytical outputs, with choices re-examined based on evolution of available models, data and target computation scenarios. It should also be noted that targeted conjoint use of particular data and models may support assessment of accuracy of both data and models, through ‘prediction of measurables’ - i.e. using modelling in conjunction with certain data, to generate other data which is in fact directly measurable or available on the real network.¶
This section presents different technologies that can be used to build an NDT, and details the advantages and disadvantages of using them to implement an NDT. It takes into account how they perform with respect to the requirements of accuracy, speed, and scale of the NDT predictions.¶
Packet-level simulators, such as OMNeT++ [OMNET] and NS-3 [ns-3], simulate network events. In a nutshell, they simulate the operation of a network by processing a series of events, such as the transmission of a packet or the enqueuing and dequeuing of packets in a router. Hence, they offer excellent accuracy when predicting network performance metrics (delay, jitter and loss), but they take a significant amount of time to run the simulation: the simulation time scales linearly with the number of packets to simulate.¶
In fact, the simulation time depends on the number of events to process [limitations-net-sim]. This limits the scalability of simulators, even if the topology does not change: increasing traffic intensities will take longer to simulate because more packets enter the network per unit of time. Conversely, simulating the same traffic intensity in larger topologies will also increase the simulation time. For example, consider a simulator that takes 11 hours to process 4 billion events (these values are obtained from an actual simulation). Although 4 billion events may appear a large figure, consider:¶
These figures show that, despite the high accuracy of network simulators, they take too much time to calculate performance estimations.¶
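The arithmetic behind this observation can be sketched as follows. Only the 11 hours and the 4 billion events come from the text above. The link rate, the frame size and the simplification of one simulation event per packet are assumptions chosen for illustration; a real simulator typically processes several events per packet, which makes the gap larger.

```python
EVENTS = 4_000_000_000      # from the text: events processed
WALL_CLOCK_HOURS = 11       # from the text: time taken to process them

LINK_RATE_BPS = 10e9        # assumption: one 10 Gb/s link at full load
FRAME_BYTES = 1500          # assumption: fixed-size frames
EVENTS_PER_PACKET = 1       # assumption: optimistic lower bound


def simulator_events_per_second(events, hours):
    """Events the simulator processes per second of wall-clock time."""
    return events / (hours * 3600)


def link_events_per_second(rate_bps, frame_bytes, events_per_packet):
    """Events generated per second of simulated time by one loaded link."""
    return rate_bps / (frame_bytes * 8) * events_per_packet


sim_rate = simulator_events_per_second(EVENTS, WALL_CLOCK_HOURS)
link_rate = link_events_per_second(LINK_RATE_BPS, FRAME_BYTES, EVENTS_PER_PACKET)

# Wall-clock seconds needed per simulated second of this single link.
slowdown = link_rate / sim_rate
# Simulated seconds of traffic that the whole event budget covers.
covered_seconds = EVENTS / link_rate

print(f"simulator: {sim_rate:,.0f} events/s, link: {link_rate:,.0f} events/s")
print(f"slowdown: {slowdown:.2f}x, traffic covered: {covered_seconds / 60:.0f} min")
```

Under these assumptions, the full 11-hour run covers only 80 minutes of traffic on a single link.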
Network emulators run the original network software in a virtualized environment. This makes them easy to deploy, and depending on the emulation hardware, they can produce reasonably fast estimations. However, for large scale networks their speed will eventually decrease because they are not using specific hardware built for networking. For fully-virtualized networks, emulating a network requires as many resources as the real one, which is not cost-effective.¶
In addition, some studies have reported variable accuracy depending on the emulation conditions, both the parameters and underlying hardware and OS configurations [emulation-perf]. Hence, emulators show some limitations if we want to build a fast and scalable NDT. However, emulators are useful in other use cases, for example in training, debugging, or testing new features.¶
Queueing Theory (QT) is an analytical tool that models computer networks as a series of queues. The key advantage of QT is its speed, because the calculations rely on mathematical equations. QT is arguably the most popular modeling technique, where networks are represented as interconnected queues that are evaluated analytically. This represents a well-established framework that can model complex and large networks.¶
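To illustrate why QT is fast, the sketch below evaluates the textbook M/M/1 queue, whose mean time in system is 1 / (mu - lambda) for an arrival rate lambda below the service rate mu. It assumes Poisson arrivals and exponentially distributed service times. The path delay is obtained by summing per-hop delays, which assumes that every hop behaves as an independent M/M/1 queue. The function names and the example rates are hypothetical.

```python
def mm1_mean_delay(arrival_rate, service_rate):
    """Mean time in system (queueing plus service) of an M/M/1 queue.

    Rates are in packets per second; the result is in seconds.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def path_mean_delay(hops):
    """Sum of per-hop delays, assuming independent M/M/1 hops."""
    return sum(mm1_mean_delay(lam, mu) for lam, mu in hops)


# Hypothetical two-hop path: (arrival rate, service rate) per hop.
hops = [(800.0, 1000.0), (500.0, 1000.0)]
delay_s = path_mean_delay(hops)
print(f"estimated mean path delay: {delay_s * 1000:.1f} ms")
```

The estimate is a closed-form expression per queue, so its cost does not depend on the number of packets that cross the network.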
However, the main limitation of QT is the traffic model: although it offers high accuracy for Poisson traffic models, it presents poor accuracy under realistic traffic models [qt-precision]. Internet traffic has been extensively analyzed in the past two decades and, although the community has not agreed on a universal model, there is consensus that aggregated traffic generally shows strong autocorrelation and a heavy tail [inet-traffic].¶
Finally, Neural Networks (NN) and other Machine Learning (ML) tools are as fast as QT (in the order of milliseconds), and can provide accuracy similar to that of packet-level simulators. They represent an interesting alternative, but have two key limitations. First, they require training the NN with a large amount of data from a wide range of network scenarios: different routings, topologies and scheduling configurations, as well as link failures and network congestion. This dataset may not always be accessible, or easy to produce in a production network (see Section 8.1.4.5). Second, not all NN architectures maintain sufficient accuracy when scaling to larger topologies; therefore, some use cases need custom NN architectures.¶
A MultiLayer Perceptron [MLP] is a basic kind of NN from the family of feedforward NN. In short, input data is propagated unidirectionally from the input layer of neurons to the output layer. There may be an arbitrary number of hidden layers between the input and output layers. MLPs are widely used for basic ML applications, such as regression.¶
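The unidirectional propagation described above can be sketched in a few lines. This is a minimal forward pass only, written without any ML library. The layer sizes, the weights and the choice of ReLU as the hidden activation are assumptions made for illustration, and no training is shown.

```python
def relu(x):
    """Rectified linear unit, a common hidden-layer activation."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: one weight row and one bias per neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def mlp_forward(inputs, layers):
    """Propagate the inputs through each layer in order, input to output."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 linear output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 1.0]], [0.5], None),
]
prediction = mlp_forward([3.0, 1.0], layers)
print(prediction)
```

The input size is fixed by the first weight matrix, so a plain MLP cannot directly take networks of varying size as input.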
Recurrent Neural Networks [RNN] are a more advanced type of NN because they connect some layers to the previous ones, which gives them the ability to store state. They are mostly used to process sequential data, such as handwriting, text, or audio. They have been used extensively in speech processing [RNN-speech], and in general, Natural Language Processing applications [NLP].¶
Convolutional Neural Networks (CNN) are a Deep Learning NN designed to process structured arrays of data such as images. CNNs are highly performant when detecting patterns in the input data. This makes them widely used in computer vision tasks, where they have become the state of the art for many visual applications, such as image classification [CNN-images]. Hence, their current design presents limited applicability to computer networks.¶
Graph Neural Networks [GNN] are a type of neural network designed to work with graph-structured data. A relevant type of GNN with interesting characteristics for computer networks is the Message Passing Neural Network (MPNN). In a nutshell, a MPNN exchanges a set of messages between the graph nodes in order to learn the relationship between the input graph and the expected outputs of the training dataset. A MPNN is composed of three functions that are repeated for several iterations, depending on the size of the graph:¶
Note that the internal architecture of a MPNN is rebuilt for each input graph.¶
This ability to understand graph-structured data naturally makes GNNs interesting for a Network Performance Digital Twin. Since computer networks are fundamentally graphs, GNNs have the potential to take a graph of the network as input, and to produce performance estimations of that network as output [qt-precision].¶
Digital Twins based on Machine Learning require a training process before they can be deployed. Commonly, the training process makes use of a dataset of inputs and expected outputs that guides the adjustment of the internal parameters of the model, e.g. the neural network. There are some caveats regarding the training process:¶
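The message exchange can be sketched as a loop over the nodes of the graph. The sketch assumes the commonly used split into a message step, an aggregation step and an update step. In a real MPNN the message and update functions are learned neural networks; here they are fixed toy functions, and all names and values are hypothetical.

```python
def message(sender_state, receiver_state):
    """Toy message function: the sender forwards its own state."""
    return sender_state


def aggregate(messages):
    """Aggregation: combine incoming messages regardless of their order."""
    return sum(messages)


def update(state, aggregated):
    """Toy update function: blend the old state with the aggregated messages."""
    return 0.5 * state + 0.5 * aggregated


def message_passing(neighbors, states, iterations):
    """Run the three functions on every node for a number of iterations."""
    for _ in range(iterations):
        new_states = {}
        for node, state in states.items():
            incoming = [message(states[n], state) for n in neighbors[node]]
            new_states[node] = update(state, aggregate(incoming))
        states = new_states
    return states


# Hypothetical line topology a - b - c, with information initially only at "a".
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
initial = {"a": 1.0, "b": 0.0, "c": 0.0}
result = message_passing(neighbors, initial, iterations=2)
print(result)
```

Because the same three functions are applied to whatever adjacency is supplied, the code runs unchanged on graphs of any size. Note also that information from node "a" needs two iterations to reach node "c", which is why the number of iterations is tied to the size of the graph.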
This memo includes no request to IANA.¶
An attacker can alter the software image of the NDT. This could produce inaccurate performance estimations that, in turn, could result in network misconfigurations, disruptions or outages. Hence, in order to prevent the accidental deployment of a malicious NDT, the software image of the NDT MUST be digitally signed by the vendor.¶