Evolution toward network operations automation requires systems encompassing software-based analytics and decision-making. Network-based instrumentation provides crucial data for these components and processes. However, the proliferation of such instrumentation, and the need to migrate the data it generates from the physical network to "off-the-network" software, pose challenges. In particular, analog measurement instrumentation, which generates time-continuous real-number data, may generate significant data volumes.¶
Methodologies for handling analog measurement instrumentation data will need to be identified and discussed, informed in part by consideration of requirements for the operation of network digital twins, which may be important software-realm consumers of such data.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 4 September 2024.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
Existing studies of network telemetry typically deal with packet-oriented measurements that generate packet traffic, path, discard, latency, and other data [RFC7799], [OPSAWG-IFIT-FRAMEWORK]. However, some networking equipment and network operations scenarios feature or use more physically-oriented measurement instrumentation that generates data of a different character. Here, the particularities of data generated by such "analog" instrumentation are examined, and telemetry methodologies suitable for such data are considered. This consideration is informed by the requirements of specific use cases, including network digital twins.¶
Optical networks, which are increasingly rich in analog instrumentation, are used as a specific example here. However, the telemetry methodologies discussed may apply to instrumentation and telemetry across a wide variety of networks and their related operational software, for example, in support of digital twins that model radio-based transmission, thermal characteristics, or energy consumption.¶
This document presents telemetry methodologies tailored for analog measurement instruments, aiming to enhance data accuracy, transmission efficiency, and real-time monitoring capabilities for network digital twins. The findings underscore the potential of these methodologies to inform best practice for telemetry in network digital twins that rely on analog measurement instruments. The document provides a state-of-the-art summary, including gaps and possible areas for further research.¶
Photonic networks, which transmit data through light signals via fiber optic cables, are fundamental to telecommunications, internet services, data center operations, and many other critical aspects of modern digital infrastructure. A range of measurement instruments are routinely used in the deployment and maintenance of these networks. Key examples include:¶
These instruments play a critical role in the characterization, deployment, optimization, and troubleshooting of optical networks. However, their use tends to be restricted to specific operational phases, requires manual operation, and is generally not compatible with application to operating facilities. The term instrumentation refers more properly to "embedded" capability that is both operable on active infrastructure and capable of continuous measurement operation. Such instrumentation is a necessary foundation for telemetry.¶
Optical network instrumentation has typically focused on detecting transmission performance degradation, through measurement of error correction rates in FEC engines, counting of errored OTN frames, etc. Such measurements are typically executed on network elements through time-interval-based counting. The resulting counts may be forwarded to or collected by software on a subscription or polling basis. The data consists of series of integer numbers, or series of time stamp-integer number couplets.¶
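As a purely illustrative, non-normative sketch (the structure and field names below are assumptions, not a proposed data model), such counter-based data might be represented in software as a simple series of time-stamp/count couplets:¶

   # Illustrative sketch only: a series of time-stamp/integer couplets,
   # as might be produced by interval-based counting (e.g., corrected
   # FEC errors per counting interval). Field names are hypothetical.
   from dataclasses import dataclass

   @dataclass
   class CounterSample:
       timestamp: float   # seconds since epoch, end of counting interval
       count: int         # e.g., corrected FEC errors in the interval

   counter_series = [
       CounterSample(timestamp=1711000000.0, count=1204),
       CounterSample(timestamp=1711000015.0, count=1187),
       CounterSample(timestamp=1711000030.0, count=2391),
   ]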
In recent years, however, the nature and scope of optical network instrumentation has broadened and deepened [JIANG]. The idea has been to instrument the optical network more richly to support more effective operations management, including using software-based analytics and modeling. Implicated network operations include network and connection planning and configuration, network and connection fault management (fault and impairment detection, classification, localization, preemption, correction), and others.¶
The optical network is a high-performance analog transmission network, so, unsurprisingly, much of this new instrumentation is analog; that is, it produces time-continuous real-number data or data sets. Examples include optical loss, optical power (total, channel peak, etc.), optical spectra (narrow-band-filtered power measured at a series of center wavelengths), differential group delay (DGD), polarization mode dispersion (PMD), polarization dependent loss (PDL), Stokes vector components reflecting state of polarization (SOP), linear optical signal-to-noise ratio (OSNR) and generalized optical signal-to-noise ratio (GSNR). Many of these measurements are synthesized by coherent receivers across the network, while some may be synthesized by in-span elements such as amplifiers and ROADMs.¶
One application of this data in the software realm is with optical network digital twins (NDTs), used for transmission performance modeling [JANZ], [NMRG-PODTS]. Such NDTs constitute an important class of analytical engine supporting optical network and service planning and other operations, and they rely heavily on data from network instrumentation to enable accurate modeling of optical transmission performance on targeted variations of the actual network and service configuration, state and condition. A default expectation would be that all instrumentation measurements are reflected continuously in the software realm for use by optical NDTs. However, at best only an approximation to this can be achieved (e.g., only a series of sampled measurements may in fact be streamed from the network), so the imperative is to find efficient ways to support sufficiently accurate approximations. This imperative grows more compelling the greater the scale of the network and the greater the richness of embedded instrumentation.¶
A second example application lies in the fault management domain, wherein analysis of rich data, concentrated around the time of a detected evolution in transmission conditions, may be used to classify and localize the origin of the observed evolution [HAHN]. Transient evolutions of transmission performance are commonplace on optical networks and have myriad causes, including extrinsic causes such as lightning strikes, earthworks and construction, weather, road and rail traffic, fires, etc., as well as intrinsic causes including continuous or discrete deteriorations to equipment or fibre plant. Detection, classification, and localization of transmission performance evolutions permit assessment of the likelihood, expected severity, and rate of further deterioration, and planning of timely and cost-effective corrective interventions where indicated. However, successful analysis may depend on the availability of richer data sets in software that may be supported by continuous streaming or required by other applications.¶
[RFC9232] provides a framework for considering concepts, constructs and developments in network telemetry. Many of the methods and mechanisms it discusses or suggests are invoked here.¶
An analog-to-digital conversion process typically converts analog signals into digital data that can be transmitted, stored, and processed more efficiently. This often involves sampling the signal at a certain rate and quantizing the amplitude into digital values. The "mirroring" (transmission for replication at a different place) of continuous-time real number data, generated by in-network instrumentation, begins with sampling and representing measured values by a scalar or vector of finite-decimal-place numbers. As neither sampling at fixed intervals, nor fixed time alignment or offset among measurement points in the network or between such points and the off-network software realm, can generally be assumed, it is useful that instrumentation should generate, as primary data, a series of couplets or vectors consisting of sample time stamps and corresponding measured data values.¶
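As a minimal, non-normative sketch (the structure and names are assumptions for illustration, not a proposed data model), a sampled analog measurement might then be carried as a time stamp together with a scalar or vector of finite-precision values:¶

   # Illustrative sketch: a sampled analog measurement carried as a time
   # stamp plus one or more finite-precision real values (e.g., the four
   # Stokes-vector components of a state-of-polarization monitor).
   # All names here are hypothetical.
   from dataclasses import dataclass
   from typing import List

   @dataclass
   class AnalogSample:
       timestamp: float      # sample time (not assumed to lie on a fixed grid)
       values: List[float]   # scalar (length 1) or vector of measured values

   sop_sample = AnalogSample(
       timestamp=1711000012.437,
       values=[0.9991, 0.0213, -0.0345, 0.0102],  # s0..s3, quantized
   )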
Inadequate sampling frequency and quantization error are both potential sources of error in the - literal or effective - "reconstruction" of the original time-continuous measurement in the software realm. It is possible that sampling frequencies might be varied in response to evolving temporal characteristics of measured parameters; this is one strategy for data reduction (and one reason why sampling may not occur at fixed-period intervals).¶
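One possible realization of variable-rate sampling is change-driven sampling, in which a new sample is forwarded only when the measured value has moved by more than a chosen amount relative to the last forwarded sample, or when a maximum inter-sample interval has elapsed. The following is a hypothetical, non-normative sketch; thresholds and names are assumptions for illustration only:¶

   # Illustrative sketch of change-driven (adaptive) sampling: a sample is
   # emitted when the value changes by more than `delta` relative to the
   # last emitted value, or when `max_interval` seconds have elapsed.
   def adaptive_sample(samples, delta=0.1, max_interval=60.0):
       """samples: iterable of (timestamp, value); yields the reduced series."""
       last_t, last_v = None, None
       for t, v in samples:
           if (last_t is None
                   or abs(v - last_v) > delta
                   or (t - last_t) >= max_interval):
               yield (t, v)
               last_t, last_v = t, v

   raw = [(0.0, 1.00), (1.0, 1.01), (2.0, 1.02), (3.0, 1.31), (4.0, 1.32)]
   reduced = list(adaptive_sample(raw, delta=0.1, max_interval=10.0))
   # reduced == [(0.0, 1.0), (3.0, 1.31)]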
Requirements on the precision of reconstructed data, its time basis, and the alignment in time of different reconstructed measurements are determined by the operational role played by the analytical functions that consume the data. Some operations of interest, such as network and service planning or fault and impairment management, may impose only relatively relaxed requirements on time synchronization among measurement instruments, and between those instruments and the software realm. Other applications, e.g., those concerning operations tending toward closed loop control, may require tighter temporal data alignment among different measurement sources. These considerations have implications in terms of source and synchronization of clocks producing time stamps; but in general, requirements on clock synchronization and precision are far from those required for bit-level operations: i.e., they are generally more like "network time" than "digital time".¶
Similarly, requirements on the absolute or relative (i.e. among different measurement instruments) precision of reconstructed measured data values may be application-dependent. In many cases, relative precision, or precision consistency, may be more important than absolute precision.¶
With telemetric data volume a primary potential challenge, methods for reducing data volume associated with analog measurement instrumentation are of evident interest. Signals may also be filtered to remove noise and unwanted frequencies to improve the data quality.¶
Data compression is an obvious candidate methodology for bandwidth reduction. Methods for lossless compression of series of numerical data have been widely studied, e.g. [RATANAWORABHAN].¶
Obviously, such compression must be implemented as a "pre-processing" function executed by the telemetric instrumentation itself, or some proxy to it. Similarly, decompression must be implemented as a "post-processing" function within the software realm. Where time stamps are uncompressed, depending on the compression methodology employed, it may be possible to support selective decompression of data, e.g., only on selected time intervals. This might allow for application-driven "as-required" post-processing (decompression) of more limited volumes of telemetric data.¶
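As a simple, non-normative illustration of the principle (not of any particular published algorithm), the following sketch delta-encodes quantized measured values as a pre-processing step while leaving time stamps uncompressed, so that a receiver can locate a time interval of interest before reconstructing values; names and scaling are assumptions:¶

   # Illustrative sketch only: delta-encoding of quantized values as a
   # lossless "pre-processing" step, with time stamps left uncompressed.
   # Real systems would use purpose-built lossless compressors
   # (see, e.g., [RATANAWORABHAN]); names and scaling are assumptions.
   SCALE = 1000  # quantize values to 0.001 before encoding

   def compress(samples):
       """samples: list of (timestamp, value) -> (timestamps, first, deltas)."""
       timestamps = [t for t, _ in samples]
       q = [round(v * SCALE) for _, v in samples]
       deltas = [b - a for a, b in zip(q, q[1:])]
       return timestamps, q[0], deltas

   def decompress(timestamps, first, deltas):
       """Inverse "post-processing" step in the software realm."""
       q = [first]
       for d in deltas:
           q.append(q[-1] + d)
       return list(zip(timestamps, [x / SCALE for x in q]))

   ts, first, deltas = compress([(0.0, 1.234), (1.0, 1.236), (2.5, 1.235)])
   assert decompress(ts, first, deltas) == [(0.0, 1.234), (1.0, 1.236), (2.5, 1.235)]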
The compressibility of time-based data depends on its evolution in data-entropic terms, resulting in streamed data flows of varying volume or rate. The effective transmission and reception rates of data samples thus may vary and differ at any point from the rate of data generation. This is another reason why data samples may require time stamps.¶
Other forms of effective data reduction through pre-processing may also be useful, or preferred:¶
Post-processing of threshold-driven data may or may not be required by applications. For example, an application may generate a scenario for behavioral analysis by an NDT that requires the "current" data from network instrumentation. To whatever precision is effectively reflected in the details of the operating thresholding mechanisms, that data is simply the most recently transmitted sample from network measurement instruments. Another application, however, perhaps one dealing with fault or impairment management, might require a regular and continuous time series presentation of measured data. In that case, e.g. interpolation or other post-processing of received data samples might be needed.¶
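As a hypothetical, non-normative sketch of such post-processing, irregularly spaced received samples might be linearly interpolated onto a fixed-period grid in the software realm (the grid period and names are assumptions for illustration):¶

   # Illustrative sketch: linear interpolation of irregularly spaced
   # (timestamp, value) samples onto a regular time grid in the software
   # realm. Grid period and names are hypothetical.
   def interpolate_to_grid(samples, period):
       """samples: list of (timestamp, value), sorted by time."""
       grid = []
       t = samples[0][0]
       i = 0
       while t <= samples[-1][0]:
           while samples[i + 1][0] < t:
               i += 1
           (t0, v0), (t1, v1) = samples[i], samples[i + 1]
           v = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
           grid.append((t, v))
           t += period
       return grid

   received = [(0.0, 1.0), (2.3, 1.46), (5.0, 1.19)]
   regular = interpolate_to_grid(received, period=1.0)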
Other kinds of pre-processing may also be of interest, including normalization of data, frequency domain conversion, and computation of statistics.¶
As discussed in [RFC9232], in-network pre-processing of telemetry data may usefully be "programmed" by telemetry clients (i.e., software applications that are consumers of instrumentation data), including dynamically or variably. The range and nature of software applications and their data requirements may vary among systems, may evolve with time within any given system - based on experience and learning (automated or not) or with the deployment of new capabilities - and may also vary as a function of available instrumentation capabilities on a given network, which themselves may evolve.¶
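By way of a hypothetical, non-normative illustration (field names and functions are assumptions, not a proposed data model or interface), a client-supplied pre-processing "program" might be expressed as a small set of declarative parameters interpreted by the instrumentation or a proxy to it, together with the kind of windowed-statistics computation such a program might request:¶

   # Illustrative sketch: a client-programmed pre-processing specification,
   # interpreted by the instrumentation or a proxy to it. All field names
   # and values are hypothetical.
   preprocessing_spec = {
       "measurement": "channel-optical-power",   # hypothetical identifier
       "sampling": {"mode": "adaptive", "delta_db": 0.5, "max_interval_s": 300},
       "statistics": {"window_s": 60, "report": ["min", "max", "mean"]},
       "compression": {"scheme": "delta", "timestamps": "uncompressed"},
   }

   def window_statistics(samples, window_s=60.0):
       """Per-window (min, max, mean) over time-ordered (timestamp, value) samples."""
       out, start, window = [], None, []
       for t, v in samples:
           if start is None:
               start = t
           if t - start >= window_s:
               out.append((start, min(window), max(window), sum(window) / len(window)))
               start, window = t, []
           window.append(v)
       if window:
           out.append((start, min(window), max(window), sum(window) / len(window)))
       return out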
Streaming - i.e., subscription-based push - is, as identified in [RFC9232] and other works, and as suggested by the discussion above, expected to be the principal, if not exclusive, operational modality for telemetry, including analog instrumentation telemetry. Software clients consume data generated by the network, and having identified which data they require and from where within the network, use subscriptions to place themselves in a position to receive it, on an ongoing basis, without continuing operational steps.¶
Triggered transmission of "batched" data is aligned with a streaming paradigm, as the telemetry server (i.e., instrumentation) must detect the trigger conditions and react by capturing and transmitting data to subscribing clients.¶
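The following is a schematic, protocol-agnostic and non-normative sketch of this modality (it does not represent any particular telemetry protocol, and all names are assumptions): a server-side loop pushes each sample to subscribers and, on detecting a trigger condition, additionally pushes a batch of recent samples:¶

   # Schematic, protocol-agnostic sketch of subscription-based push with
   # trigger-driven batching. Not tied to any specific telemetry protocol;
   # all names are hypothetical.
   from collections import deque

   class TelemetryServer:
       def __init__(self, history_len=1000, trigger_delta=3.0):
           self.subscribers = []            # callables receiving pushed data
           self.history = deque(maxlen=history_len)
           self.trigger_delta = trigger_delta

       def subscribe(self, callback):
           self.subscribers.append(callback)

       def on_sample(self, timestamp, value):
           """Called for each new sample produced by the instrumentation."""
           previous = self.history[-1][1] if self.history else value
           self.history.append((timestamp, value))
           for push in self.subscribers:
               push("sample", [(timestamp, value)])
           # Trigger condition: abrupt change -> push a batch of recent history.
           if abs(value - previous) > self.trigger_delta:
               for push in self.subscribers:
                   push("batch", list(self.history))

   server = TelemetryServer(trigger_delta=3.0)
   server.subscribe(lambda kind, data: print(kind, len(data)))
   server.on_sample(0.0, -21.0)
   server.on_sample(1.0, -27.5)   # abrupt drop -> also pushes batched history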
It is worth considering, however, whether polling can or should be completely dispensed with, or whether it might retain some utility in some cases or circumstances.¶
The discussion so far supports a view that the data needs of NDTs can be satisfied by, and in fact are probably best served by, streaming. However, polling could be used if NDT-based analyses are required relatively infrequently, do not require very rapid execution, and do not draw arbitrarily on historical data. Polling might also be useful as a complementary mechanism to streaming. For example, to reduce data transmission and handling volumes, an NDT might choose to unsubscribe from telemetry that it has observed to change little with time. However, for particularly critical analyses, the NDT might want to ensure that all available telemetry data is up to date, by polling the unsubscribed instrumentation. Further, if certain kinds of data compression are used, decompression processes can enter errored regimes, e.g., through transmission loss of telemetry data. Periodic polling may be useful to "re-set" absolute data values in such cases. In fact, as suggested in [RFC7799], the possibility of transmission loss of streamed telemetry packets, a concern particularly if unreliable transport paradigms such as UDP are used, may provide a general reason to enable polling as a "failsafe" mechanism.¶
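As a hypothetical, non-normative illustration of this "re-set" role (the delta-encoded stream and all names are assumptions for illustration), a client might periodically poll for an absolute value and use it to re-baseline its reconstruction:¶

   # Illustrative sketch: a client reconstructs values from a delta-encoded
   # stream and periodically polls for an absolute value to re-baseline,
   # limiting the impact of lost telemetry messages. Names are hypothetical.
   class DeltaStreamClient:
       def __init__(self, poll_fn, poll_every=100):
           self.poll_fn = poll_fn          # returns the current absolute value
           self.poll_every = poll_every    # re-baseline after this many deltas
           self.value = poll_fn()          # initial absolute value
           self.deltas_since_poll = 0

       def on_delta(self, delta):
           """Apply a received delta; poll periodically as a failsafe."""
           self.value += delta
           self.deltas_since_poll += 1
           if self.deltas_since_poll >= self.poll_every:
               self.value = self.poll_fn()   # discard any accumulated error
               self.deltas_since_poll = 0
           return self.value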
Communication protocols facilitate reliable data exchange between telemetry devices and control systems. Depending on the method used (streaming and/or polling), various messaging protocols exist to provide efficient delivery of instrumentation data.¶
A complete framework for analog instrumentation telemetry might require data models supporting:¶
This document makes no requests for action by IANA.¶
Operational considerations for Optical Network Measurement Instrumentation involve a range of factors to ensure accurate, reliable, and efficient performance of optical networks. These considerations are critical for deploying, maintaining, and troubleshooting fiber optic systems. Key operational considerations include:¶
Future versions of this document will expand on the topics above and increase the scope of operational considerations.¶
The security implications of optical network telemetry are critical, given the increasing reliance on optical networks for data transmission in various sectors. Ensuring the security and integrity of these networks and the telemetry instrumentation used to measure and maintain them is paramount to prevent unauthorized access, data breaches, potential service disruptions, and their use as possible threat vectors and attack surfaces.¶
Key security considerations include:¶
Future versions of this document will expand on the topics above and increase the scope of security considerations.¶
Thanks to the Network Digital Twin discussions in the Network Management Research Group, which provided further input into this work.¶
This work is supported by the UK Department for Science, Innovation and Technology under the Future Open Networks Research Challenge project TUDOR (Towards Ubiquitous 3D Open Resilient Network). The views expressed are those of the authors and do not necessarily represent those of the project.¶