Internet-Draft | Network Anomaly Semantics | July 2024 |
Graf, et al. | Expires 8 January 2025 | [Page] |
This document explains why and how semantic metadata annotation helps to test, validate and compare outlier detection, supports supervised and semi-supervised machine learning development, enables data exchange among network operators, vendors and academia, and makes anomalies comprehensible to humans. The proposed semantics standardize the exchange of network anomaly data between and among operators and vendors so that they can improve their network outlier detection systems.¶
This note is to be removed before publishing as an RFC.¶
Discussion of this document takes place on the Network Management Operations Working Group mailing list (nmop@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/nmop/.¶
Source for this draft and an issue tracker can be found at https://github.com/network-analytics/draft-netana-nmop-network-anomaly-semantics/.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 8 January 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
[I-D.netana-nmop-network-anomaly-architecture] provides an overall introduction to how anomaly detection is applied in the IP network domain and which operational data is needed. It approaches the problem space by automating what a Network Engineer would normally do when verifying a network connectivity service: monitoring from different network plane perspectives to understand whether one network plane negatively affects another.¶
In order to fine-tune outlier detection as described in [I-D.netana-nmop-network-anomaly-lifecycle], the results provided as analytical data need to be reviewed by a Network Engineer. This keeps the human out of the monitoring loop while still involving them in the alert verification loop.¶
This document describes what information a Network Engineer needs in order to understand the output of the outlier detection. At the same time, this information is semantically structured so that it can be used to test outlier detection by comparing results systematically, and to set a baseline for supervised machine learning, which requires labeled operational data.¶
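As an illustration of what comparing results systematically could look like, the following minimal sketch scores a detector against labeled time windows. The matching rule (a detection counts as correct if it overlaps any labeled window) and the function names `overlaps` and `precision_recall` are assumptions made for this example; this document does not define a scoring method.

```python
def overlaps(a, b):
    """Return True if two (start, end) windows share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def precision_recall(labeled, detected):
    """Score detected windows against labeled windows (assumed overlap rule)."""
    # Detections that overlap at least one labeled anomaly window.
    correct = sum(any(overlaps(d, l) for l in labeled) for d in detected)
    # Labeled anomaly windows that were hit by at least one detection.
    found = sum(any(overlaps(l, d) for d in detected) for l in labeled)
    precision = correct / len(detected) if detected else 0.0
    recall = found / len(labeled) if labeled else 0.0
    return precision, recall


# Hypothetical windows, expressed as (start, end) in seconds.
labeled = [(100, 200), (500, 600)]
detected = [(150, 180), (300, 320), (590, 650), (900, 950)]
precision, recall = precision_recall(labeled, detected)
```

Running two detectors against the same labeled windows yields directly comparable precision and recall values, which is the kind of baseline that labeled operational data makes possible.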
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document defines the following terms:¶
Message Broker: An intermediary software component that translates messages from the formal messaging protocol of the sender to the formal messaging protocol of the receiver and routes them in topics. Message brokers are elements in a Data Mesh where software applications communicate by exchanging formally defined messages.¶
Stream Catalog: A single point of access that allows users to centrally search semantics for information across a Message Broker.¶
Additionally, this document makes use of the terms defined in [I-D.netana-nmop-network-anomaly-architecture] and [I-D.ietf-nmop-terminology].¶
The following terms are used as defined in [I-D.netana-nmop-network-anomaly-architecture]:¶
Outlier¶
The following terms are used as defined in [I-D.ietf-nmop-terminology]:¶
In this section, observed network symptoms are specified and categorized according to the following scheme:¶
Action: Which action the network node performed for a packet in the Forwarding Plane, for a path or adjacency in the Control Plane, or for state or statistical changes in the Management Plane. For the Forwarding Plane we distinguish between missing, where the drop occurred outside the measured network node, and drop and on-path delay, which were measured on the network node. For the Control Plane we distinguish between reachability, which refers to a change in the routing or forwarding information base (RIB/FIB), and adjacency, which refers to a change in peering or link-layer resolution. For the Management Plane we refer to state or statistical changes on interfaces.¶
Reason: For each action, one or more reasons describe why this action was taken. For drops in the Forwarding Plane we distinguish between Unreachable, because network layer reachability information was missing; Administered, because an administrator configured a rule preventing the forwarding of this packet; and Corrupt, where the network node was unable to determine where to forward to due to a packet, software or hardware error. For on-path delay we distinguish between Minimum, Average and Maximum Delay for a given flow. For the Control Plane we distinguish whether the reachability was updated or withdrawn, or whether the adjacency was established or torn down. For the Management Plane we distinguish between the interface states up and down, and statistical error, discard or unknown protocol counters.¶
Cause: For each reason, one or more causes describe why the network node chose that action.¶
Table 1 consolidates for the forwarding plane a list of common symptoms with their Actions, Reasons and Causes.¶
Action | Reason | Cause |
---|---|---|
Missing | Previous | Time |
Drop | Unreachable | next-hop |
Drop | Unreachable | link-layer |
Drop | Unreachable | Time To Live expired |
Drop | Unreachable | Fragmentation needed and Don't Fragment set |
Drop | Administered | Access-List |
Drop | Administered | Unicast Reverse Path Forwarding |
Drop | Administered | Discard Route |
Drop | Administered | Policed |
Drop | Administered | Shaped |
Drop | Corrupt | Bad Packet |
Drop | Corrupt | Bad Egress Interface |
Delay | Min | - |
Delay | Mean | - |
Delay | Max | - |
Table 2 consolidates for the control plane a list of common symptoms with their Actions, Reasons and Causes.¶
Action | Reason | Cause |
---|---|---|
Reachability | Update | Imported |
Reachability | Update | Received |
Reachability | Withdraw | Received |
Reachability | Withdraw | Peer Down |
Reachability | Withdraw | Suppressed |
Reachability | Withdraw | Stale |
Reachability | Withdraw | Route Policy Filtered |
Reachability | Withdraw | Maximum Number of Prefixes Reached |
Adjacency | Established | Peer |
Adjacency | Established | Link-Layer |
Adjacency | Locally Teared Down | Peer |
Adjacency | Remotely Teared Down | Peer |
Adjacency | Locally Teared Down | Link-Layer |
Adjacency | Remotely Teared Down | Link-Layer |
Adjacency | Locally Teared Down | Administrative |
Adjacency | Remotely Teared Down | Administrative |
Adjacency | Locally Teared Down | Maximum Number of Prefixes Reached |
Adjacency | Remotely Teared Down | Maximum Number of Prefixes Reached |
Adjacency | Locally Teared Down | Transport Connection Failed |
Adjacency | Remotely Teared Down | Transport Connection Failed |
Table 3 consolidates for the management plane a list of common symptoms with their Actions, Reasons and Causes.¶
Action | Reason | Cause |
---|---|---|
Interface | Up | Link-Layer |
Interface | Down | Link-Layer |
Interface | Errors | - |
Interface | Discards | - |
Interface | Unknown Protocol | - |
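The Action, Reason and Cause scheme in Tables 1 to 3 can be thought of as a lookup structure. The sketch below encodes a representative subset of rows and checks whether a given combination is listed. The names `SYMPTOMS`, `is_known_symptom` and `reasons_for`, and the plane keys, are hypothetical and chosen for this example only; the tables remain the authoritative list.

```python
# A representative subset of rows from Tables 1-3, keyed by network plane.
# Each entry is an (action, reason, cause) triple; "-" means no cause is listed.
SYMPTOMS = {
    "forwarding": {
        ("Missing", "Previous", "Time"),
        ("Drop", "Unreachable", "next-hop"),
        ("Drop", "Administered", "Access-List"),
        ("Drop", "Corrupt", "Bad Packet"),
        ("Delay", "Min", "-"),
        ("Delay", "Mean", "-"),
        ("Delay", "Max", "-"),
    },
    "control": {
        ("Reachability", "Update", "Received"),
        ("Reachability", "Withdraw", "Peer Down"),
        ("Adjacency", "Established", "Peer"),
    },
    "management": {
        ("Interface", "Up", "Link-Layer"),
        ("Interface", "Down", "Link-Layer"),
        ("Interface", "Errors", "-"),
    },
}


def is_known_symptom(plane, action, reason, cause="-"):
    """Return True if the combination appears in the encoded subset."""
    return (action, reason, cause) in SYMPTOMS.get(plane, set())


def reasons_for(plane, action):
    """List the distinct reasons encoded for one action, sorted by name."""
    return sorted({r for a, r, _ in SYMPTOMS.get(plane, set()) if a == action})
```

A check such as `is_known_symptom("forwarding", "Drop", "Unreachable", "next-hop")` lets an annotation tool reject combinations that mix planes, for example a Reachability action tagged on a forwarding plane symptom.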
Metadata adds additional context to data. For instance, in networks, the software version of the network node from which Management Plane metrics are obtained is such metadata, as described in [I-D.claise-opsawg-collected-data-manifest]. Semantic Metadata, in turn, describes the meaning or ontology of the annotated data. In this section a YANG model is defined in order to provide a structure for the metadata related to anomalies happening in the network. The module is intended to describe the metadata used to "annotate" the operational data collected from the network nodes, which can include time series data and logs, as well as other forms of data that are "time-bounded". The aspects discussed so far in this document are grouped under the concept of "anomaly", which represents a collection of symptoms. The anomaly has a set of parameters that describe the overall behavior of the network in a given time window, including all the spotted symptoms (network anomalies).¶
Figure 2 contains the YANG tree diagram [RFC8340] of the Figure 1 which augments the [RFC8343] defined ietf-interfaces and the Figure 3.¶
For each symptom, the following parameters have been assigned: a unique ID for identification, a description of the symptom, a list of affected metrics or counters, a start and end time to specify the time window, a confidence score indicating how accurately the symptom was detected, a concern score indicating how critical the symptom is, an annotator indicating whether the symptom was identified by a network expert or an algorithm, and tags with key and value where Action, Reason and Cause can be annotated as described in the previous section.¶
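To make the parameter list concrete, the following sketch models one symptom annotation as a Python data class. It is not the YANG module: all field names, the annotator values and the integer score range of 0 to 100 are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def check_score(value):
    """Validate a score; the 0..100 integer range is an assumption."""
    if not isinstance(value, int) or not 0 <= value <= 100:
        raise ValueError(f"score must be an integer in 0..100, got {value!r}")
    return value


@dataclass
class Symptom:
    """One symptom annotation; field names are hypothetical."""
    id: str
    description: str
    affected_metrics: list
    start_time: datetime
    end_time: datetime
    confidence_score: int  # how accurately the symptom was detected
    concern_score: int     # how critical the symptom is
    annotator: str         # assumed values: "human" or "algorithm"
    tags: dict = field(default_factory=dict)

    def __post_init__(self):
        check_score(self.confidence_score)
        check_score(self.concern_score)
        if self.end_time < self.start_time:
            raise ValueError("end_time must not precede start_time")


# Example annotation using Action, Reason and Cause values from Table 1.
symptom = Symptom(
    id="symptom-1",
    description="Packets dropped because the next hop was unreachable",
    affected_metrics=["dropped-packets"],
    start_time=datetime(2024, 7, 1, 12, 0, tzinfo=timezone.utc),
    end_time=datetime(2024, 7, 1, 12, 5, tzinfo=timezone.utc),
    confidence_score=90,
    concern_score=70,
    annotator="algorithm",
    tags={"action": "Drop", "reason": "Unreachable", "cause": "next-hop"},
)
```

Validating the scores and the time window at construction time means that malformed annotations are rejected before they are exchanged or used as training labels.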
The YANG module has one typedef defining the score and a grouping which can be augmented.¶
The security considerations.¶
This section provides pointers to existing open source implementations of this draft. Note to the RFC-editor: Please remove this before publishing.¶
A tool called Antagonist was implemented during the IETF 119 Hackathon in order to validate the application of the YANG models defined in this draft. Antagonist provides visual support for two important use cases in the scope of this document:¶
The open source code can be found here: [Antagonist]¶
The authors would like to thank xxx for their review and valuable comments.¶