Internet-Draft | Incident Terminology | January 2024 |
Davis & Farrel | Expires 21 July 2024 |
This document sets out some key terms that are fundamental to a common understanding of Incident Management.
The purpose of this document is to bring clarity to discussions and other work related to Incident Management, in particular YANG models and management protocols that report, make visible, or manage incidents.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 21 July 2024.
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Incident Management is an important aspect of network management and control solutions. It deals with the reporting, inspection, correlation, and management of events within the network where those events have a negative effect on the network's ability to forward traffic in an optimal way. Incident Management also includes actions taken to work toward recovery of optimal network behavior.
A number of work efforts within the IETF seek to provide components of an Incident Management system, such as YANG models or management protocols. It is important that a common terminology is used so that there is a clear understanding of how the elements of the management and control solutions fit together, and how the incidents will be handled.
This document sets out some key terms that are fundamental to a common understanding of Incident Management.
The terms are presented below in an order intended to let the reader build understanding by reading from top to bottom.
A component or commodity that can be used in a valuable way in the performance of some activity.
A particular condition that something is in (at a specific time).
A modification to the state of a resource in time.
A particular relevant change.
The state modification in an occurrence.
In contrast to a change, which takes place over a period of time, an event happens at a measurable instant.
An event that has a negative effect, such that something is not as required or desired.
A state regarded as undesirable that needs to be dealt with and overcome.
There is a need to change to a desirable/appropriate state.
Note that there is a historic aspect to this. The current state may be operational, but if an earlier failure remains unexplained, the network is in a state of unexplained recent failure. Although the network has recovered, that state is a problem.
Note that whilst a problem is unresolved it requires attention. A record of a resolved problem may be maintained in a log of history.
Note that the network may be in a state which is considered to be a problem from several perspectives (e.g., there is loss of light causing services to fail). A state change (so that the light recovers) may cause the problem to be resolved from one perspective (the services are now operational) but may still leave the problem as unresolved from another perspective (because the loss of light has not been explained). There can be further developments (the reason for the temporary loss of light is traced to a microbend in the fiber that is repaired) that cause another problem to be resolved. But this leaves a final problem still unresolved (why did the microbend occur in the first place?).
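The loss-of-light example above can be sketched as a small data model in which a single undesirable situation is tracked as several problems, one per perspective, each resolved independently. This is an illustrative sketch only: the class name `Problem`, the helper functions, and the perspective labels are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """One problem as seen from a single perspective (hypothetical model)."""
    perspective: str
    resolved: bool = False


def unresolved(problems):
    """Return the perspectives that still require attention."""
    return [p.perspective for p in problems if not p.resolved]


def resolve(problems, perspective):
    """Mark the problem seen from the given perspective as resolved."""
    for p in problems:
        if p.perspective == perspective:
            p.resolved = True


# Perspectives taken from the loss-of-light example; the labels are illustrative.
problems = [
    Problem("services failed"),
    Problem("loss of light unexplained"),
]

# The light recovers: resolved from the service perspective only.
resolve(problems, "services failed")
after_recovery = unresolved(problems)

# The loss of light is traced to a microbend that is repaired,
# which in turn raises the question of why the microbend occurred.
resolve(problems, "loss of light unexplained")
problems.append(Problem("cause of microbend unknown"))
after_repair = unresolved(problems)

print(after_recovery)
print(after_repair)
```

Resolved entries stay in the list rather than being deleted, which loosely corresponds to keeping a record of resolved problems in a history log.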
The indication of the potential existence of a problem.
Communication of a state change.
May be an alert.
An indication to a human operator highlighting the potential presence of a problem.
The alarm state change is an event.
A state, considered as a problem, that persists for a limited amount of time before becoming resolved without direct action by an operator or control system.
A state that is not maintained, but keeps occurring in some meaningfully short time frame.
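The two definitions above, a short-lived state that clears by itself and a state that keeps recurring, could be distinguished programmatically along the following lines. This is a minimal sketch under stated assumptions: the `Occurrence` record, the `classify` function, and the thresholds `max_duration` and `window` are hypothetical. This document does not quantify "limited amount of time" or "meaningfully short time frame", and treating two occurrences as "keeps occurring" is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    """One period during which the problem state was present (hypothetical)."""
    start: float         # seconds, when the state was entered
    end: float           # seconds, when the state was left
    self_cleared: bool   # True if no operator or control-system action was needed


def classify(occurrences, max_duration=5.0, window=60.0):
    """Label a series of occurrences; both thresholds are assumed values."""
    if not occurrences:
        return "none"
    ordered = sorted(occurrences, key=lambda o: o.start)
    # Recurring: the state returns within `window` seconds of last clearing.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start - prev.end <= window:
            return "intermittent"
    # Short-lived: every occurrence was brief and cleared without direct action.
    if all(o.self_cleared and (o.end - o.start) <= max_duration for o in ordered):
        return "transient"
    return "other"


single_blip = [Occurrence(0.0, 2.0, True)]
flapping = [
    Occurrence(0.0, 2.0, True),
    Occurrence(10.0, 12.0, True),
    Occurrence(20.0, 21.0, True),
]
print(classify(single_blip))
print(classify(flapping))
```

Recurrence is checked first, so a series of brief, self-clearing occurrences that are close together is labelled as recurring rather than as a set of separate short-lived states.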
The activity, event, etc. that gives rise to an (undesired) event, condition, or behavior.
To notice the presence of something (state, activity, form, etc.).
Hence also to notice a change (from the perspective of the viewer).
The state of something with regard to its working order.
Here, this term is used where the state is an issue with operation. For example, "signal degraded" is a condition that indicates an issue with the operation.
This document specifies terminology and has no direct effect on the security of implementations or deployments. However, protocol solutions and management models need to be aware of several aspects:
The exposure of information pertaining to incidents may make available knowledge of the internal workings of a network (in particular its vulnerabilities) that may be of use to an attacker.
Systems that generate management information (messages, notifications, etc.) when incidents occur may be attacked by causing them to generate so much information that the management system is swamped and unable to properly manage the network.
Reporting false information about incidents (or masking reports of incidents) may cause the management system to function incorrectly.
In general, Incident Management will not expose information about end-user activities or user data. The main privacy concern is for a network operator to keep control of all information about incidents to protect their privacy and the details of how they operate their network.
This document makes no requests for IANA action.