Internet-Draft | DTNMA | October 2022 |
Birrane, et al. | Expires 8 April 2023 | [Page] |
This document describes the motivation for, and services required of, the management of devices deployed in a Delay-Tolerant Networking (DTN) environment. Together, this set of information outlines a conceptual DTN Management Architecture (DTNMA) suitable for deployment in any of the challenged and constrained DTN operational environments.¶
The DTNMA is supported by two types of asynchronous behavior. First, the DTNMA does not presuppose any synchronized transport behavior between managed and managing devices. Second, the DTNMA does not support any query-response semantics. In this way, the DTNMA allows for operation in extremely challenging conditions, including over unidirectional links and in cases where delays or disruptions prevent operation over traditional transport layers.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 8 April 2023.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The Delay-Tolerant Networking (DTN) architecture (as described in [RFC4838]) has been designed to cope with data exchange in challenged networks. Just as the DTN architecture requires new capabilities for transport and transport security, special consideration must be given to the management of DTN devices.¶
This document describes the DTN Management Architecture (DTNMA) designed to provide configuration, monitoring, and local control of both application and network services on a managed device operating either within or across a challenged network.¶
The structure of the DTNMA is derived from the unique properties of challenged networks as defined in [RFC7228]. These properties include cases where an end-to-end transport path may not exist at any moment in time and where delivery delays may prevent timely communication between a network operator and a managed device. These challenges may be caused by physical impairments such as long signal propagation delays and frequent link disruptions, or by other factors such as quality-of-service prioritization, service-level agreements, and other consequences of traffic management and scheduling.¶
Device management in these environments must occur without human interactivity, without system-in-the-loop synchronous function, and without requiring a synchronous underlying transport layer. This means that managed devices need to determine their own schedules for data reporting, their own operational configuration, and perform their own error discovery and mitigation. Importantly, these capabilities must be designed and implemented in a way that results in outcomes that are determinable by an outside observer, as such observers may need to connect with a managed device after significant periods of disconnectivity.¶
The desire to define asynchronous and autonomous device management is not new. However, challenged networks (in general) and the DTN environment (in particular) represent unique deployment scenarios and impose unique design constraints. To the extent that these environments differ from more traditional enterprise networks, their management may also differ from the management of enterprise networks. Therefore, existing techniques may need to be adapted to operate in the DTN environment, or new techniques may need to be created.¶
Ultimately, the DTNMA is designed to leverage any transport, network, and security solutions designed for challenged networks. In particular, the DTNMA is designed to be usable in any environment in which the Bundle Protocol version 7 (BPv7) [RFC9171] may be deployed.¶
This document describes the motivation, services, desirable properties, roles/responsibilities, logical data model, and system model that form the DTNMA. These descriptions comprise a concept of operations for management of challenged networks.¶
This document is not a normative standardization of a physical data model or any individual protocol. Instead, it serves as informative guidance to authors and users of such models and protocols.¶
The DTNMA is independent of transport and network layers. It does not, for example, require the use of BP, TCP, or UDP. Similarly, it does not presuppose the use of IPv4 or IPv6.¶
The DTNMA is not bound to a particular security solution and does not presume that transport layers can exchange messages in a timely manner. It is assumed that any network using this architecture supports services such as naming, addressing, routing, and security that are required to communicate DTNMA messages as would be the case with any other messages in the network.¶
While a challenged network may interface with an unchallenged network, this document does not specifically address compatibility with other management approaches.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].¶
The remainder of this document is organized into seven sections.¶
This section describes those design properties that are desirable when defining an architecture that must operate across challenged links in a network. These properties ensure that network management capabilities are retained even as delays and disruptions in the network scale. Ultimately, these properties are the driving design principles for the DTNMA.¶
Early work on the rationale and motivation for specialized management for the DTN architecture was captured in [BIRRANE1], [BIRRANE2], and [BIRRANE3]. Prototyping work done in accordance with the DTN Research Group within the IRTF as documented in [I-D.irtf-dtnrg-dtnmp] provides some of the desirable properties and necessary adaptations for this proposed management system for challenged networks.¶
The unique nature and constraints that characterize challenged networks require the development of new network capabilities to deliver expected network functions. For example, the distinctive constraints of the DTN architecture required the development of BPv7 [RFC9171] for transport functions and the Bundle Protocol Security (BPSec) Extensions [RFC9172] to provide end-to-end security. Similarly, a new approach to network management and the associated capabilities is necessary for operation in these challenged environments and when using these new transport and security mechanisms.¶
This section discusses the characteristics of challenged networks and how they may violate the assumptions made by non-DTNMA approaches about the operating environment.¶
Constrained networks are defined as networks where "some of the characteristics pretty much taken for granted with link layers in common use in the Internet at the time of writing are not attainable." [RFC7228]. This broad definition captures a variety of potential issues relating to physical, technical, and regulatory constraints on message transmission. Constrained networks typically include nodes that regularly reboot or are otherwise turned off for long periods of time, transmit at low or asynchronous bitrates, or have very limited computational resources [RFC7228].¶
Separately, a challenged network is defined as one that "has serious trouble maintaining what an application would today expect of the end-to-end IP model" [RFC7228]. This definition includes networks where there is never simultaneous end-to-end connectivity, when such connectivity is interrupted at planned or unplanned intervals, or when delays exceed those that could be accommodated by IP-based transport. Links in such networks are often unavailable due to attenuation, propagation delays, mobility, occultation, and other limitations imposed by energy and mass considerations.¶
Challenged networks exhibit the following properties that impact the way in which the function of network management is considered. These properties can make the establishment of sessions, synchronous data exchange, and the transmission of larger payloads in these networking environments difficult or impossible.¶
Finally, it is noted that "all challenged networks are constrained networks ... but not all constrained networks are challenged networks ... Delay-Tolerant Networking (DTN) has been designed to cope with challenged networks" [RFC7228].¶
Challenged networks differ from other kinds of constrained networks, in part, in the way that the topology and roles and responsibilities of the network may evolve over time. From the time at which data is generated to the time at which that data is delivered, the topology of the network and the roles assigned to various nodes, devices, and other actors may have changed several times. In certain circumstances, the physical node receiving messages for a given logical destination may have also changed.¶
Challenged networks cannot guarantee that a timely data exchange can be maintained between managing and managed devices. The topological changes characteristic of these networks can impact the path of messages, requiring the transport to wait to establish the incremental connectivity necessary to advance messages along their expected route. The BPv7 transport protocol implements this store-and-forward operation for DTNs.¶
When topological change impacts the semantic roles and responsibilities of nodes in the network, local configuration and autonomy must be present at the node to determine and execute time-variant changes. For example, the BPSec protocol does not encode security destinations and instead requires nodes in a network to identify themselves as security verifiers or acceptors when receiving secured messages.¶
When applied to network management, the semantic roles of Agent and Manager may also change with the evolving topology of the network. Individual nodes must implement desirable behavior without relying on a single configuration oracle or other coordinating function such as an operator-in-the-loop and/or supporting infrastructure. These mechanisms cannot be supported by an asynchronous, challenged network.¶
The support for changing roles implies that there must not be a fixed, predefined relationship between a particular managing device and a particular managed device in a network. A network management architecture for challenged networks must support the association of multiple managing devices with a single managed device, allow "control from" and "reporting to" managing devices to function independently of one another, and allow the logical role of a managing device to be physically shared among assets and to change over time.¶
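The independence of "control from" and "reporting to" associations can be illustrated with a minimal sketch. The class name, fields, and manager labels below are hypothetical and are not defined by the DTNMA; the point is only that the two sets are kept separately, neither is limited to one manager, and membership can change over time.¶

```python
from dataclasses import dataclass, field


@dataclass
class AgentAssociations:
    """Hypothetical per-agent record of manager associations.

    "Control from" and "report to" are kept as independent sets, so
    neither implies the other and neither is limited to a single manager.
    """
    control_from: set[str] = field(default_factory=set)
    report_to: set[str] = field(default_factory=set)

    def accepts_control_from(self, manager: str) -> bool:
        return manager in self.control_from

    def report_destinations(self) -> list[str]:
        # Sorted only so that the output order is deterministic.
        return sorted(self.report_to)


assoc = AgentAssociations()
# Two managers may issue controls to this agent...
assoc.control_from.update({"dm-ground-1", "dm-relay-7"})
# ...while reports go to a different, partially overlapping set.
assoc.report_to.update({"dm-relay-7", "dm-archive"})
# Later, the logical controlling role moves to another physical asset.
assoc.control_from.discard("dm-ground-1")
assoc.control_from.add("dm-ground-2")
```

Here "dm-archive" receives reports without being able to control the agent, and the controlling role has moved from one asset to another without any change to the reporting associations.¶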
Together, this means that a network management architecture suitable for challenged environments must account for certain operational situations.¶
In these and related scenarios, managed devices need to operate with local autonomy because managing devices may not be available within operationally-relevant timeframes. Managing devices deliver instruction sets that govern the local, autonomous behavior of the managed device. These behaviors include (but are not limited to) collecting performance data, state, and error conditions, and applying pre-determined responses to pre-determined events. The goal is asynchronous and autonomous communication between the device being managed and the manager, at times never expecting a reply, and with knowledge that commands and queries may be delivered much later than the initial request.¶
A DTNMA built to support DTN must be agnostic of the underlying physical topology, transport protocols, security solutions, and supporting infrastructure. The DTNMA shall be limited to only the network management protocols, message structure, and information content, including but not limited to the type of objects to manage and the expected behavior and interaction upon access or execution of those objects. There shall be no prescribed association between a manager and an agent other than those defined in the responsibilities associated with each in this document. There should be no limitation on the number of managers that can control an agent, the number of managers that an agent should report to, or any requirement that a manager and agent relationship implies a pair.¶
A model that defines a shared contract between agent and manager has long been an approach to network management solutions. A model is a schema that defines this contract and defines all sources of information that can be retrieved, configured, or executed, as well as the various functions for parameterization, filtering, or event-driven behavior. A model enables concise representation of information, intelligent suffixing, and patterning. The DTNMA model shall be designed with a limited set of object and data types and be organized hierarchically to allow for highly compressible and concise encoding. This allows agents and managers to infer context from the model, keeping link utilization as low as DTNs require.¶
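As an illustration of how a small, fixed set of object types and a hierarchical organization can support concise encoding, the following sketch encodes (namespace, object type, index) identifiers as single bytes and omits the namespace whenever it is the same as the previous identifier's, so the receiver infers it from context. The type codes, the marker byte, and the one-byte field widths are all assumptions made for this example; they are not a DTNMA encoding.¶

```python
from enum import IntEnum


class ObjType(IntEnum):
    # Hypothetical small, fixed set of object types.
    EDD = 0
    VAR = 1
    CTRL = 2
    RPT = 3


NS_MARKER = 0xFF  # never a valid ObjType value, so it cannot be confused with one


def encode_ids(ids):
    """Encode (namespace, type, index) triples as bytes, emitting the
    namespace only when it changes. Assumes every field fits in one byte."""
    out = bytearray()
    prev_ns = None
    for ns, typ, idx in ids:
        if ns != prev_ns:
            out += bytes([NS_MARKER, ns])  # switch namespace context
            prev_ns = ns
        out += bytes([typ, idx])
    return bytes(out)


def decode_ids(data):
    """Inverse of encode_ids: the namespace is inferred from context."""
    ids, ns, i = [], None, 0
    while i < len(data):
        if data[i] == NS_MARKER:
            ns = data[i + 1]
        else:
            ids.append((ns, ObjType(data[i]), data[i + 1]))
        i += 2
    return ids


ids = [(3, ObjType.EDD, 1), (3, ObjType.EDD, 2), (3, ObjType.VAR, 0), (5, ObjType.CTRL, 4)]
encoded = encode_ids(ids)
textual = ",".join(f"/ns{ns}/{typ.name.lower()}/{idx}" for ns, typ, idx in ids)
```

In this example the four identifiers encode to 12 bytes, compared with 44 characters for the equivalent string paths; the savings come from both sides sharing the model rather than from a general-purpose compressor.¶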
Pull management mechanisms require that a Manager send a query to an Agent and then wait for the response to that query. This practice implies a control session between entities and increases the overall message traffic in the network. Challenged networks cannot guarantee that the round-trip data exchange will occur in a timely fashion. In extreme cases, networks may be composed solely of unidirectional links, which drastically increases the amount of time needed for a round-trip data exchange. Therefore, pull mechanisms must be avoided in favor of push mechanisms.¶
Push mechanisms, in this context, refer to the ability of Agents to leverage rule-based criteria to determine when and what information should be sent to Managers. These criteria could be based solely on logic applied to existing variables (VARs) or externally defined data (EDDs), based on operations applied to data elements, or triggered as a function of relative time.¶
Push mechanisms do not require round-trip communications as Managers do not request each reporting instance; Managers need only request once, in advance, that information be produced in accordance with a predetermined schedule or in response to a predefined state on the Agent. In this way information is "pushed" from Agents to Managers and the push is "intelligent" because it is based on some internal evaluation performed by the Agent.¶
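The push model described above can be sketched as an Agent that evaluates Manager-supplied rules against its own clock and state and queues reports when a rule fires. The rule format, field names, and the level-triggered behavior of the state rule are assumptions for illustration only; nothing here is a DTNMA-defined structure.¶

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PushRule:
    """Hypothetical reporting rule that a Manager sends once, in advance."""
    name: str
    trigger: Callable[[int, dict], bool]  # (agent-local time, state) -> fire?
    items: tuple[str, ...]                # state values to include in the report


@dataclass
class PushAgent:
    state: dict
    rules: list[PushRule] = field(default_factory=list)
    outbox: list[dict] = field(default_factory=list)

    def tick(self, now: int) -> None:
        """Evaluate every rule against local time and state, queueing a
        report for each rule that fires. No Manager request is involved."""
        for rule in self.rules:
            if rule.trigger(now, self.state):
                self.outbox.append({
                    "rule": rule.name,
                    "time": now,
                    "values": {k: self.state[k] for k in rule.items},
                })


agent = PushAgent(state={"battery_pct": 80, "bundles_stored": 12})
# Time-based rule: report the stored-bundle count at the top of each hour.
agent.rules.append(PushRule("hourly", lambda t, s: t % 3600 == 0, ("bundles_stored",)))
# State-based rule: report the battery level whenever it is below 20%.
agent.rules.append(PushRule("low-battery", lambda t, s: s["battery_pct"] < 20, ("battery_pct",)))

agent.tick(0)
agent.tick(1800)
agent.state["battery_pct"] = 15
agent.tick(3600)
```

After these three ticks the outbox holds two "hourly" reports and one "low-battery" report, all produced without any query from a Manager.¶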
Protocol designers must balance message size against message processing time at sending and receiving nodes. Verbose representations of data simplify node processing, whereas compact representations require additional activities to generate and parse the compacted message. There is no asynchronous management advantage to minimizing node processing time in a challenged network. However, there is a significant advantage to smaller message sizes in such networks. Compact messages require shorter periods of viable transmission, incur lower retransmission cost, and consume fewer resources when persistently stored en route in the network. A DTN Management Protocol (DTNMP) should minimize PDU size whenever practical, including by packing and unpacking binary data, using variable-length fields, and relying on pre-configured data definitions.¶
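To make the size trade-off concrete, the sketch below compares a self-describing JSON report with a packed form that uses variable-length unsigned integers (LEB128-style: seven data bits per byte, with the high bit marking continuation). The packed form omits field names entirely, which only works because both sides are assumed to share a pre-configured definition of the report's field order. The report contents are invented for the example.¶

```python
import json


def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 data bits per byte, high bit set
    on every byte except the last."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarints(data: bytes) -> list[int]:
    """Decode a concatenation of varints back into integers."""
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


# Hypothetical report; the field order is assumed to be pre-shared.
report = {"rx_bundles": 1024, "tx_bundles": 987, "dropped": 3, "uptime_s": 86400}
verbose = json.dumps(report).encode()
compact = b"".join(encode_uvarint(v) for v in report.values())
```

Here the packed form is 8 bytes against 72 bytes of JSON. The receiver does a little more work to decode it, which is the processing cost the paragraph above argues is worth paying.¶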
Elements within the management system must be uniquely identifiable so that they can be individually manipulated. Identification schemes that are relative to system configuration make data exchange between Agents and Managers difficult as system configurations may change faster than nodes can communicate.¶
Consider the following common technique for approximating an associative array lookup. A manager wishing to do an associative lookup for some key K1 will (1) query a list of array keys from the agent, (2) find the key that matches K1 and infer the index of K1 from the returned key list, and (3) query the discovered index on the agent to retrieve the desired data.¶
Ignoring the inefficiency of two pull requests, this mechanism fails when the Agent changes its key-index mapping between the first and second query. Rather than constructing an artificial mapping from K1 to an index, an Asynchronous Management Protocol (AMP) must provide an absolute mechanism to look up the value of K1 without an intermediate abstraction between the Agent and Manager.¶
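The failure mode of index-relative lookup can be reproduced in a few lines. The agent class, its table, and the reconfiguration step below are hypothetical; the sketch simply shows that an index inferred from an earlier key listing silently returns the wrong value once the Agent's local configuration changes, while lookup by the key itself does not.¶

```python
class TableAgent:
    """Hypothetical agent holding a table that it may reorder at any time."""

    def __init__(self, table: dict[str, int]) -> None:
        self.table = dict(table)

    # Index-relative access: meaning depends on the current configuration.
    def list_keys(self) -> list[str]:
        return list(self.table)

    def get_by_index(self, i: int) -> int:
        return list(self.table.values())[i]

    # Absolute access: the key itself identifies the value.
    def get_by_key(self, key: str) -> int:
        return self.table[key]

    def reconfigure(self) -> None:
        # A local change, e.g. a new entry inserted ahead of existing ones.
        self.table = {"new_sensor": -1, **self.table}


agent = TableAgent({"temp_c": 21, "voltage_mv": 3300})

# Three-step approximation: query keys, infer the index, query the index.
keys = agent.list_keys()
idx = keys.index("voltage_mv")   # index 1 at the time of the first query
agent.reconfigure()              # happens while the second query is in transit
stale = agent.get_by_index(idx)  # now returns the value of temp_c
direct = agent.get_by_key("voltage_mv")
```

In a challenged network the window between the two queries may be hours or days, so the reconfiguration step is not a corner case.¶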
Custom definition of new data from existing data (such as through data fusion, averaging, sampling, or other mechanisms) provides the ability to communicate desired information in as compact a form as possible. Specifically, an Agent should not be required to transmit a large data set for a Manager that only wishes to calculate a smaller, inferred data set. These newly defined data elements could be calculated and used both as parameters for local stimulus-response rule-based criteria and to populate custom reports and tables. Since the identification of custom data sets is likely to occur in the context of a specific network deployment, AMPs must provide a mechanism for their definition.¶
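A minimal sketch of a Manager-defined data element evaluated on the Agent follows. The `DefinedData` class and the sample names are hypothetical; the definition names its inputs and a reduction function, and the Agent computes the result locally so that only the reduced values need to be reported.¶

```python
from statistics import mean
from typing import Callable


class DefinedData:
    """Hypothetical Manager-supplied definition of a new value that the
    Agent computes from data it already holds."""

    def __init__(self, name: str, inputs: list[str], fn: Callable[..., float]) -> None:
        self.name = name
        self.inputs = inputs
        self.fn = fn

    def evaluate(self, samples: dict[str, list[float]]) -> float:
        # Apply the definition to the named local data series.
        return self.fn(*(samples[k] for k in self.inputs))


# Local samples held on the Agent; these are never transmitted.
samples = {"temp_c": [20.0, 21.0, 22.0, 23.0]}

avg_temp = DefinedData("avg_temp", ["temp_c"], mean)
spread = DefinedData("temp_spread", ["temp_c"], lambda xs: max(xs) - min(xs))

# A custom report containing only the inferred values.
report = {d.name: d.evaluate(samples) for d in (avg_temp, spread)}
```

The same `avg_temp` value could equally serve as a parameter to a local rule, for example a condition that fires when the average exceeds a threshold.¶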
Aggregation of controls and custom formatting of reports and tables are equally important. Custom reporting gives the Manager the flexibility to define the format of all information to be sent from Agents over the challenged network, which both saves link capacity and increases the value of the returned information. Aggregation of controls allows a Manager to specify a set of controls to execute, including both the order and the criteria of execution. This aggregate set of controls can be sent as a single command rather than as a series of separate commands. In this case, the output of one control can also serve as an input to the next at the Agent.¶
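Control aggregation can be sketched as an ordered list of (control, criterion) pairs delivered as one unit. In this sketch each control receives the previous control's output, and each criterion is checked against that output before the step runs. The control functions and the stop-on-first-false policy are assumptions chosen for illustration; they are not DTNMA semantics.¶

```python
from typing import Any, Callable


# Hypothetical controls available on the Agent; each takes the previous output.
def read_sensor(_prev: Any) -> float:
    return 42.5


def scale(prev: float) -> float:
    return prev * 2


def store(prev: float) -> str:
    return f"stored {prev}"


def run_macro(steps: list[tuple[Callable[[Any], Any], Callable[[Any], bool]]]) -> list[Any]:
    """Execute controls in order as one aggregate command. Each step has a
    criterion checked against the previous output; execution stops at the
    first step whose criterion is false."""
    outputs, prev = [], None
    for control, criterion in steps:
        if not criterion(prev):
            break
        prev = control(prev)
        outputs.append(prev)
    return outputs


def always(_prev: Any) -> bool:
    return True


macro = [
    (read_sensor, always),
    (scale, always),
    (store, lambda p: p > 50),  # store only if the scaled value exceeds 50
]
results = run_macro(macro)
```

A single transmission thus carries the whole sequence, and the chaining of outputs happens on the Agent without any intermediate round trip.¶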
DTNMA network functions must be achievable using only knowledge local to the Agent. Rather than directly controlling an Agent, a Manager configures an engine of the Agent to take its own action under the appropriate conditions in accordance with the Agent's notion of local state and time.¶
Such an engine may be used for simple automation of predefined tasks or to support semi-autonomous behavior in determining when to run tasks and how to configure or parameterize tasks when they are run. Wholly autonomous operations may be supported where required. Generally, autonomous operations should provide the following benefits.¶
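The following is a minimal sketch of such an engine, assuming a hypothetical rule format with a condition over agent-local time and state, an action, and a firing limit. The limit and the execution log are design choices made here so that an outside observer can later determine what the Agent did while disconnected; they are not prescribed by the DTNMA.¶

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical stimulus-response rule configured by a Manager."""
    name: str
    condition: Callable[[int, dict], bool]  # (agent-local time, state) -> bool
    action: Callable[[dict], None]          # control executed on the Agent
    max_fires: int = 1                      # bounds behavior so it stays predictable
    fired: int = 0


def evaluate(rules: list[Rule], now: int, state: dict, log: list) -> None:
    """Run every eligible rule using only local time and local state."""
    for rule in rules:
        if rule.fired < rule.max_fires and rule.condition(now, state):
            rule.action(state)
            rule.fired += 1
            log.append((now, rule.name))  # record for later observers


def shed_load(state: dict) -> None:
    state["payload_on"] = False


def open_contact_window(state: dict) -> None:
    state["radio_on"] = True


state = {"battery_pct": 50, "payload_on": True, "radio_on": False}
rules = [
    Rule("low-power", lambda t, s: s["battery_pct"] < 25, shed_load),
    Rule("contact", lambda t, s: t >= 1000, open_contact_window),
]
log: list = []
for now, battery in [(0, 50), (500, 20), (1000, 18), (1500, 15)]:
    state["battery_pct"] = battery
    evaluate(rules, now, state, log)
```

No Manager participates after the rules are installed; the Agent's own clock and state decide when each action runs.¶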
Several network management solutions have been developed for both local-area and wide-area networks. Their capabilities range from simple configuration and report generation to complex modeling of device settings, state, and behavior. Each of these approaches is successful in the domains for which it has been built, but not all are equally functional when deployed in a challenged network.¶
Generally, network management solutions that require managing and managed devices to push and pull large sets of data may fail to operate in a challenged (and thus, constrained) environment as a function of transmit power, bitrates, and the ability of the network to store and forward large data volumes over long periods of time.¶
Newer network management approaches are exploring the application of more efficient message-based management, less reliance on end-to-end transport sessions, and increased levels of autonomy on managed devices. These approaches focus on problems different from those described above for challenged networks. For example, much of the autonomous network management work currently undertaken focuses more on well-resourced, unchallenged networks where devices self-configure, self-heal, and self-optimize with other nodes in their vicinity. While an important and transformational capability, such solutions will not be deployable in a challenged network environment.¶
This section describes some of the well-known, standardized protocols for network management and contrasts their purposes with the needs of challenged network management solutions.¶
Early network management tools designed for unchallenged networks provide synchronous mechanisms for communicating locally-collected data from devices to operators. Applications are managed using a "pull" mechanism, requiring a managing device to explicitly request the data to be produced and transmitted by a managed device.¶
The de facto example of this architecture is the Simple Network Management Protocol (SNMP) [RFC3416]. SNMP utilizes a request/response model to set and retrieve data values such as host identifiers, link utilizations, error rates, and counters between application software on managing and managed devices. Data may be directly sampled or consolidated into representative statistics. Additionally, SNMP supports a model for unidirectional push notification messages, called traps, based on predefined triggering events.¶
SNMP managing devices can query agents for status information, send new configurations, and request to be informed when specific events have occurred. Traps and queryable data are defined in data models known as Management Information Bases (MIBs), which define the information for a particular data standard, protocol, device, or application.¶
While there is a large installation base for SNMP, several aspects of the protocol make it inappropriate for use in a challenged network. SNMP relies on sessions with low round-trip latency to support its "pull" model, and challenged networks cannot maintain such sessions. Complex management can be achieved, but only through careful orchestration of real-time, end-to-end, managing-device-generated query-and-response logic that is not possible in challenged networks.¶
The SNMP trap model provides some low-fidelity Agent-side processing. Traps are typically used for alerting purposes, as they do not support an agent response to the event occurrence. In a challenged network where the delay between a managing device receiving an alert and sending a response can be significant, the SNMP trap model is insufficient for event handling.¶
Adaptive modifications to SNMP to support challenged networks and more complex application-level management would alter the basic function of the protocol (data models, control flows, and syntax) so as to be functionally incompatible with existing SNMP installations. This approach is therefore not suitable for use in challenged networks.¶
Yet Another Next Generation (YANG) [RFC6020] is a data modeling language used to model configuration and state data of managed devices and applications. The YANG model defines a schema for organizing and accessing a device's configuration or operational information. Once a model is developed, it is loaded onto both the client and server, and serves as a contract between the two. A YANG model can be complex, describing many containers of managed elements, each providing methods for device configuration or reporting of operational state.¶
YANG supports the definition of parameterized Remote Procedure Calls (RPCs) to be executed on managed nodes, as well as the definition of push notifications within the model. RPCs are used to execute commands on a device, generating an expected, structured response. However, RPCs are executed only when issued by the client. Commands are executed immediately and sequentially as they are received by the server, and there is no method to autonomously execute RPCs triggered by specific events or conditions.¶
YANG defines the schema for data used by network management protocols such as NETCONF [RFC6241], RESTCONF [RFC8040], and CORECONF [I-D.ietf-core-comi]. These protocols provide the mechanisms to install, manipulate, and delete the configuration of network devices.¶
NETCONF is a stateful, XML-based protocol that provides an RPC syntax to retrieve, edit, copy, or delete any data nodes or exposed functionality on the server. It requires that underlying transport protocols support long-lived, reliable, low-latency, sequenced data delivery sessions. NETCONF connections are required to provide authentication, data integrity, confidentiality, and replay protection through secure transport protocols such as SSH or TLS. A bi-directional NETCONF session must be established before any data transfer can occur.¶
NETCONF uses verbose XML files to provide the ability to update and fetch multiple data elements simultaneously. These XML files are not easily or efficiently compressed, which is an important consideration for challenged networks.¶
RESTCONF is a stateless RESTful protocol based on HTTP. RESTCONF configures or retrieves individual data elements or containers within YANG data models by passing JSON over REST. This JSON encoding is used to GET, POST, PUT, PATCH, or DELETE data nodes within YANG modules. RESTCONF requires the use of a secure transport such as TLS.¶
Unlike NETCONF, RESTCONF is stateless. However, the transfer of large data sets, such as configuration changes of many data elements, or the collection of information, depends greatly on the support of synchronous communication.¶
CORECONF is stateless, as RESTCONF is, and is built atop the Constrained Application Protocol (CoAP) [RFC7252], which defines a messaging construct developed to operate specifically on constrained devices and networks by limiting message size and fragmentation. CORECONF requires the use of DTLS or Object Security for Constrained RESTful Environments (OSCORE) [RFC8613] to fulfill its security requirements. CoAP supports a store-and-forward operation similar to DTN; however, it operates strictly at the application layer and requires the specification of predetermined proxies and moments of bi-directional communication.¶
CORECONF leverages the Concise Binary Object Representation (CBOR) [RFC8949] of YANG modules [I-D.ietf-core-yang-cbor] and provides further compressibility through the use of YANG Schema Item iDentifiers (SIDs) [I-D.ietf-core-sid]. While these design choices offer reductions in encoded data size, data compressibility is still dependent on underlying transport protocols and limited by the organization of the YANG schema.¶
Subscriptions to YANG notifications [RFC8639] and YANG-Push updates [RFC8641] are promising for challenged network management. In this model, a client may subscribe to the delivery of specific containers or data nodes defined in the model, on either a periodic or an "on change" basis. The notification events can be filtered using XPath [xpath] or subtree [RFC6241] filtering, as described in [RFC8639], Section 2.2.¶
While the YANG model provides great flexibility for configuring a homogeneous network of devices, it becomes a burden in challenged networks where concise encoding is necessary. The YANG schema provides flexibility in the organization of data to the model developer. The YANG schema supports a broad range of data types noted in [RFC6991]. All the data nodes within a YANG model are referenced by a verbose, string-based path of the module, sub-module, container, and any data nodes such as lists, leaf-lists, or leaves, without any explicit hierarchical organization based on data or object type.¶
Recent efforts for compression of the YANG model have used CBOR [RFC9254] and SIDs [I-D.ietf-core-sid] to address YANG data nodes through integer identifiers. However, these compression strategies lack a formal hierarchical structure. The manual mapping of SIDs to YANG modules and data nodes limits the portability of these models and further increases the size of any encoding scheme.¶
While the protocols described above are useful and well suited to different applications and networking environments, they do not meet the requirements for the management of challenged networks. However, this does not prevent features of each from contributing to the design of the DTNMA.¶
The concept of a data model for describing network configuration elements has been used by many protocols to ensure compliance between managing and managed devices. A data model provides error checking and bounds operations, which is necessary when controlling mission critical devices.¶
The SNMP MIBs provide well-organized, hierarchical OIDs which support the compressibility necessary for challenged DTNs. YANG, NETCONF, and RESTCONF support notification abilities needed for DTN network management, but have limited features for describing autonomous execution and behavior.¶
CORECONF provides CBOR encoding and concise reference abilities using SIDs, but lacks a hierarchical structure and an authoritative plan for SID allocation. While this approach will become too verbose and prove limiting in the future, the encoding considerations from CORECONF can be used to inform the design of the DTNMA.¶
This subsection identifies and defines the DTNMA services provided to network and mission operators.¶
The future of network operations requires more autonomous behavior, including self-configuration, self-management, self-healing, and self-optimization. One approach to support this is termed Autonomic Networking [RFC7575], which includes recent efforts describing an autonomic architecture and protocols [RFC8993] as well as analyses of the gaps between traditional and Autonomic Networking approaches [RFC7576]. Challenged networks require similar degrees of autonomy; however, they cannot depend on the complex coordination between nodes, or on the centralized and distributed supporting infrastructure, that Autonomic Networking proposes.¶
Policy-based management is a well-established approach that uses business and operations support systems to monitor and manage devices and networks in real-time. These systems leverage various, existing network management protocols and their supporting features, such as the use of YANG module classification types [RFC8199], to describe abstract services and support configuration of service level agreements. These services can then enact additional control over devices using network element modules. This approach is quite comprehensive but requires sufficient, supporting infrastructure and synchronous access, which cannot be provided by challenged networks.¶
The DTNMA is designed with consideration for the constraints discussed in Section 3.1.1. The DTNMA seeks to incorporate existing network management protocols and features where possible. However, there are core capabilities that the DTNMA must provide in order to serve a challenged network that are not supported by these approaches.¶
The DTNMA proposes a data model that is designed for the compression required in a challenged network. The efficiency of data encoding is limited by the efficiency of the underlying data model. For this reason, naming schemes for the DTNMA must be hierarchical and patternable, supporting the level of compressibility needed by the resource-constrained devices that form a challenged network.¶
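One benefit of a patternable, hierarchical naming scheme is that a short pattern can stand in for many identifiers. The sketch below assumes hypothetical (namespace, object type, index) integer identifiers and a wildcard convention in which `None` matches any value at that level; neither the identifier layout nor the type codes come from the DTNMA.¶

```python
from typing import Optional

# Hypothetical identifiers: (namespace, object_type, index), all small integers.
# Object type codes assumed here: 0 = EDD, 1 = VAR, 2 = CTRL.
CATALOG = [(3, 0, 1), (3, 0, 2), (3, 1, 0), (5, 0, 1), (5, 2, 4)]


def matches(pattern: tuple[Optional[int], Optional[int], Optional[int]],
            ident: tuple[int, int, int]) -> bool:
    """None at a level matches any value at that level of the hierarchy."""
    return all(p is None or p == v for p, v in zip(pattern, ident))


def select(pattern: tuple[Optional[int], Optional[int], Optional[int]]) -> list[tuple[int, int, int]]:
    """Return every catalogued identifier selected by the pattern."""
    return [i for i in CATALOG if matches(pattern, i)]


all_edds_in_ns3 = select((3, 0, None))
edd1_in_any_ns = select((None, 0, 1))
everything_in_ns5 = select((5, None, None))
```

A flat or manually assigned numbering offers no such structure to match against, so each identifier would have to be listed individually.¶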
Autonomous behavior is required for the management of a DTN, which is characterized by link delays and disruptions. The constrained autonomy model of the DTNMA provides the deterministic management necessary for managed devices to detect and respond to events without intervention from an in-the-loop managing device. The separation of remote and local, autonomous managing devices supports autonomous behavior even when synchronization is not feasible.¶
The sections below describe the desirable features of the DTNMA and build from existing protocols and mechanisms where possible, with adaptations made for the challenged networking environment.¶
This section describes a network management concept for challenged networks (generally) and for those conforming to the DTN architecture (in particular). The goal of this section is to describe how DTNMA services provide the desirable properties of the DTNMA.¶
Similar to other network management architectures, the DTNMA draws a logical distinction between a managed device and a managing device. Managed devices use a DTNMA Agent (DA) to manage resident applications. Managing devices use a DTNMA Manager (DM) to both monitor and control DAs.¶
The DTNMA differs from some other management architectures in three significant ways, all related to the need for a device to self-manage when disconnected from a managing device.¶
There are a multitude of ways in which both existing and emerging network management protocols, APIs, and applications can be integrated for use in challenged environments. However, expressing the needed behaviors of the DTNMA in the context of any of these pre-existing elements risks conflating systems requirements, operational assumptions, and implementation design constraints.¶
One way to avoid such conflation is to, instead, develop a reference model that can be used to reason about a system independent of implementation. Such a DTNMA reference model is provided in Figure 1 below.¶
DTNMA Reference Model¶
In this reference model, applications and services on a managing device communicate with a DTNMA Manager (DM) which uses pre-shared definitions to create a set of directives that can be sent to a managed device's DTNMA Agent (DA). The DA provides local monitoring and control of the applications and services resident on the managed device. The DA also performs local data fusion as necessary to synthesize data products (such as reports) that can be sent back to the DM when appropriate.¶
This model preserves the familiar concept of "managers" resident on managing devices and "agents" resident on managed devices. However, the DTNMA model is unique in how the DM and DA operate. The DM is used to pre-configure DAs in the network with management policies, and the DAs themselves are then expected to perform monitoring and control functions on their own. In this way, a properly configured DA may operate without a timely, reliable connection back to a DM.¶
The reference model illustrated in Figure 1 implies the existence of certain logical elements whose roles and responsibilities are discussed in this section.¶
By definition, managed applications and services reside on a managed device. These software entities can be controlled through some interface by the DA and their state can be sampled as part of periodic monitoring. It is presumed that the DA on the managed device has the proper data model, control interface, and permissions to alter the configuration and behavior of these software applications.¶
A DTNMA Agent resides on a managed device. As is the case with other network management approaches, this agent is responsible for the monitoring and control of the applications local to that device. Unlike other network management approaches, the agent accomplishes this task without a regular connection to a DTNMA Manager.¶
The DTNMA Agent performs three major functions on a managed device: the monitoring and control of local applications, production of data analytics, and the administrative control of the agent itself.¶
DTNMA Agents monitor the status of applications running on their managed device and selectively control those applications as a function of that monitoring. The following components are used to perform monitoring and control on an agent.¶
DTNMA Agents generate new data elements as a function of the current state of the managed device and its applications. These new data products may take the form of individual data values, or new collections of data used for reporting. The logical components responsible for these behaviors are as follows.¶
Agents in the DTNMA must perform a variety of administrative services in support of their configuration. The most significant of these administrative services are as follows.¶
Managing applications and services reside on a managing device and serve as both the source of DA policy statements and the target of DA reporting. They may operate with or without an operator in the loop.¶
Unlike management applications in unchallenged networks, these applications cannot exert closed-loop control over any managed device application. Instead, these applications must be built to exercise open-loop control by producing policies that can be configured and enforced on managed devices by DAs.¶
A DTNMA Manager resides on a managing device. This manager provides an interface between various managing applications and services and the DTNMA Agents that enforce their policies. In providing this interface, DMs translate between whatever native interface exists to various managing applications and the autonomy models used to encode management policy.¶
The DTNMA Manager performs three major functions on a managing device: policy encoding, reporting, and administration.¶
DTNMA Managers translate policy directives from managing applications and services into standardized policy expressions that can be recognized by DTNMA Agents. The following logical components are used to perform this policy encoding.¶
DTNMA Managers receive reports on the status of managed devices during periods of connectivity with the DTNMA Agents on those devices. The following logical components are needed to implement reporting capabilities on a manager.¶
Managers in the DTNMA must perform a variety of administrative services in support of their proper configuration and operation. This includes the following logical components.¶
A consequence of operating in a challenged environment is the potential inability to negotiate information in real-time. For this reason, the DTNMA requires that managed and managing devices operate using pre-shared definitions rather than relying on data definition negotiation.¶
The three types of pre-shared definitions in the DTNMA are the DTNMA Agent autonomy model, managed application data models, and any runtime data shared by managers and agents.¶
A DTNMA autonomy model represents the data elements and associated autonomy structures that define the behavior of the agent autonomy engine. A standardized autonomy model allows individual implementations of DTNMA Agents and DTNMA Managers to interoperate. A standardized model also provides guidance to the design and implementation of both managed and managing applications.¶
This section provides a description of the services provided by DTNMA elements on both managing and managed devices. These service descriptions differ from other management descriptions because of the unique characteristics of the DTNMA operating environment.¶
DTNMA monitoring is associated with the agent autonomy engine. The term monitoring implies timely and regular access to information such that state changes may be acted upon within some response time period. Within the DTNMA, connections between a managed and managing device cannot provide such access and, thus, monitoring functions must be handled on the managed device.¶
Predicate autonomy on a managed device should collect state associated with the device at regular intervals and evaluate that collected state for any changes that require a preventative or corrective action. Similarly, this monitoring may cause the device to generate one or more reports destined for the managing device.¶
Similar to monitoring, DTNMA control results in actions by the agent to change the state or behavior of the managed device. All control in the DTNMA is local control. In cases where there exists a timely connection to a manager, received controls are still run through the autonomy engine. In this case, the stimulus is the direct receipt of the control and the response is to immediately run the control. In this way, there is never a dependency on a session or other stateful exchange with any remote entity.¶
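The principle that a control received from a manager is still handled as a stimulus and response inside the autonomy engine can be sketched as follows. This is a minimal Python sketch under stated assumptions: the `AutonomyEngine` class, its `on_control_received` and `step` methods, and the use of a zero-argument callable to stand in for a control are hypothetical and not part of any DTNMA specification.

```python
from collections import deque
from typing import Callable, Deque, Dict

# Hypothetical representation: a control is a zero-argument callable.
Control = Callable[[], None]


class AutonomyEngine:
    """Sketch of an agent engine in which all control is local control."""

    def __init__(self) -> None:
        # Controls awaiting execution; receipt of a control is the stimulus.
        self._pending: Deque[Control] = deque()

    def on_control_received(self, ctrl: Control) -> None:
        # Only the stimulus is recorded. Nothing is returned to the sender,
        # so no session or stateful exchange with the manager is created.
        self._pending.append(ctrl)

    def step(self) -> int:
        # The configured response: run each received control immediately,
        # in arrival order. Returns how many controls were run.
        count = 0
        while self._pending:
            ctrl = self._pending.popleft()
            ctrl()
            count += 1
        return count


# Usage: a manager-sent control that changes a hypothetical local setting.
settings: Dict[str, str] = {"radio_power": "low"}
engine = AutonomyEngine()
engine.on_control_received(lambda: settings.update(radio_power="high"))
ran = engine.step()
```

Because `on_control_received` returns nothing to the caller, the sketch cannot accidentally introduce a request/response dependency on the manager.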
DTNMA Fusion services produce new data products from existing state on the managed device. These fusion products can be anything from simple summations of sampled counters to complex calculations of behavior over time.¶
Fusion is an important service in the DTNMA because fusion products are part of the overall state of a managed device. Complete knowledge of this overall state is important for the management of the device, particularly in a stimulus-response system whose stimuli are evaluated against this state.¶
While some fusion is performed in any management system, the DTNMA requires fusion to occur on the managed device itself. If the network is partitioned such that no connection to a managing device is available, fusion must happen locally. Similarly, connections to a managing device might not remain active long enough for round-trip data exchange or may not have the bandwidth to send all sampled data.¶
DTNMA configuration services must update the local configuration of a managed device with the intent to impact the behavior and capabilities of that device. The change of device configurations is a common service provided by many network management systems. The DTNMA has a unique approach to configuration for the following reasons.¶
The DTNMA configuration service is unique in that the selection of managed device configurations must occur, itself, as a function of the state of the device. This implies that management proxies on the device store multiple configuration functions that can be applied as needed without consultation from a managing device.¶
When detecting stimuli, the agent autonomy engine must support a mechanism for evaluating whether application monitoring data or runtime data values are recent enough to indicate a change of state. In cases where data has not been updated recently, it may be considered stale and should not be relied upon to indicate that some stimulus has occurred.¶
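One way an agent could apply such a freshness check is to carry an update timestamp with every sampled value and refuse to use values older than a configured maximum age. The `SampledValue` type, the `fresh_value` helper, and the `max_age_s` threshold below are hypothetical; the DTNMA does not prescribe how staleness is measured.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampledValue:
    """A monitored value with the time it was last updated (hypothetical)."""
    value: float
    updated_at: float  # seconds, taken from the agent's local clock


def fresh_value(sample: SampledValue, max_age_s: float,
                now: Optional[float] = None) -> Optional[float]:
    """Return the value if it is recent enough to indicate state, else None.

    A None result means the sample is stale and should not be used to
    conclude that a stimulus has occurred.
    """
    if now is None:
        now = time.monotonic()
    if now - sample.updated_at > max_age_s:
        return None
    return sample.value


# Usage with explicit timestamps so the outcome is deterministic.
temp = SampledValue(value=41.5, updated_at=100.0)
recent = fresh_value(temp, max_age_s=5.0, now=103.0)  # 3 s old: usable
stale = fresh_value(temp, max_age_s=5.0, now=110.0)   # 10 s old: stale
```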
DTNMA reporting services collect information known to the managed device and prepare it for eventual transmission to one or more managing devices. The creation of these reports is intelligent in that the contents and frequency of reporting are determined by the state of the managed device, independent of the managing device.¶
Once generated, it is expected that reports might be queued pending a connection back to a managing device. Therefore, reports must be differentiable as a function of the time they were generated.¶
When reports are sent to a managing device over a challenged network, they may arrive out of order because they took different paths through the network or were delayed by retransmissions. A managing device should not infer meaning from the order in which reports are received, nor should a given report be associated with a specific control or autonomy action on a given managed device.¶
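A manager can honor both requirements by reconstructing a per-agent timeline from the generation time stamped on each report rather than from arrival order. The `Report` type and `order_by_generation` helper are hypothetical, and the sketch assumes each agent assigns a generation timestamp from its own clock.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Report:
    """A received report, stamped by the agent when it was generated."""
    agent: str
    generated_at: float  # agent-assigned generation time (seconds)
    values: tuple


def order_by_generation(received: List[Report]) -> List[Report]:
    """Order reports per agent by generation time, ignoring arrival order.

    Arrival order carries no meaning in a challenged network, so the
    manager relies only on the timestamps the agent applied.
    """
    return sorted(received, key=lambda r: (r.agent, r.generated_at))


# Reports from one agent arriving out of order (e.g., via different paths).
arrivals = [
    Report("agentB", 12.0, (7,)),
    Report("agentB", 10.0, (5,)),
    Report("agentB", 11.0, (6,)),
]
timeline = order_by_generation(arrivals)
```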
Both local and remote services provided by the DTNMA affect the behavior of multiple applications on a managed device and may interface with multiple managing devices. It is expected that transport protocols used in any DTNMA implementation support security services such as integrity and confidentiality.¶
Authorization services enforce the potentially complex mapping of other DTNMA services amongst managed and managing devices in the network. For example, fine-grained access control can determine which managing devices receive which reports, and what controls can be used to alter which managed applications.¶
This is particularly beneficial in networks that either deal with multiple administrative entities or overlay networks that cross administrative boundaries. Whitelists, blacklists, key-based infrastructures, or other schemes may be used for this purpose.¶
An important characteristic of the DTNMA is the shift in the role of a managing device. In the DTNMA, managers configure the autonomy engines on agents, and it is the agents that provide local device management. One way to describe the behavior of the agent autonomy engine is to describe the characteristics of the autonomy model it implements.¶
This section describes a logical autonomy model in terms of the abstract data elements that would comprise the model. Defining abstract data elements allows for an unambiguous discussion of the behavior of an autonomy model without mandating a particular design, encoding, or transport associated with that model.¶
Managing autonomy on a potentially disconnected device must behave in both an expressive and deterministic way. Expressivity allows for the model to be configured for a wide range of future situations. Determinism allows for the forensic reconstruction of device behavior as part of debugging or recovery efforts.¶
The DTNMA autonomy model is built on a stimulus-response model in which the autonomy system responds to pre-identified stimuli with pre-configured responses. Stimuli are identified using simple predicate logic that examines aspects of the state of the managed device. Responses are implemented by running one or more procedures on the managed device.¶
As with many such systems, behavior can be captured using the construct:¶
IF stimulus THEN response¶
DTNMA Autonomy Model¶
The flow of data into and out of the agent autonomy engine is illustrated in Figure 2. In this model, the autonomy engine stores each combination of stimulus conditions and associated responses as a "rule" in a rules database. This database is updated both through the execution of the autonomy engine and through policy statements received from managers.¶
Stimuli are detected by examining the state of applications as reported through application monitoring interfaces and through any locally-derived data. Local data is calculated in accordance with definitions also provided by managers as part of the runtime data store.¶
Responses to stimuli take the form of updates to the rules database, updates to the runtime data store, controls sent to applications, and the generation of reports.¶
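The data flow around the rules database can be sketched as a small evaluation loop. In this minimal Python sketch, the `Rule` and `Engine` types, the `evaluate` method, and the overheat rule are hypothetical; the response shown exercises three of the four response kinds listed above (runtime data update, report generation, and rules database update), with application controls omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Rule:
    """One entry in the rules database: IF stimulus THEN response."""
    name: str
    stimulus: Callable[[State], bool]
    response: Callable[["Engine", State], None]


@dataclass
class Engine:
    rules: List[Rule] = field(default_factory=list)     # rules database
    runtime: State = field(default_factory=dict)        # runtime data store
    reports: List[State] = field(default_factory=list)  # queued reports

    def evaluate(self, monitored: State) -> List[str]:
        """Check each rule against monitored and locally derived data."""
        state = {**monitored, **self.runtime}
        fired = []
        for rule in list(self.rules):  # copy: responses may edit the rules
            if rule.stimulus(state):
                rule.response(self, state)
                fired.append(rule.name)
        return fired


def on_overheat(eng: Engine, state: State) -> None:
    # Response: update the runtime data store, generate a report, and
    # update the rules database (here, by disabling this rule).
    eng.runtime["overheat_count"] = eng.runtime.get("overheat_count", 0) + 1
    eng.reports.append({"temp": state["temp"]})
    eng.rules = [r for r in eng.rules if r.name != "overheat"]


rules_engine = Engine()
rules_engine.rules.append(Rule("overheat", lambda s: s["temp"] > 80.0, on_overheat))
first = rules_engine.evaluate({"temp": 85.0})
second = rules_engine.evaluate({"temp": 90.0})
```

The second evaluation fires nothing because the first response removed the rule from the database.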
There are a number of ways to represent data values, and many data modeling languages exist for this purpose. When considering how to model data in the context of the DTNMA autonomy model there are some modeling features that should be present to enable functionality. There are also some modeling features that should be prevented to avoid ambiguity.¶
Traditional network management approaches favor flexibility in their data models. The DTNMA stresses deterministic behavior that supports forensic analysis of agent activities "after the fact". As such, the following statements should be true of all data representations relating to DTNMA autonomy.¶
The expressive representation of data values is fundamental to the successful construction and evaluation of predicates in the DTNMA autonomy model. This section describes the characteristics of data representation for this model, both as individual data values and ways to aggregate these values into collections.¶
There is a useful distinction that can be made regarding the way in which data values are assigned in the context of an autonomy system. This section discusses four categories of assignment strategies and proposes mnemonics to differentiate each.¶
The four categories of value assignment can be derived by determining whether values are defined internally or externally to the autonomy model and whether, once defined, these values can be changed.¶
| | Internally Defined | Externally Defined |
|---|---|---|
| Immutable | CONST | LIT |
| Mutable | VAR | EDD |
Constants (CONST) - Constant data values are named values that are defined in the context of the autonomy model. Both the name and the value of the constant are fixed and cannot be changed. An example of a constant would be defining the numerical value of PI to two decimal places (PI_2_DIGITS = 3.14).¶
Literals (LIT) - Literal data values are those whose name and value are the same. These values are used to represent atomic values that are too simple to be represented as a constant. For example, the number 4 is a literal value: the name "4" and the value 4 are the same and inseparable. Literal values cannot change ("4" could not be used to mean 5) and they are defined externally to the autonomy model (the autonomy model is not expected to redefine what 4 means).¶
Variables (VAR) - Variables are named data values defined by the autonomy model itself. They can be added and removed as part of the operation of the autonomy model, and the autonomy model is the sole determiner of their value. An example of a variable in an autonomy model would be the number of times that a particular predicate evaluated to true.¶
Externally-Defined Data (EDD) - External data values are those provided to the autonomy model from its hosting environment. These values are the foundation of state-based autonomy as they capture the state of the managed device. The autonomy model treats these values as read-only inputs. Examples of externally defined values include temperature sensor readings and the instantaneous data rate from a radio.¶
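The four assignment categories can be illustrated with a small value store that enforces who may set what. In this sketch, the `ValueStore` class and its `define_const`, `set_var`, and `get` methods are hypothetical, and EDDs are modeled as read-only callables supplied by the hosting environment; this is an illustration of the categories, not a prescribed design.

```python
import math
from typing import Callable, Dict


class ValueStore:
    """Sketch of the four value-assignment categories (hypothetical API)."""

    def __init__(self, edd_sources: Dict[str, Callable[[], float]]) -> None:
        self._consts: Dict[str, float] = {}
        self._vars: Dict[str, float] = {}
        # EDD: supplied by the hosting environment, read-only to the model.
        self._edds = dict(edd_sources)

    def define_const(self, name: str, value: float) -> None:
        # CONST: named by the model and fixed once defined.
        if name in self._consts:
            raise ValueError(f"CONST {name} cannot be redefined")
        self._consts[name] = value

    def set_var(self, name: str, value: float) -> None:
        # VAR: named, created, and updated by the autonomy model itself.
        self._vars[name] = value

    def get(self, name: str) -> float:
        if name in self._consts:
            return self._consts[name]
        if name in self._vars:
            return self._vars[name]
        if name in self._edds:
            return self._edds[name]()  # sampled from the device on each read
        raise KeyError(name)


# LIT values need no entry in the store: the literal 1 is simply 1.
store = ValueStore({"radio_rate_bps": lambda: 9600.0})
store.define_const("PI_2_DIGITS", round(math.pi, 2))
store.set_var("pred_true_count", 0)
store.set_var("pred_true_count", store.get("pred_true_count") + 1)
```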
The DTNMA autonomy model should, as required, report on the state of its managed device (to include the state of the model itself). This reporting should be done as a function of the changing state of the managed device, independent of the connection to any managing device. Queuing reports allows for later forensic analysis of device behavior, which is a desirable property of DTNMA management.¶
There are at least four useful categories of reporting mechanism that should be present in the DTNMA. These categories can be distinguished by whether the reported data share a common structure and whether the reporting mechanism represents a schema or data adhering to that schema.¶
| | Schema | Values |
|---|---|---|
| Common Structure | TBLT | TBL |
| Mixed Structure | RPTT | RPT |
Relational database tables provide collection, filtering, and reporting efficiencies when representing series of data collections that share a common syntactic structure and semantic meaning. Tables have a fixed structure defined by one or more columns. They are populated by zero or more data collections, with one row per represented data collection.¶
To the extent that DTNMA reporting includes data collections similarly adhering to a common structure, these reports can be modeled similarly to tables. Such reports are called tabular reports (TBLs).¶
Every TBL is populated in accordance with a pre-defined schema, termed the Tabular Report Template (TBLT). This template defines the columns that comprise the TBL and the associated constraints on data values for those columns.¶
Unlike relational database tables, TBLs are reporting mechanisms: each TBL represents a report generated at a specific moment in time. Therefore, a managed device may produce, and queue for transmission, multiple TBLs for the same TBLT.¶
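The relationship between one TBLT and the many TBLs generated from it can be sketched as a schema object plus time-stamped row collections validated against it. The `TableTemplate` and `TabularReport` types, the column-type check, and the "links" table are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(frozen=True)
class TableTemplate:
    """TBLT: the schema shared by all TBLs built from it (hypothetical)."""
    name: str
    columns: Tuple[Tuple[str, type], ...]  # (column name, value type)


@dataclass
class TabularReport:
    """TBL: rows conforming to a TBLT, captured at one moment in time."""
    template: TableTemplate
    generated_at: float
    rows: List[Tuple[Any, ...]] = field(default_factory=list)

    def add_row(self, *values: Any) -> None:
        cols = self.template.columns
        if len(values) != len(cols):
            raise ValueError("row does not match template column count")
        for (col, typ), val in zip(cols, values):
            if not isinstance(val, typ):
                raise TypeError(f"column {col!r} expects {typ.__name__}")
        self.rows.append(tuple(values))


# One TBLT describing neighbor links; two TBLs generated at different times.
links = TableTemplate("links", (("peer", str), ("up", bool), ("bytes", int)))
tbl_a = TabularReport(links, generated_at=1000.0)
tbl_a.add_row("node-2", True, 5120)
tbl_b = TabularReport(links, generated_at=1060.0)
tbl_b.add_row("node-2", False, 5120)
tbl_b.add_row("node-3", True, 42)
```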
Not all reportable data collections are efficiently represented in a tabular structure. In cases where there is no processing or encoding advantage to a tabular report, a non-tabular representation is needed. This representation is termed the DTNMA report (RPT).¶
An RPT is a snapshot of a collection of data values at a given moment in time. The type, number, order, and other details of these data values are given by a schema called the Report Template (RPTT).¶
Separating the structure (RPTT) and content (RPT) of a general purpose reporting mechanism reduces the size of generated traffic, which is an important property of the DTNMA.¶
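The size benefit of separating an RPTT from its RPTs can be illustrated by comparing a report that carries only a template identifier and ordered values against one that repeats every item name. JSON is used here purely for readability; the DTNMA does not mandate it, and the `ReportTemplate` type and `encode_rpt`/`decode_rpt` helpers are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class ReportTemplate:
    """RPTT: the ordered list of items a report carries (hypothetical)."""
    template_id: int
    items: Tuple[str, ...]


def encode_rpt(tmpl: ReportTemplate, values: Dict[str, Any], t: float) -> str:
    # RPT: only the template id, generation time, and values in template
    # order; item names are recovered from the pre-shared RPTT.
    return json.dumps([tmpl.template_id, t, [values[i] for i in tmpl.items]],
                      separators=(",", ":"))


def decode_rpt(templates: Dict[int, ReportTemplate], wire: str) -> Dict[str, Any]:
    tid, t, vals = json.loads(wire)
    tmpl = templates[tid]
    return {"generated_at": t, **dict(zip(tmpl.items, vals))}


health = ReportTemplate(7, ("battery_pct", "temp_c", "queue_depth"))
sample = {"battery_pct": 88, "temp_c": 21.5, "queue_depth": 3}
compact = encode_rpt(health, sample, 1000.0)
# For comparison, a self-describing encoding that repeats every name.
verbose = json.dumps({"generated_at": 1000.0, **sample}, separators=(",", ":"))
decoded = decode_rpt({7: health}, compact)
```

The saving grows with the number of items and reports, since the names are sent once (as part of the RPTT) rather than in every RPT.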
The agent autonomy engine requires that managed devices issue commands on themselves as if they were otherwise being controlled by a managing device. The ability to support this type of commanding in the autonomy model is one of the unique requirements of the DTNMA. This approach is not dissimilar to the concept of Remote Procedure Calls (RPCs) that are sometimes used in low-latency, high-availability approaches to network management.¶
Command execution in the DTNMA happens through the use of controls and macros.¶
Controls (CTRL) - A control represents a parameterized, predefined procedure that is run by the agent autonomy engine. CTRLs are conceptually similar to RPCs in that they represent parameterized functions run on the managed device. However, they are conceptually dissimilar from RPCs in that they do not have a concept of a return code as they must operate over an asynchronous transport. The concept of return code in an RPC implies a synchronous relationship between the caller of the procedure and the procedure being called, which might not be possible within the DTNMA.¶
The success or failure of a CTRL may be handled locally by the agent autonomy engine. Otherwise, the externally observable impact of a CTRL can be understood through the generation and eventual examination of data reports produced by the managed device.¶
Macros (MACRO) - A Macro represents an ordered sequence of CTRL executions. A MACRO may be composed solely of CTRLs, or of a mix of MACRO and CTRL objects. Similar to CTRLs, a MACRO object should support parameterization and should not support a return code back to a caller.¶
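The CTRL and MACRO concepts can be sketched as parameterized procedures that return nothing, with macros running their steps in order and substituting their own parameters into those steps. The `Ctrl`, `Macro`, and `Arg` types, and the `set_rate`/`flush_queue` controls, are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Ctrl:
    """CTRL: a named, parameterized procedure with no return code."""
    name: str
    func: Callable[..., None]

    def run(self, *params: Any) -> None:
        self.func(*params)  # outcome is observable only via device state


@dataclass(frozen=True)
class Arg:
    """Placeholder for the i-th macro parameter (hypothetical)."""
    index: int


Step = Tuple[Any, Tuple[Any, ...]]  # (Ctrl or Macro, parameter spec)


@dataclass(frozen=True)
class Macro:
    """MACRO: an ordered sequence of CTRLs and/or nested MACROs."""
    name: str
    steps: Tuple[Step, ...]

    def run(self, *params: Any) -> None:
        # Execute steps in order, substituting macro parameters where used.
        for step, spec in self.steps:
            resolved = tuple(params[a.index] if isinstance(a, Arg) else a
                             for a in spec)
            step.run(*resolved)


log: List[str] = []
set_rate = Ctrl("set_rate", lambda bps: log.append(f"rate={bps}"))
flush = Ctrl("flush_queue", lambda: log.append("flush"))
throttle = Macro("throttle", ((set_rate, (Arg(0),)), (flush, ())))
recover = Macro("recover", ((throttle, (Arg(0),)), (set_rate, (9600,))))
recover.run(1200)
```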
The core function of the agent autonomy engine is to apply predetermined responses to predetermined state on a managed device. This involves the ability to calculate predicate expressions and the ability to associate the positive evaluation of these expressions with command execution.¶
There are a few instances within the DTNMA autonomy model where a value must be calculated by the model itself, to include the following.¶
In cases such as these, the DTNMA must support an efficient, configurable syntax for defining expressions, calculating the value of these expressions based on the local state of the managed device, and using the calculated value in an appropriate way.¶
Expression (EXPR) - An Expression is a combination of operators and operands used to construct a numerical value from a series of other data values in the autonomy model.¶
Operator (OP) - An Operator represents an operation performed on at least one operand that returns a single result which, itself, can be used as an operand to some other operator. OPs may represent simple (+, -) or complex (sin, avg) mathematical functions or custom functions defined for the managed device.¶
Operands may be built from any autonomy model object that can be associated with a data value, to include the CONST, LIT, VAR, and EDD types, the result of an OP, and the result of a fully evaluated EXPR.¶
Predicate Expression (PRED) - A Predicate Expression is an EXPR whose evaluated data value is interpreted in a logical way as being either true or false.¶
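The EXPR, OP, and PRED definitions can be illustrated with a small evaluator. This sketch assumes postfix (reverse Polish) ordering of operands and operators, which is one convenient choice for compact encoding rather than something this document mandates; the `OPS` table, `eval_expr`, and `eval_pred` are hypothetical.

```python
import math
import operator
from typing import Any, Callable, Dict, List, Sequence, Tuple, Union

# Hypothetical operator table: OP name -> (arity, function).
OPS: Dict[str, Tuple[int, Callable[..., Any]]] = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    ">": (2, operator.gt),
    "&&": (2, lambda a, b: bool(a) and bool(b)),
    "sin": (1, math.sin),
}

Token = Union[int, float, str]


def eval_expr(expr: Sequence[Token], values: Dict[str, float]) -> Any:
    """Evaluate a postfix EXPR; str tokens are OPs or named values."""
    stack: List[Any] = []
    for tok in expr:
        if isinstance(tok, str) and tok in OPS:
            arity, fn = OPS[tok]
            if len(stack) < arity:
                raise ValueError(f"operator {tok!r} lacks operands")
            args = stack[-arity:]
            del stack[-arity:]
            stack.append(fn(*args))    # an OP result is itself an operand
        elif isinstance(tok, str):
            stack.append(values[tok])  # CONST, VAR, or EDD by name
        else:
            stack.append(tok)          # LIT
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def eval_pred(expr: Sequence[Token], values: Dict[str, float]) -> bool:
    """PRED: an EXPR whose value is interpreted as true or false."""
    return bool(eval_expr(expr, values))


state = {"EDD1": 12.0, "EDD2": 3.0}
total = eval_expr(["EDD1", "EDD2", "+"], state)                  # EDD1 + EDD2
hot = eval_pred(["EDD1", 10, ">", "EDD2", 5, ">", "&&"], state)  # EDD1>10 && EDD2>5
```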
A stimulus-response system associates stimulus detection with a commanded response. In the DTNMA, this relationship is captured through the definition of rules. These rules may either examine the state of the managed device or be optimized to examine only the passage of time on the managed device.¶
State-Based Rules (SBRs) - A state-based rule is one whose stimulus is indicated when a given PRED evaluates to true. Since the PRED is a combination of sampled and calculated data values on the managed device, evaluating the PRED evaluates the relevant state of the device. An SBR has the form:¶
IF PRED THEN MACRO¶
Time-Based Rules (TBRs) - A time-based rule is a specialization of an SBR that is optimized to consider only the passage of time on the managed device. A TBR has the form:¶
EVERY interval THEN MACRO¶
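The difference between the two rule forms can be sketched with a single evaluation tick: an SBR consults device state, while a TBR consults only the clock. The `StateBasedRule`, `TimeBasedRule`, and `tick` names are hypothetical, and this sketch assumes an SBR fires on every evaluation in which its PRED is true.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class StateBasedRule:
    """SBR: IF PRED THEN MACRO."""
    pred: Callable[[State], bool]
    macro: Callable[[], None]


@dataclass
class TimeBasedRule:
    """TBR: EVERY interval THEN MACRO (only the clock is examined)."""
    interval_s: float
    macro: Callable[[], None]
    next_due: float = 0.0


def tick(now: float, state: State, sbrs: List[StateBasedRule],
         tbrs: List[TimeBasedRule]) -> None:
    for rule in sbrs:
        if rule.pred(state):          # evaluates device state
            rule.macro()
    for rule in tbrs:
        if now >= rule.next_due:      # evaluates elapsed time only
            rule.macro()
            rule.next_due = now + rule.interval_s


events: List[str] = []
sbr = StateBasedRule(lambda s: s["temp"] > 80.0, lambda: events.append("cool"))
tbr = TimeBasedRule(1.0, lambda: events.append("report"))
for t, temp in [(0.0, 70.0), (0.5, 85.0), (1.0, 75.0)]:
    tick(t, {"temp": temp}, [sbr], [tbr])
```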
Using the autonomy model mnemonics defined in Section 10, this section describes flows through sample configurations conforming to the DTNMA. These use cases illustrate remote configuration, local monitoring and control, multiple manager support, and data fusion.¶
The use cases presented in this section are documented with a shorthand notation to describe the types of data sent between managers and agents. This notation, outlined in Table 3, leverages the mnemonic definitions of autonomy model elements defined in Section 10.¶
Term | Definition | Example |
---|---|---|
EDD# | Enumerated EDD definition. | EDD1 |
V# | Enumerated VAR definition. | V1 = EDD1 + V0. |
ACL# | Enumerated Access Control List. | ACL1 |
DEF([ACL],ID,EXPR) | Define ID from expression. Allow managers in ACL to see this ID. | DEF([ACL1], V1, EDD1 + EDD2) |
PROD(P,ID) | Produce ID according to predicate P. P may be a time period (1s) or an expression (EDD1 > 10). | PROD(1s, EDD1) |
RPT(ID) | A report containing data named ID. | RPT(EDD1) |
These notations do not imply any implementation approach. They only provide a succinct syntax for expressing the data flows in the use case diagrams in the remainder of this section.¶
This is the nominal configuration of network management where a Manager interacts with a set of Agents. The control flows for this are outlined in Figure 3.¶
Serialized Management Control Flow¶
In a simple network, a Manager interacts with multiple Agents.¶
In this figure, Manager A sends a policy to Agents A and B to report the value of an EDD (EDD1) every second (step 1). Each agent receives this policy and configures its autonomy engine for this production. Thereafter (step 2), each agent produces a report containing data element EDD1 and sends that report back to the manager.¶
This behavior continues without any additional communications from the manager and without requiring that there exist a connection back to the manager.¶
This is a challenged configuration of network management where connectivity between Agent B and the Manager is temporarily lost. Flows in this case are outlined in Figure 4.¶
Challenged Management Control Flow¶
In a challenged network, agents store reports pending a transmit opportunity.¶
In this figure, Manager A sends a policy to Agents A and B to produce an EDD (EDD1) every second (step 1). Each agent receives this policy and configures its autonomy engine for this production. Reports are transmitted when produced (step 2).¶
At some point, Agent B loses the ability to transmit in the network (steps 3 and 4). During this time period, reports continue to be produced, but queued. This queuing might be done by the agent itself or by a supporting transport such as BPv7. Eventually, Agent B is able to transmit in the network again (step 5) and all queued reports are sent at that time.¶
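The behavior of Agent B in this flow can be sketched as a once-per-second production that always queues a report and flushes the queue only while transmission is possible. The `ReportingAgent` type and its `sample` and `link_up` callables are hypothetical, and the sketch shows agent-side queuing only; a supporting transport such as BPv7 could perform the storage instead.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple

Report = Tuple[float, str, float]  # (generated_at, item name, value)


@dataclass
class ReportingAgent:
    """Agent that keeps producing reports whether or not it can transmit."""
    sample: Callable[[float], float]   # reads EDD1 at a given time
    link_up: Callable[[float], bool]   # whether transmission is possible
    queue: Deque[Report] = field(default_factory=deque)
    sent: List[Report] = field(default_factory=list)

    def on_second(self, now: float) -> None:
        # PROD(1s, EDD1): production does not depend on connectivity.
        self.queue.append((now, "EDD1", self.sample(now)))
        if self.link_up(now):
            # Flush everything queued, oldest first.
            while self.queue:
                self.sent.append(self.queue.popleft())


# Loosely mirrors Figure 4: the link is down at t=2 and t=3.
agent_b = ReportingAgent(sample=lambda t: 10.0 + t,
                         link_up=lambda t: t not in (2.0, 3.0))
for t in range(4):
    agent_b.on_second(float(t))
queued_during_outage = len(agent_b.queue)  # reports from t=2 and t=3
for t in range(4, 6):
    agent_b.on_second(float(t))
```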
The open-loop control paradigm of the DTNMA does not support a one-to-one relationship between a manager's expression of policy and an agent's reporting of the state of its managed device. This use case illustrates the concept of open-loop control. In this paradigm, agents in the network manage themselves in accordance with policies and build consolidated reports of their state.¶
This flow is shown in Figure 5, where multiple policies configured by a manager are represented in a single reporting activity from an agent.¶
Consolidated Management Control Flow¶
There is not a one-to-one mapping between management policy and device state reporting.¶
In this figure, Manager A sends a policy to Agents A and B to produce an EDD (EDD1) every second (step 1). Each agent receives this policy and configures their respective autonomy engines for this production. Reports are transmitted when produced (step 2).¶
At a later time (step 3), Manager A sends an additional policy to Agent B to also produce an EDD (EDD2) every second. This policy is received and configured on the autonomy engine on Agent B.¶
Thereafter (step 4) Agent A will continue to produce EDD1 and Agent B will produce both EDD1 and EDD2. However, Agent B may produce these values together in a single report rather than 2 independent reports. In this way, there is no direct mapping between the single consolidated report sent by Agent B (step 4) and the two different policies sent to Agent B that caused that report to be generated (steps 1 and 3).¶
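Consolidation on Agent B can be sketched as collecting every item whose production is due at a given tick into one report that carries no reference to the policies that requested it. The `Policy` tuple shape and the `consolidated_report` helper are hypothetical simplifications of PROD(period, ID) with integer-second periods.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical policy: (period in seconds, EDD name) as in PROD(1s, EDD1).
Policy = Tuple[int, str]


def consolidated_report(policies: List[Policy], now: int,
                        read_edd: Callable[[str], float]) -> Dict[str, float]:
    """Combine all productions due at `now` into a single report.

    The result does not record which policy caused which value, so there
    is no one-to-one mapping between policies and reports.
    """
    due: List[str] = []
    for period, edd in policies:
        if now % period == 0 and edd not in due:
            due.append(edd)
    return {edd: read_edd(edd) for edd in due}


readings = {"EDD1": 4.0, "EDD2": 9.5}
agent_b_policies: List[Policy] = [(1, "EDD1")]  # step 1
agent_b_policies.append((1, "EDD2"))           # step 3
report = consolidated_report(agent_b_policies, now=7,
                             read_edd=readings.__getitem__)
```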
The managed applications on an agent may be controlled by different administrative entities in a network. The DTNMA allows agents to communicate with multiple managers in the network, such as cases where there exists one manager per administrative domain.¶
Whenever a manager sends a policy expression to an agent, that policy expression may be annotated with authorization information. One method of representing this is an ACL.¶
The ability of one manager to access the results of policy expressions configured by another manager is limited by the authorization annotations of those policy expressions.¶
An example of multi-manager authorization is illustrated in Figure 6.¶
Multiplexed Management Control Flow¶
Complex networks require multiple managers interfacing with agents.¶
In this figure, both Managers A and B send policies to Agent A (step 1). Manager A defines a VAR (V1) whose value is given by the mathematical expression (EDD1 * 2) and provides an ACL (ACL1) that restricts access to V1 to Manager A. Similarly, Manager B defines a VAR (V2) whose value is given by the mathematical expression (EDD2 * 2) and provides an ACL (ACL2) that restricts access to V2 to Manager B.¶
Both Managers A and B also send policies to Agent A to report on the values of their VARs at 1 second intervals (step 2). Since Manager A can access V1 and Manager B can access V2, there is no authorization issue with these policies and they are both accepted by the autonomy engine on Agent A. Agent A produces reports as expected, sending them to their respective managers (step 3).¶
Later (step 4) Manager B attempts to configure Agent A to also report to it the value of V1. Since Manager B does not have authorization to view this VAR, Agent A does not include this in the configuration of its autonomy engine and, instead, some indication of permission error is included in any regular reporting back to Manager B.¶
Manager A also sends a policy to Agent A (step 5) that defines a VAR (V3) whose value is given by the mathematical expression (EDD3 * 3) and provides no ACL, indicating that any manager can access V3. In this instance, both Manager A and Manager B can then send policies to Agent A to report the value of V3 (step 6). Since there is no authorization restriction on V3, these policies are accepted by the autonomy engine on Agent A and reports are generated to both Manager A and B over time (step 7).¶
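The authorization decisions in this flow can be sketched as a per-definition ACL check performed when a production policy arrives. The `Agent` class, its `define`, `can_read`, and `prod` methods, and the use of `None` to mean "no ACL" are hypothetical; the expressions themselves (EDD1 * 2, and so on) are omitted so the sketch focuses on access control.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


class Agent:
    """Sketch of per-definition access control on one agent (hypothetical)."""

    def __init__(self) -> None:
        # VAR name -> allowed managers; None means no ACL (any manager).
        self.acl: Dict[str, Optional[FrozenSet[str]]] = {}
        self.productions: List[Tuple[str, str]] = []  # accepted (manager, VAR)
        self.errors: List[Tuple[str, str]] = []       # rejected (manager, VAR)

    def define(self, var: str, acl: Optional[FrozenSet[str]] = None) -> None:
        # DEF([ACL], ID, EXPR); the expression is omitted in this sketch.
        self.acl[var] = acl

    def can_read(self, manager: str, var: str) -> bool:
        allowed = self.acl.get(var)
        return var in self.acl and (allowed is None or manager in allowed)

    def prod(self, manager: str, var: str) -> bool:
        # PROD(1s, ID) from a manager: accept only if that manager may read ID.
        if self.can_read(manager, var):
            self.productions.append((manager, var))
            return True
        # Rejected: a permission error is noted for later reports to the manager.
        self.errors.append((manager, var))
        return False


agent_a = Agent()
agent_a.define("V1", frozenset({"A"}))  # step 1, Manager A: V1 = EDD1 * 2
agent_a.define("V2", frozenset({"B"}))  # step 1, Manager B: V2 = EDD2 * 2
ok_a = agent_a.prod("A", "V1")          # step 2
ok_b = agent_a.prod("B", "V2")          # step 2
denied = agent_a.prod("B", "V1")        # step 4: B lacks access to V1
agent_a.define("V3")                    # step 5, Manager A: no ACL
shared = [agent_a.prod(m, "V3") for m in ("A", "B")]  # step 6
```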
There are times when a single network device may serve both as a manager for other agents in the network and, itself, as a device managed by someone else. This may be the case on nodes serving as gateways or proxies. The DTNMA accommodates this case by allowing a single device to run both an Agent and a Manager.¶
An example of this configuration is illustrated in Figure 7.¶
Data Fusion Control Flow¶
A device can house both a Manager and an Agent.¶
In this example, we presume that Agent B is able to sample a given EDD (EDD1) and that Agent C is able to sample a different EDD (EDD2). Node B houses Manager B controlling Agent C, and also Agent B, which is controlled by Manager A. Manager A must periodically receive some new value that is calculated as a function of both EDD1 and EDD2.¶
The sequence of events that can enable this scenario is as follows. Manager A sends a policy to Agent B to define a VAR (V0) whose value is given by the mathematical expression (EDD1 + EDD2) without a restricting ACL. Further, Manager A sends a policy to Agent B to report on the value of V0 every second (step 1).¶
Agent B requires the ability to monitor both EDD1 and EDD2. However, the only way to receive EDD2 values is to have them reported back to Node B and included in the Node B runtime data stores. Therefore, Manager B sends a policy to Agent C to report on the value of EDD2 (step 2).¶
Agent C receives the policy in its autonomy engine and produces reports on the value of EDD2 every second (step 3).¶
Agent B samples EDD1 locally and reads EDD2 from the Node B runtime data store, uses these values to compute V0, and reports on V0 at regular intervals as well (step 4).¶
While a trivial example, the mechanism of associating fusion with the Manager function rather than the Agent function scales with fusion complexity. Within the DTNMA, Agents and Managers are not required to be separate software implementations. There may be a single software application running on Node B implementing both Manager B and Agent B roles.¶
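The fusion path on Node B can be sketched as a shared runtime data store that Manager B fills from Agent C reports and that Agent B reads when computing V0. The `NodeB` class and its method names are hypothetical, and a single object stands in for the co-resident Manager and Agent roles.

```python
from typing import Dict, Optional


class NodeB:
    """Node B hosting Manager B and Agent B (hypothetical sketch)."""

    def __init__(self) -> None:
        # Runtime data store shared by the co-resident Manager and Agent.
        self.runtime: Dict[str, float] = {}

    def manager_b_on_report(self, report: Dict[str, float]) -> None:
        # Manager B receives RPT(EDD2) from Agent C and stores the value.
        self.runtime.update(report)

    def agent_b_compute_v0(self, edd1_local: float) -> Optional[float]:
        # V0 = EDD1 + EDD2: EDD1 is sampled locally, EDD2 came from Agent C.
        edd2 = self.runtime.get("EDD2")
        if edd2 is None:
            return None  # no EDD2 received yet; nothing to fuse
        v0 = edd1_local + edd2
        self.runtime["V0"] = v0
        return v0


node_b = NodeB()
before = node_b.agent_b_compute_v0(edd1_local=5.0)  # EDD2 not yet known
node_b.manager_b_on_report({"EDD2": 2.5})           # step 3: RPT(EDD2)
after = node_b.agent_b_compute_v0(edd1_local=5.0)   # step 4: V0 reported
```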
This protocol has no fields registered by IANA.¶
Security within a DTNMA MUST exist in two layers: transport layer security and access control.¶
Transport-layer security addresses the questions of authentication, integrity, and confidentiality associated with the transport of messages between and amongst Managers and Agents in the DTNMA. This security is applied before any particular Actor in the system receives data and, therefore, is outside of the scope of this document.¶
Finer-grained application security is provided through ACLs, which are defined via configuration messages and are implementation specific.¶