Internet-Draft | semantic-sdn-mom | August 2022 |
Bellavista, et al. | Expires 4 March 2023 |
Industrial networks pose unique challenges in realizing a communication substrate on the shop floor. Such challenges are due to strict Quality of Service (QoS) requirements, a wide range of protocols for data exchange, and highly heterogeneous network infrastructures. In this regard, this document proposes a framework for QoS-enabled semantic routing in industrial networks. Such a framework aims at providing loosely-coupled, asynchronous communications, fine-grained traffic management (delivery semantics and flow priorities), and in-network traffic optimization.¶
This note is to be removed before publishing as an RFC.¶
Source for this draft and an issue tracker can be found at https://github.com/fglmtt/draft-bellavista-semantic-sdn-mom.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 4 March 2023.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
This Internet Draft defines a framework for Quality of Service (QoS)-enabled semantic routing in industrial networks. The term "semantic routing" refers to a form of routing based on additional semantics other than mere IP addresses [I-D.draft-farrel-irtf-introduction-to-semantic-routing-03]. Along with the semantics carried in packet headers, such routing may also depend on policy coded in, configured at, or signaled to network devices. A network device is an element that receives/transmits packets and performs network functions on them, such as forwarding, dropping, filtering, and packet header (or payload) manipulation, among others. Network devices may operate in, above, and below the network layer.¶
The framework described in this draft uses overlay networking to provide a semantic routing substrate that operates at both the application and network levels.
At the application level, the framework consists of Message-Oriented Middleware (MOM) and Application Gateways (AGWs). The MOM decouples senders and receivers, sorts messages into topics of interest, and provides delivery semantics (e.g., at most once, at least once, and exactly once). The AGWs sit near industrial machines that are not natively compliant with the protocols the framework relies on. For example, some legacy industrial machines may not even support IP-based communications. It is worth mentioning that the typical lifetime of industrial equipment is 10 to 15 years (sometimes even longer), and in many cases, the software cannot be updated due to manufacturers' policies. Accordingly, AGWs translate the plethora of (proprietary) protocols that coexist on the shop floor into the one(s) used by the framework.
At the network level, the framework combines two paradigms: Software-Defined Networking (SDN) [RFC7426] and In-Network Processing (INP) [ZILBERMAN2019], [PORTS2019]. Although the MOM enables critical features in message dispatching, it does not control how packets flow through network devices along routing paths. This is where SDN comes in. Specifically, the SDN controller computes optimal routes to meet the QoS requirements and configures network devices accordingly. The term INP refers to executing end-host programs within network devices. Such INP-enabled network devices operate at line rate, processing packets as they traverse them without increasing the overall network load. Given that the SDN controller holds a network-wide view, it also knows which network devices support INP and which do not. The SDN controller may redirect flows towards target INP-enabled network devices based on the processing functions they provide.
The objectives that the framework targets are the following:¶
The remainder of this draft is structured as follows. First, Section 2 details the target scenario. Then, Section 3 provides the requirements of the target scenario. Next, Section 4 presents the principles and design guidelines of the framework. Lastly, Section 5 depicts the architecture of the framework and Section 6 proposes protocols to support it.¶
Traditionally, a shop floor includes industrial machines, Programmable Logic Controllers (PLCs), and Human-Machine Interfaces (HMIs). Typically, industrial machines are equipped with sensors and actuators, PLCs control manufacturing processes, and human operators interact with and receive feedback from industrial machines through HMIs. In such legacy industrial networks, message dispatching was primarily aimed at monitoring operational and safety-related machine parameters.
Nowadays, the shop floor has become more articulated due to the advent of the Industrial Internet of Things (IIoT). On the one hand, IIoT devices enable business-critical services (e.g., predictive maintenance) cost-effectively. On the other hand, they dramatically increase overall network traffic volume, infrastructure heterogeneity, and cyber security threats.¶
The heterogeneity concerns not only the industrial equipment itself but also how such equipment disseminates information. The plethora of (proprietary) protocols that machines use to exchange data makes machine-to-machine communications challenging.
Additionally, the shop floor may include mobile industrial equipment (e.g., automated guided vehicles) that communicates on the move. Such equipment may abruptly migrate its communications across different access points according to its physical location at a given time.
Therefore, modern industrial environments stress the network infrastructure more than traditional ones, where network traffic was fairly limited to mission-critical information generated by fixed network equipment.¶
In fulfilling current industrial guidelines for cyber security (e.g., IEC 62443 [IEC62443]), the industrial topology should consist of several shop floor subnets and a control room subnet. Figure 1 depicts an industrial topology compliant with such guidelines.¶
Note that:¶
The network devices interconnecting the subnets form the industrial network backbone. The outcome is a multihop multipath topology providing point-to-point connections with differentiated performance.¶
The framework described in this document targets the scenario depicted in Figure 1. The framework components (i.e., MOM, AGW, SDN, and INP controllers) run within the control room subnet. Note that other services may also run in the control room subnet along with them. Typical examples are the Manufacturing Execution System (MES) and the Enterprise Resource Planning (ERP) system.
The transition from traditional to modern industrial environments raised the critical communication challenges described in Section 2. In this regard, it is worth remarking that industrial machines typically have long lifetimes (decades), high costs (millions of USD), and restrictive manufacturers' policies in place (e.g., to prevent firmware updates). Accordingly, the communication substrate should address such challenges by fulfilling additional requirements.
First, non-mission-critical and mission-critical traffic should be distinguished. Typically, non-mission-critical flows (e.g., monitoring of vibrations) carry far more traffic than mission-critical ones (e.g., alerting human operators about dangerous events), so the former may easily consume network resources at the expense of the latter. This requires per-flow traffic management, ranging from flow prioritization (mission-critical flows go first, then non-mission-critical ones) to data aggregation and filtering to reduce the traffic traversing the network. Since industrial control typically runs cyclically at the millisecond level, control traffic, especially mission-critical traffic, demands high QoS in terms of low latency, low jitter, and an extremely low packet loss ratio.
Second, industrial communication demands high reliability. The telecommunication equipment deployed in the Internet typically guarantees 99.99% reliability. However, industrial systems need to be much more reliable, from 99.9999% to 99.99999%, in order to reduce the downtime of the production line. This requires the industrial network to adopt extra measures.
Third, machine-to-machine communications should be enabled straightforwardly, notwithstanding the plethora of (proprietary) dialects that coexist at the shop floor level, so that different shop floor devices can interoperate. This requires connectors to translate such dialects into a common one and metadata to express the semantics. Intermediate nodes may use semantics to process packet payloads according to the information they carry. For example, an intermediate node may average a given number of consecutive temperature values (data aggregation) or drop values of little application interest (data filtering).
Lastly, machines should keep communicating on the move without affecting overall performance. For example, an automated guided vehicle may move from one shop floor subnet to another. By doing so, the vehicle changes the WiFi access point (i.e., SGW) used to access the network. As a result, the flows sent out by such a vehicle need to be rescheduled accordingly. This requires not only reconfiguring network devices dynamically, but also doing so in a manner compatible with the other flows already in place.
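The flow prioritization described above ("mission-critical flows go first, then non-mission-critical ones") can be illustrated with a strict-priority queue. The following is a minimal sketch, not part of the framework specification: the class name, the two priority constants, and the packet labels are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import count

# Assumed priority classes (not defined by this draft); lower is served first.
MISSION_CRITICAL = 0
NON_MISSION_CRITICAL = 1


class PriorityScheduler:
    """Strict-priority queue: mission-critical packets always leave first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, priority, packet):
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def dequeue(self):
        """Return the next packet to transmit, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, packet = heapq.heappop(self._heap)
        return packet


scheduler = PriorityScheduler()
scheduler.enqueue(NON_MISSION_CRITICAL, "vibration-sample-1")
scheduler.enqueue(NON_MISSION_CRITICAL, "vibration-sample-2")
scheduler.enqueue(MISSION_CRITICAL, "operator-alert")

# The alert leaves first even though it arrived last.
order = [scheduler.dequeue() for _ in range(3)]
```

Note that strict priority can starve non-mission-critical flows under sustained load, so a real deployment would likely combine it with weighting or rate limits.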
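To give a sense of scale, the reliability figures above can be converted into a yearly downtime budget. The sketch below assumes that the percentages are read as availability over a 365-day year; this reading, the function name, and the constant are assumptions made for illustration. Under this reading, 99.99% allows roughly 52.6 minutes of downtime per year, whereas 99.9999% and 99.99999% allow roughly 31.5 seconds and 3.2 seconds, respectively.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year (assumption)


def downtime_seconds_per_year(availability_percent: str) -> Fraction:
    """Maximum yearly downtime allowed by an availability target.

    The target is passed as a decimal string (e.g. "99.9999") so that the
    arithmetic is exact and free of floating-point rounding.
    """
    unavailable = 1 - Fraction(availability_percent) / 100
    return unavailable * SECONDS_PER_YEAR


for target in ("99.99", "99.9999", "99.99999"):
    print(target, float(downtime_seconds_per_year(target)), "s/year")
```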
In this context, edge computing plays a crucial role in enabling the design and implementation of novel distributed control functions with parts that are hosted on the edge nodes located in the production plant premises and close to the controlled sensors/actuators, primarily to increase reliability and decrease latency. In the following, we discuss a framework for QoS-Enabled Semantic Routing in Industrial Networks capable of synchronizing several entities in a simplified manner via a unique logical configuration interface ("Northbound interface").¶
Future industrial networks will be characterized by an unprecedented degree of heterogeneity and complexity. Traditional solutions, mainly based on the direct interconnection of machines to one another and to the control room, cannot provide the required degree of flexibility. This leads to exploring novel solutions to manage the deluge of data generated by IIoT devices and provide QoS-driven network (re)configuration.
By considering the momentum of MOM as an enabler of the Industry 4.0 vision, we believe it will become a pillar of future industrial ecosystems. Although it enables critical features to facilitate message dispatching independent from actual machine location, it does not control how packets flow through middle network devices along the routing path. In fact, once a message is sent from a broker to a consumer (or vice versa, from a producer to a broker), the path the message traverses is beyond the MOM's control. However, the ability to dictate the behavior of middle network devices is essential to satisfy stringent QoS requirements. This is where the SDN paradigm comes in.¶
The SDN controller eases configuration and management of network devices, which act as the (distributed) communication substrate between the machines and the MOM. In addition, the SDN controller provides network-wide abstractions to define and enforce fine-grained network policies.¶
At the top level, the MOM identifies the destination nodes to which a message should be dispatched, along with the delivery semantics (e.g., at most once, at least once, or exactly once) to be applied. At the bottom level, AGWs deployed close to machines act as intermediaries between the machines (and the plethora of protocols they speak) and the MOM. At the middle level, the SDN controller exploits its network-wide view to (re)configure the network devices according to the QoS requirements.
Based on the MOM-SDN interplay, network devices can be properly configured:¶
For example, given two traffic flows between the MOM broker and a machine, proper routing table management allows forwarding traffic flows tagged as "mission-critical" via a high-bandwidth, low-latency path (if available). In contrast, traffic flows tagged as "not-urgent" may be delayed, where the magnitude of the imposed delay may also depend on the current level of network saturation. Finally, an INP-enabled network device may exploit the semantics of the carried data to provide content-based message management. For instance, it is possible to forward packets only if they satisfy a given rule, e.g., if they carry temperature values greater than a given threshold, or to apply functions to send pre-processed values, e.g., sending only one packet with the average temperature resulting from a series of received temperature values. Note that content-based message management enables decisions based on what is carried within packet payloads rather than only on packet headers (mere forwarding). However, since payload inspection and manipulation may introduce additional delays, content-based message management should be enforced as much as possible but without burdening mission-critical traffic flows.
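The two content-based operations mentioned above (threshold-based forwarding and averaging) can be sketched as follows. This is an illustration only: an actual INP-enabled device would express such logic in its own data plane language, and the function names, window size, and sample values are hypothetical.

```python
from statistics import fmean


def filter_above(values, threshold):
    """Data filtering: keep only readings greater than the threshold."""
    return [v for v in values if v > threshold]


def aggregate_mean(values, window):
    """Data aggregation: one average per full window of consecutive readings.

    A trailing partial window is not emitted.
    """
    return [
        fmean(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


temperatures = [70.0, 72.0, 74.0, 90.0, 92.0, 94.0]
alerts = filter_above(temperatures, 80.0)    # forwarded one by one
summaries = aggregate_mean(temperatures, 3)  # one packet per three readings
```

With the sample values above, six incoming readings result in either three forwarded packets (filtering) or two forwarded packets (aggregation).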
From a functional point of view, the INP level sits atop the data forwarding level. As in the case of SDN deployment, we do not argue that all the network devices should be INP-enabled. Instead, we promote a pragmatic approach where legacy and novel solutions cooperate effectively. Since the SDN controller holds a network-wide view, it knows which network devices offer INP and which do not. Therefore, traffic can be optimally handled by maximizing INP (e.g., routing of packets carrying values that can be averaged towards network devices providing that aggregation function) while ensuring QoS requirements.¶
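One way to picture how an SDN controller might steer a flow towards an INP-enabled device while still honoring a QoS bound is sketched below. It assumes that the controller knows per-link latencies and which functions each device offers; the graph, node names, and function names are hypothetical, and latency is the only QoS metric considered. The resulting path is the concatenation of two least-latency segments, which is a simplification.

```python
import heapq


def shortest(graph, src):
    """Dijkstra: least latency from src to each reachable node, plus predecessors."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, latency in graph.get(node, {}).items():
            nd = d + latency
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    return dist, prev


def path_to(prev, src, dst):
    """Rebuild the src -> dst path from a predecessor map."""
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def route_via_inp(graph, inp_functions, src, dst, function, max_latency):
    """Least-latency src -> dst path crossing a device offering `function`.

    Returns (latency, path), or None if no such path meets max_latency.
    """
    from_src, prev_src = shortest(graph, src)
    best = None
    for device, functions in inp_functions.items():
        if function not in functions or device not in from_src:
            continue
        from_dev, prev_dev = shortest(graph, device)
        if dst not in from_dev:
            continue
        total = from_src[device] + from_dev[dst]
        if total <= max_latency and (best is None or total < best[0]):
            path = path_to(prev_src, src, device) + path_to(prev_dev, device, dst)[1:]
            best = (total, path)
    return best


# Hypothetical topology: bidirectional links with latencies in milliseconds.
links = [("gw", "s1", 1.0), ("s1", "s2", 1.0), ("s2", "broker", 1.0),
         ("s1", "s3", 2.0), ("s3", "broker", 2.0)]
graph = {}
for a, b, latency in links:
    graph.setdefault(a, {})[b] = latency
    graph.setdefault(b, {})[a] = latency

inp_functions = {"s3": {"average"}}  # only s3 is INP-enabled

relaxed = route_via_inp(graph, inp_functions, "gw", "broker", "average", 6.0)
strict = route_via_inp(graph, inp_functions, "gw", "broker", "average", 4.0)
```

In this example the detour through s3 costs 5 ms instead of the 3 ms of the direct path, so it is chosen under the 6 ms bound and rejected under the 4 ms bound. In the latter case the controller would fall back to plain forwarding.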
The proposed architecture, mostly working at the application layer, adopts the typical SDN approach by identifying two main areas: Control Plane and Data Plane. In the Control Plane, the following components are deployed: the MOM controller, interacting with the MOM broker; the In-Network Processing (INP) controller, managing the INP units; the SDN controller, controlling network elements; and the Gateway controller, managing the many application gateways deployed in the environment. The Data Plane consists of the implementation of the MOM, the INP units, the SDN-enabled network elements, and the Gateway components.¶
Each component has different duties and responsibilities:¶
Figure 2 depicts a schematic of the entire infrastructure. Dashed paths between controller entities in the control plane (Protocol E), and between control and data planes represent the management/configuration data exchanges that are logically separate from data flows (Protocols A, B, C, D). Data flows start from the Gateways (connected to the machinery via the machine-specific protocols) and are sent through the SDN Component, which traverses the entire platform.¶
The proposed platform can be seen as an integration of several software architectures into a single system capable of interacting with them in a uniform and controlled way. In this draft, we omit our specific implementation of Protocols D and E, and we ask the community for possible implementations capable of satisfying the requirements of each step. Some interfaces can be readily implemented using de facto standard protocols. For instance, Protocol B can be OpenFlow (Open Networking Foundation, "OpenFlow Switch Specification", Version 1.5.1, October 2015, https://opennetworking.org/wp-content/uploads/2014/10/openflow-switch-v1.5.1.pdf), and Protocol C can be P4 (The P4 Language Consortium, "P4_16 Language Specification", Version 1.2.1, June 2020, https://p4.org/p4-spec/docs/P4-16-v1.2.1.html). The other interfaces remain open issues and, for now, must be implemented as ad hoc solutions.
Section 5 provided an overview of the architectural components and their links. This section proposes a custom message header (see Section 6.1) that gateways should attach to messages sent by machines that are not natively compliant with the protocols the framework relies on and describes the protocol between the Gateways and the Gateway Controller (see Section 6.2).¶
DataHeader
+-----------------------+
| flowId: int16         |
| machineId: int16      |
| machineSerial: string |
+-----------------------+
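The DataHeader above lists field names and types but does not fix a wire encoding. The following sketch shows one possible serialization. The byte order (network order), the one-byte length prefix for machineSerial, and the UTF-8 encoding are all assumptions made for illustration, not part of this draft.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class DataHeader:
    flow_id: int         # flowId: int16
    machine_id: int      # machineId: int16
    machine_serial: str  # machineSerial: string

    def pack(self) -> bytes:
        """Serialize using the assumed layout: int16, int16, length byte, UTF-8."""
        serial = self.machine_serial.encode("utf-8")
        if len(serial) > 255:
            raise ValueError("machineSerial longer than 255 bytes")
        # "!hhB": network byte order, two signed 16-bit ints, one length byte.
        return struct.pack("!hhB", self.flow_id, self.machine_id, len(serial)) + serial

    @classmethod
    def unpack(cls, data: bytes) -> "DataHeader":
        """Parse a header produced by pack()."""
        flow_id, machine_id, length = struct.unpack_from("!hhB", data)
        serial = data[5:5 + length]
        if len(serial) != length:
            raise ValueError("truncated machineSerial")
        return cls(flow_id, machine_id, serial.decode("utf-8"))


header = DataHeader(flow_id=7, machine_id=42, machine_serial="SN-0001")
wire = header.pack()
decoded = DataHeader.unpack(wire)
```

Values outside the int16 range make struct.pack raise struct.error, which gives a basic check that the declared field widths are respected.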
Each component that receives a new unforeseen DataHeader sends it to its controller and waits for the routing/processing/flow rules to be set.¶
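The behavior described above resembles a table miss in a flow-based switch. A minimal sketch of it follows. Buffering messages while waiting for the rule, keying rules on (flowId, machineId), and the class and method names are all assumptions made for illustration. The draft only states that the component contacts its controller and waits.

```python
class Component:
    """Data plane component that asks its controller about unknown flows."""

    def __init__(self, controller):
        self.controller = controller
        self.rules = {}    # (flowId, machineId) -> rule (a callable)
        self.pending = {}  # (flowId, machineId) -> payloads awaiting a rule

    def receive(self, header, payload):
        """Apply the installed rule, or buffer the payload and notify the controller."""
        key = (header["flowId"], header["machineId"])
        rule = self.rules.get(key)
        if rule is None:
            first_miss = key not in self.pending
            self.pending.setdefault(key, []).append(payload)
            if first_miss:
                self.controller.report(self, header)  # ask only once per flow
            return []
        return [rule(payload)]

    def install_rule(self, key, rule):
        """Called by the controller; flushes payloads buffered for this flow."""
        self.rules[key] = rule
        return [rule(p) for p in self.pending.pop(key, [])]


class RecordingController:
    """Stand-in controller that only records the headers it is asked about."""

    def __init__(self):
        self.reported = []

    def report(self, component, header):
        self.reported.append(header)


controller = RecordingController()
component = Component(controller)
header = {"flowId": 7, "machineId": 42, "machineSerial": "SN-0001"}

held = component.receive(header, "t=71") + component.receive(header, "t=72")
flushed = component.install_rule((7, 42), lambda p: ("topic/temperature", p))
forwarded = component.receive(header, "t=73")
```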
Protocol A - From Gateway Controller to Gateways
+-----------------------------+
| header: DataHeader          |
+-----------------------------+
| crud: 2bit                  |
| ttl: uint32                 |
| ipFrom: ipAddr              |
| ipTo: ipAddr                |
| destTopic: string           |
| semanticDelivery: 3bit      |
| machineProtocol: string     |
| machineUrl: string          |
| pollingInterval: int8       |
| geoPosition: geoURI-RFC5870 |
| applicationType: string     |
+-----------------------------+
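The Protocol A message declares two sub-byte fields (crud and semanticDelivery) and two bounded integers (ttl and pollingInterval). The sketch below shows one way a gateway might pack and validate them. The numeric values assigned to each operation and delivery mode, and the choice to share a single byte between the two bit fields, are assumptions, since this draft does not define them.

```python
from enum import IntEnum


class Crud(IntEnum):
    """crud: 2bit (value assignment is an assumption)."""
    CREATE = 0
    READ = 1
    UPDATE = 2
    DELETE = 3


class Delivery(IntEnum):
    """semanticDelivery: 3bit (value assignment is an assumption)."""
    AT_MOST_ONCE = 0
    AT_LEAST_ONCE = 1
    EXACTLY_ONCE = 2


def pack_flags(crud: Crud, delivery: Delivery) -> int:
    """Pack both bit fields into one byte: bits 4-3 hold crud, bits 2-0 delivery."""
    return (crud << 3) | delivery


def unpack_flags(flags: int):
    """Inverse of pack_flags; raises ValueError on unassigned delivery values."""
    return Crud((flags >> 3) & 0b11), Delivery(flags & 0b111)


def validate_ranges(ttl: int, polling_interval: int) -> None:
    """Check that the values fit the declared uint32 and int8 widths."""
    if not 0 <= ttl <= 0xFFFFFFFF:
        raise ValueError("ttl does not fit uint32")
    if not -128 <= polling_interval <= 127:
        raise ValueError("pollingInterval does not fit int8")


flags = pack_flags(Crud.UPDATE, Delivery.EXACTLY_ONCE)
roundtrip = unpack_flags(flags)
validate_ranges(ttl=3600, polling_interval=5)
```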
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
While this Internet-Draft is not primarily focused on security, some security considerations are essential. In particular, since the proposed solution targets industrial environments, security threats could cause issues not only in the IT domain, such as service unavailability and data leaks, but also in the OT domain, with a potential impact on the safety of human operators. For this reason, we push for the adoption of best practices for the security and safety of industrial environments, and we advise applying the IEC 62443 family of standards as a prerequisite for deploying the proposed solution. In addition, while the proposed solution is suited to maximizing the QoS of higher-priority industrial applications, this should not be achieved to the total detriment of lower-priority industrial applications, whose packets should still be delivered.
This document has no IANA actions.¶
TODO acknowledge.¶