Internet-Draft | KIRA | October 2023 |
Bless | Expires 25 April 2024 | [Page] |
This document describes the Kademlia-directed ID-based Routing Architecture KIRA. KIRA offers highly scalable zero-touch IPv6 connectivity, i.e., it can connect hundred thousands of routers and devices in a single network (without requiring any form of hierarchy like areas). It is self-organizing to achieve a zero-touch solution that provides resilient (control plane) IPv6 connectivity without requiring any manual configuration by operators. It works well in various topologies and is loop-free even during convergence. The architecture consists of the ID-based network layer routing protocol R²/Kad in its routing tier and a Path-ID-based forwarding tier. The topological independent IDs can be embedded into IPv6 addresses, so that KIRA provides zero-touch IPv6 connectivity between KIRA nodes.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 25 April 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
KIRA is a scalable zero-touch distributed routing solution that is tailored to control planes, i.e., in contrast to commonly used routing protocols like OSPF, ISIS, BGP etc., it prioritizes resilient connectivity over route efficiency. It scales to 100,000s of nodes in a single network, it uses ID-based addresses, is zero-touch (i.e., it requires no configuration for and after deployment) and is able to work well in various network topologies. Moreover, it offers a flexible memory/stretch trade-off per node, shows fast recovery from link or node failures, and is loop-free, even during convergence. Additionally, it includes a built-in Distributed Hash Table (DHT) that can be used for simple name service registration and resolution, thereby helping to realize autonomic network management and control and zero-touch deployments.¶
Please note, that is version of the Internet-Draft is not complete yet. Future versions will complete the specification.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
KIRA's main objective is to provide self-organized robust control plane (CP) connectivity on top of a link-layer topology. The CP is typically used to configure, monitor, manage, and control networked resources (switches, routers, end-systems). The goal is to never lose control over the resources as long as there exist paths leading to them. KIRA is structured into a two-tier architecture consisting of a Routing Tier and a Forwarding Tier (see Figure 1. KIRA runs the zero-touch, distributed, highly scalable, ID-based routing protocol R²/Kad in the Routing Tier to find viable paths to destinations. The core concept of R²/Kad is that it discovers paths in the underlying topology by using an ID-based overlay routing scheme combined with source routing between overlay hops. KIRA nodes employ this information to construct fast path forwarding tables in the Forwarding Tier for CP data traffic (e.g., packets from control plane applications).¶
R²/Kad employs a flat ID-based addressing scheme to easily support zero-touch operation, self-organization as well as mobility and multi-homing. ID-based routing (sometimes also denoted as name-independent routing or routing with flat identifiers) has the advantage of providing stable addresses (called NodeIDs) to upper layers. Thus, in case (virtual) resources are moved within the topology, any control connection to them stays alive. In contrast to other ID-based addressing approaches, KIRA is a genuine ID-based approach, because it does not use topological addresses at all and thus does not require any additional identifier-locator mapping (increased risk of non-consistency) and associated protocols (additional overhead and convergence time).¶
As just motivated, R²/Kad uses topologically independent NodeIDs, generated by the KIRA nodes themselves, so address assignment is performed in a distributed manner by each node autonomously. Typically, NodeIDs are taken from a 112 bit address space, but depending on the network size, smaller NodeIDs are possible. KIRA uses IPv6 packets for its messages and CP data packets, because NodeIDs can be easily embedded into an IPv6 address (e.g., 16 bit prefix + 112 bit NodeID) and existing hardware-based and software-based forwarding mechanisms can be leveraged. Mainly very basic IPv6 features like link-local addresses, the packet format, fragmentation, and neighbor discovery are used, e.g., it does not require any address configuration features or router discovery.¶
The Forwarding Tier is able to forward IPv6 packets, so that (control) applications can use IPv6 and all corresponding transport protocols above. R²/Kad messages use source routing based on NodeIDs whereas traffic in the Forwarding Tier uses NodeIDs and PathIDs for its forwarding decision. PathIDs are conceptually a hash from a sequence of NodeIDs that build a path segment. PathIDs are unique (with high probability) for a path segment. To carry PathIDs in addition to the final destination NodeID, the original IPv6 packet becomes encapsulated, e.g., using a GRE header that contains the PathID for the path segment that the packet should traverse next. Intermediate nodes simply exchange the PathID at each hop with the PathID of the remaining path segment (similar to label switching), or they strip the outer header when the next node is the end of a path segment. PathIDs are precomputed in a 2-hop vicinity of a KIRA node and are installed by R²/Kad signaling in some intermediate nodes on demand for paths longer than five hops. To forward CP packets KIRA nodes only need to perform lookups in their NodeID forwarding table and/or PathID forwarding table, perform encapsulation or decapsulation and rewrite PathIDs.¶
Routing Tier ┌──────────────────────────────────────────────────┐ ┌─────────────┐ │R²/Kad │ │ Control │ │ ┌───────────────────┐ ┌───────────┐ ┌──────────┐ │ │ Plane │ │ │- Path Discovery │ │ Routing │ │Path │ │ │ Applications│ │ │- Routing │ │ Table │ │Management│ │ │ │ │ │- Failure Recovery │ │ │ │ │ │ │ │ │ └───────────────────┘ └───────────┘ └────┬─────┘ │ │ │ │ │ │ │ │ └───▲──────────────────────────────────────┼───────┘ └─────▲───────┘ │ │ │Trans- │ ┌────────────┬─────────┘ │port R²/Kad │ Forwarding │ │ │over Messages│ Tier │ │ │IPv6 │ ┌─┬─────────▼─┬──┬───────▼─────┬─┬─────────┬─┬─────▼─────┬─┐ │ │ │NodeID │ │PathID │ │Encaps./ │ │Node Local │ │ │ │ │Forwarding │ │Forwarding │ │Decaps. │ │Forwarding │ │ │ │ │Table │ │Table │ │ │ │ │ │ │ │ └───────────┘ └─────────────┘ └─────────┘ └─▲─────────┘ │ │ │ │ │ │ │ ┌─────────────────────────────────┴─┐ │ │ │ ┌─►│ Fast Forwarding ├──┐ │ │ └─────────┼──┴───────────────────────────────────┴──┼──────┘ ▼ IPv6 ──┘ └─► IPv6 UDP+IPv6
CP applications (e.g., SDN controllers, Kubernetes Cluster Controllers, Virtual Infrastructure Manager, traditional OAM applications and so on) simply use NodeIDs as addresses for the resources/devices they want to control or for other controllers they want to exchange state with. Therefore, CP applications can transparently use the connectivity established by KIRA. Since NodeIDs are randomly generated, KIRA provides a simple built-in key/value store (Distributed Hash Table – DHT) that can be used as name service. KIRA nodes and services can dynamically register their NodeID under a certain well-known name and other KIRA nodes can lookup their corresponding NodeIDs. The DHT functionality will be specified as a separate KIRA module and corresponding application interfaces are out-of-scope for this specification.¶
KIRA nodes possess relatively small routing tables, that grow with O(log(n)), where n is the number of KIRA nodes in the network (see [KIRA-Networking-2022] for evaluation results). The advantage of small routing tables is scalability, but comes at the cost of path stretch. That is, packets to destinations that are not kept in the routing table of a node take a longer path than the shortest possible path, because they are using the ID-based overlay routing strategy. However, KIRA nodes will learn the shortest paths to all 'contacts' in their routing tables and it is a node local decision how large the routing table can be. For example, a controller node may add all KIRA nodes that it controls as contacts to its routing table. Because KIRA uses source routing in R²/Kad and PathID-based forwarding in its forwarding tier, it can easily support multi-path routing and keeping backup paths for fast failover reactions.¶
KIRA uses a mixture of reactive and periodic mechanisms to cope with link and node failures. Error messages that indicate failed links usually trigger routing updates and a path rediscovery procedure. However, routing updates are not flooded to all KIRA nodes, so some nodes may still have obsolete path information. These inconsistencies will be detected either when using the obsolete path to a contact (triggering an error message from the node before the broken link) or by a maintenance procedure that is carried out periodically. These periodic maintenance procedures test the validity of the currently known paths and may also trigger a rediscovery procedure to find alternative paths.¶
Moreover, KIRA also possesses a specific end-system mode, where KIRA nodes are part of the KIRA network, but they are not exchanging routing information and are not forwarding packets for other KIRA nodes.¶
Finally, KIRA also supports a domain concept. A KIRA node may be member of one or multiple domains. Unless configured otherwise, a KIRA node is member of the global domain with DomainID=0 by default. KIRA nodes keep their NodeID for all domains, the only difference is that routes are guaranteed to run inside a domain D in case source and destination node are both members of this domain D. This allows for using domains for administrative purposes (e.g., all KIRA nodes inside the same Autonomous System could be part of the same domain) or to use domains to build clusters of KIRA nodes that are grouped by closeness in the underlying network topology.¶
This section gives on overview of the main concepts with respect to the R²/Kad protocol operation. First KIRA's ID-wise addressing concept is introduced, then the routing table structure is presented. After that several procedures are described, beginning with node startup, vicinity discovery, and the join procedure to populate the routing table, followed by path discovery, overhearing and rediscovery mechanisms.¶
Every KIRA instance uses a single NodeID as its address. The NodeID is taken from a larger unstructured address space [0..2^B-1] (typically B=112). KIRA uses the XOR (logical exclusive or) metric in this address space to define the distance between two NodeIDs X and Y (see also [Kademlia2002]). The distance function d(X,Y)= X XOR Y is interpreted as integer and fulfills all properties of a metric (d(X,Y)>=0, and d(X,Y)=0 <=> X=Y; d(X,Y)=d(X,Y); as well as the triangle inequality d(X,Y)<=d(X,Z)+d(Z,Y)). This distance function largely corresponds to a prefix bit distance metric d_p(X,Y) = B-lcp(X,Y), where lcp(X,Y) denotes the length longest common prefix in bits. The XOR metric is finer than the d_p(X,Y) metric though, because when there are X,Y, and Z with d_p(X,Y)=d_p(Y,Z)=d_p(X,Z), the XOR metric can uniquely determine whether Y or Z are closer to X. More generally speaking, for a given distance d(X,Y)=d there exists exactly one Y so that d(X,Y)=d. This property is important since it allows to unambiguously determine which NodeID is ID-wise closer to a given other NodeID and it provides the basis for KIRA's loop-freedom. Note that the ID space with this metric is not cyclic (i.e., a node with a very small NodeID is not close to a node with a very large NodeID).¶
The XOR metric defines an overlay structure across all KIRA nodes in the ID space: KIRA nodes establish logical connections with their ID-wise closest overlay neighbors (which are typically different from the physical neighbors) with respect to the XOR metric as distance metric, i.e., the ID-wise closer neighbors have a smaller distance according to XOR. KIRA uses this metric to determine the next KIRA node that a KIRA message is forwarded to in order to reach a certain destination NodeID. R²/Kad messages are forwarded by using source routing between overlay hops.¶
In addition to NodeIDs, KIRA can use any ID from the ID space as destination address. Typically, names of objects can be hashed to result in a key value, which is called Resource ID. In this case, the ID-wise closest KIRA node will be found as responsible node for storing a value (or a referral) for this key. This makes it possible to provide an integrated DHT for name-to-NodeID registration and lookup.¶
NodeIDs are randomly generated and are taken from a 112 bit address space. Future versions of this specification will detail an algorithm to create self-certifying NodeIDs as using certain hash functions from a public key. NodeIDs are unique with high probability. However, in case two nodes possess the same NodeID, protocol mechanisms can be used to detect this situation and the conflict can be resolved by letting one side generate a new NodeID. Depending on a KIRA node's capabilities, NodeIDs (together with other protocol parameters) may be stored in non-volatile memory so that nodes keep their NodeID even after restart. Other KIRA nodes may choose to generate a new NodeID on every restart.¶
The entries in the routing table (RT) are called 'contacts'. Contact data contains the NodeID of the contact as well a set of discovered paths that lead to this contact (besides other node state and attributes). A path to a contact is stored as path vector that contains a complete sequence of NodeIDs, which can be traversed to reach the contact. Except for contacts in its routing table, a KIRA node does not know paths to other destinations, but they can be discovered by using a recursive overlay routing strategy: a KIRA node source routes a packet to the contact (using the known path) that is the ID-wise closest to the destination ID according to its routing table. The next overlay hop performs the same action until the destination node is reached. After the initial discovery phase, only PNs and some contacts from the 3-hop physical neighborhood are stored in the routing table.¶
R²/Kad's efficiency and flexibility is closely related to its routing table. It is structured as tree of k-buckets as in [Kademlia2002]. A k-bucket in the routing table contains a list of (at most) k contacts in distance between 2^i and 2^{i+1} (i.e., the bucket's range, where 0<=i<112) from this node. Usually, k>=20 is constant and the same for all buckets and nodes, but it can also be varied per node (k=40 is RECOMMENDED as default for R²/Kad). Buckets at deeper levels share more prefix bits with the node's own ID, however, buckets for small values of i are generally empty as no appropriate nodes exist in this address space. Thus, the highest bucket (depth 1) contains contacts from half of the ID space whose highest NodeID bit differs from the node's ID, whereas the deepest buckets contain all nodes that are ID-wise closest to the node (i.e., the ID-wise closest overlay neighbors).¶
If KIRA node X learns a new contact Y, it puts it into the corresponding k-bucket b_l in case it still has capacity left. The bucket index l is determined by calculating the common prefix length between X and Y (number of high-order zero bits of d(X,Y)). If the bucket contains k entries already, it is split into two new buckets (and the contained entries moved to them accordingly) in case X falls into the bucket's range. Otherwise, a selection algorithm determines whether the new contact should replace an existing entry in this bucket. In our case we use Proximity Neighbor Selection (PNS) so that contacts with shorter path lengths are preferred. In case path lengths are equal, nodes with a higher degree are preferred as this results in shorter paths. An additional mechanism for path selection improves path diversity and prevents route flapping: in case an alternative path of equal length has been discovered for an already known contact, this path replaces the previous path only if the hash sum of the path elements is closer to the node's own NodeID. Due to the uniqueness property of the XOR metric, the path selection will always unambiguously converge to a unique path. Physical neighbors are kept in special buckets that have no capacity limit, i.e., they will never be preempted. In general, routing also works without this PN buckets extension of [Kademlia2002], but the resulting stretch will be slightly higher.¶
X identifies its closest known contact in its RT by locating the k-bucket that corresponds to the longest matching prefix of destination Z with its own NodeID X by using d(X,Z). It then selects a contact with the shortest path vector from within the bucket; this is called Proximity Routing (PR). The XOR metric is used to uniquely select the closest contact if all paths have equal length or if no prefix-wise progress can be made by the bucket (e.g., it is the deepest bucket).¶
KIRA nodes generate their NodeID first (see Section 3.2). After that they start an initial discovery phase to explore their physical vicinity. After that, a join procedure and continuous discovery are periodically repeated. To let the node stay connected to its overlay and to improve the quality of discovered routes.¶
In its initial discovery phase, a KIRA node discovers its physical adjacencies, i.e., its physical neighbors (PNs) that can be directly reached via link-local communication on one of their network interfaces. KIRA nodes periodically send PNHello messages to a well-known link local multicast address ALL-KIRA-NODES, and receiving nodes may reply with a PNDiscoveryReq to set up an adjacency. This PNDiscoveryReq MUST be answered by a PNDiscoveryRsp to establish an adjacency. The protocol exchange ensures that bidirectional communication is possible between directly adjacent nodes. PNs may be put into the PNTable either after receiving a PNDiscoveryReq as answer to a transmitted PNHello or after receiving a PNDiscoveryRsp as answer to a transmitted PNDiscoveryReq. The PNTable contains a mapping from NodeID to the link local unicast address of the PN. This address is either taken from the source address of the PNDiscoveryReq or the PNDiscoveryRsp respectively.¶
KIRA nodes also discover all nodes in their 3-hop physical neighborhood to populate their routing table, but the 2-hop vicinity is fully stored in a local graph structure. The latter is used to precompute PathIDs for the 2-hop vicinity. PNDiscoveryReq and PNDiscoveryRsp messages contain a list of physical neighbors so that all PNs of PNs will be learned, i.e., the 2-hop vicinity. Additionally, all nodes in the 2-hop vicinity are queried for their PNs by using a QueryRouteReq. QueryRouteReq/QueryRouteRsp messages are used to get PNs or RTable objects from nodes in the vicinity. Depending on their NodeID, nodes from the 3-hop vicinity will be stored as contacts into the routing table.¶
A KIRA node continues to populate its routing table by sending FindNodeReq messages to certain nodes and also to randomly chosen destinations (i.e., a randomly chosen ID from the NodeID space, a KIRA node does not need to exist for the chosen ID, the FindNodeReq will simply end at the KIRA node that is ID-wise closest to the destination ID). FindNodeRsp messages return a set of contacts that are ID-wise closest to the destination NodeID from the viewpoint of the responding KIRA node. The returned information is analyzed whether it can improve the local routing table, e.g., new and 'better' contacts or 'better' paths to already known contacts.¶
As result of the vicinity discovery, all PNs of X and some nodes within its 3-hop radius will populate X’s RT. However, in order to get network connectivity and to contribute to connectivity, the node needs to find its ID-wise closest overlay neighbors and make itself known to them. Thus, to join the network KIRA node X simply "searches" for the k closest nodes to its own NodeID: X sends a FindNodeReq for its own NodeID X and the closest neighbor replies with FindNodeRsp. This is repeated with a limited exponential backoff in order to detect or heal any network partitioning. In case node X finds itself in situations where it needs to respond with "Dead End" Error messages to FindNodeReqs, it resets the backoff timer again, because it may be a hint for network partitioning or other inconsistencies. In order to let a joining node X quickly learn all existing ID-wise closest overlay neighbors, X sends a QueryRouteReq to every newly learned contact that enters X’s currently deepest k-bucket. The queried contact replies with a QueryRouteRsp returning its RT entries for the k closest contacts to node X. The so returned contacts will very likely also fall into X’s deepest bucket, possibly leading to a further split of its deepest bucket. Therefore, X will quickly populate its set of ID-wise closest overlay neighbors, which are needed for consistent overlay connectivity.¶
Consider the exemplary topology in Figure 2. Assume node X needs to send a message to node Z. In case Z is a known contact of X, a path vector is stored already in the routing table that can be used for strict source routing in order to reach Z. Otherwise, a path to Z must be discovered using ID-based overlay routing. R²/Kad uses a recursive version of Kademlia (hence its name Routing with Recursive Kademlia – R²/Kad). The Path Discovery procedure uses a request/response message pair, FindNodeReq/FindNodeRsp. In this example, assume that X identifies its contact Y (learned from PN A) as next (ID-wise closest) overlay hop toward Z. In order to discover a path to Z, X creates a FindNodeReq message that contains destination NodeID Z and source route r=⟨X, A, Y⟩ using path vector ⟨A⟩ of contact Y. The FindNodeReq is forwarded along r (strict source route). Message forwarding between overlay neighbors requires source routing, because the path in the underlay may lead via nodes that are (ID-wise) further away from the destination (e.g., Y routes via ⟨A, Q, M⟩ to Z in Figure 2): using the ID-based overlay routing scheme on a hop-by-hop basis (i.e., between directly adjacent nodes in the underlay) would inevitably lead to forwarding loops in most cases.¶
Y / X--A--Q--M--Z \ / B----/
When the FindNodeReq arrives at Y, the same procedure is repeated (since it is a recursive variant of Kademlia). Node Y tries to find a contact closer to Z than Y itself. If this contact exists, source route r is appended by the corresponding path vector and the FindNodeReq is forwarded to this contact. In the given example of Figure 2), we assume that Y knows Z as its contact with path vector ⟨A, Q, M⟩. It extends source route r of the FindNodeReq by ⟨A, Q, M, Z⟩ and forwards it to A as next hop in the source route. If routing information has been converged, this ID-based routing scheme guarantees progress in the ID space [Kademlia2002] during forwarding and eventually finds node Z.¶
The destination node Z responds with a FindNodeRsp message along the reversed source path with any cycles removed (⟨Z, M, Q, A, X⟩). Due to XOR’s symmetry, the responding node Z also learns the new contact X as neighbor. The FindNodeRsp returned to X not only provides a path to Z, but also a list of k closest contacts to Z together with their path vectors. This list is used to improve X’s routing table.¶
In case that the FindNodeReq arrives at Y and it cannot find a contact closer to Z, the FindNodeReq is terminated at Y and a response is sent back depending on the "ExactFlag" Section 4.4.1 in the FindNodeReq. If the ExactFlag was not set, Y sends a FindNodeRsp back to originator X that contains an RT excerpt of Y’s (at most) k closest contacts to Z in a so called RTable object. This enables finding the responsible node for a destination ID Z if used as object key. The latter allows for so called key-based routing that is used to realize DHTs. If exact was set, X assumed that a node with ID Z must exist, but the current node is the ID-wise closest node to Z and does not know Z as contact. Consequently, the node cannot forward the FindNodeReq closer to Z and returns a "Dead End" Error message (which may happen occasionally during convergence). In case source route r contains a broken link or unreachable node, a "Segment Failure" Error message will be sent back to X along the reversed source route.¶
KIRA nodes use overhearing mechanisms for R²/Kad messages. This is an important mechanism for KIRA to learn new contacts and better or improved paths.¶
Nodes that forward R²/Kad messages SHOULD use the contained source route to improve their own routing information: they may learn new contacts or shortcut routes to known contacts. However, only the so far traversed path is considered as it can be assumed that all traversed links worked recently. Additionally, NotViaList information (see Section 4.4.2.3) is used to invalidate contacts that have an active path vector containing a link from the NotViaList.¶
The source routing path of incoming requests and responses is also considered for improving the RT. Some messages like FindNodeRsp, QueryRouteRsp or UpdateRouteReq contain RTable objects that are evaluated likewise.¶
Bypassing QueryRouteRsp messages contain RTable objects (as requested by QueryRouteReq) and are inspected for interesting contacts and paths.¶
Periodic Path Probing aims at reliably detecting any RT inconsistencies (e.g., seemingly valid contacts with paths that contain recently failed links). Each node periodically checks the path validity for all of its contacts by sending a ProbeReq message to them. ID-wise closest neighbors are probed more often than other contacts and those recently contacted (≤ 2s) are not probed. In case a path has a link or node failure, the ProbeReq will elicit a "Segment Failure" Error message from an intermediate node along the broken path, notifying about the failed link. The contact’s state will be set to invalid and a rediscovery process is scheduled (see Section 3.9.1).¶
In order to improve R²/Kad’s robustness against link or node failures we introduce a recovery procedure that notifies about failures and actively tries to find alternative paths that route around the failure. This procedure is highly robust and achieves a fast convergence. R²/Kad nodes detect link and node failures of PNs by link layer notifications, missing PNHello or PNDiscoveryRsp messages as well as "Segment Failure" errors anytime during forwarding along source routes. To recover from such failures, R²/Kad’s recovery procedure uses the following mechanisms:¶
The ID-based overlay routing scheme is used for rediscovery of a route, because NodeIDs are randomly distributed all over the underlying topology. Therefore, a rediscovery uses different paths that are likely not affected by the failure. However, if overlay nodes still have obsolete routing information, i.e., they would normally route via the failed link, they can detect the need to update their routes as well by seeing the more current NotViaList information.¶
A node X that detects its PN B (cf. Figure 2) or the corresponding link ⟨X, B⟩ has failed, reacts as follows (unless isolated by that failure):¶
Since UpdateRouteReqs have notification character only, they do not create any responses (even no error messages if dropped). The rediscovery process simply sends a FindNodeReq for all invalid contacts (all invalid contacts will be ignored in finding the next hop). This FindNodeReq for rediscovery (also denoted as rediscovery message) contains a set ExactFlag and the failed link ⟨X, B⟩ as additional NotViaList information. It is sent to X’s currently known ID-wise closest neighbors of the invalid contact (e.g., A in the example), which will then try to forward the FindNodeReq further toward the failed contact. The NotViaList information avoids that nodes use obsolete routing information when forwarding the rediscovery message, i.e., paths that contain the failed link will not be used for forwarding. Node A may not have heard yet about the broken link and thus will invalidate contact B if its prior preferred path is via ⟨X, B⟩. In order to ensure that only current NotViaList information is considered, every link contained in the NotViaList is also accompanied by a related age value ∆T, specifying in milliseconds how long ago the sender heard about the failed link. In case a FindNodeRsp is returned by B, a valid path has been discovered and the contact’s state is set to valid (triggering subsequent UpdateRouteReqs with the new path as mentioned before).¶
Nodes receiving UpdateRouteReqs or FindNodeReqs containing the failed link also set their corresponding affected contacts to invalid and trigger a rediscovery process of the routes (like Z in the example). The actual rediscovery messages are sent after different randomly chosen waiting times from an interval [0.5t_p, 1.5t_p]. The mean value t_p is set as follows: for invalidated PNs 100ms, affected ID-wise near contacts (in the deepest buckets) 500ms, for contacts affected by the failure of a link to a PN 1s and for all other affected contacts 2s. Rediscovery messages are sent simultaneously to two different (overlay) neighbors of the affected contact at a time, until k neighbors have been tried unsuccessfully to rediscover a path to the currently invalid contact. In the latter case, a new round of rediscovery attempts will be initiated with exponential backoff until a certain limit of retry rounds (default: 6) have been made without any success, after which the contact will be deleted. Although there is no guarantee that a viable alternative route can be found, our simulation results show that connectivity is very quickly restored after a failure even in drastic failure scenarios (i.e., where a larger part of links, such as 15%, fail simultaneously and randomly).¶
Node B at the other end of the failed link ⟨X, B⟩ also tries to rediscover X and thus sends an UpdateRouteReq to its ID-wise closest overlay neighbors (e.g., A). Thereby, it may inform A as well as X about a new alternative route via M.¶
R²/Kad uses state sequence numbers and aging to prevent obsolete routing information from spreading or settling. Messages carry routing information in an RTable object that contains a list of contacts n_j , and for each contact n_j the corresponding path vector p_j leading from the reporting node to the contact, its state sequence number s_j and the age ∆T_j of this information. The currentness of contact information can always be assessed by s_j. However, s_j alone does not suffice to assess the currentness of the associated path to this contact as intermediate links may have been failed/repaired. Therefore, each reported path, as well as NotVia links, carry an associated age value ∆T_j ≥ 0 that corresponds to the time period when the path information was updated last at the originating node. This avoids spreading and wrongfully accepting obsolete routing information. The age for physical links of a node is always 0ms, because related information is always current at this node. A node simply sets a "last modification" timestamp t_j for the contact n_j to t_j := t_now − ∆T_j and reports n_j’s age as t_now-t_j in messages with RTable objects (e.g., UpdateRouteReq, FindNodeRsp, QueryRouteRsp). A contact’s timestamp t_j is also updated by messages that allow to infer that the traversed path is current, e.g., incoming ProbeReq, ProbeRsp , FindnodeRsp, QueryRouteReq, and QueryRouteRsp messages. A path is updated only if the contact’s state sequence number is larger than the prior known sequence number for this contact, or, in case of equal sequence numbers, the received path information must be more recent when comparing their age values. Since age values are relative, they can be compared even if they stem from different nodes, i.e., synchronized clocks are not required.¶
The previously described mechanisms cannot guarantee notification of all affected nodes about link failures in their path vectors. In order to reliably detect such inconsistencies, each node periodically probes the paths to all its contacts as described in Section 3.8. Nevertheless, if a node tries to use an obsolete path with a failed link, a viable path will be rediscovered immediately after receipt of the Error message from the node before the broken link.¶
A potential drawback of R²/Kad is its use of source routing to forward between two overlay hops. Handling a (potentially long) list of source routing hops is currently not as efficiently realized as regular destination-based routing. Moreover, source routing increases per-packet overhead. To forward data packets more efficiently, the Forwarding Tier (see Figure 1) leverages an approach similar to label switching, whereas the Routing Tier still uses source routing for R²/Kad messages to remove cycles, detect shortcuts, and so on. Every source routing path (that consists of NodeIDs) to a contact is represented by up to two PathIDs that correspond to path segments. A PathID is a hash value of all NodeIDs along the corresponding path segment, e.g., PathID(⟨A, Q, M, Z⟩)=H(A|Q|M|Z). It serves as unique label for the path segment. The uniqueness is an important distinction from common label switching approaches where nodes assign labels of node local scope. It enables KIRA nodes to distributedly compute a set of PathIDs in advance. This avoids explicit path setup signaling for PathID installation in many cases. Only for paths longer than 5 hops PathID mappings have to be installed in some intermediate nodes. Another feature of PathIDs is their automatic aggregation toward a sink, i.e., paths that merge in a certain node and use the same residual path to a destination use the same PathID. The Forwarding Tier uses IPv6 GRE [RFC7676] to carry PathIDs in addition to source and destination NodeIDs (other encapsulation methods, e.g., using segment routing [RFC8754] are possible and can be defined later).¶
In detail, KIRA implements fast forwarding as follows:¶
1.[X,H(Q|M|Z)] 2.[X,H(Z|C|E|T)] 1.[X,H(Z|C|E|T)] 3.[X,S] 2.[X,S] | | 1.[X,H(E|T)] | | 2.[X,S] | | | X --> A --> Q --> M --> Z --> C --> E --> T | | | | | | | 1.[X,S] | | 1.[X,H(C|E|T)] | | 2.[X,S] | 1.[X,H(M|Z)] | 2.[X,H(Z|C|E|T)] | 3.[X,S] | 1.[X,H(A|Q|M|Z)] 2.[X,H(Z|C|E|T)] 3.[X,S]
Each node computes all PathIDs for its 2-hop vicinity to avoid path setup signaling, because it allows all nodes to assume that PathIDs exist for all source paths of length ≤3 hops. PathID precomputation for the full 2-hop vicinity provides a good trade-off between the number of a priori computed PathIDs and required path setup signaling. Intermediate nodes along a source route may not have computed the necessary PathIDs for others. Nodes explicitly setup paths via PathSetupReq only for paths >5 hops. In the example of Figure 3, only nodes A and Z must install additional forwarding states when receiving a PathSetupReq, because all other nodes have precomputed the PathIDs already. The PathSetupReq is answered with a PathSetupRsp by the node that marks the beginning of the second path segment. The forwarding states are implemented by soft states: contact probing also refreshes any required PathIDs in the intermediate nodes and so called "foreign" entries (i.e., those that are neither locally used nor precomputed) are deleted after three refresh intervals have passed without any refresh.¶
The routing information from the Routing Tier is used by the Forwarding Tier to generate two forwarding tables inside each node: one based on the calculated PathIDs and one based on NodeIDs (generated from RT). One can employ common longest prefix matching for both tables. For NodeIDs the matching prefix length corresponds to the bucket depth. Thus, required prefix length is typically much shorter than the full length of the NodeIDs. The PathID forwarding table size comprises at least all stored contacts, but it is usually larger due to the number of precomputed and foreign entries.¶
This will be specified in future versions of this draft.¶
This section defines the message syntax and node behavior for the R²/Kad protocol.¶
R²/Kad messages use the IPv6 packet format and are sent between KIRA nodes by using link-local addresses of the respective interfaces as source address and corresponding unicast or multicast addresses as destination address.¶
An R²/Kad message MUST be sent in the body of an IPv6 UDP datagram, with network-layer hop count set to 1, destined to the well-known KIRA multicast address or to an IPv6 link-local unicast address. Both the source and destination UDP port are set to a well-known port number. An R²/Kad packet MUST be silently ignored unless its source port and destination is the well-known R²/Kad port (to be allocated by IANA, use 19219 for experimentation). It MAY be silently ignored if its destination address is a global IPv6 address.¶
R²/Kad messages consist of a common header and an optional serious of type-length-value (TLV) encoded protocol objects. A single R²/Kad message is limited in its size by the maximum length of an IPv6 payload minus the UDP header size of 8 bytes, because IPv6 fragmentation can be used between two R²/Kad nodes. Larger message payloads can be transferred by using R²/Kad fragmentation.¶
R²/Kad messages use CBOR encoding [RFC8949] for the individual message fields. The lengths in the message specifications do not reflect the size after CBOR encoding on the wire. However, the final message after CBOR encoding must fit into the UDP payload (fragmentation of larger messages will be defined in later versions).¶
Messages consisting of multiple fields or objects are denoted by their Message Name and the message content enclosed in { }. The specification (L) after a field name indicates the length in bits of the corresponding field, (..) denotes a variable length. Optional fields are enclosed in [ ]. Optional repetitions of field or objects are denoted by ... after the field. Specific types are separated by a : after the message field name, e.g., enum is the typical enumeration type, bitfield is a bitfield specification where the assigned value denotes the number of the corresponding bit.¶
The overall message format consists of a common header that MUST be present in every message and a sequence of optional protocol objects that immediately follow the common header.¶
R²/Kad Message { Common Message Header, [Protocol Object ...] }
The common message header has a fixed size and is present in every R²/Kad message.¶
Common Message Header { Version (8), Message Type (8), Flags (8), Message Length (16), Destination ID (112), Source NodeID (112), DomainID (64), MessageID (64), State Sequence Number (16), Source Node Degree (16) }
This field indicates the message type. Requests are odd numbers, Responses or Indications are even numbers.¶
Message Type : enum { PNHello = 0x01, PNDiscoveryReq = 0x03, PNDiscoveryRsp = 0x04, FindNodeReq = 0x09, FindNodeRsp = 0x0a, QueryRouteReq = 0x0b, QueryRouteRsp = 0x0c, UpdateRouteReq = 0x11, ProbeReq = 0x21, ProbeRsp = 0x22, Error = 0x70, PathSetupReq = 0x81, PathSetupRsp = 0x82, PathTearDownReq = 0x83, PathTearDownRsp = 0x84 }¶
Flags : bitfield { ExactFlag = 0, EndSystemFlag = 1, Reserved = 2..13, DiagnosticFlag = 14, Reserved = 15 }¶
Every Protocol Object starts with a common object header and has a specific content.¶
Protocol Object { Common Object Header (24), Object Contents (..) }
The common object header contains the type and the length of the following object. The Object Length excludes the Common Object Header, so a length of 0 indicates that no further object contents follows.¶
Common Object Header { Object Type (8), Object Length (16) } Object Type : enum { Source Route Object = 0x01, NotViaList Object = 0x02, ContactList Object = 0x03, RTable Request Type Object = 0x04, RTable Object = 0x05, RTable Update Info Object = 0x06 }
This object contains a source route in form of a list of NodeIDs and an index that points to the current NodeID when receiving and to the next NodeID when sending a message.¶
Source Route Object { Common Object Header, Index (10), NodeID (112) ... }
In case Index points behind the end of the list of present NodeIDs, a parameter problem error message MAY be sent back to the previous hop. Typically, when forwarding an R²/Kad message, the Index pointer is advanced to the next entry in the NodeID list. In case the last node of the list received this object and the final destination has not been reached, it will append a path to the existing list that leads to the next overlay hop.¶
NotViaList Object { Common Object Header, Failed Link List : LinkListType ... } LinkListType = { NodeID_1 (112), NodeID_2 (112), AgeInfo (32) }¶
A LinkList is a sequence of NodeID pairs plus the AgeInfo value. AgeInfo specifies the age of this information in milliseconds.¶
ContactList Object { Common Object Header, Contact List : Contact Entry Type ... } Contact Entry Type = { NodeID (112), StateSequenceNumber (16), AgeInfo (32), NodeDegree (16) }¶
The list of contacts contains for each entry a NodeID, the corresponding known StateSequenceNumber, an AgeInfo that specifies how old the contact info is (time since last updated) and the known node degree.¶
RTable Request Type Object { Common Object Header, RTable Request Type (8), Radius (8) } RTable Request Type : enum { None = 0x00, ContactsOnly = 0x01, OverlayNeighbors = 0x02, OverlayNeighborsSource = 0x03, PNVicinity = 0x04 }¶
The following Request Types can be used: None will not return any Routing Table information. This is useful in case a FindNodeRsp should only report the source route back. ContactsOnly reports only contacts without their paths. OverlayNeighbors reports the ID-wise closest contacts to the destination ID of the FindNodeReq. OverlayNeighborsSource reports the ID-wise closest contacts to the source NodeID of the FindNodeReq. PNVicinity requests the physical neighbors within the given radius. Radius specifies the number of entries to be returned and SHOULD be set to the bucket size k by default. A value of 0xff means to return the full routing table (for any request type other than None).¶
RTable Object { Common Object Header RTableLength (16), RTableEntry : RTableEntryType ... } RTableEntryType = { NodeID (112), Path : PathVectorType (..), StateSequenceNumber (16), AgeInfo (32), NodeDegree (16), [NodeAttributes Object], [PathAttributes Object], [LinkAttributes Object] ... } PathVectorType = { PathLength (16), NodeID (112) ... }¶
The RTable contains a list of routing table entries. The list is preceded by RTableLength that provides the number of the following entries. For each entry the NodeID of the contact is given, the path from the reporting node to the NodeID as sequence of NodeIDs as well as the corresponding known StateSequenceNumber, an AgeInfo that specifies how old the contact info is (time since last updated) and the known node degree. Optional attributes for the node, the path or individual links along the path follow. The PathVectorType is basically a counter that specifies the number of the following node IDs.¶
RTable Update Info Object { Common Object Header RTableLength (16), RTableUpdateEntry : RTableUpdateEntryType ... } RTableUpdateEntryType = { NodeID (112), Path : PathVectorType (..), StateSequenceNumber (16), AgeInfo (32), NodeDegree (16), RouteUpdateAction (8), [NodeAttributes Object], [PathAttributes Object], [LinkAttributes Object] ... } RouteUpdateAction : enum { Announce = 0x00, WithDraw = 0x01, Change = 0x02, Unreachable = 0x03 }¶
The RTable Update Info Object is similar to the RTable Object (Section 4.4.2.6) contains a list of routing table entries with associated an update action. Announce means that the corresponding contact is a new entry in the routing table. WithDraw means that the contact has been deleted from the routing table. Change means that the path to the contact has been changed. Unreachable means that the contact is not reachable.¶
Note: further objects will be detailed in future versions of this draft¶
This sections describes the R²/Kad messages.¶
PNHello messages are periodically sent (randomized) to each interface to indicate presence of a KIRA node. Other nodes that want to establish an adjacency SHOULD respond with a PNDiscoveryReq after a randomized waiting time. At node startup or when a new link comes up a PNHelloMinInterval of 200ms is used per link. The sending interval for subsequent PNHello messages is doubled up to PNHelloMaxInterval of 30s. The sending interval is reset to PNHelloMinInterval for a link that comes up after it failed.¶
PNHello { Common Header }
A PNDiscoveryReq is either sent as response to a PNHello message or sent to test liveness of an already known PN or to resynchronize state with a PN. The sending node fills its currently known PNs into the PNList on first contact or each time its state sequence number has changed.¶
PNDiscovery Request Message { Common Header, [ PNList : ContactList ] }
A PNDiscoveryRsp is sent as answer to a previous PNDiscoveryReq. The sending node fills its currently known PNs into the PNList on first contact or each time its state sequence number has changed.¶
PNDiscovery Response Message { Common Header, [ PNList : ContactList ] }
Detailed description TBD.¶
FindNode Request Message { Common Header, Routing Table Request Type, Source Route Object, NotViaList Object }
Detailed description TBD.¶
FindNode Response Message { Common Header, Source Route Object, NotViaList Object, [RTable Object] }
Detailed description TBD.¶
QueryRoute Request Message { Common Header, Routing Table Request Type, Source Route Object, NotViaList Object }
Detailed description TBD.¶
QueryRoute Response Message { Common Header, Source Route Object, NotViaList Object, [RTable Object] }
Detailed description TBD.¶
UpdateRoute Request Message { Common Header, Source Route Object, NotViaList Object, RTable Update Info Object }
Detailed description TBD.¶
Probe Request Message { Common Header, Source Route Object }
Detailed description TBD.¶
Probe Response Message { Common Header, Source Route Object }
Detailed description TBD.¶
Error Message { Common Header, Source Route Object, Error Type (8), Additional Error Information (..) } Error Type : enum { NoError = 0x00, NodeUnreachable = 0x01, MalformedMessage = 0x02, ParameterProblem = 0x03, HopLimitExceeded = 0x04, SegmentFailure = 0x05, PathIDUnknown = 0x06, RouteFailureDeadEnd = 0x0a }
Detailed description TBD.¶
Path Setup Request Message { Common Header, Source Route Object }
Detailed description TBD.¶
Path Setup Response Message { Common Header, Source Route Object }
Detailed description TBD.¶
Path TearDown Request Message { Common Header, Source Route Object }
Detailed description TBD.¶
Path TearDown Response Message { Common Header, Source Route Object }
KIRA uses hash functions in various contexts. The used hash function is SHAKE256 with 128bit length output.¶
This memo currently includes no request to IANA yet. This may change in the future.¶
There are various attacks that need to be considered. Future versions of this draft will have more detailed security considerations.¶
KIRA has been developed as joint work with Zoran Despotovic, Artur Hecker, and Martina Zitterbart. Hendrik Mahrt and Paul Seehofer are still contributing to KIRA's evolution.¶