Internet-Draft | BGP Blockchain | March 2023 |
McBride, et al. | Expires 7 September 2023 |
A variety of mechanisms have been developed and deployed over the years to secure BGP, including the more recent RPKI/ROA mechanisms. Is it also possible to use a distributed ledger such as Blockchain to secure BGP? BGP provides decentralized connectivity across the Internet. Blockchain provides decentralized secure transactions in an append-only, tamper-resistant ledger. This document reviews possible opportunities of using Blockchain to secure BGP policies within a domain and across the global Internet. We propose that BGP data could be placed in a blockchain and that smart contracts could control how the data is managed. This could create a single source of truth, something for which blockchains are particularly well suited.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 7 September 2023.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
There have been many proposed solutions to help secure the Border Gateway Protocol (BGP) [RFC4271], including securing TCP, CoPP, IPSec, Secure BGP, Route Origination Validation (ROV), and BGPSec, along with many variations. Could we also use Distributed Consensus Systems (DCS) such as Blockchain to secure BGP? This document provides a review of how such DCSs could be used to secure BGP, particularly as supplements to existing solutions. Many of the proposals can be extended to any routing protocol, but the focus here is on BGP. The potential attractiveness of adding DCS capabilities to BGP is that it adds security without changes to the BGP protocol. Blockchain for BGP proposals are out of band to BGP, similar to RPKI, and do not suggest new encodings. This analysis does not consider external factors such as the energy demands of deploying such solutions.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].¶
Smart contracts are programs (state machines), executed within a DCS, that run when predetermined conditions are met. These contracts are executed automatically without an intermediary's involvement. Smart contracts may be used in financial, real estate, and other environments to automatically trigger predefined agreements between parties. A DCS implements a smart contract in the form of a distributed state machine, i.e., actions over a pool of information, where distributed DCS nodes maintain the evolving state information over time, utilizing proof techniques, such as proof-of-work, proof-of-stake, and others, to ensure consensus over the latest valid information pool (and thereby the latest state of the smart contract). In popular Blockchain systems, this information pool is represented by the longest blockchain that can be retrieved from the system by a client, i.e., representing the current consensus among the DCS nodes being queried by the client.¶
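As a minimal sketch of the client-side retrieval described above, the following Python fragment picks the longest valid chain among those returned by the queried nodes. The `Chain` representation and the `is_valid` check are hypothetical placeholders introduced for illustration only; a real DCS would verify block hashes and proofs instead.

```python
from typing import List

# Assumption: a chain is modelled as a list of block identifiers.
Chain = List[str]


def is_valid(chain: Chain) -> bool:
    """Placeholder validity check (hypothetical): the chain must start at genesis."""
    return len(chain) > 0 and chain[0] == "genesis"


def current_consensus(responses: List[Chain]) -> Chain:
    """Return the longest valid chain among those returned by the queried nodes."""
    valid = [chain for chain in responses if is_valid(chain)]
    if not valid:
        raise ValueError("no valid chain returned by the queried nodes")
    return max(valid, key=len)


if __name__ == "__main__":
    answers = [
        ["genesis", "a"],
        ["genesis", "a", "b"],
        ["bogus", "x", "y", "z"],  # longer, but fails the validity check
    ]
    print(current_consensus(answers))
```

Note that the result only reflects the consensus among the nodes that were actually queried, which is why the set of queried nodes matters.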
With this in mind, we can now describe a simple BGP DCS as one consisting of N miners, which implement the distributed consensus for a desired smart contract, utilizing a suitable proof technique for the consensus. A DCS may implement more than one smart contract, representing, e.g., different BGP capabilities as outlined later in Section 3.¶
In addition, there are M clients inserting transactions into the system. Those transactions relate to the desired smart contract or may be retrievals of the latest valid consensus information.¶
Clients and miners may be different entities or they may be the same, whereby in the latter case M=N.¶
The figure below outlines a simple BGP DCS architecture, with BGP routers providing the clients of the DCS.¶
In our context of BGP, we can see actions over BGP information, such as BGP origins, routing policies or others, as smart contracts over which distributed consensus needs to be achieved; Section 3 elaborates on those examples. Through using such smart contracts (over BGP information), a DCS for BGP would avoid BGP human configuration errors or hijacks as common threats for BGP, instead storing transaction information in the DCS where the consensus here represents the latest valid BGP information.¶
In terms of trust assumptions, a DCS for BGP may require authentication to prevent fraudulent DCS transactions, such as fraudulent BGP announcements being made. For this, the existing RPKI system could be used to authorize any client before it sends suitable smart contract transactions into the DCS. If not using RPKI, the DCS would need to check a separate IRR prefix/AS database, if one were to exist, in order to validate incoming transactions on the main DCS before executing them; such a separate IRR database could be realized as a DCS itself. Furthermore, ROA entries could be added to the DCS as secure transactions and those transactions would be relied upon by route validators as authoritative. Perhaps DCS validation information could be added as a new ROA field.¶
In terms of openness of the system, a permissioned system would restrict both clients and miners to, e.g., AS owners, through suitable verification steps upon joining the DCS. A permissionless realisation, on the other hand, could more widely distribute the BGP origin information, still relying on the detection of fraudulent announcements through the above steps before executing a transaction.¶
A key requirement for realizing a suitable DCS for BGP is the latency requirement for achieving consensus, i.e., retrieving the latest valid information from the DCS. This requirement will need reflection in choosing the appropriate proof technique for consensus.¶
In the next section, we list several opportunities for using DCS in BGP by expressing those opportunities in smart contract language, i.e., allowing for being formulated as a distributed state machine with a distributed information pool representing the latest valid state of the system.¶
As we will show, having a blockchain as part of the BGP control plane is probably not a good thing due to its low transaction speed. However, having blockchain integrated into BGP may be useful to show proof of ownership of network assignments (address, prefix, AS, etc.). BGP policy may be a good area of focus for blockchain, including address delegation contracts, billing, ARIN/RIPE databases, etc.¶
There are various ways DCSs could be used in the context of BGP that we will explore in this section, keeping in mind the questions of the previous section.¶
BGP origin information is at the heart of BGP to ensure reachability in the global Internet, while preventing any fraudulent announcement of a BGP origin is an additional security aspect in providing this global reachability.¶
Announcements (of BGP origins) here represent smart contracts in a DCS, amending a distributed state (the BGP routing table), while securing those transactions prevents fraudulently doing so.¶
For anomaly detection purposes, we could further secure BGP origin information by comparing what's in a BGP blockchain table against what's in the BGP table or the forwarding table. Additional reliance upon BGP blockchain table could potentially help prevent high frequency updates from causing routing disruptions.¶
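The comparison described above can be sketched as follows. This is an illustration under simplifying assumptions: both tables are modelled as plain dictionaries mapping a prefix string to a single origin AS number, only exact prefix matches are compared (covering prefixes and maximum lengths are ignored), and the function name is hypothetical. The example values use documentation prefixes and AS numbers.

```python
from typing import Dict, List, Tuple


def find_origin_anomalies(
    ledger: Dict[str, int], bgp_table: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (prefix, observed_origin, expected_origin) for every mismatch.

    Prefixes absent from the ledger are skipped: they are unknown rather
    than provably wrong, and a deployment would need its own policy for them.
    """
    anomalies = []
    for prefix, observed in sorted(bgp_table.items()):
        expected = ledger.get(prefix)
        if expected is not None and expected != observed:
            anomalies.append((prefix, observed, expected))
    return anomalies


if __name__ == "__main__":
    ledger = {"192.0.2.0/24": 64496, "198.51.100.0/24": 64497}
    bgp_table = {
        "192.0.2.0/24": 64496,     # matches the ledger
        "198.51.100.0/24": 64511,  # origin differs from the ledger
        "203.0.113.0/24": 64500,   # not recorded in the ledger
    }
    for prefix, observed, expected in find_origin_anomalies(ledger, bgp_table):
        print(f"{prefix}: observed AS{observed}, ledger says AS{expected}")
```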
This is very similar to the previous aspect, except that a BGP origin may not just be announced but also updated, which is represented through a different state machine that manipulates the distributed BGP information in the DCS.¶
According to RIPE Labs, BGP route updates tend to converge globally in a few minutes. The propagation of newly announced prefixes happens almost instantaneously, reaching 50% visibility in under 10 seconds. Prefix withdrawals take longer to converge and generate nearly 4 times more BGP traffic, with the visibility dropping below 10% after approximately 2 minutes.¶
Although a DCS will likely not help with BGP updates, withdrawals may be completed faster than in existing BGP systems.¶
Furthermore, networking innovations that link DCS operations, like its ledger diffusion, more directly to emerging network capabilities, as suggested in [IIC_whitepaper], may improve the DCS' transaction completion latency and thereby provide a suitable alternative even for update operations. This provides an opportunity for more research and testing.¶
In addition to the prefix to AS match information being stored in the DCS, the routing policy of those routes could also be stored as part of the DCS information. As long as the policy was correctly added to the chain, the path policies cannot be altered except by those authenticated to do so.¶
The DCS information could also be used to store configuration files within an AS in order to prevent malicious config tampering and to prevent misconfiguration.¶
This protection could be provided within a private, i.e., permissioned, DCS where only authorized users have access to the DCS data. This could also be used within a trusted external peering environment to build a distributed database of BGP files such as communities for use between BGP neighbors. Peers can use the DCS data to understand the necessary peering relationship and act on the communities in a consistent manner.¶
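To illustrate why a ledger makes configuration tampering detectable, the sketch below chains each stored configuration to its predecessor with a SHA-256 digest, so that changing any earlier entry invalidates the verification. It is a single-node illustration only: the `Entry` layout and function names are assumptions, and the consensus, authentication and permissioning aspects of a DCS are deliberately left out.

```python
import hashlib
from typing import List, Tuple

# Assumption: each entry is (config_text, chained_digest).
Entry = Tuple[str, str]


def _digest(previous_digest: str, config: str) -> str:
    """Hash the configuration together with the digest of the previous entry."""
    data = (previous_digest + "\n" + config).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_config(log: List[Entry], config: str) -> None:
    """Append a configuration, linking it to the last entry of the log."""
    previous = log[-1][1] if log else ""
    log.append((config, _digest(previous, config)))


def verify_log(log: List[Entry]) -> bool:
    """Recompute every digest; any altered entry breaks the chain."""
    previous = ""
    for config, digest in log:
        if _digest(previous, config) != digest:
            return False
        previous = digest
    return True


if __name__ == "__main__":
    log: List[Entry] = []
    append_config(log, "router bgp 64496\n neighbor 192.0.2.1 remote-as 64497")
    append_config(log, "router bgp 64496\n neighbor 192.0.2.1 remote-as 64498")
    print(verify_log(log))                       # chain is intact
    log[0] = ("router bgp 64511", log[0][1])     # tamper with the first entry
    print(verify_log(log))                       # tampering is detected
```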
BGP stores multiple paths to a destination in the BGP table. The BGP table contains all of the routes from all of the neighbors. Only the best route gets installed in the routing table. To help further secure the BGP table, all of those routes/paths could be installed in a DCS. Some mechanism could be used to validate these routes/paths, that reside in the DCS, prior to one being selected as the path in the routing table. This could also be extended to provide proof of transit across certain expected paths.¶
BGP-LS is used to provide BGP topology information to a Controller. That topology information could be added to a DCS to ensure that the topology data is not compromised. PCEP, or other protocol, could be used by a controller to validate any update of a BGP forwarding table using this same (or separate) DCS. The latest forwarding rules would be maintained in a DCS, which is built using BGP-LS data and authorized users as an input. Without the proper credentials it would be very difficult to update the forwarding rules in the DCS and a record would be kept with all update attempts.¶
Furthermore, the DCS could be permissioned, thereby restricting the nodes holding as well as accessing information to trusted members of the community.¶
The attractiveness of DCS applications, such as Bitcoin and Ethereum, is that they are highly decentralized and more resistant to attack. This has opened the way for securing monetary transactions using cryptocurrencies and their underlying blockchain technology.¶
Blockchain mining power, however, is centralized, with mining pools concentrating within certain regions and Autonomous Systems. This also creates a more centralized routing situation, which could become vulnerable to BGP attacks in which the IP addresses of the mining pools are hijacked. Therefore, helping to further secure BGP will help to secure blockchain's centralized mining pools, creating a circular dependency where the use of blockchains in BGP will in turn secure blockchains themselves.¶
Let us now discuss the challenges arising from operating a DCS, particularly when seen in the context of being utilized for BGP opportunities, such as those outlined in the previous section. Here, we can identify three key aspects, namely the latency for convergence (Section 4.1), the associated communication costs for achieving consensus (Section 4.2), and finally the working on inconsistent state during an ongoing convergence process (Section 4.3). We discuss each of those aspects in more detail in the following respective subsections.¶
In order to understand the aspect of latency better, a bit of background is needed on how a DCS generally works (the reader is referred to references such as [Nakamoto2008][Howard2014][Aspnes2022] for more details).¶
Key to achieving a distributed consensus (over some agreed information, such as BGP route changes) is the distribution of the information to the peers participating in the DCS. The majority rule, formulated by von Neumann [Newman1956], states that, to reach consensus over the information, it needs to be distributed to at least N/2 peers, with N being the size of the set of peers in the DCS; this distribution, performed in order to result in a majority, is often referred to as diffusion.¶
Given the overlay nature of DCSs, this diffusion is realized by peer-side unicast replications, limited in number of peers to which each peer diffuses the information. Systems, such as Ethereum, rely on default configurations for the number of peers to diffuse to; expected to be 50 at the time of writing this document. Each receiving peer further distributes the information itself to another set of peers.¶
Those sets of peers maintained at each peer are randomized through mechanisms further detailed in, e.g., [Guzman2022]. The result is an iterative diffusion process, creating an increasing number of peers per iteration to which the information is distributed.¶
When visualizing this process, as done in Figure 2, we can see that this process, bounded at the top by the desired majority of peers, is bounded on the time axis by the convergence latency t_c, after which the desired majority has been reached; it is important to note that convergence, however, is not guaranteed and heavily dependent on the random nature of each individual peer-initiated diffusion.¶
This latency t_c is crucial for the overall system to operate on ensured, i.e., consistent state, and largely depends on three key factors: (i) the number of peers in the DCS, (ii) the number of required iterations, and (iii) the latency for each iteration to happen for a given configuration of the diffusion size. For (i), the specific application of the DCS is largely a factor. With large DCS systems, such as Ethereum, this number reaches beyond 500k peers. If we assumed at least one peer per AS for many of our BGP examples, the number of peers would also exceed 100k as per latest AS numbers for the Internet. Item (iii) is a function of communication latency and thus mainly depends on the nature of peer distribution and its resulting network latency, while item (ii) is harder to bound since it depends on the randomization of the individual peer-initiated diffusions. Ideally, each peer would distribute the information to unique peers, not served by any other peer. Given the lack of central coordination of the individual diffusions, however, peers may well receive information multiple times from differing peers in reality, thus not progressing the goal of reaching a majority but prolonging t_c instead.¶
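The dependence of factor (ii) on the number of peers and the diffusion size can be sketched as follows. `ideal_iterations` computes a lower bound under the idealized assumption that every diffused copy reaches a peer not yet served, with the majority taken as N/2 rounded up. `simulated_iterations` is a simplified model, not a description of any real DCS: each newly informed peer picks its targets uniformly at random, duplicates are wasted, and the function returns `None` if diffusion stalls before a majority is reached. Both function names and the seed parameter are assumptions made for illustration.

```python
import random
from typing import Optional


def ideal_iterations(num_peers: int, fanout: int) -> int:
    """Lower bound on iterations to reach ceil(N/2) peers with no duplicates."""
    if num_peers < 1 or fanout < 1:
        raise ValueError("num_peers and fanout must be positive")
    target = (num_peers + 1) // 2
    informed = 1   # the originating peer
    frontier = 1   # peers informed in the most recent iteration
    rounds = 0
    while informed < target:
        frontier *= fanout
        informed += frontier
        rounds += 1
    return rounds


def simulated_iterations(num_peers: int, fanout: int, seed: int = 0) -> Optional[int]:
    """Random diffusion; returns None if it stalls before reaching a majority."""
    if num_peers < 1 or fanout < 1:
        raise ValueError("num_peers and fanout must be positive")
    rng = random.Random(seed)
    target = (num_peers + 1) // 2
    informed = {0}
    frontier = [0]
    rounds = 0
    while len(informed) < target:
        if not frontier:
            return None  # convergence is not guaranteed
        next_frontier = []
        for _ in frontier:
            # Each newly informed peer diffuses to `fanout` randomly chosen peers.
            for peer in rng.sample(range(num_peers), min(fanout, num_peers)):
                if peer not in informed:
                    informed.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(ideal_iterations(100_000, 50))   # one peer per AS, fanout of 50
    print(ideal_iterations(500_000, 50))   # Ethereum-scale peer count
    print(simulated_iterations(10_000, 50, seed=1))
```

Under these idealized assumptions, a fanout of 50 needs at least 3 iterations for 100k peers and 4 iterations for 500k peers; the random model can only match or exceed the ideal bound, which is the effect of duplicate deliveries described above.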
In systems such as Ethereum, the diffusion and validation of information can thus take from about 1 minute to about 50 minutes in 90 percent of the cases [Pacheco2022] (Table 4). Translated onto our BGP usages, this would lead to long periods of needing to work on inconsistent state, an aspect discussed in Section 4.3.¶
The approach of increasing the number of peers to which each peer diffuses depends on the costs of each individual peer diffusion. Existing systems, such as Ethereum, limit the number to a relatively small value of 50 by default; we discuss next some of the reasons for doing so.¶
The main takeaway is that the convergence latencies in large-scale DCSs may cause significant issues with a number of BGP functions where working on consistent state is crucial.¶
While much has been reported on the computational costs for blockchain and DCS technologies in general [Drusinsky2022], only recently have the communication costs (and their underlying causes) been studied [Guzman2022]. Key here is the randomization of the set of peers (at each peer) to diffuse information to. For this, each peer maintains a pool of peers as a combination of outgoing (i.e., actively sought connections) and incoming (i.e., passively received connections from other peers) relationships. For each such relation, a process of reachability checks, transport establishment, and capability exchange is performed, all while a constant refreshing of the (limited size) pool is ongoing, resulting in a constant eviction of peers from the pool.¶
As quantified in [Guzman2022], this has a significant impact on the number of connection requests made by each peer as well as the costs in terms of data transferred but not yet used during the relation establishment. Further, the time to determine the desired number of peers may take several minutes (on average 20 for the Ethereum studies in [Guzman2022]), and thus defines the minimal time a peer may need to wait until a single diffusion step can be finished. Once an initial pool has been built, refreshment will happen faster, yet constant refreshing continues.¶
The insights provided in [Guzman2022] explain well the limits imposed on diffusion steps, such as 50 in Ethereum systems, since the pool maintenance process already creates significant costs per peer even for those fairly small sizes. Although the overall DCS platform costs `merely' for the diffusion of information to create the desired consensus and thus consistent state are subject to system-level studies, initial work such as [Guzman2022] may provide the base data as input. Extrapolating from the purely peer-centric data available at this stage, however, lets us expect a very significant communication cost for an Internet-scale DCS that could serve BGP.¶
The main takeaway is that communication costs for a large-scale DCS may be significant and thus may pose a challenge for building a large-scale DCS at the right cost point for all AS providers to participate.¶
As outlined in Section 4.1, achieving consensus in a DCS, and thus consistent state over which to operate the desired application, experiences significant latencies, often in the range of many minutes for large-scale systems. For this reason, various DCS platforms have devised methods to work on inconsistent state, akin to operating on a constantly amended ledger during an open accounting period in a company.¶
Here, information that is in the process of diffusion is partially operated over in so-called 'proof' operations to generate temporary 'truths', representing inconsistent yet agreed upon state until the consistent state is available. In order to avoid collusion in generating false truths, methods such as 'proof of work' [Nakamoto2008] or 'proof of stake' [Dimitri2022] are used, where the trust in the truth is derived from the commitment to, e.g., spend significant costs on computational operations over the incomplete state. This commitment in turn is responsible for much of the reported cost factor of popular DCS systems, in particular for cryptocurrencies.¶
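The cost commitment behind 'proof of work' can be illustrated with a hash puzzle in the style of [Nakamoto2008]: finding a nonce is expensive, while checking it takes a single hash. This is a simplified sketch, not the scheme of any specific DCS; the function names, the 8-byte nonce encoding and the difficulty value are assumptions chosen so that the example finishes quickly.

```python
import hashlib


def meets_difficulty(data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """True if SHA-256(data || nonce) has at least `difficulty_bits` leading zero bits."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def mine(data: bytes, difficulty_bits: int) -> int:
    """Search nonces in order; expected work doubles with each extra difficulty bit."""
    nonce = 0
    while not meets_difficulty(data, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    # Hypothetical transaction payload, for illustration only.
    payload = b"announce 192.0.2.0/24 origin AS64496"
    nonce = mine(payload, difficulty_bits=12)
    print(nonce, meets_difficulty(payload, nonce, 12))
```

Each additional difficulty bit doubles the expected number of hash evaluations, which is the source of the computational cost discussed above.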
The main takeaway is that, while this cost itself needs consideration when using a DCS for a system like BGP, it must also be considered whether the notion of working on inconsistent state is acceptable for those largely independent AS operators that partake in the overall BGP operation.¶
Considering our challenges in the previous section in the context of BGP, it seems desirable that any DCS for BGP operations experiences a minimal convergence latency, while keeping communication as well as computational costs at a minimum, with the latter often stemming from the proof methods arising from needing to operate on inconsistent state during the long convergence latency period.¶
In the following, we sketch some ideas at the level of network technologies that may help address these challenges and make the usage of a DCS for BGP more palatable, contrasting against current, purely endpoint-centric techniques, whose drawbacks are well observed in [Guzman2022].¶
When looking closer at the workings of a DCS, one can map the interactions (i.e., inserting information, diffusing for consensus and retrieving the consensus result) onto those of a service-based system. Here, peers can be seen as invoking those services at other peers of the overall DCS. More formally, we can identify the insertion service I, the diffusion service D and the query service Q in a typical DCS. While I and Q are invoked only towards a set of peers, with no responses given to I and responses gathered in Q (with the largest returned information set representing the current consented information), the diffusion service D is invoked recursively.¶
While specific DCS platforms use particular endpoint-initiated methods realizing those interaction patterns, such as for discovering other peers (and thus services) and diffusing any new information, as investigated in [Guzman2022], one possible improvement to those endpoint-centric methods is to utilize insights for network-supported service routing instead.¶
The proposal in [I-D.trossen-rtgwg-rosa] postulates an approach to route over service addresses, where those addresses (unlike routing identifiers) are used to steer the traffic to the appropriate network location, particularly for scenarios in which more than one network location choice is possible (in the case of the DCS, peers executing the services above). This idea maps well onto the working of a DCS, when mapping the specific services onto the above introduced distinct service addresses, while amending the anycast forwarding behaviour outlined in [I-D.trossen-rtgwg-rosa] with a diffusion-based one.¶
With such an approach, the endpoint-centric replications, together with the frequent churn of relations as observed in [Guzman2022], would be replaced with a service announcement based approach for discovery of other peers. The resulting in-network diffusion retains the desired random nature of the diffusion. Although we must still assume churn for those peers of a DCS that may leave during the lifetime of the DCS operations, the churn observed in [Guzman2022] is caused by the replenishment of the diffusion pool, which is now replaced with the in-network diffusion over an (ephemerally) stable set of service, i.e., peer, announcements to the ROSA network.¶
Furthermore, the lower cost for in-network replication (due to the missing churn for the discovery process) may allow for larger diffusion sets being used, thus possibly reducing the overall convergence latency - although a deeper analysis of those costs would be required to bound the diffusion size and thus the minimal possible convergence latency as a function of the overall number of DCS peers.¶
Deployment-wise, the ROSA approach also aligns with the overlay approach used for a DCS today in that the 'ROSA domain' [I-D.trossen-rtgwg-rosa] forms an overlay across possibly many network domains, thus allowing for separating the role of DCS provider (which may operate the ROSA domain) from that of the network operators used for bit transfer.¶
Another network-level technology that may improve on the efficacy of a DCS is that of 'compute-aware traffic steering' (https://datatracker.ietf.org/wg/cats/about/). This recently approved effort in the IETF foresees the use of computational information for steering traffic in various places of the network (including, possibly, the application itself). Computational information here may include capabilities of service resources, e.g., max connectivity speed or HW capabilities such as GPU availability, but also dynamic information, e.g., on server CPU load, available memory.¶
With this, currently deployed traffic steering decisions, mainly relying on network information, could be supplemented with such computational information and thus allow for sending traffic not to the shortest (network) path but the, e.g., least loaded compute resource instead.¶
In the context of a DCS, such computational awareness could complement the aforementioned service routing capabilities in that the peers chosen for diffusion (of information) may be further constrained by computational capabilities (e.g., diffusing to least loaded peers) as well as static capabilities (e.g., diffusing to well-connected peers when retrieving relatively large, often TB, blockchains). This overall may further improve latencies by choosing `appropriate' resources instead of randomly chosen ones only. One key route of investigation is the avoidance of collusion, here for instance through announcing fake computational capabilities and metrics for `attracting' more traffic in a DCS than a purely random diffusion would result in.¶
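One possible way to combine computational awareness with random diffusion is sketched below: the diffusing peer first narrows the candidates to the least loaded ones and then samples randomly among them. The `Peer` type, the `cpu_load` metric and the `pool_factor` parameter are hypothetical and not taken from any CATS or ROSA specification; the metric is assumed to be self-reported, which is exactly where the fake-announcement concern raised above applies.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Peer:
    name: str
    cpu_load: float  # assumed self-reported, between 0.0 (idle) and 1.0 (saturated)


def select_peers(
    peers: List[Peer],
    fanout: int,
    pool_factor: int = 2,
    rng: Optional[random.Random] = None,
) -> List[Peer]:
    """Pick `fanout` peers at random from the `fanout * pool_factor` least loaded ones.

    Sampling from a pool larger than the fanout keeps some randomness in the
    diffusion instead of always choosing the same least loaded peers.
    """
    if fanout < 1 or pool_factor < 1:
        raise ValueError("fanout and pool_factor must be positive")
    rng = rng or random.Random()
    candidates = sorted(peers, key=lambda p: p.cpu_load)[: fanout * pool_factor]
    return rng.sample(candidates, min(fanout, len(candidates)))


if __name__ == "__main__":
    peers = [Peer(f"peer{i}", cpu_load=i / 10) for i in range(10)]
    for peer in select_peers(peers, fanout=3, rng=random.Random(7)):
        print(peer.name, peer.cpu_load)
```

A larger `pool_factor` moves the selection closer to purely random diffusion, while a value of 1 always picks the least loaded peers and is therefore the easiest to exploit with falsified metrics.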
This document discusses the use of distributed consensus system (DCS) techniques to complement and further secure BGP overall.¶
Although no specific recommendation on solutions is made, this document aims at providing first insights to think more broadly on a DCS-based infrastructure that may further enhance the capabilities of BGP as a key protocol for the Internet. The authors are convinced that a distributed ledger, such as blockchain, is not appropriate to be used as part of the BGP control plane but can be useful to provide additional security to BGP route information.¶
N/A¶
Blockchains have inherent authentication through the use of public-private keys. Any action that changes the state of the blockchain ledger requires a signature, which authenticates the entity (only someone with the private key could have created the signature). If you need some method of relating a blockchain address to a real-world entity, then that is something that would need to be added-on. But any blockchain solution should take advantage of the inherent authentication provided by the use of public keys.¶
If the smart contract is only checking membership in the authorised set, then the users would have the capability to perform many actions beyond what they should. Accidental errors (or compromised accounts) could lead to harm. A secure blockchain system will place as much of the logic controlling/restricting access in the code of the smart contract itself as possible as this is the least corruptible part of the system.¶
To apply this to BGP, it could be possible to use another thing that blockchains do very well: namely assigning individual owners to resources. NFTs get a lot of deserved ridicule for the associated hype and unethical behaviour, but the technology allows a verifiable single source of ownership to be determined. This is something that a PKI cannot do, since it is possible to have multiple conflicting chains of certificates signed (e.g., through error or attack). The natural application of blockchains to BGP would be to consider prefixes as tokens assigned to AS blockchain addresses. The unique owner of any prefix could be determined with high confidence. This, plus the signing of peering relationships by the relevant ASes, could solve a lot of the problems with fraudulent announcements. If the smart contract is written correctly (big if, obviously), then it would be impossible for any entity to announce a route they were not authorised to.¶
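The idea of prefixes as tokens with a unique owner can be sketched as a small state machine. This is an in-memory illustration only: the class and method names are hypothetical, signature verification is omitted (the caller's AS number is assumed to be authenticated already), and the question of who is allowed to mint a prefix, e.g., a registry, is left open.

```python
from typing import Dict, Optional


class PrefixRegistry:
    """Toy single-owner registry: each prefix token has exactly one owning AS."""

    def __init__(self) -> None:
        self._owner: Dict[str, int] = {}

    def mint(self, prefix: str, owner_asn: int) -> None:
        """Create a prefix token; a prefix can only be minted once."""
        if prefix in self._owner:
            raise ValueError(f"{prefix} is already assigned")
        self._owner[prefix] = owner_asn

    def transfer(self, prefix: str, sender_asn: int, new_owner_asn: int) -> None:
        """Only the current owner may hand the prefix to another AS."""
        if self._owner.get(prefix) != sender_asn:
            raise PermissionError(f"AS{sender_asn} does not own {prefix}")
        self._owner[prefix] = new_owner_asn

    def owner_of(self, prefix: str) -> Optional[int]:
        return self._owner.get(prefix)

    def may_announce(self, prefix: str, asn: int) -> bool:
        """An announcement is accepted only from the unique owner of the prefix."""
        return self._owner.get(prefix) == asn


if __name__ == "__main__":
    registry = PrefixRegistry()
    registry.mint("192.0.2.0/24", 64496)
    print(registry.may_announce("192.0.2.0/24", 64496))  # owner
    print(registry.may_announce("192.0.2.0/24", 64511))  # not the owner
    registry.transfer("192.0.2.0/24", 64496, 64497)
    print(registry.owner_of("192.0.2.0/24"))
```

Keeping the ownership check inside `transfer` and `may_announce`, rather than in the calling code, mirrors the point above that as much of the access logic as possible should live in the smart contract itself.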
There could be new blockchain-related attacks that BGP would experience if blockchain were to be added into BGP's policy system. These attacks include trying to replace the trusted chain with a fraudulent chain. We will explore some of those here or in a new draft.¶