Internet-Draft | NTP Extension with Khronos | September 2022 |
Rozen-Schiff, et al. | Expires 31 March 2023 | [Page] |
The Network Time Protocol version 4 (NTPv4), as defined in RFC 5905, is the mechanism used by NTP clients to synchronize with NTP servers across the Internet. This document specifies an extension to the NTPv4 client, named Khronos, which is used as a "watchdog" alongside NTPv4, and provides improved security against time shifting attacks. Khronos involves changes to the NTP client's system process only. Since it does not affect the wire protocol, the Khronos mechanism is applicable to any current or future time protocol.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 31 March 2023.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
NTPv4, as defined in RFC 5905 [RFC5905], is vulnerable to time shifting attacks, in which the attacker's goal is to shift the local time at an NTP client. See [Khronos_paper] for details. Time shifting attacks on NTP are possible even if NTP communication is encrypted and authenticated. A weaker man-in-the-middle (MitM) attacker can shift time simply by dropping or delaying packets, whereas a powerful attacker, who has full control over an NTP server, can do so by explicitly determining the NTP response content. This document introduces a time shifting mitigation mechanism called Khronos. Khronos can be integrated into NTPv4-compatible clients as a "watchdog" against time shifting attacks. An NTP client that runs Khronos is interoperable with [RFC5905]-compatible NTPv4 servers. The Khronos mechanism does not affect the wire protocol and is therefore applicable to any current or future time protocol.¶
Khronos is a mechanism that runs in the background, continuously maintains a virtual "Khronos" clock, and compares this clock's reading to NTPv4's clock updates. When the gap between the two clocks exceeds a certain threshold (specified in Section 4), this is interpreted as the client experiencing a time shifting attack. In this case, Khronos is used to update the client's clock, and the conventional NTPv4 client time-synchronization algorithm is run in the background until the gap between the two algorithms is again below this threshold, and hence the conventional NTPv4 client algorithm is deemed safe to use again.¶
When the client is not under attack, Khronos is passive, allowing NTPv4 to control the client clock and providing the ordinary high precision and accuracy of NTPv4. When under attack, Khronos takes control over the client's clock, mitigating the time shift, while guaranteeing relatively high accuracy with respect to the UTC (error is bounded by 100 ms when using the recommended parameters) and precision, as discussed in Section 6.¶
By leveraging techniques from distributed computing theory for time-synchronization in the presence of Byzantine attackers, Khronos achieves accurate synchronization even in the presence of powerful attackers who are in direct control of a large number of NTP servers: up to 1/3 of the servers in the local Khronos pool (where a local Khronos pool may consist of hundreds of servers). In contrast, NTPv4 employs an algorithm that is not designed to withstand attacks by Byzantine servers and typically relies on a small subset of the NTP server pool (e.g., 4 servers) for time synchronization, so it is much more vulnerable to time shifting attacks. Khronos is carefully engineered to minimize the load on NTP servers and the communication overhead.¶
A Khronos client iteratively "crowdsources" time queries across NTP servers and applies a provably secure algorithm for eliminating "suspicious" responses and for averaging over the remaining responses. In each poll interval, the Khronos client selects, uniformly at random, a small subset (e.g., 10-15 servers) of a large server pool (containing hundreds of servers). To minimize the load on NTP servers and the communication overhead, Khronos polls should be much less frequent than standard NTPv4 clock updates, e.g., the Khronos clock can be updated once every 10 NTPv4 clock updates. Khronos' security was evaluated both theoretically and experimentally with a prototype implementation. According to these security analyses, if a local Khronos pool consists of, for example, 500 servers, 1/7 of which are controlled by a man-in-the-middle attacker, and Khronos queries 15 servers in each Khronos poll interval (around 10 times the NTPv4 poll interval), then over 20 years of effort are required (in expectation) to successfully shift time at a Khronos client by over 100 milliseconds from the UTC. The full exposition of the formal analysis of this guarantee is available at [Khronos_paper].¶
Khronos introduces a watchdog mechanism that is added to the client's system process and maintains a virtual clock value that is used as a reference for detecting attacks. The virtual clock value computation differs from the current NTPv4 in two key aspects. First, Khronos periodically synchronizes, in each Khronos poll interval, with only a few (tens) randomly selected servers out of a pool consisting of a large number (e.g., hundreds) of NTP servers, thereby providing high security while minimizing the load on the NTP servers. Second, the selection algorithm of the virtual clock uses an approximate agreement technique to remove outliers, thus limiting the attacker's ability to contaminate the "time samples" (offsets) derived from the queried NTP servers. These two elements of Khronos' design provide provable security guarantees against both man-in-the-middle attackers and attackers capable of compromising a large number of NTP servers.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].¶
The following notation is used to describe the Khronos algorithm.¶
Notation | Meaning |
---|---|
n | The number of candidate servers in the Khronos pool that Khronos can query (potentially hundreds) |
m | The number of servers that Khronos queries in each poll interval (up to tens) |
w | An upper bound on the distance of the local time from any NTP server with an accurate clock (termed "truechimer" in [RFC5905]) |
Cest | The client's estimate of the time that has passed since its last synchronization with the Khronos pool (sec) |
B | An upper bound on the client's time estimation error (ms/sec) |
ERR | An upper bound on the client's error regarding its estimate of the time that elapsed since the last update, which equals B*Cest (ms) |
K | Panic trigger - the number of Khronos pool re-samplings until reaching "Panic mode" |
tc | The current time [sec], as indicated by the virtual clock value that is computed by Khronos |
The recommended values are discussed in Section 3.3.¶
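As a small worked illustration of the notation in Table 1, ERR is simply the product B*Cest. The sketch below uses hypothetical values for B and Cest (they are assumptions chosen for the example and are not specified in this document) together with w = 25 ms, and computes the bound ERR+2w that appears later in the acceptance check.

```python
def err_bound_ms(b_ms_per_sec: float, cest_sec: float) -> float:
    """Return ERR = B * Cest, in milliseconds (notation of Table 1)."""
    return b_ms_per_sec * cest_sec


# Hypothetical inputs, for illustration only.
B = 0.05       # assumed bound on estimation error, ms per second
CEST = 1000.0  # assumed seconds since the last Khronos synchronization
W = 25.0       # w in ms

ERR = err_bound_ms(B, CEST)
ACCEPT_BOUND = ERR + 2 * W  # the quantity ERR + 2w, in ms
```

With these assumed inputs ERR is about 50 ms, so ERR+2w is about 100 ms.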
A client that runs Khronos as a watchdog uses NTPv4 as in [RFC5905] and in the background runs a modification to the elements of the system process described in Sections 11.2.1 and 11.2.2 of [RFC5905] (namely, the Selection Algorithm and the Cluster Algorithm). The NTPv4 conventional protocol periodically queries p (3-4) servers in each poll interval. In parallel, the Khronos watchdog periodically queries a set of m (tens) servers from a large (hundreds) server pool in each Khronos poll interval, where the m servers are selected from the server pool at random. Based on empirical analyses, to minimize the load on NTP servers while providing high security, the Khronos poll interval should be around 10 times the NTPv4 poll interval (i.e., a Khronos clock update occurs once every 10 NTPv4 clock updates). In each Khronos poll interval the Khronos virtual clock value is compared with the NTPv4 clock value, and if the difference exceeds a predetermined value, an attack is detected.¶
Under Khronos, unless an attack is detected, only one sample from each server is used (avoiding the "Clock Filter Algorithm" as defined in Section 10 of [RFC5905]). When under attack, Khronos uses several samples from each server and executes the "Clock Filter Algorithm" to choose the best sample from each server, with low jitter. Then, given a sample from each server, the client discards outliers by executing the procedure described in this section and the next. Finally, the NTPv4 "Combine Algorithm" is used for computing the system peer offset, as specified in Section 11.2.3 of [RFC5905].¶
Calibration is needed the first time the Khronos system process is executed. The calibration process generates a local Khronos pool of servers the client can synchronize with, consisting of n servers (up to hundreds). To this end, the NTP client executes the "Peer Process" and "Clock Filter Algorithm" as in Sections 9 and 10 of [RFC5905], respectively, on an hourly basis, for 24 consecutive hours, and generates the union of all received NTP servers' IP addresses. Importantly, this process can also be executed in the background periodically at long intervals (e.g., every few weeks or months). The servers in the Khronos pool should be scattered across different regions to make it harder for an attacker to compromise, or gain man-in-the-middle capabilities with respect to, a large fraction of the Khronos pool. Therefore, Khronos calibration is performed with respect to the general NTP server pool (for example, pool.ntp.org), and not only with respect to the servers in the client's state or region.¶
In each Khronos poll interval the Khronos system process randomly chooses a set of m (tens) servers out of the Khronos pool of n (hundreds) servers. Khronos server polling is allowed to spread normally, similar to NTPv4. Servers that do not respond during the Khronos poll are filtered out. If fewer than 1/3 of the m servers are left, resampling takes place.¶
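The pool-building step of calibration can be sketched as taking the union of server addresses gathered over repeated rounds. This is a minimal sketch under stated assumptions: `build_khronos_pool` and the `discover` callable are hypothetical names, `discover` stands in for whatever the client does in one hourly round to learn server addresses, and the one-hour wait between rounds is left to the caller.

```python
from typing import Callable, Iterable, Set


def build_khronos_pool(discover: Callable[[], Iterable[str]],
                       rounds: int = 24) -> Set[str]:
    """Union the server addresses returned by `rounds` discovery rounds.

    `discover` is a hypothetical hook that returns the addresses learned
    in one round. A real client would wait one hour between rounds.
    """
    pool: Set[str] = set()
    for _ in range(rounds):
        pool.update(discover())
    return pool
```

Passing the discovery step in as a callable keeps the sketch free of network access, so it can be exercised with canned address lists.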
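The per-interval server selection can be sketched as follows. The function name `poll_random_subset` and the `query` callable are hypothetical: `query` is assumed to return a server's offset, or None when the server does not respond. The sketch returns None to signal that resampling is needed.

```python
import random
from typing import Callable, Dict, Optional, Sequence


def poll_random_subset(pool: Sequence[str], m: int,
                       query: Callable[[str], Optional[float]],
                       rng: Optional[random.Random] = None
                       ) -> Optional[Dict[str, float]]:
    """Query m servers chosen uniformly at random from the pool.

    Non-responding servers (query returns None) are filtered out.
    Returns None when fewer than 1/3 of the m servers responded,
    meaning the caller should resample.
    """
    rng = rng or random.Random()
    chosen = rng.sample(list(pool), m)  # uniform, without replacement
    offsets: Dict[str, float] = {}
    for server in chosen:
        offset = query(server)
        if offset is not None:
            offsets[server] = offset
    if 3 * len(offsets) < m:  # fewer than m/3 responses left
        return None
    return offsets
```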
Next, out of the time-samples received from this chosen subset of servers, the lowest third of the samples' offset values and highest third of the samples' offset values are discarded.¶
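The trimming step can be illustrated with a short sketch. The name `bi_sided_trim` mirrors the pseudocode later in this document; rounding the third down when the number of samples is not divisible by 3 is an assumption of this sketch, not something the text specifies.

```python
from typing import List, Sequence


def bi_sided_trim(offsets: Sequence[float]) -> List[float]:
    """Drop the lowest third and the highest third of the offsets.

    Assumption: the size of each discarded third is rounded down.
    """
    ordered = sorted(offsets)
    k = len(ordered) // 3
    return ordered[k:len(ordered) - k]
```

For example, six samples with offsets 9, 1, 5, 3, 7, 2 leave the middle two values, 3 and 5.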
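Khronos checks that the following two conditions hold for the remaining samples: (1) the maximal distance between any two of the remaining samples does not exceed 2w, and (2) the distance between the average of the remaining samples and the current Khronos clock value tc is less than ERR+2w (where w, ERR, and tc are as described in Table 1).¶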
In the event that both of these conditions are satisfied, the average of the remaining samples is set to be the "final offset". Otherwise, a new subset of servers is sampled, in the exact same manner. This process ensures that the Khronos client's queries are spread across servers so as to both yield improved security against strategic and Byzantine attacks (as discussed in Section 4.3) and to mitigate the effect of a DoS attack on NTP servers that renders them non-responsive. This resampling process continues in subsequent Khronos poll intervals until the two conditions are both satisfied or the number of times the servers are re-sampled exceeds a "Panic Trigger" (K in Table 1), in which case Khronos enters a "Panic Mode". Note that whether the client allows panic mode or not is configurable.¶
In panic mode, Khronos queries all the servers in its local Khronos pool, orders the collected time samples from lowest to highest and eliminates the lowest third and the highest third of the samples. The client then averages over the remaining samples, and sets this average to be the new "final offset".¶
As in [RFC5905], the final offset is passed on to the clock discipline algorithm for the purpose of steering the Khronos virtual clock to the correct time. The Khronos virtual clock is then compared to the NTPv4 clock as part of the watchdog process.¶
According to empirical observations (presented in [Khronos_paper]), querying 15 servers at each poll interval (i.e., m=15) out of 500 servers (i.e., n=500), and setting w to be around 25 milliseconds provides both high time accuracy and good security. Moreover, empirical analyses showed that, on average, when selecting w=25ms, approximately 83% of the servers' clocks are at most w away from the UTC, and within 2w of each other, satisfying the first condition of Khronos' system process. In scenarios with congested links, higher values of w, such as 1 sec, may be more appropriate.¶
Furthermore, according to Khronos security analysis, setting K to be 3 (i.e., if after 3 re-samplings the two conditions are not satisfied then Khronos enters "panic mode") is safe when facing time shifting attacks. Moreover, when setting K to 3, the probability of an attacker forcing a panic mode on a client is negligible (less than 0.000002).¶
Khronos' effect on precision and accuracy is discussed in Section 6 and Section 4.¶
Khronos repeatedly gathers time samples from small subsets of a large local Khronos pool of NTP servers. The following man-in-the-middle (MitM) Byzantine attacker is considered: the attacker is assumed to control a subset of the servers in the Khronos pool and is capable of fully determining the values of the time samples gathered from these NTP servers. The threat model encompasses a broad spectrum of MitM attackers, ranging from fairly weak (yet dangerous) MitM attackers only capable of delaying and dropping packets (for example using the Bufferbloat attack) to extremely powerful MitM attackers who are in control of (even authenticated) NTP servers.¶
MitM attackers covered by this model might be, for example, (1) in direct control of a fraction of the NTP servers (e.g., by exploiting a software vulnerability), (2) an ISP (or other Autonomous-System-level attacker) on the default BGP paths from the NTP client to a fraction of the available servers, (3) a nation state with authority over the owners of NTP servers in its jurisdiction, or (4) an attacker capable of hijacking (e.g., through DNS cache poisoning or BGP prefix hijacking) traffic to some of the available NTP servers. The details of the specific attack scenario are abstracted by reasoning about MitM attackers in terms of the fraction of servers with respect to which the attacker has MitM capabilities.¶
Notably, Khronos provides protection from MitM attacks that cannot be achieved by cryptographic authentication protocols since even with such measures in place an attacker can still influence time by dropping/delaying packets. However, adding an authentication and crypto-based security layer to Khronos will enhance its security guarantees and enable the detection of various spoofing and modification attacks.¶
Khronos detects time-shifting attacks by constantly monitoring the offset computed by NTPv4 (or potentially any other current or future time protocol) and the offset computed by Khronos, and checking whether the difference between the two exceeds a certain threshold (10 milliseconds by default). Unless an attack is detected, NTPv4 controls the client's clock. Under attack, Khronos takes control over the client's clock in order to prevent it from being shifted.¶
Analytical results (in [Khronos_paper]) indicate that if a local Khronos pool consists of 500 servers, 1/7 of which are controlled by a man-in-the-middle attacker, and 15 servers are queried in each Khronos poll interval, then successfully shifting time at a Khronos client by even a short time (e.g., 100 milliseconds) takes many years of effort (e.g., over 20 years in expectation). See a brief overview of Khronos' security analysis below.¶
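The watchdog decision described above amounts to a threshold comparison. The following is a minimal sketch; the function name `select_clock_source` and its string return values are hypothetical, and the 10 ms default is the threshold stated in the text.

```python
def select_clock_source(ntp_offset_ms: float, khronos_offset_ms: float,
                        threshold_ms: float = 10.0) -> str:
    """Return which algorithm should control the client's clock.

    An attack is assumed when the two offsets differ by more than
    the threshold. In that case Khronos takes control.
    """
    if abs(ntp_offset_ms - khronos_offset_ms) > threshold_ms:
        return "khronos"
    return "ntpv4"
```

For instance, offsets of 2 ms (NTPv4) and 4 ms (Khronos) leave NTPv4 in control, whereas 150 ms versus 3 ms hands control to Khronos.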
Khronos' security analysis is briefly described next.¶
Time-samples that are at most w away from the UTC are considered "good", whereas other samples are considered "malicious". Two scenarios are considered:¶
The first scenario, where there are more than 1/3 good samples, consists of two sub-cases: (i) there is at least one good sample in the set of samples not eliminated by Khronos (in the middle third of samples), and (ii) there are no good samples in the remaining set of samples. In the first of these two cases (at least one good sample in the set of samples that was not eliminated by Khronos), the other remaining samples, including those provided by the attacker, must be close to a good sample (for otherwise, the first condition of Khronos' system process in Section 3.2 is violated and a new set of servers is chosen). This implies that the average of the remaining samples must be close to the UTC. In the second sub-case (where there are no good samples in the set of remaining samples), since more than a third of the initial samples were good, both the (discarded) third lowest-value samples and the (discarded) third highest-value samples must each contain a good sample. Hence, all the remaining samples are bounded from both above and below by good samples, and so is their average value, implying that this value is close to the UTC [RFC5905].¶
In the second scenario, where the attacker controls more than 2/3 of the queried servers, the worst possibility for the client is that all remaining samples are malicious (i.e., more than w away from the UTC). However, as proved in [Khronos_paper], the probability of this scenario is extremely low even if the attacker controls a large fraction (e.g., 1/4) of the servers in the local Khronos pool. Therefore, the probability that the attacker repeatedly reaches this scenario decreases exponentially, rendering the probability of a significant time shift negligible. We can express the improvement ratio of Khronos over NTPv4 by the ratios of their single shift probabilities. Such ratios are provided in Table 2, where higher values indicate higher improvement of Khronos over NTPv4 and are also proportional to the expected time until a time shift attack succeeds once.¶
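To give a feel for why the second scenario is rare, the sketch below computes the probability that a single uniformly random draw of m servers contains more than 2/3 attacker-controlled servers. It is a simplified model (sampling without replacement, every queried server responds) and is not the full analysis in [Khronos_paper]; the function name `prob_attacker_two_thirds` is hypothetical.

```python
from math import comb


def prob_attacker_two_thirds(n: int, bad: int, m: int) -> float:
    """P(more than 2/3 of m sampled servers are attacker-controlled).

    Hypergeometric tail: m servers are drawn without replacement from
    a pool of n servers, `bad` of which the attacker controls.
    """
    need = (2 * m) // 3 + 1  # smallest count strictly above 2m/3
    total = comb(n, m)
    hits = sum(comb(bad, k) * comb(n - bad, m - k)
               for k in range(need, min(m, bad) + 1))
    return hits / total


# n=500 and m=15 as in the text; 500 // 7 = 71 approximates a 1/7 share.
p_single_poll = prob_attacker_two_thirds(500, 500 // 7, 15)
```

Under this simplified model the single-poll probability for n=500, m=15, and a 1/7 attacker share is well below one in 100,000, and the attacker must reach this situation repeatedly to achieve a significant shift.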
Attack Ratio | 6 samples | 12 samples | 18 samples | 24 samples | 30 samples |
---|---|---|---|---|---|
1/3 | 1.93e+01 | 3.85e+02 | 7.66e+03 | 1.52e+05 | 3.03e+06 |
1/5 | 1.25e+01 | 1.59e+02 | 2.01e+03 | 2.54e+04 | 3.22e+05 |
1/7 | 1.13e+01 | 1.29e+02 | 1.47e+03 | 1.67e+04 | 1.90e+05 |
1/9 | 8.54e+00 | 7.32e+01 | 6.25e+02 | 5.32e+03 | 4.52e+04 |
1/10 | 5.83e+00 | 3.34e+01 | 1.89e+02 | 1.07e+03 | 6.04e+03 |
1/15 | 3.21e+00 | 9.57e+00 | 2.79e+01 | 8.05e+01 | 2.31e+02 |
In addition to evaluating the probability of an attacker successfully shifting time at the client's clock, we also evaluated the probability that the attacker succeeds in launching a DoS attack on the servers by causing many clients to enter panic mode (and query all the servers in their local Khronos pools). This probability (with the previous parameters of n=500, m=15, w=25 ms, and K=3) is negligible even for an attacker in control of a large number of servers in clients' local Khronos pools, and it is expected to take decades to force panic mode.¶
Further details about Khronos's security guarantees can be found in [Khronos_paper].¶
The pseudocode for Khronos' Time Sampling Scheme, which is invoked in each Khronos poll interval, is as follows:¶
counter := 0
S := []
T := []
While counter < K do
    S := sample(m)              // gather samples from (tens of) randomly chosen servers
    T := bi-sided-trim(S, 1/3)  // trim the lowest and highest thirds
    if (max(T) - min(T) <= 2w) and (|avg(T) - tc| < ERR + 2w) Then
        return avg(T)
    end
    counter++
    sleep(rand(0,1) * poll interval)
end
// panic mode
S := sample(n)
T := bi-sided-trim(S, 1/3)      // trim the lowest and highest thirds
return avg(T)¶
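The pseudocode above can be transcribed into runnable form roughly as follows. This is a sketch under stated assumptions rather than a reference implementation: `sample` is a hypothetical callable that returns one time sample per responding server for the requested number of servers, all quantities (samples, tc, w, ERR) are assumed to be in the same unit, the third is rounded down when trimming, and `sleep` and `rand` are injectable so the logic can be tested without waiting.

```python
import random
import time
from statistics import mean
from typing import Callable, List


def khronos_time_sample(sample: Callable[[int], List[float]],
                        n: int, m: int, w: float, err: float, tc: float,
                        K: int, poll_interval: float,
                        sleep: Callable[[float], None] = time.sleep,
                        rand: Callable[[], float] = random.random) -> float:
    """Return the final offset for one run of the time sampling scheme."""

    def trim(values: List[float]) -> List[float]:
        # Drop the lowest and highest thirds (rounded down).
        ordered = sorted(values)
        k = len(ordered) // 3
        return ordered[k:len(ordered) - k]

    for _ in range(K):
        T = trim(sample(m))
        # Both conditions must hold for the average to be accepted.
        if T and max(T) - min(T) <= 2 * w and abs(mean(T) - tc) < err + 2 * w:
            return mean(T)
        sleep(rand() * poll_interval)

    # Panic mode: query the whole local pool and trim the same way.
    return mean(trim(sample(n)))
```

A caller would supply a `sample` function backed by real NTP queries, while a test can pass canned values and a no-op `sleep`.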
Since NTPv4 updates the clock as long as time-shifting attacks are not detected, the precision and accuracy of a Khronos client are the same as NTPv4's when not under attack. Under attack, Khronos takes control over the client's clock, mitigating the time shift while guaranteeing relatively high accuracy (error is bounded by ERR+2w, which is 100 ms for the recommended parameters specified in Section 3.3). Khronos is based on crowdsourcing across servers, changes the set of queried servers more frequently than NTPv4 [Khronos_paper], and avoids some of the filters in NTPv4's system process. These factors can potentially harm its precision. Therefore, a smoothing mechanism can be used, where instead of a simple average of the remaining samples, the smallest (in absolute value) offset is used unless its distance from the average is higher than a predefined value Y. Setting Y to 1 millisecond was experimentally demonstrated to result in precision similar to NTPv4's.¶
The authors would like to thank Erik Kline, Miroslav Lichvar, Danny Mayer, Karen O'Donoghue, Dieter Sibold, Yaakov J. Stein, Harlan Stenn, Hal Murray, and Marcus Dansarie for valuable contributions to this document and helpful discussions and comments.¶
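The smoothing rule can be sketched in a few lines. The function name `smoothed_offset` is hypothetical, offsets are assumed to be in milliseconds, and the default Y = 1 ms follows the value mentioned in the text.

```python
from statistics import mean
from typing import Sequence


def smoothed_offset(remaining: Sequence[float], y_ms: float = 1.0) -> float:
    """Pick the final offset from the samples left after trimming.

    Uses the offset with the smallest absolute value, unless it lies
    more than y_ms away from the average, in which case the average
    is used instead.
    """
    avg = mean(remaining)
    smallest = min(remaining, key=abs)
    if abs(smallest - avg) > y_ms:
        return avg
    return smallest
```

For remaining offsets of 0.2, 0.5, and 0.8 ms the smallest offset (0.2 ms) is within Y of the average and is used; for 0.1, 5.0, and 9.9 ms it is not, so the average is used.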
This memo includes no request to IANA.¶