This section outlines a probing policy suitable for unilateral adoption by any recursive resolver.
Following this policy should not result in failed resolutions or significant delay.¶
A recursive resolver implementing this draft must set system-wide values for some default parameters.
These parameters may be set independently for each supported encrypted transport, though a simple implementation may keep the parameters constant across encrypted transports.¶
Table 1: recursive resolver system parameters per encrypted transport
Name | Description | Suggested Default
persistence | How long should the recursive resolver remember successful encrypted transport connections? | 3 days (259200 seconds)
damping | How long should the recursive resolver remember unsuccessful encrypted transport connections? | 1 day (86400 seconds)
timeout | How long should the recursive resolver wait for an initiated encrypted connection to complete? | 4 seconds
This document uses the notation E-foo to refer to the foo parameter for the encrypted transport E.¶
For example, DoT-persistence would indicate the length of time that the recursive resolver will remember that an authoritative server had a successful connection over DoT.¶
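As an illustration of the E-foo notation, the following minimal sketch stores the three parameters per transport and looks them up by name. The names TransportParams, PARAMS, and param are hypothetical and not part of this document; the values are the suggested defaults from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportParams:
    """Per-transport parameters, using the suggested defaults from Table 1."""
    persistence: int = 259200  # seconds (3 days)
    damping: int = 86400       # seconds (1 day)
    timeout: int = 4           # seconds


# A simple implementation may keep the parameters constant across transports.
PARAMS = {"DoT": TransportParams(), "DoQ": TransportParams()}


def param(transport: str, name: str) -> int:
    """Look up E-foo, e.g. param("DoT", "persistence") for DoT-persistence."""
    return getattr(PARAMS[transport], name)
```

A deployment that wants different values for one transport could replace a single entry, for example PARAMS["DoQ"] = TransportParams(timeout=2).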
This document also assumes that the resolver maintains a list of outstanding cleartext queries destined for the authoritative server's IP address X.
This list is referred to as Do53-queries[X].
This document does not attempt to describe the specific operation of sending and receiving cleartext DNS queries (Do53) for a recursive resolver.
Instead, it describes a "bolt-on" mechanism that extends the recursive resolver's operation by means of a few simple hooks into the recursive resolver's existing handling of Do53.¶
Implementers or deployers of DNS recursive resolvers that follow the strategies in this document are encouraged to report their preferred values of these parameters.¶
To follow this guidance, a recursive resolver MUST implement at least one of DoT or DoQ in its capacity as a client of authoritative nameservers.¶
A recursive resolver SHOULD implement the client side of DNS-over-TLS (DoT).
A recursive resolver MAY implement the client side of DNS-over-QUIC (DoQ).¶
DoT queries from the recursive resolver MUST target TCP port 853, with an ALPN of dot.
DoQ queries from the recursive resolver MUST target UDP port 853, with an ALPN of doq.¶
While this document focuses on the recursive-to-authoritative hop, a recursive resolver implementing these strategies SHOULD also accept queries from its clients over some encrypted transport (current common transports are DoH or DoT).¶
The recursive resolver SHOULD keep a record of the state for each authoritative server it contacts, indexed by the IP address of the authoritative server and the encrypted transports supported by the recursive resolver.¶
Each record should contain the following fields for each supported encrypted transport, each of which would initially be null:¶
Table 2: recursive resolver state per authoritative IP, per encrypted transport
Name | Description | Retain Across Reset
session | The associated state of any existing, established session (the structure of this value is dependent on the encrypted transport implementation). If session is not null, it may be in one of two states: pending or established | N
initiated | Timestamp of most recent connection attempt | Y
completed | Timestamp of most recent completed handshake | Y
status | Enumerated value of success or fail or timeout, associated with the completed handshake | Y
resumptions | A stack of resumption tickets (and associated parameters) that could be used to resume a prior successful connection | Y
queries | A queue of queries intended for this authoritative server, each of which has additional status early, unsent, or sent | N
last-activity | A timestamp of the most recent activity on the connection | N
Note that the session fields in aggregate constitute a pool of open connections to different servers.¶
With the exception of the session, queries, and last-activity fields, this cache information should be kept across restart of the server unless explicitly cleared by administrative action.¶
This document uses the notation E-foo[X] to indicate the value of field foo for encrypted transport E to IP address X.¶
For example, DoT-initiated[192.0.2.4] represents the timestamp when the most recent DoT connection packet was sent to IP address 192.0.2.4.¶
Note that the recursive resolver should record this per-authoritative-IP state for each IP address it uses as it sends its queries.
For example, if a recursive resolver can send a packet to authoritative servers from IP addresses 192.0.2.100 and 192.0.2.200, it should keep two distinct sets of per-authoritative-IP state, one for each source address it uses.
Keeping these state tables distinct for each source address makes it possible for a pooled authoritative server behind a load balancer to do a partial rollout while minimizing accidental timeouts (see Section 3.1).¶
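The per-authoritative-IP record and its E-foo[X] notation can be sketched as a small data structure. This is only an illustration: the names TransportState, STATE, and state are hypothetical, the sketch represents an empty resumption stack and query queue as empty containers rather than null, and it keys the table by source address as well as by authoritative IP address and transport.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TransportState:
    session: Optional[Any] = None          # None, or a pending/established handle
    initiated: Optional[float] = None      # time of most recent connection attempt
    completed: Optional[float] = None      # time of most recent completed handshake
    status: Optional[str] = None           # "success", "fail", or "timeout"
    resumptions: list = field(default_factory=list)  # stack of resumption tickets
    queries: dict = field(default_factory=dict)      # query handle -> early/unsent/sent
    last_activity: Optional[float] = None  # time of most recent activity

    def reset(self) -> None:
        """Drop the fields that are not retained across a restart."""
        self.session = None
        self.queries = {}
        self.last_activity = None


# Keyed by (source IP, authoritative IP, transport name).
STATE: dict = {}


def state(source_ip: str, auth_ip: str, transport: str) -> TransportState:
    """Return E-foo[X] state for one source address, creating it if absent."""
    return STATE.setdefault((source_ip, auth_ip, transport), TransportState())
```

For example, state("192.0.2.100", "192.0.2.4", "DoT").initiated corresponds to DoT-initiated[192.0.2.4] as seen from source address 192.0.2.100, and is tracked separately from the same field for source address 192.0.2.200.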
In designing a probing strategy, the recursive resolver could record its knowledge about any given authoritative server with different strategies, including at least:¶
- the authoritative server's IP address,¶
- the authoritative server's name (the NS record used), or¶
- the zone that contains the record being looked up.¶
This draft encourages the first strategy, to minimize timeouts or accidental delays.¶
A timeout (accidental delay) is most likely to happen when the recursive client believes that the authoritative server offers encrypted transport, but the actual server reached declines encrypted transport (or worse, filters the incoming traffic and does not even respond with an ICMP port closed message).¶
By associating state with the IP address, the recursive client is best able to avoid these delays when it reaches a heterogeneous deployment.¶
For example, consider an authoritative server named ns0.example.com that is served by two installations (with two A records), one at 192.0.2.7 that follows this guidance, and one at 192.0.2.8 that is a legacy (cleartext port 53-only) deployment.
A recursive client that associates state with the NS name and reaches .7 first will "learn" that ns0.example.com supports encrypted transport.
A subsequent query over encrypted transport dispatched to .8 would fail, potentially delaying the response.¶
By associating the state with the authoritative IP address, the client can minimize the number of accidental delays introduced (see also Section 4.3.1 and Section 3.1).¶
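The ns0.example.com scenario can be illustrated with a toy simulation. This is a deliberately simplified model, not a description of any resolver: it assumes the client alternates between the two addresses, remembers only the most recent observation per cache key, and the function name wrong_guesses is hypothetical.

```python
# Hypothetical deployment from the example: one NS name, two addresses.
ADDRESSES = ["192.0.2.7", "192.0.2.8"]
ENCRYPTED = {"192.0.2.7": True, "192.0.2.8": False}


def wrong_guesses(key_by: str, rounds: int = 3) -> int:
    """Count encrypted attempts sent to a server that cannot answer them."""
    learned = {}  # cache key -> last observed encrypted-transport support
    wrong = 0
    for _ in range(rounds):
        for ip in ADDRESSES:
            key = ip if key_by == "ip" else "ns0.example.com"
            if learned.get(key, False) and not ENCRYPTED[ip]:
                wrong += 1  # believed encrypted, reached a cleartext-only server
            learned[key] = ENCRYPTED[ip]
    return wrong
```

Under these assumptions, keying the state by NS name produces one wrong guess per round, while keying it by IP address produces none.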
When a recursive resolver discovers the need for an authoritative lookup to an authoritative DNS server using IP address X, it retrieves the records associated with X from its cache.¶
The following sections presume that the time of the discovery of the need for lookup is time T0.¶
If any of the records discussed here are absent, they are treated as null.¶
The recursive resolver must decide whether to initially send a query over Do53, or over any of the supported encrypted transports (DoT or DoQ).¶
Note that a resolver might initiate this query via any or all of the known transports.
When multiple queries are sent, the initial packets for each connection can be sent concurrently, similar to "Happy Eyeballs" ([RFC8305]).
However, unlike Happy Eyeballs, when one transport succeeds, the other connections do not need to be terminated, but can instead be continued to establish whether the IP address X is capable of corresponding on the relevant transport.¶
For any of the supported encrypted transports E, if either of the following holds true, the resolver SHOULD NOT send a query to X over Do53:¶
- E-session[X] is in the established state, or¶
- E-status[X] is success, and (T0 - E-completed[X]) < persistence¶
Otherwise, if there is no outstanding session for any encrypted transport, and the last successful encrypted transport connection was long ago, the resolver sends a query to X over Do53.
When it does so, it inserts a handle for the query in Do53-queries[X].¶
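The Do53 decision above can be sketched as a predicate evaluated once per supported transport. The function name may_send_do53 and its argument layout are hypothetical; the sketch assumes timestamps are seconds and that absent fields are passed as None.

```python
from typing import Optional


def may_send_do53(session_state: Optional[str], status: Optional[str],
                  completed: Optional[float], t0: float,
                  persistence: float) -> bool:
    """Return False if transport E gives a reason not to use Do53 for X."""
    if session_state == "established":
        return False  # an encrypted session to X already exists
    if (status == "success" and completed is not None
            and (t0 - completed) < persistence):
        return False  # a recent success is still remembered
    return True


def send_over_do53(per_transport: list, t0: float, persistence: float) -> bool:
    """Do53 is used only if no supported transport objects."""
    return all(may_send_do53(s, st, c, t0, persistence)
               for (s, st, c) in per_transport)
```

For instance, with a persistence of 259200 seconds, a DoT success completed 100 seconds before T0 suppresses Do53, while one completed exactly 259200 seconds before T0 no longer does.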
When a successful response R is received in cleartext from authoritative server X for a query Q that was sent over Do53, the recursive resolver should:¶
But if R is unsuccessful (e.g. SERVFAIL):¶
If any E-session[X] is in the established state, the recursive resolver SHOULD NOT initiate a new connection to X over any other transport, but should instead send a query through the existing session (see Section 4.5.8).
FIXME: What if there's a preferred transport, but the established session does not correspond to that preferred transport?¶
Otherwise, the timer should examine and possibly refresh its state for encrypted transport E to authoritative IP address X:¶
When resources are available to attempt a new encrypted transport, the resolver should only initiate a new connection to X over E as long as one of the following holds true:¶
- E-status[X] is success, or¶
- E-status[X] is fail or timeout and (T0 - E-completed[X]) > damping, or¶
- E-status[X] is null and E-initiated[X] is null¶
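The three conditions above can be expressed as a single predicate. This is a sketch under stated assumptions: the name may_initiate is hypothetical, timestamps are seconds, and the sketch assumes E-completed[X] is always recorded when the status is fail or timeout (if it is missing, the sketch conservatively declines to connect).

```python
from typing import Optional


def may_initiate(status: Optional[str], initiated: Optional[float],
                 completed: Optional[float], t0: float,
                 damping: float) -> bool:
    """Decide whether a new connection to X over E may be initiated at t0."""
    if status == "success":
        return True
    if status in ("fail", "timeout"):
        # Retry only once the damping interval has fully elapsed.
        return completed is not None and (t0 - completed) > damping
    # Never tried before: no status and no recorded attempt.
    return status is None and initiated is None
```

As a worked example with the suggested damping of 86400 seconds, a handshake that failed at time 1000 blocks new attempts up to and including time 87400, and permits them from time 87401 onward.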
When initiating a session to X over encrypted transport E, if E-resumptions[X] is not empty, one ticket should be popped off the stack and used to try to resume a previous session.
Otherwise, the initial Client Hello handshake should not try to resume any session.¶
When initiating a connection, the resolver should take the following steps:¶
- set E-initiated[X] to T0¶
- store a handle for the new session (which should have pending state) in E-session[X]¶
- insert a handle for the query that prompted this connection in E-queries[X], with status unsent or early, as appropriate (see below).¶
Modern encrypted transports like TLS 1.3 offer the chance to store "early data" from the client into the initial Client Hello in some contexts.
A resolver that initiates a connection over an encrypted transport according to this guidance in a context where early data is possible SHOULD send the DNS query that prompted the connection in the early data, according to the sending guidance in Section 4.5.8.¶
If it does so, the status of Q in E-queries[X] should be set to early instead of unsent.¶
When initiating a new connection (whether by resuming an old session or not), the recursive resolver SHOULD request a session resumption ticket from the authoritative server.
If the authoritative server supplies a resumption ticket, the recursive resolver pushes it into the stack at E-resumptions[X].¶
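The bookkeeping for initiating a connection, including ticket handling and the early/unsent distinction, might look like the following sketch. The names EState, initiate, and store_ticket are hypothetical, the session handle is reduced to a string, and whether early data is possible is left to the caller as the flag early_data_ok.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EState:
    session: Optional[str] = None
    initiated: Optional[float] = None
    resumptions: list = field(default_factory=list)  # stack, newest ticket last
    queries: dict = field(default_factory=dict)      # query handle -> status


def initiate(st: EState, query_id: int, t0: float,
             early_data_ok: bool) -> Optional[bytes]:
    """Record a new connection attempt; return a ticket to resume with, if any."""
    ticket = st.resumptions.pop() if st.resumptions else None
    st.initiated = t0        # set E-initiated[X] to T0
    st.session = "pending"   # handle for the new session, in pending state
    st.queries[query_id] = "early" if early_data_ok else "unsent"
    return ticket


def store_ticket(st: EState, ticket: bytes) -> None:
    """Push a ticket supplied by the authoritative server onto the stack."""
    st.resumptions.append(ticket)
```

Because the tickets form a stack, the most recently stored ticket is the first one offered on the next connection attempt.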
For modern encrypted transports like TLS 1.3, most client implementations expect to send a Server Name Indication (SNI) in the Client Hello.¶
There are two complications with selecting or sending SNI in this unilateral probing:¶
- Some authoritative servers are known by more than one name; selecting a single name to use for a given connection may be difficult or impossible.¶
- In most configurations, the content of the SNI field is exposed on the wire to a passive adversary.
This potentially reveals additional information about which query is being made, based on the NS of the query itself.¶
To avoid additional leakage and complexity, a recursive resolver following this guidance SHOULD NOT send SNI to the authoritative server when attempting encrypted transport.¶
If the recursive resolver needs to send SNI to the authoritative server for some reason not found in this document, it is RECOMMENDED that it implement Encrypted Client Hello ([I-D.ietf-tls-esni]) to reduce leakage.¶
A recursive resolver following this guidance MAY attempt to verify the server's identity by X.509 certificate or DANE.
When doing so, the identity would presumably be based on the NS name used for a given query.¶
However, since this probing policy is unilateral and opportunistic, the client SHOULD NOT consider it a failure if an encrypted transport handshake does not authenticate to any particular expected name.¶
To avoid the complexity of authoritative servers with multiple simultaneous names, or multiple names over time, this draft does not attempt to describe what name a recursive resolver should use when validating an authoritative server, or what the recursive resolver should do with an authentication success.¶
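For DoT, the SNI and authentication guidance above could be realized with Python's standard ssl module roughly as follows. This is a sketch, not a complete client: the function names are hypothetical, certificate verification is disabled entirely (one way of not treating an unauthenticated handshake as a failure), and the 4-second default mirrors the suggested timeout parameter.

```python
import socket
import ssl

DOT_PORT = 853  # DoT targets TCP port 853


def opportunistic_dot_context() -> ssl.SSLContext:
    """TLS client context that offers ALPN "dot" and does not authenticate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # unauthenticated handshakes are accepted
    ctx.set_alpn_protocols(["dot"])
    return ctx


def connect_dot(ip: str, timeout: float = 4.0) -> ssl.SSLSocket:
    """Open a DoT connection to an authoritative IP address without SNI."""
    raw = socket.create_connection((ip, DOT_PORT), timeout=timeout)
    try:
        # server_hostname=None means no SNI extension is sent.
        return opportunistic_dot_context().wrap_socket(raw, server_hostname=None)
    except Exception:
        raw.close()
        raise
```

A resolver that does choose to verify the server's identity would need a different context; that case is intentionally outside this sketch.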
When an encrypted transport connection actually completes (e.g., the TLS handshake completes) at time T1, the resolver sets E-completed[X] to T1 and does the following:¶
If the handshake completed successfully:¶
If, at time T2, an encrypted transport handshake completes with a failure (e.g. a TLS alert),¶
Note that this failure will trigger the recursive resolver to fall back to cleartext queries to the authoritative server at IP address X.
It will retry encrypted transport to X once the damping timer has elapsed.¶
Once established, an encrypted transport might fail for a number of reasons (e.g., decryption failure, or improper protocol sequence).¶
If this happens:¶
Note that this failure will trigger the recursive resolver to fall back to cleartext queries to the authoritative server at IP address X.
It will retry encrypted transport to X once the damping timer has elapsed.¶
FIXME: are there specific forms of failure that we might handle differently?
For example, what if a TCP timeout closes an idle DoT connection?
What if a QUIC stream ends up timing out but other streams on the same QUIC connection are going through?
Do the described scenarios cover the case when an encrypted transport's port is made unavailable/closed?¶
At time T3, the recursive resolver may find that authoritative server X cleanly closes an existing outstanding connection (most likely due to resource exhaustion, see Section 3.4).¶
When this happens:¶
Note that this premature shutdown will trigger the recursive resolver to fall back to cleartext queries to the authoritative server at IP address X.
Any subsequent query to X will retry the encrypted connection promptly.¶
When sending a query to an authoritative server over encrypted transport at time T4, the recursive resolver should take a few reasonable steps to ensure privacy and efficiency.¶
When sending query Q, the recursive resolver should ensure that its state in E-queries[X] is set to sent.¶
The recursive resolver also sets E-last-activity[X] to T4.¶
In addition, the recursive resolver should consider the following guidance:¶
To protect the privacy of the client, the recursive resolver SHOULD NOT send EDNS(0) Client Subnet information to the authoritative server ([RFC7871]) unless explicitly authorized to do so by the client.¶
To increase the anonymity set for each query, the recursive resolver SHOULD use EDNS(0) padding according to policies described in [RFC8467].¶
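As a worked example of the padding arithmetic, the sketch below computes how many padding octets a query needs under the block-length strategy that [RFC8467] recommends for queries (padding to a multiple of 128 octets). The function name padding_length is hypothetical, and the sketch assumes unpadded_len counts the whole message including its OPT record but not the Padding option itself.

```python
BLOCK = 128         # block size RFC 8467 recommends for queries
OPTION_HEADER = 4   # 2-octet option code plus 2-octet option length


def padding_length(unpadded_len: int, block: int = BLOCK) -> int:
    """Padding octets needed so the padded message is a multiple of block."""
    return -(unpadded_len + OPTION_HEADER) % block
```

For a 40-octet query, the option header adds 4 octets and padding_length returns 84, giving a 128-octet message on the wire.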
When multiple queries are multiplexed on a single encrypted transport to a single authoritative server, the recursive resolver MUST offer distinct query ID fields for every outstanding query on a connection, and MUST be capable of receiving responses out of order.¶
To the extent that the encrypted transport can avoid head-of-line blocking (e.g. QUIC can use a separate stream per query) the recursive resolver SHOULD avoid head-of-line blocking.¶
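The bookkeeping for sending on a multiplexed connection can be sketched as follows for a DoT-style connection. The names allocate_query_id and send_query are hypothetical, and the sketch reduces E-queries[X] to a dictionary from query ID to status and the rest of the record to a plain dictionary.

```python
import secrets


def allocate_query_id(outstanding: dict) -> int:
    """Pick a random 16-bit ID not already in use on this connection."""
    if len(outstanding) >= 0x10000:
        raise RuntimeError("no free query IDs on this connection")
    while True:
        qid = secrets.randbelow(0x10000)
        if qid not in outstanding:
            return qid


def send_query(outstanding: dict, record: dict, t4: float) -> int:
    """Mark a query as sent with a distinct ID and note the activity time."""
    qid = allocate_query_id(outstanding)
    outstanding[qid] = "sent"        # status of Q in E-queries[X]
    record["last_activity"] = t4     # E-last-activity[X]
    return qid
```

Because responses may arrive out of order, the caller would match each response to its query by looking up the response's ID in the same dictionary.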
When a response R for query Q arrives at the recursive resolver over encrypted transport E from authoritative server with IP address X at time T5, if Q is in E-queries[X], the recursive resolver takes the following steps:¶
Over time, a recursive resolver following this policy may find that it is limited in resources, and may prefer to close some outstanding connections.¶
This could be done by checking available/consumed resources on a fixed schedule, by having a policy of only keeping a fixed number of connections open, by checking resources when activity occurs, or by some other cadence.¶
When existing connections should be closed, the recursive resolver should use a reasonable prioritization scheme to close outstanding connections.¶
One reasonable prioritization scheme would be:¶
- close outstanding established sessions based on E-last-activity[X] (oldest timestamp gets closed first)¶
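The prioritization scheme above, combined with a policy of keeping only a fixed number of connections open, might be sketched like this. The function name sessions_to_close and the keep parameter are hypothetical; the input maps each authoritative IP address with an established session to its E-last-activity[X] timestamp.

```python
def sessions_to_close(last_activity: dict, keep: int) -> list:
    """Return the sessions to close, least recently active first."""
    ordered = sorted(last_activity, key=last_activity.get)
    excess = max(0, len(ordered) - keep)
    return ordered[:excess]
```

With three established sessions and keep set to 1, the two sessions with the oldest activity timestamps are returned, oldest first.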
Note that when resources are limited, a recursive resolver following this guidance may also choose not to initiate new connections for encrypted transport.¶
Some recursive resolvers, looking to amortize connection costs and to minimize latency, MAY choose to synthesize queries to a particular authoritative server to keep an encrypted transport session active.¶
A recursive resolver that adopts this approach should try to align the synthesized queries with other optimizations.
For example, a recursive resolver that "pre-fetches" a particular resource record to keep its cache "hot" can send that query over an established encrypted transport session.¶