Internet-Draft | DNS message fragments | April 2022 |
Yu & Liu | Expires 29 October 2022 | [Page] |
This document describes a method to transmit DNS messages over multiple UDP datagrams by fragmenting them at the application layer. The objective is to allow authoritative servers to successfully reply to DNS queries via UDP using multiple smaller datagrams, where larger datagrams may not pass through the network successfully.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 29 October 2022.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
[RFC1035] describes how DNS messages are to be transmitted over UDP. A DNS query message is transmitted using one UDP datagram from client to server, and a corresponding DNS reply message is transmitted using one UDP datagram from server to client.¶
The upper limit on the size of a DNS message that can be transmitted thus depends on the maximum size of the UDP datagram that can be transmitted successfully from the sender to the receiver. Typically any size limit only matters for DNS replies, as DNS queries are usually small.¶
As a UDP datagram is transmitted in a single IP packet, in theory the size of a UDP datagram (including the headers of the lower layers) can be as large as 64 KiB. In practice, however, if the datagram size exceeds the path MTU, the datagram will either be fragmented at the IP layer or, worse, dropped by a router. In the case of IPv6, packets are fragmented by the sender only: if a packet's size exceeds the path MTU, the sender must fragment it. Only the first fragment carries the UDP or TCP header; the subsequent fragments carry no port numbers, so middle boxes that filter on ports frequently drop them. The sender may receive an ICMP Packet Too Big (PTB) message, but because DNS over UDP is stateless, this gives the server no practical way to resend the reply as smaller messages. In addition, IP-level fragmentation caused by large DNS responses introduces a risk of cache poisoning [Fragment-Poisonous], in which the attacker can circumvent some defense mechanisms (such as port, IP, and query randomization [RFC5452]).¶
As a result, a practical limit on DNS payload size is necessary. [RFC1035] limited DNS messages carried in UDP datagrams to a maximum of 512 bytes. Although EDNS(0) [RFC6891] allows an initiator to advertise the capability of receiving larger packets (for example, 4096 bytes), such packets lead to fragmentation, because in practice most packets are limited to 1500 bytes by host Ethernet interfaces, or to 1280 bytes by the minimum IPv6 MTU [RFC3542].¶
According to the DNS specifications [RFC1035], if a DNS response message cannot fit within the packet size limit, the response is truncated and the initiator has to fall back to TCP and re-query to receive the large response. However, apart from the high setup cost of TCP due to additional round trips, some firewalls and middle boxes block TCP/53 entirely, in which case no response is received at all. This becomes a significant issue as DNS response sizes inevitably increase with DNSSEC deployment.¶
In this memo, DNS message fragmentation attempts to work around middle box misbehavior by splitting a single DNS message across multiple UDP datagrams. Note that to avoid DNS amplification and reflection attacks, DNS cookies [I-D.ietf-dnsop-cookies] are mandatory when using DNS message fragments.¶
The issue of large DNS packets (>512 bytes) is not new [I-D.ietf-dnsop-respsize]; it has been raised since the introduction of IPv6, EDNS(0) [SAC016], and DNSSEC deployment [SAC035]. In current production networks, using DNSSEC with longer DNSKEYs (ZSK > 1024 bits and KSK > 2048 bits) results in response packets of 1500 bytes or more [T-DNS]. Responses to DNSKEY RRset queries are especially large during the KSK rollover process, as they contain both the new and the old KSK.¶
When possible, dropped packets should be avoided, as they force the client to wait for a timeout, which is costly. For example, a validator behind a firewall that drops large EDNS(0) packets and IP fragments receives no response and must wait until the timeout expires. In the extreme case of middle boxes that also drop TCP/53, the consequences can be severe: the validator may be unable to receive the response containing a new trust anchor KSK.¶
Since UDP requires fewer packets on the wire and less state on servers than TCP, in this memo we propose to continue using UDP for transmission, but to fragment larger DNS messages into smaller DNS messages at the application layer. The intent is for the fragments to pass easily through middle boxes and to avoid falling back to TCP.¶
Clients supporting DNS message fragmentation add an EDNS option to their queries, which declares their support for this feature.¶
If a fragmented DNS reply is received, it consists of multiple DNS message fragments (each transmitted in its own UDP datagram), and every fragment contains an EDNS option that states the total number of fragments and the identifier of the fragment carried in the current packet. The client collects all of the fragments and uses them to reconstruct the full DNS message. Clients MUST maintain a timeout when waiting for the fragments to arrive.¶
Clients that support DNS message fragments MUST be able to reassemble fragments into a DNS message of any size, up to the maximum of 64KiB.¶
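The client-side collection step could be sketched in Python as below. This is a minimal illustration under several assumptions: fragments are taken to be already parsed into a hypothetical `Fragment` record holding the FRAGMENT option values and the answer, authority, and additional RRs as opaque byte strings (with the OPT record removed), and the 2-second default timeout is an arbitrary placeholder rather than a value this document specifies. It also assumes that a valid identifier never exceeds the fragment count.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Fragment:
    # Hypothetical parsed view of one DNS message fragment.
    ident: int   # Fragment Identifier from the FRAGMENT option (1..255)
    count: int   # Fragment Count from the FRAGMENT option (1..255)
    answer: list = field(default_factory=list)      # RRs as opaque bytes
    authority: list = field(default_factory=list)
    additional: list = field(default_factory=list)  # assumed OPT RR removed


class Reassembler:
    """Collects the fragments of one reply; gives up after a timeout."""

    def __init__(self, timeout=2.0, clock=time.monotonic):
        self.clock = clock
        self.deadline = clock() + timeout   # placeholder timeout, in seconds
        self.count = None
        self.parts = {}

    def expired(self):
        return self.clock() >= self.deadline

    def add(self, frag):
        """Store a fragment; return the reassembled sections once complete."""
        if self.expired():
            raise TimeoutError("fragments did not all arrive in time")
        if not (1 <= frag.ident <= frag.count <= 255):
            raise ValueError("invalid FRAGMENT option values")
        if self.count is None:
            self.count = frag.count
        elif frag.count != self.count:
            raise ValueError("fragments disagree on the total count")
        self.parts.setdefault(frag.ident, frag)   # ignore duplicates
        if len(self.parts) < self.count:
            return None
        # All identifiers 1..count present: concatenate sections in order.
        ordered = [self.parts[i] for i in range(1, self.count + 1)]
        return {
            "answer": [rr for f in ordered for rr in f.answer],
            "authority": [rr for f in ordered for rr in f.authority],
            "additional": [rr for f in ordered for rr in f.additional],
        }
```

Fragments may arrive in any order and duplicates are ignored; once the deadline passes, the client would fall back (for example, to TCP) rather than wait further.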
The client MAY save information about what sizes of packets have been received from a given server. If saved, this information MUST have a limited duration.¶
Any DNSSEC validation is performed on the reassembled DNS message.¶
Servers supporting DNS message fragmentation will look for the EDNS option which declares client support for the feature. If not present, the server MUST NOT use DNS message fragmentation. The server MUST check that DNS cookies are supported. [**FIXME**] Implementation of the first request case, where no existing established cookie is available needs discussion; we want to avoid additional round-trips here. Shane: don't cookies already handle this case?¶
The server prepares the response DNS message normally. If the message exceeds the maximum UDP payload size specified by the client, then it should fragment the message into multiple UDP datagrams.¶
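The server's decision as described in the two preceding paragraphs reduces to a simple predicate. A sketch only: the parameter names are hypothetical, and `cookie_ok` stands in for whatever DNS cookie check is eventually specified (the first-request case is still an open FIXME).

```python
def should_fragment(has_allow_fragments: bool, cookie_ok: bool,
                    response_len: int, client_udp_max: int) -> bool:
    """Decide whether a reply should be sent as DNS message fragments."""
    if not has_allow_fragments:
        return False   # no ALLOW-FRAGMENTS option: MUST NOT fragment
    if not cookie_ok:
        return False   # DNS cookies are required to limit amplification
    # Fragment only when the whole reply exceeds the client's UDP payload size.
    return response_len > client_udp_max
```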
Each fragment contains an identical DNS header with TC=1, possibly varying only in the section counts. Setting the TC flag in this way ensures that clients which do not support DNS fragments can fall back to TCP transparently.¶
Each fragment includes as many RRs as possible without exceeding the desired fragment size. An EDNS option that includes both the fragment identifier and the total number of fragments is added to every fragment.¶
The server needs to know how many total fragments there are to insert into each fragment. A simple approach would be to generate all fragments, and then count the total number at the end, and update the previously-generated fragments with the total number of fragments. Other techniques may be possible.¶
The server MUST limit the number of fragments that it uses in a reply. (See "Open Issues and Discussion" for remaining work.)¶
The server MUST NOT exceed the maximum fragment size requested by a client.¶
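The two-pass approach to counting fragments, together with the two MUST-level limits above, might be sketched as follows. Assumptions: RRs arrive already encoded as byte strings (per-fragment name compression is ignored here), RRs are never split across fragments (still an open issue), and `OVERHEAD` and `MAX_FRAGMENTS` are placeholder values, not figures from this document. Building the TC=1 header and the FRAGMENT option for each fragment is left out.

```python
OVERHEAD = 40        # placeholder: DNS header + OPT RR (FRAGMENT, COOKIE options)
MAX_FRAGMENTS = 8    # placeholder cap on the number of fragments per reply


def pack_fragments(rrs, size_for):
    """Greedily pack encoded RRs (bytes) into fragments.

    size_for(i) returns the maximum size, in octets, of fragment i (1-based).
    Returns a list of (fragment_id, fragment_count, rrs) tuples, or None when
    the reply cannot be fragmented (an RR is too large for its fragment, or
    the fragment cap is exceeded); the caller then sends a truncated reply.
    """
    fragments = []                 # first pass: contents only, count unknown
    current, used = [], OVERHEAD
    for rr in rrs:
        if current and used + len(rr) > size_for(len(fragments) + 1):
            fragments.append(current)          # close the full fragment
            current, used = [], OVERHEAD
        if used + len(rr) > size_for(len(fragments) + 1):
            return None                        # RRs are never split
        current.append(rr)
        used += len(rr)
    if current:
        fragments.append(current)
    if len(fragments) > MAX_FRAGMENTS:
        return None
    total = len(fragments)         # second pass: fill in the total count
    return [(i, total, frag) for i, frag in enumerate(fragments, start=1)]
```

Passing the size limit as a function lets the per-fragment size grow along the sequence, as the size tables below suggest.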
The server should use the following sizes for each fragment in the sequence in IPv4:¶
Fragment ID | Size (octets) |
---|---|
1 | min(512, client_specified_max) |
2 | min(1460, client_specified_max) |
3 | min(1480, client_specified_max) |
N | min(1480, client_specified_max) |
The rationale is that the first packet will always get through, since if a 512 octet packet doesn't work, DNS cannot function. We then increase to sizes that are likely to get through. 1460 is the 1500 octet Ethernet packet size, minus the IP header overhead and enough space to support tunneled traffic. 1480 is the 1500 octet Ethernet packet size, minus the IP header overhead. [**FIXME**] Why not add 1240 here? Shane answers: 1280 is not any kind of limit in IPv4, as far as I know.¶
The server should use the following sizes for each fragment in the sequence in IPv6:¶
Fragment ID | Size (octets) |
---|---|
1 | min(1240, client_specified_max) |
2 | min(1420, client_specified_max) |
3 | min(1460, client_specified_max) |
N | min(1460, client_specified_max) |
As with IPv4, the idea is that the first packet will always get through. In this case we use the IPv6-mandated minimum MTU of 1280 octets, minus the IP header overhead. We then increase to 1420, which is the 1500 octet Ethernet packet size, minus the IP header overhead and enough space to support tunneled traffic. 1460 is the 1500 octet Ethernet packet size, minus the IP header overhead.¶
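Both size tables can be captured in a small lookup. In this sketch `SCHEDULE` and `fragment_size` are illustrative names, `family` is 4 or 6, and `client_specified_max` is the client-advertised limit used in the tables.

```python
# Sizes for fragments 1, 2 and 3..N, taken from the IPv4 and IPv6 tables.
SCHEDULE = {4: (512, 1460, 1480), 6: (1240, 1420, 1460)}


def fragment_size(ident: int, family: int, client_specified_max: int) -> int:
    """Maximum size in octets of fragment `ident` (1-based) for IPv4/IPv6."""
    if ident < 1:
        raise ValueError("fragment identifiers start at 1")
    sizes = SCHEDULE[family]
    base = sizes[min(ident, len(sizes)) - 1]   # fragments 3..N reuse the last size
    return min(base, client_specified_max)
```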
ALLOW-FRAGMENTS is an EDNS(0) [RFC6891] option that a client uses to inform a server that it supports fragmented responses. [**FIXME**] Why not simply use the FRAGMENT option here with count=0, identifier=ignored and avoid using another option code? Shane: There is no shortage of options. Plus, if we want to include a maximum fragment size value in the ALLOW-FRAGMENTS option, then we really need a separate option.¶
TBD.¶
The Maximum Fragment Size field is represented as an unsigned 16-bit integer. This is the maximum size used by any given fragment the server returns. [**FIXME**] This field's purpose has to be explained. Shane: discussed in the discussion section now.¶
As with other EDNS(0) options, the ALLOW-FRAGMENTS option does not have a presentation format.¶
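Because the option code and layout are still TBD, the following only sketches how ALLOW-FRAGMENTS might be encoded in the generic EDNS(0) option layout of [RFC6891] (16-bit OPTION-CODE, 16-bit OPTION-LENGTH, then OPTION-DATA), assuming OPTION-DATA consists of nothing but the Maximum Fragment Size field. The code 65001 is a placeholder from the range RFC 6891 reserves for local/experimental use.

```python
import struct

ALLOW_FRAGMENTS_CODE = 65001   # placeholder; the real option code is TBD


def encode_allow_fragments(max_fragment_size: int) -> bytes:
    """Encode ALLOW-FRAGMENTS as an EDNS(0) option (code, length, data)."""
    if not 0 <= max_fragment_size <= 0xFFFF:
        raise ValueError("Maximum Fragment Size is an unsigned 16-bit field")
    data = struct.pack("!H", max_fragment_size)
    return struct.pack("!HH", ALLOW_FRAGMENTS_CODE, len(data)) + data


def decode_allow_fragments(option: bytes) -> int:
    """Return the Maximum Fragment Size from an encoded option."""
    code, length = struct.unpack("!HH", option[:4])
    if code != ALLOW_FRAGMENTS_CODE or length != 2 or len(option) != 6:
        raise ValueError("not a well-formed ALLOW-FRAGMENTS option")
    (size,) = struct.unpack("!H", option[4:6])
    return size
```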
FRAGMENT is an EDNS(0) [RFC6891] option that assists a client in gathering the various fragments of a DNS message from multiple UDP datagrams. It is described in a previous section. Here, its syntax is provided.¶
TBD.¶
The Fragment Identifier field is represented as an unsigned 8-bit integer. The first fragment is identified as 1. Values in the range [1,255] can be used to identify the various fragments. Value 0 is used for signalling purposes.¶
The Fragment Count field is represented as an unsigned 8-bit integer. It contains the number of fragments in the range [1,255] that make up the DNS message. Value 0 is used for signalling purposes.¶
As with other EDNS(0) options, the FRAGMENT option does not have a presentation format.¶
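In the same spirit, a hypothetical encoding of the FRAGMENT option, assuming OPTION-DATA holds the Fragment Identifier followed by the Fragment Count (the order in which they are described above) and using 65002, another placeholder from the local/experimental range, as the option code.

```python
import struct

FRAGMENT_CODE = 65002   # placeholder; the real option code is TBD


def encode_fragment(ident: int, count: int) -> bytes:
    """Encode FRAGMENT as an EDNS(0) option: identifier, then count."""
    for value in (ident, count):
        if not 0 <= value <= 255:
            raise ValueError("fields are unsigned 8-bit integers")
    data = struct.pack("!BB", ident, count)
    return struct.pack("!HH", FRAGMENT_CODE, len(data)) + data


def decode_fragment(option: bytes) -> tuple:
    """Return (ident, count); a 0 in either field is reserved for signalling."""
    code, length = struct.unpack("!HH", option[:4])
    if code != FRAGMENT_CODE or length != 2 or len(option) != 6:
        raise ValueError("not a well-formed FRAGMENT option")
    ident, count = struct.unpack("!BB", option[4:6])
    return ident, count
```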
TCP-based application protocols co-exist well with competing traffic flows in the internet due to congestion control methods such as in [RFC5681] that are present in TCP implementations.¶
UDP-based application protocols have no mechanism in the lower layers to stop them from flooding datagrams into a network and causing congestion. Applications that use UDP therefore have to restrain themselves from causing congestion, so that their traffic is not disruptive.¶
In the case of [RFC1035], only one reply UDP datagram was sent per request UDP datagram, and so the lock-step flow control automatically ensured that UDP DNS traffic didn't lead to congestion. When DNS clients didn't hear back from the server, and had to retransmit the question, they typically paced themselves by using methods such as a retransmission timer based on a smoothed round-trip time between client and server.¶
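As an illustration of such pacing, the smoothed round-trip time estimator that TCP uses for its retransmission timer (RFC 6298, section 2) is sketched below, in milliseconds. It omits RFC 6298's 1-second minimum RTO and exponential backoff, and it is not a timer this document specifies for DNS.

```python
class RttEstimator:
    """SRTT/RTTVAR/RTO computation following RFC 6298, section 2."""

    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, granularity_ms: float = 1.0):
        self.g = granularity_ms   # clock granularity G
        self.srtt = None
        self.rttvar = None

    def sample(self, r_ms: float) -> float:
        """Feed one RTT measurement; return the updated RTO in ms."""
        if self.srtt is None:             # first measurement
            self.srtt = r_ms
            self.rttvar = r_ms / 2
        else:                             # RTTVAR is updated with the old SRTT
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r_ms))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r_ms
        return self.srtt + max(self.g, self.K * self.rttvar)
```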
Because of the message fragmentation described in this document, a single DNS query can cause multiple DNS reply datagrams to be sent back to the client; without effective flow control, such DNS traffic could cause problems for competing flows along the network path.¶
Because UDP does not guarantee delivery of datagrams, one or more fragments of a DNS message may be lost in transit. This is especially a problem on some wireless networks, where a steady fraction of datagrams can be lost due to interference and other environmental factors. The more fragments a message is split into, the higher the probability that at least one of them is lost.¶
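To make the last point concrete: if each datagram is lost independently with probability p (a simplifying assumption, since real losses are often bursty), a reply split into n fragments arrives complete with probability (1 - p)^n. With p = 0.05, a single datagram gets through 95% of the time, but an 8-fragment reply only about 66% of the time.

```python
def delivery_probability(loss_rate: float, fragments: int) -> float:
    """Probability that all fragments arrive, assuming independent losses."""
    if not 0.0 <= loss_rate <= 1.0 or fragments < 1:
        raise ValueError("loss_rate must be in [0, 1] and fragments >= 1")
    return (1.0 - loss_rate) ** fragments
```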
TBD.¶
Resolver behavior¶
We need some more discussion of resolver behavior in general, at least to the point of making things clear to an implementor.¶
Use of the DNS fragments mechanism¶
Is this mechanism designed for all DNS transactions, or only used in some event or special cases like a key rollover process? If the mechanism is designed for general DNS transactions, when is it triggered and how is it integrated with existing patterns?¶
One option is for the DNS fragments mechanism to act as a backup alongside EDNS, triggered only when a larger packet is lost along the path. It would then remain orthogonal to TCP fallback, given that the server still sets the TC bit in each fragment.¶
What is the size of fragments?¶
Generally speaking, the number of fragments increases if the fragment size is small (512 bytes, or another empirical value), which makes the mechanism less efficient. If the size can be changed dynamically through negotiation or some form of detection, this introduces more cost and additional round-trip time.¶
What happens if a client that does not support DNS fragments receives an out-of-order or partial fragment?¶
We need to consider what happens when a client that does not support DNS fragments gets a partial response, possibly even out of order.¶
We should explain risk of congestion, packet loss, etc. when introducing the limit on the number of fragments. We might also set specific upper limits for number of fragments.¶
EDNS buffer sizes vs. maximum fragmentation sizes¶
Mukund Sivaraman: We need further discussion about the sizes; also an upper limit for each *fragment* has to be the client's UDP payload size as it is the driver and it alone knows the ultimate success/failure of message delivery. So if it sets a maximum payload size of 1200, there's no point in trying 1460. Clients that support DNS message fragments (and signal support using the EDNS option) should adapt their UDP payload size discovery algorithm to work with this feature, as the following splits on sizes will assist PMTU discovery.¶
Shane Kerr: I think we need to separate the EDNS maximum UDP payload size from the maximum fragment size. I think that it is quite likely that (for example) we will want to restrict each fragment to 1480 bytes, but that the EDNS buffer size might remain at 4 kibibytes.¶
TSIG should be addressed¶
We need to document how to handle TSIG, even though this is not likely to be a real-world issue. Probably each fragment should be TSIG signed, as this makes it harder for an attacker to inject bogus packets that a client will have to process.¶
RR splitting should be addressed¶
We need to document whether or not RRs can be split. Probably it makes sense not to allow this, although this will reduce the effectiveness of fragmentation, as the units that can be packed into each fragment will be bigger.¶
We need to document that some messages may not be possible to split.¶
Some messages may be too large to split. A trivial example is a TXT record that is larger than the buffer size. Probably the best behavior here is to truncate.¶
DNSSEC checks¶
DNSSEC checks should be done on the final reassembled packet. This needs to be documented.¶
Name compression¶
Name compression should be done on each fragment separately. This needs to be documented.¶
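Compressing names per fragment means every compression pointer (RFC 1035, section 4.1.4) must point to an offset inside the same fragment, so the table of previously written names has to start empty for each fragment. A minimal sketch covering only domain-name encoding; the class name is illustrative, labels are assumed to be ASCII, and names are assumed to be written back to back directly after the 12-octet header.

```python
import struct

HEADER_LEN = 12  # the DNS header precedes the first name in each fragment


class FragmentNameCompressor:
    """Encodes domain names for one fragment; create a new one per fragment."""

    def __init__(self):
        self.buf = bytearray()
        self.offsets = {}   # lower-cased name suffix -> offset in this fragment

    def add_name(self, name: str) -> None:
        labels = [label for label in name.rstrip(".").split(".") if label]
        for i, label in enumerate(labels):
            suffix = ".".join(labels[i:]).lower()
            if suffix in self.offsets:
                # Pointer: top two bits set, 14-bit offset into this fragment.
                self.buf += struct.pack("!H", 0xC000 | self.offsets[suffix])
                return
            offset = HEADER_LEN + len(self.buf)
            if offset < 0x4000:             # pointers can only reach 16383
                self.offsets[suffix] = offset
            raw = label.encode("ascii")
            if len(raw) > 63:
                raise ValueError("label longer than 63 octets")
            self.buf.append(len(raw))
            self.buf += raw
        self.buf.append(0)                  # root label ends the name
```

A new `FragmentNameCompressor` would be created for each fragment, so a name already written in an earlier fragment is written in full again.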
OPT-RR¶
Some OPT-RR seem to be oriented at the entire message, others make more sense per packet. This needs to be sorted out. Also we need to investigate the edge case where fragments have conflicting options (Mukund Sivaraman thinks that we can copy the approach in the EDNS specification and use the same rules about conflicting OPT-RR that it uses.)¶
To avoid DNS amplification or reflection attacks, DNS cookies [I-D.ietf-dnsop-cookies] must be used. The DNS cookie EDNS option is identical in all fragments that make up a DNS message. The duplication of the same cookie values in all fragments that make up the message is not expected to introduce a security weakness in the case of off-path attacks.¶
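A client might check this property before reassembly. The sketch assumes each fragment's COOKIE option data (the 8-octet client cookie followed by the 8- to 32-octet server cookie, per the cookies specification, since published as RFC 7873) has already been extracted; rejecting the whole set on any mismatch is one possible policy, not one this document defines.

```python
def cookies_consistent(cookies, client_cookie: bytes) -> bool:
    """Check the COOKIE option data carried by every fragment of one reply.

    `cookies` holds each fragment's COOKIE option data (client cookie
    followed by server cookie); `client_cookie` is the 8 octets we sent.
    """
    if not cookies:
        return False
    first = cookies[0]
    if len(first) < 16 or first[:8] != client_cookie:
        return False        # no server cookie, or not our client cookie
    return all(c == first for c in cookies)
```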
The ALLOW-FRAGMENTS and FRAGMENT EDNS(0) options require option codes to be assigned for them.¶
Thanks to Stephen Morris, JINMEI Tatuya, Paul Vixie, Mark Andrews, and David Dragon for reviewing a pre-draft proposal and providing support, comments and suggestions.¶