Internet-Draft | avoid-fragmentation | September 2024 |
Fujiwara & Vixie | Expires 24 March 2025 |
The widely deployed EDNS0 feature in the DNS enables a DNS receiver to indicate its received UDP message size capacity, which supports the sending of large UDP responses by a DNS server. Large DNS/UDP messages are more likely to be fragmented and IP fragmentation has exposed weaknesses in application protocols. It is possible to avoid IP fragmentation in DNS by limiting the response size where possible, and signaling the need to upgrade from UDP to TCP transport where necessary. This document describes techniques to avoid IP fragmentation in DNS.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 24 March 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This document was originally intended to be a BCP, but due to operating system and socket option limitations, some of the recommendations have not yet gained real-world experience and therefore the document is published as Informational. It is hoped and expected that, as operating systems and implementations evolve, we will gain more experience with the recommendations, and plan to publish an updated document as a Best Current Practice. In the case of IPv6 only, there are no concerns, and it is easy to reach a consensus.¶
DNS has an EDNS0 [RFC6891] mechanism. The widely deployed EDNS0 feature in the DNS enables a DNS receiver to indicate its received UDP message size capacity which supports the sending of large UDP responses by a DNS server. DNS over UDP invites IP fragmentation when a packet is larger than the MTU of some network in the packet's path.¶
Fragmented DNS UDP responses have systemic weaknesses, which expose the requestor to DNS cache poisoning from off-path attackers. (See Section 7.3 for references and details.)¶
[RFC8900] states that IP fragmentation introduces fragility to Internet communication. The transport of DNS messages over UDP should take account of the observations stated in that document.¶
TCP avoids fragmentation by segmenting data into packets that are smaller than or equal to the Maximum Segment Size (MSS). For each transmitted segment, the size of the IP and TCP headers is known, and the IP packet size can be chosen to keep it within the estimated MTU and the other end's MSS. This takes advantage of the elasticity of TCP's packetizing process as to how much queued data will fit into the next segment. In contrast, DNS over UDP has little datagram size elasticity and lacks insight into IP header and option size, so we must make more conservative estimates about available UDP payload space.¶
[RFC7766] states that all general-purpose DNS implementations MUST support both UDP and TCP transport.¶
DNS transaction security [RFC8945] [RFC2931] does protect against the security risks of fragmentation, including protecting delegation responses. But [RFC8945] has limited applicability due to key distribution requirements and there is little if any deployment of [RFC2931].¶
This document describes various techniques to avoid IP fragmentation of UDP packets in DNS. This document is primarily applicable to DNS use on the global Internet.¶
In contrast, a path MTU that deviates from the recommended value might be obtained through static configuration, server routing hints, or a future discovery protocol. However, addressing this falls outside the scope of this document and may be the subject of future specifications.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
"Requestor" refers to the side that sends a request. "Responder" refers to an authoritative server, recursive resolver or other DNS component that responds to questions. (Quoted from EDNS0 [RFC6891])¶
"Path MTU" is the minimum link MTU of all the links in a path between a source node and a destination node. (Quoted from [RFC8201])¶
In this document, the term "Path MTU discovery" includes both Classical Path MTU discovery [RFC1191], [RFC8201], and Packetization Layer Path MTU discovery [RFC8899].¶
Many of the specialized terms used in this document are defined in DNS Terminology [RFC8499].¶
These recommendations are intended for nodes with global IP addresses on the Internet. Private networks or local networks are out of the scope of this document.¶
The methods to avoid IP fragmentation in DNS are described below:¶
R1. UDP responders should not use IPv6 fragmentation [RFC8200].¶
R2. UDP responders should configure their systems to prevent fragmentation of UDP packets when sending replies, provided it can be done safely. The mechanisms to achieve this vary across different operating systems.¶
For BSD-like operating systems, the IP "Don't Fragment flag (DF) bit" [RFC0791] can be used to prevent fragmentation. In contrast, Linux systems do not expose a direct API for this purpose and require the use of Path MTU socket options (IP_MTU_DISCOVER) to manage fragmentation settings. However, it is important to note that enabling IPv4 Path MTU Discovery for UDP in current Linux versions is considered harmful and dangerous. For more details, refer to Appendix C.¶
R3. UDP responders should compose response packets that fit in the minimum of the offered requestor's maximum UDP payload size [RFC6891], the interface MTU, the network MTU value configured by the knowledge of the network operators, and the RECOMMENDED maximum DNS/UDP payload size 1400. (See Appendix A for more information.)¶
R4. If the UDP responder detects an immediate error indicating that the UDP packet cannot be sent because it exceeds the path MTU size, the UDP responder may recreate the response so that it fits within the path MTU size, or send a response with the TC bit set.¶
The cause and effect of the TC bit are unchanged [RFC1035].¶
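As an illustration of R3 and R4, the following Python sketch picks a response size limit and builds a truncated reply. The function names (`response_size_limit`, `truncated_reply`) are hypothetical. Subtracting fixed IP and UDP header sizes from the MTU values is an assumption of this sketch, since R3 lists MTU values and payload sizes side by side without spelling out the conversion. The truncated reply also omits any OPT record for brevity.

```python
import struct

RECOMMENDED_MAX_PAYLOAD = 1400  # recommended maximum DNS/UDP payload size (R3)
UDP_HEADER = 8
IP_HEADER = {4: 20, 6: 40}      # assumption: no IPv4 options or IPv6 extension headers
TC_BIT = 0x0200                 # TC flag in the DNS header flags word (RFC 1035)


def response_size_limit(offered, interface_mtu, operator_mtu=None, ip_version=6):
    """Return the largest DNS/UDP payload a responder should send (R3)."""
    overhead = IP_HEADER[ip_version] + UDP_HEADER
    candidates = [RECOMMENDED_MAX_PAYLOAD, interface_mtu - overhead]
    if offered is None:
        candidates.append(512)               # no EDNS0: classic 512-octet limit
    else:
        candidates.append(max(offered, 512)) # RFC 6891: values below 512 count as 512
    if operator_mtu is not None:
        candidates.append(operator_mtu - overhead)
    return min(candidates)


def truncated_reply(message):
    """Keep the header and question section, set TC, and empty the other sections (R4).

    Assumes the question names are uncompressed, which is the usual case.
    """
    ident, flags, qdcount = struct.unpack("!HHH", message[:6])
    pos = 12
    for _ in range(qdcount):
        while message[pos] != 0:   # walk the labels of QNAME
            pos += message[pos] + 1
        pos += 1 + 4               # root label, then QTYPE and QCLASS
    header = struct.pack("!HHHHHH", ident, flags | TC_BIT, qdcount, 0, 0, 0)
    return header + message[12:pos]
```

A responder following R4 could call `truncated_reply` when the send fails with an immediate "message too long" error, or when the composed response exceeds `response_size_limit(...)`.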
R5. UDP requestors should limit the requestor's maximum UDP payload size to the minimum of the interface MTU, the network MTU value configured from the network operators' knowledge, and the RECOMMENDED maximum DNS/UDP payload size of 1400. A smaller limit may be used. (See Appendix A for more information.)¶
R6. UDP requestors should/may drop fragmented DNS/UDP responses without IP reassembly to avoid cache poisoning attacks (at firewall function).¶
R7. DNS responses may be dropped because of IP fragmentation. Requestors are recommended to eventually try alternative transport protocols.¶
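On the requestor side, R5 comes down to the value placed in the CLASS field of the EDNS0 OPT pseudo-RR, and the TCP retry mentioned in R7 is commonly triggered by the TC bit. The following is a minimal Python sketch of both. The helper names `build_opt_rr` and `needs_tcp_retry` are hypothetical, and the sketch does not cover timeouts or the rest of the query.

```python
import struct

OPT_TYPE = 41    # OPT pseudo-RR type (RFC 6891)
DO_FLAG = 0x8000 # DNSSEC OK bit in the low 16 bits of the OPT TTL field
TC_BIT = 0x0200  # TC flag in the DNS header flags word


def build_opt_rr(payload_size=1400, dnssec_ok=False):
    """Encode an OPT pseudo-RR advertising the requestor's maximum UDP payload size."""
    ttl = DO_FLAG if dnssec_ok else 0  # extended RCODE 0, EDNS version 0
    # root owner name, TYPE, CLASS (= payload size), TTL, RDLENGTH 0
    return b"\x00" + struct.pack("!HHIH", OPT_TYPE, payload_size, ttl, 0)


def needs_tcp_retry(response):
    """Return True when the responder set the TC bit in the DNS header."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_BIT)
```

The OPT record is appended to the additional section, so a caller also has to increment ARCOUNT in the query header.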
Large DNS responses are typically the result of zone configuration. People who publish information in the DNS should seek configurations that result in small responses. For example:¶
R8. Use a smaller number of name servers.¶
R9. Use a smaller number of A/AAAA RRs for a domain name.¶
R10. Use minimal-responses configuration: Some implementations have a 'minimal responses' configuration option that causes DNS servers to make response packets smaller, containing only mandatory and required data (Appendix B).¶
R11. Use a smaller signature / public key size algorithm for DNSSEC. Notably, the signature sizes of ECDSA and EdDSA are smaller than those of equivalent cryptographic strength using RSA.¶
It is difficult to determine a specific upper limit for R8, R9, and R11, but it is sufficient if all responses from the DNS servers are below the size of R3 and R5.¶
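To give a feel for R11, the sketch below estimates the RDATA size of a single RRSIG record for a few algorithms. The 18 fixed octets and the uncompressed signer name follow the RRSIG wire format, and the signature lengths are the standard ones for each algorithm (an RSA signature is as long as its modulus). The function names and the 2048-bit RSA key size are assumptions chosen for illustration.

```python
RRSIG_FIXED = 18  # type covered, algorithm, labels, original TTL,
                  # expiration, inception, key tag

SIGNATURE_OCTETS = {
    "RSASHA256-2048": 256,   # assumption: 2048-bit RSA key
    "ECDSAP256SHA256": 64,
    "ED25519": 64,
}


def wire_name_length(name):
    """Length of an uncompressed domain name in wire format."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(len(label) + 1 for label in labels) + 1  # +1 for the root label


def rrsig_rdata_size(algorithm, signer_name):
    """Estimated RRSIG RDATA size in octets for one signature."""
    return RRSIG_FIXED + wire_name_length(signer_name) + SIGNATURE_OCTETS[algorithm]
```

For the signer name `example.com.`, this gives 95 octets with ECDSAP256SHA256 against 287 octets with 2048-bit RSA, a difference that is repeated for every signed RRSet in a response.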
Some authoritative servers deviate from the DNS standard as follows:¶
Some authoritative servers ignore the EDNS0 requestor's maximum UDP payload size and return large UDP responses. [Fujiwara2018]¶
Some authoritative servers do not support TCP transport.¶
Such non-compliant behavior cannot become implementation or configuration constraints for the rest of the DNS. If failure is the result, then that failure must be localized to the non-compliant servers.¶
This document requests no IANA actions.¶
If the Don't Fragment (DF) bit is not set, on-path fragmentation may happen on IPv4, leaving the response vulnerable as shown in Section 7.3. To avoid this, recommendation R6 needs to be applied to discard the fragmented responses and retry over TCP.¶
When avoiding fragmentation, a DNS/UDP requestor behind a small MTU network may experience UDP timeouts, which would reduce performance and which may lead to TCP fallback. This would indicate prior reliance upon IP fragmentation, which is considered to be harmful to both the performance and stability of applications, endpoints, and gateways. Avoiding IP fragmentation will improve operating conditions overall, and the performance of DNS/TCP has increased and will continue to increase.¶
If a UDP response packet is dropped in transit, up to and including the network stack of the initiator, it increases the attack window for poisoning the requestor's cache.¶
"Fragmentation Considered Poisonous" [Herzberg2013] proposed effective off-path DNS cache poisoning attack vectors using IP fragmentation. "IP fragmentation attack on DNS" [Hlavacek2013] and "Domain Validation++ For MitM-Resilient PKI" [Brandt2018] proposed that off-path attackers can intervene in the path MTU discovery [RFC1191] to perform intentionally fragmented responses from authoritative servers. [RFC7739] stated the security implications of predictable fragment identification values.¶
Section 3.2 (Message Size Guidelines) of UDP Usage Guidelines [RFC8085] states that an application SHOULD NOT send UDP datagrams that result in IP packets that exceed the Maximum Transmission Unit (MTU) along the path to the destination.¶
A DNS message receiver cannot trust fragmented UDP datagrams, primarily because of the small amount of entropy provided by UDP port numbers and DNS message identifiers. Each is only 16 bits in size, and both are likely to be in the first fragment of a packet if fragmentation occurs. By comparison, the TCP protocol stack controls packet size and avoids IP fragmentation under ICMP NEEDFRAG attacks. In TCP, fragmentation should be avoided for performance reasons, whereas for UDP, fragmentation should be avoided for resiliency and authenticity reasons.¶
DNSSEC is a countermeasure against cache poisoning attacks that use IP fragmentation. However, DNS delegation responses are not signed with DNSSEC, and DNSSEC does not have a mechanism to get the correct response if an incorrect delegation is injected. This is a denial-of-service vulnerability that can yield failed name resolutions. If cache poisoning attacks can be avoided, DNSSEC validation failures will be avoided.¶
Because this document is published as an "Informational" document rather than a "Best Current Practice," this section presents steps that resolver operators can take to avoid vulnerabilities related to IP fragmentation.¶
To avoid vulnerabilities related to IP fragmentation, implement R5 and R6.¶
Specifically, configure the firewall functions in front of the full-service resolver to discard incoming DNS response packets with a non-zero Fragment Offset or a More Fragments (MF) bit of 1 on IPv4, and to discard packets with IPv6 Fragment Headers. (If the resolver's IP address is not dedicated to the DNS resolver and is also used for UDP communication other than DNS that relies on IP fragmentation, discard only the first fragment that contains the UDP header from port 53.)¶
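The two checks described above can be sketched in Python as follows. This is an illustration of the header tests only, not firewall configuration. The function names are hypothetical, the input is assumed to be a raw IP packet starting at the IP header, and the IPv6 walk only skips hop-by-hop, routing, and destination options headers.

```python
import struct

IPV4_MF = 0x2000           # More Fragments flag
IPV4_OFFSET_MASK = 0x1FFF  # 13-bit Fragment Offset
IPV6_FRAGMENT = 44         # Next Header value of the Fragment Header
IPV6_SKIPPABLE = {0, 43, 60}  # hop-by-hop, routing, destination options


def is_ipv4_fragment(packet):
    """True if MF is set or the Fragment Offset is non-zero."""
    (flags_frag,) = struct.unpack("!H", packet[6:8])
    return bool(flags_frag & IPV4_MF) or (flags_frag & IPV4_OFFSET_MASK) != 0


def has_ipv6_fragment_header(packet):
    """True if a Fragment Header appears in the extension header chain."""
    next_header = packet[6]  # Next Header field of the fixed IPv6 header
    pos = 40                 # first octet after the fixed header
    while True:
        if next_header == IPV6_FRAGMENT:
            return True
        if next_header not in IPV6_SKIPPABLE or pos + 2 > len(packet):
            return False
        # Each skippable header starts with Next Header and Hdr Ext Len,
        # the latter counted in 8-octet units beyond the first 8 octets.
        next_header, ext_len = packet[pos], packet[pos + 1]
        pos += (ext_len + 1) * 8
```

A packet for which either function returns True would be discarded before it reaches the resolver.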
The most recent resolver software is believed to implement R7.¶
Even if R7 is not implemented, it will only result in a name resolution error, preventing attacks from leading to malicious sites.¶
The authors would like to specifically thank Paul Wouters, Mukund Sivaraman, Tony Finch, Hugo Salgado, Peter van Dijk, Brian Dickson, Puneet Sood, Jim Reid, Petr Spacek, Andrew McConachie, Joe Abley, Daisuke Higashi, Joe Touch and Wouter Wijngaards for extensive review and comments.¶
There are many discussions for default path MTU size and requestor's maximum UDP payload size.¶
The minimum MTU for an IPv6 interface is 1280 octets (see Section 5 of [RFC8200]). So, we can use it as the default path MTU value for IPv6. The corresponding minimum MTU for an IPv4 interface is 68 (60 + 8) [RFC0791].¶
[RFC4035] specifies that "A security-aware name server MUST support the EDNS0 message size extension, MUST support a message size of at least 1220 octets". Thus, the smallest value of the maximum DNS/UDP payload size is 1220.¶
In order to avoid IP fragmentation, [DNSFlagDay2020] proposed that the UDP requestors set the requestor's payload size to 1232, and the UDP responders compose UDP responses so they fit in 1232 octets. The size 1232 is based on an MTU of 1280, which is required by the IPv6 specification [RFC8200], minus 48 octets for the IPv6 and UDP headers.¶
Most of the Internet, and especially the inner core, has an MTU of at least 1500 octets. The maximum DNS/UDP payload size for IPv6 on MTU 1500 Ethernet is 1452 (1500 minus 40 (IPv6 header size) minus 8 (UDP header size)). To allow for possible IP options and distant tunnel overhead, the recommended default maximum DNS/UDP payload size is 1400.¶
[Huston2021] analyzed the results of [DNSFlagDay2020]. Their measurements suggest that in the interior of the Internet, between recursive resolvers and authoritative servers, the prevailing MTU is 1,500 and there is no measurable signal of smaller MTUs in use. On that basis, they proposed setting the EDNS0 requestor's UDP payload size to 1472 octets for IPv4 and 1452 octets for IPv6.¶
As a result of discussions, this document decided to recommend a value of 1400, with smaller values also allowed.¶
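The payload sizes quoted in this appendix all follow from the same subtraction, which the short Python sketch below reproduces. The function name `max_udp_payload` is hypothetical, and the header sizes assume an IPv4 header without options and an IPv6 header without extension headers.

```python
IP_HEADER = {4: 20, 6: 40}  # fixed header sizes in octets
UDP_HEADER = 8


def max_udp_payload(mtu, ip_version):
    """Largest UDP payload that fits in one unfragmented packet of the given MTU."""
    return mtu - IP_HEADER[ip_version] - UDP_HEADER


assert max_udp_payload(1280, 6) == 1232  # DNS Flag Day 2020 value
assert max_udp_payload(1500, 6) == 1452
assert max_udp_payload(1500, 4) == 1472
```

Against these figures, the recommended value of 1400 leaves 52 octets of headroom on IPv6 and 72 octets on IPv4 for a 1500-octet MTU.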
Some implementations have a "minimal responses" configuration setting/option that causes a DNS server to make response packets smaller, containing only mandatory and required data.¶
Under the minimal-responses configuration, a DNS server composes responses containing only necessary RRs. For delegations, see [RFC9471]. In the case of a non-existent domain name or non-existent type, the authority section contains an SOA record and the answer section is empty (as defined in Section 2 of [RFC2308]).¶
Some resource records (MX, SRV, SVCB, HTTPS) require additional A, AAAA, and SVCB records in the Additional Section defined in [RFC1035], [RFC2782] and [RFC9460].¶
In addition, if the zone is DNSSEC signed and a query has the DNSSEC OK bit, signatures are added in the answer section, or the corresponding DS RRSet and signatures are added in the authority section. Details are defined in [RFC4035] and [RFC5155].¶
This section records the status of known implementations of these best practices defined by this specification at the time of publication, and any deviation from the specification.¶
Please note that the listing of any individual implementation here does not imply endorsement by the IETF. Furthermore, no effort has been spent to verify the information presented here that was supplied by IETF contributors.¶
BIND 9 does not implement the recommendations 1 and 2 in Section 3.1.¶
BIND 9 on Linux sets IP_MTU_DISCOVER to IP_PMTUDISC_OMIT with a fallback to IP_PMTUDISC_DONT.¶
On systems with IP_DONTFRAG (such as FreeBSD), BIND 9 disables IP_DONTFRAG.¶
Accepting Path MTU Discovery for UDP is considered harmful and dangerous. BIND 9's settings avoid attacks on path MTU discovery.¶
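The "set one mode, fall back to another" pattern described for Linux can be sketched in Python as below. This is an illustration only, not BIND 9's code. The numeric constants are the Linux values from `<linux/in.h>`, hard-coded here on the assumption that the Python `socket` module may not export them, and the function name `set_pmtud_mode` is hypothetical.

```python
import socket

# Linux-specific values (assumption: taken from <linux/in.h>);
# other operating systems assign these numbers to different options.
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DONT = 0
IP_PMTUDISC_OMIT = 5


def set_pmtud_mode(sock):
    """Try IP_PMTUDISC_OMIT first, then fall back to IP_PMTUDISC_DONT.

    Returns the mode that was applied, or None if neither was accepted.
    """
    for mode in (IP_PMTUDISC_OMIT, IP_PMTUDISC_DONT):
        try:
            sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, mode)
            return mode
        except OSError:
            continue  # option value not supported by this kernel
    return None
```

On Linux this would be called once on the UDP socket after it is created. Because the option numbers are platform-specific, the constants above must not be reused on other systems.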
For recommendation 3, BIND 9 will honor the requestor's size up to the configured limit (max-udp-size). The UDP response packet size is bounded between 512 and 4096 bytes, with the default set to 1232.¶
For recommendation 4, if the send fails with EMSGSIZE, BIND 9 sets the TC bit and tries to send a minimal answer again.¶
In the first recommendation of Section 3.2, BIND 9 uses the edns-buf-size option, with the default of 1232.¶
BIND 9 does implement recommendation 2 of Section 3.2.¶
For recommendation 3, after two UDP timeouts, BIND 9 will fall back to TCP.¶
Both Knot servers set IP_PMTUDISC_OMIT to avoid path MTU spoofing. UDP size limit is 1232 by default.¶
Fragments are ignored if they arrive over an XDP interface.¶
TCP is attempted after repeated UDP timeouts.¶
Minimal responses are returned and are currently not configurable.¶
Smaller signatures are used, with ecdsap256sha256 as the default.¶
Unbound sets IP_MTU_DISCOVER to IP_PMTUDISC_OMIT with fallback to IP_PMTUDISC_DONT. It also disables IP_DONTFRAG on systems that have it, but not on Apple systems. On systems that support it Unbound sets IPV6_USE_MIN_MTU, with a fallback to IPV6_MTU at 1280, with a fallback to IPV6_USER_MTU. It also sets IPV6_MTU_DISCOVER to IPV6_PMTUDISC_OMIT with a fallback to IPV6_PMTUDISC_DONT.¶
Unbound requests UDP size 1232 from peers by default. The requestor's size is limited to a maximum of 1232.¶
After some timeouts, Unbound retries with a smaller size, if that is smaller, at size 1232 for IPv6 and 1472 for IPv4. This has had no effect since the flag day change to 1232.¶
Unbound has minimal responses as an option, default on.¶