Internet-Draft | IPv4 Link Atomic Packet Notification | November 2022 |
Migault, et al. | Expires 25 May 2023 |
This document considers an ingress and an egress security gateway connected over an IPv4 network. The Tunnel Link Packets have their Don't Fragment (DF) bit set to 0.¶
This document defines the IKEv2 IPv4 Link Maximum Atomic Packet Notification Extension, which enables the egress security gateway to notify the ingress security gateway that Mid-tunnel Fragmentation is observed and to report the value of the Link Maximum Atomic Packet. The ingress security gateway is expected to take action so that the egress security gateway does not have to perform costly reassembly operations. The ingress security gateway is expected either to perform Inner Fragmentation (when possible) or to update the Tunnel MTU.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 25 May 2023.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
As depicted in Figure 1, this document considers a tunnel established between an ingress and an egress security gateway. The Tunnel Transit Packets are IPv4 or IPv6 packets encapsulated in an IPsec/ESP [RFC4303] tunnel, and the resulting Tunnel Link Packet is an IPv4 packet sent over the network N.¶
Reassembling fragments at the egress security gateway requires additional resources, which under heavy load results in service degradation. First, the security gateway has to maintain reassembly state for an indefinite time. Second, as detailed in [RFC4963], [RFC6864] and [RFC8900], the 16-bit IPv4 Identification field is not large enough to prevent duplication, which makes fragmentation insufficiently robust at high data rates.¶
The egress security gateway needs to reassemble fragmented packets when Mid-tunnel Fragmentation occurs (only for Tunnel Link Packets with IPv4 DF=0; see (2) in Figure 1) or when Outer Fragmentation is performed by the ingress node (see (3) in Figure 1).¶
One can reasonably ask why setting the IPv4 DF bit to 1 is not sufficient to avoid fragmentation. The reason is that setting DF=1 can lead to black holing, and setting DF=0 is the way to mitigate this. Suppose the Don't Fragment bit is set to 1 in the IPv4 header of the Tunnel Link Packet. If that packet is larger than the link Maximum Transmission Unit (MTU), the packet is dropped by an on-path router and an ICMPv4 Packet Too Big (PTB) message [RFC0792] is returned to the sending address. The ICMPv4 PTB message is a Destination Unreachable message with Code equal to 4, and was augmented by [RFC1191] to indicate the acceptable MTU. Unfortunately, one cannot rely on this procedure: in practice, some routers do not check the MTU and therefore do not send ICMPv4 messages. In addition, when ICMPv4 messages are sent, these messages are unprotected and may be blocked by firewalls or ignored. This results in IPv4 packets being dropped without the security gateways being aware of it, which is known as black holing. To prevent this situation, IPv4 packets often have their DF bit set to 0. In this case, as described in [RFC0791], when the size of a packet exceeds the MTU, the node splits the incoming packet into multiple fragments.¶
This document describes a mechanism by which the egress security gateway can inform the ingress security gateway that fragmentation is being observed. The ingress security gateway SHOULD either perform Inner Fragmentation (when possible) or update the Tunnel MTU.¶
Source Fragmentation by the ingress node for the link layer (as recommended in [I-D.ietf-intarea-tunnels]) is not considered, as it does not prevent the reassembly operation.¶
Note that the two mechanisms implement fragmentation with radically different views. More specifically, [I-D.ietf-intarea-tunnels] considers the Tunnel MTU and the link layer MTU as relatively independent, while [RFC4301] correlates them strongly. A significant difference between MAP and MTU is that fragmentation in [I-D.ietf-intarea-tunnels] is not supposed to impact the MTU, and an ICMP PTB is only expected when the router is not able to handle the packet. The MAP, on the other hand, is an indication that fragmentation is happening.¶
This mechanism follows [RFC8900], which recommends that each layer handle fragmentation at its own layer and that reliance on IP fragmentation be reduced to the greatest degree possible. This document does not describe a Path MTU Discovery (PMTUD) procedure [RFC1191] nor a Packetization Layer Path MTU Discovery (PLPMTUD) procedure [RFC4821].¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
During an IKEv2 negotiation, the initiator and the responder indicate their support for notifying an IPv4 Link Maximum Atomic Packet by exchanging IP4_LINK_MAP_SUPPORTED notifications. This notification MUST be sent in the IKE_AUTH exchange (in the case of multiple IKE_AUTH exchanges, in the first IKE_AUTH message from the initiator and in the last IKE_AUTH message from the responder). If both the initiator and the responder send this notification during the IKE_AUTH exchange, the peers may notify each other with an IPv4 Link Maximum Atomic Packet Notification when fragmentation is observed. Upon receiving such notifications, the peers may take the necessary actions to prevent such fragmentation from occurring.¶
Initiator                                                 Responder
-------------------------------------------------------------------
HDR, SA, KEi, Ni -->
                                     <-- HDR, SA, KEr, Nr
HDR, SK {IDi, AUTH, SA, TSi, TSr,
         N(IP4_LINK_MAP_SUPPORTED)} -->
                                     <-- HDR, SK {IDr, AUTH, SA, TSi, TSr,
                                         N(IP4_LINK_MAP_SUPPORTED)}¶
The egress security gateway detects that fragmentation has occurred when it receives a fragment with the More Fragments (MF) bit of the Flags field in the IP header set to 1. In that case, it takes the length of that fragment (Total Length) as the Link Maximum Atomic Packet length. Figure 2 shows the IPv4 header as described in Section 3.1 of [RFC0791] to illustrate the different fields involved.¶
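The detection step can be sketched as follows. This is only an illustration, assuming the egress gateway has access to the raw outer IPv4 header as bytes; the function name `observed_link_map` is hypothetical and not defined by this document.

```python
import struct

# More Fragments bit within the 16-bit Flags / Fragment Offset word [RFC0791].
MF_FLAG = 0x2000


def observed_link_map(ipv4_header: bytes):
    """Return the Total Length of a fragment with MF=1, or None otherwise."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 octets long")
    # Octets 2-3 carry Total Length; octets 6-7 carry Flags and Fragment Offset.
    total_length = struct.unpack_from("!H", ipv4_header, 2)[0]
    flags_and_offset = struct.unpack_from("!H", ipv4_header, 6)[0]
    if flags_and_offset & MF_FLAG:
        return total_length
    return None
```

A fragment with MF=0 (the last fragment, or an unfragmented packet) yields no candidate value, which matches the rule above that only fragments with MF=1 are used.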
It is not expected that the egress security gateway sends an IPv4 Link Maximum Atomic Packet Notification each time fragmentation is observed. Instead, configurable heuristics are expected to determine when an IPv4 Link Maximum Atomic Packet Notification is triggered.¶
Such heuristics include, for example, a threshold on the number of initial fragments received or a threshold on the rate of initial fragments. Such thresholds are also expected to be combined with a timer or a counter of already sent IP4_LINK_MAP notifications, to avoid overloading the sending gateway with such notifications. It is expected that the time between two such notifications increases with the number of notifications.¶
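One possible heuristic is sketched below: a fragment-count threshold combined with a waiting time that doubles with every notification already sent. The class name, the default values and the doubling policy are assumptions chosen for illustration; this document does not mandate any of them.

```python
class FragmentNotifier:
    """Decide when observed fragments justify an IP4_LINK_MAP notification."""

    def __init__(self, fragment_threshold=100, base_interval=10.0):
        self.fragment_threshold = fragment_threshold  # hypothetical default
        self.base_interval = base_interval            # seconds, hypothetical default
        self.fragment_count = 0
        self.notifications_sent = 0
        self.last_notification = None

    def on_initial_fragment(self, now: float) -> bool:
        """Record one initial fragment; return True if a notification is due."""
        self.fragment_count += 1
        if self.fragment_count < self.fragment_threshold:
            return False
        # The waiting time doubles with every notification already sent.
        wait = self.base_interval * (2 ** self.notifications_sent)
        if self.last_notification is not None and now - self.last_notification < wait:
            return False
        self.fragment_count = 0
        self.notifications_sent += 1
        self.last_notification = now
        return True
```

The caller supplies the current time, so the same logic can be driven by a monotonic clock in production or by fixed timestamps when testing.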
The receiving security gateway determines a recommended MTU value to be used by the sending gateway. The recommended MTU SHOULD be one of the MTU values observed from IPv4 ESP packets that have been correctly authenticated. The recommended MTU SHOULD be greater than some minimal value. [RFC0791] specifies that the IPv4 minimum MTU is 68 octets, but greater values are likely to be more realistic. Once the appropriate MTU has been selected, the receiving security gateway sends an IP4_LINK_MAP notification to the sending gateway as described below:¶
Egress Security Gateway                    Ingress Security Gateway
-------------------------------------------------------------------
HDR SK { N(IP4_LINK_MAP)} -->¶
Upon receiving an IP4_LINK_MAP notification, the ingress security gateway derives the tunnel MAP from the received Link MAP as follows:¶
tunnel MAP = link MAP - outer IP header - encapsulation overhead¶
where the encapsulation overhead comprises the ESP header and the ESP trailer, including the variable-length Pad field. When the padding is chosen to minimize the Pad Length, the encapsulation overhead is set to 14 (plus the size of the ICV).¶
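As a worked illustration of the formula above, the following sketch computes the tunnel MAP. It assumes an outer IPv4 header of 20 octets (no options) and takes the overhead of 14 octets from the text; the ICV length is a parameter because it depends on the negotiated algorithm. The function and constant names are illustrative only.

```python
OUTER_IPV4_HEADER = 20  # assumption: outer IPv4 header without options
ESP_MIN_OVERHEAD = 14   # value given in the text for minimal padding, ICV excluded


def tunnel_map(link_map: int, icv_len: int,
               outer_ip_header: int = OUTER_IPV4_HEADER) -> int:
    """tunnel MAP = link MAP - outer IP header - encapsulation overhead."""
    overhead = ESP_MIN_OVERHEAD + icv_len
    result = link_map - outer_ip_header - overhead
    if result <= 0:
        raise ValueError("link MAP too small to carry any tunnel payload")
    return result
```

For example, with a reported link MAP of 1400 octets and a 16-octet ICV, `tunnel_map(1400, 16)` gives 1400 - 20 - (14 + 16) = 1350 octets.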
The ingress security gateway may perform Source Fragmentation of the Tunnel Transit Packet, also represented as Inner Fragmentation (3). More specifically, when the Tunnel Transit Packet is IPv4 with DF=0, the ingress node fragments it into chunks that do not exceed the tunnel MAP, so that the (IPv4) encapsulated Tunnel Link Packet does not undergo Mid-tunnel Fragmentation (see Section 4.2.2 of [I-D.ietf-intarea-tunnels]).¶
The details of the ingress processing are described below, with TP being the Tunnel Transit Packet and TLP the Tunnel Link Packet.¶
if (TP.len <= tunnel MAP) then
    encapsulate the TP and emit
else
    fragment the TP into chunks no larger than the tunnel MAP
    encapsulate each chunk, creating one TLP per chunk
    emit the TLPs
endif
¶
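The splitting of an IPv4 Tunnel Transit Packet with DF=0 into chunks that fit within the tunnel MAP can be sketched as follows. The sketch only computes the fragment boundaries; building the per-fragment IPv4 headers and the ESP encapsulation are left out. It assumes the packet is not itself already a fragment and that its header is 20 octets unless stated otherwise; the function name is hypothetical.

```python
def fragment_payload(payload: bytes, tunnel_map: int, header_len: int = 20):
    """Split an IPv4 payload so each fragment (header + data) fits in tunnel_map.

    Returns a list of (fragment_offset, more_fragments, data) tuples, where
    fragment_offset is expressed in units of 8 octets as in the IPv4 header.
    """
    # Every fragment except the last must carry a multiple of 8 octets of data.
    max_data = ((tunnel_map - header_len) // 8) * 8
    if max_data <= 0:
        raise ValueError("tunnel MAP too small to carry any data")
    fragments = []
    for start in range(0, len(payload), max_data):
        data = payload[start:start + max_data]
        more_fragments = start + max_data < len(payload)
        fragments.append((start // 8, more_fragments, data))
    return fragments
```

With a tunnel MAP of 1350 octets, each fragment carries at most 1328 octets of data, so a 3000-octet payload is split into three fragments, each of which fits within the tunnel MAP once its header is added.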
The ingress security gateway SHOULD propagate the tunnel MTU back to the Source so that the Source does not emit packets larger than the MAP. This is done by configuring the EMTU_R associated with the SA. Upon receiving a Tunnel Transit Packet larger than the MAP, the ingress security gateway discards the packet and returns an ICMP PTB message to the Source, which then performs Source Fragmentation (5) (see Section 8.2.1 of [RFC4301]).¶
It is worth mentioning that only future packets will be impacted, that is, not those that caused the fragmentation.¶
Figure 3 illustrates the Notify Payload packet format as described in Section 3.10 of [RFC7296], with a 4-octet path allowed MTU value as notification data. This format is used for both the IP4_LINK_MAP_SUPPORTED and IP4_LINK_MAP notifications.¶
The fields Next Payload, Critical Bit, RESERVED and Payload Length are defined in [RFC7296]. Specific fields defined in this document are:¶
Protocol ID (1 octet):
   set to zero.¶
SPI Size (1 octet):
   set to zero.¶
Notify Message Type (2 octets):
   Specifies the type of notification message. It is set to TBD1 by IANA for the IP4_LINK_MAP_SUPPORTED notification or to TBD2 by IANA for the IP4_LINK_MAP notification.¶
Notification Data:
   Specifies the data associated with the notification message. It is empty for the IP4_LINK_MAP_SUPPORTED notification, or consists of 4 octets that contain the MTU value for the IP4_LINK_MAP notification, as represented in Figure 4.¶
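The payload layout described above can be sketched as follows. The numeric Notify Message Types are hypothetical placeholders taken from the private-use status range of [RFC7296], since TBD1 and TBD2 have not been assigned; the function names are illustrative as well. The sketch covers only the Notify Payload itself, not the IKE header or the encrypted SK payload that carries it.

```python
import struct

# Hypothetical placeholders: the real values are TBD1 and TBD2, to be assigned
# by IANA. These come from the private-use status range of RFC 7296.
IP4_LINK_MAP_SUPPORTED = 40960
IP4_LINK_MAP = 40961


def build_notify(message_type: int, mtu=None, next_payload: int = 0) -> bytes:
    """Serialize a Notify Payload with an empty or 4-octet Notification Data."""
    data = b"" if mtu is None else struct.pack("!I", mtu)
    length = 8 + len(data)
    # Next Payload, Critical bit + RESERVED, Payload Length,
    # Protocol ID (zero), SPI Size (zero), Notify Message Type.
    header = struct.pack("!BBHBBH", next_payload, 0, length, 0, 0, message_type)
    return header + data


def parse_notify(payload: bytes):
    """Return (message_type, mtu); mtu is None when Notification Data is empty."""
    _next, _flags, length, protocol_id, spi_size, message_type = struct.unpack_from(
        "!BBHBBH", payload, 0)
    if length != len(payload) or protocol_id != 0 or spi_size != 0:
        raise ValueError("malformed notification")
    data = payload[8:length]
    mtu = struct.unpack("!I", data)[0] if len(data) == 4 else None
    return message_type, mtu
```

An IP4_LINK_MAP notification built this way is 12 octets long (8 octets of payload header plus the 4-octet MTU value), while IP4_LINK_MAP_SUPPORTED is 8 octets long.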
IANA is requested to allocate two values in the "IKEv2 Notify Message Types - Status Types" registry (available at https://www.iana.org/assignments/ikev2-parameters/ikev2-parameters.xhtml#ikev2-parameters-16) with the following definition:¶
+=======+================================+
| Value | NOTIFY MESSAGES - STATUS TYPES |
+=======+================================+
| TBD1  | IP4_LINK_MAP_SUPPORTED         |
| TBD2  | IP4_LINK_MAP                   |
+-------+--------------------------------+¶
This document defines an IKEv2 extension that informs a sending gateway that fragmentation is observed. In addition, an observed MTU value is reported to the sending security gateway. These pieces of information are inferred from a valid ESP packet that is authenticated, and the information is transferred from one security gateway to the other security gateway using the protected IKEv2 channel.¶
On the other hand, ESP does not provide any protection to the IPv4 header, and therefore neither to the fragmentation procedure nor to the related pieces of information defined in [RFC0791] and [RFC8900].
In this case, this includes the DF bit and MF bit of the Flags field as well as the Total Length field from which the link MAP is inferred.
This is not surprising, as fragmentation in the case of IPv4 may be performed by any node.
Similarly, ICMPv4 PTB messages are not protected either.
As a result, the security considerations related to MTU discovery [RFC0791], [RFC8900], [RFC4963], [RFC6864], [RFC1191] apply here.¶
The authors would like to thank Paul Wouters and Joe Touch for their reviews and valuable comments and suggestions.¶