Internet-Draft | Minimal ESP | May 2022 |
Migault & Guggemos | Expires 25 November 2022 | [Page] |
This document describes the minimal properties that an IP Encapsulating Security Payload (ESP) implementation needs to meet to remain interoperable with the standard RFC4303 ESP. Such a minimal version of ESP is not intended to become a replacement of the RFC 4303 ESP. Instead, a minimal implementation is expected to be optimized for constrained environments while remaining interoperable with implementations of RFC 4303 ESP. In addition, this document also provides some considerations for implementing minimal ESP in a constrained environment which includes limiting the number of flash writes, handling frequent wakeup / sleep states, limiting wakeup time, and reducing the use of random generation.¶
This document does not update or modify RFC 4303. It provides a compact description of how to implement the minimal version of that protocol. RFC 4303 remains the authoritative description.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 25 November 2022.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
ESP [RFC4303] is part of the IPsec protocol suite [RFC4301]. IPsec is used to provide confidentiality, data origin authentication, connectionless integrity, an anti-replay service and limited traffic flow confidentiality (TFC) padding.¶
Figure 1 describes an ESP Packet. Currently, ESP is implemented in the kernel of most major multipurpose Operating Systems (OS). ESP is usually implemented with all of its features to fit the multiple purpose usage of these OSes, at the expense of resources and with no considerations for code size. Constrained devices are likely to have their own implementation of ESP optimized and adapted to their specific use, such as limiting the number of flash writes (for each packet or across wake time), handling frequent wakeup and sleep state, limiting wakeup time, and reducing the use of random generation. With the adoption of IPsec by IoT devices with minimal IKEv2 [RFC7815] and ESP Header Compression (EHC) with [I-D.mglt-ipsecme-diet-esp] or [I-D.mglt-ipsecme-ikev2-diet-esp-extension], these ESP implementations MUST remain interoperable with standard ESP implementations. This document describes the minimal properties an ESP implementation needs to meet to remain interoperable with [RFC4303] ESP. In addition, this document also provides advise to implementers for implementing ESP within constrained environments. This document does not update or modify RFC 4303.¶
For each field of the ESP packet represented in Figure 1 this document provides recommendations and guidance for minimal implementations. The primary purpose of Minimal ESP is to remain interoperable with other nodes implementing RFC 4303 ESP, while limiting the standard complexity of the implementation.¶
[RFC4303] defines the SPI as a mandatory 32 bits field.¶
The SPI has a local significance to index the Security Association (SA). From [RFC4301] section 4.1, nodes supporting only unicast communications can index their SA using only the SPI. Nodes supporting multicast communications also require to use the IP addresses and thus SA lookup need to be performed using the longest match.¶
For nodes supporting only unicast communications, it is RECOMMENDED indexing the SA using only the SPI. The index may be based on the full 32 bits of SPI or a subset of these bits. The node may require a combination of the SPI as well as other parameters (like the IP address) to index the SA.¶
Values 0-255 MUST NOT be used. As per section 2.1 of [RFC4303], values 1-255 are reserved and 0 is only allowed to be used internally and it MUST NOT be sent over the wire.¶
[RFC4303] does not require the 32 bit SPI to be randomly generated, although that is the RECOMMENDED way to generate SPIs as it provides some privacy and security benefits and avoids correlation between ESP communications. To obtain a usable random 32 bit SPI, the node generates a random 32 bit value and checks it does not fall within the 0-255 range. If the SPI has an acceptable value, it is used to index the inbound session. Otherwise the generated value is discarded and the process repeats until a valid value is found.¶
Some constrained devices are less concerned with the privacy properties associated to randomly generated SPIs. Examples of such devices might include sensors looking to reduce their code complexity. The use of a predictive function to generate the SPI might be preferred over the generation and handling of random values. An implementation of such predictable function could use the combination of a fixed value and the memory address of the SAD structure. For every incoming packet, the node will be able to point to the SAD structure directly from the SPI value. This avoids having a separate and additional binding and lookup function for the SPI to its SAD entry for every incoming packet.¶
SPIs that are not randomly generated over 32 bits may have privacy and security concerns. As a result, the use of alternative designs requires careful security and privacy reviews. This section provides some considerations upon the adoption of alternative designs.¶
The SPI value is only looked up for inbound traffic. The SPI negotiated with IKEv2 [RFC7296] or Minimal IKEv2 [RFC7815] by a peer is the value used by the remote peer when it sends traffic. The main advantage of using a rekeying mechanism is to enable a rekey, that is performed by replacing an old SA by a new SA, both indexed with distinct SPIs. As the SPI is only used for inbound traffic by the peer, this allows each peer to manage the set of SPIs used for its inbound traffic. The necessary number of SPI reflects the number of inbound SAs as well as the ability to rekey these SAs. Typically, rekeying a SA is performed by creating a new SA (with a dedicated SPI) before the old SA is deleted. This results in an additional SA and the need to support an additional SPI. Similarly, the privacy concerns associated with the generation of non-random SPIs is also limited to the incoming traffic.¶
Alternatively, some constrained devices will not implement IKEv2 or Minimal IKEv2 and as such will not be able to manage a roll-over between two distinct SAs. In addition, some of these constrained devices are also likely to have a limited number of SAs - likely to be indexed over 3 bytes only for example. One possible way to enable a rekey mechanism with these devices is to use the SPI where for example the first 3 bytes designates the SA while the remaining byte indicates a rekey index. SPI numbers can be used to implement tracking the inbound SAs when rekeying is taking place. When rekeying a SPI, the new SPI could use the SPI bytes to indicate the rekeying index.¶
The use of a small limited set of SPI numbers across communications comes with privacy and security concerns. Some specific values or subset of SPI values could reveal the models or manufacturer of the node implementing ESP. It could also reveal some state such as "not yet rekeyed" or "rekeyed 10 times". If a constrained host uses a very limited or even just one application, the SPI itself could indicate what kind of traffic (eg the kind of application typically running) is transmitted. This could be further correlated by encrypted data size to further leak information to an observer on the network. In addition, use of specific hardcoded SPI numbers could reveal a manufacturer or device version. If updated devices use different SPI numbers, an attacker could locate vulnerable devices by their use of specific SPI numbers.¶
A privacy analysis should consider at least the type of information as well the traffic pattern before deciding whether non-random SPIs are safe to use. Typically temperature sensors, wind sensors, used outdoors may not leak privacy sensitive information and most of its traffic is expected to be outbound traffic. When used indoors, a sensor that reports an encrypted status of a door (closed or opened) every minute, might not leak sensitive information outside the local network. In these examples, the privacy aspect of the information itself might be limited. Being able to determine the version of the sensor to potentially take control of it may also have some limited security consequences. Of course this depends on the context these sensors are being used. If the risks associated to privacy and security are acceptable, a non-randomized SPI can be used.¶
The Sequence Number (SN) in [RFC4303] is a mandatory 32 bits field in the packet.¶
The SN is set by the sender so the receiver can implement anti-replay protection. The SN is derived from any strictly increasing function that guarantees: if packet B is sent after packet A, then SN of packet B is higher than the SN of packet A.¶
Some constrained devices may establish communication with specific devices where it is known whether or not the peer implements anti-replay protection. As per [RFC4303], the sender MUST still implement a strictly increasing function to generate the SN.¶
The RECOMMENDED way for multipurpose ESP implementation is to increment a counter for each packet sent. However, a constrained device may avoid maintaining this context and use another source that is known to always increase. Typically, constrained devices use 802.15.4 Time Slotted Channel Hopping (TSCH). This communication is heavily dependent on time. A contrained device can take advantage of this clock mechanism to generate the SN. A lot of IoT devices are in a sleep state most of the time and wake up only to perform a specific operation before going back to sleep. These devices do have separate hardware that allows them to wake up after a certain timeout and typically also timers that start running when the device was booted up, so they might have a concept of time with certain granularity. This requires to store any information in a stable storage - such as flash memory - that can be restored across sleeps. Storing information associated with the SA such as SN requires some read and write operation on a stable storage after each packet is sent as opposed to a SPI number or cryptographic keys that are only written to stable storage at the creation of the SA. Write operations wear out the flash storage. Write operations also slow down the system significantly, as writing to flash is much slower than reading from flash. While these devices have internal clocks or timers that might not be very accurate, these are good enough to guarantee that each time the device wakes up from sleep that their time is greater than what it was before the device went to sleep. Using time for the SN would guarantee a strictly increasing function and avoid storing any additional values or context related to the SN on flash. In addition to the time value, a RAM based counter can be used to ensure that if the device sends multiple packets over an SA within one wake up period, that the serial numbers are still increasing and unique. Note that standard receivers are generally configured with incrementing counters and, if not appropriately configured, the use of a significantly larger SN than the previous packet can result in that packet falling outside of the peer's receiver window which could cause that packet to be discarded.¶
For inbound traffic, it is RECOMMENDED that receivers implement anti-replay protection. The size of the window should depend on the property of the network to deliver packets out of order. In an environment where out of order packets are not possible, the window size can be set to one. An ESP implementation may choose to not implement an anti-replay protection. An implementation of anti-replay protection may require the device to write the received SN for every packet to stable storage. This will have the same issues as discussed earlier with the SN. Some constrained device implementations may choose to not implement the optional anti-replay protection. A typical example might consider an IoT device such as a temperature sensor that is sending a temperature measurement every 60 seconds, and that receives an acknowledgment from the receiver. In such cases, the ability to spoof and replay an acknowledgement is of limited interest and might not justify the implementation of an anti-replay mechanism. Receiving peers may also use ESP anti-replay mechanism adapted to a specific application. Typically, when the sending peer is using SN based on time, anti-replay may be implemented by discarding any packets that present a SN whose value is too much in the past. Such mechanisms may consider clock drifting in various ways in addition to acceptable delay induced by the network to avoid the anti replay windows rejecting legitimate packets. It could accept any SN as long as it is higher than the previously received SN. Another mechanism could be used where only the received time on the device is used to consider a packet as valid, without looking at the SN at all.¶
The SN can be represented as a 32 bit number, or as a 64 bit number, known as Extended Sequence Number (ESN). As per [RFC4303], support of ESN is not mandatory and its use is negotiated via IKEv2 [RFC7296]. A ESN is used for high speed links to ensure there can be more than 2^32 packets before the SA needs to be rekeyed to prevent the SN from rolling over. This assumes the SN is incremented by 1 for each packet. When the SN is incremented differently - such as when time is used - rekeying needs to happen based on how the SN is incremented to prevent the SN from rolling over. The security of all data protected under a given key decreases slightly with each message and a node must ensure the limit is not reached - even though the SN would permit it. Estimation of the maximum number of packets to be sent by a node is not always predicatable and large margins should be used espcially as nodes could be online for much more time than expected. Even for constrained devices, it is RECOMMENDED to implement some rekey mechanisms (see Section 10).¶
Padding is required to keep the 32 bit alignment of ESP. It is also required for some encryption transforms that need a specific block size of input, such as ENCR_AES_CBC. ESP specifies padding in the Pad Length byte, followed by up to 255 bytes of padding.¶
Checking the padding structure is not mandatory, so constrained devices may omit these checks on received ESP packets. For outgoing ESP packets, padding must be applied as required by ESP.¶
In some situation the padding bytes may take a fixed value. This would typically be the case when the Data Payload is of fixed size.¶
ESP [RFC4303] additionally provides Traffic Flow Confidentiality (TFC) as a way to perform padding to hide traffic characteristics. TFC is not mandatory and is negotiated with the SA management protocol, such as IKEv2. TFC has been widely implemented but it is not widely deployed for ESP traffic. It is NOT RECOMMENDED to implement TFC for a minimal ESP.¶
As a consequence, communication protection that relies on TFC would be more sensitive to traffic patterns without TFC. This can leak application information as well as the manifacturor or model of the device used to a passive monitoring attacker. Such information can be used, for example, by an attacker in case a vulnerability is known for the specific device or application. In addition, some application use - such as health applications - could leak important privacy oriented information.¶
Constrained devices that have limited battery lifetime may prefer to avoid sending extra padding bytes. In most cases, the payload carried by these devices is quite small, and the standard padding mechanism can be used as an alternative to TFC. Alternatively, any information leak based on the size - or presence - of the packet can also be addressed at the application level, before the packet is encrypted with ESP. If application packets vary between 1 to 30 bytes, the application could always send 32 byte responses to ensure all traffic sent is of identical length. To prevent leaking information that a sensor changed state, such as "temperature changed" or "door opened", an application could send this information at regular time interval, rather than when a specific event is happening, even if the sensor state did not change.¶
ESP [RFC4303] defines the Next Header as a mandatory 8 bits field in the packet. The Next header, only visible after decryption, specifies the data contained in the payload. In addition, the Next Header may also carry an indication on how to process the packet [I-D.nikander-esp-beet-mode]. The Next Header can point to a dummy packet, i.e. packets with the Next Header value set to 59 meaning "no next header". The data following to "no next header" is unstructured dummy data.¶
The ability to generate and to receive and ignore dummy packets is required by [RFC4303]. An implementation can omit ever generating and sending dummy packets. For interoperability, a minimal ESP implementation MUST be able to process and discard dummy packets without indicating an error.¶
In constrained environments, sending dummy packets may have too much impact on the device lifetime, in which case dummy packets should not be generated and sent. On the other hand, Constrained devices running specific applications that would leak too much information by not generating and sending dummy packets may implement this functionality or even implement something similar at the application layer. Note also that similarly to padding and TFC that can be used to hide some traffic characteristics (see Section 5), dummy packet may also reveal some patterns that can be used to identify the application. For example, an application may send dummy data to hide some traffic pattern. Suppose such such application sends a 1 byte data when a change occurs. This results in sending a packet notifying a change has occurred. Dummy packet may be used to prevent such information to be leaked by sending a 1 byte packet every second when the information is not changed. After an upgrade the data becomes two bytes. At that point, the dummy packets do not hide anything and having 1 byte regularly versus 2 bytes make even the identification of the application, version easier to identify. This generaly makes the use of dummy packets more appropriated on high speed links.¶
In some cases, devices are dedicated to a single application or a single transport protocol, in which case, the Next Header has a fixed value.¶
Specific processing indications have not been standardized yet [I-D.nikander-esp-beet-mode] and is expected to result from an agreement between the peers. As a result, it SHOULD NOT be part of a minimal implementation of ESP.¶
The ICV depends on the cryptographic suite used. As detailed in [RFC8221] authentication or authenticated encryption are RECOMMENDED and as such the ICV field must be present with a size different from zero. Its length is defined by the security recommendations only.¶
The recommended algorithms to use are expected to evolve over time and implementers SHOULD follow the recommendations provided by [RFC8221] and updates.¶
This section lists some of the criteria that may be considered to select an appropriate cryptographic suite. The list is not expected to be exhaustive and may also evolve over time:¶
Power Consumption and Bandwidth Consumption: Reducing the payload sent may significantly reduce the energy consumption of the device. Encryption algorithm transforms with low overhead are strongly preferred. To reduce the overall payload size one may, for example:¶
There are no IANA consideration for this document.¶
Security Considerations are those of [RFC4303]. In addition, this document provided security recommendations and guidance over the implementation choices for each ESP field.¶
The security of a communication provided by ESP is closely related to the security associated with the management of that key. This usually includes mechanisms to prevent a nonce from repeating, for example. When a node is provisioned with a session key that is used across reboot, the implementer MUST ensure that the mechanisms put in place remain valid across reboot as well.¶
It is RECOMMENDED to use ESP in conjunction with key management protocols such as for example IKEv2 [RFC7296] or minimal IKEv2 [RFC7815]. Such mechanisms are responsible for negotiating fresh session keys as well as prevent a session key being use beyond its lifetime. When such mechanisms cannot be implemented, such as when the the session key is provisioned, the device MUST ensure that keys are not used beyond their lifetime and that the the key remains used in compliance will all security requirements across reboots - e.g. conditions on counters and nonces remains valid.¶
When a device generates its own key or when random value such as nonces are generated, the random generation MUST follow [RFC4086]. In addition, [SP-800-90A-Rev-1] provides guidance on how to build random generators based on deterministic random functions.¶
Preventing the leakage of privacy sensitive information is a hard problem to solve and usually result in balancing the information potentially being leaked to the cost associated with the counter measures. This problem is not inherent to the minimal ESP described in this document and also concerns the use of ESP in general.¶
This document targets minimal implementations of ESP and as such describes some minimalistic way to implement ESP. In some cases, this may result in potentially revealing privacy sensitive pieces of information. This document describes these privacy implications so the implementer can take the appropriate decisions given the specificities of a given environment and deployment.¶
The main risks associated with privacy is the ability to identify an application or a device by analyzing the traffic which is designated as traffic shaping. As discussed in Section 3, the use in some very specific context of non randomly generated SPI might in some cases ease the determination of the device or the application. Similarly, padding provides limited capabilities to obfuscate the traffic compared to those provided by TFC. Such consequence on privacy as detailed in Section 5.¶
The authors would like to thank Daniel Palomares, Scott Fluhrer, Tero Kivinen, Valery Smyslov, Yoav Nir, Michael Richardson, Thomas Peyrin, Eric Thormarker, Nancy Cam-Winget and Bob Briscoe for their valuable comments. In particular Scott Fluhrer suggested including the rekey index in the SPI. Tero Kivinen also provided multiple clarifications and examples of deployment ESP within constrained devices with their associated optimizations. Thomas Peyrin Eric Thormarker and Scott Fluhrer suggested and clarified the use of transform resilient to nonce misuse.¶