The Authenticated Encrypted Data Content Type allows for the use of Authenticated Encryption modes with block cipher algorithms. At the time of the original design there was discussion about the relative location of the authenticated attributes and the encrypted content in the ASN.1 structure. With the benefit of implementation experience, I revisit that discussion and re-evaluate the decision that was made.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
This Internet-Draft will expire on May 27, 2011.
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1. Introduction
  1.1. Terminology
2. Historic Arguments
3. Algorithm Taxonomy
  3.1. CCM: Counter with CBC-MAC
  3.2. CS: Cipher-State
  3.3. CWC: Carter Wegman with Counter
  3.4. EAX: A Conventional Authenticated-Encryption Mode
  3.5. GCM: Galois/Counter Mode
  3.6. IACBC: Integrity Aware Cipher Block Chaining
  3.7. IAPM: Integrity Aware Parallelizable Mode
  3.8. OCB: Offset Codebook
  3.9. PCFB: Propagating Cipher Feedback
  3.10. SIV: Synthetic IV
  3.11. XCBC: eXtended Cipher Block Chaining Encryption
  3.12. MAC-Authenticated Encryption
4. My Assumptions
5. Conclusions
6. Rebuttals
7. Security Considerations
8. IANA Considerations
9. Normative References
§ Author's Address
1. Introduction
When the Authenticated Encryption content type defined in RFC 5083 [RFC5083] (Housley, R., “Cryptographic Message Syntax (CMS) Authenticated-Enveloped-Data Content Type,” November 2007.) was being discussed, the S/MIME working group had no actual implementation experience to guide it in some of the decisions that were being made at the time. The final ASN.1 adopted has been replicated in Figure 1 (AuthEnvelopedData ASN.1 Extract) for the convenience of the reader.
The major focus of the discussions centered on the relative placement of the encrypted data blob (contained in the authEncryptedContentInfo field) and the authenticated attributes (contained in the authAttrs field). As can be seen from the ASN.1, the final decision was to place the authenticated data after the encrypted content. This was counter to the argument that I made at the time, which was to place the authenticated data before the encrypted content.
    AuthEnvelopedData ::= SEQUENCE {
      version CMSVersion,
      originatorInfo [0] IMPLICIT OriginatorInfo OPTIONAL,
      recipientInfos RecipientInfos,
      authEncryptedContentInfo EncryptedContentInfo,
      authAttrs [1] IMPLICIT AuthAttributes OPTIONAL,
      mac MessageAuthenticationCode,
      unauthAttrs [2] IMPLICIT UnauthAttributes OPTIONAL }

Figure 1: AuthEnvelopedData ASN.1 Extract
In this document I revisit that decision and re-evaluate the location of these two fields based on the implementation experience that I have since garnered.
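This document is organized as follows: Section 2 reviews the historic arguments, Section 3 presents a taxonomy of the candidate algorithms, Section 4 lists my assumptions, Section 5 gives my conclusions, and Section 6 is reserved for rebuttals.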
The major part of my discussion focuses on the desirability of using a streaming model for processing the ASN.1 structure and the data contained within it. This is further detailed in Section 4 (My Assumptions).
1.1. Terminology
The following is a list of standardized terms used in the document:
- AE is an abbreviation for Authenticated Encryption. This is a block cipher mode of operation which simultaneously provides confidentiality and integrity assurances on the data.
- AEAD is an abbreviation for Authenticated Encryption with Associated Data. This is a block cipher mode of operation which simultaneously provides confidentiality and integrity assurances on the message data as well as integrity assurances on an additional set of data.
- Message Data is the section of the input data that is to be authenticated and encrypted by the AE or AEAD algorithm mode. For CMS, the encrypted message data is placed in the encryptedContent field of the authEncryptedContentInfo sequence.
- Authenticated Data is the section of input data that is to be authenticated but not encrypted. For CMS, the authenticated data is the sequence in the authAttrs field.
- Authentication Tag is a value that is generated by the mode and is used to validate the integrity of the data. The Authentication Tag is sometimes implicit and does not exist as an independent value. For CMS, it is assumed that the use of the algorithm will define an explicit tag and the tag will be placed in the mac field.
- Streaming Model is a method of doing the processing such that the ASN.1 processing and the cryptographic processing can be interleaved with each other.
2. Historic Arguments
A review of the mailing list threads from the time the issue was being debated shows that the following issues were discussed.
PRO: We have working implementations of both AuthenticatedData and SignedData. In both of these cases the data structures are ordered such that the message data precedes the authenticated data. Keeping the order consistent makes coding easier and leads to fewer mistakes.
CON: Being consistent is nice; however, if the result does not work correctly, consistency does not matter.
PRO: It should be possible to create authenticated attributes based on the content of the data to be encrypted and have these attributes authenticated. Placing the attributes before the message content means that one must buffer the message content to do this. The example of this presented on the mailing list was the ability for a sender to have the body of the message processed on the fly by a virus checker and to publish the result of the virus checking as an authenticated attribute. This is the same thing that happens today for both SignedData and AuthenticatedData, where the hash of the message data is computed on the fly and then placed in the signed/authenticated attributes, which are then processed to compute the signature or mac values.
CON: Placing this information after the message data means that the recipient cannot know to perform matching processing, if necessary, in order to check the value presented by the sender. The analogous step for the SignedData structure is the need for the recipient to hash the message data during processing in order to correctly validate the signed attribute fields.
PRO: The argument for placing the attributes before the message data was dictated by a specific choice of algorithms (CCM and GCM); other authenticated encryption algorithms (specifically CWC) would naturally place the attributes second.
CON: No detailed analysis of algorithms was done. However, the attribute data should be expected to be much smaller than the message data and thus it makes more sense to cache the attributes for later processing than to cache the message data for later processing.
What happens with resource constrained devices that are acting as senders or recipients? The initial argument dealt with the question of resource limited senders that would not be able to store intermediate data, but the same question applies to resource limited recipients. We know that this was intended to be used with firmware upgrades as one option, but it could equally be used by a device sending out reports to a central server. This is a case where a close analysis would need to be done on the algorithm being used and how it will affect the resources needed.
There was a certain amount of discussion of the question of the relative frequency of processing between the sender and the recipient of a message. This would have bearing on the question of which entity the decisions should be optimized for. One set of people argued that recipients process messages more frequently than senders. Another set of people argued that there exist applications where the sender may create messages that are never verified.
3. Algorithm Taxonomy
As can be seen from some of the arguments above, we should have done an analysis of the AEAD algorithms that might be used with the new data structure in order to get better input for the decision that was made. In this section, we will define a set of criteria that we are going to use to analyze the set of algorithms and then describe how each algorithm fits our criteria.
NIST has been gathering information on Authenticated Encryption Modes over the last decade. Information on these modes can be found at http://crc.nist.gov/groups/ST/toolkit/BCM/modes_development.html. For simplicity I used this as the set of algorithms to look at in order to characterize the requirements for the purposes of comparison with the characteristics required by the Authenticated Encryption data structure.
In this section we will look at 11 AE algorithms from the NIST submissions along with an algorithm [GUTMANN] (Gutmann, P., “Using MAC-authenticated Encryption in the Cryptographic Message Syntax (CMS),” .) being developed by Peter Gutmann. Since we are interested in how to set up a streaming model, the criteria we are looking at are chosen with that in mind. The major characteristics we are going to be looking at are:
NIST is currently in the middle of doing a review and selection process for new modes to adopt as US security standards. For simplicity the set of algorithms that I will be looking at come from the current set of candidate algorithms that are being reviewed for this purpose. One additional algorithm added to this is a simple hash and encrypt algorithm that has been proposed by Peter Gutmann.
3.1. CCM: Counter with CBC-MAC
The Counter with CBC-MAC (CCM) mode was designed and documented by Doug Whiting, Russ Housley and Niels Ferguson. A full description of the mode can be found in RFC 3610 [RFC3610] (Whiting, D., Housley, R., and N. Ferguson, “Counter with CBC-MAC (CCM),” September 2003.) and on the NIST website. CCM is one of the standardized NIST modes (see [NIST‑800‑38C] (Dworkin, M., “Recommendation for Block Cipher Modes of Operation: The CCM Mode for Authentication and Confidentiality,” May 2004.)) and is one of the two modes that are currently documented for use with the CMS Authenticated Encryption structures.
The characteristics of the algorithm are:
- A. The nonce value
- B. The length of authentication tag
- C. The length of message data
- D. The length of authenticated data
- E. The authenticated data
- F. The message data

- A. The nonce value
- B. The length of the authentication tag
- C. The length of the message
- D. The length of authenticated data
- E. The authenticated data

- A. The nonce value
- B. The length of the authentication tag
- C. The length of the message
- D. The length of authenticated data
This algorithm mode poses major problems for a sender working in a streaming model. The lengths of the message data and the authenticated data are both required to be known before any bytes of the message data or authenticated data can be processed. Except in cases where fixed length messages will be generated, the message data must be cached prior to encrypting.
This algorithm poses some problems for recipients, but under the correct circumstances it can be processed under a streaming model. The length of the message data must be presented to the recipient before the message data is given. The authenticated data must be presented before the message data is presented. Optimal use of this algorithm would require that 1) the authenticated data be moved before the message data bytes and 2) a requirement be established that either the message data be DER encoded or the message data length be published as part of the authenticated data.
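To make the length dependency concrete, the following is a minimal sketch of how the first CBC-MAC block (B_0) is assembled, based on my reading of RFC 3610, section 2.2. The function name `ccm_b0` and its parameters are hypothetical and chosen for illustration only. Note that the message length is an input: the very first block fed to the MAC cannot be formed until that length is known.

```python
def ccm_b0(nonce: bytes, msg_len: int, tag_len: int, has_adata: bool) -> bytes:
    """Build the first CBC-MAC block B_0 (layout as read from RFC 3610, 2.2).

    Hypothetical helper for illustration; not taken from any library.
    """
    length_field_size = 15 - len(nonce)  # "L": bytes used to encode msg_len
    if not 2 <= length_field_size <= 8:
        raise ValueError("nonce must be 7 to 13 bytes long")
    if tag_len not in (4, 6, 8, 10, 12, 14, 16):
        raise ValueError("tag length must be an even value from 4 to 16")
    if msg_len >= 1 << (8 * length_field_size):
        raise ValueError("message too long for this nonce size")

    # Flags byte: Adata bit, encoded tag length, encoded length-field size.
    flags = (64 if has_adata else 0) | (((tag_len - 2) // 2) << 3) | (length_field_size - 1)

    # The message length is baked into the block that starts the MAC.
    return bytes([flags]) + nonce + msg_len.to_bytes(length_field_size, "big")


nonce = bytes.fromhex("00000003020100a0a1a2a3a4a5")
block = ccm_b0(nonce, msg_len=23, tag_len=8, has_adata=True)
print(block.hex())
```

A sender that does not know `msg_len` in advance has no choice but to buffer the message data before calling anything like this, which is the caching cost described above.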
3.2. CS: Cipher-State
Cipher-State is an algorithm that supports an AE mode of operation, but not an AEAD mode of operation. As such it does not matter where the authenticated parameters would be placed as they are not supported by the mode. This mode is therefore not of interest to this discussion.
3.3. CWC: Carter Wegman with Counter
The Carter Wegman with Counter Authenticated Encryption mode was designed by Tadayoshi Kohno, John Viega and Doug Whiting. A full description of the mode can be found in [CWC] (Kohno, T., Viega, J., and D. Whiting, “The CWC authenticated encryption (associated data) mode,” May 2003.) and on the NIST website.
The characteristics of the algorithm are:
- A. The nonce
- B. The authenticated data
- C. The encrypted message data

- A. The nonce value
- B. The authenticated data

- A. The nonce value
It should be noted that the analysis above is for a simplistic implementation of the algorithm such as would normally be done in software. The algorithm is designed so that it can be performed in parallel, so it would be possible for message data bytes to be fully processed before the authenticated data bytes are processed. The full details of this approach are not spelled out in the referenced documents.
This algorithm can be easily streamed for the sender provided that the authenticated data are generated prior to the message data being generated.
This algorithm can be easily streamed for the recipient provided that the authenticated data is presented prior to the message data being presented.
3.4. EAX: A Conventional Authenticated-Encryption Mode
The EAX mode (A Conventional Authenticated-Encryption Mode) was designed and documented by M. Bellare, P. Rogaway and D. Wagner. A full description of the algorithm can be found at [EAX] (Bellare, M., Rogaway, P., and D. Wagner, “EAX: A Conventional Authenticated-Encryption Mode,” 2003.) and on the NIST website.
The characteristics of the algorithm are:
- A. The nonce
- B. The authenticated attributes
- C. The encrypted message

- A. The nonce value

- A. The nonce value
This mode computes the authentication value on the authenticated data and on the encrypted message separately - so they can be computed in any order - and combines the results together after the entire message has been processed.
This algorithm can easily be streamed for the sender. The order of generating the authenticated data and message data is immaterial.
This algorithm can easily be streamed for the recipient. The order of presenting the authenticated data and the message data is immaterial.
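The order independence can be illustrated with a small sketch. This is not EAX itself: EAX computes its three partial values with a tweaked OMAC over a block cipher, whereas the code below substitutes HMAC-SHA-256 with a one-byte domain prefix purely as a stand-in so that it runs with the Python standard library. All names (`tweaked_mac`, `eax_style_tag`) are hypothetical. What it does show is the structure described above: three independent MACs whose results are XORed together, so the authenticated data may be handled before or after the encrypted message.

```python
import hashlib
import hmac
from typing import Iterable


def tweaked_mac(key: bytes, tweak: int, chunks: Iterable[bytes]) -> bytes:
    """Stand-in for EAX's tweaked OMAC: HMAC with a domain-separating prefix."""
    mac = hmac.new(key, bytes([tweak]), hashlib.sha256)
    for chunk in chunks:
        mac.update(chunk)  # input can arrive piece by piece
    return mac.digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def eax_style_tag(key: bytes, nonce: bytes, header: bytes,
                  ciphertext_chunks: Iterable[bytes], header_first: bool) -> bytes:
    """Combine three independent MACs; the processing order does not matter."""
    nonce_mac = tweaked_mac(key, 0, [nonce])
    if header_first:
        header_mac = tweaked_mac(key, 1, [header])
        body_mac = tweaked_mac(key, 2, ciphertext_chunks)
    else:
        body_mac = tweaked_mac(key, 2, ciphertext_chunks)
        header_mac = tweaked_mac(key, 1, [header])
    return xor_bytes(xor_bytes(nonce_mac, header_mac), body_mac)


key, nonce = b"k" * 16, b"n" * 16
chunks = [b"encrypted ", b"message"]
tag_a = eax_style_tag(key, nonce, b"attrs", chunks, header_first=True)
tag_b = eax_style_tag(key, nonce, b"attrs", chunks, header_first=False)
print(tag_a == tag_b)
```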
3.5. GCM: Galois/Counter Mode
The Galois/Counter Mode of Operation (GCM) was designed and documented by David McGrew and John Viega. A full description of the algorithm can be found on the NIST website. GCM is one of the standardized NIST modes (see [NIST‑800‑38D] (Dworkin, M., “Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC,” November 2007.)) and is one of the two modes that are currently documented for use with the CMS Authenticated Encryption structures.
The characteristics of the algorithm are:
- A. The authenticated data
- B. The encrypted message data
- C. The length of the authenticated data
- D. The length of the message data

- A. The nonce value
- B. The authenticated data
This mode can easily be used in a stream model for senders provided the authenticated data is generated prior to the message data.
This mode can easily be used in a stream model for recipients provided that the authenticated data is presented prior to the message data.
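The processing order listed above can be sketched by laying out the byte string that GCM's authentication function (GHASH) consumes, following my reading of NIST SP 800-38D: the authenticated data padded to a block boundary, then the ciphertext padded to a block boundary, then the two bit lengths. The helper name `ghash_input` is hypothetical, and the sketch only builds the input; it does not perform the Galois field arithmetic. Because the lengths come last, neither side needs them up front, but the authenticated data has to be complete before the first ciphertext block is absorbed.

```python
def ghash_input(aad: bytes, ciphertext: bytes) -> bytes:
    """Lay out the data authenticated by GCM, in the order it is consumed.

    Hypothetical helper for illustration; builds the input only.
    """
    def pad16(data: bytes) -> bytes:
        # Zero-pad to a multiple of the 16-byte block size.
        return data + b"\x00" * (-len(data) % 16)

    # Lengths are expressed in bits, each as a 64-bit big-endian integer.
    lengths = (8 * len(aad)).to_bytes(8, "big") + (8 * len(ciphertext)).to_bytes(8, "big")

    return pad16(aad) + pad16(ciphertext) + lengths


layout = ghash_input(b"hdr", b"x" * 20)
print(len(layout), layout[-16:].hex())
```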
3.6. IACBC: Integrity Aware Cipher Block Chaining
Integrity Aware Cipher Block Chaining is an algorithm that supports an AE mode of operation, but not an AEAD mode of operation. As such it does not matter where the authenticated parameters would be placed as they are not supported by the mode. This mode is therefore not of interest to this discussion.
3.7. IAPM: Integrity Aware Parallelizable Mode
Integrity Aware Parallelizable Mode is an algorithm that supports an AE mode of operation, but not an AEAD mode of operation. As such it does not matter where the authenticated parameters would be placed as they are not supported by the mode. This mode is therefore not of interest to this discussion.
3.8. OCB: Offset Codebook
Offset Codebook mode is an algorithm that supports an AE mode of operation, but not an AEAD mode of operation. As such it does not matter where the authenticated parameters would be placed as they are not supported by the mode. This mode, by itself, is therefore not of interest to this discussion.
However, an addendum to the original mode submission described a method of adding the AEAD capability to any AE algorithm. This was described by Phillip Rogaway in section 5 of [OCB‑AD1] (Rogaway, P., “The Associated-Data Problem,” November 2001.) and designated as Ciphertext Translation.
The characteristics of this algorithm are:
- A. The message data
- B. The authenticated data
It needs to be noted that before one can process the last t bytes of the message (for either encryption or decryption) the authenticated data must be known. The value t is equal to the length of the output function for the authenticated data processor. This does mean that an indication that one is in the last t bytes of processing the data is needed for both encryption and decryption modes.
The sender can operate using a streaming model as long as it buffers the last t bytes of message data so that it can be correctly tagged and sent to the cryptographic code as needing special processing. The authenticated data must be computed prior to the last t bytes of the encryption stream being produced. One possible way of dealing with this is to make the last t bytes the authentication tag as there is no explicit authentication tag created.
The recipient can operate using a streaming model as long as it buffers the last t bytes of encrypted data so that it can be correctly tagged when sent to the cryptographic code. As no separate authentication tag is created by the algorithm, the authenticated attributes must be presented prior to the last bytes of the encrypted data stream being decrypted.
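The buffering requirement can be sketched as a small generator that passes data through while always holding back the last t bytes, so that the caller can hand that final piece to the cryptographic code with a flag marking it as the tail. This is only the buffering logic, not Ciphertext Translation itself; the name `split_tail` and the `(data, is_tail)` convention are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def split_tail(chunks: Iterable[bytes], t: int) -> Iterator[Tuple[bytes, bool]]:
    """Yield (data, is_tail) pairs; the final pair carries the last t bytes.

    If the whole stream is shorter than t bytes, the tail is simply
    whatever was received.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    held = b""
    for chunk in chunks:
        held += chunk
        if len(held) > t:
            # Everything except the most recent t bytes is safe to release.
            yield held[:-t], False
            held = held[-t:]
    yield held, True


for data, is_tail in split_tail([b"abcdef", b"gh", b"ijkl"], 4):
    print(data, is_tail)
```

The memory cost is bounded by t plus one incoming chunk, regardless of the size of the message.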
3.9. PCFB: Propagating Cipher Feedback
Propagating Cipher Feedback is an algorithm that supports an AE mode of operation, but not an AEAD mode of operation. As such it does not matter where the authenticated parameters would be placed as they are not supported by the mode. This mode is therefore not of interest to this discussion.
3.10. SIV: Synthetic IV
The Synthetic IV (SIV) mode was designed and documented by Phillip Rogaway and Thomas Shrimpton. A full description of the algorithm can be found on the NIST website at [SIV] (Rogaway, P. and T. Shrimpton, “The SIV Mode of Operation for Deterministic Authenticated-Encryption (Key Wrap) and Misuse-Resistant Nonce-Based Authenticated-Encryption,” August 2007.).
The characteristics of the algorithm are:
- A. None for the sender of the message
- B. An IV value for the recipient of the message. (The IV value acts as the authentication tag.)

- A. The authenticated data
- B. The message data

- A. The authenticated attributes
The algorithm does not use a nonce value; instead, the IV used for the counter mode is computed from the authenticated data and message data. The IV is then emitted as the authentication tag. Note that this also means that the message data must be processed twice by the cryptographic code: once to do the authentication computation and produce the IV, and once to do the counter mode encryption.
This algorithm cannot be streamed by the sender. Since the IV used for the counter mode encryption of the message data depends on all of the message data, the message data must actually be processed twice by the encryption algorithm.
The algorithm can easily be streamed by the recipient. The requirement is that the authenticated attributes and the IV be presented to the recipient before the message data is presented. The authentication check is then done by comparing the IV passed in with the IV computed.
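The asymmetry between sender and recipient can be sketched as follows. This is a toy with the same shape as SIV, not SIV itself: the real mode derives the IV with S2V (built on CMAC) and encrypts with AES in counter mode, whereas this sketch substitutes HMAC-SHA-256 and a SHA-256 based keystream so that it runs with the standard library alone. It must not be used for real data, and all function names are hypothetical. The sender needs the complete message in hand (a list) before the first ciphertext byte exists; the recipient is a generator that consumes one chunk at a time and performs the check at the end.

```python
import hashlib
import hmac
from typing import Iterable, Iterator, List, Tuple


def _keystream_xor(key: bytes, iv: bytes, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Toy counter-mode stand-in: XOR data with SHA-256(key || iv || counter)."""
    counter, pending = 0, b""
    for chunk in chunks:
        out = bytearray()
        for byte in chunk:
            if not pending:
                pending = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
                counter += 1
            out.append(byte ^ pending[0])
            pending = pending[1:]
        yield bytes(out)


def _start_mac(mac_key: bytes, ad: bytes) -> "hmac.HMAC":
    # Length prefix keeps the authenticated data and the message distinct.
    return hmac.new(mac_key, len(ad).to_bytes(8, "big") + ad, hashlib.sha256)


def siv_style_encrypt(mac_key: bytes, enc_key: bytes, ad: bytes,
                      message_chunks: List[bytes]) -> Tuple[bytes, List[bytes]]:
    mac = _start_mac(mac_key, ad)
    for chunk in message_chunks:          # pass 1: derive the IV from everything
        mac.update(chunk)
    iv = mac.digest()[:16]
    ciphertext = list(_keystream_xor(enc_key, iv, message_chunks))  # pass 2
    return iv, ciphertext


def siv_style_decrypt(mac_key: bytes, enc_key: bytes, ad: bytes, iv: bytes,
                      ciphertext_chunks: Iterable[bytes]) -> Iterator[bytes]:
    mac = _start_mac(mac_key, ad)
    for plain in _keystream_xor(enc_key, iv, ciphertext_chunks):  # single pass
        mac.update(plain)
        yield plain
    if not hmac.compare_digest(mac.digest()[:16], iv):
        raise ValueError("authentication failed")


iv, ct = siv_style_encrypt(b"m" * 16, b"e" * 16, b"attrs", [b"hello ", b"world"])
print(b"".join(siv_style_decrypt(b"m" * 16, b"e" * 16, b"attrs", iv, ct)))
```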
3.11. XCBC: eXtended Cipher Block Chaining Encryption
eXtended Cipher Block Chaining Encryption is an algorithm that supports an AE mode of operation, but not an AEAD mode of operation. As such it does not matter where the authenticated parameters would be placed as they are not supported by the mode. This mode is therefore not of interest to this discussion.
3.12. MAC-Authenticated Encryption
The MAC-Authenticated Encryption mode has been documented by Peter Gutmann. This mode is documented in [GUTMANN] (Gutmann, P., “Using MAC-authenticated Encryption in the Cryptographic Message Syntax (CMS),” .).
The characteristics of the algorithm are:
- A. A key derivation algorithm
- B. A keyed MAC algorithm
- C. An encryption algorithm

- A. The encrypted message
- B. The authenticated attributes

- A. The encrypted message data
This algorithm can easily be used in a streaming model by the sender.
This algorithm can easily be used in a streaming model by the recipient.
Note: In the series of messages that I exchanged with Peter during the design of this algorithm, one of the things he noted was that to make streaming easier he should put the authenticated attributes after the message data. Thus the algorithm was designed to make sure that streaming worked well with the current encoding.
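The reason this ordering streams well can be sketched with an incremental MAC that absorbs the encrypted message as it is produced and the authenticated attributes afterwards, matching the processing order listed above. The choice of HMAC-SHA-256 and the exact bytes being MACed are assumptions made for illustration; the [GUTMANN] document defines the actual algorithm and encoding. The function name `stream_mac` is hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def stream_mac(mac_key: bytes, ciphertext_chunks: Iterable[bytes],
               auth_attrs: bytes) -> bytes:
    """MAC the encrypted message first, then the authenticated attributes."""
    mac = hmac.new(mac_key, digestmod=hashlib.sha256)
    for chunk in ciphertext_chunks:
        mac.update(chunk)      # each chunk can be written out and forgotten
    mac.update(auth_attrs)     # attributes are only needed at the very end
    return mac.digest()


tag = stream_mac(b"k" * 32, [b"cipher", b"text"], b"attrs")
print(tag.hex())
```

Nothing is held in memory beyond the running MAC state, so attributes that depend on the message content can still be generated after the last ciphertext byte has been emitted.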
4. My Assumptions
This section will list the set of criteria that I am using in making my conclusions. Again, the most important thing in my mind is the ability to implement a streaming model for encode and decode operations.
There is one argument that says one should buffer up the entire encrypted buffer, decrypt in one chunk and then pass on the data in one piece. Since the name of the algorithm class is encrypted and authenticated, one should perhaps actually authenticate that the data is correct prior to releasing the data for additional processing.
I believe that it is sufficient to check that the encrypted buffer has been authenticated prior to acting on the data contained in the encrypted buffer. Thus I believe it makes sense to continue doing the decode and to propagate a failure up either when the decode itself fails or when the authentication check is actually made. In this way it is no different from the processing of a signed message, where the signature may be checked long after the message has been fully decoded. In fact this is the normal case for an S/MIME client, where the content is often viewable with some indication that the validation of the signature failed for some reason.
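A minimal sketch of this decode-now, verify-at-the-end behavior follows. An HMAC over the data stands in for whatever integrity check the AE algorithm actually provides, and the names `decode_then_check` and `AuthenticationError` are hypothetical. Pieces are released to the caller as they are decoded; the failure, if any, surfaces when the stream ends, and the caller is expected not to act on the data until then.

```python
import hashlib
import hmac
from typing import Iterable, Iterator


class AuthenticationError(Exception):
    """Raised when the integrity check fails after decoding has finished."""


def decode_then_check(mac_key: bytes, chunks: Iterable[bytes],
                      expected_tag: bytes) -> Iterator[bytes]:
    mac = hmac.new(mac_key, digestmod=hashlib.sha256)
    for chunk in chunks:
        mac.update(chunk)
        yield chunk            # released before the verdict is known
    if not hmac.compare_digest(mac.digest(), expected_tag):
        raise AuthenticationError("content decoded but failed authentication")


key = b"k" * 32
data = [b"part one, ", b"part two"]
good_tag = hmac.new(key, b"".join(data), hashlib.sha256).digest()

staged = []
try:
    for piece in decode_then_check(key, data, good_tag):
        staged.append(piece)   # staged only; nothing is acted on yet
    print("verified:", b"".join(staged))
except AuthenticationError as err:
    print("rejected:", err)
```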
5. Conclusions
I now look again at the arguments presented in Section 2 (Historic Arguments) and review each of them. All of the opinions in this section are mine and may or may not represent those of any other people. Section 6 (Rebuttals) contains the opinions of other people.
This criterion should only be used as a tie breaker in the event that all other criteria come out equal. When looking at this argument I am reminded of the following:
A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines (Ralph Waldo Emerson 1841)
This argument is slightly more believable than it was before I began this document as I now have an attribute which is derived from the message content, however this attribute is the length of the message data and in order to be useful it needs to be placed before the message data is consumed. (See Section 3.1 (CCM: Counter with CBC-MAC).)
I found this argument to be difficult to believe at the time it was presented, and I have not changed my mind since then. The argument that this means the authenticated attributes come second would mean that this is an attribute that is attested to by the sender, but is not verified in any way by the recipient. If the recipient needed to do any processing then it would be much more desirable to have the attribute occur before the message data so that the recipient can set up to do the necessary processing prior to processing the message data.
I would argue that this is not a criterion that should have been considered when making the decision.
Looking at the taxonomy of algorithms that is presented in Section 3 (Algorithm Taxonomy) we come up with the following results:
The algorithms which cannot be easily streamed are: CCM, SIV (sender)
The algorithms which need attributes before the message body are: CWC (simple implementation), GCM, SIV (recipient)
The algorithms which need the message body before the attributes are: MAC-Authenticated
The algorithms which can have either the body or the attributes first are: CWC (parallelized implementation), EAX, OCB
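The four lists above can be restated as a small lookup table, which makes it easy to ask which algorithms are forced to cache data under a given field order. The table is transcribed directly from the lists; the labels and the `must_cache` helper are hypothetical names introduced for this sketch.

```python
# Streaming requirement of each algorithm, transcribed from the lists above.
REQUIREMENT = {
    "CCM": "not streamable",
    "SIV (sender)": "not streamable",
    "CWC (simple implementation)": "attributes first",
    "GCM": "attributes first",
    "SIV (recipient)": "attributes first",
    "MAC-Authenticated": "message first",
    "CWC (parallelized implementation)": "either",
    "EAX": "either",
    "OCB": "either",
}


def must_cache(layout: str) -> list:
    """Algorithms that could stream in principle, but not under this layout."""
    if layout not in ("attributes first", "message first"):
        raise ValueError("layout must be 'attributes first' or 'message first'")
    return [name for name, need in REQUIREMENT.items()
            if need not in (layout, "either", "not streamable")]


print("message first   ->", must_cache("message first"))
print("attributes first ->", must_cache("attributes first"))
```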
From the above, we can see that having the attributes before the message data would allow for a simple implementation in all but the case of CCM and SIV for the sender. (The addition of a length authenticated attribute would allow CCM to fall into the second category for a recipient.) The only one which causes any problems is the MAC-Authenticated algorithm which was actually explicitly designed to work backwards.
From the above, we can see that having the message data before the attributes means that in at least half the cases the message data must be cached until the attributes can be processed.
If we had done this analysis at the time the decision was made then we should have made the decision to place the attributes first.
It is no more likely that the sender of a message is resource constrained than it is for the recipient of the message to be resource constrained. This means that it is better for a set of algorithms and layout to be chosen that will work well in a streaming model under normal circumstances than to optimize for either the sender or the recipient.
In my opinion, most of the time messages that are created using an authenticated encryption algorithm will be decrypted by at least one recipient. Messages which are not decrypted will exist, either from being lost in the ether or from being cached until needed, but these will be the smallest part of the set. Messages which need to be decrypted multiple times by a single recipient will generally be a small number as well, unless this becomes part of the S/MIME standard. However, I believe that a significant number of messages will be created that will have multiple recipients. This may be done by creating multiple lock boxes up front, or by creating the lock boxes on demand in cases where it does not matter that traffic analysis can reveal that multiple recipients have gotten the same message. (An example of this might be sending a firmware upgrade to multiple devices, where the message is transferred on demand and it does not matter that an observer can see that the same set of firmware is being installed on multiple machines. This would be something that could probably be assumed anyway.)
I therefore think that overall more messages will be decoded and decrypted than encrypted and encoded. This would mean that a bias should be placed for the recipients of messages not the sender of messages in making decisions.
Based on the above, I would say that we should modify the order of these fields in the event that the document is updated.
It is unfortunate that I did not see the republication of the document from [RFC3852] (Housley, R., “Cryptographic Message Syntax (CMS),” July 2004.) to [RFC5083] (Housley, R., “Cryptographic Message Syntax (CMS) Authenticated-Enveloped-Data Content Type,” November 2007.) as I would have made these arguments at that time.
6. Rebuttals
This section has been left open for people who wish to express an opinion other than mine.
7. Security Considerations
This document discusses a security related document, however it makes no changes to the document. As such there are no actual security implications for this document.
8. IANA Considerations
No action by IANA is required for this document.
Author's Address
Jim Schaad
Soaring Hawk Consulting
Email: jimsch@augustcellars.com