Internet-Draft | dCBOR | August 2023
McNally & Allen | Expires 7 February 2024
CBOR (RFC 8949) defines "Deterministically Encoded CBOR" in its Section 4.2. The present document provides the application profile "dCBOR" that can be used to help achieve interoperable deterministic encoding.¶
This note is to be removed before publishing as an RFC.¶
Source for this draft and an issue tracker can be found at https://github.com/BlockchainCommons/WIPs-IETF-draft-deterministic-cbor.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 7 February 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
CBOR [RFC8949] has many advantages over other data serialization formats. One of its strengths is specifications and guidelines for serializing data deterministically, such that multiple agents serializing the same data automatically achieve consensus on the exact byte-level form of that serialized data. This is particularly useful when data must be compared for semantic equivalence by comparing the hash of its contents.¶
Nonetheless, determinism is an opt-in feature of CBOR, and most existing CBOR codecs leave the primary burden of both correct deterministic serialization and validation of deterministic encoding during deserialization to the engineer. This document specifies a set of requirements for the application profile "dCBOR" that MUST be implemented at the codec level. These requirements include but go beyond [RFC8949] §4.2.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The dCBOR Application Profile specifies the use of Deterministic Encoding as defined in Section 4.2 of [RFC8949] together with some application-level rules specified in this section.¶
The application-level rules specified here do not "fork" CBOR. A dCBOR implementation produces well-formed, deterministically encoded CBOR according to [RFC8949], and existing generic CBOR decoders will therefore be able to decode it, including those that check for deterministic encoding. Similarly, generic CBOR encoders will be able to produce valid dCBOR if an application hands them data-model-level information that conforms to dCBOR.
Note that the separation between standard CBOR processing and the processing required by the dCBOR application profile is a conceptual one: Both dCBOR processing and standard CBOR processing may be combined into a unified dCBOR/CBOR codec. The requirements in this document apply to encoding or decoding of dCBOR data, regardless of whether the codec is a unified dCBOR/CBOR codec operating in dCBOR-compliant modes, or a single-purpose dCBOR codec. Both of these are generically referred to as "dCBOR codecs" in this document.¶
This application profile is intended to be used in conjunction with an application, which typically will use a subset of CBOR, which in turn influences which subset of the application profile is used. As a result, this application profile places no direct requirement on what subset of CBOR is implemented. For instance, there is no requirement that dCBOR implementations support floating point numbers (or any other kind of number, such as arbitrary precision integers or 64-bit negative integers) when they are used with applications that do not use them. However, this document does place requirements on dCBOR implementations that support negative 64-bit integers and 64-bit or smaller floating point numbers.¶
dCBOR encoders MUST only emit CBOR conforming to the requirements "Core Deterministic Encoding Requirements" of [RFC8949] §4.2.1. To summarize,¶
dCBOR encoders:¶
In addition, dCBOR decoders:¶
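As an illustration of the core requirements of [RFC8949] §4.2.1 referenced above (shortest-form arguments, definite lengths, and map keys sorted bytewise by their encoded form), the following is a minimal sketch and not a complete encoder. The function names encode_head, encode_uint, encode_text, and encode_map are hypothetical and do not come from this document.

```python
import struct

def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR initial byte and argument using the shortest form."""
    if not 0 <= argument < 2**64:
        raise ValueError("argument does not fit in 64 bits")
    high = major_type << 5
    if argument < 24:
        return bytes([high | argument])
    if argument < 2**8:
        return bytes([high | 24, argument])
    if argument < 2**16:
        return bytes([high | 25]) + struct.pack(">H", argument)
    if argument < 2**32:
        return bytes([high | 26]) + struct.pack(">I", argument)
    return bytes([high | 27]) + struct.pack(">Q", argument)

def encode_uint(value: int) -> bytes:
    """Major type 0: unsigned integer."""
    return encode_head(0, value)

def encode_text(value: str) -> bytes:
    """Major type 3: definite-length UTF-8 text string."""
    data = value.encode("utf-8")
    return encode_head(3, len(data)) + data

def encode_map(encoded_pairs) -> bytes:
    """Major type 5: definite-length map, entries sorted bytewise by encoded key."""
    ordered = sorted(encoded_pairs, key=lambda pair: pair[0])
    body = b"".join(key + value for key, value in ordered)
    return encode_head(5, len(ordered)) + body
```

Because the sort is applied to the encoded keys, two applications that insert the same entries in different orders produce the same bytes.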
Standard CBOR [RFC8949] defines maps with duplicate keys as invalid, but leaves how to handle such cases to the implementor (§2.2, §3.1, §5.4, §5.6).¶
dCBOR encoders:¶
dCBOR decoders:¶
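One way a decoder might detect duplicate keys is sketched below. It assumes the decoder has collected the encoded bytes of each key in the order they appear in the map. Since deterministic encoding requires keys in ascending bytewise order, a duplicate would show up as two adjacent equal keys. The name check_map_keys is hypothetical.

```python
def check_map_keys(encoded_keys) -> None:
    """Raise ValueError unless the encoded keys are strictly ascending.

    encoded_keys: list of bytes objects, one per map key, in wire order.
    """
    for previous, current in zip(encoded_keys, encoded_keys[1:]):
        if current == previous:
            raise ValueError("duplicate map key")
        if current < previous:
            raise ValueError("map keys are not in sorted order")
```

A single pass is enough here because ordering and uniqueness are checked together.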
dCBOR codecs that support floating point numbers (CBOR major type 7):¶
dCBOR encoders that support floating point numbers:¶
dCBOR decoders that support floating point numbers:¶
The above rules still produce well-formed CBOR according to the standard, and all existing generic decoders will be able to read it. They do, however, exclude a map such as the following from being validated as dCBOR, even though it would be allowed in standard CBOR, because:
* 10.0 is an invalid numeric value in dCBOR, and
* using 10 more than once as a map key is not allowed.
{ 10: "ten", 10.0: "floating ten" }
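The effect on the example map can be sketched as follows. The helper reduce_number is hypothetical, and the sketch assumes only that a floating point value with an exact integer value is represented as that integer; any limits on the integer range to which this applies are not modeled here.

```python
import math

def reduce_number(value):
    """Return an int when a float holds an exact integer value, else the value unchanged."""
    if isinstance(value, float) and math.isfinite(value) and value.is_integer():
        return int(value)
    return value

# The map from the text, written as pairs because a Python dict would
# silently merge the keys 10 and 10.0.
pairs = [(10, "ten"), (10.0, "floating ten")]
reduced_keys = [reduce_number(key) for key, _ in pairs]
has_duplicate_keys = len(set(reduced_keys)) != len(reduced_keys)
```

After reduction both keys are the integer 10, so the map contains a duplicate key and cannot be valid dCBOR.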
[IEEE754] defines a negative zero value -0.0.
dCBOR encoders that support floating point:¶
* MUST encode negative zero as the integer 0.
dCBOR decoders that support floating point:¶
Therefore with dCBOR, 0.0, -0.0, and 0 all encode to the same canonical single-byte value 0x00.
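The following sketch illustrates why the rule is needed: positive and negative zero compare equal but have different binary64 bit patterns, so an encoder that wrote them out as floats would produce different bytes for equal values. The function encode_zero is hypothetical and handles only zero values.

```python
import struct

def encode_zero(value) -> bytes:
    """Encode 0, 0.0, or -0.0 as the single-byte CBOR unsigned integer 0."""
    if value != 0:
        raise ValueError("this sketch only handles zero values")
    return b"\x00"

# The raw IEEE 754 binary64 bit patterns differ only in the sign bit.
positive_zero_bits = struct.pack(">d", 0.0)
negative_zero_bits = struct.pack(">d", -0.0)
```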
[IEEE754] defines the NaN (Not a Number) value [NAN]. NaNs are usually divided into two types: quiet NaNs and signalling NaNs. In the binary interchange formats, the most significant bit of the significand field conventionally distinguishes these two types. The format also leaves a sign bit and a range of "payload" bits. These bit fields have no definite purpose and could be used to break determinism or exfiltrate data.
dCBOR encoders that support floating point:¶
* MUST reduce all NaN values to the binary16 quiet NaN value having the canonical bit pattern 0x7e00.
* MUST reduce all +INF values to the binary16 +INF having the canonical bit pattern 0x7c00.
* MUST reduce all -INF values to the binary16 -INF having the canonical bit pattern 0xfc00.
dCBOR decoders that support floating point:¶
* MUST reject any NaN values not having the canonical bit pattern 0x7e00.
* MUST reject any +INF values not having the canonical bit pattern 0x7c00.
* MUST reject any -INF values not having the canonical bit pattern 0xfc00.
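A minimal sketch of both sides of this rule follows. It assumes the canonical bit patterns given above, prefixed with the initial byte 0xf9 that [RFC8949] uses for binary16 floats. The names encode_special and check_special are hypothetical. The decoder check treats any float whose exponent field is all ones as a NaN or infinity, at any of the three widths.

```python
import math

CANONICAL_NAN = bytes.fromhex("f97e00")
CANONICAL_POS_INF = bytes.fromhex("f97c00")
CANONICAL_NEG_INF = bytes.fromhex("f9fc00")
_CANONICAL = (CANONICAL_NAN, CANONICAL_POS_INF, CANONICAL_NEG_INF)

# initial byte -> (payload size in bytes, exponent bits, fraction bits)
_LAYOUT = {0xF9: (2, 5, 10), 0xFA: (4, 8, 23), 0xFB: (8, 11, 52)}

def encode_special(value: float) -> bytes:
    """Encode a NaN or infinity using its canonical binary16 form."""
    if math.isnan(value):
        return CANONICAL_NAN  # sign and payload bits are discarded
    if math.isinf(value):
        return CANONICAL_POS_INF if value > 0 else CANONICAL_NEG_INF
    raise ValueError("not a NaN or infinity")

def check_special(encoded: bytes) -> None:
    """Raise ValueError for a NaN or infinity that is not in canonical form."""
    layout = _LAYOUT.get(encoded[0])
    if layout is None:
        raise ValueError("not a floating point item")
    size, exponent_bits, fraction_bits = layout
    item = encoded[: 1 + size]
    bits = int.from_bytes(item[1:], "big")
    all_ones = (1 << exponent_bits) - 1
    exponent = (bits >> fraction_bits) & all_ones
    if exponent == all_ones and item not in _CANONICAL:
        raise ValueError("non-canonical NaN or infinity")
```

For example, a binary32 +INF (0xfa7f800000) or a binary16 NaN with a non-zero payload (0xf97e01) would be rejected, while 0xf97e00 passes.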
The most negative integer that can be represented in 64-bit two's complement (STANDARD_NEGATIVE_INT_MAX) is -2^63 (0x8000000000000000).
However, standard CBOR major type 1 can encode negative integers as low as CBOR_NEGATIVE_INT_MAX, which is -2^64 (two's complement: 0x10000000000000000, CBOR: 0x3BFFFFFFFFFFFFFFFF).
Negative integers in the range [CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1] require 65 bits of precision, and are thus not representable in typical machine-sized integers.
Because of this incompatibility between standard CBOR and typical machine-size representations, dCBOR disallows encoding negative integer values in the range [CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1].
dCBOR encoders:¶
dCBOR decoders:¶
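The range restriction described above can be sketched as follows. The constants reuse the names from the text; the functions encode_negative_int and check_negative_int_argument are hypothetical. A CBOR major type 1 item with argument n represents the value -1 - n, so the disallowed range corresponds to arguments of 2^63 or greater.

```python
STANDARD_NEGATIVE_INT_MAX = -(2**63)  # most negative 64-bit two's complement value
CBOR_NEGATIVE_INT_MAX = -(2**64)      # most negative value CBOR major type 1 can carry

def encode_negative_int(value: int) -> bytes:
    """Encode a negative integer as CBOR major type 1 in shortest form."""
    if value >= 0:
        raise ValueError("value is not negative")
    if value < STANDARD_NEGATIVE_INT_MAX:
        raise ValueError("value needs more than 64 bits and is not allowed")
    argument = -1 - value
    if argument < 24:
        return bytes([0x20 | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([0x20 | additional_info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument always fits in 8 bytes")

def check_negative_int_argument(argument: int) -> int:
    """Return the decoded value, rejecting arguments in the disallowed range."""
    if argument >= 2**63:
        raise ValueError("negative integer below -2^63 is not allowed")
    return -1 - argument
```

With these definitions, -2^63 encodes as 0x3b7fffffffffffffff, while the argument carried by 0x3bffffffffffffffff is rejected on decode.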
CBOR Major Type 7 includes the floating point values (0xf9, 0xfa, 0xfb) and also the "simple values" false (0xf4), true (0xf5), and null (0xf6).
dCBOR encoders:¶
* MUST NOT encode major type 7 values other than false, true, null, and the floating point values.
dCBOR decoders:¶
* MUST reject any major type 7 values other than false, true, null, and the floating point values.
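A decoder-side check for this restriction might look like the sketch below. It assumes the initial bytes assigned by [RFC8949]: 0xf4, 0xf5, and 0xf6 for false, true, and null, and 0xf9, 0xfa, and 0xfb for binary16, binary32, and binary64 floats. The name check_major_type_7 is hypothetical.

```python
_ALLOWED_MAJOR_TYPE_7 = {
    0xF4: "false",
    0xF5: "true",
    0xF6: "null",
    0xF9: "binary16 float",
    0xFA: "binary32 float",
    0xFB: "binary64 float",
}

def check_major_type_7(initial_byte: int) -> str:
    """Return a label for an allowed major type 7 item, else raise ValueError."""
    if initial_byte >> 5 != 7:
        raise ValueError("not a major type 7 item")
    label = _ALLOWED_MAJOR_TYPE_7.get(initial_byte)
    if label is None:
        raise ValueError("major type 7 value 0x%02x is not allowed" % initial_byte)
    return label
```

Under this check the "undefined" simple value (0xf7) and all other simple values are rejected.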
This section is informative.¶
These are single-purpose dCBOR codecs that conform to these specifications:¶
This document inherits the security considerations of CBOR [RFC8949].¶
Vulnerabilities regarding dCBOR will revolve around whether an attacker can find value in producing semantically equivalent documents that are nonetheless serialized into non-identical byte streams. Such documents could be used to contain malicious payloads or exfiltrate sensitive data. The ability to create such documents could indicate the failure of a dCBOR decoder to correctly validate according to this document, or the failure of the developer to properly specify or implement application protocol requirements using dCBOR. Whether these possibilities present an identifiable attack surface is a question that developers should consider.¶
This document makes no requests of IANA.¶
As of this writing the specification of deterministic CBOR beyond [RFC8949] is an active item before the CBOR working group. [BormannDCBOR] and [RundgrenDCBOR] are other approaches to deterministic CBOR.¶
The authors are grateful for the contributions of Carsten Bormann and Anders Rundgren in the CBOR working group.¶