Internet-Draft | Notable CBOR Tags | August 2023 |
Bormann | Expires 2 February 2024 |
The Concise Binary Object Representation (CBOR, RFC 8949) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation.¶
In CBOR, one point of extensibility is the definition of CBOR tags. RFC 8949's original edition, RFC 7049, defined a basic set of tags as well as a registry that can be used to contribute additional tag definitions [IANA.cbor-tags]. Since RFC 7049 was published, some 80 tag definitions have been added to that registry.¶
The present document provides a roadmap to a large subset of these tag definitions. Where applicable, it points to an IETF standard or standards development document that specifies the tag. Where no such document exists, the intention is to collect specification information from the sources of the registrations. After some more development, the present document is intended to be useful as a reference document for the IANA registrations of the CBOR tags the definitions of which have been collected.¶
This is an individual submission to the CBOR working group of the IETF, https://datatracker.ietf.org/wg/cbor/about/. Discussion currently takes place on the GitHub repository https://github.com/cabo/notable-tags. If the CBOR WG believes this is a useful document, discussion is likely to move to the CBOR WG mailing list and a GitHub repository at the CBOR WG GitHub organization, https://github.com/cbor-wg.¶
The current version is true work in progress; some of the sections haven't been filled in yet, and in particular, permission has not been obtained from tag definition authors to copy over their text.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 February 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
(TO DO, expand on text from abstract here; move references here and neuter them in the abstract as per Section 4.3 of [RFC7322].)¶
The selection of the tags presented here is somewhat arbitrary; considerations such as how wide the scope and area of application of a tag definition is are combined with an assessment of how "ready to use" the tag definition is (i.e., whether the tag specification is in a state where it can be used).¶
This document can only be a snapshot of a subset of the current registrations. The most up to date set of registrations is always available in the registry "CBOR Tags" [IANA.cbor-tags].¶
The definitions of [STD94] apply.
Specifically: The term "byte" is used in its now customary sense as a synonym for
"octet"; "byte strings" are CBOR data items carrying a sequence of
zero or more (binary) bytes, while "text strings" are CBOR data items carrying a
sequence of zero or more Unicode code points, encoded in UTF-8 [STD63].
Where bit arithmetic is explained, this document uses the notation
familiar from the programming language C ([C], including C++14's 0bnnn
binary literals [Cplusplus20]), except that superscript notation
(example for two to the power of 64: 2^64) denotes exponentiation; in
the plain text version of this document, superscript notation is
rendered in paragraph text by C-incompatible surrogate notation as
seen in this example.
Ranges expressed using ..
are inclusive of the limits given.
Type names such as "int", "bigint" or "decfrac" are taken from
Appendix D of [RFC8610], the Concise Data Definition Language (CDDL).¶
[RFC7049] defines a number of tags that are listed here for convenience only.¶
Appendix G.3 of [STD94] states:¶
Tag 35 is not defined by this document; the registration based on the definition in RFC 7049 remains in place.¶
The reason for this exclusion is that the definition of Tag 35 in Section 2.4.4.3 of [RFC7049] leaves too much open to ensure interoperability:¶
Tag 35 is for regular expressions in Perl Compatible Regular Expressions (PCRE) / JavaScript syntax [ECMA262].¶
Not only are two partially incompatible specifications given for the semantics, JavaScript regular expressions have also developed significantly within the decade since JavaScript 5.1 (which was referenced as "ECMA262" by [RFC7049]), making it less reliable to assume that a producing application will manage to stay within that 2011 subset.¶
Nonetheless, the registration is in place, so it is available for applications that simply want to mark a text string as being a regular expression roughly of the PCRE/JavaScript flavor families. See also Tags 21065 and 21066 above.¶
A number of CBOR tags are defined in security specifications that make use of CBOR.¶
CBOR Object Signing and Encryption (COSE) is defined in a number of RFCs. [RFC8152] was the initial specification; it set up the registries and populated them with an initial set of assignments. A revision split this specification into the data structure definitions (RFC 9052, an Internet Standard [STD96]) and a separate document defining the representation for the algorithms employed [RFC9053], which is expected to be updated more frequently than the COSE format itself. [RFC9054] added a separate set of algorithms for cryptographic hash functions (hash functions have been part of some [RFC9053] combined algorithms but were not assigned separate codepoints). A revised COSE countersignature structure was defined in RFC 9338, another part of [STD96]; this also defines a tag for these structures.¶
[RFC8392] defines the CBOR Web Token (CWT), making use of COSE to define a CBOR variant of the JSON Web Token (JWT) [RFC7519], a standardized security token that has found use in the area of web applications, but is not technically limited to those.¶
Representation formats can be built on top of CBOR.¶
YANG [RFC7950] is a data modeling language originally designed in the context of the Network Configuration Protocol (NETCONF) [RFC6241], now widely used for modeling management and configuration information. [RFC7950] defines an XML-based representation format, and [RFC7951] defines a JSON-based [RFC8259] representation format for YANG.¶
YANG-CBOR [RFC9254] is a representation format for YANG data in CBOR.¶
Protocols may want to allocate CBOR tag numbers to identify specific protocol elements.¶
DDoS Open Threat Signaling (DOTS) defines tag number 271 for the DOTS signal channel object in [RFC9132].¶
As an example for how experimental protocols can make use of CBOR tag definitions, the RAINS (Another Internet Naming Service) Protocol Specification defines tag number 15309736 for a RAINS Message [I-D.trammell-rains-protocol]. (The seemingly random tag number was chosen so that, when represented as an encoded CBOR tag argument, it contains the Unicode character "雨" (U+96E8) in UTF-8, which represents rain in a number of languages.)¶
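The remark about the tag number can be checked with a few lines of Python. This is only a sketch: `encode_head` is a helper name made up for this illustration (it is not part of any CBOR library), and it follows the head encoding rules of Section 3 of [STD94].

```python
def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: initial byte plus the shortest argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")


RAINS_TAG = 15309736  # tag number from the RAINS specification

# Major type 6 is "tag"; the number needs a 4-byte argument.
rains_head = encode_head(6, RAINS_TAG)
rain_utf8 = "\u96e8".encode("utf-8")  # the character U+96E8

print(rains_head.hex())          # da00e99ba8
print(rain_utf8.hex())           # e99ba8
print(rain_utf8 in rains_head)   # True
```

The three UTF-8 bytes form the low-order bytes of the 4-byte tag argument, which is why the number looks random in decimal.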
A number of tags have been registered for arithmetic representations
beyond those built into CBOR and defined by tags in [RFC7049].
These are all documented under http://peteroupc.github.io/CBOR/; the
last pathname component for the URL is given in Table 5.¶
CBOR's basic generic data model (Section 2 of [STD94]) has a number system with limited-range integers (major types 0 and 1: -2^64..2^64-1) and floating point numbers that cover binary16, binary32, and binary64 (including non-finites) from [IEEE754]. With the tags defined with [RFC7049], the extended generic data model (Section 2.1 of [STD94]) adds unlimited-range integers (tag numbers 2 and 3, "bigint" in CDDL) as well as floating point values using the bases 2 (tag number 5, "bigfloat") and 10 (tag number 4, "decfrac").¶
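To make the boundary between the basic integers and the bigint tags concrete, here is a minimal Python sketch. The helper names `encode_head` and `encode_int` are assumptions made for this illustration only; the sketch picks the untagged form whenever the value fits into major type 0 or 1 and falls back to tag 2 or 3 otherwise.

```python
def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: initial byte plus the shortest argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")


def encode_int(n: int) -> bytes:
    """Encode an integer, using tags 2/3 only outside -2^64..2^64-1."""
    if 0 <= n < 2**64:
        return encode_head(0, n)            # major type 0: unsigned
    if -(2**64) <= n < 0:
        return encode_head(1, -1 - n)       # major type 1: negative
    # Bigint: tag 2 carries n, tag 3 carries -1 - n, as a big-endian byte string.
    tag, magnitude = (2, n) if n > 0 else (3, -1 - n)
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return encode_head(6, tag) + encode_head(2, len(payload)) + payload


print(encode_int(2**64 - 1).hex())    # 1bffffffffffffffff
print(encode_int(2**64).hex())        # c249010000000000000000
print(encode_int(-(2**64)).hex())     # 3bffffffffffffffff
print(encode_int(-(2**64) - 1).hex()) # c349010000000000000000
```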
This pre-defined number system has a number of limitations that are addressed in three of the tags discussed here:¶
Tag number 30 allows the representation of rational numbers as a ratio of two integers: a numerator (usually written as the top part of a fraction), and a denominator (the bottom part), where both integers can be limited-range basic and unlimited-range integers. The mathematical value of a rational number is the numerator divided by the denominator. This tag can express all numbers that the extended generic data model of [RFC7049] can express, except for non-finites [IEEE754]; it also can express rational numbers that cannot be expressed with denominators that are a power of 2 or a power of 10.¶
For example, the rational number 1/3 is encoded:¶
   d8 1e     ---- Tag 30
      82     ---- Array length 2
         01  ---- 1
         03  ---- 3¶
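The same bytes can be produced from a rational number type in a programming language. The following Python sketch uses the standard library's `fractions.Fraction`; `encode_head`, `encode_small_int`, and `encode_rational` are hypothetical helper names, and the sketch assumes that numerator and denominator fit into the basic integer range (bigints are left out to keep it short).

```python
from fractions import Fraction


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: initial byte plus the shortest argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")


def encode_small_int(n: int) -> bytes:
    """Encode an integer in the basic range (major types 0 and 1 only)."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def encode_rational(q: Fraction) -> bytes:
    """Tag 30 around a two-element array [numerator, denominator]."""
    return (encode_head(6, 30) + encode_head(4, 2)
            + encode_small_int(q.numerator) + encode_small_int(q.denominator))


print(encode_rational(Fraction(1, 3)).hex())  # d81e820103
```

`Fraction` normalizes its value, so `Fraction(2, 6)` yields the same encoding as `Fraction(1, 3)` in this sketch.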
Many programming languages have built-in support for rational numbers or support for them is included in their standard libraries; tag number 30 is a way for these platforms to interchange these rational numbers in CBOR.¶
The tag numbers 268..270 extend these tags further by providing a way to express non-finites within a tag with this number. This does not increase the expressiveness of the data model (the non-finites can already be expressed using major type 7 floating point numbers), but does allow both finite and non-finite values to carry the same tag. In most applications, a choice that includes some of the three tags 30, 264, 265 for finite values and major type 7 floating point values for non-finites (as well as possibly other parts of the CBOR number system) will be the preferred solution.¶
This document suggests using the CDDL typenames defined in Figure 2 for the three most useful tag numbers in this section.¶
https://github.com/svaarala/cbor-specs/blob/master/cbor-absent-tag.rst
defines tag 31 to be applied to the CBOR value Undefined (0xf7),
slightly modifying its semantics to stand for an absent value in a
CBOR Array.¶
(TO DO: Obtain permission to copy the definitions here.)¶
[RFC8746] defines tags for various kinds of arrays. A summary is reproduced in Table 6.¶
(TO DO: Obtain permission to copy the definitions here; explain how tags 52 and 54 essentially obsolete 260/261.)¶
Tag number | Tag content | Short Description | Reference | Author |
---|---|---|---|---|
37 | byte string | Binary UUID (Section 4.1.2 of [RFC4122]) | https://github.com/lucas-clemente/cbor-specs/blob/master/uuid.md | Lucas Clemente |
257 | byte string | Binary MIME message | http://peteroupc.github.io/CBOR/binarymime.html | Peter Occil |
260 | byte string | Network Address (IPv4 or IPv6 or MAC Address) | http://www.employees.org/~ravir/cbor-network.txt | Ravi Raju |
261 | map | Network Address Prefix (IPv4 or IPv6 Address + Mask Length) | https://github.com/toravir/CBOR-Tag-Specs/blob/master/networkPrefix.md | Ravi Raju |
263 | byte string | Hexadecimal string | https://github.com/toravir/CBOR-Tag-Specs/blob/master/hexString.md | Ravi Raju |
266 | text string | Internationalized resource identifier (IRI) | https://peteroupc.github.io/CBOR/iri.html | Peter Occil |
267 | text string | Internationalized resource identifier reference (IRI reference) | https://peteroupc.github.io/CBOR/iri.html | Peter Occil |
Tag | Data Item | Semantics | Reference |
---|---|---|---|
38 | array | Language-tagged string | Appendix A of [RFC9290] |
Tag 38 was originally registered by Peter Occil in http://peteroupc.github.io/CBOR/langtags.html; it has since been adopted and extended in Appendix A of [RFC9290], where a detailed definition of the tag and a few simple examples for its use are provided.¶
The problem that this tag was designed to solve is that text strings often need additional information to be properly presented to a human. While Unicode (and the UTF-8 form of Unicode used in CBOR) define the characters, additional information about the human language in use and the writing direction appropriate for the text given are often required.¶
The need to provide language information with text has been well-known for a while and led to a common form for this information, the language tag, defined in [BCP47].¶
Less well-known is the need to provide separate directionality information as well. The need for this information is demonstrated in [W3C-STRINGS-BIDI], which points out that it is "actually a bad idea to rely on language information to apply direction" and points out further reference information on this. [W3C-BIDI-USE-CASES] shows more examples for language tags and directionality, while [W3C-UBA-BASICS] provides an introduction to the way browsers, where "the order of characters in memory (logical) is not the same as the order in which they are displayed (visual)", "produce the correct order at the time of display" (Unicode Bidirectional Algorithm).¶
Tag 38 meets the requirements of its specific application in [RFC9290], which could be summarized as: Supplying the necessary information to present isolated, linear, comparatively small pieces of human-readable text. It neither addresses more complex requirements of specific languages such as [W3C-SIMPLE-RUBY], nor does it address requirements for more complex structure in texts such as emphasis, lists, or tables. These more complex requirements are typically met by specific media types such as HTML [HTML].¶
Additional tag definitions have been provided for date and time values.¶
Note that tags 100 and 1004 are for calendar dates that are not anchored to a specific time zone; they are meant to specify calendar dates as perceived by humans, e.g. for use in personal identification documents. Converting such a calendar date into a specific point in time needs the addition of a time-of-day (for which a CBOR tag is outstanding) and timezone information (also outstanding). Alternatively, a calendar date plus timezone information can be converted into a time period (range of time values given by the starting and the ending time); note that these time periods are not always exactly 24 h (86400 s) long.¶
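A minimal sketch of the two calendar-date representations, assuming the definitions in [RFC8943]: tag 100 carries an integer number of days relative to 1970-01-01, and tag 1004 carries an RFC 3339 full-date text string. The helper names are made up for this illustration; no time zone is involved at any point.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: initial byte plus the shortest argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")


def encode_date_as_days(d: date) -> bytes:
    """Tag 100: integer number of days since 1970-01-01 (may be negative)."""
    days = (d - EPOCH).days
    number = encode_head(0, days) if days >= 0 else encode_head(1, -1 - days)
    return encode_head(6, 100) + number


def encode_date_as_text(d: date) -> bytes:
    """Tag 1004: full-date text string such as "1980-12-08"."""
    text = d.isoformat().encode("utf-8")
    return encode_head(6, 1004) + encode_head(3, len(text)) + text


print(encode_date_as_days(date(1980, 12, 8)).hex())   # d864190f9a
print(encode_date_as_days(date(1940, 10, 9)).hex())   # d8643929b3
print(encode_date_as_text(date(1980, 12, 8)).hex())
```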
[RFC8943] does not suggest CDDL [RFC8610] type names for the two tags. We suggest copying the definitions in Figure 3 into application-specific CDDL as needed.¶
Tag 1001 extends tag 1 by additional information (such as picosecond resolution) and allows the use of Decimal and Bigfloat numbers for the time.¶
(These are actually not as Perl-specific as the title of this section suggests. See also the penultimate paragraph of Section 3.4 of [STD94].)¶
These are all documented under http://cbor.schmorp.de/; the
last pathname component is given in Table 10.¶
(TO DO: Obtain permission to copy the definitions here.)¶
(TO DO: Obtain permission to copy the definitions here.)¶
Tag number 262 has been registered to identify byte strings that carry embedded
JSON text (https://github.com/toravir/CBOR-Tag-Specs/blob/master/embeddedJSON.md).¶
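As an illustration of tag 262, the following Python sketch wraps a JSON text into a tagged byte string. It assumes only what is stated above (the tag content is a byte string carrying JSON text); the UTF-8 encoding of the JSON text and the helper names `encode_head` and `embed_json` are assumptions of this sketch.

```python
import json


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: initial byte plus the shortest argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")


def embed_json(value) -> bytes:
    """Tag 262 around a byte string holding the serialized JSON text."""
    json_bytes = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return encode_head(6, 262) + encode_head(2, len(json_bytes)) + json_bytes


embedded = embed_json({"a": 1})
print(embedded.hex())  # d90106477b2261223a317d
```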
Tag number 275 can be used to identify maps that contain keys that are
all of type Text String, as they would occur in JSON
(https://github.com/ecorm/cbor-tag-text-key-map).¶
(TO DO: Obtain permission to copy the definitions here.)¶
Some variants of UTF-8 are in use in specific areas of application.
Tags have been registered to be able to carry around strings in these
variants in case they are not also valid UTF-8 and can therefore not
be represented as a CBOR text string
(https://github.com/svaarala/cbor-specs/blob/master/cbor-nonutf8-string-tags.rst).¶
(TO DO: Obtain permission to copy the definitions here.)¶
Tag number | Tag content | Short Description | Reference | Author |
---|---|---|---|---|
39 | multiple | Identifier | https://github.com/lucas-clemente/cbor-specs/blob/master/id.md | Lucas Clemente |
42 | byte string | IPLD content identifier | https://github.com/ipld/cid-cbor/ | Volker Mische |
103 | array | Geographic Coordinates | https://github.com/allthingstalk/cbor/blob/master/CBOR-Tag103-Geographic-Coordinates.md | Danilo Vidovic |
104 | multiple | Geographic Coordinate Reference System WKT or EPSG number | [I-D.clarke-cbor-crs] | |
120 | multiple | Internet of Things Data Point | https://github.com/allthingstalk/cbor/blob/master/CBOR-Tag120-Internet-of-Things-Data-Points.md | Danilo Vidovic |
258 | array | Mathematical finite set | https://github.com/input-output-hk/cbor-sets-spec/blob/master/CBOR_SETS.md | Alfredo Di Napoli |
259 | map | Map datatype with key-value operations (e.g. .get()/.set()/.delete()) | https://github.com/shanewholloway/js-cbor-codec/blob/master/docs/CBOR-259-spec--explicit-maps.md | Shane Holloway |
(Original Text for this section was contributed by Duncan Coutts and Michael Peyton Jones; all errors are the author's.)¶
A set of CBOR tag numbers has been allocated (Section 11) for encoding data composed of enumerated alternatives:¶
Tags | Data Item | Meaning |
---|---|---|
121..127 | any | alternatives 0..6, 1+1 encoding |
1280..1400 | any | alternatives 7..127, 1+2 encoding |
101 | array [uint, any] | alternatives as given by the uint + 128 |
The tags defined in this section are for encoding data that can be in one of a number of different enumerated forms.¶
For example, data representing the result of some action might be either a failure with some failure detail, or a success with some result. In this example there are two cases, the failure case and the success case, and we can enumerate them as 0 and 1.¶
In general, the number of alternatives and what data is expected in each alternative case are entirely application dependent.¶
The tags defined in this specification allow the encoding of any number of alternatives, but provide compact encoding for the common cases of low numbers of alternatives:¶
There are no special considerations for deterministic encoding (Section 4.2 of [STD94]): The case numbers covered by each tag do not overlap; particularly, tag 101 encoding starts where the more compact special encodings for 0..6 and 7..127 end.¶
For cases 0..6 and 7..127, the tag value indicates the value of the alternative. For cases 128+, a single tag number is used with an enclosed two-element array that contains the case number and the value of the alternative.¶
The value consists of a case number and a case body. The case number is an unsigned integer that indicates which case out of the set of alternatives is used. The case body is any CBOR data value.¶
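The mapping from case numbers to tags described above can be sketched in Python as follows. The function name `wrap_alternative` and the `(tag number, tag content)` tuple it returns are choices made for this illustration; the ranges are those given in the table of this section.

```python
def wrap_alternative(case: int, body):
    """Return (tag number, tag content) for a case number and case body."""
    if 0 <= case <= 6:
        return 121 + case, body            # tags 121..127, 1+1 encoding
    if 7 <= case <= 127:
        return 1280 + (case - 7), body     # tags 1280..1400, 1+2 encoding
    if case >= 128:
        return 101, [case - 128, body]     # general form: [uint, any]
    raise ValueError("case numbers are unsigned integers")


print(wrap_alternative(0, "x"))    # (121, 'x')
print(wrap_alternative(7, "x"))    # (1280, 'x')
print(wrap_alternative(128, "x"))  # (101, [0, 'x'])
```

Because the three ranges do not overlap, each case number has exactly one representation in this scheme.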
In a setting where the application uses a schema (formally or informally), there will be an appropriate sub-schema for each case in the set of alternatives. The representation of the case body should comply with the schema corresponding to the case number used.¶
To continue the example above about representing failure or success, suppose that the failure detail consists of an integer code and a string, and suppose that the successful result is a byte string. A failure value will use case 0 and the case body will be a CBOR list containing an integer and a text string. Alternatively, a success value will use case 1 and the body will be a single CBOR byte string.¶
Decoders that enforce a schema must check the case number is within the range of cases allowed, and that the case body follows the schema for the supplied case number. Generic decoders should allow any case number and any CBOR data value for the case body.¶
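The two decoder roles can be sketched as follows, working on an already decoded tag number and tag content. `unwrap_alternative` plays the part of a generic decoder and accepts any case number; `check_result` is a hypothetical schema check for the failure/success example (case 0: a list of an integer and a text string, case 1: a byte string). Both names are assumptions of this illustration.

```python
def unwrap_alternative(tag: int, content):
    """Generic step: recover (case number, case body) from tag and content."""
    if 121 <= tag <= 127:
        return tag - 121, content
    if 1280 <= tag <= 1400:
        return tag - 1280 + 7, content
    if tag == 101:
        if (not isinstance(content, list) or len(content) != 2
                or not isinstance(content[0], int) or isinstance(content[0], bool)
                or content[0] < 0):
            raise ValueError("tag 101 content must be [uint, any]")
        return content[0] + 128, content[1]
    raise ValueError("not one of the alternatives tags")


def check_result(case: int, body) -> None:
    """Schema-enforcing step for the failure/success example."""
    if case == 0:
        ok = (isinstance(body, list) and len(body) == 2
              and isinstance(body[0], int) and not isinstance(body[0], bool)
              and isinstance(body[1], str))
    elif case == 1:
        ok = isinstance(body, bytes)
    else:
        raise ValueError("case number outside the range allowed by the schema")
    if not ok:
        raise ValueError("case body does not match the schema for this case")


case, body = unwrap_alternative(122, b"\xff\x00")
check_result(case, body)   # passes silently: success case with a byte string
```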
CBOR has direct support for combinations of multiple values but not for alternatives of multiple values. Combinations are expressed in CBOR using lists or maps.¶
Most programming languages have a notion of data consisting of combinations of data values, often called records, structs or objects. Many programming languages also have a notion of data consisting of multiple alternative data values. For example, C has unions, and other languages have "tagged" unions (where it is always clear which alternative is in use).¶
Crucially for this set of tags, the set of alternatives must be closed and ordered. This allows encoding using an unsigned number to distinguish each case.¶
Note that this does not correspond to the notion in some programming languages of classes and subclasses since in that context the set of alternatives is open and unordered. Alternatives of this kind are well-supported by tag 27 "Serialized language-independent object with type name and constructor arguments".¶
In functional programming languages, the primary way of forming new data types is to enumerate a set of alternatives (each of which may be a record). Such forms of data are also supported in hybrid functional languages or languages with functional features.¶
Thus, in some applications, it is very common to have data making use of alternatives, and it is worth finding a compact encoding, at least for the common cases. Just as most records are small, most alternatives are also small.¶
In this specification we reserve 7 values in the 2-byte part of the available tag encoding space for alternatives 0..6, which are by far the most common. We reserve a range of 121 values in the 3-byte tag encoding space. To cover the general case, we use an encoding using a pair consisting of an unsigned integer and the case body, the first 24 of which also result in a 3-byte encoding.¶
To elaborate on the example from the introduction, we have a "result" that is a failure or success, where:¶
This corresponds to the following schema, in CDDL notation:¶
result = #6.121([int, text]) / #6.122(bytes)¶
Example values:¶
121([3, "the printer is on fire"])¶
122(h'ff00')¶
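For readers who want to see the bytes on the wire, the following Python sketch serializes the two example values. It is a deliberately small encoder covering only the types needed here; `Tagged`, `encode_head`, and `encode_item` are names invented for this illustration.

```python
from typing import Any, NamedTuple


class Tagged(NamedTuple):
    tag: int
    content: Any


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: initial byte plus the shortest argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")


def encode_item(item) -> bytes:
    """Encode tags, integers, byte strings, text strings, and arrays."""
    if isinstance(item, Tagged):
        return encode_head(6, item.tag) + encode_item(item.content)
    if isinstance(item, bool):
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(item, int):
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item
    if isinstance(item, str):
        data = item.encode("utf-8")
        return encode_head(3, len(data)) + data
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode_item(x) for x in item)
    raise TypeError(f"unsupported type: {type(item).__name__}")


failure = encode_item(Tagged(121, [3, "the printer is on fire"]))
success = encode_item(Tagged(122, bytes.fromhex("ff00")))

print(failure.hex())
print(success.hex())   # d87a42ff00
```

In both cases the alternatives tag itself costs two bytes (0xd8 followed by the tag number).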
As a second example, here is one based on a data type defined within the Haskell programming language, representing a simple expression tree.¶
   -- A data type representing simple arithmetic expressions
   data Expr = Lit Int        -- integer literal
             | Add Expr Expr  -- addition
             | Sub Expr Expr  -- subtraction
             | Neg Expr       -- unary negation
             | Mul Expr Expr  -- multiplication
             | Div Expr Expr  -- integer division¶
In CDDL notation, and using the tags in this specification, such data could be encoded using this schema:¶
   ; A data type representing simple arithmetic expressions
   expr = #6.121(int)           ; integer literal
        / #6.122([expr, expr])  ; addition
        / #6.123([expr, expr])  ; subtraction
        / #6.124(expr)          ; unary negation
        / #6.125([expr, expr])  ; multiplication
        / #6.126([expr, expr])  ; integer division¶
The present document registers tag numbers 65535, 4294967295, and 18446744073709551615 (16-bit 0xffff, 32-bit 0xffffffff, and 64-bit 0xffffffffffffffff) as Invalid Tags, tags that are always invalid, independent of the tag content provided. The purpose of these tag number registrations is to enable the tag numbers to be reserved for internal use by implementations to note the absence of a tag on a data item where a tag could also be expected with that data item as tag content.¶
The Invalid Tags are not intended to ever occur in interchanged CBOR data items. Generic CBOR decoder implementations are encouraged to raise an error if an Invalid Tag occurs in a CBOR data item even if there is no validity checking implemented otherwise.¶
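A minimal sketch of both uses, assuming a decoder that hands tag numbers to a checking function: `check_tag_number` rejects the Invalid Tags on input, while `NO_TAG` shows how an implementation might use one of them internally as an "absent" marker. All names here are hypothetical.

```python
INVALID_TAGS = frozenset({0xFFFF, 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFF})

# Internal sentinel: "this data item carries no tag". Never to be serialized.
NO_TAG = 0xFFFF


def check_tag_number(tag: int) -> int:
    """Raise an error if an Invalid Tag shows up in interchanged data."""
    if tag in INVALID_TAGS:
        raise ValueError(f"tag number {tag} is always invalid")
    return tag


print(sorted(INVALID_TAGS))  # [65535, 4294967295, 18446744073709551615]
check_tag_number(30)         # an ordinary tag number passes
```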
In the registry "CBOR Tags" [IANA.cbor-tags], IANA has allocated the first to third tag in Table 14 from the FCFS space, with the present document as the specification reference. IANA has allocated the tag in the next row, and is requested to allocate the tags in the next four rows, from the Specification Required space, with the present document as the specification reference.¶
Tag | Data Item | Semantics | Reference |
---|---|---|---|
65535 | (none valid) | always invalid | draft-bormann-cbor-notable-tags, Section 10.1 |
4294967295 | (none valid) | always invalid | draft-bormann-cbor-notable-tags, Section 10.1 |
18446744073709551615 | (none valid) | always invalid | draft-bormann-cbor-notable-tags, Section 10.1 |
63 | byte string | Encoded CBOR Sequence [RFC8742] | draft-bormann-cbor-notable-tags, Section 2.1 |
21065 | text string | I-Regexp | draft-bormann-cbor-notable-tags, Section 2.1; [I-D.draft-ietf-jsonpath-iregexp] |
18312 to 18540 (inclusive) | byte string | Bare Hash value (COSE algorithm -256 to -28) | draft-bormann-cbor-notable-tags, Section 3.1.1 |
18541 | array | [COSE algorithm identifier, Bare Hash value] | draft-bormann-cbor-notable-tags, Section 3.1.1 |
18542 to 18823 (inclusive) | byte string | Bare Hash value (COSE algorithm -26 to 255) | draft-bormann-cbor-notable-tags, Section 3.1.1 |
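The ranges in the last three rows of the table line up so that the tag number appears to be the COSE algorithm identifier plus 18568, with the slot that would belong to identifier -27 taken by the array form (tag 18541). The following Python sketch encodes that reading; it is an observation derived from the table rows, not a normative rule, and `bare_hash_tag` is a name made up for this illustration.

```python
BARE_HASH_OFFSET = 18568       # inferred: 18312 - (-256) == 18823 - 255
ARRAY_FORM_TAG = 18541         # [COSE algorithm identifier, Bare Hash value]


def bare_hash_tag(cose_alg: int) -> int:
    """Map a COSE algorithm identifier to its Bare Hash tag number."""
    if not -256 <= cose_alg <= 255:
        raise ValueError("outside the range covered by the table")
    tag = BARE_HASH_OFFSET + cose_alg
    if tag == ARRAY_FORM_TAG:
        raise ValueError("this slot is used by the array form (tag 18541)")
    return tag


print(bare_hash_tag(-256))  # 18312
print(bare_hash_tag(-28))   # 18540
print(bare_hash_tag(-26))   # 18542
print(bare_hash_tag(255))   # 18823
```

Identifiers outside -256..255 would presumably use the array form, which carries the algorithm identifier explicitly.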
In addition, IANA is requested to allocate the tags from Table 13, with a reference to the present document.¶
The security considerations of [STD94] apply; the tags discussed here may also have specific security considerations that are mentioned in their specific sections above.¶
(Many, TBD)¶
Peter Occil registered tags 30, 264, 265, 268–270 (Section 6.1), 38, 257, 266 and 267 (Section 7), and contributed much of the text about these tags in this document.¶
Further contributors will be listed here as text is added.¶
Please stay tuned.¶