Internet-Draft | SenML Data Content-Format Indication | September 2021 |
Keränen & Bormann | Expires 19 March 2022 |
The Sensor Measurement Lists (SenML) media type supports multiple types of values, from numbers to text strings and arbitrary binary data values. In order to facilitate processing of binary data values, this document specifies a pair of new SenML fields for indicating the Content-Format of those binary data values, i.e., their Internet media type including parameters as well as any Content-Coding applied.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 19 March 2022.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
The Sensor Measurement Lists (SenML) media types [RFC8428] can be used to send various kinds of data. In the example given in Figure 1, a temperature value, an indication whether a lock is open, and a data value (with SenML field "vd") read from an NFC reader are sent in a single SenML pack. The example is given in SenML JSON representation, so the "vd" (data value) field is encoded as a base64url string (without padding), as per Section 5 of [RFC8428].¶
The receiver is expected to know how to interpret the data in the "vd" field based on the context, e.g., name of the data source and out-of-band knowledge of the application. However, this context may not always be easily available to entities processing the SenML pack. To facilitate automatic interpretation, it is useful to be able to indicate an Internet media type and content-coding right in the SenML Record. The CoAP Content-Format (Section 12.3 of [RFC7252]) provides this information in the form of a single unsigned integer; enclosing a Content-Format number (in this case number 60 as defined for content-type application/cbor in [RFC8949]) in the Record is illustrated in Figure 2. All registered CoAP Content-Format numbers are listed in the CoAP Content-Formats registry [IANA.core-parameters] as specified by Section 12.3 of [RFC7252].¶
In this example SenML Record, the data value contains a string "foo" and a number 42 encoded in a CBOR [RFC8949] array. Since the example above uses the JSON format of SenML, the data value containing the binary CBOR value is base64-encoded (Section 5 of [RFC4648]). The data value after base64 decoding is shown with CBOR diagnostic notation in Figure 3.¶
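The encoding described above can be checked with a short Python sketch. The CBOR bytes are written out by hand from the description (a two-element array holding the text string "foo" and the unsigned integer 42). The Record name "nfc-reader" and the helper function names are illustrative assumptions, not text taken from the figures.

```python
import base64
import json


def b64url_encode_nopad(data: bytes) -> str:
    """Base64url-encode and drop the trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode_nopad(text: str) -> bytes:
    """Restore the padding that SenML JSON omits, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


# CBOR for ["foo", 42], encoded by hand:
#   0x82           array of 2 items
#   0x63 66 6f 6f  text string of length 3, "foo"
#   0x18 0x2a      unsigned integer 42
CBOR_FOO_42 = bytes([0x82, 0x63, 0x66, 0x6F, 0x6F, 0x18, 0x2A])

# Hypothetical Record: "ct" carries Content-Format number 60 as a string.
record = {
    "n": "nfc-reader",
    "vd": b64url_encode_nopad(CBOR_FOO_42),
    "ct": "60",
}

if __name__ == "__main__":
    print(json.dumps(record))
```

A receiver would reverse the steps: read "ct", decode "vd" with `b64url_decode_nopad`, and hand the resulting bytes to a decoder for the indicated format.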
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Media Type: A registered label for representations (byte strings) prepared for interchange, identified by a Media-Type-Name [RFC1590], [RFC6838].¶
Media-Type-Name: A combination of a type-name and a subtype-name registered in [IANA.media-types] as per [RFC6838], conventionally identified by the two names separated by a slash.¶
Content-Type: A Media-Type-Name, optionally associated with parameters (Section 5 of [RFC2045], separated from the media type name and from each other by a semicolon). In HTTP and many other protocols, used in a Content-Type header field.¶
Content-Coding: A name registered in the HTTP Content Coding registry [IANA.http-parameters] as specified by Section 8.5 of [RFC7230], indicating an encoding transformation with semantics further specified in Section 3.1.2.1 of [RFC7231]. Confusingly, in HTTP the Content-Coding is found in a header field called "Content-Encoding"; however, "Content-Coding" is the correct term.¶
Content-Format: The combination of a Content-Type and a Content-Coding, identified by (1) a numeric identifier defined in the CoAP Content-Formats registry [IANA.core-parameters] as per Section 12.3 of [RFC7252] (referred to as Content-Format number), or (2) a Content-Format-String.¶
Content-Format-String: The string representation of the combination of a Content-Type and a Content-Coding.¶
Content-Format-Spec: The string representation of a Content-Format; either a Content-Format-String or the (decimal) string representation of a Content-Format number.¶
Readers should also be familiar with the terms and concepts discussed in [RFC8428].¶
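When a SenML Record contains a Data Value field ("vd"), the Record MAY also include a Content-Format indication field, using label "ct". The value of this field is a Content-Format-Spec, i.e., either a CoAP Content-Format number in decimal form or a Content-Format-String containing a Content-Type and, optionally, a Content-Coding.¶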
The syntax of this field is formally defined in Section 6.¶
The CoAP Content-Format number provides a simple and efficient way to indicate the type of the data. Since some Internet media types and their content coding and parameter alternatives do not have assigned CoAP Content-Format numbers, using Content-Type and Content-Coding is also allowed. Both methods use a string value in the "ct" field to keep its data type consistent across uses. When the "ct" field contains only digits, it is interpreted as a CoAP Content-Format identifier.¶
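The digits-only rule can be illustrated with a minimal Python sketch. The function name `classify_ct` and the returned tags are hypothetical. The sketch only distinguishes the two forms and does not validate the value against the formal syntax.

```python
from typing import Tuple, Union


def classify_ct(value: str) -> Tuple[str, Union[int, str]]:
    """Classify a "ct"/"bct" value as a number or a Content-Format-String.

    A value made up only of ASCII digits is read as a CoAP
    Content-Format number; anything else is kept as a string.
    """
    if value.isascii() and value.isdigit():
        return ("number", int(value))
    return ("string", value)


if __name__ == "__main__":
    for ct in ("60", "image/png"):
        print(ct, "->", classify_ct(ct))
```

The `isascii()` check is there because `str.isdigit()` alone also accepts non-ASCII digit characters, which would not be valid in this field.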
To indicate that a Content-Coding is used with a Content-Type, the Content-Coding value is appended to the Content-Type value (media type and parameters, if any), separated by a "@" sign. For example (using a Content-Coding value of "deflate" as defined in Section 4.2.2 of [RFC7230]):¶
text/plain; charset=utf-8@deflate¶
If no "@" sign is present after the media type and parameters, then no Content-Coding has been specified, and the "identity" Content-Coding is used -- no encoding transformation is employed.¶
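As a rough sketch, a Content-Format-String can be split into its Content-Type and Content-Coding parts as follows. The function name is hypothetical, and the sketch assumes that "@" does not occur inside a parameter value; a complete implementation would parse according to the formal syntax instead.

```python
from typing import Tuple


def split_content_format_string(value: str) -> Tuple[str, str]:
    """Return (content_type, content_coding) for a Content-Format-String.

    The Content-Coding is whatever follows the last "@" sign.
    Without an "@" sign, the "identity" Content-Coding applies.
    Assumption: "@" never appears inside a parameter value.
    """
    content_type, sep, coding = value.rpartition("@")
    if not sep:
        # rpartition found no "@": the whole value is the Content-Type.
        return (value.strip(), "identity")
    return (content_type.strip(), coding.strip())


if __name__ == "__main__":
    print(split_content_format_string("text/plain; charset=utf-8@deflate"))
    print(split_content_format_string("image/png"))
```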
The Base Content-Format Field, label "bct", provides a default value for the Content-Format Field (label "ct") within its range. The range of the base field includes the Record containing it, up to (but not including) the next Record containing a "bct" field, if any, or up to the end of the pack otherwise. Resolution (Section 4.6 of [RFC8428]) of this base field is performed by adding its value with the label "ct" to all Records in this range that carry a "vd" field but do not already contain a Content-Format ("ct") field.¶
Figure 4 shows a variation of Figure 2 with multiple records, with the "nfc-reader" records resolving to the base field value "60" and the "iris-photo" record overriding this with the "image/png" media type (actual data left out for brevity).¶
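The resolution rule for "bct" can be sketched in Python as shown below. The function name `resolve_bct` and the sample pack are hypothetical; the "vd" values are placeholders. The sketch handles only the "bct" field and assumes that, as with other SenML base fields, "bct" itself is dropped from the resolved Records.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def resolve_bct(pack: List[Record]) -> List[Record]:
    """Apply the Base Content-Format ("bct") default within its range."""
    resolved: List[Record] = []
    base_ct: Optional[str] = None
    for original in pack:
        record = dict(original)  # leave the caller's pack untouched
        if "bct" in record:
            # A new "bct" starts a new range, including this Record.
            # Assumption: the base field is removed once resolved.
            base_ct = record.pop("bct")
        if base_ct is not None and "vd" in record and "ct" not in record:
            record["ct"] = base_ct
        resolved.append(record)
    return resolved


if __name__ == "__main__":
    sample_pack = [
        {"n": "nfc-reader", "vd": "gmNmb28YKg", "bct": "60"},
        {"n": "nfc-reader", "vd": "gmNmb28YKg"},
        {"n": "iris-photo", "vd": "AA", "ct": "image/png"},
        {"n": "temperature", "v": 7.1},
    ]
    for rec in resolve_bct(sample_pack):
        print(rec)
```

In this sample, both "nfc-reader" Records resolve to "ct" value "60", the "iris-photo" Record keeps its own "image/png" value, and the Record without a "vd" field gets no "ct" field.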
The following examples are valid values for the "ct" and "bct" fields (explanations in parentheses):¶
This specification provides a formal definition of the syntax of Content-Format-Spec strings using ABNF notation [RFC5234], which contains three new rules and a number of rules collected and adapted from various RFCs [RFC7231] [RFC6838] [RFC5234] [RFC8866].¶
The indication of a media type in the data does not exempt a consuming application from properly checking its inputs. Also, the ability for an attacker to supply crafted SenML data that specify media types chosen by the attacker may expose vulnerabilities of handlers for these media types to the attacker. This includes "decompression bombs", compressed data that is crafted to decompress to extremely large data items.¶
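One way a consumer might guard against decompression bombs is to cap the decompressed size. The Python sketch below does this for the "deflate" Content-Coding, which uses the zlib data format. The function name `bounded_inflate` and the limit passed by the caller are illustrative assumptions; a suitable limit depends on the application.

```python
import zlib


def bounded_inflate(data: bytes, limit: int) -> bytes:
    """Decompress zlib-format data, refusing output larger than `limit`."""
    decompressor = zlib.decompressobj()
    # Ask for at most one byte more than allowed: receiving that extra
    # byte shows the limit would be exceeded, without inflating the rest.
    output = decompressor.decompress(data, limit + 1)
    if len(output) > limit:
        raise ValueError("decompressed data exceeds the configured limit")
    if not decompressor.eof:
        raise ValueError("compressed data is truncated")
    return output


if __name__ == "__main__":
    print(bounded_inflate(zlib.compress(b"hello"), 1024))
    try:
        bounded_inflate(zlib.compress(b"\x00" * 1_000_000), 1024)
    except ValueError as err:
        print("rejected:", err)
```

The size cap addresses only this one risk; the decompressed bytes still need the usual input checks of the handler for the indicated media type.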
(Note to RFC Editor: Please replace all occurrences of "RFC-AAAA" with the RFC number of this specification and remove this note.)¶
IANA is requested to assign new labels in the "SenML Labels" subregistry of the SenML registry [IANA.senml] (as defined in Section 12.2 of [RFC8428]) for the Content-Format indication as per Table 1:¶
Name | Label | JSON Type | XML Type | Reference |
---|---|---|---|---|
Base Content-Format | bct | String | string | RFC-AAAA |
Content-Format | ct | String | string | RFC-AAAA |
The authors would like to thank Sérgio Abreu for the discussions leading to the design of this extension and Isaac Rivera for reviews and feedback. Klaus Hartke suggested not burdening this draft with a separate mandatory-to-implement version of the fields. Alexey Melnikov, Jim Schaad, and Thomas Fossati provided helpful comments at Working-Group last call. Marco Tiloca asked for clarifying and using the term Content-Format-Spec.¶