Payload Working Group                                      M. Westerlund
Internet-Draft                                                   Ericsson
Intended status: Informational                              July 11, 2011
Expires: January 12, 2012
How to Write an RTP Payload Format
draft-ietf-payload-rtp-howto-01
This document contains information on how to best write an RTP payload format. It provides reading tips, design practices, and practical tips on how to produce an RTP payload format specification quickly and with good results. A template is also included with instructions that can be used when writing an RTP payload format.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 12, 2012.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
RTP [RFC3550] payload formats define how a specific real-time data format is structured in the payload of an RTP packet. A real-time data format without a payload format specification can't be transported using RTP. This gives many individuals and organizations with media encoders or other types of real-time data an interest in defining RTP payload formats. However, specifying a well-designed RTP payload format is non-trivial and requires knowledge of both RTP and the real-time data format.
This document is intended to help any author of an RTP payload format make important design decisions, consider important features of RTP and RTP security, etc. The document is also intended to be a good starting point for any person with little experience in the IETF and/or RTP to learn the necessary steps.
This document extends and updates the information that is available in "Guidelines for Writers of RTP Payload Format Specifications" [RFC2736]. Since that RFC was written, further experience has been gained on the design and specification of RTP payload formats. Several new RTP profiles and robustness tools have been defined, and these need to be considered.
We also discuss the possible venues for defining an RTP payload format: in the IETF, in other standards bodies, and as proprietary formats.
This document has several different parts discussing different aspects of the creation of an RTP payload format specification. Section 3 discusses the preparations the author(s) should make before starting to write a specification. Section 4 discusses the different processes used when specifying and completing a payload format, with a focus on working inside the IETF. Section 5 discusses the design of payload formats themselves in detail. Section 6 discusses current design trends and provides good examples of practices that should be followed when applicable. Following that, Section 7 discusses important sections of the RTP payload format specification itself, such as security and IANA considerations. This document ends with an appendix containing a template that can be used when writing RTP payload formats.
RTP is a complex real-time media delivery framework, and it has many details that need to be considered when writing an RTP payload format. It is also important to have a good understanding of the media codec/format so that all of its important features and properties are considered. Only when one has sufficient understanding of both parts can one produce an RTP payload format of high quality. On top of this, one needs to understand the process within the IETF, and especially within the Working Group responsible for standardizing payload formats (currently PAYLOAD), to go quickly from the initial idea stage to a finished RFC. This and the next section help an author prepare in those regards.
The following sub-sections list a number of documents. Not all need to be read in full detail. However, an author basically needs to be aware of everything listed below.
Newcomers to the IETF are strongly recommended to read the "Tao of the IETF" [RFC4677] that goes through most things that one needs to know about the IETF. This contains information about history, organisational structure, how the WG and meetings work and many more details.
The main part of the IETF process is formally defined in RFC 2026 [RFC2026]. In addition, an author needs to understand the IETF rules and rights associated with copyright and IPR documented in BCP 78 [RFC5378] and BCP 79 [RFC3979]. RFC 2418 [RFC2418] describes the WG process, the relation between the IESG and the WG, and the responsibilities of WG chairs and participants.
It is important to note that the RFC series contains documents of several different categories: standards track, informational, experimental, best current practice (BCP), and historic. The standards track contains documents of three different maturity levels: Proposed Standard, Draft Standard, and Internet Standard. A standards track document must start as a Proposed Standard; after interoperability of all of its features has been demonstrated, it can be advanced to Draft Standard; and finally, when further experience has been gathered and it has been widely deployed, it can be advanced to Internet Standard.
As the content of a given RFC is not allowed to change once published, the only way to modify an RFC is to write and publish a new one that either updates or replaces the old one. Therefore, whether reading or referencing an RFC, it is important both to consider the Category field in the document header and to check whether the RFC is the latest on the subject and still valid. One way of checking the current status of an RFC is to use the RFC Editor's RFC search engine, which displays the current status and which RFCs, if any, update or obsolete it. The RFC Editor search engine will also indicate whether any errata exist for the RFC. Approved errata describe issues of significant importance in the RFC and thus should be known even before an updated or replacement RFC is published.
Before starting to write a draft one should also read the Internet Draft writing guidelines (http://www.ietf.org/ietf/1id-guidelines.txt), the ID checklist (http://www.ietf.org/ID-Checklist.html) and the RFC editorial guidelines and procedures [RFC-ED]. Another document that can be useful is the "Guide for Internet Standards Writers" [RFC2360].
There are also a number of documents to consider in the process of writing drafts intended to become RFCs. These are important when writing certain types of text.
The recommended reading for RTP consists of several different parts: design guidelines, the RTP protocol, profiles, robustness tools, and media-specific recommendations.
Any author of RTP payload formats should start by reading RFC 2736 [RFC2736] which contains an introduction to the application layer framing (ALF) principle, the channel characteristics of IP channels, and design guidelines for RTP payload formats. The goal of ALF is to be able to transmit Application Data Units (ADUs) that are independently usable by the receiver in individual RTP packets, thus minimizing dependencies between RTP packets and the effects of packet loss.
Then it is advisable to learn more about the RTP protocol by studying the RTP specification RFC 3550 [RFC3550] and the existing profiles. As a complement to the standards documents, there is a book entirely dedicated to RTP [CSP-RTP]. There exist several profiles for RTP today, but all are based on the "RTP Profile for Audio and Video Conferences with Minimal Control" (RFC 3551) [RFC3551] (abbreviated AVP). The other profiles that one should know about are Secure RTP (RTP/SAVP) [RFC3711], "Extended RTP Profile for RTCP-based Feedback (RTP/AVPF)" [RFC4585] and "Extended Secure RTP Profile for RTCP-based Feedback (RTP/SAVPF)" [RFC5124]. It is important to understand RTP and the AVP profile in detail. For the other profiles it is sufficient to have an understanding of what functionality they provide and the limitations they create.
A number of robustness tools have been developed for RTP. The tools are for different use cases and real-time requirements.
There has also been both discussion and design of RTP payload formats, e.g., AMR and AMR-WB [RFC4867], supporting the unequal error detection provided by UDP-Lite [RFC3828]. The idea is that by not having a checksum over part of the RTP payload, one can allow bit errors from the lower layers. By allowing bit errors, one can increase the efficiency of some link layers and also avoid unnecessary discarding of data when the payload and media codec can get at least some benefit from the data. The main issue is that one has no idea of the level of bit errors present in the unprotected part of the payload. This makes it hard or impossible to determine whether one can design something usable. Payload format designers are advised not to consider features for unequal error detection unless very clear requirements exist.
There also exist some management and monitoring extensions.
A number of transport optimizations have also been developed for use in certain environments. They are all intended to be transparent and do not require special consideration by the RTP payload format writer. Thus they are primarily listed here for informational reasons and do not require deeper studies.
There exist a couple of different security mechanisms that may be used with RTP. Generic mechanisms are by definition transparent to the RTP payload format and do not need special consideration by the format designer. The main reason that different solutions exist is that different applications have different requirements, and thus different solutions have been developed. For more discussion on this, please see [I-D.ietf-avt-srtp-not-mandatory]. The main properties of an RTP security mechanism are to provide confidentiality for the RTP payload, integrity protection to detect manipulation of payload and headers, and source authentication. Not all mechanisms provide all of these features, a point which will need to be considered when one of these mechanisms is used.
This section does not remove the necessity to read up on RTP. However it does point out a few important details to remember when designing a payload format.
The definition of the RTP session from RFC 3550 is:
"An association among a set of participants communicating with RTP. A participant may be involved in multiple RTP sessions at the same time. In a multimedia session, each medium is typically carried in a separate RTP session with its own RTCP packets unless the encoding itself multiplexes multiple media into a single data stream. A participant distinguishes multiple RTP sessions by reception of different sessions using different pairs of destination transport addresses, where a pair of transport addresses comprises one network address plus a pair of ports for RTP and RTCP. All participants in an RTP session may share a common destination transport address pair, as in the case of IP multicast, or the pairs may be different for each participant, as in the case of individual unicast network addresses and port pairs. In the unicast case, a participant may receive from all other participants in the session using the same pair of ports, or may use a distinct pair of ports for each."
"The distinguishing feature of an RTP session is that each session maintains a full, separate space of SSRC identifiers (defined next). The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC (also defined below) or in RTCP. For example, consider a three-party conference implemented using unicast UDP with each participant receiving from the other two on separate port pairs. If each participant sends RTCP feedback about data received from one other participant only back to that participant, then the conference is composed of three separate point-to-point RTP sessions. If each participant provides RTCP feedback about its reception of one other participant to both of the other participants, then the conference is composed of one multi-party RTP session. The latter case simulates the behavior that would occur with IP multicast communication among the three participants."
"The RTP framework allows the variations defined here (RFC3550), but a particular control protocol or application design will usually impose constraints on these variations."
The RTP header contains a number of fields. Two fields always require additional specification by the RTP payload format, namely the RTP timestamp and the marker bit. Certain RTP payload formats also use the RTP sequence number to realize certain functionalities. The payload type is used to indicate which payload format is used. The Synchronization Source identifier (SSRC) is used to distinguish RTP packets from multiple senders. Finally, [RFC5285] specifies how to extend the RTP header to carry metadata relating to the payload when this is desirable.
The remaining fields do not commonly influence the RTP payload format. The padding bit is worth clarifying as it indicates that one or more bytes are appended after the RTP payload. This padding must be removed by a receiver before payload format processing can occur. Thus it is completely separate from any padding that may occur within the payload format itself.
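To make the header fields and the receiver-side padding removal concrete, the following is a minimal Python sketch of parsing the RFC 3550 fixed header. It assumes a complete packet held in a `bytes` object; the names `parse_rtp` and `RtpPacket` are hypothetical, and the header extension is skipped rather than interpreted.

```python
import struct
from dataclasses import dataclass, field

@dataclass
class RtpPacket:
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: list = field(default_factory=list)
    payload: bytes = b""

def parse_rtp(data: bytes) -> RtpPacket:
    """Parse the RTP fixed header and strip RTP-level padding (sketch)."""
    if len(data) < 12:
        raise ValueError("shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count
    if len(data) < offset:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{csrc_count}I", data[12:offset]))
    if has_extension:
        # Skip the extension: 16-bit profile field, 16-bit length in 32-bit words.
        if len(data) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", data[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(data)
    if has_padding:
        # The last octet gives the number of padding octets, counting itself.
        pad_len = data[-1]
        if pad_len == 0:
            raise ValueError("invalid padding length")
        end -= pad_len
    if end < offset:
        raise ValueError("header, extension or padding exceeds packet length")
    return RtpPacket(bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc, csrcs, data[offset:end])
```

Only `data[offset:end]` is handed to payload format processing, so any padding defined inside the payload format itself has to be handled separately by the depacketizer.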
RTP has three multiplexing points that are used for different purposes. A proper understanding of these is important to utilize them correctly.
The first one is separation of media streams of different types, which is accomplished using different RTP sessions. For example, in the common multimedia session with audio and video, audio and video are carried in separate RTP sessions. To achieve this separation, transport-level functionalities are used, normally UDP port numbers. Different RTP sessions are also used to realize layered scalability, as this allows a receiver to select one or more layers for multicast RTP sessions simply by joining the multicast groups over which the desired layers are transported. This separation also allows different Quality of Service (QoS) to be applied to different media types.
The next multiplexing point is separation of different sources within an RTP session. Here RTP uses the SSRC to identify individual sources. An example of individual sources in an audio RTP session would be different microphones, independently of whether they are connected to the same host or different hosts. For each SSRC a unique RTP sequence number and timestamp space is used.
The third multiplexing point is the RTP header payload type field. The payload type identifies what format the content in the RTP payload has. This includes different payload format configurations, different codecs, and also usage of robustness mechanisms like the one described in RFC 2198 [RFC2198].
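The two multiplexing points inside an RTP session can be sketched as follows. This is an illustrative Python sketch, not a prescribed receiver design: it assumes packets have already been separated by RTP session (by the transport address they arrived on), and the class `SessionDemux` and its payload type map are hypothetical.

```python
from collections import defaultdict

class SessionDemux:
    """Demultiplexing within one RTP session (sketch).

    The SSRC selects the source; the payload type selects the payload format.
    """

    def __init__(self, pt_map):
        # Hypothetical mapping of payload type number -> payload format name,
        # e.g. as learned from SDP a=rtpmap lines.
        self.pt_map = pt_map
        # Each SSRC has its own sequence number and timestamp space, so
        # packets are queued per source and never mixed across sources.
        self.sources = defaultdict(list)

    def receive(self, ssrc, payload_type, seq, payload):
        fmt = self.pt_map.get(payload_type)
        if fmt is None:
            return None  # unknown payload type: discard
        self.sources[ssrc].append((seq, fmt, payload))
        return fmt
```

In this sketch a redundancy format such as RFC 2198 simply appears as another entry in the payload type map, which is why it can be switched on and off per packet.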
There are several types of synchronization and we will here describe how RTP handles the different types:
The first step in inter-media synchronization is to determine which SSRCs in each session should be synchronized with each other. This is accomplished by comparing the CNAME fields in the RTCP SDES packets. SSRCs with the same CNAME in different RTP sessions should be synchronized.
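A small Python sketch of this grouping step follows. It assumes the CNAMEs have already been extracted from RTCP SDES packets into a per-session mapping of SSRC to CNAME; the function name `sync_groups` and the input layout are hypothetical.

```python
from collections import defaultdict

def sync_groups(sdes_by_session):
    """Group SSRCs that share a CNAME across RTP sessions (sketch).

    sdes_by_session: {session_name: {ssrc: cname}}
    Returns {cname: [(session_name, ssrc), ...]} for CNAMEs seen in two or
    more sessions, i.e. the streams that should be synchronized.
    """
    groups = defaultdict(list)
    for session, mapping in sdes_by_session.items():
        for ssrc, cname in mapping.items():
            groups[cname].append((session, ssrc))
    return {cname: members for cname, members in groups.items()
            if len({session for session, _ in members}) > 1}
```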
The actual RTCP mechanism for inter-media synchronization is based on the idea that each media stream provides a position on the media-specific time line (measured in RTP timestamp ticks) and a common reference time line. The common reference time line is expressed in RTCP as a wall clock time in the Network Time Protocol (NTP) format. It is important to notice that the wall clock time is not required to be synchronized between hosts, for example by using NTP [RFC5905]. It can even have nothing at all to do with the actual time; for example, the host system's uptime can be used for this purpose. The important factor is that all media streams from a particular source that are being synchronized use the same reference clock to derive their relative RTP timestamp time scales.
Figure 1 illustrates how, if one receives RTCP Sender Report (SR) packet P1 in one media stream and RTCP SR packet P2 in the other session, one can calculate the corresponding RTP timestamp values for any arbitrary point in time T. However, this also requires knowing the RTP timestamp rate of each medium currently used in the sessions.
   TS1  --+---------------+------->
          |               |
          P1              |
          |               |
   NTP  --+-----+---------T------>
                |         |
                P2        |
                |         |
   TS2  --------+---------+---X-->

   Figure 1: RTCP synchronization
Assume that medium 1 uses an RTP timestamp clock rate of 16 kHz and medium 2 uses a clock rate of 90 kHz. Then TS1 and TS2 for point T can be calculated as follows, with NTP times expressed in seconds:

   TS1(T) = TS1(P1) + 16000 * (NTP(T) - NTP(P1))
   TS2(T) = TS2(P2) + 90000 * (NTP(T) - NTP(P2))

This calculation is useful as it allows the implementation to generate a common synchronization point for which all time values are provided (TS1(T), TS2(T) and T). So when one wishes to calculate the NTP time that the timestamp value present in packet X corresponds to, one can do that in the following way:

   NTP(X) = NTP(T) + (TS2(X) - TS2(T)) / 90000
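The same arithmetic can be sketched in Python. This is an illustration under stated assumptions, not a reference implementation: NTP times are handled as floating-point seconds, the function names are hypothetical, and the handling of 32-bit timestamp wrap-around is an addition that the text above does not discuss.

```python
def ntp64_to_seconds(seconds: int, fraction: int) -> float:
    """Convert the 64-bit NTP format (32.32 fixed point) used in RTCP SRs to seconds."""
    return seconds + fraction / 2**32

def rtp_ts_at(ts_ref: int, ntp_ref: float, ntp_t: float, clock_rate: int) -> int:
    """TS(T) = TS(P) + rate * (NTP(T) - NTP(P)), wrapped to the 32-bit RTP field."""
    return (ts_ref + round(clock_rate * (ntp_t - ntp_ref))) % 2**32

def ntp_at(ts_x: int, ts_t: int, ntp_t: float, clock_rate: int) -> float:
    """NTP(X) = NTP(T) + (TS(X) - TS(T)) / rate, using a signed 32-bit
    difference so that a timestamp wrap-around between T and X is handled."""
    delta = (ts_x - ts_t) % 2**32
    if delta >= 2**31:
        delta -= 2**32
    return ntp_t + delta / clock_rate
```

For example, with an SR for medium 1 at TS1 = 1000 and NTP = 100.0 s, `rtp_ts_at(1000, 100.0, 100.5, 16000)` gives TS1(T) = 9000 for T = 100.5 s.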
Improved signaling for layered codecs and fast tune-in has been specified in Rapid Synchronization for RTP flows [RFC6051].
RTP payload formats are used in the context of application signalling protocols such as SIP [RFC3261] using the Session Description Protocol (SDP) [RFC4566] with Offer/Answer [RFC3264], RTSP [RFC2326] or SAP [RFC2974]. These examples all use out-of-band signalling to indicate which and how many media streams are desired to be used in the session and how they are configured. To be able to declare or negotiate the media format and RTP payload packetization, the payload format must be given an identifier. In addition to the identifier, many payload formats also need to signal further configuration information for the RTP payloads out-of-band, prior to the media transport session.
The above examples of session-establishing protocols all use SDP, but other session description formats may be used. For example there was discussion of a new XML-based session description format within IETF (SDP-NG). In the event, the proposal did not get beyond the initial protocol specification because of the enormous embedded base of SDP implementations. However, to avoid locking the usage of RTP to SDP based out-of-band signalling, the payload formats are identified using a separate definition format for the identifier and associated parameters. That format is the Media Type.
Media types [RFC4288] are identifiers originally created for identifying media formats included in email. In this usage they were known as MIME types, where the expansion of the MIME acronym includes the word "mail". The term "media type" was introduced to reflect a broader usage, which includes HTTP [RFC2616], MSRP [RFC4975] and many other protocols, to identify arbitrary content carried within the protocols. Media types also provide a media hierarchy that fits RTP payload formats well. Media type names are two-part and consist of content type and sub-type separated with a slash, e.g. "audio/PCMA" or "video/h263-2000". It is important to choose the correct content-type when creating the media type identifying an RTP payload format. However in most cases there is little doubt what content type the format belongs to. Guidelines for choosing the correct media type and registration rules for media type names are provided in RFC 4288 [RFC4288]. The additional rules for media types for RTP payload formats are provided in RFC 4855 [RFC4855].
Media types are allowed any number of parameters, which may be required or optional for that media type. They are always specified in the form "name=value". From the media type's perspective there are no restrictions on how the value is defined, except that parameters must have a value. However, the usage of media types in SDP etc. has resulted in the following restrictions that need to be followed to make media types usable for identifying RTP payload formats:
Since SDP [RFC4566] is so commonly used as an out-of-band signalling protocol, a mapping of the media type into SDP exists. The details on how to map the media type and its parameters into SDP are described in RFC 4855 [RFC4855]. However, this mapping alone does not explain how certain parameters must be interpreted, for example in the context of Offer/Answer negotiation [RFC3264].
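A minimal Python sketch of the basic RFC 4855 mapping follows: the subtype together with the "rate" and optional "channels" parameters goes into "a=rtpmap", and all remaining parameters go into "a=fmtp" as a semicolon-separated list. The function name, the port number 49170 and the default profile are placeholder assumptions, and the sketch deliberately ignores the per-format Offer/Answer semantics discussed next.

```python
def media_type_to_sdp(media_type, params, pt, port=49170, proto="RTP/AVP"):
    """Map a media type and its parameters to SDP lines (sketch).

    'rate' and optional 'channels' go into a=rtpmap; all remaining
    parameters go into a=fmtp as a semicolon-separated list.
    """
    top_level, subtype = media_type.split("/", 1)
    remaining = dict(params)
    rate = remaining.pop("rate")
    channels = remaining.pop("channels", None)
    encoding = f"{subtype}/{rate}" + (f"/{channels}" if channels is not None else "")
    lines = [f"m={top_level} {port} {proto} {pt}", f"a=rtpmap:{pt} {encoding}"]
    if remaining:
        fmtp = "; ".join(f"{name}={value}" for name, value in remaining.items())
        lines.append(f"a=fmtp:{pt} {fmtp}")
    return lines
```

For "audio/AMR" with rate 8000, one channel, and the parameters octet-align=1 and mode-set=0,2,5,7, this yields "a=rtpmap:97 AMR/8000/1" and "a=fmtp:97 octet-align=1; mode-set=0,2,5,7" for payload type 97.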
The Offer/Answer (O/A) model allows SIP to negotiate which media formats and payload formats are to be used in a session and how they are to be configured. However O/A does not define a default behavior and instead points out the need to define how parameters behave. To make things even more complex the direction of media within a session has an impact on these rules, so that some cases may require separate descriptions for media streams that are send-only, receive-only or both sent and received as identified by the SDP attributes a=sendonly, a=recvonly, and a=sendrecv. In addition the usage of multicast adds further limitations as the same media stream is delivered to all participants. If those multicast-imposed restrictions are too limiting for unicast then separate rules for unicast and multicast will be required.
The simplest and most common O/A interpretation is that a parameter is defined to be declarative; i.e. the SDP offer/answer sending agent can declare a value and that has no direct impact on the other agent's values. This declared value applies to all media that are going to be sent to the declaring entity. For example most video codecs have a level parameter which tells the other participants the highest complexity the video decoder supports. The level parameter can be declared independently by two participants in a unicast session as it will be the media sender's responsibility to transmit a video stream that fulfills the limitation the other has declared. However in multicast it will be necessary to send a stream that follows the limitation of the weakest receiver, i.e. the one that supports the lowest level. To simplify the negotiation in these cases it is common to require any answerer to a multicast session to take a yes or no approach to parameters.
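The unicast and multicast cases can be contrasted with a short Python sketch. It assumes, purely for illustration, that a level is a comparable integer where higher means more capable; real level parameters (for example within H.264's profile-level-id) need format-specific comparison rules. Both function names are hypothetical.

```python
def multicast_level(declared_levels):
    """Highest level a multicast sender may use: every receiver must be able
    to decode the single stream, so the weakest (lowest) declaration wins."""
    if not declared_levels:
        raise ValueError("no receiver declarations")
    return min(declared_levels)

def multicast_answer(offered_level, supported_level):
    """Yes/no answer: keep the offered value unchanged if it can be decoded,
    otherwise return None, i.e. remove the payload type from the answer."""
    return offered_level if offered_level <= supported_level else None
```

In the unicast case no such aggregation is needed: each sender only has to respect the single level declared by its peer.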
A "negotiated" parameter is a different case, for which both sides need to agree on its value. Such a parameter requires that the answerer either accept it as it is offered or remove the payload type the parameter belonged to from its answer. The removal of the payload type from the answer indicates to the offerer the lack of support for the parameter values presented. An unfortunate implication of the need to use complete payload types to indicate each possible configuration, so as to maximize the chances of achieving interoperability, is that the number of necessary payload types can quickly grow large. This is one reason to limit the total number of sets of capabilities that may be implemented.
The most problematic type of parameters are those that relate to the media the entity sends. They do not really fit the O/A model but can be shoe-horned in. Examples of such parameters can be found in the H.264 video codec's payload format [RFC6184], where the names of all parameters with this property start with "sprop-". The issue with these parameters is that they declare properties for a media stream that the other party may not accept. The best one can make of the situation is to explain the assumption that the other party will accept, for the media it will receive, the same parameter value as the offerer of the session has proposed. If the answerer needs to change any declarative parameter relating to streams it will receive, then the offerer may be required to make a new offer to update the parameter values for its outgoing media stream.
Another issue to consider is sendonly media streams in offers. Parameters that relate to what the answering entity accepts to receive have no meaning other than to provide a template for the answer. It is worth pointing out in the specification that these really provide a set of parameter values that the sender recommends. Note that sendonly streams in answers will need to indicate the offerer's parameters to ensure that the offerer can match the answer to the offer.
A further issue with offer/answer which complicates things is that the answerer is allowed to renumber the payload types between offer and answer. This is not recommended but allowed for support of gateways to the ITU conferencing suite. This means that it must be possible to bind answers for payload types to the payload types in the offer even when the payload type number has been changed, and some of the proposed payload types have been removed. This binding must normally be done by matching the configurations originally offered against those in the answer.
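Binding by configuration rather than by number can be sketched in Python as below. This is a simplified illustration: each payload type is reduced to a hypothetical tuple of encoding name, clock rate and a frozen set of fmtp parameters, and configurations must match exactly, whereas a real implementation has to apply the payload format's own rules for which parameters must be equal.

```python
def bind_answer_to_offer(offer, answer):
    """Map answer payload types to offer payload types by configuration (sketch).

    offer, answer: {pt: (encoding_name, clock_rate, frozenset of (name, value))}
    Returns {answer_pt: offer_pt}; answer entries with no matching offered
    configuration are left unbound.
    """
    def key(cfg):
        name, rate, fmtp = cfg
        return (name.lower(), rate, fmtp)  # encoding names are case-insensitive

    offered = {key(cfg): pt for pt, cfg in offer.items()}
    binding = {}
    for pt, cfg in answer.items():
        offer_pt = offered.get(key(cfg))
        if offer_pt is not None:
            binding[pt] = offer_pt
    return binding
```

For example, if payload types 96 and 97 are offered as H264/90000 with packetization-mode=1 and packetization-mode=0 respectively, an answer containing only payload type 100 with packetization-mode=1 binds to 96, although both the number changed and 97 was removed.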
SAP (Session Announcement Protocol) [RFC2974] is used for announcing multicast sessions. Independently of the usage of Source Specific Multicast (SSM) [RFC3569] or Any-Source Multicast (ASM), the SDP provided by SAP applies to all participants. All media that is sent to the session must follow the media stream definition as specified by the SDP. This enables everyone to receive the session if they support the configuration. Here SDP provides a one-way channel with no possibility to affect the configuration that the session creator has decided upon. Any RTP payload format that requires parameters for the send direction, and which needs individual values per implementation or instance, will fail in a SAP-announced multicast session that allows anyone to send.
Real-Time Streaming Protocol (RTSP) [RFC2326] allows the negotiation of transport parameters for media streams which are part of a streaming session between a server and client. RTSP separates the transport parameters from the media configuration. SDP is commonly used for media configuration in RTSP and is sent to the client prior to session establishment, either through use of the DESCRIBE method or by means of an out-of-band channel like HTTP, email etc. The SDP is used to determine which media streams and what formats are being used prior to session establishment.
Thus both SAP and RTSP use SDP to configure receivers and senders with a predetermined configuration for a media stream including the payload format and any of its parameters. All parameters are used in a declarative fashion. This can result in different treatment of parameters between offer/answer and declarative usage in RTSP and SAP. Any such difference will need to be spelled out by the payload format specification.
The general channel characteristics that RTP flows experience are documented in Section 3 of RFC2736 [RFC2736]. The discussion below provides additional information.
At the time of writing this document, the most common IP Maximum Transmission Unit (MTU) in link layers in use is 1500 bytes (Ethernet data payload). However, there exist both links with smaller MTUs and links with much larger MTUs. Certain parts of the Internet already today support an IP MTU of 9000 bytes or more. There is a slow ongoing evolution towards larger MTU sizes. This should be considered in the design, especially in regards to features such as aggregation of independently decodable data units.
This section discusses the recommended process to produce an RTP payload format in the described venues. This is to document the best current practice on how to get a well designed and specified payload format as quickly as possible. For specifications that are defined by standards bodies other than the IETF, the primary milestone is registration of the RTP payload format name. For proprietary media formats, the primary goal depends on whether interoperability is desired at the RTP level. However, there is also the issue of ensuring the best possible quality of any specification.
For all standardized media formats, it is recommended that the payload format be specified in the IETF. The main reason is to provide an openly available RTP payload format specification that has been reviewed by people experienced with RTP payload formats. At the time of writing, this work is done in the PAYLOAD Working Group (WG), but that may change in the future.
There are a number of steps that an RTP payload format should go through from the initial idea until it is published. This also documents the process that the PAYLOAD Working Group applies when working with RTP payload formats.
WG meetings are for discussing issues, not presentations. This means that most RTP payload formats should never need to be discussed in a WG meeting. RTP payload formats that would be discussed are either those with controversial issues that failed to be resolved on the mailing list, or those including new design concepts worth a general discussion.
There exists no requirement to present or discuss a draft at a WG meeting before it becomes published as an RFC. Thus even authors who lack the possibility to go to WG meetings should be able to successfully specify an RTP payload format in IETF. WG meetings may become necessary only if the draft gets stuck in a serious debate that cannot easily be resolved.
To simplify the work of the PAYLOAD WG chairs and its WG members, a specific draft file naming convention shall be used for RTP payload formats. Individual submissions shall be named draft-<lead author family name>-payload-rtp-<descriptive name>-<version>. The WG documents shall be named according to this template: draft-ietf-payload-rtp-<descriptive name>-<version>. The inclusion of "payload" in the draft filename ensures that the search for "payload-" will find all PAYLOAD related drafts. Inclusion of "rtp" tells us that it is an RTP payload format draft. The descriptive name should be as short as possible while still describing what the payload format is for. It is recommended to use the media format or codec acronym. Please note that the version must start at 00 and is increased by one for each submission of the draft to the IETF secretariat. No version numbers may be skipped.
There are a number of ways to lose a lot of time in the above process. This section discusses what to do and what to avoid.
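The naming convention can be checked mechanically. The following Python sketch is one possible interpretation, assuming lowercase names, a family name without hyphens, and a two-digit version; the regular expression and the function names are hypothetical, not an official IETF tool.

```python
import re

# draft-<owner>-payload-rtp-<descriptive name>-<two-digit version>,
# where <owner> is "ietf" for WG documents or the lead author's family name.
DRAFT_NAME = re.compile(
    r"^draft-(?P<owner>[a-z0-9]+)-payload-rtp-"
    r"(?P<descr>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<version>\d{2})$"
)

def classify_draft(name):
    """Return (kind, descriptive name, version) or None if the name does not conform."""
    m = DRAFT_NAME.match(name)
    if m is None:
        return None
    kind = "wg" if m["owner"] == "ietf" else "individual"
    return kind, m["descr"], int(m["version"])

def next_version(current):
    """Versions increase by exactly one per submission; none may be skipped."""
    return f"{current + 1:02d}"
```

Applied to this document's own name, `classify_draft("draft-ietf-payload-rtp-howto-01")` returns `("wg", "howto", 1)`, and the next submission would be version `next_version(1) == "02"`.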
Other standards bodies may define RTP payload formats in their own specifications. When they do this, they are strongly recommended to contact the PAYLOAD WG chairs and request review of the work. It is recommended that at least two review steps are performed. The first should be early in the process, when more fundamental issues can be easily resolved without abandoning a lot of effort. The second should be scheduled when nearing completion, but while it is still possible to update the specification. In that pass the quality can be assessed, and hopefully no updates will be needed. Using this procedure can avoid both conflicting definitions and serious mistakes, such as breaking certain aspects of the RTP model.
RTP payload Media Types may be registered in the standards tree by other standards bodies. The requirements on the organization are outlined in the media types registration documents (RFC 4855 [RFC4855] and RFC 4288 [RFC4288]). This registration requires a request to the IESG, which ensures that the filled-in registration template is acceptable. To avoid last-minute problems with these registrations, the registration template must be sent for review both to the PAYLOAD WG and to the media types list (ietf-types@iana.org). This review should be included in the IETF reviews of the payload format specification.
Registration of the RTP payload name is required to avoid name collisions in the future. Note that "x-" names are not suitable for any documented format, as they have the same name collision problem and cannot be registered. The list of already registered media types can be found at the IANA Web site (http://www.iana.org).
Proprietary RTP payload formats are commonly specified when the real-time media format is proprietary and not intended to be part of any standardized system. However, there are reasons why proprietary formats should also be correctly documented and registered:
To avoid name collisions, there is a central registry keeping track of the registered Media Type names used by different RTP payload formats. Proprietary formats should be registered in the vendor's own tree. All vendor-specific registrations use subtype names that start with "vnd.<vendor-name>". Names in the vendor's own tree are not required to be registered with IANA. However, registration is recommended if the Media Type is used at all in public environments.
If interoperability at the RTP level is desired, a payload format specification should be standardized in the IETF following the process described above. The IETF does not require full disclosure of the codec when defining an RTP payload format to carry that codec, but a description must be provided that is sufficient to allow the IETF to judge whether the payload format is well designed. The Media Type identifier assigned to a standardized payload format of this sort will lie in the standards tree rather than the vendor tree.
The best summary of payload format design is KISS (Keep It Simple, Stupid). A simple payload format is easier to review for correctness, easier to implement, and has low complexity. Unfortunately, contradictory requirements sometimes make it hard to do things simply. Complexity issues and problems that occur for RTP payload formats are:
There are a number of common features in RTP payload formats. There is no general requirement to support these features; instead, their applicability must be considered for each payload format. Certain features may in fact not even be applicable.
Aggregation allows for the inclusion of multiple application data units (ADUs) within the same RTP payload. This is commonly supported for codecs that produce ADUs of sizes smaller than the IP MTU. Do remember that the MTU may be significantly larger than 1500 bytes: an MTU of 9000 bytes is available today, and an MTU of 64 KB may be available in the future. Many speech codecs produce ADUs of a few fixed sizes. Video encoders generally produce ADUs of quite flexible sizes, so the need for aggregation may be smaller. However, in certain use cases the ability to aggregate multiple ADUs, especially ADUs with different playback times, is useful.
The main disadvantage of aggregation is the extra delay introduced (due to buffering until a sufficient number of ADUs have been collected at the sender) and reduced robustness against packet loss. Aggregation also introduces buffering requirements at the receiver.
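To make the MTU and delay trade-off concrete, the Python sketch below greedily packs ADUs into RTP payloads. It assumes minimal IPv4, UDP, and RTP headers (20 + 8 + 12 bytes) and a hypothetical one-byte table-of-contents entry per ADU; real payload formats define their own per-ADU overhead.

```python
IP_UDP_RTP_OVERHEAD = 20 + 8 + 12  # minimal IPv4 + UDP + RTP headers, in bytes


def aggregate(adu_sizes, mtu=1500, per_adu_header=1):
    """Greedily pack ADUs (given by size in bytes) into RTP payloads.

    per_adu_header models a hypothetical one-byte table-of-contents entry
    per ADU; real payload formats define their own header layout.
    """
    budget = mtu - IP_UDP_RTP_OVERHEAD
    packets, current, used = [], [], 0
    for size in adu_sizes:
        cost = size + per_adu_header
        if cost > budget:
            raise ValueError(f"ADU of {size} bytes does not fit; fragmentation needed")
        if used + cost > budget:
            # Current payload is full: emit it and start a new one.
            packets.append(current)
            current, used = [], 0
        current.append(size)
        used += cost
    if current:
        packets.append(current)
    return packets


if __name__ == "__main__":
    frames = [32] * 100  # 100 hypothetical 32-byte speech frames
    print([len(p) for p in aggregate(frames)])            # [44, 44, 12]
    print([len(p) for p in aggregate(frames, mtu=9000)])  # [100]
```

If each of those frames carried 20 ms of speech (an assumed frame duration), filling one 44-frame payload would mean buffering 880 ms at the sender. This illustrates why the number of aggregated ADUs usually needs a delay limit as well as an MTU limit.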
If the real-time media format has the property that it may produce ADUs that are larger than common MTU sizes, then fragmentation support should be considered. An RTP payload format can always fall back on IP fragmentation; however, as discussed in RFC 2736, this has some drawbacks. Using RTP payload format-level fragmentation allows for more efficient usage of RTP packet loss recovery mechanisms. It may in some cases also allow earlier usage of partial ADUs by doing media-specific fragmentation at media-specific boundaries.
Interleaving has been implemented in a number of payload formats to allow for less quality reduction when packet loss occurs. When losses are bursty and several consecutive packets are lost, the impact on quality can be quite severe. Interleaving is used to convert such a burst loss into several spread-out individual packet losses. It can also be used when several ADUs are aggregated in the same packet. The loss of an RTP packet with several ADUs in the payload has the same effect as a burst loss would have if the ADUs had been transmitted in individual packets. To reduce the burstiness of the loss, the data present in an aggregated payload may be interleaved, thus spreading the loss over a longer time period.
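A minimal sketch of payload-level fragmentation follows. It assumes a hypothetical one-byte fragment header carrying only a start flag and an end flag, similar in spirit to the S and E bits of the H.264 fragmentation units; the receiver uses RTP sequence numbers to detect a missing fragment.

```python
START, END = 0x80, 0x40  # hypothetical start/end flags in a 1-byte fragment header


def fragment(adu: bytes, max_payload: int) -> list:
    """Split one ADU into RTP payloads of at most max_payload bytes each."""
    chunk = max_payload - 1  # leave room for the 1-byte fragment header
    if chunk <= 0:
        raise ValueError("max_payload must exceed the fragment header size")
    pieces = [adu[i:i + chunk] for i in range(0, len(adu), chunk)] or [b""]
    payloads = []
    for idx, piece in enumerate(pieces):
        flags = (START if idx == 0 else 0) | (END if idx == len(pieces) - 1 else 0)
        payloads.append(bytes([flags]) + piece)
    return payloads


def reassemble(received: list):
    """received: (rtp_seq, payload) pairs for one ADU, in sequence order.

    Returns the ADU, or None if a fragment is missing. A media-aware receiver
    could instead pass the intact leading fragments to the decoder.
    """
    if not received:
        return None
    seqs = [seq for seq, _ in received]
    contiguous = all((b - a) % 65536 == 1 for a, b in zip(seqs, seqs[1:]))
    first, last = received[0][1], received[-1][1]
    if not contiguous or not first[0] & START or not last[0] & END:
        return None
    return b"".join(payload[1:] for _, payload in received)


if __name__ == "__main__":
    adu = bytes(range(256)) * 10                        # a 2560-byte ADU
    frags = fragment(adu, 1000)
    print(len(frags))                                   # 3
    print(reassemble(list(zip([65534, 65535, 0], frags))) == adu)  # True (seq wraps)
    print(reassemble([(65534, frags[0]), (0, frags[2])]))          # None: middle lost
```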
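The effect of interleaving aggregated ADUs can be sketched in a few lines of Python. This is an illustration under simple assumptions (round-robin distribution, ADUs identified by playout index); it omits the index signalling a real interleaved payload format needs so that the receiver can restore playout order.

```python
def interleave(adus: list, per_packet: int) -> list:
    """Spread consecutive ADUs round-robin over ceil(len/per_packet) packets."""
    n_packets = -(-len(adus) // per_packet)  # ceiling division
    packets = [[] for _ in range(n_packets)]
    for i, adu in enumerate(adus):
        packets[i % n_packets].append(adu)
    return packets


def longest_consecutive_loss(lost: set, total: int) -> int:
    """Length of the longest run of consecutive lost ADU indices."""
    run = best = 0
    for i in range(total):
        run = run + 1 if i in lost else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    adus = list(range(12))                       # ADU indices in playout order
    plain = [adus[i:i + 3] for i in range(0, 12, 3)]
    mixed = interleave(adus, 3)
    print(mixed)  # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
    # Losing the second packet in each scheme:
    print(longest_consecutive_loss(set(plain[1]), 12))  # 3 (ADUs 3, 4, 5)
    print(longest_consecutive_loss(set(mixed[1]), 12))  # 1 (ADUs 1, 5, 9)
```

Note that the first interleaved packet carries ADU 8, so the sender must buffer the whole group of twelve ADUs before transmitting anything, which is the delay cost of interleaving.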
A requirement for doing interleaving within an RTP payload format is the aggregation of multiple ADUs. For formats that do not use aggregation, there is still the possibility of implementing a transmission order re-scheduling mechanism. This has the effect that packets transmitted consecutively originate from different points in the media stream. This can be used to mitigate burst losses, which may be useful if one transmits packets at frequent intervals. It may also be used to transmit more significant data earlier, in combination with RTP retransmission, to allow for more graceful degradation and an increased possibility of receiving the most important data, e.g., intra frames of video.
The drawback of interleaving is the significantly increased transmission buffering delay, making it less useful for low-delay applications. It may also create significant buffering requirements on the receiver. That buffering is also problematic, as it is usually difficult to indicate when a receiver may start consuming data and still avoid buffer underrun caused by the interleaving mechanism itself. Transmission re-scheduling is only useful in a few specific cases, as in streaming with retransmissions. The potential gains must be weighed against the complexity of these schemes.
A few RTP payload formats have implemented back channels within the media format. Those have been for specific features, like the AMR [RFC4867] codec mode request (CMR) field. The CMR field is used in the operation of gateways to circuit-switched voice to allow an IP terminal to react to the circuit-switched network's need for a specific encoder mode. A common motivation for media back channels is the need to have signalling in direct relation to the media or the media path.
If back channels are considered for an RTP payload format, they should address specific requirements that cannot easily be satisfied by more generic mechanisms within RTP or RTCP.
Some codecs support various types of media scalability, i.e. some data of a media stream may be removed to adapt the media's properties, such as bitrate and quality. The adaptation may be applied in the following dimensions of the media:
At the time of writing this document, codecs that support scalability are experiencing something of a revival. It has been realized that getting the functionality required to support the features of such media streams into the RTP framework is quite challenging. One of the recent examples of layered and scalable codecs is Scalable Video Coding [RFC6190] (SVC).
SVC is a good example of a payload format supporting media scalability features, which in their basic form are already supported in RTP. A layered codec supports the dropping of parts of a media stream, i.e., RTP packets may not be transmitted or forwarded to a client, in order to adapt the media stream rate as well as the media stream quality while still providing a decodable subset of the media stream to the client. One example of using the scalability feature is an RTP mixer (Multipoint Control Unit) that controls the rate and quality sent to participants in a conversation by dropping RTP packets. Another example is a transport channel that allows for differentiation in Quality of Service (QoS) parameters based on RTP sessions in a multicast session. In such a case, the more important packets of the scalable media stream (base layer) may get better QoS parameters than the less important packets (enhancement layer), in order to provide some kind of graceful degradation. The scalability features required for allowing an adaptive transport as described in the two examples above are based on RTP multiplexing, in order to identify the packets to be dropped or transmitted/forwarded. The multiplexing features defined for Scalable Video Coding [RFC6190] are:
In the first case (SST), additional in-band as well as out-of-band signaling is required to allow identification of packets belonging to a specific media layer. Furthermore, adapting the media stream requires dropping specific packets in order to provide the client with a compliant media stream. When encryption is used, an adapting network device typically needs to be in the security context to be able to drop packets and provide an intact RTP session to the client. This typically requires the network device to be an RTP mixer.
In general, having a media-unaware network device drop excess packets will be more problematic than having a Media Aware Network Entity (MANE) do it. First, there is the need to understand the media format and know which ADUs or payloads belong to layers that no other layer will depend on after the dropping. Second, if the MANE can act as an RTP mixer or translator, it can rewrite the RTP and RTCP in such a way that the receiver will not suspect unintentional RTP packet losses needing repair actions. This matters because the receiver cannot determine whether a lost packet was an important base layer packet or one of the less important extension layers.
In the second case (MST), the out-of-band signaling typically provides enough information to identify the media layers and their properties. The decision to drop packets is based on the network address, which identifies the RTP session to be dropped. To allow correct data provision to a decoder after reception from different sessions, data re-alignment mechanisms are described for Scalable Video Coding [RFC6190]. A more generic mechanism is described in Rapid Sync for RTP flows [RFC6051], which is purely based on existing RTP mechanisms, i.e., the NTP timestamp, for inter-session synchronization. Another signaling feature is the generic indication of dependencies between RTP sessions in SDP, as defined in the Media Decoding Dependency Grouping in SDP [RFC5583].
When QoS settings, e.g., Diffserv markings, are used to ensure that the extension layers are dropped before the base layer, the receiving endpoint has the benefit with MST of knowing which layer or set of layers the missing packets belong to, as each layer is bound to a different RTP session. This explicitly indicates the importance of the loss.
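The sequence number rewriting a MANE acting as a translator can perform is sketched below. This is a hypothetical illustration, not a mechanism from any specification: each packet's layer is assumed to be known from a hypothetical layer_id, packets are assumed to arrive in order, and the corresponding RTCP rewriting is omitted. Subtracting only the count of intentionally dropped packets keeps gaps from genuine upstream losses visible to the receiver.

```python
def thin_stream(packets: list, max_layer: int) -> list:
    """Drop packets above max_layer and renumber RTP sequence numbers.

    packets: (rtp_seq, layer_id, payload) tuples in sequence order; layer_id
    is a hypothetical value the MANE is assumed to read from the payload.
    Only intentionally dropped packets are subtracted, so any gap caused by
    genuine upstream loss stays visible to the receiver.
    """
    dropped = 0
    out = []
    for seq, layer, payload in packets:
        if layer > max_layer:
            dropped += 1
            continue
        out.append(((seq - dropped) % 65536, layer, payload))
    return out


if __name__ == "__main__":
    pkts = [(65534, 0, b"a"), (65535, 1, b"b"), (0, 0, b"c"), (1, 1, b"d"), (2, 0, b"e")]
    print(thin_stream(pkts, 0))  # [(65534, 0, b'a'), (65535, 0, b'c'), (0, 0, b'e')]
```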
Some media codecs require high packet rates, and in these cases the RTP sequence number wraps too quickly. As a rule of thumb, it must not be possible to wrap the sequence number space in less than 2 minutes (the TCP maximum segment lifetime). If earlier wrapping may occur, then the payload format should specify an extended sequence number field to allow the receiver to determine where a specific payload belongs in the sequence, even in the face of extensive reordering. The RTP payload format for uncompressed video [RFC4175] can be used as an example of such a field.
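The 2-minute rule of thumb translates into a maximum packet rate of 65536 / 120, or roughly 546 packets per second. The Python sketch below performs that check; the bit-rate and payload size used in the example are hypothetical, and header overhead is ignored when deriving the packet rate.

```python
SEQ_SPACE = 1 << 16         # 16-bit RTP sequence number
MIN_WRAP_SECONDS = 2 * 60   # rule of thumb: 2 minutes (TCP maximum segment lifetime)


def packet_rate(bitrate_bps: float, payload_bytes: int) -> float:
    """Packets per second, ignoring header overhead."""
    return bitrate_bps / (8 * payload_bytes)


def wrap_time(packets_per_second: float) -> float:
    """Seconds until the sequence number space wraps."""
    return SEQ_SPACE / packets_per_second


def needs_extended_seq(packets_per_second: float) -> bool:
    return wrap_time(packets_per_second) < MIN_WRAP_SECONDS


if __name__ == "__main__":
    print(wrap_time(50))                       # 1310.72 s for one packet per 20 ms
    video = packet_rate(10_000_000, 1200)      # a hypothetical 10 Mbit/s stream
    print(round(wrap_time(video), 1), needs_extended_seq(video))  # 62.9 True
```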
This section provides a few examples of payload formats that are worth noting for good design in general or specific details of their design.
The AMR [RFC4867], AMR-WB [RFC4867], EVRC [RFC3558], and SMV [RFC3558] payload formats are all quite similar. They are all for frame-based audio codecs and use a table of contents structure. Each frame has a table of contents entry that indicates the type of the frame and whether additional frames are present. This is quite flexible, but produces unnecessary overhead if the ADUs are of fixed size, or if aggregated ADUs are commonly of the same type. In that case a solution like that in AMR-WB+ [RFC4352] may be more suitable.
AMR-WB+ does contain one less desirable feature, which depends on the media codec itself. The media codec produces a large range of different frame lengths in terms of time. The RTP timestamp rate is selected to have the very unusual value of 72 kHz, despite the fact that the output is normally at a sample rate of 48 kHz. The 72 kHz timestamp rate is the smallest value found that gives every frame length the codec can produce an integer length in RTP timestamp ticks. This way, a receiver can always correctly place the frames in relation to any other frame, even when the frame length changes. The downside is that, for certain frame lengths, the decoder output in fact contains partial samples. As a result, the number of output samples from the codec varies from frame to frame, potentially making implementation more difficult.
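The search for such a timestamp rate can be reasoned about with exact fractions: a frame lasting p/q seconds (in lowest terms) is a whole number of ticks at rate R exactly when q divides R, so the smallest usable rate is the least common multiple of the denominators. The Python sketch below applies this; the frame durations are hypothetical and are not the actual AMR-WB+ frame lengths.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def smallest_timestamp_rate(durations) -> int:
    """Smallest integer clock rate (Hz) giving every duration a whole number of ticks.

    A reduced fraction p/q seconds is an integer number of ticks at rate R
    exactly when q divides R, so the answer is the LCM of the denominators.
    """
    return lcm(*(Fraction(d).denominator for d in durations))


def integer_ticks(durations, rate: int) -> bool:
    return all((Fraction(d) * rate).denominator == 1 for d in durations)


if __name__ == "__main__":
    frames = [Fraction(1, 45), Fraction(1, 48)]  # hypothetical frame durations (s)
    print(integer_ticks(frames, 48000))          # False: 48000/45 is not an integer
    print(smallest_timestamp_rate(frames))       # 720
    print(integer_ticks(frames, 72000))          # True: 72000 is a multiple of 720
```

Any multiple of the minimum rate also satisfies the condition; Fraction is used instead of floating point so that durations such as 1/45 s are represented exactly.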
The RTP payload format for MIDI [RFC6295] contains some interesting features. MIDI is an audio format sensitive to packet losses, as the loss of a "note off" command will result in a note being stuck in an "on" state. To counter this a recovery journal is defined that provides a summarized state that allows the receiver to recover from packet losses quickly. It also uses RTCP and the reported highest sequence number to be able to prune the state the recovery journal needs to contain. These features appear limited in applicability to media formats that are highly stateful and primarily use symbolic media representations.
The definition of RTP payload formats for video has seen an evolution from the early ones such as H.261 towards the latest for VC-1 and H.264.
The H.264 RTP payload format [RFC3984] can be seen as a smorgasbord of functionality, some of it, such as the interleaving, being pretty advanced. The reason for this was to ensure that the majority of applications considered by the ITU-T and MPEG that can be supported by RTP are indeed supported. This has created a payload format that is rarely fully implemented. Despite that, no major interoperability issues have been reported, with one exception, namely the offer/answer and parameter signalling, which resulted in a revised specification [RFC6184]. However, complaints about its complexity are common.
The RTP payload format for uncompressed video [RFC4175] must be mentioned in this context as it contains a special feature not commonly seen in RTP payload formats. Due to the high bit-rate and thus packet rate of uncompressed video (gigabits rather than megabits) the payload format includes a field to extend the RTP sequence number since the normal 16-bit one can wrap in less than a second. [RFC4175] also specifies a registry of different color sub-samplings that can be re-used in other video RTP payload formats.
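As a rough illustration with hypothetical numbers, a 1 Gbit/s stream sent in 1400-byte payloads needs about 89,000 packets per second, so the 16-bit sequence number wraps in roughly 0.73 s. As we understand RFC 4175, its payload header begins with a 16-bit Extended Sequence Number carrying the high-order bits of a 32-bit sequence number, while the RTP header's sequence number supplies the low-order 16 bits. The Python sketch below shows that split and recombination; the rest of the RFC 4175 payload header is omitted.

```python
import struct


def split_seq(n: int):
    """Sender side: split a 32-bit counter into (RTP sequence number, extended field)."""
    n &= 0xFFFFFFFF
    return n & 0xFFFF, n >> 16


def full_seq(rtp_seq: int, payload: bytes) -> int:
    """Receiver side: combine the RTP sequence number with the 16-bit extended
    field read from the first two payload header bytes (network byte order)."""
    (ext,) = struct.unpack_from("!H", payload, 0)
    return (ext << 16) | rtp_seq


if __name__ == "__main__":
    n = 2 * 65536                            # the 16-bit RTP field has wrapped twice
    rtp_seq, ext = split_seq(n)
    header = struct.pack("!H", ext)          # rest of the payload header omitted
    print(rtp_seq, ext)                      # 0 2
    print(full_seq(rtp_seq, header) == n)    # True
```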
It would be overstating things to say that there exists a trend in text payload formats, as only a single text format has been standardized in the IETF, namely T.140 [RFC4103]. The 3GPP Timed Text format [RFC4396] could be considered to be text, even though it was in the end registered as a video format. It was registered in that part of the tree because it deals with decorated text, usable for subtitles and other embellishments of video. However, it has many of the properties that text formats generally have.
The RTP payload format for T.140 was designed with high reliability in mind, as real-time text is commonly an extremely low bit-rate application. Thus, it recommends the use of RFC 2198 with many generations of redundancy. However, the format failed to provide a text-block-specific sequence number and instead relies on the RTP one to detect loss. This makes detection of missing text blocks unnecessarily difficult and hinders deployment with other robustness mechanisms that would involve switching the payload type, as that may result in erroneous error marking in the T.140 text stream.
A number of sections in the payload format draft need some special consideration. These include the Security and IANA Considerations sections.
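To show how generations of redundancy are carried, the Python sketch below builds and parses an RFC 2198 redundant payload. Each redundant block has a 4-byte header (F bit set, 7-bit block payload type, 14-bit timestamp offset, 10-bit block length), and the primary block has a 1-byte header (F bit clear, payload type). The payload type 98 and the timestamp offsets in the example are hypothetical.

```python
import struct


def build_red(primary_pt: int, primary: bytes, redundant: list) -> bytes:
    """Build an RFC 2198 redundant payload.

    redundant: (payload_type, timestamp_offset, data) tuples, oldest first.
    Each redundant block gets a 4-byte header: F=1 (1 bit), block PT (7 bits),
    timestamp offset (14 bits), block length (10 bits). The primary block gets
    a 1-byte header: F=0 and its PT.
    """
    headers = b""
    for pt, ts_offset, data in redundant:
        if not (0 <= pt < 128 and 0 <= ts_offset < 1 << 14 and len(data) < 1 << 10):
            raise ValueError("field out of range for the RFC 2198 header")
        headers += struct.pack("!I", (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data))
    headers += bytes([primary_pt & 0x7F])
    return headers + b"".join(data for _, _, data in redundant) + primary


def parse_red(payload: bytes) -> list:
    """Return (payload_type, timestamp_offset, data) for every block, primary last."""
    blocks, pos = [], 0
    while payload[pos] & 0x80:  # F bit set: this is a 4-byte redundant-block header
        (word,) = struct.unpack_from("!I", payload, pos)
        blocks.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    result = []
    for pt, ts_offset, length in blocks:
        result.append((pt, ts_offset, payload[pos:pos + length]))
        pos += length
    result.append((primary_pt, 0, payload[pos:]))
    return result


if __name__ == "__main__":
    pkt = build_red(98, b"c", [(98, 600, b"a"), (98, 300, b"b")])
    print(len(pkt))        # 12
    print(parse_red(pkt))  # [(98, 600, b'a'), (98, 300, b'b'), (98, 0, b'c')]
```

The timestamp offsets let a receiver place recovered blocks in time, but nothing in these headers numbers the text blocks themselves; after a burst loss, the receiver has to infer from RTP sequence numbers how many text blocks are missing, which is the weakness described above.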
The intention of this section is to enable reviewers and other readers to get an overview of the capabilities and major properties of the media format. It should be kept short and concise and is not a complete replacement for reading the media format specification.
All Internet drafts require a Security Considerations section. The security considerations section in an RTP payload format needs to concentrate on the security properties this particular format has. Some payload formats have very few specific issues or properties and can fully fall back on the security considerations for RTP in general and those of the profile being used. Because those documents are always applicable, a reference to these is normally placed first in the security considerations section. There is suggested text in the template below.
The security issues of confidentiality, integrity protection, and source authentication are common issues for all payload formats. These should be solved by mechanisms external to the payload and do not need any special consideration in the payload format except for a reminder about these issues. Suitable stock text to inform people about this is included in the template.
Potential security issues with an RTP payload format and the media encoding that need to be considered are:
Suitable stock text for the security considerations section is provided in the template in the appendix. However, authors do need to actively consider any security issues from the start. Failure to address these issues may block approval and publication.
RTP and its profiles do discuss congestion control. Congestion control is an important issue in any usage on non-dedicated networks. For that reason, it is recommended that all RTP payload format documents discuss the possibilities that exist to regulate the bit-rate of transmissions using the described RTP payload format. Some formats may have limited or stepwise regulation of bit-rate. Such limiting factors should be discussed.
Since all RTP payload formats contain a Media Type specification, they also need an IANA Considerations section. The Media Type name must be registered, and this is done by requesting that IANA register that media name. When that registration request is written, it shall also be requested that the media type is included in the "RTP Payload Format media types" list in the RTP registry (http://www.iana.org/assignments/rtp-parameters).
In addition to the above request for media type registration, some payload formats may have parameters for which new values will need to be added in the future. In these cases a registry for that parameter must be created. This is done by defining the registry in the IANA Considerations section. BCP 26 (RFC 5226) [RFC5226] provides guidelines for specifying such registries. Care should be taken when defining the policy for new registrations.
Before specifying a new registry it is worth checking the existing ones in the IANA "MIME Media Type Sub-Parameter Registries" list. For example video formats needing a media parameter expressing color sub-sampling may be able to reuse those defined for video/raw [RFC4175].
This section provides information on, and recommends, some tools that may be used. Don't feel pressured to follow these recommendations; a number of alternatives exist. But these suggestions are worth checking out before deciding that the grass is greener somewhere else.
There are many choices of tools for authoring Internet drafts. However, in the end they need to be able to produce a draft that conforms to the Internet-Draft requirements. If you don't have any previous experience with authoring Internet drafts, XML2RFC does have some advantages. It helps by creating a lot of the necessary boilerplate in accordance with the latest rules, thus reducing the effort. It also speeds up publication after approval, as the RFC Editor can use the source XML document to produce the RFC more quickly.
Another common choice is to use Microsoft Word and a suitable template (see [RFC5385]) to produce the draft, and to print that to file using the generic text printer. This has some advantages when it comes to spell checking and change bars. However, Word may also cause some problems, like changing formatting, and inconsistent results between what one sees in the editor and in the generated text document, at least according to the authors' personal experience.
There are a few tools that are very good to know about when writing a draft. These help check and verify parts of one's work. These tools can be found at http://tools.ietf.org.
This document currently has a few open issues that need resolving before publication:
This document makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an RFC.
As this is an informational document about writing drafts that are intended to become RFCs there are no direct security considerations. However the document does discuss the writing of security considerations sections and what should be particularly considered when specifying RTP payload formats.
The author would like to thank Tom Taylor for the editing pass of the whole document and for contributing text regarding proprietary RTP payload formats. Thanks also go to Thomas Schierl, who contributed text regarding media scalability features in payload formats.
The author would like to thank the individuals who have provided input to this document. These individuals include John Lazzaro, Ali C. Begen and Tom Taylor.
This section contains a template for writing an RTP payload format in the form of an Internet draft. Text within [...] consists of instructions and must be removed. Some of the included text proposals are conditional. "..." is used to indicate where further text should be written.
[The title shall be descriptive but as compact as possible. RTP is an allowed and recommended abbreviation in the title.]
RTP Payload format for ...
Status of this Memo
[Insert the IPR notice and copyright boilerplate from BCP 78 and 79 that applies to this draft.]
[Insert the current Internet Draft document explanation. At the time of publishing it was:]
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
[A payload format abstract should mention the capabilities of the format, which media format it is used for, and a little about that codec format's capabilities. Any abbreviation used in the payload format must be spelled out here, except very well-known ones like RTP. No references are allowed, and no use of RFC 2119 language either.]
[All drafts over 15 pages in length must have a Table of Contents.]
[The introduction should provide a background and overview of the payload format's capabilities. No normative language in this section, i.e., no MUSTs, SHOULDs, etc.]
[Define conventions, definitions, and acronyms used in the document in this section. The most common definitions used in RTP payload formats are the RFC 2119 definitions of the upper-case normative words, e.g., MUST and SHOULD.]
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
[The intention of this section is to enable reviewers and other readers to get an overview of the capabilities and major properties of the media format. It should be kept short and concise and is not a complete replacement for reading the media format specification.]
[Overview of payload structure]
[RTP header usage needs to be defined. The fields that absolutely need to be defined are the timestamp and the marker bit. Further fields may be specified if used. All the rest should be left to their RTP specification definition.]
The remaining RTP header fields are used as specified in RFC 3550.
[Define how the payload header, if it exists, is structured and used.]
[The payload data, i.e. what the media codec has produced. Commonly done through reference to media codec specification which defines how the data is structured. Rules for padding may need to be defined to bring data to octet alignment.]
[One or more examples are good to help ease the understanding of the RTP payload format.]
[This section is to describe the possibility of varying the bit-rate in response to congestion. Below is also a proposal for an initial text that references the RTP and profile definitions of congestion control.]
Congestion control for RTP SHALL be used in accordance with RFC 3550 [RFC3550], and with any applicable RTP profile; e.g., RFC 3551 [RFC3551]. An additional requirement if best-effort service is being used is: users of this payload format MUST monitor packet loss to ensure that the packet loss rate is within acceptable parameters.
This RTP payload format is identified using the ... media type which is registered in accordance with RFC 4855 [RFC4855] and using the template of RFC 4288 [RFC4288].
[Here the media type registration template from RFC 4288 is placed and filled out. This template is provided with some common RTP boilerplate.]
Type name:
Subtype name:
Required parameters:
Optional parameters:
Encoding considerations:
This media type is framed and binary, see section 4.8 in RFC4288 [RFC4288].
Security considerations:
Please see the security considerations in RFCXXXX
Interoperability considerations:
Published specification:
Applications that use this media type:
Additional information:
Magic number(s):
[Only applicable for media types that have a file format specification. Remove otherwise.]
File extension(s):
[Only applicable for media types that have a file format specification. Remove otherwise.]
Macintosh file type code(s):
[Only applicable for media types that have a file format specification. Remove otherwise.]
Person & email address to contact for further information:
Intended usage: (One of COMMON, LIMITED USE or OBSOLETE.)
Restrictions on usage:
[The text below is for media types that are only defined for RTP payload formats. There exist certain media types that are defined both as RTP payload formats and for file transfer. The rules for such types are documented in RFC 4855 [RFC4855].]
This media type depends on RTP framing, and hence is only defined for transfer via RTP [RFC3550]. Transport within other framing protocols is not defined at this time.
Author:
Change controller:
IETF Audio/Video Transport working group delegated from the IESG.
(Any other information that the author deems interesting may be added below this line.)
[From RFC 4288: Some discussion of Macintosh file type codes and their purpose can be found in [MACOSFILETYPES]. Additionally, please refrain from writing "none" or anything similar when no file extension or Macintosh file type is specified, lest "none" be confused with an actual code value. Instead remove the heading.]
The mapping of the above defined payload format media type and its parameters SHALL be done according to Section 3 of RFC 4855 [RFC4855].
[More specific rules only need to be included if some parameter does not match these rules.]
[Here write your offer/answer considerations section; please see Section 3.3.2.1 for help.]
[Here write your considerations for declarative SDP; please see Section 3.3.2.2 for help.]
This memo requests that IANA register [insert media type name here] as specified in Appendix A.11.1. The media type is also requested to be added to the IANA registry for "RTP Payload Format MIME types" (http://www.iana.org/assignments/rtp-parameters).
[See Section 7.4 and consider whether any of the parameters need a registered name space.]
[See Section 7.2.]
RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and in any applicable RTP profile. The main security considerations for the RTP packet carrying the RTP payload format defined within this memo are confidentiality, integrity, and source authenticity. Confidentiality is achieved by encryption of the RTP payload. Integrity of the RTP packets is achieved through a suitable cryptographic integrity protection mechanism. Such a cryptographic system may also allow the authentication of the source of the payload. A suitable security mechanism for this RTP payload format should provide confidentiality, integrity protection, and at least source authentication capable of determining whether or not an RTP packet is from a member of the RTP session.
Note that the appropriate mechanism to provide security to RTP and payloads following this memo may vary. It is dependent on the application, the transport, and the signalling protocol employed. Therefore, a single mechanism is not sufficient, although if suitable, the usage of SRTP [RFC3711] is recommended. Other mechanisms that may be used are IPsec [RFC4301] and TLS [RFC5246] (RTP over TCP), but other alternatives may also exist.
This RTP payload format and its media decoder do not exhibit any significant non-uniformity in the receiver-side computational complexity for packet processing, and thus are unlikely to pose a denial-of-service threat due to the receipt of pathological data. Nor does the RTP payload format contain any active content.
[The previous paragraph may need editing due to the format breaking either of the statements. Fill in here any further potential security threats]
Note to RFC Editor: This section may be removed after carrying out all the instructions of this section.
RFCXXXX is to be replaced by the RFC number this specification receives when published.
[References must be classified as either normative or informative and added to the relevant section. References should use descriptive reference tags.]
[Normative references are those that are required to be used to correctly implement the payload format.]
[All other references.]
[All authors need to include their name and email address as a minimum. Commonly, a postal address and possibly phone numbers are also included.]
[The Template Ends Here!]