Network Working Group | C. S. Perkins |
Internet-Draft | University of Glasgow |
Intended status: Standards Track | M. Westerlund |
Expires: January 12, 2012 | Ericsson |
 | J. Ott |
 | Aalto University |
 | July 11, 2011 |
RTP Requirements for RTC-Web
draft-perkins-rtcweb-rtp-usage-02
This memo discusses use of RTP in the context of the RTC-Web activity. It discusses important features of RTP that need to be considered by other parts of the RTC-Web framework, describes which RTP profile to use in this environment, and outlines what RTP extensions should be supported.
This document is a candidate to become a work item of the RTCWEB working group as <WORKING GROUP DRAFT "MEDIA TRANSPORTS">.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 12, 2012.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This memo discusses the Real-time Transport Protocol (RTP) [RFC3550] in the context of the RTC-Web activity. The work in the IETF Audio/Video Transport Working Group, and its successors, has been about providing building blocks for real-time multimedia transport, and has not specified who should use which building blocks. The selection of building blocks and functionalities can really only be done in the context of some application, for example RTC-Web. We have selected a set of RTP features and extensions that are suitable for a number of applications that fit the RTC-Web context. Thus, applications such as VoIP, audio and video conferencing, and on-demand multimedia streaming are considered. Applications that rely on IP multicast have not been considered likely to be applicable to RTC-Web, thus extensions related to multicast have been excluded. We believe that RTC-Web will greatly benefit in interoperability if a reasonable set of RTP functionalities and extensions is selected. This memo is intended as a starting point for discussion of those features in the RTC-Web framework.
This memo is structured into different topics. For each topic, one or several recommendations from the authors are given. When it comes to the importance of extensions, or the need for implementation support, we use three requirement levels to indicate the importance of the feature to the RTC-Web specification:
When this memo discusses RTP, it includes the RTP Control Protocol (RTCP) unless explicitly stated otherwise. RTCP is a fundamental and integral part of the RTP protocol, and is REQUIRED to be implemented.
As RTC-Web is focused on peer-to-peer connections established from clients in web browsers, the following topologies, further discussed in RTP Topologies [RFC5117], are primarily considered. The topologies are depicted and briefly explained here for the reader's convenience.
+---+         +---+
| A |<------->| B |
+---+         +---+
The point-to-point topology [fig-p2p] is going to be very common in any single-user to single-user application.
+---+      +---+
| A |<---->| B |
+---+      +---+
  ^          ^
   \        /
    \      /
     v    v
     +---+
     | C |
     +---+
For small multiparty sessions it is practical enough to create RTP sessions by letting every participant send individual unicast RTP/UDP flows to each of the other participants. This is called multi-unicast and is unfortunately not discussed in RTP Topologies [RFC5117]. This topology has the benefit of not requiring central nodes. The downside is that it increases the bandwidth used at each sender, since the sender must transmit one copy of its media streams to each of the other participants in the session. Thus this topology is limited to scenarios with few end-points unless the media is very low bandwidth.
It needs to be noted that, if this topology is to be supported by the RTC-Web framework, it needs to be possible to connect one RTP session to multiple established peer to peer flows that are individually established.
+---+      +------------+      +---+
| A |<---->|            |<---->| B |
+---+      |            |      +---+
           |   Mixer    |
+---+      |            |      +---+
| C |<---->|            |<---->| D |
+---+      +------------+      +---+
An RTP mixer [fig-mixer] is a centralised point that selects or mixes content in a conference to optimise the RTP session so that each end-point only needs to connect to one entity, the mixer. The mixer also reduces the bit-rate needs, as the media sent from the mixer to the end-point can be optimised in different ways. These optimisations include methods like only choosing media from the currently most active speaker, or mixing together audio so that only one audio stream is required instead of three in the depicted scenario. The downside of the mixer is that someone is required to provide the actual mixer.
+---+      +------------+      +---+
| A |<---->|            |<---->| B |
+---+      |            |      +---+
           | Translator |
+---+      |            |      +---+
| C |<---->|            |<---->| D |
+---+      +------------+      +---+
If one wants a less complex central node, it is possible to use a relay (called a Transport Translator) [fig-relay] that takes on the role of forwarding the media to the other end-points but doesn't perform any media processing. It simply forwards the media from each end-point to all the others. Thus one end-point A will only need to send its media once to the relay, but it will still receive three RTP streams with the media if B, C and D are all currently transmitting.
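As a rough illustration of the trade-off between the multi-unicast, mixer and relay topologies described above, the following sketch counts the RTP streams one end-point sends and receives for a single media type when every other participant is transmitting. The function name and the topology labels are ours and purely illustrative; the mixer case assumes that the mixer delivers a single mixed or selected stream to each end-point.

```python
from typing import Tuple


def streams_per_endpoint(topology: str, participants: int) -> Tuple[int, int]:
    """Return (sent, received) stream counts for one end-point.

    Assumes a single media type and that all other participants transmit.
    """
    if participants < 2:
        raise ValueError("a session needs at least two participants")
    others = participants - 1
    if topology == "multi-unicast":
        # One copy of the media is sent to each of the other participants.
        return others, others
    if topology == "relay":
        # Media is sent once to the relay, which forwards every stream.
        return 1, others
    if topology == "mixer":
        # Assumption: the mixer sends one mixed or selected stream.
        return 1, 1
    raise ValueError(f"unknown topology: {topology}")


for name in ("multi-unicast", "relay", "mixer"):
    print(name, streams_per_endpoint(name, 4))
```

For the four-party figures above, this gives three outgoing copies per sender in the multi-unicast case, against a single outgoing stream when a relay or mixer is used.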
           +------------+
           |            |
+---+      |            |      +---+
| A |<---->| Translator |<---->| B |
+---+      |            |      +---+
           |            |
           +------------+
To support a legacy end-point (B) that doesn't fulfil the requirements of RTC-Web, it is possible to insert a Translator [fig-translator] that takes on the role of ensuring that, from A's perspective, B looks like a fully compliant end-point. Thus it is the combination of the Translator and B that looks like the end-point B. The intention is that the presence of the translator is transparent to A; however, it is not certain that this is possible. Thus this case is included so that it can be discussed whether any mechanism specified for use in RTC-Web results in such issues, and how to handle them.
This section discusses some requirements RTP and RTCP [RFC3550] place on their underlying transport protocol, the signalling channel, etc.
There are three fundamental points of multiplexing within the RTP framework:
These multiplexing points are a fundamental part of the design of RTP and are discussed in Section 5.2 of [RFC3550]. Of special importance is the need to separate different RTP sessions using a multiplexing mechanism at some lower layer than RTP, rather than trying to combine several RTP sessions implicitly into one lower layer flow. This will be further discussed in the next section.
In today's networks, with prolific use of Network Address Translators (NAT) and Firewalls (FW), there is a desire to reduce the number of transport layer ports used by a real-time media application using RTP. This has led some to suggest multiplexing two or more RTP sessions on a single transport layer flow, using either the Payload Type or SSRC to demultiplex the sessions, in violation of the rules outlined above. It is not the first time that people have looked at RTP and questioned the need for using separate RTP sessions for different media types, and even more the potential need to separate different media streams of the same type into different sessions due to their different purposes. Section 5.2 of [RFC3550] outlines some of those problems; we elaborate on that discussion, and on other problems that occur if one violates this part of the RTP design and architecture.
As discussed in Section 5.2 of [RFC3550], multiplexing several RTP sessions (e.g., audio and video) onto a single transport layer flow introduces the following problems:
We do note that some of the above issues are resolved as long as there is explicit separation of the RTP sessions when transported over the same lower layer transport, for example by inserting a multiplexing layer in between the lower transport and the RTP/RTCP headers. But a number of the above issues are not resolved by this.
In the RTCWEB context, i.e. web browsers running on various end-points, it might appear unlikely that flow-based QoS is available on the end-points that will support RTCWEB. The authors agree that, in the common case, users in their home network or at WiFi hotspots are unlikely to have flow-based QoS available. However, if one considers enterprise users, especially those using intranet applications, the availability of, and desire to use, QoS is not implausible. There are also web users who use networks that are more resource-constrained than wired networks and WiFi networks, for example cellular networks. The current access network QoS mechanisms for user traffic in cellular technology from 3GPP are flow based.
RTP's design hasn't been changed, although topics related to session multiplexing have been discussed at various points of RTP's 20-year history. The fact is that numerous RTP mechanisms and extensions have been defined assuming that one can perform session multiplexing when needed. Mechanisms that have been identified as problematic if one doesn't do session separation are:
As can be seen, the requirement that separate RTP sessions are carried in separate transport-layer flows is fundamental to the design of RTP. Due to this design principle, implementors of various services or applications using RTP have not commonly violated this model, and have separated RTP sessions onto different transport layer flows. After 15 years of deployment of RTP in its current form, any move to change this assumption must carefully consider the backwards compatibility problems that this will cause. In particular, since widespread use of multiplexed RTP sessions in RTC-Web will almost certainly cause their use in other scenarios, the discussion regarding compatibility must be wider than just whether multiplexing works for the extremely limited subset of RTP use cases currently being considered in the RTC-Web group. Any such multiplexing extension to RTP must therefore be developed by the AVTCORE working group, since it has much broader applicability and scope than RTC-Web.
The arguments the authors are aware of for why it is desirable to use a single underlying transport (e.g., UDP) flow for all media, rather than one flow for each type of media are the following:
As we have noted in the preceding sections, implicit multiplexing of multiple RTP sessions onto a single transport flow raises a large number of backwards compatibility issues. It has been argued that these issues are either not important, since the RTP features disrupted are not of interest to the current set of RTC-Web use cases, or can be solved by somehow explicitly dividing the SSRC space into different regions for different RTP sessions. We believe the first argument is short-sighted: those RTP features may not be important today, but the successful deployment of simple RTC-Web applications will generate interest in trying more advanced scenarios, which may well need those features. Partitioning the SSRC space to separate RTP sessions results in a new set of issues, of which the biggest from our point of view is that it effectively creates a new variant of the RTP protocol, which is incompatible with standard RTP. Having two different variants of the core functionality of RTP will make it much more difficult to develop future protocol extensions, and the new variant will likely also have a different set of extensions that work. In addition, the two versions aren't directly interoperable, and will force anyone who wants to interconnect the two versions to deploy (complex) gateways. It also reduces the common user base and the interest in maintaining and developing either version.
On the other hand, we are sympathetic to the argument that using a single transport flow does save some time in setup processing, that it will save some resources on NATs and FWs that are in between the communicating end-points, and that it may have a somewhat higher success rate of session establishment.
Thus the authors consider it REQUIRED that RTP sessions are multiplexed using an explicit mechanism outside RTP. We strongly RECOMMEND that the mechanism used to accomplish this multiplexing is to use unique UDP flows for each RTP session, based on simplicity and interoperability. However, we can accept a WG consensus that using a single transport layer flow between peers is the default, and that the fallback of using separate UDP flows is also supported, under one constraint: that the RTP sessions are explicitly multiplexed in such a way that existing mechanisms or extensions to RTP are not prevented from working, and that the solution does not result in an alternative variant of RTP being created (i.e., it must not disrupt RTCP processing or the RTP semantics). In this latter case we RECOMMEND that some type of multiplexing layer is inserted between the UDP flow and the RTP/RTCP headers to separate the RTP sessions, since removing this shim layer and gatewaying to standard RTP sessions is simpler than trying to separate RTP sessions that are multiplexed together in order to gateway them to standard RTP sessions. We discuss possible multiplexing layers in Section 3.
RTP is built with the assumption that a signalling channel external to RTP/RTCP is used to configure the RTP sessions and their functions. The basic configuration of an RTP session consists of the following parameters:
These parameters are often expressed in SDP messages conveyed within an offer/answer exchange. RTP does not depend on SDP or on the offer/answer model, but does require all the necessary parameters to be agreed somehow, and provided to the RTP implementation. We note that in the RTCWEB context it will depend on the signalling model and API how these parameters need to be configured, but they will need to be either set in the API or explicitly signalled between the peers.
As discussed in Section 2.3, the mapping between media type name, and its associated RTP payload format, and the RTP payload type number to be used for that format must be signalled as part of the session setup. An endpoint may signal support for multiple media formats, or multiple configurations of a single format, each using a different RTP payload type number. If multiple formats are signalled by an endpoint, that endpoint is REQUIRED to be prepared to receive data encoded in any of those formats at any time. RTP does not require advance signalling for changes between formats that were signalled during the session setup. This is needed for rapid rate adaptation.
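The requirement that a receiver be prepared for any signalled format at any time can be sketched as a lookup on the payload type carried in each RTP packet. The sketch below assumes the fixed RTP header layout of [RFC3550], where the payload type is the low seven bits of the second octet. The payload type numbers and format names in the table are hypothetical values standing in for whatever was agreed during session setup.

```python
from typing import Dict

# Hypothetical mapping agreed at session setup (numbers and names are
# illustrative only, not taken from any specification).
NEGOTIATED_FORMATS: Dict[int, str] = {
    96: "audio/format-a",
    97: "audio/format-a-low-rate",
    98: "video/format-b",
}


def rtp_payload_type(packet: bytes) -> int:
    """Extract the 7-bit payload type from an RTP packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-octet RTP header")
    return packet[1] & 0x7F  # the top bit of this octet is the marker bit


def select_format(packet: bytes, negotiated: Dict[int, str]) -> str:
    """Pick the signalled format for a packet, with no advance signalling."""
    pt = rtp_payload_type(packet)
    if pt not in negotiated:
        raise ValueError(f"payload type {pt} was not signalled")
    return negotiated[pt]
```

Because the lookup is done per packet, a sender can switch between, say, payload types 96 and 97 from one packet to the next, which is what makes rapid rate adaptation possible.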
This section explores a few different possible solutions for how to achieve explicit multiplexing between RTP sessions and possible other UDP based flows, such as STUN and protocols carrying application data. But before diving into the proposals we should consider a bit what requirements we can derive from the previous discussion and the intended goals.
General Requirements for this multiplexing solution as we understand them are:
Please keep these general requirements in mind when we look at some possible solutions.
One approach is to use DCCP as a common multiplexing layer, at least for RTP and non-RTP data, and to use DCCP's congestion control function in both cases. This would result in a stack picture that looks like this:
       +-------------+------+
       |    Media    | FOO  |
       +------+------+------+
       | SRTP | DTLS | DTLS |
+------+------+------+------+
| STUN |        DCCP        |
+------+--------------------+
|            UDP            |
+---------------------------+
STUN and DCCP can be demultiplexed simply, as long as the DCCP source port is in the range 16384-65535. The great benefit of this solution is that it can support a large number of parallel, explicitly multiplexed datagram flows. Another great benefit is a common place for the congestion control implementation for both RTP and non-RTP data. It also provides a negotiation mechanism for transport features, including congestion control algorithms, enabling future development of this layer.
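The port range restriction can be understood from the first octet of the UDP payload. A STUN message begins with two zero bits, while a DCCP packet begins with its 16-bit source port, so any source port of 16384 or above has at least one of those two bits set. A minimal sketch of the resulting check follows; the function name is ours.

```python
import struct


def classify_outer_payload(datagram: bytes) -> str:
    """Classify a UDP payload as STUN or DCCP from its two leading bits.

    Only valid under the assumption that DCCP source ports are restricted
    to 16384-65535, so that the leading two bits of a DCCP packet are
    never both zero.
    """
    if not datagram:
        raise ValueError("empty datagram")
    return "STUN" if datagram[0] >> 6 == 0 else "DCCP"


# A DCCP packet with source port 16384 starts with octets 0x40 0x00.
lowest_allowed = struct.pack("!H", 16384)
# A source port below the range would be indistinguishable from STUN.
colliding = struct.pack("!H", 16383)
print(classify_outer_payload(lowest_allowed), classify_outer_payload(colliding))
```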
The above leaves out the question of a reliable transport solution. As far as we can see, this can be done in two major ways: either build reliability extensions on top of DCCP, or put a protocol in parallel with STUN and DCCP. The downside of the latter is that we again end up in a situation where several protocols can occur in the outer UDP payload, requiring implicit demultiplexing based on actual data rather than on a field. As DCCP has a negotiation mechanism both for which service uses DCCP and for DCCP options and features, both become viable methods for defining reliability extensions.
Note that the main reason for not also putting STUN on top of DCCP is the fact that DCCP requires a handshake on transport parameters when establishing a new flow. Performing that negotiation prior to verifying the connection would increase both the amount of data that is transmitted to a not yet consenting peer and the delay.
A very straightforward design would be adding a one or two byte shim layer on top of the transport payload prior to the actual multiplexed protocols. This allows both for static assignment of shim code-points like for STUN and for dynamically agreed on usages, either explicitly through signalling or implicitly by application context.
       +-------------+------+
       |    Media    | DTLS |
+------+------+------+------+
| STUN | SRTP | DTLS | FOO  |
+------+------+------+------+
|           SHIM            |
+---------------------------+
|            UDP            |
+---------------------------+
The Internet Draft "RTC-Web Non-Media Data Transport Requirements" [I-D.cbran-rtcweb-data] dismisses the idea of a generic SHIM layer for a number of reasons:
A shim layer has low overhead combined with explicitness and great flexibility in what to put on top. In addition to the definition of the shim itself, some signalling will be needed, either explicit or implicit depending on the signalling model and the API. The signalling needs to assign a meaning to each multiplexing code-point used in the particular underlying transport flow.
Although a reliable protocol isn't included in the above example, it can easily be included, and can be anything that can be put in a UDP payload, such as TCP, an RMT-based protocol, or a home-grown one. This ensures maximum flexibility to add additional protocols on top of the single UDP flow.
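To make the shim idea concrete, the following is a minimal sketch of a one-byte shim with a table that assigns a meaning to each code-point. Nothing here is specified anywhere: the code-point values, the protocol labels and the function names are all hypothetical, and a real design would need the signalling discussed above to populate the table.

```python
from typing import Dict, Tuple

# Hypothetical static code-point; no specification assigns this value.
SHIM_STUN = 0


def shim_wrap(code_point: int, payload: bytes) -> bytes:
    """Prepend a one-byte shim identifying the multiplexed protocol."""
    if not 0 <= code_point <= 0xFF:
        raise ValueError("code-point must fit in one byte")
    return bytes([code_point]) + payload


def shim_unwrap(datagram: bytes) -> Tuple[int, bytes]:
    """Split a UDP payload into its shim code-point and inner payload."""
    if not datagram:
        raise ValueError("empty datagram")
    return datagram[0], datagram[1:]


def demultiplex(datagram: bytes, assignments: Dict[int, str]) -> Tuple[str, bytes]:
    """Look up which protocol or RTP session a datagram belongs to."""
    code_point, payload = shim_unwrap(datagram)
    if code_point not in assignments:
        raise ValueError(f"no meaning assigned to code-point {code_point}")
    return assignments[code_point], payload


# Hypothetical assignments, as might be agreed through signalling.
assignments = {SHIM_STUN: "STUN", 1: "RTP session: audio", 2: "RTP session: video"}
print(demultiplex(shim_wrap(2, b"\x80\x60"), assignments))
```

Since the inner payload is carried unmodified, removing the shim at a gateway yields standard RTP/RTCP packets again.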
The main point of RTP internal multiplexing is to enable multiplexing of RTP sessions without adding any extra layer between the RTP header and the lower transport, e.g. a single UDP flow, that things are multiplexed on. Rosenberg [I-D.rosenberg-rtcweb-rtpmux] suggests one method for RTP internal multiplexing. In addition to this, there is a suggestion in "RTC-Web Non-Media Data Transport Requirements" [I-D.cbran-rtcweb-data] to also multiplex the non-RTP data on the same level, using implicit identification of data packets to separate them from DTLS-SRTP packets, RTP/RTCP packets and STUN packets. This results in a stack picture that looks like this:
       +-------------+------+
       |    Media    | DTLS |
+------+------+------+------+
| STUN | SRTP | DTLS | FOO  |
+------+------+------+------+
|            UDP            |
+---------------------------+
Here FOO is the protocol suggested by "RTC-Web Non-Media Data Transport Requirements" [I-D.cbran-rtcweb-data].
These proposals rely on the idea that a receiver can look at a number of the bytes of the UDP payload to identify the type of packet. So, assuming DTLS-SRTP key management and a datagram non-RTP data transport, we have at least four protocols to separate. If one has successfully identified the protocol as (S)RTP, then one looks at the SSRC field to find out the media type and stream IDs.
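For illustration, implicit identification of this kind is usually done on the first octet of the UDP payload. The sketch below uses the value ranges we understand DTLS-SRTP (RFC 5764, Section 5.1.2) to use for separating STUN, DTLS and RTP/RTCP; the function name is ours. Note that no range is defined for a fourth, non-RTP data protocol, which would have to fit into whatever values remain unused.

```python
def classify_first_octet(datagram: bytes) -> str:
    """Guess the protocol of a UDP payload from its first octet."""
    if not datagram:
        raise ValueError("empty datagram")
    first = datagram[0]
    if first in (0, 1):
        return "STUN"
    if 20 <= first <= 63:
        return "DTLS"
    if 128 <= first <= 191:
        return "RTP/RTCP"  # version field of 2 in the top two bits
    # Any further protocol would need to claim part of the remaining
    # space without colliding with future use by the protocols above.
    return "unknown"


print(classify_first_octet(bytes([0x80, 0x60])))
```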
There are a number of issues with the current proposals which we will raise below. We also discuss what is going to be needed to drive this work.
The first argument against this design is that it further proliferates the bad design of implicit packet identification that started with STUN. Instead of trying to break out of this pattern, we appear to pile on more protocols that are supposed to be identified, despite the fact that all these protocols have protocol fields with a purpose of their own in the overlapping bytes on which we attempt to perform identification. At some point a protocol extension in any of the protocols will result in a collision that breaks the demultiplexing mechanism.
Secondly, the design restricts RTCWEB to a subset of RTP functionality. By redefining the SSRC field, this in practice creates an alternative RTP protocol that can't fully interoperate with RTP as currently defined. The inclusion of a magic word that allows Deep Packet Inspection and other interpreters to identify the versions correctly is a clear admission of this fact, even if it is not stated explicitly in the text. This new version is forever prevented from using any of the features that have been identified as not being compatible with this design. In addition, it either forces future RTP extensions to take this severe limitation into account or creates additional extensions that are not compatible. Forking the RTP protocol into two versions is really not desirable.
Thirdly, a significantly limited size stream ID field requires someone to manage and ensure that unique stream IDs are used by each end-point. This would not be an issue if the only use case ever were communication between two end-points. However, at this point we have use cases and requirements for centralised conferencing scenarios. Even a basic star scenario requires extra complexity, as the central node needs to be able to force the nodes that aren't at the centre to use the IDs that the central node dictates. This usage then becomes much more complex the moment someone attempts to interconnect two stars. This is in fact likely to happen when one needs either scalability or geographical optimisation. By geographical optimisation we mean, for example, one entity in Asia and one in Africa that each perform media mixing or transport relaying to reduce the delay and traffic load. In addition to the centralised conferencing usage, it looks plausible that RTCWEB could allow for an ad-hoc conferencing mesh. Without a central point beyond the web server, only the web server could ensure the uniqueness requirements. All of the above cases are easily handled by regular RTP without any control at all, which shows that this proposal brings extra complexity.
Fourth, if any legacy interoperation is considered, one should be aware that the same SSRC value is sometimes used in different RTP sessions in the same communication session. This is commonly done to provide quick association of media streams in the different sessions, sometimes due to implementation choices, and sometimes because an extension requires it, like the session mode of RTP retransmission [RFC4588].
Fifth, there is a need to support more than a single session context per media type. As shown in "RTP Multiple Stream Sessions and Simulcast" [I-D.westerlund-avtcore-multistream-and-simulcast] there are clear benefits in using multiple RTP sessions for separating intent with different media streams. This is already occurring in video conferencing to separate main video (e.g. active speaker) from alternative video (e.g. non-active speaker, audience) and document or slide video streams. We will not deny that the web server could track the flows and their purpose through other mechanisms and signalling channels. However, it complicates any interop with legacy and forces more functionality and additional APIs into any gateway function.
If the RTCWEB WG decides that, despite the issues associated with RTP internal multiplexing, it wants to pursue this approach, the WG needs to be aware that it doesn't have the right to redefine RTP semantics. The IETF has an active WG chartered for maintaining and extending RTP, the AVTCORE WG, and proposals for change need to be handled in that WG. This means that all the RTCWEB WG can do for the RTP multiplexing part is to provide requirements to AVTCORE. The WG participants would then be encouraged to engage in proposing, and be proponents for, the work in the AVTCORE WG.
Considering that not only RTCWEB has voiced the need for a multiplexing solution, and that this is likely to have a significant impact on RTP in the future, any proposal for a solution needs to be generally applicable. For example, most of the arguments dismissed in "Multiplexing of Real-Time Transport Protocol (RTP) Traffic for Browser based Real-Time Communications (RTC)" [I-D.rosenberg-rtcweb-rtpmux] as not being applicable for RTCWEB will need to be reconsidered in the light of more general applications.
Some requirements on this solution from the authors of this draft are:
Looking at these proposals, we the authors are clearly in favour of a shim layer, unless DCCP is being selected anyway as the datagram or media transport protocol, in which case one should strongly consider having both data and media over the same protocol so that it can be used as the multiplexing layer.
We don't see RTP internal multiplexing as a realistic contender for the first phase of RTCWEB specifications. It has documented issues. The only way forward for the WG is to develop requirements for what RTCWEB needs and share these with AVTCORE. If there are proponents for driving a solution, they can take on the design of a generalised protocol in AVTCORE that takes the existing specifications into consideration. That work might find a suitable solution, or it may not. When this is done, we might have something stable to start deploying two years from now, or the WG may have decided to drop the work as infeasible.
The "Extended Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)" [RFC5124] is REQUIRED to be implemented. This builds on the basic RTP/AVP profile [RFC3551], the RTP/AVPF feedback profile [RFC4585], and the secure RTP/SAVP profile [RFC3711].
The RTP/AVPF part of RTP/SAVPF is required to get the improved RTCP timer model, which allows more flexible transmission of RTCP packets in response to events, rather than strictly according to bandwidth. This also saves RTCP bandwidth, as the full amount will commonly only be used when there are many events on which to send feedback. This functionality is needed to make use of the RTP conferencing extensions discussed in Section 7.1.
The RTP/SAVP part of RTP/SAVPF provides support for Secure RTP (SRTP) [RFC3711]. This provides media encryption, integrity protection, replay protection and a limited form of source authentication. It does not contain a specific keying mechanism, so a keying mechanism, and the set of security transforms, will need to be chosen. It is possible that a security mechanism operating on a lower layer than RTP can be used instead, and that should be evaluated. However, the reasons for the design of SRTP should be taken into consideration in that discussion.
RTP and RTCP are two flexible and extensible protocols that allow, on the one hand, choosing from a variety of building blocks and combining those to meet application needs and, on the other hand, creating extensions where existing mechanisms are not sufficient: from new payload formats to RTP extension headers to additional RTCP control packets.
Different informational documents provide guidelines to the use and particularly the extension of RTP and RTCP, including the following: Guidelines for Writers of RTP Payload Format Specifications [RFC2736] and Guidelines for Extending the RTP Control Protocol [RFC5968].
This section discusses some optimisations that make RTP/RTCP work better and more efficiently, and that are therefore considered.
Historically, RTP and RTCP have been run on separate UDP ports. With the increased use of Network Address/Port Translation (NAPT) this has become problematic, since maintaining multiple NAT bindings can be costly. It also complicates firewall administration, since multiple ports must be opened to allow RTP traffic. To reduce these costs and session setup times, support for multiplexing RTP data packets and RTCP control packets on a single port [RFC5761] is REQUIRED. Supporting this specification is generally a simplification in code, since it relaxes the tests in [RFC3550].
Note that the use of RTP and RTCP multiplexed on a single port ensures that there is occasional traffic sent on that port, even if there is no active media traffic. This may be useful to keep NAT bindings alive.
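When RTP and RTCP share a port as described in [RFC5761], the receiver has to tell the two apart from the packet itself. A rule commonly used for this, sketched below, inspects the second octet: in RTCP it is the packet type, and values 192-223 do not collide with RTP payload types provided that payload types 64-95 are avoided. The function name is ours, and [RFC5761] remains the authoritative description of the constraints.

```python
def is_rtcp(packet: bytes) -> bool:
    """Distinguish RTCP from RTP on a shared port by the second octet.

    Assumes RTP payload types 64-95 are not in use, so that an RTP
    packet (with or without the marker bit set) never has a second
    octet in the range 192-223.
    """
    if len(packet) < 2:
        raise ValueError("packet too short")
    return 192 <= packet[1] <= 223


sender_report = bytes([0x80, 200, 0x00, 0x06])  # RTCP SR, packet type 200
media_packet = bytes([0x80, 0x80 | 96])         # RTP, marker set, PT 96
print(is_rtcp(sender_report), is_rtcp(media_packet))
```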
RTCP packets are usually sent as compound RTCP packets; and RFC 3550 demands that those compound packets always start with an SR or RR packet. However, especially when using frequent feedback messages, these general statistics are not needed in every packet and unnecessarily increase the mean RTCP packet size and thus limit the frequency at which RTCP packets can be sent within the RTCP bandwidth share.
RFC5506 "Support for Reduced-Size Real-Time Transport Control Protocol (RTCP): Opportunities and Consequences" [RFC5506] specifies how to reduce the mean RTCP message size and allow for more frequent feedback. Frequent feedback, in turn, is essential to make real-time applications quickly aware of changing network conditions and allow them to adapt their transmission and encoding behaviour.
Support for RFC5506 is REQUIRED.
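The difference between a regular compound packet and a reduced-size one can be sketched by walking the RTCP packets in a datagram and checking whether the first one is an SR (packet type 200) or RR (packet type 201). The helper names are ours; the header layout assumed is the common RTCP header of [RFC3550], where the length field counts 32-bit words minus one.

```python
import struct
from typing import List

RTCP_SR = 200
RTCP_RR = 201


def rtcp_packet_types(datagram: bytes) -> List[int]:
    """Return the packet types of the RTCP packets stacked in a datagram."""
    types = []
    offset = 0
    while offset < len(datagram):
        if len(datagram) - offset < 4:
            raise ValueError("truncated RTCP header")
        _, packet_type, length_words = struct.unpack_from("!BBH", datagram, offset)
        size = 4 * (length_words + 1)
        if offset + size > len(datagram):
            raise ValueError("RTCP length field exceeds datagram")
        types.append(packet_type)
        offset += size
    return types


def starts_with_report(datagram: bytes) -> bool:
    """True for a regular compound packet, False for a reduced-size one."""
    types = rtcp_packet_types(datagram)
    return bool(types) and types[0] in (RTCP_SR, RTCP_RR)


# An empty RR (8 octets) followed by a 12-octet feedback packet (type 206).
rr = struct.pack("!BBHI", 0x80, RTCP_RR, 1, 0x11111111)
fb = struct.pack("!BBHII", 0x81, 206, 2, 0x11111111, 0x22222222)
print(starts_with_report(rr + fb), starts_with_report(fb))
```

A receiver that supports reduced-size RTCP would accept both datagrams; one that does not would reject the second.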
RTP entities choose the RTP and RTCP transport addresses, i.e., IP addresses and port numbers, to receive packets on and bind their respective sockets to those. When sending RTP packets, however, they may use a different IP address or port number for RTP, RTCP, or both; e.g., when using a different socket instance for sending and for receiving. Symmetric RTP/RTCP requires that the IP address and port number for sending and receiving RTP/RTCP packets are identical.
The reason for using symmetric RTP is primarily to avoid issues with NATs and Firewalls by ensuring that the flow is actually bi-directional, and thus kept alive and registered as a flow the intended recipient actually wants. In addition, it saves resources in the form of ports at the end-points, but also in the network, as NAT mappings or firewall state are not unnecessarily bloated. The amount of QoS state is also reduced.
Using Symmetric RTP and RTCP [RFC4961] is REQUIRED.
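In implementation terms, symmetric operation usually amounts to sending from the very socket that is bound for receiving, rather than from a separate sending socket. The following minimal sketch shows this with plain UDP sockets on the loopback interface; the helper name and the use of an operating-system-chosen port are our assumptions for the example.

```python
import socket


def open_symmetric_socket(local_ip: str = "127.0.0.1", local_port: int = 0) -> socket.socket:
    """Create one UDP socket used for both sending and receiving."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((local_ip, local_port))  # port 0 lets the OS choose a port
    return sock


a = open_symmetric_socket()
b = open_symmetric_socket()
b.settimeout(2.0)

# A sends from its bound socket, so the source address B observes is
# exactly the address on which A is prepared to receive.
a.sendto(b"media", b.getsockname())
data, observed_source = b.recvfrom(64)
print(observed_source == a.getsockname())

a.close()
b.close()
```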
The RTCP Canonical Name (CNAME) provides a persistent transport-level identifier for an RTP endpoint. While the Synchronisation Source (SSRC) identifier for an RTP endpoint may change if a collision is detected, or when the RTP application is restarted, its RTCP CNAME is meant to stay unchanged, so that RTP endpoints can be uniquely identified and associated with their RTP media streams. For proper functionality, RTCP CNAMEs should be unique among the participants of an RTP session.
The RTP specification [RFC3550] includes guidelines for choosing a unique RTCP CNAME, but these are not sufficient in the presence of NAT devices. In addition, some may find long-term persistent identifiers problematic from a privacy viewpoint. Accordingly, support for generating short-term persistent RTCP CNAMEs following method (b) as specified in Section 4.2 of "Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)" [RFC6222] is RECOMMENDED, since this addresses both concerns.
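As a rough sketch of what generating such an identifier can look like, the code below follows our understanding of the procedure in [RFC6222]: an HMAC-SHA1 is computed over the endpoint's initial SSRC, the source and destination addresses and ports, and a random value, and the result is truncated to 96 bits and Base64 encoded. The exact input encoding and the choice of HMAC key are assumptions of this sketch and are not taken from the RFC, which remains the normative reference.

```python
import base64
import hashlib
import hmac
import os
import struct
from typing import Optional, Tuple

Address = Tuple[str, int]  # (IP address, port)


def short_term_cname(initial_ssrc: int, source: Address, destination: Address,
                     random_value: Optional[bytes] = None) -> str:
    """Derive a 16-character CNAME candidate from session parameters."""
    if random_value is None:
        random_value = os.urandom(16)
    # Assumption: a simple textual encoding of the transport addresses.
    addresses = f"{source[0]}:{source[1]}|{destination[0]}:{destination[1]}"
    material = struct.pack("!I", initial_ssrc) + addresses.encode("ascii") + random_value
    # Assumption: the random value doubles as the HMAC key.
    digest = hmac.new(random_value, material, hashlib.sha1).digest()
    # Truncate the 160-bit output to 96 bits (12 octets), then Base64.
    return base64.b64encode(digest[:12]).decode("ascii")


print(short_term_cname(0x1234ABCD, ("192.0.2.1", 5004), ("192.0.2.2", 5006)))
```

Twelve octets encode to exactly sixteen Base64 characters, so the result carries no padding.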
There are a number of RTP extensions that could be very useful in the RTC-Web context. One set is related to conferencing, others are more generic in nature.
RTP is inherently defined for group communications, whether using IP multicast, multi-unicast, or based on a centralised server. In today's practice, however, overlay-based conferencing dominates, typically using one or a few so-called conference bridges or servers to connect endpoints in a star or flat tree topology. Quite diverse conferencing topologies can be created using the basic elements of RTP mixers and translators as defined in RFC 3550.
A number of conferencing topologies are defined in [RFC5117], of which the following are the more common (and the most likely to be workable in practice):
1) RTP Translator (Relay) with Only Unicast Paths (RFC 5117, section 3.3)
2) RTP Mixer with Only Unicast Paths (RFC 5117, section 3.4)
3) Point to Multipoint Using a Video Switching MCU (RFC 5117, section 3.5)
4) Point to Multipoint Using Content Modifying MCUs (RFC 5117, section 3.6)
We note that topologies 3 and 4 do not make good use of the functions of RTP, and in some cases even violate the RTP specifications. Thus we recommend focusing on topologies 1 and 2.
RTP protocol extensions to be used with conferencing are included because they are important in the context of centralised conferencing, where one RTP Mixer (Conference Focus) receives each participant's media streams and distributes them to the other participants. These messages are defined in the "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)" [RFC4585] and the "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)" (CCM) [RFC5104], and are fully usable by the secure variant of this profile (RTP/SAVPF) [RFC5124].
The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of CCM [RFC5104]. It allows the mixer to request a new Intra picture from a session participant. This is used when switching between sources to ensure that the receivers can decode the video or other predicted media encoding with long prediction chains. It is RECOMMENDED that this feedback message is supported.
The Picture Loss Indicator is defined in Section 6.3.1 of AVPF [RFC4585]. It is used by a receiver to tell the encoder that it lost the decoder context and would like to have it repaired somehow. This is semantically different from the Full Intra Request above. It is RECOMMENDED that this feedback message is supported as a loss tolerance mechanism.
The Temporary Maximum Media Stream Bit Rate Request (TMMBR) feedback message is defined in Sections 3.5.4 and 4.2.1 of CCM [RFC5104]. This message and its notification message are used by a media receiver to inform the sending party that there is a current limitation on the amount of bandwidth available to this receiver. This can be for various reasons, and can for example be used by an RTP mixer to limit the bit rate of the media sender being forwarded by the mixer (without doing media transcoding) to fit the bottlenecks existing towards the other session participants. It is RECOMMENDED that this feedback message is supported.
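The mixer use case mentioned above can be sketched as follows: a mixer that forwards one sender's stream unchanged needs that stream to fit the narrowest path towards any receiver, so it would request the minimum of the per-receiver limits. The function name and the dictionary of limits are assumptions made for illustration; the per-packet overhead field carried in the real message is omitted.

```python
def forwarded_sender_limit_bps(receiver_limits_bps):
    """Bit rate a forwarding mixer could request from the media sender.

    receiver_limits_bps maps a receiver name to the bit rate (bit/s)
    currently available towards that receiver. Returns None when there
    are no receivers and hence nothing to constrain.
    """
    if not receiver_limits_bps:
        return None
    # Without transcoding, the stream must fit the tightest bottleneck.
    return min(receiver_limits_bps.values())


limit = forwarded_sender_limit_bps({"alice": 800_000, "bob": 300_000})
```

In this sketch the receiver with the least available bandwidth determines the rate for everyone, which is the trade-off of forwarding without transcoding.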
The RTP specification [RFC3550] provides a capability to extend the RTP header with in-band data, but the format and semantics of the extensions are poorly specified. Accordingly, if header extensions are to be used, it is REQUIRED that they be formatted and signalled according to the general mechanism of RTP header extensions defined in [RFC5285].
As noted in [RFC5285], the requirement from the RTP specification that header extensions are "designed so that the header extension may be ignored" [RFC3550] stands. To be specific, header extensions must only be used for data that can safely be ignored by the recipient without affecting interoperability, and must not be used when the presence of the extension has changed the form or nature of the rest of the packet in a way that is not compatible with the way the stream is signalled (e.g., as defined by the payload type). Valid examples might include metadata that is additional to the usual RTP information.
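The "may be ignored" property can be illustrated with a minimal parser for the one-byte header extension form of [RFC5285]. The sketch below walks the extension elements and silently skips any identifier the receiver does not understand. The function name and the `known_ids` parameter are assumptions made for illustration, and the input is taken to be the RTP header extension block starting at its 16-bit profile field.

```python
import struct

ONE_BYTE_PROFILE = 0xBEDE  # marker for the one-byte header form


def parse_one_byte_extensions(ext, known_ids):
    """Return {id: data} for the known elements, ignoring all others."""
    if len(ext) < 4:
        raise ValueError("extension header too short")
    profile, words = struct.unpack("!HH", ext[:4])
    if profile != ONE_BYTE_PROFILE:
        return {}  # some other extension format: ignore it entirely
    body = ext[4:4 + 4 * words]
    if len(body) != 4 * words:
        raise ValueError("truncated extension")
    found = {}
    i = 0
    while i < len(body):
        first = body[i]
        if first == 0:  # padding byte
            i += 1
            continue
        ext_id = first >> 4
        length = (first & 0x0F) + 1  # field stores length minus one
        if ext_id == 15:  # reserved: stop processing
            break
        data = body[i + 1:i + 1 + length]
        if len(data) != length:
            raise ValueError("element overruns extension")
        if ext_id in known_ids:
            found[ext_id] = data
        # Unknown identifiers are skipped without error.
        i += 1 + length
    return found


example = struct.pack("!HH", ONE_BYTE_PROFILE, 2) + bytes(
    [0x10, 0x55, 0x21, 0xAA, 0xBB, 0x00, 0x00, 0x00])
only_id_1 = parse_one_byte_extensions(example, {1})
```

A receiver that was only told about identifier 1 in the signalling still parses the packet correctly; the element with identifier 2 is simply not reported.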
The RTP rapid synchronisation header extension [RFC6051] is recommended, as discussed in Section 7.3. We also recommend the client-to-mixer audio level [I-D.ietf-avtext-client-to-mixer-audio-level], and consider the mixer-to-client audio level [I-D.ietf-avtext-mixer-to-client-audio-level] an optional feature.
Other header extensions are not recommended for inclusion at this time, but a list of the available ones is included below for information:
Many RTP sessions require synchronisation between audio, video, and other content. This synchronisation is performed by receivers, using information contained in RTCP SR packets, as described in the RTP specification [RFC3550]. This basic mechanism can be slow, however, so it is RECOMMENDED that the rapid RTP synchronisation extensions described in [RFC6051] be implemented. The rapid synchronisation extensions use the general RTP header extension mechanism [RFC5285], which requires signalling, but are otherwise backwards compatible.
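The basic mechanism relies on each RTCP SR pairing an RTP timestamp with a wall-clock (NTP) time for the same instant. A receiver can then map any RTP timestamp of that stream onto the common wall-clock timeline, as in the following minimal sketch. The function and parameter names are assumptions made for illustration, and the NTP time is represented as a plain floating point number of seconds for simplicity.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender wall-clock time using one SR.

    sr_rtp_ts and sr_ntp_seconds are the RTP timestamp and NTP time
    carried together in the most recent sender report for this stream.
    """
    # Signed 32-bit difference, so timestamp wrap-around is handled and
    # packets slightly older than the SR map to earlier times.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate


# Example (assumed values): a 90 kHz video stream whose SR pairs RTP
# timestamp 90000 with wall-clock time 100.0 seconds.
video_time = rtp_to_wallclock(135000, 90000, 100.0, 90000)
```

Applying the same mapping to the audio and video streams yields times on one timeline, which the receiver can compare to align playout. Until the first SR for a stream arrives this mapping is unavailable, which is the delay the rapid synchronisation extensions are intended to reduce.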
The Client to Mixer Audio Level [I-D.ietf-avtext-client-to-mixer-audio-level] is an RTP header extension used by a client to inform a mixer about the level of audio activity in the packet the header is attached to. This enables a central node to make mixing or selection decisions without decoding or detailed inspection of the payload, thus reducing the complexity needed in some types of central RTP nodes.
Assuming that the Client to Mixer Audio Level [I-D.ietf-avtext-client-to-mixer-audio-level] is published as a finished specification prior to RTCWEB's first RTP specification, it is RECOMMENDED that this extension is included.
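The kind of selection decision a central node can make from the audio level alone can be sketched as follows. The sketch assumes that the level is carried as an integer from 0 to 127 expressing the level in -dBov, so that a smaller number means louder audio; this encoding, the function name, and the SSRC values are assumptions made for illustration.

```python
def select_loudest(levels_minus_dbov, max_sources):
    """Pick the SSRCs of the loudest sources without decoding any audio.

    levels_minus_dbov maps SSRC to the reported level, where 0 is the
    loudest and 127 the quietest (assumed encoding).
    """
    # Sort by level, breaking ties by SSRC so the result is deterministic.
    ranked = sorted(levels_minus_dbov.items(),
                    key=lambda item: (item[1], item[0]))
    return [ssrc for ssrc, _ in ranked[:max_sources]]


chosen = select_loudest({0x1: 60, 0x2: 12, 0x3: 127, 0x4: 30}, 2)
```

The node only reads one header field per packet, which is what keeps its complexity low compared with decoding every incoming stream.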
The Mixer to Client Audio Level header extension [I-D.ietf-avtext-mixer-to-client-audio-level] provides the client with the audio level of the different sources mixed into a common mix by the RTP mixer. This enables a user interface to indicate the relative activity level of each session participant, rather than just whether the participant is included in the mix based on the CSRC field. This is a pure optimisation of non-critical functions and thus optional functionality.
Assuming that the Mixer to Client Audio Level [I-D.ietf-avtext-mixer-to-client-audio-level] is published as a finished specification prior to RTCWEB's first RTP specification, it is OPTIONAL that this extension is included.
There are some tools that can make RTP flows robust against packet loss and reduce the impact on media quality. However, they all add extra bits compared to a non-robust stream. These extra bits need to be considered, and the aggregate bit-rate needs to be rate controlled. Thus improving robustness might require a lower base encoding quality, but has the potential to deliver that quality with fewer errors in it.
Support for RTP retransmission as defined by "RTP Retransmission Payload Format" [RFC4588] is RECOMMENDED.
The retransmission scheme in RTP allows flexible application of retransmissions. Only selected missing packets can be requested by the receiver. It also allows the sender to prioritise between missing packets based on the sender's knowledge about their content. Compared to TCP, RTP retransmission also allows one to give up on a packet that, despite retransmission(s), still has not been received within a time window.
"RTC-Web Media Transport Requirements" [I-D.cbran-rtcweb-data] raises two issues that its authors think make RTP retransmission unsuitable for RTCWEB. We consider these issues here and explain why they are in fact not a reason to exclude RTP retransmission from the tool box available to RTCWEB media sessions.
RTCWEB end-point implementations will need to select when to enable RTP retransmission based on both API settings and measurements of the actual round trip time. In addition, for each NACK request that a media sender receives, it will need to prioritise based on the importance of the requested media, the probability that the packet will reach the receiver in time to be usable, the consumption of available bit-rate, and the impact on the media quality of new encodings.
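A sender-side decision of this kind can be sketched as follows. The example considers only two of the factors listed above, namely whether the retransmission can plausibly arrive before the playout deadline and whether the packet is important enough. The function name, the importance scale from 0 to 1, and the threshold are hypothetical and chosen for illustration only.

```python
def worth_retransmitting(now, playout_deadline, rtt, importance,
                         min_importance=0.5):
    """Decide whether to answer a NACK with a retransmission.

    now and playout_deadline are times in seconds on the same clock,
    rtt is the measured round trip time in seconds, and importance is a
    hypothetical score between 0 (disposable) and 1 (essential).
    """
    one_way_delay = rtt / 2.0  # rough estimate of the sender-to-receiver delay
    arrives_in_time = now + one_way_delay <= playout_deadline
    return arrives_in_time and importance >= min_importance


# Example (assumed values): 80 ms round trip, deadline 100 ms away.
decision = worth_retransmitting(10.0, 10.1, 0.08, 0.9)
```

A fuller implementation would also weigh the bit-rate consumed against the quality of new encodings, as the text notes; that part depends on the codec and rate controller and is left out here.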
To conclude, the issues raised are concerns that an implementation needs to take into consideration; they are not arguments against including a highly versatile and efficient packet loss repair mechanism.
Support of some type of FEC to combat the effects of packet loss is beneficial, but is heavily application dependent. However, some FEC mechanisms are encumbered.
The main benefit from FEC is the relatively low additional delay needed to protect against packet losses. The transmission of any repair packets should preferably be done with a time delay that is just larger than any loss events normally encountered. That way the repair packet isn't also lost in the same event as the source data.
The number of repair packets needed is also highly dynamic and depends on two main factors: the amount and pattern of lost packets to be recovered, and the mechanism one uses to derive repair data. The latter choice also affects the additional delay required, both at the sender to encode the repair packets and at the receiver to recover the lost packet(s).
The method for providing basic redundancy is simply to retransmit a packet that was sent some time earlier. This is relatively simple in theory: one saves each outgoing source (original) packet in a buffer marked with a timestamp of its actual transmission, and some X ms later one transmits the packet again, where X is selected to be longer than the common loss events. Thus any loss event shorter than X can be recovered, assuming that another loss event does not occur before all the packets lost in the first event have been received.
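The relationship between the delay X and the length of a loss event can be illustrated with a small model. In the sketch below, each packet is sent once at its original time and once more X ms later, and every transmission that falls inside a single loss burst is dropped; a packet is unrecoverable only if both copies are dropped. The function name, the parameters, and the single-burst loss model are assumptions made for illustration.

```python
def lost_after_redundancy(num_packets, interval_ms, delay_ms,
                          burst_start_ms, burst_end_ms):
    """Return sequence numbers lost despite one delayed duplicate each."""

    def dropped(send_time_ms):
        # The burst covers the half-open interval [start, end).
        return burst_start_ms <= send_time_ms < burst_end_ms

    lost = []
    for seq in range(num_packets):
        original_time = seq * interval_ms
        duplicate_time = original_time + delay_ms
        if dropped(original_time) and dropped(duplicate_time):
            lost.append(seq)
    return lost


# Example (assumed values): 20 ms packets and a 60 ms loss burst.
with_long_delay = lost_after_redundancy(10, 20, 100, 40, 100)
with_short_delay = lost_after_redundancy(10, 20, 40, 40, 100)
```

With X = 100 ms, longer than the 60 ms burst, every packet survives in at least one copy; with X = 40 ms, the packet sent at the start of the burst loses both copies.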
The downside of basic redundancy is the overhead. To provide each packet with one chance of recovery, the transmission rate increases by 100%, as one needs to send each packet twice. It is possible to redundantly send only the really important packets, thus reducing the overhead below 100% in exchange for leaving the remaining packets unprotected.
In addition, basic retransmission of the same packet using the same SSRC in the same RTP session is not possible in the RTP context. The reason is that sending the same packet twice with the same sequence number would destroy the RTCP reporting. Thus one needs more elaborate mechanisms.
Block based redundancy collects a number of source packets into a data block for processing. The processing results in some number of repair packets that are then transmitted to the other end, allowing the receiver to attempt to recover some number of lost packets in the block. The benefit of block based approaches is the overhead, which can be lower than 100% while still allowing recovery of one or more lost source packets from the block. Optimal block codes allow each received repair packet to repair a single loss within the block. Thus 3 repair packets that are received should allow any set of 3 packets within the block to be recovered. In reality, one commonly does not reach this level of performance for all block sizes and numbers of repair packets, and taking the computational complexity into account there are even more trade-offs to make among the codes.
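The simplest block code, a single XOR parity packet computed over the block, illustrates the principle: one repair packet protects a block of k source packets at an overhead of 1/k and can repair any single loss within the block. The sketch below is an illustration of that idea only, not one of the RTP FEC payload formats; the function names are assumptions, and all packets in a block are assumed to have equal length.

```python
def xor_parity(packets):
    """Compute one repair packet as the byte-wise XOR of the block."""
    size = max(len(p) for p in packets)
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing packet (marked None) in a block.

    Returns (index, packet). Equal-length packets are assumed; a real
    scheme also has to protect the packet length and header fields.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("one parity packet repairs exactly one loss")
    rebuilt = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                rebuilt[i] ^= byte
    return missing[0], bytes(rebuilt)


block = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
repair = xor_parity(block)  # 1 repair packet for 4 source packets
index, packet = recover_missing([b"aaaa", None, b"cccc", b"dddd"], repair)
```

Here the overhead is 25% rather than 100%, but the receiver must wait for the rest of the block and the repair packet before it can rebuild the missing packet, which is the extra delay inherent in block based approaches.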
One result of the block based approach is the extra delay, as one needs to collect enough data together before being able to calculate the repair packets. In addition, a sufficient amount of the block needs to be received prior to recovery. Thus additional delay is added on both the sending and receiving sides to make it possible to recover any packet within the block.
The redundancy overhead and the transmission pattern of source and repair data can be altered from block to block, thus allowing an adaptive process that adjusts to the actual amount of loss seen on the network path and reported in RTCP.
The alternatives that exist for block based FEC with RTP are the following:
(tbd)
It is REQUIRED to have an RTP Rate Control mechanism using Media adaptation to ensure that the generated RTP flows are network friendly, and maintain the user experience in the presence of network problems.
The biggest issue is that there is no standardised and ready-to-use mechanism that can simply be included in RTC-Web. Thus there will be a need for the IETF to produce such a specification. A potential starting point for defining a solution is "RTP with TCP Friendly Rate Control" [rtp-tfrc].
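To give an idea of what such a mechanism computes, the sketch below evaluates the TCP throughput equation used by TCP Friendly Rate Control as specified in RFC 5348, which is assumed here to be the basis that [rtp-tfrc] builds on. It yields the allowed sending rate from the packet size, the round trip time, and the loss event rate. The function and parameter names are assumptions made for illustration, and the default retransmission timeout of four round trip times follows the simplification suggested in RFC 5348.

```python
import math


def tfrc_rate_bytes_per_s(packet_size, rtt, loss_event_rate, b=1,
                          t_rto=None):
    """Allowed sending rate in bytes/s from the TFRC throughput equation.

    packet_size is in bytes, rtt and t_rto in seconds, loss_event_rate
    is a fraction between 0 and 1, and b is the number of packets
    acknowledged by one TCP acknowledgement.
    """
    if not 0 < loss_event_rate <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    p = loss_event_rate
    denominator = (rtt * math.sqrt(2 * b * p / 3)
                   + t_rto * (3 * math.sqrt(3 * b * p / 8))
                   * p * (1 + 32 * p ** 2))
    return packet_size / denominator


# Example (assumed values): 1200-byte packets and a 100 ms round trip.
rate_low_loss = tfrc_rate_bytes_per_s(1200, 0.1, 0.01)
rate_high_loss = tfrc_rate_bytes_per_s(1200, 0.1, 0.05)
```

The allowed rate falls as the loss event rate or the round trip time grows, and a media sender would have to adapt its encoding to stay within it.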
RTCP contains a basic set of RTP flow monitoring points, such as packet loss and jitter. There exist a number of extensions that could be included in the set to be supported. However, in most cases the RTP monitoring that is needed depends on the application, which makes it difficult to select which extensions to include when the set of applications is very large.
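The two basic monitoring points mentioned above can be sketched as follows, based on the definitions in the RTP specification [RFC3550]: the interarrival jitter is a running estimate updated with a gain of 1/16, and the fraction lost is reported as an 8-bit fixed point number. The function names are assumptions made for illustration, and all times are taken to be in RTP timestamp units.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_ts):
    """Update the interarrival jitter estimate for one received packet.

    Returns (new_jitter, transit); pass the returned transit back in as
    prev_transit for the next packet, and None for the first packet.
    """
    transit = arrival - rtp_ts
    if prev_transit is None:
        return jitter, transit
    d = abs(transit - prev_transit)
    return jitter + (d - jitter) / 16.0, transit


def fraction_lost(expected, received):
    """Fraction of packets lost in an interval, as an 8-bit fixed point."""
    if expected <= 0 or received >= expected:
        return 0
    return ((expected - received) << 8) // expected


# Example (assumed values): two packets, the second delayed by 20 units.
jitter, transit = update_jitter(0.0, None, 1000, 160)
jitter, transit = update_jitter(jitter, transit, 1180, 320)
```

Both values are cheap to maintain per packet, which is why they form the baseline that application-specific monitoring extensions would add to.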
This memo makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an RFC.
RTP and its various extensions each have their own security considerations. These should be taken into account when considering the security properties of the complete suite. We currently don't think this suite creates any additional security issues or properties. The use of SRTP will provide protection or mitigation against all the fundamental issues by offering confidentiality, integrity and partial source authentication. We don't discuss the key-management aspect of SRTP in this memo; that needs to be done taking the RTC-Web communication model into account.
In the context of RTC-Web, the actual security properties required from RTP are currently not fully understood. Until security goals and requirements are specified, it will be difficult to determine what security features, if any, are needed in addition to SRTP and suitable key management.