Network Working Group | C. Bran |
Internet-Draft | C. Jennings |
Intended status: Standards Track | Cisco |
Expires: December 08, 2011 | June 06, 2011 |
RTC-Web Communications Protocols
draft-cbran-rtcweb-protocols-00
The real time communications web (RTC-Web) will enable applications such as web browsers to natively support real time interactive voice and video. This document outlines the communication protocols for realizing RTC-Web functionality within applications such as web browsers. In addition to communications protocols, this document proposes a set of application programming interface requirements for controlling the protocol stack.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 08, 2011.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document may not be modified, and derivative works of it may not be created, and it may not be published except as an Internet-Draft.
The Internet was, from very early in its lifetime, considered a possible vehicle for the deployment of real-time, interactive applications - with the most easily imaginable being audio conversations (aka "Internet telephony") and videoconferencing.
The first attempts to build this were dependent on special networks, special hardware and custom-built software, often at very high prices or at low quality, placing great demands on the infrastructure.
As the available bandwidth has increased, and as processors and other hardware have become ever faster, the barriers to participation have decreased, and it is possible to deliver a satisfactory experience on commonly available computing hardware.
Still, there are a number of barriers to the ability to communicate universally. One of these is that there is, as yet, no single set of communication protocols that all agree should be made available for communication; another is the sheer lack of universal identification systems (such as is served by telephone numbers or email addresses in other communications systems).
Development of "The Universal Solution" has proved hard, however, for all the usual reasons. This memo aims to take a more building-block-oriented approach, and tries to find consensus on a set of substrate components that we think will be useful in any real-time communications system.
The last few years have also seen a new platform rise for deployment of services: The browser-embedded application, or "Web application". It turns out that as long as the browser platform has the necessary interfaces, it is possible to deliver almost any kind of service on it.
Traditionally, these interfaces have been delivered by plugins, which had to be downloaded and installed separately from the browser; in the development of HTML5, much promise is seen by the possibility of making those interfaces available in a standardized way within the browser.
Other efforts, for instance the W3C Web Applications and Device API working groups, focus on making standardized APIs and interfaces available, within or alongside the HTML5 effort, for those functions; this memo concentrates on specifying the protocols and subprotocols that are needed to specify the interactions that happen across the network.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
This section defines the set of protocols and selected subset profiles of these protocols that RTC-WEB client applications will need to implement. This set of protocols forms the requirements for the controlling APIs in [Section 4]. At a high level, this section is split into five subsections that address requirements for RTC-WEB client applications: connection management, signaling protocols, codec requirements, transports for real-time media such as audio and video, and transports for non-media data.
It is quite probable that many RTC-WEB client applications, such as web browsers, will be deployed behind a NAT. To set up secure data plane sessions, all RTC-WEB client application implementations will use ICE [RFC5245] or ICE-Lite (Section 2.7 of [RFC5245]). ICE is leveraged here to address the security concerns discussed in Section 7.
There are two deployment scenarios for RTC-WEB client applications. The first scenario is when applications are deployed behind a NAT and have to worry about NAT traversal. The second scenario is when the application is not behind a NAT, such as an RTC-WEB application that is always connected to the public Internet. As stated in Section 2.7 of [RFC5245], ICE requires that both endpoints support it in order for ICE to be used on a call.
With regard to RTC-WEB client applications, all applications that are deployed behind a NAT or do not have a public IP address are REQUIRED to support ICE [RFC5245]. Applications that are not behind a NAT and have a public IP address are REQUIRED to support ICE-Lite and MAY fully support ICE. RTC-WEB client applications that fully support ICE are REQUIRED to support AGGRESSIVE NOMINATION, and MAY support REGULAR NOMINATION.
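The deployment rule above can be read as a small decision function. The following is only an illustrative sketch: the names `IceMode` and `minimum_ice_mode` are hypothetical and do not come from this document or from [RFC5245].

```python
from enum import Enum


class IceMode(Enum):
    """Hypothetical labels for the two levels of ICE support discussed above."""
    FULL = "full ICE"
    LITE = "ICE-Lite"


def minimum_ice_mode(behind_nat: bool, has_public_ip: bool) -> IceMode:
    """Return the minimum level of ICE support the text above calls REQUIRED."""
    # Behind a NAT, or without a public IP address: full ICE is REQUIRED.
    if behind_nat or not has_public_ip:
        return IceMode.FULL
    # Publicly reachable: ICE-Lite is REQUIRED; full ICE remains a MAY.
    return IceMode.LITE


if __name__ == "__main__":
    print(minimum_ice_mode(behind_nat=True, has_public_ip=False).value)
    print(minimum_ice_mode(behind_nat=False, has_public_ip=True).value)
```

An application that returns `IceMode.LITE` here is still free to implement full ICE; the function only expresses the floor.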
Implicit in supporting ICE, all RTC-WEB client applications are REQUIRED to implement Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs) (STUN) [RFC3489] and Traversal Using Relays around NAT (TURN) [RFC5766].
[Open Issue: there is a strong interest to define a TURN-like protocol that looks like HTTP to intermediaries, so that media can be tunneled over HTTP. Should this be done?]
This section covers the signaling protocol to be used by RTC-WEB applications. To ensure interoperability not just between RTC-WEB applications, but with legacy IP PBX phone systems as well, a small subset of SIP will be REQUIRED for all RTC-WEB client application implementations. In addition to the subset of the SIP specification [RFC3261], RTC-WEB client application implementations will be REQUIRED to support DNS resolution as specified in [RFC3263] and the offer/answer model with SDP as specified in [RFC3264].
This section focuses on the subset of SIP functionality that will exist within all RTC-WEB client applications. The following User Agent Client (UAC) subset of the SIP specification [RFC3261] is REQUIRED.
In the SIP specification [RFC3261], the SIP features listed below are required for all UAC implementations. RTC-WEB client applications are not fully featured SIP UACs and will only implement a subset of the SIP specification. Thus, unlike for SIP UACs, the following list of SIP features is to be considered OPTIONAL for RTC-WEB client application implementations.
This section outlines the REQUIRED SIP methods for all RTC-WEB client applications.
For handling SIP messages RTC-WEB client applications are required to implement the multipart MIME handling scheme as specified in [RFC5621].
Identity, for the purposes of this section, is defined as a SIP URI. There are two areas concerning SIP identity this specification will address.
The first area covers validation of the message originator. To securely validate the identity of a SIP message originator, all RTC-WEB client applications are REQUIRED to implement the mechanism specified in [RFC4474].
To support cases where the identity of a caller/callee may change, such as when a call is parked and transferred from the original callee to another party, all RTC-WEB client applications are REQUIRED to implement the identity mechanism specified in [RFC4916]. [RFC3261] implicitly REQUIRES the implementation of the UPDATE method as specified in [RFC3311].
RTC-WEB client applications MUST support Network Address Translator (NAT) traversal. This section will address SIP-related areas to support NAT traversal.
As called for in [3.1], RTC-WEB client applications will implement STUN. To support client-managed connections, STUN-based keep-alives as specified in [RFC5626] are REQUIRED.
When SIP is used with UDP, responses to requests are returned to the source address the request came from, and to the port written into the topmost Via header field value of the request. This behavior is not desirable when the RTC-WEB client application is behind a Network Address Translator (NAT). To address this UDP traversal problem, the "rport" extension as specified in [RFC3581] is REQUIRED.
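To make the difference concrete, the following sketch shows how a server might choose the response destination for a UDP request with and without "rport". It is a simplified illustration and not a SIP parser: the function name `apply_rport` is hypothetical, only the topmost Via value is examined, and IPv6 literals in the sent-by field are not handled.

```python
import re

# Matches an "rport" parameter that carries no value (as sent by the client).
_EMPTY_RPORT = re.compile(r";rport(?=;|$)")


def apply_rport(via: str, src_ip: str, src_port: int) -> tuple[str, tuple[str, int]]:
    """Return the rewritten topmost Via value and the (ip, port) to respond to."""
    if _EMPTY_RPORT.search(via):
        # Client asked for symmetric response routing: record the observed
        # source port and address, and answer to exactly that transport address.
        via = _EMPTY_RPORT.sub(f";rport={src_port}", via, count=1)
        via += f";received={src_ip}"
        return via, (src_ip, src_port)
    # No rport: respond to the source IP, but to the port written in the Via
    # (5060 when the sent-by field carries no port).
    sent_by = via.split()[1].split(";")[0]
    _host, _, port = sent_by.partition(":")
    return via, (src_ip, int(port) if port else 5060)


if __name__ == "__main__":
    via = "SIP/2.0/UDP 10.0.0.5:5060;branch=z9hG4bK74bf9;rport"
    print(apply_rport(via, "203.0.113.7", 40000))
```

In the example, the client wrote its private address 10.0.0.5:5060 into the Via, but the NAT mapped it to 203.0.113.7:40000. Only the "rport" branch sends the response to a port on which the NAT binding actually exists.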
This section covers the audio and video codec requirements for RTC-WEB client applications. To ensure a baseline level of interoperability between RTC-Web applications, a minimum set of required codecs is specified below. While this section specifies the codecs that will be supported by all RTC-Web application implementations, it leaves the question of supporting additional codecs to the discretion of the implementer.
RTC-WEB applications are REQUIRED to implement the following audio codecs.
Implementations of the PCMU and PCMA codecs are REQUIRED to support 1 channel with a sampling rate of 8000 Hz and a ptime of 20 ms.
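As a worked example of what these parameters imply on the wire, the sketch below computes the packetization for G.711 (PCMU/PCMA), which uses one 8-bit sample per channel per sampling instant. The function name is hypothetical, and RTP/UDP/IP header overhead is deliberately left out.

```python
def g711_packetization(sample_rate_hz: int = 8000, ptime_ms: int = 20,
                       channels: int = 1) -> dict[str, int]:
    """Compute per-packet and per-second figures for a G.711 stream."""
    samples_per_packet = sample_rate_hz * ptime_ms // 1000
    # G.711 encodes each sample in exactly one byte.
    payload_bytes = samples_per_packet * channels
    packets_per_second = 1000 // ptime_ms
    return {
        "samples_per_packet": samples_per_packet,
        "payload_bytes": payload_bytes,
        "packets_per_second": packets_per_second,
        "payload_bits_per_second": payload_bytes * 8 * packets_per_second,
    }


if __name__ == "__main__":
    print(g711_packetization())
```

With the required values, each packet carries 160 samples in 160 payload bytes, and 50 packets per second yield a 64 kbit/s payload rate.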
The following codecs are OPTIONAL for RTC-WEB application implementations.
[Open Issue: minimum profile and identifying any additional mandatory to implement audio codecs.]
RTC-WEB applications are REQUIRED to implement the following video codecs.
The following codecs are OPTIONAL for RTC-WEB application implementations.
[Open Issue: For the mandatory to implement video codec(s) what is the minimum profile?]
This section defines the real-time media transport requirements for RTC-Web client application implementations. It breaks down the RTC-WEB RTP requirements into several subsections, covering the RTP requirements for: profile, optimizations, extensions, transport robustness and rate control.
[OPEN ISSUE: identify missing requirements]
RTC-Web applications will need to provide a secure, interoperable, bandwidth-friendly media transport profile. The Secure Audio-visual Profile Feedback (SAVPF) as defined in [RFC5124] will meet the needs of RTC-Web applications by providing media encryption, interoperability and a flexible, bandwidth-conscious RTCP packet transmission model. All RTC-Web applications are REQUIRED to implement SAVPF. Requiring the implementation of SAVPF also means that RTC-Web applications MUST implicitly support Audio-visual Profile Feedback (AVPF) [RFC4585], Audio-visual Profile (AVP) [RFC3551] and Secure Audio-visual Profile (SAVP) [RFC3711].
SAVPF supports SRTP by providing media encryption, integrity protection, replay protection and a limited form of source authentication. Though the SAVPF profile does support secure media transport, it does not specify an encryption keying mechanism. To support keying for SRTP, RTC-WEB application implementors are REQUIRED to implement DTLS-SRTP [RFC5764].
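Because DTLS-SRTP runs the DTLS handshake on the same transport as the media, and ICE adds STUN packets to that flow, a receiver has to tell the three apart. The sketch below classifies a datagram by its first byte, using the value ranges given in Section 5.1.2 of [RFC5764]. The function name and the returned labels are hypothetical.

```python
def classify_packet(datagram: bytes) -> str:
    """Classify a datagram received on a DTLS-SRTP flow by its first byte."""
    if not datagram:
        return "unknown"
    first = datagram[0]
    if first <= 1:
        return "stun"        # STUN message type starts with two zero bits
    if 20 <= first <= 63:
        return "dtls"        # DTLS record content types
    if 128 <= first <= 191:
        return "srtp"        # RTP/RTCP version 2 sets the top two bits to 10
    return "unknown"         # anything else is dropped in this sketch


if __name__ == "__main__":
    print(classify_packet(bytes([0x00, 0x01])))  # STUN Binding request
    print(classify_packet(bytes([22, 254, 253])))  # DTLS handshake record
    print(classify_packet(bytes([0x80, 0x00])))  # RTP, version 2
```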
This section describes the optimization requirements for RTP within RTC-Web applications.
Historically, RTP and RTCP have been run on separate UDP ports. As the use of Network Address Port Translation (NAPT) has increased, so have the problems of maintaining multiple, costly NAT bindings, one for each UDP port. This dual UDP port paradigm also complicates firewall administration, since multiple ports must be opened to allow for RTP traffic. To reduce these costs and session setup times, support for multiplexing RTP data packets and RTCP control packets on a single port [RFC5761] is REQUIRED.
Note that the use of RTP and RTCP multiplexed on a single port ensures that there is occasional traffic sent on that port, even if there is no active media traffic. This may be useful to keep NAT bindings alive.
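When both packet types arrive on one port, the receiver separates them by looking at the second byte of the packet. The sketch below uses the common check based on [RFC5761]: RTCP packet types 192 to 223 occupy the same bit positions as the RTP marker bit plus payload types 64 to 95, so those payload types are treated as RTCP. The function name is hypothetical, and the check assumes the session avoids RTP payload types 64 to 95.

```python
def is_rtcp(packet: bytes) -> bool:
    """Return True if a packet on a multiplexed RTP/RTCP port looks like RTCP."""
    if len(packet) < 2:
        raise ValueError("packet too short to hold an RTP or RTCP header")
    # Drop the top bit (the RTP marker bit) and inspect the remaining 7 bits.
    payload_type = packet[1] & 0x7F
    return 64 <= payload_type <= 95


if __name__ == "__main__":
    receiver_report = bytes([0x80, 201, 0x00, 0x01])   # RTCP RR, packet type 201
    pcmu_with_marker = bytes([0x80, 0x80, 0x00, 0x01])  # RTP, marker set, PT 0
    print(is_rtcp(receiver_report), is_rtcp(pcmu_with_marker))
```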
RTCP packets are usually sent as compound RTCP packets and [RFC3550] demands that the RTCP compound packets always start with a Sender Report (SR) or Receiver Report (RR) packet. The SR and RR packets provide reception quality statistics and increase the mean RTCP packet size. Because the mean compound RTCP packet size is larger, the frequency at which RTCP packets can be sent within the RTCP bandwidth share decreases. The decreased transmission frequency creates a performance bottleneck that is especially noticeable when using frequent feedback messages.
As mentioned in section [Add ref], RTC-Web applications will be required to implement SAVPF, which implicitly requires feedback. [RFC5506] specifies how to reduce the mean RTCP message size and allow for more frequent feedback. Frequent feedback, in turn, is essential to make real-time applications quickly aware of changing network conditions and allow them to adapt their transmission and encoding behavior. Support for [RFC5506] is REQUIRED.
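The effect of packet size on feedback frequency can be illustrated with a simplified form of the RTCP interval calculation in [RFC3550], where the interval is the number of members times the average RTCP packet size divided by the RTCP bandwidth. This sketch omits the minimum interval, the sender/receiver bandwidth split and the randomization that [RFC3550] also applies. The packet sizes and bandwidth used in the example are made-up illustrative values.

```python
def rtcp_interval_seconds(members: int, avg_rtcp_packet_bytes: float,
                          rtcp_bandwidth_bytes_per_s: float) -> float:
    """Simplified deterministic RTCP reporting interval (seconds)."""
    if rtcp_bandwidth_bytes_per_s <= 0:
        raise ValueError("RTCP bandwidth must be positive")
    return members * avg_rtcp_packet_bytes / rtcp_bandwidth_bytes_per_s


if __name__ == "__main__":
    # Assumed: 1 Mbit/s session, 5% RTCP share = 50 kbit/s = 6250 bytes/s.
    rtcp_bw = 6250.0
    compound = rtcp_interval_seconds(2, 120, rtcp_bw)  # full compound packets
    reduced = rtcp_interval_seconds(2, 40, rtcp_bw)    # reduced-size packets
    print(f"compound: {compound * 1000:.1f} ms, reduced-size: {reduced * 1000:.1f} ms")
```

Under these assumed numbers, cutting the mean packet size to a third allows feedback three times as often within the same RTCP bandwidth share.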
RTP entities choose the RTP and RTCP transport addresses (IP addresses and port numbers), to bind to and receive packets on. However when sending RTP and RTCP packets, senders may use an IP address or port number that is different than the one specified for receiving packets. Using different transport addresses is problematic with regards to NAT traversal. The NAT traversal problem can be alleviated using symmetric RTP/RTCP [RFC4961]. Symmetric RTP/RTCP requires that the transport addresses for sending and receiving RTP/RTCP packets are identical. All RTC-WEB client applications are REQUIRED to implement Symmetric RTP/RTCP [RFC4961].
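In practice, symmetric operation usually falls out of using a single bound UDP socket for both directions. The sketch below demonstrates this on the loopback interface; the function name is hypothetical and the payloads are placeholders rather than real RTP or RTCP packets.

```python
import socket


def open_symmetric_socket(local_ip: str = "127.0.0.1",
                          local_port: int = 0) -> socket.socket:
    """Bind one UDP socket that is used for both sending and receiving."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((local_ip, local_port))  # port 0 lets the OS pick a free port
    return sock


if __name__ == "__main__":
    a, b = open_symmetric_socket(), open_symmetric_socket()
    a.settimeout(2.0)
    b.settimeout(2.0)
    # a sends from the same address it receives on ...
    a.sendto(b"media-from-a", b.getsockname())
    _data, source = b.recvfrom(2048)
    # ... so b can reply to the observed source and reach a's receive address.
    b.sendto(b"media-from-b", source)
    reply, _ = a.recvfrom(2048)
    print(source == a.getsockname(), reply)
    a.close()
    b.close()
```

Because the source address of outgoing packets equals the advertised receive address, a NAT binding created by outbound traffic also admits the peer's inbound traffic.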
The RTCP Canonical Name (CNAME) provides a persistent transport-level identifier for an RTP endpoint. While the Synchronization Source (SSRC) identifier for an RTP endpoint may change if a collision is detected, or when the RTP application is restarted, its RTCP CNAME is meant to stay unchanged, so that RTP endpoints can be uniquely identified and associated with their RTP media streams. For proper functionality, RTCP CNAMEs should be unique within the participants of an RTP session.
The RTP specification [RFC3550] includes guidelines for choosing a unique RTP CNAME. These guidelines are not sufficient in the presence of NAT devices or with regard to addressing privacy concerns resulting from the long-term, persistent identifiers.
To address the shortcomings of CNAME selection in [RFC3550], it is RECOMMENDED that RTP CNAME generation follows the approach specified in section 5 of [RFC6222].
For RTC-WEB client applications, such as a web browser, it may not be possible to retrieve the EUI-64 identifier or the host system's MAC address, which is needed to fulfill the CNAME generation procedure outlined in section 5 of [RFC6222]. As an alternative to the EUI-64/MAC address, RTC-WEB client applications MAY generate and use a random number for the unique CNAME generation procedure.
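A minimal sketch of the random-number alternative is shown below. The 96-bit length and the base64 text encoding are assumptions made for this example, not requirements stated in this document, and the function name is hypothetical.

```python
import base64
import secrets


def random_cname(num_bits: int = 96) -> str:
    """Generate a CNAME from cryptographically strong random bits."""
    if num_bits <= 0 or num_bits % 8:
        raise ValueError("num_bits must be a positive multiple of 8")
    raw = secrets.token_bytes(num_bits // 8)
    # base64 keeps the identifier printable; 12 bytes encode to 16 characters.
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    print(random_cname())
```

A value generated this way reveals nothing about the host, which addresses the privacy concern, and collisions between participants are improbable at this length.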
This section describes the RTP extensions that could be very useful within the RTC-WEB context.
RTC-Web applications will support conferencing capabilities. While this document remains silent regarding what conferencing topology should be supported for RTC-Web applications, the following section will provide guidance around RTP extensions to support centralized conferencing.
For more information on RTP conferencing topologies please refer to [RFC5117].
The Full Intra Request (FIR) command and message are defined in sections 3.5.1 and 4.3.1 of [RFC5104]. FIR messages will request that the currently distributed session participants send new intra coded pictures to the mixer. FIR is used when switching between sources to ensure that the receivers can decode the video or other predicted media encoding with long prediction chains. It is RECOMMENDED that the FIR message is supported.
The Picture Loss Indicator (PLI) is defined in Section 6.3.1 of [RFC4585]. PLI messages tell the encoder that a receiver has lost the decoder context and would like it repaired. It is RECOMMENDED that the PLI message is supported.
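For illustration, a PLI is one of the smallest feedback messages: a payload-specific feedback packet (RTCP packet type 206) with feedback message type 1 and no feedback control information, as laid out in [RFC4585]. The sketch below builds one. The function name is hypothetical, and in a real session the packet would be sent according to the RTCP rules of the profile in use.

```python
import struct

RTCP_PT_PSFB = 206  # payload-specific feedback
PSFB_FMT_PLI = 1    # Picture Loss Indication


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Build a 12-byte RTCP PLI feedback message."""
    first_byte = (2 << 6) | PSFB_FMT_PLI  # version 2, no padding, FMT = 1
    length_words_minus_one = 2            # 3 words of 32 bits in total
    return struct.pack("!BBHII", first_byte, RTCP_PT_PSFB,
                       length_words_minus_one, sender_ssrc, media_ssrc)


if __name__ == "__main__":
    print(build_pli(0x11223344, 0xAABBCCDD).hex())
```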
The Temporary Maximum Media Stream Bit Rate Request (TMMBR, "timber") message is defined in sections 3.5.4 and 4.2.1 of [RFC5104]. A receiver, translator, or mixer uses the TMMBR to request a sender to limit the maximum bit rate for a media stream to, or below, the provided value. An example of using TMMBR would be for an RTP mixer to constrain the media sender’s bit rate to fit within the lower bit rate range of other session participants. It is RECOMMENDED that the TMMBR message be supported.
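The requested limit is carried in a compact form. Per [RFC5104], each TMMBR entry holds the target SSRC followed by a 6-bit exponent, a 17-bit mantissa and a 9-bit measured overhead, where the bit rate equals mantissa times two to the power of the exponent. The sketch below encodes and decodes one such entry; the function names are hypothetical, and rounding down when the value does not fit exactly is a choice made for this example.

```python
import struct


def encode_tmmbr_entry(ssrc: int, max_bitrate_bps: int, overhead_bytes: int) -> bytes:
    """Encode one TMMBR entry: SSRC, then exp(6) | mantissa(17) | overhead(9)."""
    if max_bitrate_bps < 0 or not 0 <= overhead_bytes < (1 << 9):
        raise ValueError("bit rate or overhead out of range")
    exponent, mantissa = 0, max_bitrate_bps
    # Shift right until the mantissa fits in 17 bits (this rounds down).
    while mantissa >= (1 << 17):
        mantissa >>= 1
        exponent += 1
    if exponent > 63:
        raise ValueError("bit rate too large to encode")
    word = (exponent << 26) | (mantissa << 9) | overhead_bytes
    return struct.pack("!II", ssrc, word)


def decode_tmmbr_entry(entry: bytes) -> tuple[int, int, int]:
    """Return (ssrc, max_bitrate_bps, overhead_bytes) from an 8-byte entry."""
    ssrc, word = struct.unpack("!II", entry)
    exponent = word >> 26
    mantissa = (word >> 9) & 0x1FFFF
    return ssrc, mantissa << exponent, word & 0x1FF


if __name__ == "__main__":
    entry = encode_tmmbr_entry(0x12345678, 500_000, 40)
    print(entry.hex(), decode_tmmbr_entry(entry))
```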
This section describes the requirements for RTC-WEB RTP header extensions. For all RTC-WEB RTP header extensions it is REQUIRED that they are formatted and signaled according to the general mechanism defined in [RFC5285].
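As an illustration of the general mechanism, the sketch below serializes the one-byte-header form from [RFC5285]: the block starts with the 0xBEDE pattern and a length in 32-bit words, and each element is a 4-bit ID, a 4-bit length-minus-one, and the data, with zero padding to a word boundary. The function name is hypothetical, and the mapping from IDs to extension URIs, which is negotiated in signaling, is outside this sketch.

```python
import struct


def build_one_byte_extension(elements: dict[int, bytes]) -> bytes:
    """Serialize RTP header extension elements in the one-byte-header form."""
    body = b""
    for ext_id, data in elements.items():
        if not 1 <= ext_id <= 14:
            raise ValueError("one-byte header IDs must be in 1..14")
        if not 1 <= len(data) <= 16:
            raise ValueError("element data must be 1..16 bytes")
        # Upper nibble: ID. Lower nibble: data length minus one.
        body += bytes([(ext_id << 4) | (len(data) - 1)]) + data
    body += b"\x00" * (-len(body) % 4)  # pad to a 32-bit boundary
    return struct.pack("!HH", 0xBEDE, len(body) // 4) + body


if __name__ == "__main__":
    # ID 1 is an arbitrary example; its meaning would come from signaling.
    print(build_one_byte_extension({1: b"\x50"}).hex())
```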
[Open Issue: should any of the following headers be added to the list:
Open Issue: There is also ongoing work to define RTP header extensions for providing audio levels:
Which, if any of the above should be required? optional?
]
Basic RTP session synchronization as described in [RFC3550] can be slow. To improve synchronization performance and maintain relative backwards compatibility it is RECOMMENDED that the rapid RTP synchronization extensions described in [RFC6051] be implemented.
This section identifies tools that can be used to add robustness to the RTP flows. Adding robustness to the RTP flow can reduce packet loss and thus have a positive impact upon media quality.
The retransmission scheme in RTP allows for flexibility of retransmissions. From the receiving side, only selected missing packets can be requested. From the sending side, packets can be prioritized based upon the sender's knowledge of the receiver's missing packets. Support for RTP retransmission as defined by [RFC4588] is RECOMMENDED.
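For orientation, the retransmission payload format in [RFC4588] prefixes the original RTP payload with the 16-bit original sequence number (OSN), so that the receiver can place the recovered packet back into the original stream. The sketch below covers only that payload framing; the function names are hypothetical, and the RTP header of the retransmission stream itself is not shown.

```python
import struct


def build_rtx_payload(original_seq: int, original_payload: bytes) -> bytes:
    """Prefix the original payload with its 16-bit original sequence number."""
    if not 0 <= original_seq <= 0xFFFF:
        raise ValueError("sequence number must fit in 16 bits")
    return struct.pack("!H", original_seq) + original_payload


def parse_rtx_payload(rtx_payload: bytes) -> tuple[int, bytes]:
    """Split a retransmission payload into (original_seq, original_payload)."""
    if len(rtx_payload) < 2:
        raise ValueError("retransmission payload too short")
    (original_seq,) = struct.unpack("!H", rtx_payload[:2])
    return original_seq, rtx_payload[2:]


if __name__ == "__main__":
    rtx = build_rtx_payload(4711, b"lost-media-bytes")
    print(parse_rtx_payload(rtx))
```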
[Open Issue: is [RFC4588] the way we want to tackle this issue?]
[Open issue - should there be a FEC scheme recommendation?]
RTC-WEB client applications support for multicast RTP is NOT REQUIRED.
[OPEN ISSUE - There is currently no available, standardized RTP rate control mechanism that uses media adaptation. Having a mechanism in place will be REQUIRED for RTC-WEB applications, which means there is a need for the IETF to produce this specification.
A potential starting point for defining a solution is "RTP with TCP Friendly Rate Control" [rtp-tfrc].]
The RTC-WEB will enable rich voice and video communications from client applications, such as a web browser. One of the natural extensions of the RTC-WEB and the work emerging from the HTML5 community is video games. Video games have similarly stringent real-time requirements for exchanging non-media data types such as a player's screen position.
The question of how best to handle non-media data types has been raised. There have been proposals to address this problem. Common to all proposals is how the data transport session is set up, using ICE [RFC5245] in a similar manner to that of RTP [RFC3550]. The proposals differ in what happens once the session is set up. One proposal is just to use a thin shim on top of UDP or DTLS to de-multiplex the packets from other packets such as RTP on the same connection. Another proposal is DTLS over DCCP over UDP with some appropriate congestion control scheme chosen for DCCP. Lastly, there has been a proposal to define a data codec to carry the data in RTP.
This section will answer the question regarding the addition of non-media data types into an RTC-WEB client application initiated RTP session.
RTP by design adheres to the application level framing architectural principle. This principle requires that RTP payload formats be specified. By requiring specific payload formats, RTP provides a mechanism to optimize the transmission of encoded media. Other than this optimization, there are no congestion control mechanisms for RTP.
This principle also implies that if a payload format cannot be specified, as is the case with generic data, it breaks one of the fundamental architectural principles of RTP and makes optimization impossible. Given that the ability to optimize the transmission of non-media data types is lost and there are no capabilities for congestion control within RTP, it follows that there is no benefit to using RTP instead of a more general data transport such as UDP. Until non-media data payload formats are created, RTP SHALL NOT be used as a non-media data transport in conjunction with any RTC-WEB client application implementation.
[Open issue: There has been mention of actually creating new payload formats for non-media data types. If new payload formats are actually created for specific types of non-media data, the requirement above would still stand as the application level framing principle would be preserved and the new formats would have to adhere to the principle. Any new formats would be specified outside of this document but referred to]
[OPEN issue: need further discussion around this area]
There have been some ideas proposed but nothing has emerged as the dominant paradigm. The current thinking is that, for RTC-WEB client applications, RTP is not an option for non-media data types that do not have a payload format specification. Without a payload format specification a workable solution would resemble something that allows datagrams to be transmitted via a secure, congestion-controlled, unreliable transport mechanism.
One of the currently proposed solutions that could meet the requirements for a non-media data transport for RTC-WEB client applications is to use DCCP via the following specifications:
The maturity of available implementations of DCCP is of concern along with the partiality of this proposed solution. Another way of tackling the problem of non-media data transport is to push the requirements into the RTC-WEB client application implementation.
The following is a proposed set of REQUIRED RTC-WEB client application non-media data transport requirements.
As an example of how these proposed requirements could be implemented within an RTC-WEB client application, let's explore a web browser-based implementation. In this specific implementation, the web browser would provide DTLS over UDP and implement a broad congestion control solution such as TFRC or TFRC-SP. This implementation will yield a coarse-grained congestion controlled non-media data transport solution that is accessible via JavaScript API calls. These non-media data transport capabilities would provide a flexible solution for web developers to build a full congestion control solution into their RTC-WEB client application.
[Open Issue: Given that there is no consensus with regards to a transport solution, this topic needs further discussion.
Open Issue: Areas for further discussion:
]
NOT Ready - need to decide on protocols first, API comes after that
RTP
The API needs to allow the DSCP REF for each RTP or media stream to be set.
The API needs to allow the browser app to observe and control the SSRC values in the RTP.
Codec
The API needs to support the following OPTIONAL video codecs: H263-2000, H264, H264-SVC, raw and VP8.
The API needs to support the following OPTIONAL audio codecs: G729, G722, G7221, G723, AMR, AMR-WB, iLBC, L16 and opus.
There is no way to meet all the security requirements and maintain compatibility with all legacy VoIP equipment. This draft tries to minimize the impedance mismatch. The requirements here would allow interoperability with legacy VoIP equipment as long as that equipment either directly supported, or was fronted by an SBC that supported, the following: a CORS [W3C.WD-cors-20090317] extension for SIP, ICE or ICE-Lite, the mandatory to implement codecs in [SECTION], SIP INVITEs containing an offer, and DTMF over RTP with telephone events.
Of the items listed above, support for ICE-Lite has historically been lacking in VoIP equipment. This is changing, and ICE-Lite is becoming increasingly prevalent, particularly on devices designed to sit on the edge of a domain and connect to remote user agents that may be behind NATs. Given the increasing adoption of ICE-Lite, it could be conjectured that a substantial fraction of VoIP equipment meets the RTC-WEB interoperability list except for the CORS extensions.
For an edge device that is willing to receive SIP calls from others, implementing CORS is fairly trivial. When the UAS receives a SIP OPTIONS request with an Origin header, it checks whether the header field value is on the white list, and if it is, the UAS copies the value to the Access-Control-Allow-Origin header field value in the response. For many situations the white list would be everything, while for others it would be just the list of websites that are expected to originate calls to this SIP device.
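The white list check described above amounts to a few lines of logic. The sketch below models SIP headers as a plain dictionary and is not tied to any SIP stack. The function name is hypothetical, and representing "the white list is everything" as `None` is an assumption of this example.

```python
from typing import Optional


def cors_response_headers(request_headers: dict[str, str],
                          white_list: Optional[set[str]]) -> dict[str, str]:
    """Return the headers a UAS would add to its response to an OPTIONS request.

    white_list=None models a device that accepts calls from any origin.
    """
    origin = request_headers.get("Origin")
    if origin is None:
        return {}  # no Origin header: nothing to echo
    if white_list is None or origin in white_list:
        # Copy the Origin value into the response, as described above.
        return {"Access-Control-Allow-Origin": origin}
    return {}  # origin not allowed: the header is simply left out


if __name__ == "__main__":
    allowed = {"https://calls.example.com"}
    print(cors_response_headers({"Origin": "https://calls.example.com"}, allowed))
    print(cors_response_headers({"Origin": "https://other.example.net"}, allowed))
```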
This document makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an RFC.
Because there are a number of security issues, considerations and requirements for RTC-WEB client applications, there is a draft that specifically addresses the RTC-WEB application security considerations. This draft defers its security considerations and requirements to the security considerations for RTC-Web draft [I-D.ekr-security-considerations-for-rtc-web].
Many thanks to Harald Alvestrand, Magnus Westerlund, Colin Perkins, and Joerg Ott for a significant amount of text and contributed ideas on this topic.