Network Working Group | C. Jennings |
Internet-Draft | Cisco |
Intended status: Standards Track | J.R. Rosenberg |
Expires: May 03, 2012 | jdrosen.net |
J. Uberti | |
R. Jesup | |
Mozilla | |
October 31, 2011 |
RTCWeb Offer/Answer Protocol (ROAP)
draft-jennings-rtcweb-signaling-01
This document describes a protocol used to negotiate media between browsers or other compatible devices. This protocol provides the state machinery needed to implement the offer/answer model (RFC 3264), and defines the semantics and necessary attributes of messages that must be exchanged. The protocol uses an abstract transport in that it does not actually define how these messages are exchanged. Rather, such exchanges are handled through web-based transports like HTTP or WebSockets. The protocol focuses solely on media negotiation and does not handle call control, call processing, or other functions.
The IETF has been notified of intellectual property rights claimed in regard to some or all of the specification contained in this document. For more information consult the online list of claimed rights.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 03, 2012.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.
This specification defines a protocol that allows an RTCWeb browser to exchange information to control the setup of media to another browser or device. The scope of this protocol is limited to functionality required for the setup and negotiation of media and the associated transports, referred to as media control. The protocol defines the minimum set of messages and state machinery necessary to implement the offer/answer model as defined in [RFC3264]. The offer/answer model specifies rules for the bilateral exchange of Session Description Protocol (SDP) messages [RFC4566] for creation of media streams.
The protocol specified here defines the state machines, semantic behaviors, and messages that are exchanged between instances of the state machines. However, it does not specify the actual on the wire transport of these messages. Rather, it assumes that the implementation of this protocol would occur within the browser itself, and then browser APIs would allow the application's JavaScript to request creation of messages and insert messages into the state machine. The actual transfer of these messages would be the responsibility of the web application, and would utilize protocols such as HTTP and WebSockets. To facilitate implementation within a browser, messages are encoded in JSON [RFC4627]. This protocol, with appropriate selected transports, could also be implemented by a signalling gateway that converts ROAP to SIP or Jingle.
This protocol is designed to be closely aligned with the PeerConnection API defined in the RTCWeb API [webrtc-api] specification. It is important to note that while ROAP does not require what has been referred to as a low-level API for media manipulation, ROAP does not prevent having such an API as well, and both styles of API could coexist and be used where appropriate.
The protocol defined here does not provide any call control. Concepts like ringing of phones, user search, call forwarding, redirection, transfer, hold, and so on, are all the domain of call processing and are out of scope for this specification. It is assumed that the application running within the browser provides any call control based on the needs of the application, the scope of which is not a matter for standardization.
Despite the fact that it has an abstract transport, ROAP is still a protocol. This means it has state machines, and it has rules governing the behavior of those state machines which guarantee that the system operates properly for any set of inputs. It is assumed that this state machinery is implemented in the browser and is thus immutable by the application, so that the browser can guarantee proper behavior regardless of the operation of the resident JavaScript.
The protocol is designed to operate between two entities (browsers for example), which exchange messages "directly" - meaning that a message output by one entity is meant to be directly processed by the other entity without further modification. In practice, this means that a web server can treat ROAP messages as opaque and just shuffle them between browser instances. This allows for simple implementations. However, more powerful applications can be built in which the web server or JavaScript can modify the messages in order to provide more complex features. As long as those modifications produce messages compliant to this specification, SDP Offer/Answer [RFC3264], SDP [RFC4566], ICE [RFC5245] and any other dependencies, interoperability is still possible.
This protocol is designed for two major use cases:
In the browser to SIP use case, the gateway obviously needs to be somewhat more sophisticated. However, because this design is a small subset of the design space covered by SIP [RFC3261], it is intended to be simple to translate to and from SIP via a signalling gateway. Moreover, many of the elements in messages have clear mappings to elements in SIP messages, thus allowing simple, stateless translation.
There has been extensive debate about the best architecture for RTCWeb signaling. To a great extent this decision is dictated by the requirements that the signaling mechanism is intended to fit. The protocol in this document was designed to minimize the amount of implementation effort required outside the browser and RTC-Web signaling gateways. This implies the following requirements:
It should be possible to develop a simple browser to browser voice and video service in a small amount of code. In particular, it MUST be possible to implement a functional service such that:
It should be possible to implement a simple RTC-Web gateway that:
Finally it seems clear that SDP is too complicated to reinvent, so despite its manifest deficiencies we opt to take it as-is rather than trying to reinvent it.
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This draft uses the API and terminology described in [webrtc-api].
We start with a simple example. Consider the case where browser A wishes to set up a media session with browser B. At a high level, A needs to communicate the following information:
The OFFER message is used to carry this information. For example, A might send B:
   {
     "messageType":"OFFER",
     "offererSessionId":"13456789ABCDEF",
     "seq": 1,
     "sdp":"v=0\no=- 2890844526 2890842807 IN IP4 192.0.2.1\ns= \nc=IN IP4 192.0.2.1\nt=2873397496 2873404696\nm=audio 49170 RTP/AVP 0"
   }
The messageType field indicates that this is an OFFER and the offererSessionId indicates the media session that this OFFER is associated with. B can tell that this is for a new media session because it contains an offererSessionId that B has not seen before. The sdp field contains the offer itself, which is just an ordinary SDP offer rendered as a string.
If B elects to start a media session, B responds with an ANSWER message containing SDP, as shown below.
   {
     "messageType":"ANSWER",
     "offererSessionId":"13456789ABCDEF",
     "answererSessionId":"abc1234356",
     "seq": 1,
     "sdp":"v=0\no=- 2890844526 2890842807 IN IP4 192.0.2.3\ns= \nc=IN IP4 192.0.2.3\nt=2873397496 2873404696\nm=audio 49175 RTP/AVP 0"
   }
The contents of this message are more or less the same as those in the OFFER, except that B also includes an answererSessionId to uniquely identify the session from B's perspective. The combination of offererSessionId and answererSessionId uniquely identifies this session.
Finally, in order to confirm that A has seen B's ANSWER, A responds with an OK message.
   {
     "messageType":"OK",
     "offererSessionId":"13456789ABCDEF",
     "answererSessionId":"abc1234356",
     "seq": 1
   }
Note that all of these messages contain a seq field which contains a transaction sequence number. The seq field makes it possible to correlate messages which belong to the same transaction, as well as to detect duplicates, as described later in Section 5.1.
An object with a messageType value of "OFFER" will always contain an SDP offer, and an object with a messageType value of "ANSWER" will always contain an SDP answer. The complete list of message types is defined in Section 5. Only a small number of messages are permitted and much of the message set is devoted to error handling.
In building web systems it is often useful for a request to contain some state that is passed back in future messages. This system includes two types of state: session state and request state. If a browser receives a message that contains state in a setSessionState attribute, any future messages it sends that have the same offererSessionId MUST include this state in a sessionState attribute. Similarly, if a request contains a setResponseState attribute, that state MUST be included in any response to that request in a responseState attribute.
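As a minimal sketch of the three-message exchange shown above, the following Python builds the OFFER, ANSWER, and OK objects and serializes the last one as JSON. The helper names (make_offer, make_answer, make_ok) are hypothetical and are not defined by this specification; only the field names are taken from the examples above, and the SDP strings are placeholders.

```python
import json

def make_offer(offerer_session_id, seq, sdp):
    """Build an OFFER object; field names follow the example above."""
    return {"messageType": "OFFER",
            "offererSessionId": offerer_session_id,
            "seq": seq,
            "sdp": sdp}

def make_answer(offer, answerer_session_id, sdp):
    """Build an ANSWER that echoes the OFFER's session id and seq."""
    return {"messageType": "ANSWER",
            "offererSessionId": offer["offererSessionId"],
            "answererSessionId": answerer_session_id,
            "seq": offer["seq"],
            "sdp": sdp}

def make_ok(answer):
    """Build the OK that confirms receipt of an ANSWER."""
    return {"messageType": "OK",
            "offererSessionId": answer["offererSessionId"],
            "answererSessionId": answer["answererSessionId"],
            "seq": answer["seq"]}

# Placeholder SDP bodies; a real agent would supply complete SDP.
offer = make_offer("13456789ABCDEF", 1, "v=0\n...")
answer = make_answer(offer, "abc1234356", "v=0\n...")
ok = make_ok(answer)
wire = json.dumps(ok)  # the string the web application would transport
```

How the resulting string travels between the two sides is left to the application, as described earlier.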
Once a session has been set up, additional rounds of offer/answer can be sent using the OFFER/ANSWER/OK sequence. Note that the seq attribute makes it easy to differentiate these additional rounds from the initial exchange and from each other.
When one side wishes to end the session, it simply sends a SHUTDOWN message, which is responded to with an OK response. A SHUTDOWN can be sent regardless of whether any response has been received to the initial OFFER. The key purpose of the SHUTDOWN message is to allow the other side to know that it can clean up any state associated with the session.
ROAP messages are typically carried over a reliable transport (likely HTTP via XMLHttpRequest or WebSockets), so the chance of message loss is low (though non-zero), provided that the signaling service is up. However, the common web reliability and scalability model is based on the principle that transactions are idempotent and that requests can just be discarded and will be retried. A retry of a transaction might happen if a given host was down and the DNS round robin approach wanted to move to the next server, or if a server was overloaded, or if there was a hiccup in the network. Web applications that want to work well need to deal with these issues to get the advantages of the general web design pattern for scalability and reliability. Because only the application knows what its internal reliability characteristics are, the JS application (and whatever associated servers it uses) is ultimately responsible for ensuring end-to-end delivery; the browser simply assumes that messages which are provided to the JS will be delivered eventually.
However, in order to maintain OFFER/ANSWER transaction state, the SDP state machine does need to know whether the far end has received an ANSWER and whether that ANSWER caused an error. To support this model, OFFER and ANSWER messages are acknowledged end to end with an ANSWER or OK respectively; however, any retransmissions need to be handled by the JS or by whatever is providing the transport of the ROAP messages. The combination of the session IDs and seq allows the browser to detect and discard duplicate requests and to detect glare.
Each call is identified by a pair of session identifiers:
The session ID values MUST be generated so that they are globally unique. Thus, the combination of both sessionIds is itself globally unique. Session IDs never change for the duration of a media session.
All messages MUST contain the "offererSessionId", and all messages other than OFFER or an error in response to an OFFER MUST contain both "offererSessionId" and "answererSessionId".
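The presence rule above can be sketched as a small check on a decoded message. This is an illustration only: the function name and the in_response_to_offer flag are hypothetical, and the caller is assumed to know whether an ERROR is answering an OFFER.

```python
def has_required_session_ids(msg, in_response_to_offer=False):
    """Check the session-id presence rule for one decoded message."""
    if "offererSessionId" not in msg:
        return False
    kind = msg.get("messageType")
    # An OFFER, or an ERROR sent in response to an OFFER, may omit
    # answererSessionId; every other message needs both identifiers.
    if kind == "OFFER" or (kind == "ERROR" and in_response_to_offer):
        return True
    return "answererSessionId" in msg
```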
This is a sequence counter for the key requests that helps correlate responses to the correct request.
This is a 32-bit unsigned integer. On each new OFFER (from either browser) it is incremented by one. The Seq of an OK or ANSWER is set to the same Seq that was used in the OFFER which caused it. When a PeerConnection object originates a new session by sending an OFFER type message, it starts the Seq at 1.
While session IDs serve to uniquely identify a session, it may be useful to allow one side or the other to offload state onto the other side (for instance to enable a stateless gateway). The "setSessionToken" and "sessionToken" fields are used for this purpose. When an implementation receives a message with a "setSessionToken" field, it MUST associate the field value with the session. In all future messages in the session, it MUST send the associated value in the "sessionToken" field (unless the session token is reset by another "setSessionToken" value). If no session token has yet been received, the "sessionToken" field MUST be omitted.
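One possible way to track the sequence counter for a session is sketched below. The class and function names are hypothetical. The text above does not say what happens when the 32-bit space is exhausted, so raising an error in that case is purely an assumption of this sketch.

```python
class SeqCounter:
    """Tracks seq for one session as a 32-bit unsigned integer."""
    MAX_SEQ = 2**32 - 1

    def __init__(self):
        self.current = 0  # no OFFER has been sent or received yet

    def next_offer_seq(self):
        """Seq for a new OFFER we originate: 1 first, then +1 each time."""
        if self.current >= self.MAX_SEQ:
            # Assumption: wrap-around behavior is not specified.
            raise OverflowError("seq space exhausted")
        self.current += 1
        return self.current

    def observe_offer(self, seq):
        """Record the seq of an OFFER received from the other side."""
        self.current = max(self.current, seq)

def reply_seq(offer):
    """An ANSWER or OK reuses the seq of the OFFER that caused it."""
    return offer["seq"]
```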
In addition to tokens which persist for the life of a session, it is also possible to have tokens which are only valid for the lifetime of a given request/response pair. The "setResponseToken" and "responseToken" fields are used for this purpose.
When an implementation responds to a message from the other side (e.g., supplies an answer to an offer, or replies to an answer with an OK), it MUST copy into the "responseToken" field any value found in a "setResponseToken" field in the message being responded to. If no "setResponseToken" field is present, then the "responseToken" field MUST be omitted.
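The token rules above amount to a small amount of bookkeeping, sketched here under the assumption that messages are already decoded into dictionaries. The TokenState class and its method names are hypothetical; the four field names are the ones defined above.

```python
class TokenState:
    """Holds the session token most recently set by the other side."""

    def __init__(self):
        self.session_token = None  # nothing received yet

    def on_receive(self, msg):
        # A later setSessionToken replaces any earlier value.
        if "setSessionToken" in msg:
            self.session_token = msg["setSessionToken"]

    def decorate(self, outgoing, in_response_to=None):
        """Return a copy of outgoing with the required token fields."""
        out = dict(outgoing)
        if self.session_token is not None:
            out["sessionToken"] = self.session_token
        # responseToken is copied only from the message being answered.
        if in_response_to is not None and "setResponseToken" in in_response_to:
            out["responseToken"] = in_response_to["setResponseToken"]
        return out
```

A stateless gateway could, for example, place its own routing state in "setSessionToken" and recover it from the "sessionToken" of every later message.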
In order to initiate sending media between the browsers, the offerer sends an OFFER message. In order to accept the media, the answerer responds with an ANSWER message. A sample message flow for this is shown below:
participant OffererUA
participant OffererJS
participant AnswererJS
participant AnswererUA
OffererJS->OffererUA: peer=new PeerConnection();
OffererJS->OffererUA: peer->addStream();
OffererUA->OffererJS: sendSignalingChannel();
OffererJS->AnswererJS: {"type":"OFFER", "sdp":"..."}
AnswererJS->AnswererUA: peer=new PeerConnection();
AnswererJS->AnswererUA: peer->processSignalingMessage();
AnswererUA->AnswererJS: onconnecting();
AnswererUA->OffererUA: ICE starts checking
note right of AnswererUA: User decides it is OK to send video
AnswererJS->AnswererUA: peer->addStream();
AnswererUA->OffererUA: Media
AnswererUA->AnswererJS: sendSignalingChannel();
AnswererJS->OffererJS: {"type":"ANSWER","sdp":"..."}
OffererJS->OffererUA: peer->processSignalingMessage();
OffererUA->OffererJS: onaddstream();
OffererUA->AnswererUA: Media
AnswererUA->OffererUA: ICE Completes
AnswererUA->AnswererJS: onopen();
OffererUA->OffererJS: onopen();
OffererUA->OffererJS: sendSignalingChannel();
OffererJS->AnswererJS: {"type":"OK" }
AnswererJS->AnswererUA: peer->processSignalingMessage();
AnswererUA->AnswererJS: onaddstream();
The above figure shows a simple message flow for negotiating media:
The contents of each of these messages is detailed below.
The first OFFER message with a given offererSessionId is used to indicate the desire to start a media session.
In order to start a new media session, an offerer constructs a new OFFER message with a fresh offererSessionId. The answererSessionId field MUST be empty. Like all SDP offers, the message MUST contain an "sdp" field with the offerer's offer. It MUST also contain the tieBreaker field, containing a 32-bit random integer used for glare resolution as described in Section 5.4.1.
An answerer can receive an OFFER in three cases:
The first two situations are described in this section. The third case is described in Section 5.4. Any other condition represents an alien packet and SHOULD be rejected with Error:NOMATCH.
If no media session exists with the given "offererSessionId" value, then this is a new media session. The answerer has three primary options:
In either of the latter two cases, the answerer performs the following steps:
If an OFFER is received that has already been received and responded to and the media session still exists, then the answerer MUST respond with the same message as before. If the session has been terminated in the meantime, then an Error:NOMATCH message SHOULD be sent.
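The retransmission rule above suggests caching the response to each OFFER. The following sketch assumes a hypothetical answer_fn callback that builds the ANSWER, and assumes that the error type is spelled "NOMATCH" on the wire; neither is defined by the text above.

```python
class OfferResponder:
    """Caches the response sent for each (offererSessionId, seq) pair."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn  # hypothetical: builds an ANSWER
        self._sent = {}              # (session id, seq) -> response
        self._terminated = set()     # sessions that have ended

    def terminate(self, offerer_session_id):
        self._terminated.add(offerer_session_id)

    def on_offer(self, offer):
        sid, seq = offer["offererSessionId"], offer["seq"]
        if sid in self._terminated:
            # The session is gone, so the retransmission cannot match.
            return {"messageType": "ERROR", "errorType": "NOMATCH",
                    "offererSessionId": sid, "seq": seq}
        if (sid, seq) not in self._sent:
            self._sent[(sid, seq)] = self._answer_fn(offer)
        # A repeated OFFER gets exactly the same message as before.
        return self._sent[(sid, seq)]
```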
The ANSWER message is used by the receiver of an OFFER message to indicate that the offer has been accepted. The ANSWER message MUST contain the answererSessionId for this media session and an sdp parameter containing ICE candidates and the final media parameters for the session (although of course these can be adjusted by a new OFFER/ANSWER exchange; see Section 5.4). In addition, ANSWERs MAY contain the moreComing flag, as described below.
This is a boolean flag that can only appear in an ANSWER and, if set to true, indicates that this answer is not the final answer that will be sent for the associated OFFER. If this flag is not present, it is assumed to be false.
One motivating use case for moreComing is where an Agent wishes to respond immediately to an OFFER in order to start ICE checking before the user has provided authorization to send media. The Agent cannot send an ANSWER containing media information but can send ICE candidates. In this case, the Agent could send an ANSWER that has moreComing=true but that allows ICE to start. Later, when the user has authorized the media, the Agent could send an ANSWER with moreComing=false, indicating that this is the final media selection.
To see why simply having multiple independent offers (as opposed to multiple answers for a single offer) is not sufficient, consider the case where browser A requests video with B. When A, the side that sent the initial OFFER, gets an ANSWER that rejects the video, it may very well present a UI indication that there is no media. Five seconds later, when browser B sends an OFFER requesting video, browser A may present a UI element that asks whether it is OK to do the video that was just rejected. This results in a bad user experience and in the extreme can result in both sides always rejecting the other side's OFFER of video, then waiting for the user to authorize video, which results in a new OFFER that is always rejected.
It is easier to be able to indicate that an OFFER resulted in one valid ANSWER, but that the OFFER needs to be held open because other valid ANSWERs may follow that would replace the current one. This stops the other side from generating a new OFFER while this is taking place. This is also needed to support a SIP gateway doing early media.
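From the offerer's side, the moreComing behavior described above might be tracked as follows. The PendingOffer class is a hypothetical illustration; it assumes that each later ANSWER for the same seq replaces the previous one and that the OFFER closes once an ANSWER arrives without moreComing set to true.

```python
class PendingOffer:
    """Offerer-side view of one OFFER that may get several ANSWERs."""

    def __init__(self, seq):
        self.seq = seq
        self.current_sdp = None  # SDP from the latest ANSWER, if any
        self.final = False       # True once the final ANSWER arrived

    def on_answer(self, answer):
        """Apply an ANSWER; return False if it does not apply."""
        if answer["seq"] != self.seq or self.final:
            return False
        self.current_sdp = answer["sdp"]  # replaces any earlier ANSWER
        # An absent moreComing flag is treated as false.
        self.final = not answer.get("moreComing", False)
        return True
```

In the ICE use case above, the first ANSWER would carry only candidates with moreComing set to true, and the second would carry the authorized media.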
The OK message is used by the receiver of an ANSWER message to indicate that it has received the ANSWER message. It has no contents itself and is merely used to stop the retransmissions of the ANSWER.
The ERROR message is used to indicate that there has been an error. The contents and semantics of this message are defined in Section 5.6.
Once a call has been set up, it is common to want to adjust the media parameters, e.g., to add video to an audio-only call. This is also done with the OFFER/ANSWER/OK sequence of messages, though the details are slightly different.
Either side may initiate a new OFFER/ANSWER exchange by sending an OFFER message. However, implementations MUST NOT attempt this for sessions which are still in active negotiation. Specifically, the offerer MUST NOT send a new OFFER until it has received the ANSWER, and the answerer MUST NOT send a new OFFER until it has received the OK indicating receipt of the ANSWER.
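The restriction above can be expressed as a guard that an agent consults before sending a new OFFER. This is a sketch only: the class and the event names are hypothetical, and "received_answer" is assumed to mean the final ANSWER for the outstanding OFFER.

```python
class NegotiationState:
    """Tracks whether this side may start a new OFFER/ANSWER exchange."""

    def __init__(self):
        self.awaiting_answer = False  # we sent an OFFER, no ANSWER yet
        self.awaiting_ok = False      # we sent an ANSWER, no OK yet

    def may_send_offer(self):
        return not (self.awaiting_answer or self.awaiting_ok)

    def on_event(self, event):
        if event == "sent_offer":
            if not self.may_send_offer():
                raise RuntimeError("negotiation already in progress")
            self.awaiting_answer = True
        elif event == "received_answer":
            self.awaiting_answer = False
        elif event == "sent_answer":
            self.awaiting_ok = True
        elif event == "received_ok":
            self.awaiting_ok = False
        else:
            raise ValueError("unknown event: " + event)
```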
A new OFFER MUST contain a complete set of media parameters describing the proposed new media configuration as well as a full set of ICE parameters. The recipient of a new OFFER on a valid connection MUST respond with an appropriate ANSWER message. However that message MAY refuse to accept the proposed new configuration. If the session has been terminated in the meantime, then an Error:NOMATCH message SHOULD be sent.
Because a change of media parameters may be initiated by either side, there is a potential for the change requests to occur simultaneously (i.e., "glare"). This document defines a glare handling procedure that results in immediate resolution of the glare condition, allowing one OFFER message to continue to be processed while the other is terminated. It is defined in such a way that it can interwork with SIP's glare handling mechanism. However, SIP's timer-based mechanism isn't suitable for ROAP, because strict requirements on ROAP message transport between end-points are not possible and a timer-based approach could thus easily result in a repeated glare situation.
To achieve immediate resolution, each OFFER message includes a 32-bit unsigned integer value, the tiebreaker, that is randomly generated for each new OFFER message an end-point issues. Whenever an end-point receives an OFFER message that has the same sequence number as an outstanding OFFER the end-point itself sent, a glare condition has arisen. In a glare condition, the end-point compares the received OFFER's tiebreaker value with the tiebreaker value in its outstanding OFFER. The OFFER with the greatest numerical value wins and that OFFER is allowed to continue being processed. If the received OFFER lost the tie breaking, an Error:CONFLICT message is sent. If it is the outstanding OFFER that lost, the end-point can expect an Error:CONFLICT message to be eventually received. However, that OFFER can immediately be considered as terminated.
Some special considerations have been made in this glare handling to interwork well with SIP glare handling as currently specified. Thus it has the notion of a gateway that converts ROAP messages into SIP messages. This process is discussed in more detail below after the basic rules are defined normatively.
A regular end-point SHALL generate a random 32-bit unsigned numerical value for each OFFER message. In case the random value is 0 or 4,294,967,295, a new random value SHALL be generated until it is neither of these values. The values 0 and 4,294,967,295 MAY be assigned to ROAP messages generated by gateways to ensure efficient glare handling towards other systems.
An ROAP message end-point that has an outstanding OFFER, i.e., an OFFER for which it has not yet received an ANSWER, SHALL upon receiving an OFFER perform the following processing:
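The comparison described above can be sketched as a single function. The function name and the returned labels are hypothetical. The equal-value branch is not covered by the paragraph above; it is mapped here to the DOUBLECONFLICT error that this document lists among its error types.

```python
def resolve_glare(own_tiebreaker, received_tiebreaker):
    """Decide the outcome when both sides have an OFFER outstanding."""
    if received_tiebreaker > own_tiebreaker:
        # Our OFFER lost: treat it as terminated and process theirs.
        return "RECEIVED_WINS"
    if received_tiebreaker < own_tiebreaker:
        # Their OFFER lost: reply to it with Error:CONFLICT.
        return "CONFLICT"
    # Equal values cannot be ordered.
    return "DOUBLECONFLICT"
```

For example, an end-point whose outstanding OFFER carries 123 and which receives an OFFER carrying 456 would stop processing its own OFFER and continue with the received one.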
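A regular end-point could generate its tiebreaker as in the following sketch, which redraws until the value is not one of the two reserved ones. The function name is hypothetical, and the use of Python's secrets module as the random source is a choice of this sketch, not a requirement of the text above.

```python
import secrets

# Values set aside for ROAP messages generated by gateways.
RESERVED = (0, 2**32 - 1)

def new_tiebreaker():
    """Draw random 32-bit values until one is not reserved."""
    while True:
        value = secrets.randbelow(2**32)  # uniform in [0, 2**32 - 1]
        if value not in RESERVED:
            return value
```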
The following figure assumes the previous message flow has happened and media is flowing.
participant OffererUA
participant OffererJS
participant AnswererJS
participant AnswererUA
note left of OffererJS: "Hi, Let's do video"
note right of AnswererJS: "Sounds great"
OffererJS->OffererUA: peer->addStream( new MediaStream() );
OffererUA->OffererJS: sendSignalingChannel();
AnswererJS->AnswererUA: peer->addStream( new MediaStream() );
AnswererUA->AnswererJS: sendSignalingChannel();
OffererJS->AnswererJS: {"type":"OFFER", tiebreaker="123", "sdp":"..."}
AnswererJS->OffererJS: {"type":"OFFER", tiebreaker="456", "sdp":"..."}
AnswererJS->AnswererUA: peer->processSignalingMessage();
OffererJS->OffererUA: peer->processSignalingMessage();
OffererUA->OffererJS: sendSignalingChannel();
AnswererUA->AnswererJS: sendSignalingChannel();
OffererJS->AnswererJS: {"type":"ERROR",error="conflict","sdp":"..."}
AnswererJS->OffererJS: {"type":"ANSWER", "sdp":"..."}
AnswererJS->AnswererUA: peer->processSignalingMessage();
OffererJS->OffererUA: peer->processSignalingMessage();
OffererUA->OffererJS: sendSignalingChannel();
OffererJS->AnswererJS: {"type":"OK"}
AnswererJS->AnswererUA: peer->processSignalingMessage();
AnswererUA->AnswererJS: onaddstream();
AnswererUA->AnswererJS: sendSignalingChannel();
AnswererJS->OffererJS: {"type":"OFFER", tiebreaker="789", "sdp":"..."}
OffererJS->OffererUA: peer->processSignalingMessage();
OffererUA->OffererJS: sendSignalingChannel();
OffererJS->AnswererJS: {"type":"ANSWER", "sdp":"..."}
AnswererJS->AnswererUA: peer->processSignalingMessage();
AnswererUA->OffererUA: Both way Video
AnswererUA->AnswererJS: sendSignalingChannel();
AnswererJS->OffererJS: {"type":"OK"}
OffererJS->OffererUA: peer->processSignalingMessage();
OffererUA->OffererJS: onaddstream();
It is an error, though technically possible, for an agent to generate a second OFFER while it already has an unanswered OFFER pending. An agent which receives such an offer MUST respond with an Error:FAILED message containing a "retryAfter" attribute generated as a random value from 0 to 10 seconds.
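A response to such a second OFFER might be built as sketched below. The function name is hypothetical. The retry attribute is spelled "retryAfter" here, as in the description of the FAILED error, and drawing a whole number of seconds is an assumption, since the text only gives the range.

```python
import random

def failed_for_second_offer(offer):
    """Build Error:FAILED for an OFFER sent while one is still pending."""
    err = {"messageType": "ERROR",
           "errorType": "FAILED",
           "offererSessionId": offer["offererSessionId"],
           "seq": offer["seq"],
           # Assumption: whole seconds, uniformly chosen from 0 to 10.
           "retryAfter": random.randint(0, 10)}
    if "answererSessionId" in offer:
        err["answererSessionId"] = offer["answererSessionId"]
    return err
```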
The SHUTDOWN message is used to indicate the termination of an existing session. Either side may initiate a SHUTDOWN at any time during the session, including while the initial OFFER is outstanding (i.e., before an ANSWER has been sent/received.)
TODO - FIX NAMES
participant OffererUA
participant OffererJS
participant AnswererJS
participant AnswererUA
OffererJS->OffererUA: peer->close();
OffererUA->OffererJS: sendSignalingChannel();
OffererJS->AnswererJS: { "type":"SHUTDOWN" }
AnswererJS->AnswererUA: peer->processSignalingMessage();
AnswererUA->AnswererJS: onclose();
AnswererUA->AnswererJS: sendSignalingChannel();
AnswererJS->OffererJS: {"type":"OK"}
OffererJS->OffererUA: peer->processSignalingMessage();
OffererUA->OffererJS: onclose();
Upon receipt of a SHUTDOWN which corresponds to an existing session, an agent MUST immediately terminate the session and send an OK message. Subsequent messages directed to this session MUST result in an Error:NOMATCH message. Similarly, on receipt of the OK, the agent which sent the SHUTDOWN MUST terminate the session and SHOULD respond to future messages with Error:NOMATCH.
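The receiving side of a SHUTDOWN can be sketched as a lookup in a table of live sessions. The Sessions class is hypothetical. For brevity it keys sessions on offererSessionId alone, which is a simplification, and it assumes the error type is spelled "NOMATCH" on the wire.

```python
class Sessions:
    """Minimal table of live sessions, keyed by offererSessionId."""

    def __init__(self):
        self._live = set()

    def open(self, offerer_session_id):
        self._live.add(offerer_session_id)

    def on_message(self, msg):
        """Return the reply for SHUTDOWN or unknown sessions, else None."""
        # Echo the identifying fields of the triggering message.
        reply = {k: msg[k]
                 for k in ("offererSessionId", "answererSessionId", "seq")
                 if k in msg}
        sid = msg["offererSessionId"]
        if sid not in self._live:
            reply.update(messageType="ERROR", errorType="NOMATCH")
            return reply
        if msg["messageType"] == "SHUTDOWN":
            self._live.discard(sid)  # terminate immediately
            reply["messageType"] = "OK"
            return reply
        return None  # other message types are handled elsewhere
```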
Errors are indicated by the messageType "ERROR". All errors MUST contain an "errorType" field indicating the type of error which occurred and echo the "seq" value (if any) and the session id values of the message which generated the error. The following sections describe each error type.
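A generic helper for building ERROR messages might look like the following sketch. The helper name is hypothetical, and the upper-case spelling of the errorType values is an assumption based on the names of the error types described in the following sections.

```python
# Error types described in this document; wire spelling is assumed.
ERROR_TYPES = {"NOMATCH", "TIMEOUT", "REFUSED",
               "CONFLICT", "DOUBLECONFLICT", "FAILED"}

def make_error(error_type, cause):
    """Build an ERROR that echoes seq and session ids from cause."""
    if error_type not in ERROR_TYPES:
        raise ValueError("unknown error type: " + error_type)
    err = {"messageType": "ERROR", "errorType": error_type}
    for field in ("offererSessionId", "answererSessionId", "seq"):
        if field in cause:  # echo only what the message carried
            err[field] = cause[field]
    return err
```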
An implementation which receives a message with either an unknown offererSessionId (for an OFFER) or an unknown offererSessionId/answererSessionId pair SHOULD respond with a NOMATCH error.
The TIMEOUT error is used to indicate that the corresponding message required some processing which timed out. For instance, an agent which is a SIP gateway translates ROAP signaling messages into SIP messages. If those SIP messages time out, the gateway would generate a TIMEOUT error.
An agent which has received an initial OFFER MAY indicate its refusal of the media session by sending a REFUSED error. Note that this error is not required; an agent MAY simply drop the OFFER with no acknowledgement at all. However, agents which do not wish to accept subsequent OFFERS SHOULD [OPEN ISSUE: MUST?] send a REFUSED in order to avoid timeouts and confusion on the offerer side.
The CONFLICT error is used to indicate that an agent has received an OFFER while it has its own OFFER outstanding. The offerer's behavior in response to this error is defined in Section 5.4.1.
The DOUBLECONFLICT error is used to indicate the tiebreaker values in CONFLICT were the same. See Section 5.4.1.
FAILED is a catch-all error indicating that something went wrong while processing a message. A FAILED error MAY contain a "retryAfter" field, which indicates the time (in seconds) after which the message MAY be retried (though retries are OPTIONAL).
TBD
The offer/answer concepts in this draft are not enough to meet all the use cases of RTCWeb. They need to be combined with some additional functionality that the browser exposes to the JavaScript applications. This additional functionality loosely falls into three categories: capabilities, hints, and stats. The capabilities allow the JS application to find out what video codecs and capabilities a given browser supports before initiating a media session. The hints provide a way for the JS application to provide useful information to the browser about how the media will be used so that the browser can negotiate appropriate codecs and modes. Stats provide statistics about the current media sessions. The capabilities, hints, and stats do not need to be communicated between the two browsers, so they are not specified in this draft. However, this draft assumes the existence of APIs so that these three can be used to build complete systems. Some of the assumptions about these APIs are described in the following sections.
The APIs need to provide a way to find out the capabilities as defined in section 9 of RFC 3264. This allows the JS to find out the codecs that the browser supports.
When creating a new PeerConnection in a browser, the application needs to be able to provide optional hints to the browser about preferences for the media to be negotiated. These include:
The JS applications should also be able to update and change these hints mid-session. Some types of hint changes may simply affect the parameters of various codecs and require no signalling to the other end of the media stream. Other types of hint changes may cause a new offer/answer exchange.
Several parts of the media session create statistics that are important to some applications. APIs should provide the JS applications with information on the following statistics:
SIP [RFC3261] specifies an application protocol that provides a complete solution for setting up and managing communications on the Internet. It combines both "call processing" functions - identity and name spaces, call routing, user search, call features, authentication, and so on - as well as media processing through its transport of SDP and support for the offer/answer model.
In a web context, application processing can be done through proprietary logic implemented in JavaScript/HTML, along with proprietary logic implemented in the web server, and proprietary messaging transported through HTTP and WebSockets. One of the advantages of the web is to allow a rich set of applications to be built without changing the browser. Although application processing can be done in JavaScript and the web servers, we do require raw media control in the browser. ROAP basically extracts the offer/answer media control processing used in SIP, and puts it into a protocol that can operate independently of SIP itself.
The information contained in ROAP messages corresponds closely to the offer/answer information carried by complete solutions such as SIP and Jingle, so it is straightforward to build gateways to and from ROAP. These gateways need only translate the signaling, while allowing end-to-end media without the need for media relays (except, of course, for NAT traversal.) In the case of SIP, which uses SDP directly, such gateways would translate between SIP and ROAP, while transporting SDP end-to-end. In the case of Jingle [XEP-0166], it would also be necessary to translate between SDP and the Jingle offer/answer format; [XEP-0167] describes such a mapping.
This document requires no actions from IANA.
The text for the glare resolution section was provided by Magnus Westerlund. Many thanks for comments, ideas, and text from Eric Rescorla, Harald Alvestrand, Magnus Westerlund, Ted Hardie, and Stefan Hakansson.
How to negotiate support for enhancements to this JSON message. (consider supported / required )
Common way to indicate destination in offer going to a signalling gateway.
Need to generate proper ASCII art version of message flows.
[RFC4627] | Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006. |
[RFC3264] | Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC4566] | Handley, M., Jacobson, V. and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006. |