The WebSocket protocol enables two-way communication between a user agent running untrusted code in a controlled environment and a remote host that has opted in to communications from that code. The security model used for this is the Origin-based security model commonly used by Web browsers. The protocol consists of an initial handshake followed by basic message framing, layered over TCP. The goal of this technology is to provide a mechanism for browser-based applications that need two-way communication with servers that does not rely on opening multiple HTTP connections (e.g. using XMLHttpRequest or <iframe>s and long polling).
Please send feedback to the hybi@ietf.org mailing list.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
This Internet-Draft will expire on May 13, 2011.
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1. Opening Handshake
1.1. Client Requirements
1.2. Server-side requirements
1.2.1. Reading the client's opening handshake
1.2.2. Sending the server's opening handshake
2. Normative References
§ Author's Address
User agents running in controlled environments, e.g. browsers on mobile handsets tied to specific carriers, may offload the management of the connection to another agent on the network. In such a situation, the user agent for the purposes of conformance is considered to include both the handset software and any such agents.
When the user agent is to establish a WebSocket connection to a host /host/, on a port /port/, from an origin whose ASCII serialization is /origin/, with a flag /secure/, with a string giving a /resource name/, with a (possibly empty) list of strings giving the /protocols/, and optionally with a /defer cookies/ flag, it must run the following steps. [ORIGIN] (Barth, A., Jackson, C., and I. Hickson, “The HTTP Origin Header,” September 2009.)
Otherwise, if the user agent is not configured to use a proxy, then open a TCP connection to the host given by /host/ and the port given by /port/.
EXAMPLE: For example, if the user agent uses an HTTP proxy for all traffic, then if it were to try to connect to port 80 on server example.com, it might send the following lines to the proxy server:
    CONNECT example.com:80 HTTP/1.1
    Host: example.com
If there was a password, the connection might look like:
    CONNECT example.com:80 HTTP/1.1
    Host: example.com
    Proxy-authorization: Basic ZWRuYW1vZGU6bm9jYXBlcyE=
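As an illustration only, the two requests above could be assembled as in the following sketch. It assumes HTTP Basic proxy authentication; the helper name build_connect_request and its parameters are hypothetical and not part of this specification. Only the request lines themselves come from the example.

```python
import base64
from typing import Optional


def build_connect_request(host: str, port: int,
                          username: Optional[str] = None,
                          password: Optional[str] = None) -> bytes:
    """Build an HTTP CONNECT request for a proxy (illustrative helper)."""
    lines = [
        f"CONNECT {host}:{port} HTTP/1.1",
        f"Host: {host}",
    ]
    if username is not None and password is not None:
        # Basic credentials are the base64 encoding of "username:password".
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        lines.append("Proxy-authorization: Basic " + token.decode("ascii"))
    # Every line ends with CRLF; an empty line terminates the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Calling build_connect_request("example.com", 80) produces the first request shown above; passing a username and password adds the Proxy-authorization line.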
NOTE: This reads a field name, terminated by a colon, converting upper-case letters in the range A-Z to lowercase, and aborting if a stray CR or LF is found.
- -> If the byte is 0x0D (UTF-8 CR)
- If the /name/ byte array is empty, then jump to the fields processing step. Otherwise, fail the WebSocket connection and abort these steps.
- -> If the byte is 0x0A (UTF-8 LF)
- Fail the WebSocket connection and abort these steps.
- -> If the byte is 0x3A (UTF-8 :)
- Move on to the next step.
- -> If the byte is in the range 0x41 to 0x5A (UTF-8 A-Z)
- Append a byte whose value is the byte's value plus 0x20 to the /name/ byte array and redo this step for the next byte.
- -> Otherwise
- Append the byte to the /name/ byte array and redo this step for the next byte.
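The field-name cases above can be sketched as follows. This is a minimal illustration, not a normative implementation: the names HandshakeError and read_field_name are hypothetical, returning None stands in for "jump to the fields processing step", and treating an unexpectedly closed connection as a failure is an assumption.

```python
import io
from typing import BinaryIO, Optional


class HandshakeError(Exception):
    """Raised where the text says to fail the WebSocket connection."""


def read_field_name(stream: BinaryIO) -> Optional[bytes]:
    """Read one field name up to the colon, lowercasing A-Z.

    Returns None when a CR arrives before any name byte, which is the
    signal to jump to the fields processing step.
    """
    name = bytearray()
    while True:
        chunk = stream.read(1)
        if not chunk:
            # Assumption: a connection that closes mid-name is a failure.
            raise HandshakeError("connection closed while reading a field name")
        byte = chunk[0]
        if byte == 0x0D:  # CR
            if not name:
                return None
            raise HandshakeError("stray CR in field name")
        if byte == 0x0A:  # LF
            raise HandshakeError("stray LF in field name")
        if byte == 0x3A:  # ':' ends the name
            return bytes(name)
        if 0x41 <= byte <= 0x5A:  # A-Z: add 0x20 to get the lowercase letter
            name.append(byte + 0x20)
        else:
            name.append(byte)


# Usage: the name is returned lowercased, without the colon.
example_name = read_field_name(io.BytesIO(b"Sec-WebSocket-Key: abc"))
```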
NOTE: This reads a field value, terminated by a CRLF, skipping past a single space after the colon if there is one.
- -> If the byte is 0x20 (UTF-8 space) and /count/ equals 1
- Ignore the byte and redo this step for the next byte.
- -> If the byte is 0x0D (UTF-8 CR)
- Move on to the next step.
- -> If the byte is 0x0A (UTF-8 LF)
- Fail the WebSocket connection and abort these steps.
- -> Otherwise
- Append the byte to the /value/ byte array and redo this step for the next byte.
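The field-value cases above can be sketched in the same way. The names are hypothetical, and two points are assumptions: that /count/ is the number of bytes read so far in this step, and that the LF following the terminating CR is checked by a later step that is not shown here.

```python
import io
from typing import BinaryIO


class HandshakeError(Exception):
    """Raised where the text says to fail the WebSocket connection."""


def read_field_value(stream: BinaryIO) -> bytes:
    """Read one field value up to (not including) the terminating CR."""
    value = bytearray()
    count = 0  # Assumption: counts the bytes read in this step.
    while True:
        chunk = stream.read(1)
        if not chunk:
            # Assumption: a connection that closes mid-value is a failure.
            raise HandshakeError("connection closed while reading a field value")
        count += 1
        byte = chunk[0]
        if byte == 0x20 and count == 1:  # single space right after the colon
            continue
        if byte == 0x0D:  # CR ends the value
            return bytes(value)
        if byte == 0x0A:  # LF without a preceding CR
            raise HandshakeError("stray LF in field value")
        value.append(byte)


# Usage: only the first space is skipped.
example_value = read_field_value(io.BytesIO(b" abc\r\n"))
```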
- -> If the entry's name is "sec-websocket-accept"
- If the value is not exactly equal to the base64 encoding of the HMAC-SHA1 of the UTF-8 string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" keyed with the client-nonce, then fail the WebSocket connection and abort these steps.
- -> If the entry's name is "sec-websocket-key"
- If the read bytes are not the base64 encoding of a 16 byte sequence, then fail the WebSocket connection and abort these steps. Otherwise, let the server-nonce be that 16 byte sequence. If the server-nonce is identical to the client-nonce, then fail the WebSocket connection and abort these steps.
- -> If the entry's name is "sec-websocket-origin"
- If the value is not exactly equal to /origin/, then fail the WebSocket connection and abort these steps. [ORIGIN] (Barth, A., Jackson, C., and I. Hickson, “The HTTP Origin Header,” September 2009.)
- -> If the entry's name is "sec-websocket-location"
- If the value is not exactly equal to a string obtained from the steps to construct a WebSocket URL from /host/, /port/, /resource name/, and the /secure/ flag, then fail the WebSocket connection and abort these steps.
- -> If the entry's name is "sec-websocket-protocol"
- If there was a /protocols/ string specified, and the value is not exactly equal to one of the items in /protocols/, then fail the WebSocket connection and abort these steps. (If no /protocols/ was specified, the field is ignored.)
- -> If the entry's name is "set-cookie" or "set-cookie2" or another cookie-related field name
- If the relevant specification is supported by the user agent, add the cookie, interpreted as defined by the appropriate specification, to the /list of cookies/, with the resource being the one with the host /host/, the port /port/, the path (and possibly query parameters) /resource name/, and the scheme |http| if /secure/ is false and |https| if /secure/ is true. [RFC2109] (Kristol, D. and L. Montulli, “HTTP State Management Mechanism,” February 1997.) [RFC2965] (Kristol, D. and L. Montulli, “HTTP State Management Mechanism,” October 2000.)
If the relevant specification is not supported by the user agent, then the field must be ignored.
The cookies added to the /list of cookies/ are discarded if the connection fails to be established. Only if and when the connection is established do the cookies actually get applied.
- -> Any other name
- Ignore it.
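The "sec-websocket-accept" and "sec-websocket-key" checks above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the client-nonce is assumed to be available as raw bytes, and the constant-time comparison is a design choice of this sketch rather than a requirement stated in the text.

```python
import base64
import binascii
import hashlib
import hmac

ACCEPT_GUID = b"258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


class HandshakeError(Exception):
    """Raised where the text says to fail the WebSocket connection."""


def expected_accept(client_nonce: bytes) -> bytes:
    """base64 of HMAC-SHA1 over the fixed string, keyed with the client-nonce."""
    digest = hmac.new(client_nonce, ACCEPT_GUID, hashlib.sha1).digest()
    return base64.b64encode(digest)


def check_accept(value: bytes, client_nonce: bytes) -> None:
    """Fail unless the field value is exactly the expected proof."""
    if not hmac.compare_digest(value, expected_accept(client_nonce)):
        raise HandshakeError("sec-websocket-accept mismatch")


def parse_server_nonce(value: bytes, client_nonce: bytes) -> bytes:
    """Decode the 16-byte server-nonce and reject a reflected client-nonce."""
    try:
        nonce = base64.b64decode(value, validate=True)
    except binascii.Error:
        raise HandshakeError("sec-websocket-key is not valid base64")
    if len(nonce) != 16:
        raise HandshakeError("server-nonce is not 16 bytes")
    if nonce == client_nonce:
        raise HandshakeError("server-nonce equals client-nonce")
    return nonce
```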
Where the algorithm above requires that a user agent fail the WebSocket connection, the user agent may first read an arbitrary number of further bytes from the connection (and then discard them) before actually failing the WebSocket connection. Similarly, if a user agent can show that the bytes read from the connection so far are such that there is no subsequent sequence of bytes that the server can send that would not result in the user agent being required to fail the WebSocket connection, the user agent may immediately fail the WebSocket connection without waiting for those bytes.
NOTE: The previous paragraph is intended to make it conforming for user agents to implement the algorithm in subtly different ways that are equivalent in all ways except that they terminate the connection at earlier or later points. For example, it enables an implementation to buffer the entire handshake response before checking it, or to verify each field as it is received rather than collecting all the fields and then checking them as a block.
When the user agent is to "apply the cookies" in a /list of cookies/, it must handle each cookie in the /list of cookies/ as defined by the appropriate specification. [RFC2109] (Kristol, D. and L. Montulli, “HTTP State Management Mechanism,” February 1997.) [RFC2965] (Kristol, D. and L. Montulli, “HTTP State Management Mechanism,” October 2000.)
This section only applies to servers.
Servers may offload the management of the connection to other agents on the network, for example load balancers and reverse proxies. In such a situation, the server for the purposes of conformance is considered to include all parts of the server-side infrastructure from the first device to terminate the TCP connection all the way to the server that processes requests and sends responses.
EXAMPLE: For example, a data center might have a server that responds to WebSocket requests with an appropriate handshake, and then passes the connection to another server to actually process the data frames. For the purposes of this specification, the "server" is the combination of both computers.
When a client starts a WebSocket connection, it sends its part of the opening handshake. The server must parse at least part of this handshake in order to obtain the necessary information to generate the server part of the handshake.
The client handshake consists of the following parts. If the server, while reading the handshake, finds that the client did not send a handshake that matches the description below, the server should abort the WebSocket connection.
Let the handshake-mask be the HMAC-SHA1 of the UTF-8 string "C1BA787A-0556-49F3-B6AE-32E5376F992B" keyed with the client-nonce.
Let the metadata-string be the masked-metadata unmasked by XORing the /i/th byte of the masked-metadata with the /i mod 20/th byte of the handshake-mask.
Let the metadata-dictionary be the result of parsing the metadata-string as a UTF-8 encoded JSON string.
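The three steps above (deriving the handshake-mask, unmasking, and parsing) can be sketched as follows. The function names are hypothetical; the sketch assumes the client-nonce and masked-metadata are already available as raw bytes and that /i/ counts from zero.

```python
import hashlib
import hmac
import json

HANDSHAKE_MASK_GUID = b"C1BA787A-0556-49F3-B6AE-32E5376F992B"


def handshake_mask(client_nonce: bytes) -> bytes:
    """HMAC-SHA1 of the fixed string, keyed with the client-nonce (20 bytes)."""
    return hmac.new(client_nonce, HANDSHAKE_MASK_GUID, hashlib.sha1).digest()


def xor_with_mask(data: bytes, mask: bytes) -> bytes:
    """XOR the i-th byte of data with the (i mod len(mask))-th mask byte."""
    return bytes(b ^ mask[i % len(mask)] for i, b in enumerate(data))


def parse_metadata(masked_metadata: bytes, client_nonce: bytes):
    """Unmask the metadata and parse it as UTF-8 encoded JSON."""
    metadata_string = xor_with_mask(masked_metadata, handshake_mask(client_nonce))
    return json.loads(metadata_string.decode("utf-8"))
```

Because XOR is its own inverse, the same xor_with_mask helper can be used to produce masked metadata for testing.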
The expected dictionary keys, and the meaning of their corresponding values, are as follows.
- |host|
- The value gives the hostname that the client intended to use when opening the WebSocket. It would be of interest in particular to virtual hosting environments, where one server might serve multiple hosts, and might therefore want to return different data.
Can be safely ignored, though the server should abort the WebSocket connection if this field is absent or has a value that does not match the server's host name, to avoid vulnerability to cross-protocol attacks and DNS rebinding attacks.
- |origin|
- The value gives the scheme, hostname, and port (if it's not the default port for the given scheme) of the page that asked the client to open the WebSocket. It would be interesting if the server's operator had deals with operators of other sites, since the server could then decide how to respond (or indeed, whether to respond) based on which site was requesting a connection. [ORIGIN] (Barth, A., Jackson, C., and I. Hickson, “The HTTP Origin Header,” September 2009.)
Can be safely ignored, though the server should abort the WebSocket connection if this field is absent or has a value that does not match one of the origins the server is expecting to communicate with, to avoid vulnerability to cross-protocol attacks and cross-site scripting attacks.
- |protocols|
- The value gives an array of the subprotocols that the client is intending to select. It would be interesting if the server supports multiple protocols or protocol versions.
Can be safely ignored, though the server may abort the WebSocket connection if the field is absent but the conventions for communicating with the server are such that the field is expected; and the server should abort the WebSocket connection if the field has a value that does not match one of the subprotocols that the server supports, to avoid integrity errors once the connection is established.
- Other keys
- Other fields can be used, such as "cookie", for authentication purposes. Their semantics are equivalent to the semantics of the HTTP headers with the same names.
Unrecognized fields can be safely ignored, and are probably the result of clients that support future versions of the protocol offering options that the server doesn't support.
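One way a server might apply the recommended checks to the metadata-dictionary is sketched below. The names are hypothetical, and several choices are assumptions of this sketch: host and origin are compared by exact string equality, an absent or empty "protocols" array is accepted, and a "protocols" array is treated as matching when at least one entry is supported.

```python
from typing import Collection, List


class HandshakeError(Exception):
    """Raised where the text says to abort the WebSocket connection."""


def check_metadata(metadata: dict, server_host: str,
                   allowed_origins: Collection[str],
                   supported_protocols: Collection[str]) -> List[str]:
    """Validate host, origin, and protocols; return the usable subprotocols."""
    host = metadata.get("host")
    if host is None or host != server_host:
        raise HandshakeError("host is absent or does not match the server")

    origin = metadata.get("origin")
    if origin is None or origin not in allowed_origins:
        raise HandshakeError("origin is absent or not expected")

    protocols = metadata.get("protocols")
    if not protocols:
        # Assumption: this server does not require a subprotocol.
        return []
    usable = [p for p in protocols if p in supported_protocols]
    if not usable:
        raise HandshakeError("no offered subprotocol is supported")
    # Any other keys are ignored, as the text permits.
    return usable
```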
When a client establishes a WebSocket connection to a server, the server must run the following steps.
- /host/
- The host name or IP address of the WebSocket server, as it is to be addressed by clients. The host name must be punycode-encoded if necessary. If the server can respond to requests to multiple hosts (e.g. in a virtual hosting environment), then the value should be derived from the client's handshake, specifically from the "Host" field. The /host/ value must be lowercase (not containing characters in the range U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z).
- /port/
- The port number on which the server expected and/or received the connection.
- /resource name/
- An identifier for the service provided by the server. If the server provides multiple services, then the value should be derived from the resource name given in the client's handshake.
- /secure flag/
- True if the connection is encrypted or if the server expected it to be encrypted; false otherwise.
- /origin/
- The ASCII serialization of the origin that the server is willing to communicate with, converted to ASCII lowercase. If the server can respond to requests from multiple origins (or indeed, all origins), then the value should be derived from the client's handshake, specifically from the "Origin" field. [ORIGIN] (Barth, A., Jackson, C., and I. Hickson, “The HTTP Origin Header,” September 2009.)
- /subprotocol/
- Either null, or a string representing the subprotocol the server is ready to use. If the server supports multiple subprotocols, then the value should be derived from the client's handshake, specifically by selecting one of the values from the "protocols" array. The absence of such a field is equivalent to the null value. The empty string is not the same as the null value for these purposes.
HTTP/1.1 200 OK
Optionally, include "Set-Cookie", "Set-Cookie2", or other cookie-related fields, with values equal to the values that would be used for the identically named HTTP headers. [RFC2109] (Kristol, D. and L. Montulli, “HTTP State Management Mechanism,” February 1997.) [RFC2965] (Kristol, D. and L. Montulli, “HTTP State Management Mechanism,” October 2000.)
- |Sec-WebSocket-Accept|
- The value must be the acceptance-proof.
- |Sec-WebSocket-Key|
- The value must be the server-nonce encoded in base64.
- |Sec-WebSocket-Location|
- The value must be /location/
- |Sec-WebSocket-Origin|
- The value must be /origin/
- |Sec-WebSocket-Protocol|
- This field must be included if /subprotocol/ is not null, and must not be included if /subprotocol/ is null.
If included, the value must be /subprotocol/
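The fields listed above could be assembled into a response as in the following sketch. It is illustrative only: the function name is hypothetical, /location/ is taken as an already-constructed string, the acceptance-proof is assumed to be the value the client checks for in "sec-websocket-accept", and the field order and CRLF line endings are assumptions. Any fields required by steps not shown here are omitted.

```python
import base64
import hashlib
import hmac
from typing import Optional

ACCEPT_GUID = b"258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def build_server_handshake(client_nonce: bytes, server_nonce: bytes,
                           location: str, origin: str,
                           subprotocol: Optional[str] = None) -> bytes:
    """Build the status line and Sec-WebSocket-* fields of the response."""
    # Assumption: the acceptance-proof is base64(HMAC-SHA1) of the fixed
    # string keyed with the client-nonce, matching the client-side check.
    digest = hmac.new(client_nonce, ACCEPT_GUID, hashlib.sha1).digest()
    fields = [
        ("Sec-WebSocket-Accept", base64.b64encode(digest).decode("ascii")),
        ("Sec-WebSocket-Key", base64.b64encode(server_nonce).decode("ascii")),
        ("Sec-WebSocket-Location", location),
        ("Sec-WebSocket-Origin", origin),
    ]
    # Included only when /subprotocol/ is not null; "" is not null.
    if subprotocol is not None:
        fields.append(("Sec-WebSocket-Protocol", subprotocol))
    lines = ["HTTP/1.1 200 OK"] + [f"{name}: {value}" for name, value in fields]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8")
```

A server would typically generate the 16-byte server-nonce from a cryptographically strong source such as os.urandom(16) and regenerate it if it happens to equal the client-nonce.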
This completes the server's handshake. If the server finishes these steps without aborting the WebSocket connection, and if the client does not then fail the WebSocket connection, then the connection is established and the server may begin sending and receiving data, as described in the next section. The handshake has established two keys:
All subsequent bytes read by the server from the user agent are unmasked as follows:
The /i/th byte is XORed with the /i mod 20/th byte of the client-to-server-mask.
All subsequent bytes sent from the server to the user agent are masked as follows:
The /i/th byte is XORed with the /i mod 20/th byte of the server-to-client-mask.
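The masking rule above can be sketched with a small stateful helper. The class name StreamMask is hypothetical, the 20-byte mask is taken as given (its derivation is not shown here), and the sketch assumes /i/ counts from zero over the whole stream in one direction, so the position carries over between reads or writes.

```python
class StreamMask:
    """XOR a byte stream with a repeating 20-byte mask (illustrative)."""

    def __init__(self, mask: bytes) -> None:
        if len(mask) != 20:
            raise ValueError("mask must be exactly 20 bytes")
        self._mask = mask
        self._offset = 0  # position in the stream, modulo 20

    def apply(self, data: bytes) -> bytes:
        """Mask or unmask the next chunk; XOR makes the two identical."""
        out = bytes(
            b ^ self._mask[(self._offset + i) % 20]
            for i, b in enumerate(data)
        )
        self._offset = (self._offset + len(data)) % 20
        return out
```

A server would keep one instance per direction: one built from the client-to-server-mask for bytes it reads, and one built from the server-to-client-mask for bytes it sends.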
Adam Barth
Google, Inc.
Email: ietf@adambarth.com
URI: http://www.adambarth.com/