Internet-Draft | rush | May 2023 |
Pugin, et al. | Expires 12 November 2023 | [Page] |
RUSH is an application-level protocol for ingesting live video. This document describes the protocol and how it maps onto QUIC.¶
This note is to be removed before publishing as an RFC.¶
Discussion of this document takes place on the mailing list (), which is archived at .¶
Source for this draft and an issue tracker can be found at https://github.com/afrind/draft-rush.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 12 November 2023.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
RUSH is a bidirectional application-level protocol designed for live video ingestion that runs on top of QUIC.¶
RUSH was built as a replacement for RTMP (Real-Time Messaging Protocol) with the goal of providing support for new audio and video codecs, extensibility in the form of new message types, and multi-track support. In addition, RUSH gives applications the option to control data delivery guarantees by utilizing QUIC streams.¶
This document describes the RUSH protocol, wire format, and QUIC mapping.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Frame: logical unit of information that client and server can exchange¶
PTS: presentation timestamp¶
DTS: decoding timestamp¶
AAC: Advanced Audio Coding¶
NALU: network abstraction layer unit¶
VPS: video parameter set (H265 video specific NALU)¶
SPS: sequence parameter set (H264/H265 video specific NALU)¶
PPS: picture parameter set (H264/H265 video specific NALU)¶
ADTS header: Audio Data Transport Stream header¶
ASC: Audio Specific Config¶
GOP: group of pictures; specifies the order in which intra- and inter-frames are arranged.¶
In order to live stream using RUSH, the client establishes a QUIC connection using the ALPN token "rush".¶
After the QUIC connection is established, the client creates a new bidirectional QUIC stream, chooses a starting frame ID, and sends a Connect frame (Section 4.2.1) over that stream. This stream is called the Connect Stream.¶
The client sends the mode of operation setting in the payload of the Connect frame (Section 4.2.1).¶
One connection SHOULD only be used to send one media stream. For now, 1 video track and 1 audio track are supported. In the future, multiple tracks per stream could be sent.¶
The client can choose to wait for the ConnectAck frame (Section 4.2.2), or it can start optimistically sending data immediately after sending the Connect frame.¶
A track is a logical organization of the data. For example, a video can have one video track and two audio tracks (for two languages). The client can send data for multiple tracks simultaneously.¶
The encoded audio or video data of each track is serialized into frames (see Section 4.2.6 or Section 4.2.5) and transmitted from the client to the server. Each track has its own monotonically increasing frame ID sequence. The client MUST start with initial frame ID = 1.¶
Depending on the mode of operation (Section 4.3), the client sends audio and video frames either on the Connect Stream or on a new QUIC stream for each frame.¶
In Multi Stream Mode
(Section 4.3.2), the client can stop sending a
frame by resetting the corresponding QUIC stream. In this case, there is no
guarantee that the frame was received by the server.¶
Upon receiving a Connect frame (Section 4.2.1), if the server accepts the stream, the server replies with a ConnectAck frame (Section 4.2.2) and prepares to receive audio/video data.¶
It's possible that in Multi Stream Mode (Section 4.3.2), the server receives audio or video data before it receives the Connect frame (Section 4.2.1). The implementation can choose whether to buffer or drop the data. The audio/video data cannot be interpreted correctly before the arrival of the Connect frame (Section 4.2.1).¶
In Single Stream Mode (Section 4.3.1), the transport guarantees that frames arrive at the application layer in the order they were sent.¶
In Multi Stream Mode, it's possible that frames arrive at the application layer in a different order than they were sent; therefore, the server MUST keep track of the last received frame ID for every track that it receives. A gap in the frame sequence ID on a given track can indicate out-of-order delivery, and the server MAY wait until the missing frames arrive. The server must consider a frame lost if the corresponding QUIC stream was reset.¶
Upon detecting a gap in the frame sequence, the server MAY wait for the missing frames to arrive for an implementation-defined time. If the missing frames don't arrive, the server SHOULD consider them lost and continue processing the rest of the frames. For example, if the server receives the frames 1 2 3 5 6 for track 1 and frame #4 hasn't arrived after the implementation-defined timeout, the server SHOULD continue processing frames 5 and 6.¶
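The gap handling described above could be sketched as a small per-track reorder buffer. This is only an illustration under stated assumptions: the class name, the 0.2 second default timeout, and the choice to measure the wait from the arrival of the oldest buffered frame are hypothetical and not part of RUSH.

```python
import time

class TrackReorderBuffer:
    """Releases the frames of one track in ID order, skipping gaps after a timeout."""

    def __init__(self, first_id=1, gap_timeout=0.2):
        self.next_id = first_id         # next frame ID expected in sequence
        self.gap_timeout = gap_timeout  # hypothetical value, implementation defined
        self.pending = {}               # frame ID -> (arrival time, frame)
        self.lost = set()               # IDs whose QUIC stream was reset

    def push(self, frame_id, frame, now=None):
        """Buffer an arriving frame and return the frames that are now deliverable."""
        now = time.monotonic() if now is None else now
        if frame_id >= self.next_id:    # anything older was already given up on
            self.pending.setdefault(frame_id, (now, frame))
        return self.pop_ready(now)

    def mark_lost(self, frame_id, now=None):
        """Record that the QUIC stream carrying frame_id was reset."""
        if frame_id >= self.next_id:
            self.lost.add(frame_id)
        return self.pop_ready(now)

    def pop_ready(self, now=None):
        """Return deliverable frames, giving up on a gap once the timeout passed."""
        now = time.monotonic() if now is None else now
        ready = []
        while True:
            if self.next_id in self.pending:
                ready.append(self.pending.pop(self.next_id)[1])
                self.next_id += 1
            elif self.next_id in self.lost:
                self.lost.discard(self.next_id)
                self.next_id += 1
            elif self.pending and now - min(t for t, _ in self.pending.values()) >= self.gap_timeout:
                self.next_id = min(self.pending)  # give up on the missing frames
                self.lost = {i for i in self.lost if i >= self.next_id}
            else:
                return ready
```

A server would keep one such buffer per track, call `push` for every arriving frame, and call `pop_ready` periodically so that buffered frames are released once the timeout for a gap has elapsed.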
It is worth highlighting that Multi Stream Mode requires a de-jitter function, which introduces latency. The subsequent processing pipeline should also tolerate lost frames, that is, "holes" in the audio / video streams.¶
When the client is done streaming, it sends the End of Video
frame
(Section 4.2.3) to indicate to the server that there won't be any more
data sent.¶
If the QUIC connection is closed at any point, the client MAY reconnect by simply repeating the connection establishment process (Section 3.1) and resume sending the same video where it left off. In order to support termination of the new connection by a different server, the client SHOULD resume sending video frames starting with an I-frame, to guarantee that the video track can be decoded from the first frame sent.¶
A reconnect can be initiated by the server if it needs to "go away" for maintenance. In this case, the server sends a GOAWAY frame (Section 4.2.7) to advise the client to gracefully close the connection. This allows the client to finish sending some data and establish a new connection to continue sending without interruption.¶
The client and server exchange information using frames. There are different types of frames and the payload of each frame depends on its type.¶
Bytes on the wire are in big-endian order.¶
Generic frame format:¶
0       1       2       3       4       5       6       7
+--------------------------------------------------------------+
|                       Length (64)                            |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+------------------------------------------------------+
|Type(8)| Payload ...                                          |
+-------+------------------------------------------------------+¶
Length (64): each frame starts with a 64-bit length field that gives the size of the frame in bytes, including the predefined fields. For example, if Length is 100 bytes, then the Payload length is 100 - 8 - 8 - 1 = 83 bytes.¶
ID (64): 64-bit frame sequence number. Every new frame MUST have a sequence ID greater than that of the previous frame within the same track. The track ID is specified in each frame; if it is not specified, it is implicitly 0.¶
Type (8): 1 byte representing the type of the frame.¶
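As an illustration of the generic frame format, the following minimal sketch packs and parses the 17-byte header (Length, ID, Type) in big-endian order. The function names are hypothetical; only the field layout comes from the diagram above.

```python
import struct

# Length (64) | ID (64) | Type (8), big endian, per the generic frame format.
HEADER = struct.Struct(">QQB")  # 8 + 8 + 1 = 17 bytes

def pack_frame(frame_id: int, frame_type: int, payload: bytes = b"") -> bytes:
    """Serialize one frame; Length covers the header and the payload."""
    length = HEADER.size + len(payload)
    return HEADER.pack(length, frame_id, frame_type) + payload

def unpack_frame(data: bytes) -> tuple[int, int, bytes]:
    """Parse one complete frame and return (frame_id, frame_type, payload)."""
    if len(data) < HEADER.size:
        raise ValueError("frame shorter than the 17-byte header")
    length, frame_id, frame_type = HEADER.unpack_from(data)
    if length != len(data):
        raise ValueError("Length field does not match the actual frame size")
    return frame_id, frame_type, data[HEADER.size:]

# An End of Video frame (type 0x4) has no payload, so Length is 17.
end_of_video = pack_frame(frame_id=1, frame_type=0x4)
```

With this layout, a frame whose Length is 100 carries 83 bytes of payload, since the header accounts for the remaining 17 bytes.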
Predefined frame types:¶
Frame Type | Frame |
---|---|
0x0 | connect frame |
0x1 | connect ack frame |
0x2 | reserved |
0x3 | reserved |
0x4 | end of video frame |
0x5 | error frame |
0x6 | reserved |
0x7 | reserved |
0x8 | reserved |
0x9 | reserved |
0xA | reserved |
0xB | reserved |
0xC | reserved |
0xD | video frame |
0xE | reserved |
0xF | reserved |
0x10 | reserved |
0x11 | reserved |
0x12 | reserved |
0x13 | reserved |
0x14 | audio frame |
0x15 | GOAWAY frame |
0x16 | Timed metadata |
+--------------------------------------------------------------+
|                       Length (64)                            |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+-------+---------------+---------------+--------------+
|  0x0  |Version|Video Timescale|Audio Timescale|              |
+-------+-------+---------------+---------------+--------------+
|                    Live Session ID(64)                       |
+--------------------------------------------------------------+
| Payload ...                                                  |
+--------------------------------------------------------------+¶
Version: version of the protocol (initial version is 0x0).¶
Video Timescale: timescale for all video frame timestamps on this connection, for instance 25.¶
Audio Timescale: timescale for all audio sample timestamps on this connection. The recommended value is the audio sample rate, for example 44100.¶
Live Session ID: identifier of the broadcast. When reconnecting, the client MUST use the same live session ID.¶
Payload: application and version specific data that can be used by the server. OPTIONAL. A possible implementation is to carry UTF-8 encoded JSON data in the payload that specifies parameters the server needs to authenticate / validate the connection, for instance:¶
~~~
payloadBytes = strToJSonUtf8('{"url": "/rtmp/BID?s_bl=1&s_l=3&s_sc=VALID&s_sw=0&s_vt=usr_dev&a=TOKEN"}')
~~~
This frame is used by the client to initiate broadcasting. The client can start sending other frames immediately after the Connect frame (Section 4.2.1) without waiting for an acknowledgement from the server.¶
If the server doesn't support the version sent by the client, the server sends an Error frame (Section 4.2.4) with code UNSUPPORTED VERSION.¶
If the audio timescale or the video timescale is 0, the server sends an Error frame (Section 4.2.4) with error code INVALID FRAME FORMAT and closes the connection.¶
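The two server-side checks on a Connect frame could be sketched as follows. The function name, the set of supported versions, and the order in which the checks run are assumptions made for illustration; the numeric error codes are the ones listed in the error handling section.

```python
UNSUPPORTED_VERSION = 1    # error code from the connection errors list
INVALID_FRAME_FORMAT = 3   # error code from the frame errors list
SUPPORTED_VERSIONS = {0x0}  # assumption: this server only speaks the initial version

def check_connect(version: int, video_timescale: int, audio_timescale: int):
    """Return the error code to send for a rejected Connect frame, or None."""
    if version not in SUPPORTED_VERSIONS:
        return UNSUPPORTED_VERSION
    if video_timescale == 0 or audio_timescale == 0:
        return INVALID_FRAME_FORMAT  # the server also closes the connection
    return None
```

For example, `check_connect(0x0, 25, 44100)` returns `None`, so the server can answer with a Connect Ack frame.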
If the client receives a Connect frame from the server, the client sends an Error frame (Section 4.2.4) with code TBD.¶
0       1       2       3       4       5       6       7
+--------------------------------------------------------------+
|                       Length (64) = 17                       |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+------------------------------------------------------+
|  0x1  |
+-------+¶
The server sends the "Connect Ack" frame in response to the "Connect" frame (Section 4.2.1), indicating that the server accepts the "version" and that the stream is authenticated / validated (optional), so it is ready to receive data.¶
If the client doesn't receive a "Connect Ack" frame from the server within a timeout, it will close the connection. The timeout value is chosen by the implementation.¶
Only one "Connect Ack" frame can be sent over the lifetime of the QUIC connection.¶
If the server receives a Connect Ack frame from the client, the server sends an Error frame with code TBD.¶
+--------------------------------------------------------------+
|                       Length (64) = 17                       |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+------------------------------------------------------+
|  0x4  |
+-------+¶
The End of Video frame is sent by the client when it's done sending data and is about to close the connection. The server SHOULD ignore all frames sent after that.¶
+--------------------------------------------------------------+
|                       Length (64) = 29                       |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+------------------------------------------------------+
|  0x5  |
+-------+------------------------------------------------------+
|                   Sequence ID (64)                           |
+------------------------------+-------------------------------+
|      Error Code (32)         |
+------------------------------+¶
Sequence ID: ID of the frame sent by the client that the error is generated for. ID=0x0 indicates a connection-level error.¶
Error Code: indicates the error code¶
An Error frame can be sent by the client or the server to indicate that an error occurred.¶
Some errors are fatal, and the connection will be closed after sending the Error frame.¶
See Section 5.1 and Section 5.2 for more information about error codes.¶
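The Error frame has a fixed size, so it maps directly onto a single big-endian struct: 8 + 8 + 1 + 8 + 4 = 29 bytes, matching the Length shown in the diagram. The sketch below is illustrative only, and the function names are hypothetical.

```python
import struct

# Length (64), ID (64), Type (8), Sequence ID (64), Error Code (32)
ERROR_FRAME = struct.Struct(">QQBQI")
ERROR_TYPE = 0x5

def pack_error(frame_id: int, sequence_id: int, error_code: int) -> bytes:
    """Build an Error frame; sequence_id 0 signals a connection-level error."""
    return ERROR_FRAME.pack(ERROR_FRAME.size, frame_id, ERROR_TYPE,
                            sequence_id, error_code)

def unpack_error(data: bytes) -> tuple[int, int, int]:
    """Return (frame_id, sequence_id, error_code) from a 29-byte Error frame."""
    length, frame_id, frame_type, sequence_id, error_code = ERROR_FRAME.unpack(data)
    if length != ERROR_FRAME.size or frame_type != ERROR_TYPE:
        raise ValueError("not a well-formed Error frame")
    return frame_id, sequence_id, error_code

# UNSUPPORTED CODEC (code 2) reported for the frame with sequence ID 42.
wire = pack_error(frame_id=9, sequence_id=42, error_code=2)
```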
+--------------------------------------------------------------+
|                       Length (64)                            |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+-------+----------------------------------------------+
|  0xD  | Codec |
+-------+-------+----------------------------------------------+
|                        PTS (64)                              |
+--------------------------------------------------------------+
|                        DTS (64)                              |
+-------+------------------------------------------------------+
|TrackID|                                                      |
+-------+-------+----------------------------------------------+
| I Offset      | Video Data ...                               |
+---------------+----------------------------------------------+¶
Codec: specifies the codec that was used to encode this frame.¶
PTS: presentation timestamp in the connection video timescale¶
DTS: decoding timestamp in the connection video timescale¶
Supported type of codecs:¶
Type | Codec |
---|---|
0x1 | H264 |
0x2 | H265 |
0x3 | VP8 |
0x4 | VP9 |
Track ID: ID of the track that this frame is on¶
I Offset: distance from the sequence ID of the I-frame that is required before this frame can be decoded. This can be useful to decide whether a frame can be dropped.¶
Video Data: variable-length field that carries the actual video frame data, which is codec dependent¶
For h264/h265 codec, "Video Data" are 1 or more NALUs in AVCC format (4 bytes size header):¶
0       1       2       3       4       5       6       7
+--------------------------------------------------------------+
|                    NALU Length (64)                          |
+--------------------------------------------------------------+
| NALU Data ...
+--------------------------------------------------------------+¶
EVERY h264 video key-frame MUST start with SPS/PPS NALUs. EVERY h265 video key-frame MUST start with VPS/SPS/PPS NALUs.¶
Binary concatenation of "video data" from consecutive video frames, without data loss, MUST produce a VALID h264/h265 bitstream.¶
+--------------------------------------------------------------+
|                       Length (64)                            |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+------------------------------------------------------+
| 0x14  | Codec |
+-------+-------+----------------------------------------------+
|                      Timestamp (64)                          |
+-------+-------+-------+--------------------------------------+
|TrackID|   Header Len  |
+-------+-------+-------+--------------------------------------+
| Header + Audio Data ...
+--------------------------------------------------------------+¶
Codec: specifies the codec that was used to encode this frame.¶
Supported type of codecs:¶
Type | Codec |
---|---|
0x1 | AAC |
0x2 | OPUS |
Timestamp: timestamp of the first audio sample in Audio Data.¶
Track ID: ID of the track that this frame is on¶
Header Len: length in bytes of the audio header contained in the first portion of the payload¶
Header + Audio Data: carries the audio header and 1 or more audio frames that are codec dependent.¶
For AAC codec:
- "Audio Data" are 1 or more AAC samples, prefixed with the Audio Specific Config (ASC) header defined in ISO 14496-3
- Binary concatenation of all AAC samples in "Audio Data" from consecutive audio frames, without data loss, MUST produce a VALID AAC bitstream.¶
For OPUS codec:
- "Audio Data" are 1 or more OPUS samples, prefixed with the OPUS header as defined in [RFC7845]¶
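A sketch of how an audio frame might be serialized and split back into its codec header and sample data is shown below. The field widths are an assumption read off the diagram (Codec 8 bits, Timestamp 64 bits, TrackID 8 bits, Header Len 16 bits); the text does not state them explicitly, and the function names are hypothetical.

```python
import struct

# Length, ID, Type, Codec, Timestamp, TrackID, Header Len (widths assumed from the diagram)
AUDIO_FIXED = struct.Struct(">QQBBQBH")
AUDIO_TYPE = 0x14
CODEC_AAC, CODEC_OPUS = 0x1, 0x2

def pack_audio(frame_id, codec, timestamp, track_id, header: bytes, samples: bytes) -> bytes:
    """Serialize an audio frame: fixed fields, then codec header, then samples."""
    length = AUDIO_FIXED.size + len(header) + len(samples)
    fixed = AUDIO_FIXED.pack(length, frame_id, AUDIO_TYPE, codec,
                             timestamp, track_id, len(header))
    return fixed + header + samples

def split_audio(data: bytes):
    """Return (codec, timestamp, track_id, header, samples) for one audio frame."""
    length, _, frame_type, codec, timestamp, track_id, header_len = \
        AUDIO_FIXED.unpack_from(data)
    if frame_type != AUDIO_TYPE or length != len(data):
        raise ValueError("not a well-formed audio frame")
    if AUDIO_FIXED.size + header_len > length:
        raise ValueError("Header Len exceeds the frame size")
    body = data[AUDIO_FIXED.size:]
    return codec, timestamp, track_id, body[:header_len], body[header_len:]

# Placeholder bytes stand in for a real ASC header and AAC samples.
frame = pack_audio(3, CODEC_AAC, 44100, 1, b"\x12\x10", b"sample-bytes")
```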
0       1       2       3       4       5       6       7
+--------------------------------------------------------------+
|                       17                                     |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+------------------------------------------------------+
| 0x15  |
+-------+¶
The GOAWAY frame is used by the server to initiate graceful shutdown of a connection, for example, for server maintenance.¶
Upon receiving a GOAWAY frame, the client MUST send the frames remaining in the current GOP and stop sending new frames on this connection. The client SHOULD establish a new connection and resume sending frames there, so that the resumed video starts with an IDR frame.¶
After sending a GOAWAY frame, the server continues processing arriving frames for an implementation defined time, after which the server SHOULD close the connection.¶
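One way a client might apply the GOAWAY rule is to split its queue of unsent video frames at the next IDR frame: everything before it finishes the current GOP on the old connection, and the rest starts the new connection. The function below is a hypothetical sketch; the `(frame_id, is_idr)` tuple representation is an assumption made for illustration.

```python
def split_on_goaway(queued):
    """Split unsent video frames at the next IDR frame.

    queued: list of (frame_id, is_idr) tuples in sending order.
    Returns (finish_on_old_connection, resume_on_new_connection).
    """
    for index, (_, is_idr) in enumerate(queued):
        if is_idr:
            # The IDR frame starts a new GOP, so it opens the new connection.
            return queued[:index], queued[index:]
    # No IDR queued yet: all queued frames belong to the current GOP.
    return queued, []

old, new = split_on_goaway([(10, False), (11, False), (12, True), (13, False)])
```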
+--------------------------------------------------------------+
|                       Length (64)                            |
+--------------------------------------------------------------+
|                       ID (64)                                |
+-------+------------------------------------------------------+
| 0x16  |TrackID|
+-------+-------+----------------------------------------------+
|                      Topic (64)                              |
+--------------------------------------------------------------+
|                      EventMessage (64)                       |
+-------+------------------------------------------------------+
|                      Timestamp (64)                          |
+-------+------------------------------------------------------+
|                      Duration (64)                           |
+-------+------------------------------------------------------+
| Payload ...
+--------------------------------------------------------------+¶
Track ID: ID of the track that this frame is on¶
Timestamp: PTS of the event¶
Topic: a unique identifier of the app-level feature. May be used to decode the payload or do other application-specific processing¶
EventMessage: a unique identifier of the event message, used for deduplication of app-level events¶
Duration: duration of the event in the video PTS timescale. Can be 0.¶
Payload: variable-length field. May be used by the app to send additional event metadata. UTF-8 JSON is recommended¶
One of the main goals of the RUSH protocol is to give applications a way to control the reliability of audio/video data delivery. This is achieved by using a special mode, Multi Stream Mode (Section 4.3.2).¶
In single stream mode, RUSH uses one bidirectional QUIC stream to send data and receive data. Using one stream guarantees reliable, in-order delivery - applications can rely on QUIC transport layer to retransmit lost packets. The performance characteristics of this mode are similar to RTMP over TCP.¶
In single stream mode (Section 4.3.1), if a packet belonging to a video frame is lost, all packets sent after it will not be delivered to the application, even though those packets may have arrived at the server. This introduces head-of-line blocking and can negatively impact latency.¶
To address this problem, RUSH defines "Multi Stream Mode", in which one QUIC stream is used per audio/video frame.¶
Connection establishment follows the normal procedure, with the client sending a Connect frame. After that, video and audio frames are sent using the following rules:¶
The receiver reconstructs the track using the frame IDs.¶
Response frames (Connect Ack, Section 4.2.2, and Error, Section 4.2.4) are sent in the response direction of the stream that carried the frame they respond to.¶
The client MAY control delivery reliability by setting a delivery timer for every audio or video frame and resetting the QUIC stream when the timer fires. This effectively stops retransmissions if the frame wasn't fully delivered in time.¶
The timeout is implementation defined; however, future versions of the draft will define a way to negotiate it.¶
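The delivery timer could be sketched with asyncio as follows. `FakeStream`, with its `send_all` and `reset` methods, is a hypothetical stand-in used only to make the example runnable; a real QUIC library will expose a different stream API, and the timeout values are arbitrary.

```python
import asyncio

class FakeStream:
    """Hypothetical stand-in for one QUIC stream carrying a single frame."""

    def __init__(self, send_delay: float):
        self.send_delay = send_delay
        self.was_reset = False

    async def send_all(self, data: bytes) -> None:
        await asyncio.sleep(self.send_delay)  # pretend to wait for full delivery

    def reset(self) -> None:
        self.was_reset = True  # a real stream would abandon retransmissions here

async def send_with_deadline(stream, frame: bytes, timeout: float) -> bool:
    """Send one frame on its own stream; reset the stream if the timer fires."""
    try:
        await asyncio.wait_for(stream.send_all(frame), timeout)
        return True
    except asyncio.TimeoutError:
        stream.reset()
        return False

slow = FakeStream(send_delay=1.0)
delivered = asyncio.run(send_with_deadline(slow, b"frame", timeout=0.05))
```

Here `delivered` is `False` and the stream has been reset, so the server has no guarantee that the frame arrived and treats it as lost.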
An endpoint that detects an error SHOULD signal the existence of that error to its peer. Errors can affect an entire connection (see Section 5.1), or a single frame (see Section 5.2).¶
The most appropriate error code SHOULD be included in the error frame that signals the error.¶
Errors that affect the whole connection:¶
1 - UNSUPPORTED VERSION - indicates that the server doesn't support the version specified in the Connect frame¶
4 - CONNECTION_REJECTED - indicates that the server cannot process the connection for any reason¶
There are two error codes defined in the core protocol that indicate a problem with a particular frame:¶
2 - UNSUPPORTED CODEC - indicates that the server doesn't support the given audio or video codec¶
3 - INVALID FRAME FORMAT - indicates that the receiver was not able to parse the frame or there was an issue with a field's value.¶
RUSH permits extension of the protocol.¶
Extensions are permitted to use new frame types (Section 4), new error codes (Section 4.2.4), or new audio and video codecs (Section 4.2.6, Section 4.2.5).¶
Implementations MUST ignore unknown or unsupported values in all extensible protocol elements, except codec id, which returns an UNSUPPORTED CODEC error. Implementations MUST discard frames that have unknown or unsupported types.¶
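A receiver's handling of unknown frame types and codecs might look like the following sketch. The function name and the returned action tuples are hypothetical; the frame types, codec values, and the UNSUPPORTED CODEC error code are taken from the tables in this document.

```python
KNOWN_FRAME_TYPES = {
    0x0: "connect", 0x1: "connect ack", 0x4: "end of video", 0x5: "error",
    0xD: "video", 0x14: "audio", 0x15: "goaway", 0x16: "timed metadata",
}
SUPPORTED_CODECS = {"video": {0x1, 0x2, 0x3, 0x4}, "audio": {0x1, 0x2}}
UNSUPPORTED_CODEC = 2  # frame-level error code

def classify_frame(frame_type: int, codec=None):
    """Return ("discard", None), ("error", code) or ("process", name)."""
    name = KNOWN_FRAME_TYPES.get(frame_type)
    if name is None:
        return ("discard", None)  # unknown or reserved type: drop silently
    if name in SUPPORTED_CODECS and codec not in SUPPORTED_CODECS[name]:
        return ("error", UNSUPPORTED_CODEC)
    return ("process", name)
```

An extension that adds a new frame type would add an entry to the type table; receivers without that entry keep discarding such frames.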
The RUSH protocol relies on the security guarantees provided by the transport.¶
Implementations SHOULD be prepared to handle cases where the sender deliberately sends frames with gaps in sequence IDs.¶
Implementations SHOULD be prepared to handle cases where the server never receives a Connect frame (Section 4.2.1).¶
A frame parser MUST ensure that the value of the frame length field (see Section 4.1) matches the actual length of the frame, including the frame header.¶
Implementations SHOULD be prepared to handle cases where the sender sends a frame with a large frame length field value.¶
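As a sketch of defensive parsing, the function below extracts complete frames from a receive buffer (as in Single Stream Mode) and rejects implausible length values before buffering any further data. `MAX_FRAME_LENGTH` is a hypothetical cap chosen for illustration; the draft does not define a maximum frame size.

```python
import struct

HEADER = struct.Struct(">QQB")       # Length (64), ID (64), Type (8)
MAX_FRAME_LENGTH = 4 * 1024 * 1024   # hypothetical cap, not defined by RUSH

def extract_frames(buffer: bytearray):
    """Pop every complete frame from the front of buffer; keep a partial tail."""
    frames = []
    while len(buffer) >= 8:
        (length,) = struct.unpack_from(">Q", buffer)
        if length < HEADER.size or length > MAX_FRAME_LENGTH:
            raise ValueError(f"implausible frame length {length}")
        if len(buffer) < length:
            break  # wait for more bytes from the stream
        _, frame_id, frame_type = HEADER.unpack_from(buffer)
        frames.append((frame_id, frame_type, bytes(buffer[HEADER.size:length])))
        del buffer[:length]
    return frames
```

Checking the length before waiting for the full frame means a peer that announces an enormous frame is rejected immediately instead of forcing the receiver to buffer it.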
TODO: add frame type registry, error code registry, audio/video codecs registry¶
This draft is the work of many people: Vlad Shubin, Nitin Garg, Milen Lazarov, Benny Luo, Nick Ruff, Konstantin Tsoy, Nick Wu.¶