Internet-Draft | WARP | February 2022 |
Curley | Expires 13 August 2022 | [Page] |
This document defines the core behavior for Warp, a segmented live video transport protocol. Warp maps live media to QUIC streams based on the underlying media encoding. Media is prioritized to minimize latency during congestion.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 13 August 2022.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Warp is a live video transport protocol that utilizes the [QUIC] network protocol.¶
The live stream is split into segments (Section 2) at I-frame boundaries. These are fragmented MP4 files as defined in [ISOBMFF]. Initialization segments contain track metadata while media segments contain either video or audio samples.¶
QUIC streams (Section 3) are used to transfer messages and segments between endpoints. These streams are prioritized based on the contents, such that the most important media is delivered during congestion.¶
Messages (Section 4) are sent over streams alongside segments. These are used to carry necessary metadata and control messages.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Commonly used terms in this document are described below.¶
The live stream is split into segments before being transferred over the network. Segments are fragmented MP4 files as defined by [ISOBMFF].¶
There are two types of segments: initialization and media.¶
Initialization segments contain track metadata but no sample data.¶
Initialization segments MUST consist of a File Type Box ('ftyp') followed by a Movie Box ('moov'). This Movie Box consists of Movie Header Boxes ('mvhd'), Track Header Boxes ('tkhd'), Track Boxes ('trak'), followed by a final Movie Extends Box ('mvex'). These boxes MUST NOT contain any samples and MUST have a duration of zero.¶
Note that a Common Media Application Format Header [CMAF] meets all these requirements.¶
Media segments contain media samples for a single track.¶
Media segments MUST consist of a Segment Type Box ('styp') followed by at least one media fragment. Each media fragment consists of a Movie Fragment Box ('moof') followed by a Media Data Box ('mdat'). The Movie Fragment Box MUST contain a Movie Fragment Header Box ('mfhd') and a Track Fragment Box ('traf') with a Track ID ('track_ID') matching a Track Box in the initialization segment.¶
Note that a Common Media Application Format Segment [CMAF] meets all these requirements.¶
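The top-level box layout required of both segment types can be checked with a short parser. The following is a minimal sketch, not part of the protocol: the function names are hypothetical, and it inspects only the sequence of top-level boxes. It does not descend into 'moov' or 'moof', so it cannot verify the zero-duration or Track ID requirements.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (box_type, payload) for each top-level ISOBMFF box in data."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type.
            if offset + 16 > len(data):
                raise ValueError("truncated box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("invalid box size")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def is_init_segment(data: bytes) -> bool:
    """True if data is exactly an 'ftyp' box followed by a 'moov' box."""
    return [t for t, _ in top_level_boxes(data)] == ["ftyp", "moov"]


def is_media_segment(data: bytes) -> bool:
    """True if data is 'styp' followed by one or more 'moof'/'mdat' pairs."""
    types = [t for t, _ in top_level_boxes(data)]
    if len(types) < 3 or types[0] != "styp":
        return False
    rest = types[1:]
    if len(rest) % 2 != 0:
        return False
    return all(
        rest[i] == "moof" and rest[i + 1] == "mdat"
        for i in range(0, len(rest), 2)
    )


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Hypothetical helper: build a box with a 32-bit size header."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

For example, `is_media_segment(make_box(b"styp") + make_box(b"moof") + make_box(b"mdat", b"x"))` returns True, while a lone 'styp' box is rejected because it carries no media fragment.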
Warp uses unidirectional QUIC streams to transfer messages and segments over the network. The establishment of the QUIC connection is outside the scope of this document.¶
An endpoint MAY both send media (producer) and receive media (consumer). This is accomplished by sending messages and segments over unidirectional streams. Streams contain any number of messages and segments concatenated together.¶
Messages are used to control playback or carry metadata about upcoming segments.¶
A Warp Box ('warp') is a top-level MP4 box as defined in [ISOBMFF]. The content of this box is a Warp message. See the messages section (Section 4) for the encoding and types available.¶
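As an illustration, a Warp Box can be framed like any other MP4 box: a 32-bit big-endian size, the four-character type 'warp', then the message. The sketch below assumes a UTF-8 encoded JSON payload and a 32-bit size field. The function names are hypothetical and not defined by this document.

```python
import json
import struct


def encode_warp_box(message: dict) -> bytes:
    """Wrap a message in a top-level 'warp' box (assumes UTF-8 JSON payload)."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return struct.pack(">I4s", 8 + len(payload), b"warp") + payload


def decode_warp_box(data: bytes) -> tuple:
    """Parse one leading 'warp' box; return (message, remaining bytes)."""
    if len(data) < 8:
        raise ValueError("truncated box header")
    size, box_type = struct.unpack_from(">I4s", data)
    if box_type != b"warp":
        raise ValueError("not a warp box")
    if size < 8 or size > len(data):
        raise ValueError("invalid box size")
    message = json.loads(data[8:size].decode("utf-8"))
    # Whatever follows the box (for example, a segment) is returned untouched.
    return message, data[size:]
```

Because the box is length-prefixed, a receiver can read the message and then treat the rest of the stream as the segment that follows it.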
Segments are transferred over streams alongside messages. Each segment MUST be preceded by an init (Section 4.1) or media (Section 4.2) message, indicating the type of segment and providing additional metadata.¶
The media producer SHOULD send each segment as a unique stream to avoid head-of-line blocking. The media producer MAY send multiple segments over a single stream, for simplicity, when head-of-line blocking is desired.¶
A segment is the smallest unit of delivery, as the tail of a segment can be safely delayed or dropped without decode errors. A future version of Warp will support layered coding (additional QUIC streams) to enable dropping or downscaling frames in the middle of a segment.¶
Warp utilizes precedence to deliver the most important content during congestion.¶
The media producer assigns a numeric precedence to each stream. This is a strict prioritization scheme, such that any available bandwidth is allocated to streams in descending order of precedence. QUIC supports stream prioritization but does not standardize any mechanisms; see Section 2.3 in [QUIC]. The media producer MUST support sending prioritized streams. The media producer MAY choose to delay retransmitting lower priority streams when possible within QUIC flow control limits.¶
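Strict prioritization can be pictured as a scheduler that drains the highest-precedence stream before giving any bytes to the next one. The following is a minimal sketch under two assumptions: a larger number means higher precedence, and the sender knows a byte budget for the current send opportunity. The `allocate` function and its arguments are hypothetical and do not correspond to any QUIC library API.

```python
def allocate(budget: int, pending: dict) -> dict:
    """Split a byte budget across streams in descending order of precedence.

    pending maps stream id -> (precedence, bytes waiting to be sent).
    Returns stream id -> bytes granted. Streams that receive nothing are
    omitted, which is the starvation a strict scheme causes on purpose.
    """
    grants = {}
    by_precedence = sorted(
        pending.items(), key=lambda item: item[1][0], reverse=True
    )
    for stream_id, (_precedence, size) in by_precedence:
        if budget <= 0:
            break
        grant = min(size, budget)
        grants[stream_id] = grant
        budget -= grant
    return grants
```

With a budget of 1000 bytes and pending streams `{"a": (10, 500), "b": (30, 800), "c": (20, 400)}`, stream "b" is fully served with 800 bytes, "c" receives the remaining 200, and "a" receives nothing.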
The media consumer determines how long to wait for a given segment (buffer size) before skipping ahead. The media consumer MAY cancel a skipped segment to save bandwidth, or leave it downloading in the background (e.g., to support rewind).¶
Prioritization allows a single media producer to support multiple media consumers with different latency targets. For example, one consumer could have a 1s buffer to minimize latency, another consumer could have a 5s buffer to improve quality, and yet another consumer could have a 30s buffer to receive all media (e.g., a VOD recorder).¶
Live content is encoded and delivered in real-time. Media delivery is limited by encoder throughput, except during congestion, when network throughput becomes the limit. To best deliver live content:¶
For example, this formula will prioritize audio segments, but only up to 3s in the future:¶
if is_audio:
  precedence = timestamp + 3s
else:
  precedence = timestamp¶
Recorded content has already been encoded. Media delivery is blocked exclusively on network throughput.¶
Warp is primarily designed for live content, but can switch to head-of-line blocking by changing stream prioritization. This is also useful for content that should not be skipped over, such as advertisements. To enable head-of-line blocking:¶
For example, this formula will prioritize older segments:¶
precedence = -timestamp¶
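The live and head-of-line formulas above can be compared side by side in runnable form. This is a sketch only: it assumes timestamps are integers in milliseconds (the unit is not stated here), that higher precedence is sent first, and the function names are hypothetical.

```python
def live_precedence(timestamp_ms: int, is_audio: bool,
                    audio_boost_ms: int = 3000) -> int:
    """Newer media wins; audio is treated as if it were 3s newer."""
    if is_audio:
        return timestamp_ms + audio_boost_ms
    return timestamp_ms


def recorded_precedence(timestamp_ms: int) -> int:
    """Older media wins, which yields head-of-line blocking."""
    return -timestamp_ms


def send_order(segments: list, precedence) -> list:
    """Sort (kind, timestamp_ms) pairs by descending precedence."""
    return sorted(segments, key=precedence, reverse=True)


segments = [("video", 0), ("audio", 0), ("video", 2000), ("video", 4000)]

live = send_order(
    segments, lambda s: live_precedence(s[1], s[0] == "audio")
)
recorded = send_order(
    [s for s in segments if s[0] == "video"],
    lambda s: recorded_precedence(s[1]),
)
```

Here `live` places the video segment at 4000 first, then the audio segment at 0, then the video segments at 2000 and 0: the audio outranks video up to 3s newer, but not beyond. `recorded` is simply the video segments in timestamp order.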
During congestion, prioritization intentionally causes stream starvation for the lowest priority streams. Some form of starvation will last until the network fully recovers, which may take indefinitely long.¶
The media consumer SHOULD cancel a stream (via a QUIC STOP_SENDING frame) after it has been skipped to save bandwidth. The media producer SHOULD reset the lowest priority stream (via a QUIC RESET_STREAM frame) when nearing resource limits. Both of these actions will effectively drop the tail of the segment.¶
Media may go through multiple hops and processing steps on the path from the broadcaster to the player. The full effectiveness of Warp as an end-to-end protocol depends on middleware support.¶
priority message (Section 4.3) for downstream servers.¶
Warp endpoints communicate via messages contained in the top-level Warp Box ('warp').¶
A Warp message is a JSON object, where the key defines the message type and the value depends on the message type. Unknown messages MUST be ignored.¶
An endpoint MUST send messages sequentially over a single stream when ordering is required. Messages MAY be combined into a single JSON object when ordering is not required.¶
The init message indicates that the remainder of the stream contains an initialization segment.¶
{ init: { id: int } }¶
The media message contains metadata about the next media segment in the stream.¶
{ segment: { init: int, timestamp: int, } }¶
init message to arrive.¶
The priority message informs middleware about the intended priority of the current stream. Any middleware MAY ignore this value but SHOULD forward it.¶
{ priority: { precedence: int, } }¶
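A receiver might handle these three message types as follows. This is a minimal sketch with hypothetical names: it uses the keys exactly as they appear in the schemas above (`init`, `segment`, `priority`), assumes standard JSON with quoted keys on the wire, and treats a missing or non-integer field as an error, which this document does not specify.

```python
import json

# Required integer fields per message type, taken from the schemas above.
SCHEMAS = {
    "init": ("id",),
    "segment": ("init", "timestamp"),
    "priority": ("precedence",),
}


def parse_messages(payload: bytes) -> dict:
    """Parse one JSON object and return only the recognized messages."""
    obj = json.loads(payload)
    if not isinstance(obj, dict):
        raise ValueError("a Warp message must be a JSON object")
    known = {}
    for name, value in obj.items():
        fields = SCHEMAS.get(name)
        if fields is None:
            continue  # Unknown messages MUST be ignored.
        if not isinstance(value, dict):
            raise ValueError(f"{name}: expected an object")
        for field in fields:
            item = value.get(field)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(item, int) or isinstance(item, bool):
                raise ValueError(f"{name}.{field}: expected an integer")
        known[name] = value
    return known
```

A combined object such as `{"segment": {"init": 1, "timestamp": 2000}, "priority": {"precedence": 5000}}` yields both messages, while any unrecognized key in the same object is silently dropped.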
Custom messages MUST start with x-: Unicode LATIN SMALL LETTER X (U+0078) followed by HYPHEN-MINUS (U+002D).¶
Custom messages SHOULD use a unique prefix to reduce collisions. For example: x-twitch-load would contain identification required to start playback of a Twitch stream.¶
This document has no IANA actions.¶