Internet-Draft | VBF | July 2022 |
Li | Expires 27 January 2023 | [Page] |
This document describes an RTP header extension that conveys decoding time information for video streams that contain bi-directional predicted frames (B-frames). The extension carries the CompositionTime (CTS) value so that the receiver can decode video frames in the correct order.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 27 January 2023.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
H.264 and HEVC are widely used video codecs in RTP-based systems. These codecs support I-frames, P-frames, and B-frames. Most RTP systems do not support B-frames, although B-frames are widely used in streaming systems. With the rapid deployment of Real-Time Communication (RTC) in low-latency streaming scenarios, support for bi-directional predicted frames in RTP-based systems is necessary.¶
Video streams carry timing information, including timestamps, so that a decoder knows how to handle the content properly. The DTS (Decoding Time Stamp) determines when a frame has to be decoded, while the PTS (Presentation Time Stamp) determines when a frame has to be presented. This difference becomes important when B-frames are used, because a B-frame can reference frames in the past as well as frames in the future. A future frame that serves as a reference must therefore be decoded before the B-frames that depend on it, even though it is presented after them. The decoder thus needs the DTS when B-frames exist, but the RTP timestamp reflects only the presentation time (PTS). This document specifies an RTP header extension that allows video RTP senders to deliver the CTS (Composition Time) to RTP receivers.¶
The CTS value is PTS minus DTS. Therefore, the RTP receiver obtains the DTS by subtracting the CTS value from the RTP timestamp.¶
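To make the relationship concrete, the following Python sketch computes CTS for four frames of a stream on the 90 kHz RTP clock. The frame names, the 25 fps rate (3600 ticks per frame), and the one-frame reordering delay are illustrative assumptions, not values taken from this document. All values are in RTP timestamp ticks.

```python
# Worked example: four frames of a hypothetical 25 fps stream on the
# 90 kHz RTP clock. Display order is I0 B1 B2 P3; the encoder emits P3
# before B1 and B2 because both B-frames reference it.
FRAME_TICKS = 90_000 // 25  # 3600 ticks = 40 ms per frame

# (name, PTS) in decode order. PTS starts one frame late so that DTS
# never exceeds PTS (one frame of reordering delay).
decode_order = [("I0", 1 * FRAME_TICKS),
                ("P3", 4 * FRAME_TICKS),
                ("B1", 2 * FRAME_TICKS),
                ("B2", 3 * FRAME_TICKS)]


def composition_offsets(frames):
    """Return (name, pts, dts, cts) rows with DTS evenly spaced in decode order."""
    rows = []
    for index, (name, pts) in enumerate(frames):
        dts = index * FRAME_TICKS  # consecutive decode slots
        cts = pts - dts            # CTS = PTS - DTS
        rows.append((name, pts, dts, cts))
    return rows


for name, pts, dts, cts in composition_offsets(decode_order):
    # The receiver sees only PTS (the RTP timestamp) and CTS: DTS = PTS - CTS.
    assert pts - cts == dts
    print(f"{name}: PTS={pts:6d} DTS={dts:6d} CTS={cts:6d}")
```

The P-frame, which is sent before the two B-frames that reference it, gets the largest CTS (10800 ticks), while the B-frames, whose decode and presentation slots coincide, get a CTS of 0.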
This new header extension uses the general mechanism for RTP header extensions described in ([RFC8285]). When a video frame spans several RTP packets, the RTP sender only needs to add the CTS to the first RTP packet of the frame, which reduces overhead.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The general RTP payload format follows the RTP header format ([RFC3550]) and generic RTP header extensions ([RFC8285]). The RTP header extension MAY be encoded using either the one-byte or the two-byte header described in ([RFC8285]). The two-byte header format is used as an example in this memo.¶
The following RTP header extension is RECOMMENDED. The ID is assigned per ([RFC8285]), and the format is shown below.¶
ID: the extension ID.¶
cts: PTS minus DTS, divided by 90 (the 90 kHz video clock rate gives 90 ticks per millisecond), so the value is expressed in milliseconds.¶
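Because the field width of the cts element is not restated in the text, the following Python sketch makes an explicit assumption: the element carries a 16-bit unsigned big-endian value in milliseconds. Under that assumption it computes the cts field from PTS and DTS and wraps it in an RFC 8285 two-byte-header extension block (the 0x1000 marker, i.e. 0x100 followed by appbits set to 0, then a length in 32-bit words, then ID, length, and data, zero-padded to a 32-bit boundary). The helper names are hypothetical.

```python
import struct

VIDEO_CLOCK_TICKS_PER_MS = 90  # 90 kHz RTP video clock


def cts_field_ms(pts: int, dts: int) -> int:
    """cts field value: (PTS - DTS) / 90, rounded to whole milliseconds."""
    return round((pts - dts) / VIDEO_CLOCK_TICKS_PER_MS)


def two_byte_header_extension(ext_id: int, cts_ms: int) -> bytes:
    """Build an RFC 8285 two-byte-header extension block with one cts element.

    ASSUMPTION: the cts element is a 16-bit unsigned big-endian value.
    """
    if not 1 <= ext_id <= 255:
        raise ValueError("two-byte header IDs are 1..255 (0 is padding)")
    if not 0 <= cts_ms <= 0xFFFF:
        raise ValueError("cts does not fit the assumed 16-bit field")
    element = struct.pack("!BBH", ext_id, 2, cts_ms)  # ID, length=2, data
    padding = (-len(element)) % 4                      # pad to 32-bit words
    body = element + b"\x00" * padding
    # 0x1000 = 0x100 in the top 12 bits, appbits = 0; length counts 32-bit words.
    return struct.pack("!HH", 0x1000, len(body) // 4) + body
```

For example, with PTS 14400, DTS 3600, and ID 19 this yields cts = 120 and the eight bytes 10 00 00 01 13 02 00 78. At 30 fps the 3000-tick frame interval is not a whole number of milliseconds, so rounding to milliseconds can shift the DTS recovered by the receiver by up to 45 ticks (half a millisecond).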
The video sender MAY be a video client or a middlebox performing RTP switching. A video client MAY encode video with B-frames; when it does, it SHOULD add this RTP header extension in its RTP packetization module. When a video frame spans multiple RTP packets, adding the extension only to the first RTP packet is RECOMMENDED, which reduces overhead. A middlebox MAY translate RTMP or other streaming video protocols into RTP streams; it SHOULD add this header extension when the streamed video contains B-frames.¶
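The RECOMMENDED sender behavior of tagging only the first packet of a frame can be sketched in Python as follows. The `RtpPacket` type, the `packetize_frame` helper, the 1200-byte payload limit, and the 2-byte cts encoding are hypothetical; fixed-size slicing stands in for the codec-specific payload format (e.g. H.264 FU-A fragmentation), and the RTP header itself is not serialized.

```python
from dataclasses import dataclass, field


@dataclass
class RtpPacket:
    timestamp: int   # RTP timestamp = PTS on the 90 kHz clock
    marker: bool     # set on the last packet of the frame
    payload: bytes
    extensions: dict = field(default_factory=dict)  # ext ID -> element data


def packetize_frame(frame: bytes, pts: int, cts_ms: int, cts_ext_id: int,
                    max_payload: int = 1200) -> list:
    """Split one encoded frame into RTP packets, tagging only the first.

    Fixed-size slicing is a stand-in for the real codec payload format.
    """
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)] or [b""]
    packets = []
    for index, chunk in enumerate(chunks):
        packet = RtpPacket(timestamp=pts,
                           marker=(index == len(chunks) - 1),
                           payload=chunk)
        if index == 0:
            # Only the first packet carries cts (ASSUMED 16-bit big-endian);
            # the remaining packets share its RTP timestamp.
            packet.extensions[cts_ext_id] = cts_ms.to_bytes(2, "big")
        packets.append(packet)
    return packets
```

All packets of the frame share the RTP timestamp (the PTS) and only the last one sets the marker bit, which is what lets a receiver associate the cts from the first packet with the rest of the frame.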
The video RTP receiver is a client that decodes video. It SHOULD extract the cts value when this extension is present and calculate the DTS from the RTP timestamp (PTS) and the CTS:¶
DTS = PTS - CTS * 90¶
Here 90 reflects the 90 kHz video clock rate (90 ticks per millisecond). The video receiver reconstructs each frame and places it in the jitter buffer. The decoder MUST decode frames in DTS order, and the video rendering module MUST render decoded frames in PTS order, where the PTS is taken from the RTP timestamp.¶
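On the receiving side, the following Python sketch recovers the DTS with DTS = PTS - CTS * 90 and then derives decode and render orders. Its assumptions are labeled in the code: cts is a 16-bit unsigned big-endian field, a frame whose cts has not been seen is treated as having DTS equal to PTS, and the buffered frames do not span an RTP timestamp wraparound when sorted. The subtraction itself is taken modulo 2^32 because RTP timestamps are 32-bit values. `CtsTracker`, `Frame`, and `decode_and_render_order` are hypothetical names.

```python
from dataclasses import dataclass

RTP_TS_MODULO = 1 << 32  # RTP timestamps are 32-bit and wrap around
TICKS_PER_MS = 90        # 90 kHz video clock


class CtsTracker:
    """Recover the DTS of every packet from the cts on a frame's first packet."""

    def __init__(self, cts_ext_id: int):
        self.cts_ext_id = cts_ext_id
        self.cts_by_timestamp = {}  # RTP timestamp -> cts (ms); never evicted here

    def dts_for(self, rtp_timestamp: int, extensions: dict) -> int:
        data = extensions.get(self.cts_ext_id)
        if data is not None:
            # ASSUMPTION: cts is a 16-bit unsigned big-endian field.
            self.cts_by_timestamp[rtp_timestamp] = int.from_bytes(data, "big")
        # ASSUMPTION: no cts seen for this timestamp means DTS == PTS.
        cts_ms = self.cts_by_timestamp.get(rtp_timestamp, 0)
        # DTS = PTS - CTS * 90, kept in the 32-bit RTP timestamp space.
        return (rtp_timestamp - cts_ms * TICKS_PER_MS) % RTP_TS_MODULO


@dataclass
class Frame:
    name: str
    pts: int  # RTP timestamp
    dts: int  # recovered by CtsTracker


def decode_and_render_order(frames):
    """Decode in DTS order, render in PTS order (ASSUMES no wrap in the buffer)."""
    decode = [f.name for f in sorted(frames, key=lambda f: f.dts)]
    render = [f.name for f in sorted(frames, key=lambda f: f.pts)]
    return decode, render


tracker = CtsTracker(cts_ext_id=19)
# (name, RTP timestamp, cts in ms) in arrival (decode) order.
arrivals = [("I0", 3600, 40), ("P3", 14400, 120), ("B1", 7200, 0), ("B2", 10800, 0)]
frames = [Frame(name, pts, tracker.dts_for(pts, {19: cts.to_bytes(2, "big")}))
          for name, pts, cts in arrivals]
decode, render = decode_and_render_order(frames)
print(decode)  # ['I0', 'P3', 'B1', 'B2']
print(render)  # ['I0', 'B1', 'B2', 'P3']
```

A later packet of the P-frame that arrives without the extension resolves to the same DTS through the per-timestamp cache; a production jitter buffer would also evict old entries and use wraparound-aware comparisons.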
In practice, some receivers cannot decode video that contains B-frames. To allow such a receiver to decode an incoming video stream successfully, it is RECOMMENDED that an RTP middlebox discard B-frames when the stream from the video RTP sender contains them. The decoder at the endpoint SHOULD indicate whether it supports B-frames in the SDP payload-format-specific parameters (a=fmtp) and follow the Offer/Answer procedure described in ([RFC8285]).¶
The URI for declaring this header extension in an extmap attribute is "urn:ietf:params:rtp-hdrext:CompositionTime". It does not contain any extension attributes and follows the standard mechanism described in ([RFC8285]). An example attribute line in SDP:¶
a=extmap:19 urn:ietf:params:rtp-hdrext:CompositionTime¶
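A receiver or middlebox needs the locally negotiated ID before it can find the element in incoming packets. The following Python sketch looks it up in an SDP description by matching extmap lines against the URI, following the `a=extmap:<value>["/"<direction>] <URI> <extensionattributes>` syntax of RFC 8285. The function name and the sample SDP are illustrative; the URI is the one declared in the paragraph above.

```python
import re
from typing import Optional

# a=extmap:<value>["/"<direction>] <URI> <extensionattributes>  (RFC 8285)
EXTMAP_RE = re.compile(r"^a=extmap:(\d+)(?:/(\w+))?\s+(\S+)(?:\s+.*)?$")

COMPOSITION_TIME_URI = "urn:ietf:params:rtp-hdrext:CompositionTime"


def find_extmap_id(sdp: str, uri: str) -> Optional[int]:
    """Return the extension ID mapped to `uri` in an SDP body, or None."""
    for line in sdp.splitlines():
        match = EXTMAP_RE.match(line.strip())
        if match and match.group(3) == uri:
            return int(match.group(1))
    return None


sample_sdp = (
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
    "a=rtpmap:96 H264/90000\r\n"
    "a=extmap:19 urn:ietf:params:rtp-hdrext:CompositionTime\r\n"
)
print(find_extmap_id(sample_sdp, COMPOSITION_TIME_URI))  # 19
```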
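The security considerations of the RTP specification ([RFC3550]) and of the general mechanism for RTP header extensions ([RFC8285]) apply. In addition, the security considerations of RTP topologies ([RFC7667]) and of the options for securing RTP sessions ([RFC7201]) that concern the RTP intermediaries described in this document, namely middleboxes performing RTP switching or protocol translation, are applicable to this header extension.¶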
Security considerations for SDP are described in the corresponding section of ([RFC8866]). In the Secure Real-time Transport Protocol (SRTP) ([RFC3711]), RTP header extensions are authenticated but not encrypted. When this header extension is used, the CTS values are therefore visible on a frame-by-frame basis to an attacker passively observing the video stream. In scenarios where this is a concern, additional mechanisms MUST be used to protect the confidentiality of the header extension, such as header extension encryption ([RFC6904]) or a lower-level security and authentication mechanism such as IPsec ([RFC4301]).¶
IANA has registered the following entry in the "RTP Compact Header Extensions" registry:¶
Extension URI: urn:ietf:params:rtp-hdrext:CompositionTime¶
Description: video B-frame CompositionTime¶
Contact: lideping.byter@bytedance.com¶