Internet-Draft	VBF	July 2022
Li	Expires 27 January 2023	[Page]

Workgroup: avtcore
Internet-Draft: draft-deping-avtcore-video-bframe-00
Published: 26 July 2022
Intended Status: Standards Track
Expires: 27 January 2023
Author: D. Li

ByteDance

Video BFrame RTP Header Extension

Abstract

This document describes an RTP header extension that conveys decoding time information for video streams containing bi-directionally predicted frames (B-frames). The extension carries the composition time (CTS) so that the receiver can decode video frames in the correct order.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 27 January 2023.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

1. Introduction

H.264 and HEVC are video codecs widely used in RTP-based systems. These codecs support I-frames, P-frames, and B-frames. Most RTP systems do not support B-frames, although B-frames are widely used in streaming systems. With the rapid deployment of Real-Time Communication (RTC) in low-latency streaming scenarios, support for bi-directionally predicted frames in RTP-based systems is necessary.

Video streams carry metadata, including timestamps, that tells a decoder how to handle the content. The decoding timestamp (DTS) indicates when a frame has to be decoded, while the presentation timestamp (PTS) indicates when a frame has to be presented. This difference becomes important with B-frames, which can reference frames in the past as well as frames in the future. A decoder must therefore decode some frames that are presented later before it can decode the B-frames that reference them. The decoder thus needs the DTS when B-frames exist, whereas the RTP timestamp reflects only the presentation time (PTS). This document specifies an RTP header extension that allows video RTP senders to deliver the composition time (CTS) to RTP receivers.

The CTS value is PTS minus DTS. The RTP receiver therefore obtains the DTS value by subtracting the CTS value from the RTP timestamp.
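As a worked illustration of the relationship between PTS, DTS, and CTS, the following minimal Python sketch uses made-up timestamps for a four-frame sequence (I, P, B, B in decoding order) on a 90 kHz clock. The frame labels, timestamp values, and function names are assumptions chosen for illustration and do not come from this document.

```python
# Hypothetical frames listed in decoding order: (label, dts, pts) in 90 kHz ticks.
# The frame interval is 3600 ticks (25 fps); all values are illustrative only.
FRAMES = [
    ("I0", 0, 3600),
    ("P3", 3600, 14400),
    ("B1", 7200, 7200),
    ("B2", 10800, 10800),
]


def composition_time(pts: int, dts: int) -> int:
    """CTS is defined as PTS minus DTS."""
    return pts - dts


def decoding_time(pts: int, cts: int) -> int:
    """The receiver recovers DTS by subtracting CTS from PTS."""
    return pts - cts


# CTS per frame, in decoding order.
cts_values = [composition_time(pts, dts) for _, dts, pts in FRAMES]

# DTS recovered from PTS and CTS only, as a receiver would do.
recovered = [
    decoding_time(pts, cts) for (_, _, pts), cts in zip(FRAMES, cts_values)
]

# Presentation order differs from decoding order once B-frames are present.
presentation_order = [label for label, _, _ in sorted(FRAMES, key=lambda f: f[2])]
```

In this sketch the P-frame is decoded second but presented last, which is exactly the case where the RTP timestamp alone does not tell the receiver the decoding order.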

This new header extension uses the general mechanism for RTP header extensions described in [RFC5285]. When a video frame spans several RTP packets, the RTP sender only needs to add the CTS to the first RTP packet of the frame, which reduces overhead.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

RTP: Real-time Transport Protocol (RFC 3550)
RTCP: RTP Control Protocol (RFC 3550)
RTCP RR: RTCP Receiver Report
RTCP SR: RTCP Sender Report
SDP: Session Description Protocol (RFC 4566)
Clock Rate: The multiplier used to convert from a wallclock value in seconds to an equivalent RTP timestamp value (without the fixed random offset). Note that RFC 3550 uses various terms like "clock frequency", "media clock rate", "timestamp unit", "timestamp frequency", and "RTP timestamp clock rate" as synonymous to clock rate.
RTP Sender: A logical network element that sends RTP packets, sends RTCP SR packets, and receives RTCP reception report blocks.
RTP Receiver: A logical network element that receives RTP packets, receives RTCP SR packets, and sends RTCP reception report blocks.
RTC: Real-Time Communication
PTS: Video Presentation Timestamp
DTS: Video Decoding Timestamp
CTS: Video Composition Time

3. RTP header extension format

The general RTP payload format follows the RTP header format [RFC3550] and the general mechanism for RTP header extensions [RFC8285]. The RTP header extension MAY be encoded using either the one-byte header or the two-byte header described in [RFC8285]. The two-byte header format is used as an example in this memo.

The following RTP header extension is RECOMMENDED. The ID is assigned as specified in [RFC8285], and the format is shown below.

 0                   1                   2
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID | Len=2 |              cts              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1: extension format

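ID: the extension ID.

cts: PTS minus DTS, divided by 90. Because the video clock rate is 90 kHz, 90 timestamp ticks correspond to one millisecond, so cts is expressed in milliseconds.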
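The field layout can be sketched in Python as follows. This is a minimal illustration, not a normative encoding: it assumes the two-byte header form of [RFC8285] (8-bit ID, 8-bit length counting data bytes) and treats cts as a 16-bit unsigned value in network byte order, which the Len=2 field suggests but which this document does not state explicitly. The function names are hypothetical.

```python
import struct
from typing import Tuple


def pack_cts_element(ext_id: int, cts: int) -> bytes:
    """Build one two-byte-header extension element: ID, length, 16-bit cts."""
    if not 1 <= ext_id <= 255:
        raise ValueError("two-byte header IDs are in the range 1-255")
    if not 0 <= cts <= 0xFFFF:
        raise ValueError("cts does not fit the assumed 16-bit field")
    # "!" selects network byte order; B = 8-bit field, H = 16-bit field.
    return struct.pack("!BBH", ext_id, 2, cts)


def unpack_cts_element(element: bytes) -> Tuple[int, int]:
    """Return (ext_id, cts) from an element built by pack_cts_element."""
    ext_id, length, cts = struct.unpack("!BBH", element[:4])
    if length != 2:
        raise ValueError("unexpected length for the cts element")
    return ext_id, cts
```

For example, `pack_cts_element(19, 100)` yields the four bytes `13 02 00 64` in hexadecimal, and `unpack_cts_element` reverses the operation.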

3.1. Video RTP sender

The video sender MAY be a video client or a middlebox that performs RTP switching. A video client MAY encode video with B-frames; in that case it SHOULD add this RTP header extension in its RTP packetization module. When a video frame spans multiple RTP packets, adding the extension only to the first RTP packet of the frame is RECOMMENDED, which reduces overhead. A middlebox MAY translate RTMP or other streaming video protocols into RTP streams; it SHOULD add this header extension when the streaming video contains B-frames.
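The sender behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions: real packetization is codec-specific, the fixed-size split and the `mtu` parameter are hypothetical, and a packet is represented only as a pair of an optional cts value and a payload.

```python
from typing import List, Optional, Tuple

TICKS_PER_MS = 90  # 90 kHz video clock: 90 timestamp ticks per millisecond


def cts_field(pts: int, dts: int) -> int:
    """cts field value: (PTS - DTS) divided by 90."""
    return (pts - dts) // TICKS_PER_MS


def packetize(
    frame: bytes, pts: int, dts: int, mtu: int = 1200
) -> List[Tuple[Optional[int], bytes]]:
    """Split a frame into (cts_or_None, payload) pairs.

    Only the first packet of the frame carries a cts value; the remaining
    packets carry None, meaning the header extension is omitted.
    """
    chunks = [frame[i:i + mtu] for i in range(0, len(frame), mtu)] or [b""]
    cts = cts_field(pts, dts)
    return [(cts if index == 0 else None, chunk) for index, chunk in enumerate(chunks)]
```

With a 3000-byte frame, PTS 14400 and DTS 3600, this produces three packets, of which only the first carries cts = 120.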

3.2. Video RTP receiver

The video RTP receiver is a client that decodes video. It SHOULD extract the cts value when this extension is present and calculate the DTS value from the RTP timestamp (PTS) and the CTS:

DTS = PTS - CTS * 90

The factor 90 converts cts from milliseconds to ticks of the 90 kHz video clock. The video receiver assembles frames and places them in the jitter buffer. The decoder MUST decode frames in DTS order, and the video rendering module MUST render the decoded frames in PTS order, where the PTS comes from the RTP timestamp.
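The receiver-side calculation can be sketched as follows. This is a minimal illustration, assuming each frame is represented as a hypothetical (rtp_timestamp, cts) pair; handling of the 32-bit RTP timestamp wrap is an addition of this sketch and is not specified by this document.

```python
from typing import List, Tuple

TICKS_PER_MS = 90          # 90 kHz video clock: 90 ticks per millisecond
RTP_TS_MODULUS = 1 << 32   # RTP timestamps are 32-bit values


def dts_from_rtp(rtp_timestamp: int, cts: int) -> int:
    """DTS = PTS - CTS * 90, computed modulo 2**32 like the RTP timestamp."""
    return (rtp_timestamp - cts * TICKS_PER_MS) % RTP_TS_MODULUS


def decode_order(frames: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort (rtp_timestamp, cts) pairs by DTS.

    DTS values are compared as signed offsets from the first frame's DTS,
    so the ordering survives a 32-bit timestamp wrap within the batch.
    """
    if not frames:
        return []
    base = dts_from_rtp(*frames[0])

    def offset(frame: Tuple[int, int]) -> int:
        delta = (dts_from_rtp(*frame) - base) % RTP_TS_MODULUS
        return delta - RTP_TS_MODULUS if delta >= RTP_TS_MODULUS // 2 else delta

    return sorted(frames, key=offset)
```

Given frames listed in presentation order, `decode_order` returns them in the order in which they would be handed to the decoder; rendering still follows the RTP timestamp.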

3.3. Usage considerations

In practice, when the receiver that decodes the video does not support B-frames, it is RECOMMENDED that an RTP middlebox discard the B-frames of a stream whose sender uses them, so that the receiver can successfully decode the incoming video stream. The decoder at the endpoint SHOULD indicate whether it supports video B-frames in the SDP payload-format-specific parameters (a=fmtp) and follow the Offer/Answer procedure described in [RFC8285].

4. Session Description Protocol (SDP) Signaling

The URI for declaring this header extension in an extmap attribute is "urn:ietf:params:rtp-hdrext:CompositionTime". It does not define any extension attributes and follows the standard mechanism described in [RFC8285]. An example attribute line in SDP:

a=extmap:19 urn:ietf:params:rtp-hdrext:CompositionTime
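A receiver needs the negotiated extmap ID before it can locate the cts element in incoming packets. The following minimal Python sketch looks up that ID for the URI declared above; the function name is hypothetical, and the parsing covers only the basic "a=extmap:<id>[/<direction>] <URI>" form rather than the full SDP grammar.

```python
from typing import Optional

CTS_URI = "urn:ietf:params:rtp-hdrext:CompositionTime"


def find_extmap_id(sdp: str, uri: str = CTS_URI) -> Optional[int]:
    """Return the extmap ID mapped to `uri`, or None if it is not offered."""
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=extmap:"):
            continue
        fields = line[len("a=extmap:"):].split()
        if len(fields) < 2:
            continue
        # The first field is "<id>" or "<id>/<direction>".
        ext_id = fields[0].split("/")[0]
        if fields[1] == uri and ext_id.isdigit():
            return int(ext_id)
    return None
```

If the URI is absent from the session description, the function returns None and the receiver would treat the stream as carrying no cts information.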

5. Security Considerations

The security considerations of the RTP specification [RFC3550] and of the general mechanism for RTP header extensions [RFC8285] apply. In addition, all the security considerations for RTP topologies [RFC7667] and for securing RTP sessions [RFC7201] that concern RTP intermediaries are applicable to this header extension.

Security considerations for SDP are described in the corresponding section of [RFC8866]. In the Secure Real-time Transport Protocol (SRTP) [RFC3711], RTP header extensions are authenticated but not encrypted. When this header extension is used, cts values are therefore visible on a frame-by-frame basis to an attacker passively observing the video stream. In scenarios where this is a concern, additional mechanisms MUST be used to protect the confidentiality of the header extension. Such a mechanism could be header extension encryption [RFC6904] or a lower-level security and authentication mechanism such as IPsec [RFC4301].

[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC3550]: Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003, <https://www.rfc-editor.org/info/rfc3550>.
[RFC3711]: Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, DOI 10.17487/RFC3711, March 2004, <https://www.rfc-editor.org/info/rfc3711>.
[RFC4301]: Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, December 2005, <https://www.rfc-editor.org/info/rfc4301>.
[RFC5285]: Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, DOI 10.17487/RFC5285, July 2008, <https://www.rfc-editor.org/info/rfc5285>.
[RFC6904]: Lennox, J., "Encryption of Header Extensions in the Secure Real-time Transport Protocol (SRTP)", RFC 6904, DOI 10.17487/RFC6904, April 2013, <https://www.rfc-editor.org/info/rfc6904>.
[RFC7201]: Westerlund, M. and C. Perkins, "Options for Securing RTP Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014, <https://www.rfc-editor.org/info/rfc7201>.
[RFC7667]: Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, DOI 10.17487/RFC7667, November 2015, <https://www.rfc-editor.org/info/rfc7667>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8285]: Singer, D., Desineni, H., and R. Even, Ed., "A General Mechanism for RTP Header Extensions", RFC 8285, DOI 10.17487/RFC8285, October 2017, <https://www.rfc-editor.org/info/rfc8285>.
[RFC8866]: Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: Session Description Protocol", RFC 8866, DOI 10.17487/RFC8866, January 2021, <https://www.rfc-editor.org/info/rfc8866>.

Author's Address

Deping Li

ByteDance

Email: lideping.byter@bytedance.com