This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 21, 2010.
Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
This document proposes a new generic session setup attribute to make it possible to negotiate different image attributes such as image size. A possible use case is to make it possible for a low-end hand-held terminal to display video without the need to rescale the image, something that may consume large amounts of memory and processing power. The draft also helps to maintain an optimal bitrate for video as only the image size that is desired by the receiver is transmitted.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
1. Introduction
2. Conventions, Definitions and Acronyms
3. Definition of Attribute
3.1. Requirements
3.2. Attribute syntax
3.2.1. Overall view of syntax
3.2.2. Syntax description
3.3. Considerations
3.3.1. No imageattr in 1st offer
3.3.2. Asymmetry
3.3.3. sendonly and recvonly
3.3.4. Sample aspect ratio
3.3.5. SDPCapNeg support
3.3.6. Interaction with codec parameters
3.3.7. Change of display in middle of session
3.3.8. Use with layered codecs
3.3.9. Addition of parameters
4. Examples
4.1. Example 1
4.2. Example 2
4.3. Example 3
4.4. Example 4
5. IANA Considerations
6. Security Considerations
7. Acknowledgements
8. Changes
9. References
9.1. Informative References
9.2. Normative References
Authors' Addresses
This document proposes a new attribute that makes it possible to negotiate image attributes such as image size. The term "image size" is used because it may differ from the physical screen size of, for instance, a hand-held terminal. For example, it may be beneficial to display a video image on part of the physical screen and leave space for other features such as menus and other information.
There are a number of benefits to being able to negotiate the image size:
In cases where rescaling is not implemented (for example, rescaling is not mandatory to implement in H.264), indicating the image attributes may still improve the use of bandwidth: the attribute gives the encoder a better indication of which image size is preferred and thus helps to avoid wasting bandwidth on an unnecessarily large resolution.
For implementers that are considering rescaling issues, it is worth noting that there are several benefits to doing it on the sender side:
Several of the existing standards ([H.263], [H.264] and [MPEG-4]) support different resolutions at different frame rates. The purpose of this document is to provide a generic mechanism. It is targeted mainly at the negotiation of the image size, but to make it more general the attribute is named "imageattr".
The draft is limited to unicast scenarios in general and, more specifically, to point-to-point communication. The attribute may be used in centralized conferencing scenarios as well, but due to the abundance of configuration options it may then be difficult to come up with a configuration that fits all parties.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
This section defines the SDP image attribute "imageattr" that can be used in an SDP Offer/Answer exchange to indicate various image attribute parameters. In this document, we define the following image attribute parameters: image resolution, sample aspect ratio (sar), allowed range in picture aspect ratio and the preference of a given parameter set over another. The attribute is however extensible and guidelines for defining extensions are provided in Section 3.3.9 (Addition of parameters).
The image attribute MUST meet the following requirements:
- REQ-1: Support the indication of one or more sets of image attributes that the SDP endpoint wishes to receive or send. An image attribute set MUST include a specific image size.
- REQ-2: Support setup/negotiation of image attributes, meaning that each side in the Offer/Answer SHOULD be able to negotiate the image attributes it prefers to send and receive.
- REQ-3: Interoperate with codec-specific parameters such as sprop-parameter-sets in H.264 or config in MPEG4.
- REQ-4: Make the attribute generic, with as few codec-specific details/tricks as possible, in order to be codec agnostic.
Besides the above-mentioned requirements, the requirement below MAY be applicable.
- OPT-1: The image attribute SHOULD support the description of image-related attributes for various types of media, including video, pictures, images, etc.
This section describes the syntax of the image attribute. The image attribute is a media attribute. The section is split into two parts: the first gives an overall view of the syntax, while the second describes how the syntax is used.
The syntax for the image attribute is in ABNF [RFC4234]:
----
image-attr = "imageattr:" PT 1*2( 1*WSP ( "send" / "recv" ) 1*WSP attr-list )
PT = 1*DIGIT / "*"
attr-list = ( set *(1*WSP set) ) / "*"
; see below for a definition of set.
----
The syntax for the set is given by:
----
set= "[" "x=" range "," "y=" range [ ",sar=" range ]
     [ ",par=" range ] [ ",q=" value ] "]"

   x is the horizontal image size range
   y is the vertical image size range
   sar (sample aspect ratio) is the sample aspect ratio associated
     with the set (optional and MAY be ignored)
   par (picture aspect ratio) is the allowed ratios between the
     display's x and y physical size (optional)
   q (optional with range [0.0..1.0], default value 0.5) is the
     preference for the given set, a higher value means higher
     preference from the sender's point of view

range is expressed in a few different formats
   1) range= value
      a single value
   2) range= "[" value1 ":" [ step ":" ] value2 "]"
      values between value1 and value2 inclusive,
      if step is omitted a stepsize of 1 is implied
   3) range= "[" value 1*( "," value ) "]"
      any value from the list of values
   4) range= "[" value1 "-" value2 "]"
      any real value between value1 and value2 inclusive

   value is a positive integer or real value
   step is a positive integer or real value
   If step is left out in the syntax a stepsize of 1 is implied
   Real values are only applicable for the sar, par and q parameters
   Note the use of brackets [..] if more than one value is specified.
----
Some further guidelines for the use of the attribute are given below:
---- par=[ratio_min-ratio_max] ----
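The four range formats described above can be illustrated with a small parser. This is only a sketch, not part of the attribute definition: the function names `parse_range` and `range_contains` are hypothetical, values are handled as floats for simplicity, and a small tolerance is assumed when comparing real values.

```python
def parse_range(text):
    """Parse one range expression into a (kind, data) tuple.

    kind is "single", "stepped", "list" or "interval", matching
    formats 1) to 4) of the range syntax.
    """
    text = text.strip()
    if not text.startswith("["):
        return ("single", float(text))              # 1) range = value
    if not text.endswith("]"):
        raise ValueError("unbalanced brackets in range: " + text)
    body = text[1:-1]
    if ":" in body:                                   # 2) [value1:step:value2]
        parts = [float(p) for p in body.split(":")]
        if len(parts) == 2:
            low, high = parts
            step = 1.0                                # step omitted: 1 implied
        elif len(parts) == 3:
            low, step, high = parts
        else:
            raise ValueError("malformed stepped range: " + text)
        if step <= 0:
            raise ValueError("step must be positive: " + text)
        return ("stepped", (low, step, high))
    if "," in body:                                   # 3) [value,value,...]
        return ("list", [float(p) for p in body.split(",")])
    if "-" in body:                                   # 4) [value1-value2]
        low, high = body.split("-")
        return ("interval", (float(low), float(high)))
    raise ValueError("brackets used around a single value: " + text)


def range_contains(text, value, tolerance=1e-9):
    """Return True if value is allowed by the range expression."""
    kind, data = parse_range(text)
    if kind == "single":
        return abs(value - data) <= tolerance
    if kind == "list":
        return any(abs(value - v) <= tolerance for v in data)
    if kind == "interval":
        low, high = data
        return low <= value <= high
    low, step, high = data                            # stepped range
    if not low <= value <= high:
        return False
    steps = (value - low) / step
    return abs(steps - round(steps)) <= tolerance
```

With this sketch, `range_contains("[320:16:640]", 336)` is true while `range_contains("[320:16:640]", 330)` is false, since 330 does not fall on a 16-pixel step.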
In the description of the syntax we assume that Alice wishes to set up a session with Bob and that Alice takes the first initiative. The syntactical white-space delimiters (1*WSP) and double-quotes are removed to make reading easier.
In the offer, Alice provides information for both the send and receive (recv) directions using syntax version 1. For the send direction, Alice provides a list that the answerer can select from. For the receive direction, Alice may either specify a desired image size range right away or give a * to instruct Bob to reply with a list of image sizes that Bob can support for sending. Using the overall high-level syntax, the image attribute may then look like
---- a=imageattr:PT send attr-list recv attr-list ----
or
---- a=imageattr:PT send attr-list recv * ----
In the first alternative, the recv direction may be a full list of desired image size formats. Most likely, however, it is just a list with one alternative for the preferred x and y resolution.
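The two high-level forms shown above (a list of sets per direction, or a * for a direction) can be split apart with a bracket-aware tokenizer. The sketch below is illustrative only; `split_top_level` and `parse_imageattr` are hypothetical names, and the sets are kept as raw strings rather than being parsed further.

```python
def split_top_level(text):
    """Split on whitespace that is not inside [...] brackets."""
    tokens, depth, current = [], 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def parse_imageattr(line):
    """Return (PT, {"send": ..., "recv": ...}) for one imageattr line.

    Each direction maps to "*" or to a list of raw set strings.
    """
    value = line.strip()
    if value.startswith("a="):
        value = value[2:]
    prefix = "imageattr:"
    if not value.startswith(prefix):
        raise ValueError("not an imageattr attribute")
    tokens = split_top_level(value[len(prefix):])
    if not tokens:
        raise ValueError("missing payload type")
    payload_type, rest = tokens[0], tokens[1:]
    directions, current = {}, None
    for token in rest:
        if token in ("send", "recv"):
            if token in directions:
                raise ValueError("direction given twice: " + token)
            current = token
            directions[current] = []
        elif current is None:
            raise ValueError("set given before send/recv")
        elif token == "*":
            if directions[current]:
                raise ValueError("* cannot be combined with sets")
            directions[current] = "*"
        else:
            if directions[current] == "*":
                raise ValueError("* cannot be combined with sets")
            directions[current].append(token)
    return payload_type, directions
```

For the second form, `parse_imageattr("a=imageattr:97 send [x=480,y=320] recv *")` yields the payload type `"97"`, one set for the send direction, and `"*"` for the recv direction.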
If Bob supports an x and y resolution in the given x and y range the answer from Bob will look like:
---- a=imageattr:PT send attr-list recv attr-list ----
The offer/answer negotiation is then done. It is worth noting that the attr-list will likely be pruned in the answer: while it may contain many different alternatives in the offer, it may in the end contain just one or two.
If Bob does not support any x and y resolution in the given x and y range in attr-list or a * was given for the recv direction then he MUST either:
---- a=imageattr:PT recv attr-list send attr-list ----
---- a=imageattr:PT send attr-list recv attr-list ----
---- a=imageattr:PT recv attr-list ----
If the 1st offer (from Alice) already defines a desired image size for the recv direction the answerer can do one of the following:
---- a=imageattr:PT recv attr-list ----
A high-end device (Alice) may not see any need for the image attribute, as it most likely has the processing capacity to rescale incoming video, and may therefore not include the attribute in the offer. The answerer (Bob) MAY include imageattr in the answer. This has two implications:
While the image attribute supports asymmetry there are some limitations to this. One important limitation is that the codec being used can only support up to a given maximum resolution for a given profile level.
As an example, H.264 with profile level 1.2 does not support a higher resolution than 352x288 (CIF). The offer/answer rules essentially require that the same profile level be used in both directions. This means that for an asymmetric scenario where Alice wants an image size of 580x360 and Bob wants 150x120, profile level 2.2 is needed in both directions even though profile level 1 would have been enough in one direction.
Currently, the only solution to this problem is to specify two unidirectional media descriptions. Note, however, that the asymmetry issue for the H.264 codec is solved in [RFC3984bis].
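The level figures in the asymmetry example can be reproduced by counting 16x16 macroblocks. The sketch below is a rough illustration only: the MaxFS values are recalled from the level limits table of the H.264 recommendation and should be verified against it, the function names are hypothetical, and frame rate and other level limits are ignored.

```python
import math

# Assumption: maximum frame size in macroblocks (MaxFS) per H.264 level,
# for the lower levels only. Verify against the H.264 recommendation.
MAX_FRAME_SIZE_MBS = [
    ("1", 99), ("1.1", 396), ("1.2", 396), ("1.3", 396),
    ("2", 396), ("2.1", 792), ("2.2", 1620), ("3", 1620),
]


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover the image."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def lowest_level_for(width, height):
    """Lowest listed level whose frame size limit fits the image."""
    needed = macroblocks(width, height)
    for level, max_fs in MAX_FRAME_SIZE_MBS:
        if needed <= max_fs:
            return level
    return None  # larger than any level listed in this sketch


print(macroblocks(352, 288), lowest_level_for(352, 288))
print(macroblocks(580, 360), lowest_level_for(580, 360))
print(macroblocks(150, 120), lowest_level_for(150, 120))
```

Under these assumptions, 580x360 needs 851 macroblocks and therefore level 2.2, while 150x120 needs only 80 and fits within level 1, which matches the asymmetry described above.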
If the directional attributes a=sendonly or a=recvonly are given for a media description, there is no need to specify the image attribute for both directions. Therefore one of the directions in the attribute MAY be omitted. However, it may be good to do the image attribute negotiation in both directions in case the session is later updated to carry media in both directions.
The sar parameter in relation to the x and y pixel resolution deserves some extra discussion. Consider the offer from Alice to Bob (we set the recv direction aside for the moment):
---- a=imageattr:97 send [x=720,y=576,sar=1.1] ----
If the receiver display has square pixels, the 720x576 image would need to be rescaled to, for example, 792x576 or 720x524 to ensure a correct image aspect ratio. This in practice means that rescaling would need to be performed on the receiver side, something that is contrary to the spirit of this draft.
To avoid this problem Alice MAY specify a range of values for the sar parameter like:
---- a=imageattr:97 send [x=720,y=576,sar=[0.91,1.0,1.09,1.45]] ----
This means that Alice can encode with any of the mentioned sample aspect ratios, leaving it to Bob to decide which one he prefers.
The response MUST NOT include the sar parameter if there is no acceptable value given.
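The arithmetic behind the 720x576, sar=1.1 example in this section can be written out as follows. This is a minimal sketch with hypothetical function names; it assumes sar is the pixel width divided by the pixel height and that results are rounded to whole pixels.

```python
def rescale_options(x, y, sar):
    """Two ways to show an x-by-y image with the given sar on square pixels.

    Either the width is stretched by sar, or the height is divided by sar.
    """
    stretched_width = (round(x * sar), y)
    reduced_height = (x, round(y / sar))
    return stretched_width, reduced_height


def closest_sar(offered, display_sar):
    """Pick the offered sar value nearest to the receiver's own display."""
    return min(offered, key=lambda sar: abs(sar - display_sar))


print(rescale_options(720, 576, 1.1))
print(closest_sar([0.91, 1.0, 1.09, 1.45], 1.0))
```

The first call gives the 792x576 and 720x524 alternatives mentioned above. The second shows how a receiver with square pixels could select sar=1.0 from the offered list so that no rescaling is needed on its side.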
The image attribute can be used within the SDP Capability Negotiation [SDPCapNeg] framework and its use is then specified using the "a=acap" parameter. An example is
---- a=acap:1 imageattr:97 send [x=720,y=576,sar=[0.91,1.0,1.09,1.45]] ----
For use with the SDP Media Capability Negotiation extension [SDPMedCapNeg], where it is no longer possible to specify payload type numbers, it is possible to use the parameter substitution rule. An example of this is:
----
...
a=mcap:1 video H264/90000
a=acap:1 imageattr:%1% send [x=720,y=576,sar=[0.91,1.0,1.09,1.45]]
...
----
Where %1% maps to media capability number 1.
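The substitution of %1% can be pictured as a simple text replacement once a payload type has been assigned to each media capability. The sketch below is an assumption about how an implementation might do this, not a description of [SDPMedCapNeg] itself; the function name and the `payload_types` mapping (media capability number to payload type) are hypothetical.

```python
import re


def substitute_media_caps(attribute, payload_types):
    """Replace each %N% with the payload type chosen for media capability N."""
    def replace(match):
        capability_number = int(match.group(1))
        # A KeyError here means no payload type was assigned to N.
        return str(payload_types[capability_number])
    return re.sub(r"%(\d+)%", replace, attribute)


attribute = "imageattr:%1% send [x=720,y=576,sar=[0.91,1.0,1.09,1.45]]"
# Hypothetical assignment: media capability 1 (H264/90000) got payload type 99.
print(substitute_media_caps(attribute, {1: 99}))
```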
As most codecs already specify some indication of, for example, the image size at session setup, measures must be taken to prevent the image attribute from conflicting with this existing information.
The following subsections describe the most well-known codecs and how they define image-size-related information.
The payload format for H.263 is described in [RFC4629].
H.263 defines (on the fmtp line) a list of image sizes and their maximum frame rates (profiles) that the offerer can receive. The answerer is not allowed to modify this list and must reject a payload type that contains an unsupported profile. The CUSTOM profile may be used for image size negotiation but support for asymmetry requires the specification of two unidirectional media descriptions using the sendonly/recvonly attributes.
The payload format for H.264 is described in [RFC3984] and updated in [RFC3984bis].
H.264 defines image size related information in the fmtp line by means of sprop-parameter-sets. According to the specification several sprop-parameter-sets may be defined for one payload type. The sprop-parameter-sets describe the image size (+ more) that the offerer sends in the stream and need not be complete. This means that this does not represent any negotiation. Moreover an answer is not allowed to change the sprop-parameter-sets.
This configuration may be changed later inband if for instance image sizes need to be changed or added.
The payload format for MPEG-4 is described in [RFC3016].
MPEG-4 defines a config parameter on the fmtp line which is a hexadecimal representation of the MPEG-4 visual configuration information. This configuration does not represent any negotiation and the answer is not allowed to change the parameter.
Currently it is not possible to change the configuration using inband signaling.
The subsections above clearly indicate that this kind of information must be aligned well with the image attribute to avoid conflicts. There are a number of possible solutions:
A very likely scenario is that a user switches to another phone during a video telephony call or plugs the cellphone into an external monitor. In both cases it is very likely that a renegotiation is initiated using the SIP-REFER or SIP-UPDATE methods. It is RECOMMENDED to negotiate the image size during this renegotiation.
As the image attribute is a media-level attribute, its use with layered codecs causes some concern. If the layers are transported in different RTP streams, the layers are specified in different media descriptions and the relation is specified using the grouping framework [GROUPING] and the depend attribute [RFC5583]. As it is not possible to specify only one image attribute for several media descriptions, the solution is either to specify the same image attribute for each media description or to specify the image attribute only for the base layer. [Ed. note, TBD].
The image attribute allows for the addition of parameters in the future. To make backwards compatibility possible, an entity that processes the attribute MUST remove parameters that are not recognized before returning the attribute in the SDP answer. Addition of future parameters that are not understood by the receiving endpoint may lead to ambiguities if mutual dependencies between parameters exist; therefore addition of parameters must be done with great care.
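The rule that unrecognized parameters are removed before the attribute is returned could be implemented along the following lines. This is a sketch under assumptions: the function names are hypothetical, the set of known parameters is taken from the syntax in this draft, and "foo" in the usage line stands in for some future parameter that does not exist today.

```python
KNOWN_PARAMETERS = ("x", "y", "sar", "par", "q")


def split_parameters(set_text):
    """Split "[x=..,y=..,...]" into (name, value) pairs.

    Commas inside nested [...] ranges do not separate parameters.
    """
    text = set_text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("a set must be enclosed in brackets")
    pieces, depth, current = [], 0, ""
    for ch in text[1:-1]:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            pieces.append(current)
            current = ""
        else:
            current += ch
    if current:
        pieces.append(current)
    pairs = []
    for piece in pieces:
        name, separator, value = piece.partition("=")
        if not separator:
            raise ValueError("parameter without a value: " + piece)
        pairs.append((name, value))
    return pairs


def strip_unknown(set_text, known=KNOWN_PARAMETERS):
    """Rebuild the set with only the parameters this entity recognizes."""
    kept = [(n, v) for n, v in split_parameters(set_text) if n in known]
    return "[" + ",".join(n + "=" + v for n, v in kept) + "]"


print(strip_unknown("[x=800,y=640,sar=[0.91,1.0],foo=3,q=0.6]"))
```

The hypothetical "foo" parameter is dropped while the nested sar list is left intact.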
A few examples follow to highlight the syntax. Where needed, it is assumed that Alice initiates a session with Bob.
----
a=imageattr:97 send [x=800,y=640,sar=1.1,q=0.6] [x=480,y=320] \
     recv [x=330,y=250]
----
Two image resolution alternatives are offered, with 800x640 with sar=1.1 having the highest preference.
The example also indicates that Alice wishes to display video with a resolution of 330x250 on her display.
In case Bob accepts the "recv [x=330,y=250]" the answer may look like
----
a=imageattr:97 recv [x=800,y=640,sar=1.1] \
     send [x=330,y=250]
----
This indicates that the receiver (Bob) wishes the encoder (on Alice's side) to compensate for a sample aspect ratio of 1.1 (11:10) and desires an image size on his screen of 800x640.
There is, however, a possibility that "recv [x=330,y=250]" is not supported. If that is the case, Bob may completely remove this part or replace it with a list of supported image sizes.
----
a=imageattr:97 recv [x=800,y=640,sar=1.1] \
     send [x=[320:16:640],y=[240:16:480],par=[1.2-1.3]]
----
Alice can then select a valid image size which is closest to the one that was originally desired (336x256) and perform a second offer/answer:
----
a=imageattr:97 send [x=800,y=640,sar=1.1] \
     recv [x=336,y=256]
----
Bob replies with (actually not necessary):
----
a=imageattr:97 recv [x=800,y=640,sar=1.1] \
     send [x=336,y=256]
----
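The preference ordering used in this example (q=0.6 on the first set, the default of 0.5 on the second) can be sketched as below. The function names are hypothetical, and the q value is extracted with a simple regular expression rather than a full parser.

```python
import re

DEFAULT_Q = 0.5  # default preference when q is not given


def preference(set_text):
    """Return the q value of a set, or the default if q is absent."""
    match = re.search(r"(?:^|[\[,])q=([0-9.]+)", set_text)
    return float(match.group(1)) if match else DEFAULT_Q


def order_by_preference(sets):
    """Most preferred set first; ties keep their original order."""
    return sorted(sets, key=preference, reverse=True)


offered = ["[x=480,y=320]", "[x=800,y=640,sar=1.1,q=0.6]"]
print(order_by_preference(offered))
```

Regardless of the order in which the sets are listed, the 800x640 set sorts first because 0.6 is higher than the implied 0.5.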
----
a=imageattr:97 \
     send [x=[480:16:800],y=[320:16:640],par=[1.2-1.3],q=0.6] \
          [x=[176:8:208],y=[144:8:176],par=[1.2-1.3]] \
     recv *
----
Two image resolution sets are offered, with the first having a higher preference (q=0.6). The x-axis resolution can take the values 480 to 800 in 16-pixel steps and 176 to 208 in 8-pixel steps. The par parameter limits the set of possible x and y screen resolution combinations such that 800x640 (ratio=1.25) is a valid combination while 720x608 (ratio=1.18) or 800x608 (ratio=1.31) are invalid combinations.
For the recv direction (Bob->Alice), Bob is requested to provide a list of supported image sizes.
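The effect of the par parameter on the first set of this example can be checked numerically. The sketch below hard-codes the ranges of that set and follows the text in computing the ratio as x divided by y; the function names are hypothetical.

```python
def in_stepped_range(value, low, step, high):
    """True if value lies on the grid low, low+step, ..., up to high."""
    return low <= value <= high and (value - low) % step == 0


def valid_in_first_set(x, y):
    """Check x, y against [x=[480:16:800],y=[320:16:640],par=[1.2-1.3]]."""
    return (
        in_stepped_range(x, 480, 16, 800)
        and in_stepped_range(y, 320, 16, 640)
        and 1.2 <= x / y <= 1.3
    )


for size in [(800, 640), (720, 608), (800, 608)]:
    print(size, round(size[0] / size[1], 2), valid_in_first_set(*size))
```

All three sizes lie on the 16-pixel grid in both dimensions, so it is the par range alone that excludes 720x608 and 800x608.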
This example defines a complete SDP offer for the video media part:
----
m=video 49154 RTP/AVP 99
a=rtpmap:99 H264/90000
a=fmtp:99 packetization-mode=0;profile-level-id=42e011; \
     sprop-parameter-sets=Z0LgC5ZUCg/I,aM4BrFSAa
a=imageattr:99 \
     send [x=176,y=144] [x=224,y=176] [x=272,y=224] [x=320,y=240] \
     recv [x=176,y=144] [x=224,y=176] [x=272,y=224,q=0.6] [x=320,y=240]
----
In the send direction, sprop-parameter-sets is defined for a resolution of 320x240 which is the largest image size offered in the send direction. This means that if 320x240 is selected, no additional offer/answer is necessary. In the receive direction four alternative image sizes are offered with 272x224 being the preferred choice.
The answer may look like:
----
m=video 49154 RTP/AVP 99
a=rtpmap:99 H264/90000
a=fmtp:99 packetization-mode=0;profile-level-id=42e011; \
     sprop-parameter-sets=Z0LgC5ZUCg/I,aM4BrFSAa
a=imageattr:99 send [x=320,y=240] recv [x=320,y=240]
----
This indicates (in this example) that the image size is 320x240 in both directions. Although the offerer preferred 272x224 for the receive direction, the answerer might not be able to offer 272x224 or might not allow encoding and decoding of video of different image sizes simultaneously. The answerer sets new sprop-parameter-sets, constructed for both send and receive directions under the restricted conditions and for an image size of 320x240.
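The pruning seen in this answer, where four offered sizes are reduced to one, could be sketched as follows. This is an assumption about one possible answerer policy, not a required procedure: the function names are hypothetical, only sets with fixed x and y values are considered, and `supported_sizes` stands for whatever the answerer is able to handle.

```python
import re


def fixed_size(set_text):
    """Return (x, y) for a set with single x and y values, else None."""
    match = re.fullmatch(r"\[x=(\d+),y=(\d+)(?:,.*)?\]", set_text)
    if match is None:
        return None  # x or y is a range; not handled in this sketch
    return int(match.group(1)), int(match.group(2))


def prune(offered_sets, supported_sizes):
    """Keep only the offered sets whose fixed size the answerer supports."""
    return [s for s in offered_sets if fixed_size(s) in supported_sizes]


offered_recv = [
    "[x=176,y=144]", "[x=224,y=176]", "[x=272,y=224,q=0.6]", "[x=320,y=240]",
]
# Hypothetical answerer that can only encode 320x240.
print(prune(offered_recv, {(320, 240)}))
```

With an answerer limited to 320x240, only that set survives, even though the offerer gave 272x224 a higher preference.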
This example illustrates in more detail how compensation for different sample aspect ratios can be negotiated with the image attribute.
We set up a session between Alice and Bob; Alice is the offerer of the session. The offer (from Alice) contains the image attribute below:
----
a=imageattr:97 \
     send [sar=[1.0-1.3],x=[400:16:800],y=[320:16:640],par=[1.2-1.3]] \
     recv [sar=1.1,x=800,y=600]
----
First we consider the recv direction: the offerer (Alice) explicitly states that she wishes to receive the screen resolution 800x600. However, she also indicates that her display does not use square pixels; the value sar=1.1 means that Bob must (preferably) compensate for this. If Bob's video camera produces square pixels and he wishes to satisfy Alice's sar requirement, the image processing algorithm must rescale an 880x600 pixel image (880=800*1.1) to 800x600 pixels (this could be done in other ways).
Now the send direction: Alice indicates that she can (in the image processing algorithms) rescale the image for sample aspect ratios in the range 1.0 to 1.3. She can also provide a number of different image sizes (in pixels) ranging from 400x320 to 800x640. Bob inspects the offered sar and image sizes and responds with the modified image attribute
----
a=imageattr:97 \
     recv [sar=1.15,x=464,y=384] \
     send [sar=1.1,x=800,y=600]
----
Alice will, in order to satisfy Bob's request, need to rescale the image from her video camera from 534x384 (534=464*1.15) to 464x384.
Neither party is required to rescale like this (sar MAY be ignored); the consequence will of course be a distorted image.
Following the guidelines in [RFC4566], the IANA is requested to register one new SDP attribute:
This attribute defines the ability to negotiate various image attributes such as image sizes. The attribute contains a number of parameters which can be modified in an offer/answer exchange.
Note to RFC Editor: please replace "RFC XXXX" above with the RFC number of this memo, and remove this note.
This draft does not add any additional security issues other than those already existing with currently specified offer/answer procedures.
The authors would like to thank the people who have contributed objections and suggestions to this draft and provided valuable guidance in the amazing video-coding world. Special thanks go to Clinton Priddle, Roni Even, Randell Jesup, and Dan Wing.
The main changes are:
- From WG -02 to WG -03
- Partial update based on review comments from Jean-Francois Mule
- From WG -01 to WG -02
- Added extra example that highlights the negotiation of sar
- From WG -00 to WG -01
- Added info about future addition of parameters and backwards compatibility
- Added IANA considerations
- From individual -02 to WG -00
- Cleanup of syntax, ABNF form
- Additional example
- From -01 to -02
- Cleanup of the sar and par parameters to make them match the established conventions
- Requirement specification added
- New bidirectional syntax
- Interoperability considerations with well known video codecs discussed
[GROUPING] | IETF, “The SDP Grouping Framework, http://tools.ietf.org/html/draft-ietf-mmusic-rfc3388bis-03.” |
[H.264] | ITU-T, “ITU-T Recommendation H.264, http://www.itu.int/rec/T-REC-H.264-200711-I/en.” |
[RFC3016] | Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H. Kimata, “RTP Payload Format for MPEG-4 Audio/Visual Streams,” RFC 3016, November 2000. |
[RFC3264] | Rosenberg, J. and H. Schulzrinne, “An Offer/Answer Model with Session Description Protocol (SDP),” RFC 3264, June 2002. |
[RFC3984] | Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund, M., and D. Singer, “RTP Payload Format for H.264 Video,” RFC 3984, February 2005. |
[RFC3984bis] | IETF, “RTP Payload Format for H.264 Video, http://tools.ietf.org/wg/avt/draft-ietf-avt-rtp-rfc3984bis/.” |
[RFC4234] | Crocker, D., Ed. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” RFC 4234, October 2005. |
[RFC4566] | Handley, M., Jacobson, V., and C. Perkins, “SDP: Session Description Protocol,” RFC 4566, July 2006. |
[RFC4587] | Even, R., “RTP Payload Format for H.261 Video Streams,” RFC 4587, August 2006. |
[RFC4629] | Ott, H., Bormann, C., Sullivan, G., Wenger, S., and R. Even, “RTP Payload Format for ITU-T Rec. H.263 Video,” RFC 4629, January 2007. |
[RFC5583] | Schierl, T. and S. Wenger, “Signaling Media Decoding Dependency in the Session Description Protocol (SDP),” RFC 5583, July 2009. |
[S4-080144] | 3GPP, “Signaling of Image Size: Combining Flexibility and Low Cost, http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_48/Docs/S4-080144.zip.” |
[SDPCapNeg] | IETF, “SDP Capability Negotiation, http://tools.ietf.org/wg/mmusic/draft-ietf-mmusic-sdp-capability-negotiation.” |
[SDPMedCapNeg] | IETF, “SDP media capabilities Negotiation, http://tools.ietf.org/wg/mmusic/draft-ietf-mmusic-sdp-media-capabilities.” |
[RFC2119] | Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997. |
Ingemar Johansson
Ericsson AB
Laboratoriegrand 11
SE-971 28 Lulea
SWEDEN
Phone: +46 73 0783289
Email: ingemar.s.johansson@ericsson.com

Kyunghun Jung
Samsung Electronics Co., Ltd.
Dong Suwon P.O. Box 105
416, Maetan-3Dong, Yeongtong-gu
Suwon-city, Gyeonggi-do
Korea 442-600
Phone: +82 10 9909 4743
Email: kyunghun.jung@samsung.com