AVT Working Group                                          D. Wing
Internet-Draft                                               Cisco
Intended status: Standards Track                     July 14, 2008
Expires: January 15, 2009


DTLS-SRTP Key Transport
draft-wing-avt-dtls-srtp-key-transport-02

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 15, 2009.

Abstract

The existing DTLS-SRTP specification allows SRTP keys to be established between a pair of SRTP endpoints. However, when there are more than two participants in an RTP session, DTLS-SRTP is unable to provide a single key for all of the participants. This existing limitation of DTLS-SRTP prevents deploying DTLS-SRTP in certain scenarios.

This document describes an extension to DTLS-SRTP, called Key Transport (KTR). This extension transports SRTP keying material from one DTLS-SRTP peer to another, so the same SRTP keying material can be used by multiple DTLS-SRTP peers. This extension eliminates the need to key each SRTP session individually, allowing cost-effective deployment of several DTLS-SRTP scenarios.



Table of Contents

1.  Introduction
2.  Notational Conventions
3.  Scenarios
    3.1.  Point to Multipoint using the RFC 3550 mixer model
    3.2.  Point to Multipoint using Multicast
    3.3.  Point to Multipoint Using Video Switching MCUs
    3.4.  Scaling to Large Groups
        3.4.1.  Rekeying SRTP Quickly
        3.4.2.  Distributed Key Servers
    3.5.  Interworking with Other SRTP Key Management Systems
        3.5.1.  Security Descriptions
4.  Protocol Description
    4.1.  key_transport (KTR) extension to DTLS-SRTP
    4.2.  KTR Primitives
    4.3.  Procedures for Network Elements
        4.3.1.  Speaker
        4.3.2.  Mixer
        4.3.3.  Switcher
        4.3.4.  Listener
    4.4.  Key Transport SSRC and RTP SSRC Collisions
    4.5.  Fragmentation, Reassembly, and Retransmission
    4.6.  SDP extensions
5.  Examples
6.  Security Considerations
    6.1.  Group Policy when Joining/Leaving
    6.2.  Two-Time Pad
7.  Acknowledgements
8.  IANA Considerations
9.  References
    9.1.  Normative References
    9.2.  Informational References
Appendix A.  Relationship with EKT
Appendix B.  Changes
    B.1.  Changes from -00 to -01
    B.2.  Changes from -01 to -02
§  Author's Address
§  Intellectual Property and Copyright Statements





1.  Introduction

When DTLS-SRTP (McGrew, D. and E. Rescorla, “Datagram Transport Layer Security (DTLS) Extension to Establish Keys for Secure Real-time Transport Protocol (SRTP),” February 2009.) [I‑D.ietf‑avt‑dtls‑srtp] establishes Secure RTP (Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, “The Secure Real-time Transport Protocol (SRTP),” March 2004.) [RFC3711] master keys, each peer contributes part of the keying material to derive the SRTP master key. In some scenarios it is desirable for one peer to change its SRTP key and to transmit SRTP packets using an SRTP key that was not derived from the DTLS key exchange. This allows one peer to significantly reduce cryptographic operations in many scenarios as described in detail in Section 3 (Scenarios).

The extension described in this document allows transporting an SRTP master key from one DTLS peer to the other. Thus, DTLS Key Transport differs from normal DTLS-SRTP in that the SRTP master key is not derived from the TLS handshake.




2.  Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] (Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” March 1997.).

A "listener" is an endpoint that only receives an SRTP stream. A "speaker" is an endpoint that only transmits an SRTP stream. An endpoint can be both a listener and a speaker.




3.  Scenarios

KTR allows mixers and video switchers to avoid having to encrypt each packet multiple times under multiple SRTP keys, by allowing a single SRTP key to be shared with the multiple recipients that are receiving the SRTP stream.

Several SRTP scenarios that benefit from KTR are described in the following sections, using terminology from RTP Topologies (Westerlund, M. and S. Wenger, “RTP Topologies,” January 2008.) [RFC5117].




3.1.  Point to Multipoint using the RFC 3550 mixer model

This RTP scenario is described in Section 3.4 of RTP Topologies (Westerlund, M. and S. Wenger, “RTP Topologies,” January 2008.) [RFC5117].

With DTLS-SRTP, this topology is computationally expensive for the mixer because it has to encrypt the payload uniquely for each SRTP listener. Additionally, the architecture of a typical mixer requires each listener's SRTP to be encrypted serially, incurring additional delay for each successive listener. This is depicted below in Figure 1 (Point to Multipoint Mixer, without DTLS Key Transport).



     +-------key=F-------+
     |                   |
     V               +-------+         +------------+
+----+----+          |       +--key=C->+ listener 1 |
| speaker +--key=A-->+       |         +------------+
+---------+          |       |         +------------+
                     | mixer +--key=D->+ listener 2 |
+---------+          |       |         +------------+
| speaker +--key=B-->+       |         +------------+
+----+----+          |       +--key=E->+ listener 3 |
     ^               +---+---+         +------------+
     |                   |
     +-------key=G-------+

 Figure 1: Point to Multipoint Mixer, without DTLS Key Transport 

With KTR, the mixer can maintain one outbound SRTP cryptographic context, and encrypt the SRTP once for all listeners. This is depicted below in Figure 2 (Point to Multipoint Mixer, with DTLS Key Transport).



In the following figure, "=" indicates sessions where DTLS-SRTP Key Transport is used, and "-" indicates where only DTLS-SRTP is necessary. In this topology, only the listeners need to support KTR for the mixer and the listeners to benefit from it. In this scenario with DTLS-SRTP Key Transport, the mixer assumes an additional role, that of the group's key server, and provides a common group SRTP key ("C") to all of the listeners. This group SRTP key is shared between all of the listeners. The two speakers, however, each receive a unique stream (just as in the scenario above), and to prevent a two-time pad (Section 6.2 (Two-Time Pad)), their content is encrypted using different SRTP keys ("D" and "E").

     +=======key=D=======+
     |                   |
     V               +---+---+         +------------+
+---------+          |       +==key=C=>+ listener 1 |
| speaker +--key=A-->+       |         +------------+
+---------+          |       |         +------------+
                     | mixer +==key=C=>+ listener 2 |
+---------+          |       |         +------------+
| speaker +--key=B-->+       |         +------------+
+----+----+          |       +==key=C=>+ listener 3 |
     ^               +---+---+         +------------+
     |                   |
     +=======key=E=======+

 Figure 2: Point to Multipoint Mixer, with DTLS Key Transport 

The mixer is aware of listeners leaving or joining, and the mixer can rekey the remaining active listeners.




3.2.  Point to Multipoint using Multicast

This RTP topology is described in Section 3.2 of RTP Topologies (Westerlund, M. and S. Wenger, “RTP Topologies,” January 2008.) [RFC5117].

With DTLS-SRTP, this scenario is not attainable because each listener has a unique SRTP key. For this reason, [I‑D.ietf‑msec‑gdoi‑srtp] (Baugher, M., Rueegsegger, A., and S. Rowles, “GDOI Key Establishment for the SRTP Data Security Protocol,” December 2007.) was developed by the MSEC working group.

With KTR, this scenario is attainable because the same key can be provided to multiple listeners, as depicted below in Figure 3 (Point to Multipoint using Multicast with Key Transport). This compares favorably with [I‑D.ietf‑msec‑gdoi‑srtp] (Baugher, M., Rueegsegger, A., and S. Rowles, “GDOI Key Establishment for the SRTP Data Security Protocol,” December 2007.) when the group size is small enough that the speaker can perform key server functions (i.e., perform KTR) for all of the listeners.



                        +-------+            +------------+
                       /         \==key=A===>+ listener 1 |
                      /           \          +------------+
+---------+           | multicast |          +------------+
| speaker +==key=A===>+  network  +==key=A==>+ listener 2 |
+---------+           |           |          +------------+
                      \           /          +------------+
                       \         /===key=A==>+ listener 3 |
                        +-------+            +------------+

 Figure 3: Point to Multipoint using Multicast with Key Transport 




3.3.  Point to Multipoint Using Video Switching MCUs

This RTP topology is described in Section 3.5 of RTP Topologies (Westerlund, M. and S. Wenger, “RTP Topologies,” January 2008.) [RFC5117].

With DTLS-SRTP, this topology is computationally expensive for the video switcher because it has to encrypt the payload uniquely for each SRTP listener. Additionally, the architecture of a typical video switcher requires each listener's SRTP to be encrypted serially, incurring additional delay for each successive listener. This is depicted below in Figure 4 (Point to Multipoint Video Switching, without DTLS Key Transport).



In the following figure, KTR is used on all sessions and depicted by "=". In this scenario, both the speakers and listeners must support KTR so that the switcher and the listeners can benefit from KTR.

     +-------key=F-------+
     |                   |
     V               +---+------+         +------------+
+---------+          |          +==key=C=>+ listener 1 |
| speaker +==key=A==>+selected  |         +------------+
+---------+          |          |         +------------+
                     | switcher +==key=D=>+ listener 2 |
+---------+          |          |         +------------+
| speaker +==key=B==>+dropped   |         +------------+
+----+----+          |          +==key=E=>+ listener 3 |
     ^               +---+------+         +------------+
     |                   |
     +-------key=G-------+

 Figure 4: Point to Multipoint Video Switching, without DTLS Key Transport 

With DTLS key transport, this becomes easier; in fact, the video switcher doesn't need to decrypt the SRTP at all. It just makes its decision (select the stream or drop the stream) and transmits the SRTP packets to the listeners. This is depicted below in Figure 5 (Point to Multipoint Video Switching, with DTLS Key Transport).



     +-------key=B-------+
     |                   |
     V               +---+------+         +------------+
+----+----+          |          +==key=A=>+ listener 1 |
| speaker +==key=A==>+selected  |         +------------+
+---------+          |          |         +------------+
                     | switcher +==key=A=>+ listener 2 |
+---------+          |          |         +------------+
| speaker +==key=B==>+prev.spkr |         +------------+
+---------+          |          +==key=A=>+ listener 3 |
     ^               +----------+         +------------+
     |                   |
     +-------key=A-------+

 Figure 5: Point to Multipoint Video Switching, with DTLS Key Transport 

The video switcher is aware of listeners leaving or joining. The protocol described in this document allows the switcher to dictate, to the speaker, that the speaker use a new encryption key. This allows the switcher to enforce security, based on the switcher's policy (Section 6.1 (Group Policy when Joining/Leaving)). This is done by the video switcher sending a DTLS "your_new_srtp_key" message. The speaker will respond with a DTLS "new_srtp_key" message which echoes the same key. The "new_srtp_key" message is relayed, by the switcher, to each of the active listeners.

When there are multiple speakers, as shown in Figure 5 (Point to Multipoint Video Switching, with DTLS Key Transport) above, each speaker transmits with its own SRTP key. That SRTP key is derived from the DTLS handshake with the switcher. Each speaker uses KTR to signal the SSRC that it will use.




3.4.  Scaling to Large Groups

This section describes how DTLS-SRTP-Key-Transport supports large groups of listeners, both for unicast and multicast scenarios.




3.4.1.  Rekeying SRTP Quickly

When a new listener is added, or an existing listener is removed, a new SRTP master key is necessary to retain the security of the SRTP media. Normally this causes "n" cryptographic operations for "n" listeners. These cryptographic operations take time, and if the group is large enough or the processor slow enough, there can be a considerable delay before all listeners receive the new SRTP key (and can decrypt the stream).

A solution to the problem is to use a subset difference based key management scheme [I‑D.irtf‑smug‑subsetdifference] (Lotspiech, J., Naor, M., and D. Naor, “Subset-Difference based Key Management for Secure Multicast,” .). In this scheme, the key server (the speaker) can send a message so that every authorized listener -- but no unauthorized listeners -- can decrypt the message. The message contains the new SRTP key. The advantage of this scheme is that subset difference allows the message to be encrypted just once, no matter how many listeners there are.

An implementation of subset-difference based key management is Logical Key Hierarchy (LKH) (Wallner, D., Harder, E., and R. Agee, “Key Management for Multicast: Issues and Architectures,” June 1999.) [RFC2627], which is useful for unicast and multicast. LKH is supported by primitives defined in this document, and the LKH "NET KEY" is communicated using the KTR primitive "lkh_net_key".




3.4.2.  Distributed Key Servers

Another problem with all group scenarios is that because each listener establishes a DTLS-SRTP session with the speaker, only a finite number of listeners can be supported (the speaker cannot handle millions of DTLS-SRTP sessions). This is especially problematic for multicast, but is also a problem for "large" groups.

One workaround to the problem is distributing the DTLS-SRTP keying to other devices in the network. In this scheme, one key server is responsible for a sensible number of listeners and has sufficient CPU power to update those listeners with new SRTP master keys. This is done with a new SDP attribute, dtls-srtp-ktr-server, which indicates the IP address and port of the DTLS-SRTP server associated with the media line.

There would need to be some communication between the KTR servers to communicate new SRTP keys to the listeners. This communication is for future study.




3.5.  Interworking with Other SRTP Key Management Systems




3.5.1.  Security Descriptions

Today, Security Descriptions (Andreasen, F., Baugher, M., and D. Wing, “Session Description Protocol (SDP) Security Descriptions for Media Streams,” July 2006.) [RFC4568] is used for distributing SRTP keys in several different IP PBX systems and is expected to be used by 3GPP's Long Term Evolution (LTE). The IP PBX systems are typically used within a single enterprise, and LTE is used within the confines of a mobile operator's network. A Session Border Controller is a reasonable solution to interwork between Security Descriptions (inside the enterprise or mobile operator) and DTLS-SRTP (outside the enterprise), and would be placed at the edge of the enterprise network or the edge of the mobile operator's network.

However, due to the way Security Descriptions and DTLS-SRTP manage their SRTP keys, such an SBC has to authenticate, decrypt, re-encrypt, and re-authenticate the SRTP (and SRTCP) packets in one direction, as shown in Figure 6 (Interworking Security Descriptions and DTLS-SRTP), below. This is not desirable as it increases the cost of this SBC.



RFC4568 endpoint             SBC               DTLS-SRTP endpoint
       |                      |                       |
  1.   |---key=A------------->|                       |
  2.   |                      |<-DTLS-SRTP handshake->|
  3.   |<--key=B--------------|                       |
  4.   |                      |<--SRTP, encrypted w/B-|
  5.   |<-SRTP, encrypted w/B-|                       |
  6.   |-SRTP, encrypted w/A->|                       |
  7.   |            (decrypt, re-encrypt)             |
  8.   |                      |-SRTP, encrypted w/C-->|
       |                      |                       |

 Figure 6: Interworking Security Descriptions and DTLS-SRTP 

The message flow is as follows (similar steps occur with SRTCP):

  1. The Security Descriptions (Andreasen, F., Baugher, M., and D. Wing, “Session Description Protocol (SDP) Security Descriptions for Media Streams,” July 2006.) [RFC4568] endpoint discloses its SRTP key to the SBC, using a=crypto in its SDP.
  2. SBC completes DTLS-SRTP handshake. From this handshake, the SBC derives the SRTP key for traffic from the DTLS-SRTP endpoint (key B) and to the DTLS-SRTP endpoint (key C).
  3. The SBC communicates the SRTP encryption key (key B) to the Security Descriptions endpoint (using a=crypto). (There is no way, with DTLS-SRTP, to communicate the Security Descriptions key to the DTLS-SRTP endpoint.)
  4. The DTLS-SRTP endpoint sends an SRTP packet, encrypted with its key B. This is received by the SBC.
  5. The received SRTP packet is simply forwarded; the SBC does not need to do anything with this packet as its key (key B) was already communicated in step 3.
  6. The Security Descriptions endpoint sends an SRTP packet, encrypted with its key A.
  7. The SBC has to authenticate and decrypt the SRTP packet (using key A), and re-encrypt it and generate an HMAC (using key C).
  8. The SBC sends the new SRTP packet.

KTR can help avoid this computationally expensive operation, so that the SBC does not need to perform any per-packet operations on the SRTP (or SRTCP) packets in either direction. With KTR the SBC can simply forward the SRTP (and SRTCP) packets in both directions without per-packet HMAC or cryptographic operations.

To accomplish this, KTR must be supported on the DTLS-SRTP endpoint, which allows the SBC to transport the Security Description key to the KTR endpoint and send the DTLS-SRTP key to the Security Descriptions endpoint. This works equally well for both incoming and outgoing calls. An abbreviated message flow is shown in Figure 7 (Interworking Security Descriptions and KTR), below.



RFC4568 endpoint             SBC               DTLS-SRTP endpoint
       |                      |                       |
  1.   |---key=A------------->|                       |
  2.   |                      |<-DTLS-SRTP handshake->|
  3.   |<--key=B--------------|                       |
  4.   |                      |--new_srtp_key:A------>|
  5.   |                      |<--SRTP, encrypted w/B-|
  6.   |<-SRTP, encrypted w/B-|                       |
  7.   |-SRTP, encrypted w/A->|                       |
  8.   |                      |-SRTP, encrypted w/A-->|
       |                      |                       |

 Figure 7: Interworking Security Descriptions and KTR 

The message flow is as follows (similar steps occur with SRTCP):

  1. Security Descriptions endpoint discloses its SRTP key to the SBC (a=crypto).
  2. SBC completes DTLS-SRTP handshake. From this handshake, the SBC derives the SRTP key for traffic from the DTLS-SRTP endpoint (key B) and to the DTLS-SRTP endpoint (key C).
  3. The SBC communicates the SRTP encryption key (key B) to the Security Descriptions endpoint.
  4. The SBC uses KTR to tell the DTLS-SRTP endpoint which key (key A) will be used to encrypt the packets sent to that endpoint.
  5. The DTLS-SRTP endpoint sends an SRTP packet, encrypted with its key B. This is received by the SBC.
  6. The received SRTP packet is simply forwarded; the SBC does not need to do anything with this packet as its key (key B) was communicated in step 3.
  7. The Security Descriptions endpoint sends an SRTP packet, encrypted with its key A.
  8. The received SRTP packet is simply forwarded; the SBC does not need to do anything with this packet as its key (key A) was communicated in step 4.



4.  Protocol Description

This section describes the extension to the DTLS protocol for KTR, which allows securely communicating the SRTP key to the DTLS peer.




4.1.  key_transport (KTR) extension to DTLS-SRTP

This document adds a new negotiated extension called "key_transport", which MUST only be requested in conjunction with the "use_srtp" extension (Section 3.2 of [I‑D.ietf‑avt‑dtls‑srtp] (McGrew, D. and E. Rescorla, “Datagram Transport Layer Security (DTLS) Extension to Establish Keys for Secure Real-time Transport Protocol (SRTP),” February 2009.)). The DTLS server indicates its support for key_transport by including key_transport in its ServerHello message. If a DTLS client includes key_transport in its ClientHello, but does not receive key_transport in the ServerHello, the DTLS client MUST NOT send DTLS packets with the srtp_key_transport content-type.
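
The negotiation rule above can be sketched as a simple check. This is only an illustration: the function name and the representation of the hello messages as sets of extension names are assumptions of this sketch, and requiring use_srtp in the ServerHello as well is an assumed reading, not a statement from this document.

```python
# Illustrative only: extensions are modelled as sets of names, not wire values.
USE_SRTP = "use_srtp"
KEY_TRANSPORT = "key_transport"


def client_may_send_ktr(client_hello_exts, server_hello_exts):
    """Return True if the client may send srtp_key_transport records."""
    # key_transport is only meaningful when requested together with use_srtp.
    offered = USE_SRTP in client_hello_exts and KEY_TRANSPORT in client_hello_exts
    # The server signals support by echoing key_transport in its ServerHello.
    # (Assumption: use_srtp must also have been accepted.)
    accepted = USE_SRTP in server_hello_exts and KEY_TRANSPORT in server_hello_exts
    return offered and accepted
```

If the server omits key_transport, the check returns False and the client falls back to plain DTLS-SRTP keying.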

Support for the DTLS Key Transport extension is indicated in SDP with the "srtp-kt" attribute. Advertising support for the extension is necessary in SDP because in some cases it is required to establish an SRTP call. For example, a mixer may be able to only support SRTP listeners if those listeners implement DTLS Key Transport (because it lacks the CPU cycles necessary to encrypt SRTP uniquely for each listener).



A message flow showing a DTLS client and DTLS server using the key_transport extension

Client                                               Server

ClientHello + use_srtp + key_transport
                             -------->
                     ServerHello + use_srtp + key_transport
                                               Certificate*
                                         ServerKeyExchange*
                                        CertificateRequest*
                             <--------      ServerHelloDone
Certificate*
ClientKeyExchange
CertificateVerify*
[ChangeCipherSpec]
Finished                     -------->
                                         [ChangeCipherSpec]
                             <--------             Finished
SRTP packets                 <------->      SRTP packets

 Figure 8: Handshake Message Flow 

After successful negotiation of the key_transport extension, the DTLS client and server MAY exchange SRTP packets, encrypted using the KDF described in [I‑D.ietf‑avt‑dtls‑srtp] (McGrew, D. and E. Rescorla, “Datagram Transport Layer Security (DTLS) Extension to Establish Keys for Secure Real-time Transport Protocol (SRTP),” February 2009.). This is normal and expected, even if Key Transport was negotiated by both sides, as neither side may (yet) have a need to alter the SRTP key. However, it is also possible that one (or both) peers will immediately send a new_srtp_key message before sending any SRTP.




4.2.  KTR Primitives

A new protocol is defined, called the srtp_key_transport protocol. It uses the srtp_key_transport content-type and consists of the following message types (primitives):

new_srtp_key_request:
Requests that the DTLS peer choose a new key. Valid responses are new_srtp_key and new_srtp_key_failure.
your_new_srtp_key:
Dictates a new SRTP key for the peer to use when the peer transmits its SRTP packets.
new_srtp_key:
Contains the new SRTP keying material: the master key, master salt, SSRC, rollover counter, and sequence number. This message is sent by a DTLS endpoint that wants to change its SRTP key beginning at the indicated sequence number. This does not change any cryptographic parameters (a new DTLS handshake is necessary for that), but only the SRTP key for the associated SRTP session. This message includes the SSRC that will be used for this key, which allows listeners to establish one SRTP crypto-context per speaker (necessary for the video switching scenario). The key chosen MUST be cryptographically random [RFC4086] (Eastlake, D., Schiller, J., and S. Crocker, “Randomness Requirements for Security,” June 2005.). This master keying material is processed by the standard SRTP key derivation function (Section 4.3.1 of SRTP (Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, “The Secure Real-time Transport Protocol (SRTP),” March 2004.) [RFC3711]) to provide the session keys.
new_srtp_key_activate:
Indicates that the receiver is prepared to receive SRTP packets encrypted with the new key.
lkh_net_key:
The Logical Key Hierarchy NET KEY.
new_srtp_key_failure:
Indicates a failure.

At any time, the DTLS client or DTLS server MAY send a key_transport message, as shown in Figure 9 (New Key Message Flow). The sender of the new_srtp_key message MAY immediately start transmitting SRTP packets with this new key. However, to account for loss of the new_srtp_key message, it is RECOMMENDED that the sender wait before changing to the new SRTP key until it either receives the new_srtp_key_activate message or times out waiting for it. The duration of this timeout may vary depending on the sensitivity of the content (e.g., 1 second or 10 seconds). In any case, the new_srtp_key message is retransmitted until acknowledged by receipt of a new_srtp_key_activate message.



Client / Server                             Server / Client

[new_srtp_key_request]        -------->
                             <--------         new_srtp_key
new_srtp_key_activate         -------->

 Figure 9: New Key Message Flow 

The following figure shows the state machine for the protocol.



      receive new_srtp_key_request from peer
         or decide to choose new SRTP key
                     |
                     |
send                 V
new_srtp_key  +---------------+    timeout
    +---------| Communicate   |--------+
    |         |     Key       |        |
    +-------->|               |        |
              +---------------+        |
                |           ^          |
     receive    |           |   +----------------+
new_key_activate|           +---| send SRTP using|
                |               |  new SRTP key  |
        +----------------+      +----------------+
        | send SRTP using|
        |  new SRTP key  |
        +----------------+
                |
                V
               done

 Figure 10: Key Transport protocol state machine 
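
The state machine in Figure 10 could be modelled roughly as follows. This is a minimal sketch, not a normative implementation: the class, method, and attribute names are hypothetical, and timers are assumed to be driven by the caller.

```python
from enum import Enum


class State(Enum):
    COMMUNICATE_KEY = 1  # new_srtp_key sent, not yet acknowledged
    DONE = 2             # peer acknowledged the new key


class KeySender:
    """Tracks one new_srtp_key exchange from the sender's point of view."""

    def __init__(self):
        self.state = State.COMMUNICATE_KEY
        self.using_new_key = False
        self.transmissions = 0

    def on_retransmit_timer(self):
        # Keep (re)sending new_srtp_key until it is acknowledged.
        if self.state is State.COMMUNICATE_KEY:
            self.transmissions += 1

    def on_activate_timeout(self):
        # Timed out waiting for the acknowledgement: switch to the new key,
        # but stay in COMMUNICATE_KEY so retransmission continues.
        if self.state is State.COMMUNICATE_KEY:
            self.using_new_key = True

    def on_activate_received(self):
        # Peer is ready: switch to the new key and finish.
        if self.state is State.COMMUNICATE_KEY:
            self.using_new_key = True
            self.state = State.DONE
```

Note that the timeout path starts use of the new key without leaving the "Communicate Key" state, matching the loop back shown in the figure.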



Using the syntax described in TLS (Dierks, T. and E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.1,” April 2006.) [RFC4346], the following structures are used:

enum {
   new_srtp_key_request(0),
   your_new_srtp_key(1),
   new_srtp_key(2),
   new_srtp_key_activate(3),
   lkh_net_key(4),
   new_srtp_key_failure(128),
   (255)
} SRTPKeyTransportType;

struct {
   SRTPKeyTransportType keytrans_type;
   uint24 length;
   uint16 message_seq;
   uint24 fragment_offset;
   uint24 fragment_length;
   select (SRTPKeyTransportType) {
      case new_srtp_key_request:         NewSRTPKeyRequest;
      case your_new_srtp_key:            YourNewSRTPKey;
      case new_srtp_key:                 NewSRTPKey;
      case new_srtp_key_activate:        NewSRTPKeyActivate;
      case lkh_net_key:                  LKHNetKey;
      case new_srtp_key_failure:         NewSRTPKeyFailure;
    };
} KeyTransport;

struct {
    uint  random<64>;           // additional entropy for peer
} NewSRTPKeyRequest;

struct {
    boolean any_ssrc;           // true=this key applies to any SSRC
    uint32 ssrc;                // SSRC used for this key.
    uint   key<16..32>;         // change_cipher_spec decides
    uint   auth_tag<4..10>;     // the key and auth_tag length.
    uint   salt<112>;
    uint   roc<32>;
    uint   sequence<16>;
    uint   random<64>;          // random provides additional entropy
                                // for peer
} NewSRTPKey;

struct {
    uint  random<64>;           // additional entropy for peer
} NewSRTPKeyActivate;

struct {
    uint  lkhnetkeylength;      // length in bits, divided by 8
    uint  lkhnetkey<128..1024>; // LKH NET KEY
} LKHNetKey;

struct { } NewSRTPKeyFailure;

 Figure 11: Data Structures 
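
As an illustration of the fixed part of the KeyTransport structure, the following sketch serializes and parses the five header fields. It assumes the usual TLS conventions (network byte order, a one-byte enum because the maximum value is 255); the function names are hypothetical and the per-type bodies are not covered.

```python
HEADER_LEN = 12  # 1 (type) + 3 (length) + 2 (message_seq) + 3 + 3 (fragment fields)


def encode_header(keytrans_type, length, message_seq,
                  fragment_offset, fragment_length):
    """Serialize the fixed KeyTransport fields in network byte order."""
    # int.to_bytes raises OverflowError if a value does not fit its field.
    return (keytrans_type.to_bytes(1, "big")
            + length.to_bytes(3, "big")
            + message_seq.to_bytes(2, "big")
            + fragment_offset.to_bytes(3, "big")
            + fragment_length.to_bytes(3, "big"))


def decode_header(data):
    """Parse the fixed KeyTransport fields from the start of a record."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated KeyTransport header")
    return {
        "keytrans_type": data[0],
        "length": int.from_bytes(data[1:4], "big"),
        "message_seq": int.from_bytes(data[4:6], "big"),
        "fragment_offset": int.from_bytes(data[6:9], "big"),
        "fragment_length": int.from_bytes(data[9:12], "big"),
    }
```

For example, `encode_header(2, 100, 1, 0, 100)` would describe an unfragmented new_srtp_key message with a 100-byte body.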




4.3.  Procedures for Network Elements

A 'speaker' is an endpoint that terminates the DTLS-SRTP exchange and also sends SRTP data towards its peer(s). This is usually indicated by 'sendrecv' or 'sendonly'.

A 'listener' is an endpoint that terminates the DTLS-SRTP exchange and also receives SRTP data from its peer(s). This is usually indicated by 'sendrecv' or 'recvonly'.

As the Key Transport extension was negotiated during the DTLS-SRTP handshake, an endpoint can send Key Transport primitives, and can become a speaker or become a listener, at any point.




4.3.1.  Speaker

When a new speaker joins, the speaker can immediately send SRTP using the key derived from the DTLS-SRTP handshake. There is no scaling advantage to all of the speakers using the same key (because their content is different), and if the speakers did use the same key it would also introduce the risk of a two-time pad.

Once a speaker begins sending SRTP packets using a key communicated via KTR (i.e., the NewSRTPKey primitive), the speaker MUST NOT revert to using the SRTP key derived from the DTLS-SRTP handshake.

If the speaker wants to use KTR, or is requested to change its SRTP key (via the NewSRTPKeyRequest primitive), the speaker chooses a new SRTP master key and salt, and chooses a sequence number a reasonable distance in the future (1 second is recommended). The speaker then sends this new key using the NewSRTPKey primitive. The NewSRTPKey primitive message is retransmitted until acknowledged with a NewSRTPKeyActivate message. Whether or not a NewSRTPKeyActivate message is received, the speaker changes keys at its previously chosen sequence number.
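
The choice of a new key and activation point might look like the sketch below. The function name, its parameters, and the 128-bit key length are assumptions made for illustration; the 112-bit salt follows the NewSRTPKey structure, and the random values come from the operating system's CSPRNG.

```python
import secrets


def choose_new_key(current_seq, current_roc, packets_per_second,
                   lead_seconds=1.0):
    """Pick fresh keying material and the packet index at which it starts."""
    ahead = int(packets_per_second * lead_seconds)
    # Work on the extended index (ROC * 2^16 + SEQ) so that a wrap of the
    # 16-bit sequence number correctly increments the rollover counter.
    index = (current_roc << 16) + current_seq + ahead
    return {
        "key": secrets.token_bytes(16),   # assumed 128-bit master key
        "salt": secrets.token_bytes(14),  # 112-bit master salt
        "roc": (index >> 16) & 0xFFFFFFFF,
        "sequence": index & 0xFFFF,
    }
```

With 50 packets per second and a current sequence number of 65530, the activation point lands just after the sequence number wraps, so the returned rollover counter is one higher than the current one.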

The speaker's SRTP key may be determined via DTLS-SRTP or by a KTR primitive. In either case, the speaker's SRTP key and SSRC are communicated to each peer.




4.3.2.  Mixer

When a new speaker joins a mixer, the speaker does not need to support KTR, and no KTR procedures need to occur with the speaker. This is because the mixer needs to decrypt and examine the speaker's stream, and the mixer will mix, re-originate (with its own SSRC) and re-encrypt the speaker's stream to the listeners.

The mixer functions as a speaker (Section 4.3.1 (Speaker)) towards the listeners connected to the mixer.

When a speaker leaves, there is no need to propagate that information beyond the mixer.

When a listener joins or leaves, the mixer MUST rekey all of the listeners based on the conference policy (Section 6.1 (Group Policy when Joining/Leaving)).




4.3.3.  Switcher

When a new speaker joins, the switcher communicates the speaker's key to all listeners using the NewSRTPKey primitive. In this way, whenever one of the speakers becomes the active speaker, the active speaker's SRTP can be immediately sent to all listeners.

In the event there are a large number of (potentially active) speakers and it is not feasible to inform all listeners of all speakers' keys, the switcher MAY decide to defer informing listeners of a speaker's key until the speaker becomes the active speaker. This can cause some clipping when a speaker becomes the active speaker.




4.3.4.  Listener

When a listener joins, the listener is provided the same SRTP master key as the other listeners. This is done with the NewSRTPKey primitive. SRTP master keys are associated with both an SSRC and an RTP sequence number. A single SRTP stream might have multiple keys active at any point in time, such as when other listeners are joining or leaving. For example, two NewSRTPKey primitives can indicate that for a single SSRC value, key "A" is for sequence numbers 100-200, and key "B" is for 200-300.
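The association of keys with an SSRC and a sequence number range can be pictured with a small lookup table. The Python sketch below is illustrative only: the KeyEntry and KeyTable names are hypothetical, ranges are assumed to be half-open (so sequence number 200 belongs to key "B"), and sequence number wrap-around is ignored for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KeyEntry:
    ssrc: int
    first_seq: int     # first sequence number covered (inclusive)
    end_seq: int       # sequence number where coverage stops (exclusive)
    master_key: bytes


class KeyTable:
    """Hypothetical listener-side store of keys learned via NewSRTPKey."""

    def __init__(self) -> None:
        self._entries: List[KeyEntry] = []

    def add(self, entry: KeyEntry) -> None:
        self._entries.append(entry)

    def lookup(self, ssrc: int, seq: int) -> Optional[bytes]:
        # Return the key whose SSRC matches and whose range covers seq.
        for entry in self._entries:
            if entry.ssrc == ssrc and entry.first_seq <= seq < entry.end_seq:
                return entry.master_key
        return None


table = KeyTable()
table.add(KeyEntry(0x1234, 100, 200, b"A" * 16))  # key "A"
table.add(KeyEntry(0x1234, 200, 300, b"B" * 16))  # key "B"
```

A lookup that returns None corresponds to a packet for which the listener has not yet learned a key.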

If a listener is also a speaker, it also follows the rules of a speaker.

A listener can receive an SRTP packet with an unknown SSRC, which could be caused by either:

The listener can attempt to authenticate the packet using the most-recently-used SRTP master key, which helps in the first case. If the second case has occurred, the listener can only wait until the sender (the speaker, the mixer, or the switcher) sends a NewSRTPKey primitive.




4.4.  Key Transport SSRC and RTP SSRC Collisions

Per [RFC3550], if an RTP source notices an RTP SSRC collision, it is required to change its SSRC. If it has negotiated support for KTR, it then MUST also send a NewSRTPKey message indicating the new SSRC. The communication of the new SSRC is necessary if there are multiple speakers in the video switching scenario. However, because a speaker is not able to determine if their audio or their video is being switched, a speaker MUST always indicate a change in SSRC by following the procedure in this section for any SRTP stream (audio, video, or other).

When this is done, in order to prevent clipping in listeners, it is RECOMMENDED that the speaker retain the same SRTP master key and salt.




4.5.  Fragmentation, Reassembly, and Retransmission

Much like the DTLS handshake itself, the KTR extension needs to handle fragmentation and reassembly (to send a key transport message that is larger than the network MTU) and retransmission (to account for packet loss). The same fields used by the DTLS handshake provide this function: message_seq, fragment_offset, and fragment_length.

When transmitting the key transport message, the sender divides the message into a series of N contiguous data ranges. These ranges MUST NOT be larger than the maximum handshake fragment size and MUST jointly contain the entire key transport message. The ranges SHOULD NOT overlap. The sender then creates N key transport messages, all with the same message_seq value as the original key transport message. Each new message is labelled with the fragment_offset (the number of bytes contained in previous fragments) and the fragment_length (the length of this fragment). The length field in all messages is the same as the length field of the original message. An unfragmented message is a degenerate case with fragment_offset=0 and fragment_length=length.
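The sender-side rules above can be sketched in Python as follows. This is a minimal illustration, not a wire encoding: the Fragment type and fragment_message function are hypothetical names, and the fields are held as plain integers rather than the fixed-width values a DTLS implementation would serialize.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    message_seq: int       # same for every fragment of one message
    length: int            # length of the original, unfragmented message
    fragment_offset: int   # number of bytes contained in previous fragments
    fragment_length: int   # number of bytes in this fragment
    data: bytes


def fragment_message(message: bytes, message_seq: int,
                     max_fragment: int) -> List[Fragment]:
    """Split a key transport message into contiguous, non-overlapping ranges."""
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    if not message:
        return [Fragment(message_seq, 0, 0, 0, b"")]
    fragments = []
    for offset in range(0, len(message), max_fragment):
        chunk = message[offset:offset + max_fragment]
        fragments.append(
            Fragment(message_seq, len(message), offset, len(chunk), chunk))
    return fragments
```

A message that fits within max_fragment yields a single fragment with fragment_offset=0 and fragment_length=length, which is the degenerate unfragmented case.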

When a DTLS implementation receives a key transport message fragment, it MUST buffer it until it has the entire key transport message. DTLS implementations MUST be able to handle overlapping fragment ranges. This allows senders to retransmit key transport messages with smaller fragment sizes during path MTU discovery.
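One possible way to buffer fragments while tolerating overlapping ranges is to track which bytes of the message have arrived. The Python sketch below is a hypothetical illustration for a single message_seq; the Reassembler name is an assumption, and retransmission timers and limits on buffer size are omitted.

```python
from typing import Optional


class Reassembler:
    """Buffers the fragments of one key transport message."""

    def __init__(self, length: int) -> None:
        # length is the 'length' field carried in every fragment.
        self._buffer = bytearray(length)
        self._received = [False] * length

    def add(self, fragment_offset: int, data: bytes) -> Optional[bytes]:
        """Store one fragment; return the full message once complete."""
        end = fragment_offset + len(data)
        if fragment_offset < 0 or end > len(self._buffer):
            raise ValueError("fragment outside message bounds")
        # Overlapping ranges simply overwrite bytes already received.
        self._buffer[fragment_offset:end] = data
        self._received[fragment_offset:end] = [True] * len(data)
        if all(self._received):
            return bytes(self._buffer)
        return None
```

Because coverage is tracked per byte, a retransmission that uses smaller fragments than the first attempt still completes the message.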




4.6.  SDP extensions

Two new SDP attributes are defined, dtls-srtp-ktr and dtls-srtp-ktr-server. dtls-srtp-ktr merely indicates the endpoint is capable of DTLS-SRTP-KTR, and is helpful to diagnose interoperability issues. dtls-srtp-ktr-server causes the DTLS handshake to occur with a different host than that indicated by the c/m lines, which is useful to help offload computational effort from the speaker (Section 3.4.2 (Distributed Key Servers)). Either attribute can appear at the media level or session level.

The ABNF [RFC5234] for the new SDP [RFC4566] attributes is as follows:

  ktr-server  = "dtls-srtp-ktr-server:" port
                [space nettype space addrtype
                 space connection-address]
  ktr-capable = "dtls-srtp-ktr"

Only the port is required; if the nettype is not indicated, the network type, address type, and connection-address are all the same as on the associated c= line.
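The fallback to the c= line can be illustrated with a short parser. The Python sketch below is an assumption-laden example rather than a complete SDP parser: the KeyServer type and parse_ktr_server function are hypothetical, and the caller is assumed to have already located the associated c= line.

```python
from typing import NamedTuple


class KeyServer(NamedTuple):
    port: int
    nettype: str
    addrtype: str
    address: str


def parse_ktr_server(attribute: str, c_line: str) -> KeyServer:
    """Parse 'a=dtls-srtp-ktr-server:...' using c_line for omitted fields."""
    prefix = "a=dtls-srtp-ktr-server:"
    if not attribute.startswith(prefix):
        raise ValueError("not a dtls-srtp-ktr-server attribute")
    if not c_line.startswith("c="):
        raise ValueError("not a c= line")
    fields = attribute[len(prefix):].split()
    if len(fields) == 1:
        # Only the port is present: inherit everything from the c= line.
        inherited = c_line[len("c="):].split()
        if len(inherited) != 3:
            raise ValueError("malformed c= line")
        nettype, addrtype, address = inherited
    elif len(fields) == 4:
        nettype, addrtype, address = fields[1:]
    else:
        raise ValueError("expected port or port plus three address fields")
    return KeyServer(int(fields[0]), nettype, addrtype, address)
```

For example, "a=dtls-srtp-ktr-server:37382" combined with "c=IN IP4 192.0.2.1" yields port 37382 at 192.0.2.1.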




5.  Examples



The following example shows how Key Transport would be requested in an offer, using "a=dtls-srtp-ktr".

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      c=IN IP4 192.0.2.1
      t=0 0
      m=audio 53456 UDP/TLS/RTP/SAVP 0
      a=fingerprint:SHA-1 \
        4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
      a=dtls-srtp-ktr

 Figure 12: Simple SDP offer showing Key Transport is required 



Using the SDP syntax described in [I-D.ietf-mmusic-sdp-capability-negotiation], the following figure shows an offerer that requires DTLS Key Transport in order to set up this call as an SRTP call; otherwise, it can set up this call as an RTP call. This is indicated by the ",2" on the "a=pcfg" line. If the answerer does not understand "a=dtls-srtp-ktr" but does understand DTLS-SRTP and [I-D.ietf-mmusic-sdp-capability-negotiation], this call cannot be established using DTLS-SRTP; however, it can be established using RTP.

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      c=IN IP4 192.0.2.1
      t=0 0
      m=audio 53456 RTP/AVP 0
      a=tcap:1 UDP/TLS/RTP/SAVP
      a=acap:1 fingerprint:SHA-1 \
        4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
      a=acap:2 dtls-srtp-ktr
      a=pcfg:1 t=1 a=1,2

 Figure 13: Example SDP offer showing Key Transport is required 



Using the SDP syntax described in [I-D.ietf-mmusic-sdp-capability-negotiation], the following figure shows an offerer that indicates support for DTLS Key Transport but does not require DTLS Key Transport in order to set up this call as an SRTP call. This is indicated by the ",[2]" on the "a=pcfg" line. If the answerer does not understand "a=dtls-srtp-ktr" but does understand DTLS-SRTP and [I-D.ietf-mmusic-sdp-capability-negotiation], this call can still be established using DTLS-SRTP.

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      c=IN IP4 192.0.2.1
      t=0 0
      m=audio 53456 RTP/AVP 0
      a=tcap:1 UDP/TLS/RTP/SAVP
      a=acap:1 fingerprint:SHA-1 \
          4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
      a=acap:2 dtls-srtp-ktr
      a=pcfg:1 t=1 a=1,[2]

 Figure 14: Example SDP offer showing Key Transport is optional 



The following example shows a Key Transport offer where the DTLS-SRTP-KTR exchange occurs with another server.

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      c=IN IP4 192.0.2.1
      t=0 0
      m=audio 53456 UDP/TLS/RTP/SAVP 0
      a=fingerprint:SHA-1 \
        4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
      a=dtls-srtp-ktr
      a=dtls-srtp-ktr-server:37382 IN IP4 192.0.2.2

 Figure 15: Example showing alternate key server 




6.  Security Considerations

In the point-to-multipoint scenario, Section 3.1 (Point to Multipoint using the RFC 3550 mixer model), all of the listeners know the key being used by the mixer. Any of those listeners could create SRTP packets that are encrypted with this same key, and send those SRTP packets to other listeners. In order to reduce the vulnerability to this threat, it is RECOMMENDED that received SRTP packets be discarded if their source transport address does not match the source transport address of the associated DTLS-SRTP session. Additionally, the network SHOULD prevent IP address spoofing [RFC2827].
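The recommended source check amounts to comparing the (IP address, port) pair of each received SRTP packet against that of the DTLS-SRTP peer. The following Python sketch shows one way to express it; the function names are hypothetical, and it assumes media and the DTLS-SRTP session share one transport address, so a deployment where they differ would need a different comparison.

```python
import ipaddress
from typing import Tuple

TransportAddress = Tuple[str, int]  # (IP address literal, UDP port)


def same_transport_address(a: TransportAddress, b: TransportAddress) -> bool:
    # Parse the literals so equivalent spellings of an address compare equal.
    return (ipaddress.ip_address(a[0]) == ipaddress.ip_address(b[0])
            and a[1] == b[1])


def should_discard(packet_source: TransportAddress,
                   dtls_peer: TransportAddress) -> bool:
    """Return True if the SRTP packet did not come from the DTLS-SRTP peer."""
    return not same_transport_address(packet_source, dtls_peer)
```

This check only raises the bar; as noted above, it relies on the network preventing source address spoofing.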




6.1.  Group Policy when Joining/Leaving

When sharing SRTP keys with several listeners, it is imperative that the SRTP key is changed when a new listener is added or a listener is removed. This is because a legitimate listener should only be able to decrypt the SRTP stream while listening; the listener should not be able to decrypt the SRTP stream prior to joining the conference or after leaving the conference. Failing to change the key when a listener joins (or leaves) allows that listener to decrypt SRTP traffic sent before (or after) the listener is an authorized participant in the conference.

Policies for a specific user's access to a conference may be different from conference to conference. For example, a company-internal event announcing promotions might be accessible to all employees and have no need for re-keying when listeners join or leave the conference. As another example, a conference where a job candidate is interviewed should be rekeyed when the job candidate joins the conference and again when the job candidate leaves the conference.

The protocol described in this document allows whichever policy is needed for a particular situation. The protocol itself does not enforce a certain policy; that is, the protocol itself does not ensure the SRTP key is changed when a listener leaves (or joins) the conference.

The RTP sequence number in the NewSRTPKey primitive allows the old key to be used for a predictable period of time before switching to the new key. This can provide sufficient time for all listeners to learn the new SRTP key before the sender switches to the new key.




6.2.  Two-Time Pad

[[expand this section.]]

In some scenarios, different data is sent to different participants. For example, in the audio mixer scenario, the active speaker receives a different stream than the other listeners; the active speaker's stream does not contain the active speaker's own input. It is critical that the same SRTP key is not used for the different content, or else a (so-called) "two-time pad" occurs (Section 9.1 of [RFC3711]).

The same SRTP key MUST NOT be used to send different data.
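The reason for this requirement can be seen with a toy stream cipher. In the Python sketch below, a fixed byte string stands in for the keystream that results when the same key, SSRC, and packet index are reused; the keystream value and plaintexts are made up for illustration and this is not SRTP encryption.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for a keystream that has been reused for two different payloads.
keystream = bytes.fromhex("8f3a5c7e1190a2b4")

plaintext_1 = b"mix-for1"  # e.g. the mix sent to the active speaker
plaintext_2 = b"mix-for2"  # e.g. the mix sent to the other listeners

ciphertext_1 = xor_bytes(plaintext_1, keystream)
ciphertext_2 = xor_bytes(plaintext_2, keystream)

# The keystream cancels out: an observer learns the XOR of the plaintexts
# without knowing the key.
leaked = xor_bytes(ciphertext_1, ciphertext_2)
assert leaked == xor_bytes(plaintext_1, plaintext_2)
```

Using a distinct SRTP key for each distinct content stream avoids the keystream reuse.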




7.  Acknowledgements

Thanks to David McGrew for his improvements to this document and to the underlying protocol. Thanks to Brian Weis, Sheela Rowles, and Mark Baugher for suggesting how GDOI-SRTP's key management could be used by DTLS-SRTP.

Thanks to Flemming Andreasen for the reminder regarding two-time pads, and to John Floroiu for the reminder about salting the key.




8.  IANA Considerations

[[This section will be completed in a future version of this document.]]

To do:




9.  References




9.1. Normative References

[I-D.ietf-avt-dtls-srtp] McGrew, D. and E. Rescorla, “Datagram Transport Layer Security (DTLS) Extension to Establish Keys for Secure Real-time Transport Protocol (SRTP),” draft-ietf-avt-dtls-srtp-07 (work in progress), February 2009.
[I-D.ietf-mmusic-sdp-capability-negotiation] Andreasen, F., “SDP Capability Negotiation,” draft-ietf-mmusic-sdp-capability-negotiation-13 (work in progress), March 2010.
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997.
[RFC2827] Ferguson, P. and D. Senie, “Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing,” BCP 38, RFC 2827, May 2000.
[RFC4346] Dierks, T. and E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.1,” RFC 4346, April 2006.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, “SDP: Session Description Protocol,” RFC 4566, July 2006.
[RFC5234] Crocker, D. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” STD 68, RFC 5234, January 2008.



9.2. Informational References

[I-D.ietf-msec-gdoi-srtp] Baugher, M., Rueegsegger, A., and S. Rowles, “GDOI Key Establishment for the SRTP Data Security Protocol,” draft-ietf-msec-gdoi-srtp-01 (work in progress), December 2007.
[I-D.irtf-smug-subsetdifference] Lotspiech, J., Naor, M., and D. Naor, “Subset-Difference based Key Management for Secure Multicast.”
[I-D.mcgrew-srtp-ekt] McGrew, D., Andreasen, F., Wing, D., and L. Dondeti, “Encrypted Key Transport for Secure RTP,” draft-mcgrew-srtp-ekt-06 (work in progress), October 2009.
[RFC2627] Wallner, D., Harder, E., and R. Agee, “Key Management for Multicast: Issues and Architectures,” RFC 2627, June 1999.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications,” STD 64, RFC 3550, July 2003.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, “The Secure Real-time Transport Protocol (SRTP),” RFC 3711, March 2004.
[RFC4086] Eastlake, D., Schiller, J., and S. Crocker, “Randomness Requirements for Security,” BCP 106, RFC 4086, June 2005.
[RFC4568] Andreasen, F., Baugher, M., and D. Wing, “Session Description Protocol (SDP) Security Descriptions for Media Streams,” RFC 4568, July 2006.
[RFC5117] Westerlund, M. and S. Wenger, “RTP Topologies,” RFC 5117, January 2008.



Appendix A.  Relationship with EKT

Encrypted Key Transport (EKT) [I-D.mcgrew-srtp-ekt] uses RTCP to send new SRTP keys. For EKT to operate, it needs to distribute its Key Encryption Key (KEK) to all authorized listeners, and EKT describes how Security Descriptions can provide that function. While KTR could also provide the same function, KTR as described in this document does not support EKT.

This is because EKT cannot satisfy the video switching scenario (Section 3.3 (Point to Multipoint Using Video Switching MCUs)) when listeners are ejected from or added to the group. In order for EKT to work in that scenario, the video switcher would have to synthesize RTCP packets on behalf of the video sender, or the video switcher would have to tell the video sender exactly how to generate its EKT KEK message for consumption by the DTLS-SRTP-Key-Transport listeners -- which is something only the video switcher should be responsible for doing. Even more complexity would be introduced if LKH is used between the video switcher and the listeners, because only the video switcher is aware of the group membership (the speaker is not) and the video switcher would have to communicate LKH hierarchical information to the speaker so the speaker could generate the EKT message. This would distribute LKH between the speaker and the video switcher. It is more desirable to retain LKH complexity within the video switcher -- as is proposed in Section 3.4 (Scaling to Large Groups).

For the other scenarios, EKT or KTR would work equally well. But EKT still needs a way to securely communicate its Key Encryption Key to the authorized listeners, and if KTR were used to provide that function, there seems to be no value in using EKT to distribute new keys -- KTR can do that.

For these reasons, KTR does not describe how it would work with EKT.




Appendix B.  Changes

[[Note to RFC Editor: Please remove this section prior to publication.]]




B.1.  Changes from -00 to -01




B.2.  Changes from -01 to -02




Author's Address

  Dan Wing
  Cisco Systems, Inc.
  170 West Tasman Drive
  San Jose, CA 95134
  USA
Email:  dwing@cisco.com



Full Copyright Statement

Intellectual Property

Acknowledgment