Network Working Group Dino Farinacci
Internet-Draft Greg Shepherd
Intended status: Experimental Protocol Yiqun Cai
Expires: January 07, 2012 Stig Venaas
cisco Systems
July 06, 2011

Population Count Extensions to PIM
draft-ietf-pim-pop-count-04.txt

Abstract

This specification defines a method for providing multicast distribution-tree accounting data. Simple extensions to the PIM protocol allow a rough approximation of tree-based data in a scalable fashion.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 07, 2012.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Requirements Notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Introduction

This draft proposes a mechanism to convey accounting information using the PIM protocol [RFC4601] [RFC5015]. Putting the mechanism in PIM allows efficient distribution and maintenance of such accounting information. Previous mechanisms require data to be correlated from multiple router sources.

This proposal allows a single router to be queried to obtain accounting and statistic information for a multicast distribution tree as a whole or any distribution sub-tree downstream from a queried router. The amount of information is fixed and does not increase as multicast membership, tree diameter, or branching increase.

The sort of accounting data this draft provides, on a per multicast route basis, are:

  1. The number of branches in a distribution tree.
  2. The membership type of the distribution tree, that is SSM or ASM.
  3. Routing domain and time zone boundary information.
  4. On-tree node and tree diameter counters.
  5. Effective MTU and bandwidth.

This draft adds a new PIM Join Attribute type [RFC5384] to the Join/Prune message as well as a new Hello TLV. The mechanism is applicable to IPv4 and IPv6 multicast.

2.1. Terminology

This section defines the terms used in this draft.

Multicast Route:
A (S,G) or (*,G) entry regardless if the route is in ASM, SSM, or Bidir mode of operation.
Stub Link:
A link with members joined to the group via IGMP or MLD.
Transit Link:
A link put in the oif-list for a multicast route because it was joined by PIM routers.

Note that a link can be both a Stub Link and a Transit Link at the same time.

3. New Hello TLV Pop-Count Support

When a PIM router sends a Join/Prune message to a neighbor, it will encode the data in a new PIM Join Attribute type (described in this draft) when the PIM router determines the neighbor can support this draft. If a PIM router supports this draft, it must send the Pop-Count-Supported TLV. The format of the TLV is defined to be:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          OptionType           |         OptionLength          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          OptionValue                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            

OptionType = 29, OptionLength = 4, there is no OptionValue semantics defined at this time but will be included for expandability and be defined in future revisions of this draft. The format will look like:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             29                |              4                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Unallocated Flags                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            

Unallocated Flags:
for now should be sent as 0 and ignored on receipt.

4. New Pop-Count Join Attribute Format

When a PIM router supports this draft and has determined from a received Hello, the neighbor supports this draft, it will send Join/Prune messages that MAY include a Pop-Count attribute. The mechanism to process PIM Join Attribute is described in [RFC5384]. The format of the new attribute is described in the following.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|E| Attr Type |    Length     |        Effective MTU          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Flags             |        Options Bitmap         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Options                            |
   .                               .                               .
   .                               .                               .
   .                               .                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            

The above format is used only for entries in the join-list section of the Join/Prune message.

        0                   1
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  Unalloc/Reserved   |P|a|t|A|S|
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            

        A-flag   S-flag   Description
        ------   ------   -----------------------------------------
          0        0      There are no members for the group 
                          ('Stub Oif-List Count' is 0)
          0        1      All group members are only SSM capable
          1        0      All group members are only ASM capable
          1        1      There is a mixture of SSM and ASM capable 
            

         0                   1
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |T|s|m|M|d|n|D|z| Unalloc/Rsrvd |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            

         bit     Option
        -----   ------------------------
          T      Transit Oif-List Count
          s      Stub Oif-List Count
          m      Minimum Speed Link
          M      Maximum Speed Link
          d      Domain Count
          n      Node Count
          D      Diameter Count
          z      TZ Count
              

F bit:
0 Non-Transitive Attribute.
E bit:
As specified by [RFC5384].
Attr Type:
2.
Length:
The minimum length is 6.
Effective MTU:
This contains the minimum MTU for any link in the oif-list. The sender of Join/Prune message takes the minimum value for the MTU (in bytes) from each link in the oif-list. If this value is less than the value stored for the multicast route (the one received from downstream joiners) then the value should be reset and sent in Join/Prune message. Otherwise, the value should remain unchanged.
This provides one to obtain the MTU supported by multicast distribution tree when examined at the first-hop router(s) or for sub-tree for any router on the distribution tree.
Flags:
The flags field has the following format:
Unallocated Flags:
The flags which are currently not defined. If a new flag is defined and sent by a new implementation, an old implementation should preserve the bit settings. This means that if a bit was set in a PIM Join message from any of the downstream routers, then it MUST also be set in any PIM Join sent upstream.
S flag:
If an IGMPv3 or MLDv2 report was received on any oif-list entry or the bit was set from any PIM Join message. This bit should only be cleared when the above becomes untrue.
A flag:
If an IGMPv1, IGMPv2, or MLDv1 report was received on any oif-list entry or the bit was set from any PIM Join message. This bit should only be cleared when the above becomes untrue.
A combination of settings for these bits indicate:
t flag:
If there are any tunnels on the distribution tree. If a tunnel is in the oif-list, a router should set this bit in its Join/Prune messages. Otherwise, it propagates the bit setting from downstream joiners.
a flag:
If there are any auto-tunnels on the distribution tree. If an auto-tunnel is in the oif-list, a router should set this bit in its Join/Prune messages. Otherwise, it propagates the bit setting from downstream joiners. An example of an auto-tunnel is an tunnel setup by the AMT [AMT] protocol.
P flag:
This flag remains set if all downstream routers support this specification. That is, they are PIM pop-count capable. This allows one to tell if the entire sub-tree is completely accounting capable.
Options Bitmap:
This is a bitmap that shows which options are present. The format of the bitmap is as follows: Section 4.1 for details on the different options. The unallocated bits are reserved. Any unknown bits MUST be set to 0 when a message is sent, and treated as 0 (ignored) when received. This means that unknown options which are denoted by unknown bits are ignored.
By using this bitmap we can specify at most 16 options. If there becomes a need for more than 16 options, one can define a new option that contains a bitmap, which can then be used to specify which further options are present. The last bit in the current bitmap could be used for that option. The exact definition of this is however left for future documents.
Options:
This field contains options. Which options are present are determined by the flag bits. As new flags and options may be defined in the future, any unknown/reserved flags MUST be ignored, and any additional trailing options MUST be ignored. See Section 4.1 for details on the options defined in this document.

4.1. Options

There are several options defined in this document. For each option, there is also a related flag that shows whether the option is present. See the Options Bitmap above for a list of the options and their respective bits. Each option has a fixed size.

Transit Oif-List Count:
This is filled in by a router sending a Join/Prune message which is equal to the number of oifs for the multicast route that has been joined by PIM. This indicates the transit branches on a multicast distribution tree (no members on the links between this router and joining routers). This is added to the value advertised by all downstream PIM routers that have joined on this oif. Length 2 octets.
Stub Oif-List Count:
This is filled in by a router sending a Join/Prune message which is equal to the number of oifs for the multicast route that has been joined by IGMP or MLD. This indicates the links where there are host members for the multicast route. This is added to the value advertised by all downstream PIM routers that have joined on this oif. Length 2 octets.
Minimum Speed Link:
This contains the minimum bandwidth rate for any link in the oif-list and is encoded as specified in Section 4.1.1. The sender of Join/Prune message takes the minimum value for each link in the oif-list for the multicast route. If this value is less than the value stored for the multicast route (the one received from downstream joiners) then the value should be reset and sent in Join/Prune message. Otherwise, the value should remain unchanged. This together with the Maximum Speed Link option provides a way to obtain the lowest and highest speed link for the multicast distribution tree. Length 2 octets.
Maximum Speed Link:
This contains the maximum bandwidth rate for any link in the oif-list and is encoded as specified in Section 4.1.1. The sender of Join/Prune message takes the maximum value for each link in the oif-list for the multicast route. If this value is greater than the value stored for the multicast route (the one received from downstream joiners) then the value should be reset and sent in Join/Prune message. Otherwise, the value should remain unchanged. This together with the Minimum Speed Link option provides a way to obtain the lowest and highest speed link for the multicast distribution tree. Length 2 octets.
Domain Count:
This indicates the number of routing domains the distribution tree traverses. A router should increment this value if it is sending a Join/Prune message over a link which traverses a domain boundary. Length 1 octet.
Node Count:
This indicates the number of routers on the distribution tree. Each router will sum up all the Node Counts from all joiners on all oifs and increment by 1 before including this value in the Join/Prune message. Length 1 octet.
Diameter Count:
This indicates the longest length of any given branch of the tree in router hops. Each router that sends a Join increments the max value received by all downstream joiners by 1. Length 1 octet.
TZ Count:
This indicates the number of timezones the distribution tree traverses. A router should increment this value if it is sending a Join/Prune message over a link which traverses a time zone. This can be a configured link attribute or use other means to determine the timezone is acceptable. Length 1 octet.

4.1.1. Link Speed Encoding

The speed is encoded using 2 octets as follows:

         0                   1           
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        | Exponent  |    Significand    | 
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                

Here are some examples how this is used:

         Link Speed     Exponent     Significand
        ------------   ----------   -------------
         500 kbps       0            500
         500 kbps       2              5
         155 Mbps       3            155
         40 Gpbs        6             40
         100 Gpbs       6            100
         100 Gpbs       8              1
                

4.2. Example message layouts

We will here give a few examples to illustrate the use of flags and options.

A minimum size message has no option flags set, and looks like this:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|E| Attr Type |  Length = 6   |        Effective MTU          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Unalloc/Reserved   |P|a|t|A|S|0|0|0|0|0|0|0|0| Unalloc/Rsrvd |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            

A message containing all the options defined in this document would look like this:

            <figure>
            <preamble></preamble>
            <artwork><![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|E| Attr Type |  Length = 18  |        Effective MTU          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Unalloc/Reserved   |P|a|t|A|S|1|1|1|1|1|1|1|1| Unalloc/Rsrvd |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Transit Oif-List Count     |     Stub Oif-List Count       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Minimum Speed Link       |      Maximum Speed Link       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Domain Count |  Node Count   | Diameter Count|    TZ Count   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            

A message containing only Stub Oif-List Count and Node Count would look like this:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|E| Attr Type |  Length = 9   |        Effective MTU          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Unalloc/Reserved   |P|a|t|A|S|0|1|0|0|0|1|0|0| Unalloc/Rsrvd |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Stub Oif-List Count       |  Node count   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            

5. How to use Pop-Count Encoding

A router supporting this draft MUST include PIM Join Attribute TLV in its PIM Hellos. See [RFC5384] and [HELLO] for details.

It is very important to note that any changes to the values maintained in this draft MUST NOT trigger a new Join/Prune message. Due to the periodic nature of PIM, the values can be accurately obtained at 1 minute intervals (or whatever Join/Prune interval used).

When a router removes a link from an oif-list, it must be able to reevaluate the values that it will advertise upstream. This happens when an oif-list entry is timed out or a Prune is received.

It is recommended that the Join Attribute defined in this draft be used for entries in the join-list part of the Join/Prune message. If the new encoding is used in the prune-list or an Assert message, an implementation must ignore them but still process the Prune as if it was in the original encoding described in [RFC4601].

It is also recommended that join suppression be disabled on a LAN when Pop-Count is used.

6. Implementation Approaches

An implementation can decide how the accounting attributes are maintained. The values can be stored as part of the multicast route data structure by combining the local information it has with the joined information on a per oif basis. So when it is time to send a Join/Prune message, the values stored in the multicast route can be copied to the message.

Or, an implementation could store the accounting values per oif and when a Join/Prune message is sent, it can combine the oifs with its local information. Then the combined information can be copied to the message.

When a downstream joiner stops joining, accounting values cached must be evaluated. There are two approaches which can be taken. One is to keep values learned from each joiner so when the joiner goes away the count/max/min values are known and the combined value can be adjusted. The other approach is to set the value to 0 for the oif, and then start accumulating new values as subsequent Joins are received.

The same issue arises when an oif is removed from the oif-list. Keeping per-oif values allows you to adjust the per-route values when an oif goes away. Or, alternatively, a delay for reporting the new set a values from the route can occur while all oif values are zeroed (where accumulation of new values from subsequent Joins cause re-population of values and a new max/min/ count can be reevaluated for the route).

It is recommended that when triggered Join/Prune messages are sent by a downstream router, that the accounting information not be included in the message. This way when convergence is important, avoiding the processing time to build an accounting record in a downstream router and processing time to parse the message in the upstream router will help reduce convergence time. An upstream router should not interpret a Join/Prune message received with no accounting data to mean clearing or resetting what accounting data it has cached.

7. Caveats

This draft requires each router on a multicast distribution tree to support this draft or else the accounting attributes for the tree will not be known.

However, if there are a contiguous set of routers downstream in the distribution tree, they can maintain accounting information for the sub-tree.

If there are a set of contiguous routers supporting this draft upstream on the multicast distribution tree, accounting information will be available but it will not represent an accurate assessment of the entire tree. Also, it will not be clear for how much of the distribution tree the accounting information covers.

8. IANA Considerations

A new PIM Hello Option type, 29, has been assigned. See [HELLO] for details.

A new PIM Join Attribute type needs to be assigned. 2 is proposed in this draft.

9. Security Considerations

There are no security considerations for this design other than what is already in the main PIM specification [RFC4601].

10. Acknowledgments

The authors would like to thank John Zwiebel, Amit Jain, and Clayton Wagar for their review comments on the initial versions of this draft. Further review and comments were provided by Thomas Morin and Zhaohui (Jeffrey) Zhang.

11. References

11.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4601] Fenner, B., Handley, M., Holbrook, H. and I. Kouvelas, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", RFC 4601, August 2006.
[RFC5015] Handley, M., Kouvelas, I., Speakman, T. and L. Vicisano, "Bidirectional Protocol Independent Multicast (BIDIR-PIM)", RFC 5015, October 2007.
[RFC5384] Boers, A., Wijnands, I. and E. Rosen, "The Protocol Independent Multicast (PIM) Join Attribute Format", RFC 5384, November 2008.

11.2. Informative References

[HELLO] IANA, , "PIM Hello Options", PIM-HELLO-OPTIONS per RFC4601 http://www.iana.org/assignments/pim-hello-options, March 2007.
[AMT] Thaler, D., Talwar, M., Aggarwal, A., Vicisano, L. and T. Pusateri, "Automatic IP Multicast Without Explicit Tunnels (AMT)", Internet-Draft draft-ietf-mboned-auto-multicast-10.txt, March 2010.

Authors' Addresses

Dino Farinacci cisco Systems Tasman Drive San Jose, CA 95134 USA EMail: dino@cisco.com
Greg Shepherd cisco Systems Tasman Drive San Jose, CA 95134 USA EMail: gjshep@gmail.com
Yiqun Cai cisco Systems Tasman Drive San Jose, CA 95134 USA EMail: ycai@cisco.com
Stig Venaas cisco Systems Tasman Drive San Jose, CA 95134 USA EMail: stig@cisco.com