Opsawg | M. Cociglio |
Internet-Draft | A. Capello |
Intended status: Experimental Protocol | A. Tempia Bonda |
Expires: September 08, 2011 | L. Castaldelli |
Telecom Italia | |
March 07, 2011 |
A packet-based method for passive performance monitoring
draft-tempia-opsawg-p3m-00.txt
This document describes a method to accomplish performance monitoring measurements on real traffic, applicable to any packet-based stream, including L2, L3, MPLS traffic, unicast and multicast. The method can be easily implemented using tools and features already available on existing routing platforms without any protocol extension and, for this reason, it does not raise any interoperability issue. However, the method could be further improved by means of some extension to existing protocols, but this aspect is left for further study and it is out of the scope of the document.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 08, 2011.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The increasing deployment in service provider networks of highly sensitive applications demands for mechanisms able to monitor and measure network performances in terms of packet loss, one-way delay, two-way delay and delay variation.
The main driver for a network operator to measure performance metrics is Service Level Agreements (SLA) verification: when a SLA is agreed between service provider and customer, it is very important for the operator to know the quality of end-user experience and whether the performance of the production network meets SLA requirements. On the other hand, performance monitoring provides useful information on the network itself, facilitating the troubleshooting and the localization of network problems.
This document describes a mechanism to enable simple and efficient performamce monitoring on real traffic. The method can be applied to any kind of packet-based traffic, L2, L3, MPLS, unicast and multicast and doesn't require any protocol extension or interaction with existing protocols, releasing the mechanism from any interoperability issue. In addition, one important advantage of the mechanism is the ability to perform passive measurements: compared to active measurements performed with artificial traffic and probes, retrieving performance statistics from the real traffic has many benefits. Measurements relying on real traffic are more precise because sampled probing packets can yield biased results. Moreover, active measurements can produce inaccurate results if artificial traffic doesn't follow the same path followed by user traffic. Finally, even if probes follow the same path as the real flow, it is not guaranteed that they are treated like user traffic from a QoS point of view.
There is a lot of work related to OAM and [I-D.ietf-opsawg-oam-overview] provides a good overview of existing OAM mechanisms defined in IETF, ITU-T and IEEE. In IETF in particular there is a lot of work on fault detection and connectivity verification, while a minor effort is dedicated to performance monitoring. IPPM WG has defined standard metrics to measure network performance, however the metrics developed in the WG refer to an active measurement context where the devices used to measure the metrics produce their own traffic. [I-D.ietf-mpls-loss-delay] specifies protocol mechanisms to enable the measurement of packet loss, one-way and two-way delay and delay variation in MPLS networks.
This document provides a general mechanism to enable accurate performance monitoring of any kind of traffic in a service provider network.
The method addresses primarily packet loss measurement, but it can be easily extended to one-way delay and delay variation measurements as well.
In order to perform packet loss measurements on a real flow it is possible to follow several approaches. The most intuitive one consists in numbering the packets so that each router receiving that flow is able to immediately detect a missing packet. Such approach, though very simple in theory, is not as simple to achieve: it requires to insert a sequence number in each packet and to have an equipment able to extract the number and check it in real time. A similar task can be difficult to implement on real traffic: if UDP is used as the transport protocol, the sequence number is not available, on the other hand, if a higher layer sequence number (e.g. in the RTP header) is used, extracting the information from the RTP header on every packet and process it in real-time can overload the equipment.
An alternative approach is to count the number of packets sent on one end of the measurement and the packets received on the other end of the measurement and to compare those two values. This operation is much simpler to implement than numbering each packet, but requires a kind of synchronization between the routers performing the measurement: in order to compare two counters it is required that they refer exactly to the same set of packets. Since a flow is continuous and cannot be stopped when a counter has to be read, it can be difficult to determine exactly at which point of time to read the counter. A possible solution to overcome this problem is to virtually split the flow in blocks inserting periodic delimitations so that each counter refers exactly to a single block of packets. The delimitation can be done f.i. inserting periodically in the flow a packet generated to this purpose.
Compared to numbering, the second approach is easier to implement, however, delimiting the flow using specific packets can have some limits. First of all it requires to generate additional packets within the flow and requires the equipment to be able to process those packets. In addition the method is vulnerable to delimiting packets losses: if a delimiting packet is loss, blocks are affected thus producing wrong measurements.
The method proposed in this document follows the second approach described, but doesn't use additional packets to virtually split the flow in blocks. Instead, it "colors" the flow so that packets belonging to different consecutive blocks have a different color and all the packets belonging to the same block have the same color. Using this simple principle it is possible to efficiently measure packet loss on real traffic streams without the need to number packets or generate additional artificial packets.
Considering the principles just mentioned, the method allows a network operator to perform performance monitoring on real traffic with the following advantages:
Figure 1 represents a very simple network and shows how the method can be used to measure packet loss on different network segments: enabling the measurement on several interfaces along the path, it is possible to perform link monitoring, node monitoring or end-to-end monitoring. More generally the method is flexible enough to measure packet loss on any segment of the network.
Traffic flow ========================================================> +------+ +------+ +------+ +------+ ---<> R1 <>-----<> R2 <>-----<> R3 <>-----<> R4 <>--- +------+ +------+ +------+ +------+ . . . . . . . . . . . . . <------> <-------> . . Node Packet Loss Link Packet Loss . . . <---------------------------------------------------> End-to-End Packet loss
This section describes in detail how the method can be used to measure packet loss on real traffic in a packet-switched network.
The basic idea is to virtually split the traffic in blocks that could be easily and unambiguously identified: counting the number of packets of each block and comparing the values obtained in different measurement points along the path, it is possible to measure packet loss occurred in any single block between any two points.
The following figure shows how blocks are created generating periodic delimitation points in the flow.
| | | | | | | Traffic flow | | ========|===========|===========|===========|===========|==========> ... | Block 5 | Block 4 | Block 3 | Block 2 | Block 1 | | | | |
A simple way to create delimitation points is to "color" the traffic with two different colors and change the color periodically. Whenever the color changes the current block terminates and the following one begins. Hence all the packets belonging to the same block have the same color and packets of different consecutive blocks have a different color. The number of packets in each block depends on the criterion used to create the blocks: if the color switches every fixed number of packets, each block contains the same number of packets, but if the color switches according to a timer, the number of packets may be different in each block and dependent on the bit rate.
The following figure shows how a flow looks like when it is split in traffic blocks coloring packets.
A: packet with A coloring B: packet with B coloring | | | | | | | Traffic flow | | -------------------------------------------------------------------> BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA -------------------------------------------------------------------> ... | Block 5 | Block 4 | Block 3 | Block 2 | Block 1 | | | | |
Figure 4 shows how the method can be used to measure link packet loss between two adjacent nodes.
Referring to the figure, let's assume we want to monitor the packet loss on the link between R1 and R2. According to the method here described, traffic is colored alternatively with two different colors, A and B, and whenever the color changes, a demarcation point is created, generating a sort of square-wave signal where original traffic flow is virtually split in a sequence of blocks.
Color A ----------+ +-----------+ +---------- | | | | Color B +-----------+ +-----------+ Block n ... Block 3 Block 2 Block 1 <---------> <---------> <---------> <---------> <---------> Traffic flow ===========================================================> Color ... AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA... ===========================================================> +---------+ +---------+ -------<> R1 <>----------------------<> R2 <>------ +---------+ +---------+
Traffic coloring can be executed by R1 itself or by an upward router. R1 needs two counters, C(A)R1 and C(B)R1, in its egress interface in order to count the number of packets sent out the interface and colored respectively with color A and B. As long as traffic is colored A, only counter C(A)R1 is incremented while C(B)R1 is still, viceversa during a B block only C(B)R1 is incremented while C(A)R1 is still. C(A)R1 and C(B)R1 can be used as reference values to determine the packet loss from R1 to any other measurement point down the path. Router R2, similarly, will need on its ingress interface two counters, C(A)R2 and C(B)R2, to count the number of packets received on the interface and colored with color A and B respectively. When an A block terminates it is possible to compare C(A)R1 and C(A)R2 determining any packet loss within the block, similarly when the successive B block terminates it is possible to compare C(B)R1 with C(B)R2 determining any packet loss in that block, and so on for every successive block.
Similarly, using other 2 counters on R2 egress interface it is possible to count the number of packets sent out R2 interface and use them as reference values to determine the packet loss from R2 to any measurement point down R2.
The method doesn't require any synchronization in the network, as the traffic flow implicitly carries the synchronization in the alternation of colors. In addition, splitting the flow in blocks, the method is able not only to detect any packet loss, but also to provide information about when the packet loss has occurred and in which point of the network.
The following table shows how router counters can be used to calculate the packet loss between R1 and R2. The first column lists the sequence of traffic blocks while the other columns contain the counters of A-colored packets and B-colored packets for R1 and R2. In this example, we assume that counter values are reset whenever a block ends and the relative counter is read: with this assumption the table shows only relative counter values, that is the exact number of packets of each color within each block. If counter values were not reset, the table would contain cumulative counters, but the relative values could be equally determined by difference from the counter of the previous block of the same color.
Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
---|---|---|---|---|---|
1 | 375 | 0 | 375 | 0 | 0 |
2 | 0 | 388 | 0 | 388 | 0 |
3 | 382 | 0 | 381 | 0 | 1 |
4 | 0 | 377 | 0 | 374 | 3 |
... | ... | ... | ... | ... | ... |
n | 0 | 387 | 0 | 387 | 0 |
n+1 | 379 | 0 | 377 | 0 | 2 |
As table shows, counters increase according to traffic coloring: during an A block (blocks 1, 3 and n+1) all the packets are A-colored therefore C(A) counter indicates the number of packets of that block, while C(B) counter is zero. Viceversa, during a B block (blocks 2, 4 and n) all the packets are B-colored therefore C(A) counter is zero, while C(B) counter indicates the number of packets of that block.
When a block terminates (because the coloring has switched to the other color) the relative counters stop incrementing and it is possible to read them and compare the values measured on router R1 and R2 thus determining any packet loss within that block.
For example, looking at the table above, during the first block (A-colored) C(A)R1 and C(A)R2 have the same value (375) which corresponds to the exact number of packets of the first block. Also during the second block (B-colored) R1 and R2 counters have the same value (388) which corresponds to the number of packets of the second block. During blocks 3 and 4 on the other hand R1 and R2 counters are different meaning that some packet has been lost: precisely, 1 packet (382-381) belonging to block 3 and 3 packets (377-374) belonging to block 4 were lost.
The method here described for R1 and R2 can be extended to any router and applied to more complex networks, as far as the measurement is enabled on the path followed by the traffic flow being analyzed, thus providing a precise measurement of packet loss between any couple of network equipment.
This version of the draft addresses only packet loss measurement, one-way delay will be discussed in future versions of the document.
This version of the draft addresses only packet loss measurement, delay variation will be discussed in future versions of the document.
The methodology described in the previous sections can be applied to different scenarios adopting different strategies. Specifically, it can be used in two basic ways:
The flow-based strategy is used when only a limited number of traffic flows needs to be monitored. This could be the case, for example, of IPTV channels or other specific applications traffic with high QoS requirements.
According to this strategy, only a subset of the flows is colored. Counters for packet loss measurements can be instantiated for each single flow, or for the set as a whole, depending on the desired granularity.
A relevant problem with this approach is the necessity to know in advance the path followed by flows that are subject to measurement. Path rerouting and traffic load-balancing increase the issue complexity, especially for unicast traffic. The problem is easier to solve for multicast traffic where load balancing is seldom used, especially for IPTV traffic where static joins are frequently used to force traffic forwarding and replication.
The link-based strategy is similar to performance monitoring tools usually used in transport networks, where the goal is to monitor the network behavior as a whole, without distinguishing among different services.
Measurements are performed on all the traffic on a link. The link could be a physical link or a logical link (for instance an Ethernet VLAN or a MPLS PW). Counters can be instantiated for the traffic as a whole or for each traffic class (in case it is desired to monitor each class separately), but in the second case a couple of counters is needed for each class.
This section describes, as an example, a practical implementation of the method and shows how the monitoring could be easily activated in a real network, without the need to activate any specific protocol, using instead basic features already available on major routing platforms.
Traffic coloring can be implemented setting a specific bit on the packet header and changing the value of that bit periodically. For example it is possible to use the two less significant bits of the DSCP field (bit 0 and bit 1). One of them (bit 0) is used to identify flows subject to traffic monitoring (and therefore it is always set to 1 on these flows), the other one (bit 1) changes periodically and is used to create traffic blocks assuming alternately values 0 and 1. The choice of coloring traffic using the DSCP field implies that differentiated packet scheduling must not be based on that field but, for instance, only on IP Precedence bits.
In practice, coloring traffic using the DSCP field can be performed configuring on the router interface an access list that intercepts the flow(s) to be monitored (or all the traffic in the link-based approach) and a policy that sets the DSCP field accordingly. Since traffic coloring must change over time, it is necessary to modify the policy periodically: an automatic script can easily perform this task.
The operation of counting packets can be implemented very easily. If traffic is colored using the DSCP field, an access list that matches specific DSCP values can be used to count packets of the flow being monitored. The access list can also be configured to match different flow properties (such as source or destination address) besides the DSCP value, hence monitoring just a subset of the colored traffic. An important feature of this approach, in fact, is that coloring and counting are two decoupled operations: it is possible to color all the traffic, but monitor just one or few flows.
In order to properly elaborate packet counters it is necessary to correlate values coming from different nodes. If we cannot use any specific protocol to exchange this information among routers, it is possible to use an external system. Its task is to collect data (counter values) from the network and make correlations to determine packet loss. This operation can be done for instance transferring data to the external system via FTP or TFTP.
This section describes some aspects that should be taken into account when the method is deployed in a real network.
In the previous section it was outlined that flow-based measurements require the identification of the flow to be monitored and the discovery of the path followed by the selected flow. It is possible to monitor a single flow or multiple flows grouped together, but in this case measurement is consistent only if all the flows in the group follow the same path. Moreover, a network operator must be aware that, if a measurement is performed on many flows, it is not possible to determine exactly which flow was affected by packets loss.
Once the flow(s) to be monitored have been identified, it is important to enable the monitoring system in the proper nodes. In order to have just an end-to-end monitoring it is sufficient to enable the monitoring system on the first and last hop routers of the path: the mechanism is completely transparent to intermediate nodes and independent from the path followed by traffic flows. On the contrary, to monitor the flow along its whole path and on every segment (every node and link) it is necessary to enable the monitoring system on every node from the source to the destination. In case the exact path followed by the flow is not known a priori (i.e. the flow has multiple paths to reach the destination) it is necessary to enable the monitoring system on every path: counters on interfaces traversed by the flow will report packet count, counters on other interfaces will be null.
In case the link-based strategy is used, flow identification is not necessary because all the traffic has to be colored and measured.
In both strategies, flow-based and link-based, the fundamental operation is to "color" the flow in order to create packet blocks. This means choosing where to activate the coloring and how to "color" packets.
In case of flow-based measurements, it is desirable, in general, to have a single coloring node because it is simpler to manage and doesn't rise any risk of conflict (consider the case where two nodes color the same flow). To this purpose it is necessary to color the flow as close as possible to the source. In addition, coloring a flow close to the source allows an end-to-end measure if a measurement point is enabled on the last-hop router as well. The only requirement is that coloring must change periodically and every node along the path must be able to identify unambiguously colored packets.
For link-based measurements, all traffic needs to be colored when transmitted on the link. If traffic had already been colored, then it has to be re-colored because the coloring must be consistent on the link. This means that each hop along the path must (re-)color the traffic but coloring is not required to be consistent along different links.
In the previous section it was explained that, in case of flow-based measurement, the operation of coloring packets to be monitored can be accomplished by a single node. All the intermediate nodes are not required to perform any particular operation except counting colored packets they receive and forward: this operation can be enabled on every router along the path or only on a subset, depending on which network segment is being monitored (a single link, a particular metro area, the backbone, the whole path).
Since coloring changes periodically between two values, two counters (one for each value) are needed for a single flow being monitored: one counter for packets colored A and one counter for packets colored B.
In case of link-based measurements the behavior is similar except that coloring and counting operations are performed on a link by link basis at each endpoint of the link itself.
Another important aspect to take into consideration is when to read counters: in order to count the exact number of packets of a block routers must perform this operation when a block has terminated. The task can be performed in two ways. The most general approach suggests to read counters periodically, many times during a block, and to compare successive readings: when the counter stops incrementing means that the relative block has finished and its counter can be elaborated. Alternatively, if coloring is performed based on a timer and the duration of the blocks is fixed and known, it is possible to synchronize counter collection with that timer (f.i. if each block is 5 minutes long it is possible to read counters every 5 minute in the middle of the block to overcome eventual time shifts from the router that colors traffic).
Nodes enabled to perform performance monitoring collect the value of counters for colored packets, but they are not able to use this information to measure packet loss, because they only have local information and lack a global view of the network. For this reason an external Network Management System (NMS) is required to collect and elaborate data and to perform packet loss calculation. The NMS compares values of counters from different nodes and is then able to determine if some packets were lost (even a single packet) and also where packets were lost.
Information collected by the routers (counter values) needs to be transferred to the NMS periodically. This can be accomplished f.i. via FTP or TFTP and can be done in Push Mode or Polling Mode. In the first case, each router sends periodically the information it collects to the NMS, in the latter case it is the NMS that periodically polls routers to collect information.
If link-based measurement is used, it is also possible to use a protocol to exchange values of counters between the two endpoints in order to let them perform the packet loss calculation for each traffic direction. A similar approach would be complicate if applied to a flow-based measurement.
This section describes what is needed on a node in order to enable the performance measurement system to the purpose of understanding its scalability.
The coloring can be easily performed on a single flow as well as on the entire traffic. Regarding the counting, what is needed are two counters for every flow (or group of flows) being monitored and for every interface where the monitoring system is activated. For example, in order to monitor separately 3 flows on a router with 4 interfaces involved, 24 counters are needed (2 counters for each of the 3 flows on each of the 4 interfaces).
The method described in this document doesn't raise any interoperability issue, since it doesn't require any new protocol or any kind of interaction among nodes. Traffic coloring can be performed by a single node, while counting of packets is performed locally by each router and the correlation between counters can be done by an external NMS which collects and correlates the data coming from the network.
The only requirement is that every node should be able to identify colored flows, but, as explained in Section 5, this can be accomplished using simple functionalities that doesn't have any interoperability issue and are already available on major routing platforms.
This document specifies a method to perform measurements in the context of a Service Provider's network and has not been developed to conduct Internet measurements, so it does not directly affect Internet security nor applications which run on the Internet. However, implementation of this method must be mindful of security and privacy concerns.
There are two types of security concerns: potential harm caused by the measurements and potential harm to the measurements. For what concerns the first point, the measurements described in this document are passive, so there are no packets injected into the network causing potential harm to the network itself and to data traffic. Nevertheless, the method implies modifications on the fly to the IP header of data packets: this must be performed in a way that doesn't alter the quality of service experienced by packets subject to measurements and that preserve stability and performance of routers doing the measurements. The measurements themselves could be harmed by routers altering the coloring of the packets, or by an attacker injecting artificial traffic. Authentication techniques, such as digital signatures, may be used where appropriate to guard against injected traffic attacks.
The privacy concerns of network measurement are limited because the method only relies on information contained in the IP header without any release of user data.
There are no IANA actions required.
The authors would like to thank Domenico Laforgia, Daniele Accetta and Mario Bianchetti for their contribution to the definition and the implementation of the method.
[I-D.ietf-opsawg-oam-overview] | Mizrahi, T, Sprecher, N, Bellagamba, E and Y Weingarten, "An Overview of Operations, Administration, and Maintenance (OAM) Mechanisms", Internet-Draft draft-ietf-opsawg-oam-overview-05, May 2011. |
[I-D.ietf-mpls-loss-delay] | Frost, D and S Bryant, "Packet Loss and Delay Measurement for MPLS Networks", Internet-Draft draft-ietf-mpls-loss-delay-04, July 2011. |