This draft describes a mechanism, for use in conjunction with link-state routing protocols, that prevents the transient loops which would otherwise occur during topology changes. It does this by correctly sequencing the FIB updates on the routers.
This mechanism can be used in the case of non-urgent link or node shutdowns and restarts or link metric changes. It can also be used in conjunction with an FRR mechanism that converts a sudden link or node failure into a non-urgent topology change. This is possible where a complete repair path is provided for all affected destinations.
After a non-urgent topology change, each router computes a rank that defines the time at which it can safely update its FIB. A method for accelerating this loop-free convergence process by the use of completion messages is also described.
This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 6, 2010.
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.
1.  Conventions used in the document
2.  Introduction
3.  The required FIB update order
    3.1.  Single Link Events
        3.1.1.  Link Down / Metric Increase
        3.1.2.  Link Up / Metric Decrease
    3.2.  Multi-link events
        3.2.1.  Router Down events
        3.2.2.  Router Up events
        3.2.3.  Linecard Failure/Restoration Events
4.  Applying ordered FIB updates
    4.1.  Deducing the topology change
    4.2.  Deciding if ordered FIB updates applies
5.  Computation of the ordering
    5.1.  Link or Router Down or Metric Increase
    5.2.  Link or Router Up or Metric Decrease
6.  Acceleration of Ordered Convergence
    6.1.  Construction of the waiting list and notification list
        6.1.1.  Down events
        6.1.2.  Up Events
    6.2.  Format of Completion Messages
7.  Fall back to Conventional Convergence
8.  oFIB state machine
    8.1.  OFIB_STABLE
    8.2.  OFIB_HOLDING_DOWN
    8.3.  OFIB_HOLDING_UP
    8.4.  OFIB_ONGOING
    8.5.  OFIB_ABANDONED
9.  IANA considerations
10.  Security considerations
11.  Acknowledgments
12.  Informative References
Authors' Addresses
1.  Conventions used in the document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [4].
2.  Introduction
With link-state protocols, such as IS-IS [1] and OSPF [5], each time the network topology changes, some routers need to modify their Forwarding Information Base (FIB) to take into account the new topology. Each topology change causes a convergence phase. During this phase, routers may transiently have inconsistent FIBs, which may lead to packet loops and losses, even if the reachability of the destinations is not compromised after the topology change. Packet losses and transient loops can also occur in the case of a link down event implied by a maintenance operation, even if this operation is predictable and not urgent. When the link state change is a metric update and when a new link is brought up in the network, there is no direct loss of connectivity, but transient packet loops and loss can still occur.
For example, in Figure 1 (A simple topology), if the link between X and Y is shut down by an operator, packets destined to X can loop between R and Y when Y has updated its FIB while R has not yet updated its FIB, and packets destined to Y can loop between X and S if X updates its FIB before S. According to the current behaviour of IS-IS and OSPF, this scenario will happen most of the time because X and Y are the first routers to be aware of the failure, so that they will update their FIBs first.
                           1
             X-------------/-------------Y
             |                           |
             |                           |
             |                           |
             |                           |
           1 |                           | 1
             |                           |
             |                           |
             |                           |
             |                           |
             S---------------------------R
                           2

                  Figure 1: A simple topology
It should be noted that the loops can occur remotely from the failure, not just adjacent to it.
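The looping scenario of Figure 1 can be reproduced with a small simulation. The following Python sketch is illustrative only: the helper names (make_graph, next_hops, trace) and the dictionary-based FIB model are assumptions made for this example and are not part of any routing protocol implementation.

```python
import heapq

def make_graph(links):
    """Build a symmetric adjacency map {router: {neighbour: metric}}."""
    graph = {}
    for a, b, metric in links:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph

def next_hops(graph, source):
    """Dijkstra from `source`; returns {destination: first hop} (one shortest path)."""
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in graph[node].items():
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                # Next to the source the first hop is the neighbour itself;
                # further away it is inherited from the node being expanded.
                first_hop[nbr] = nbr if node == source else first_hop[node]
                heapq.heappush(heap, (d + metric, nbr))
    return first_hop

def trace(fibs, src, dst, max_hops=6):
    """Follow per-router FIBs hop by hop; stops at dst or after max_hops."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        path.append(fibs[path[-1]][dst])
    return path

# Figure 1 before and after the X-Y link is taken out of the topology.
OLD = make_graph([("X", "Y", 1), ("X", "S", 1), ("Y", "R", 1), ("S", "R", 2)])
NEW = make_graph([("X", "S", 1), ("Y", "R", 1), ("S", "R", 2)])

old_fibs = {router: next_hops(OLD, router) for router in OLD}

# Y updates before R: Y and R point at each other for destination X.
y_first = dict(old_fibs, Y=next_hops(NEW, "Y"))
looping = trace(y_first, "Y", "X")

# R updates before Y: every packet still reaches X.
r_first = dict(old_fibs, R=next_hops(NEW, "R"))
safe_from_r = trace(r_first, "R", "X")
safe_from_y = trace(r_first, "Y", "X")
```

In the second case Y still forwards over the X-Y link. This models a non-urgent shutdown (or an FRR repair) in which forwarding over the link remains possible until the last router has updated its FIB.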
The goal of this draft is to define a mechanism which sequences the router FIB updates to maintain consistency throughout the network. By correctly setting the FIB change order, no looping or packet loss can occur. This mechanism may be applied to the case of managed link-state changes, i.e. link metric change, manual link down/up, manual router down/up, and managed state changes of a set of links attached to one router. It may also be applied to the case where one or more network elements are protected by a fast re-route mechanism [7] [6]. The mechanisms that are used in the failure case are exactly the same as those used for managed changes. For simplicity this draft makes no further distinction between managed and unplanned changes.
3.  The required FIB update order
This section provides an overview of the required ordering of the FIB updates. A more detailed analysis of the rerouting dynamics and correctness proofs of the mechanism can be found in [3].
3.1.  Single Link Events
For simplicity the correct ordering for single link changes is described first. The draft then builds on this to demonstrate that the same principles can be applied to more complex scenarios such as line card or node changes.
3.1.1.  Link Down / Metric Increase
First consider the non-urgent failure of a link or the increase of a link metric. In this case, a router R MUST NOT update its FIB until all other routers that send traffic via R and the affected link have first updated their FIBs.
The following argument shows that this rule ensures the correct order of FIB change when the link X->Y is shut down or its metric is increased.
An "outdated" FIB entry for a destination is defined as being a FIB entry that still reflects the shortest path(s) in use before the topology change. Once a packet reaches a router R that has an outdated FIB entry for the packet destination, then, provided the oFIB ordering is respected, the packet will continue to X only traversing routers that also have an outdated FIB entry for the destination. The packet thus reaches X without looping and will be forwarded to Y via X->Y (or in the case of FRR, the X->Y repair path) and hence reach its destination.
Since it can be assumed that the original topology was loop-free, Y will never use the link Y->X to reach the destination and hence the path(s) between Y and the destination are guaranteed to be unaffected by the topology change. It therefore follows that the packet arriving at Y will reach its destination without looping.
Since it can also be assumed that the new topology is loop-free, by definition a packet cannot loop while being forwarded exclusively by routers with an updated FIB entry.
In other words, when the oFIB ordering is respected, if a packet reaches an outdated router, it can never subsequently reach an updated router, and cannot loop because from this point on it will only be forwarded on the consistent path that was used before the event. If it does not reach an outdated router, it will only be forwarded on the loop free path that will be used after the convergence.
According to the proposed ordering, X will be the last router to update its FIB. Once it has updated its FIB, the link X->Y can actually be shut down (or the repair removed).
If the link X-Y is bidirectional, a similar process must be run to order the FIB update for destinations using the link in the direction Y->X. As has already been shown, no packet ever traverses the X-Y link in both directions, and hence the operation of the two ordering processes is orthogonal.
3.1.2.  Link Up / Metric Decrease
In the case of link up events or metric decreases, a router R MUST update its FIB BEFORE all other routers that WILL use R to reach the affected link.
The following argument shows that this rule ensures the correct order of FIB change when the link X->Y is brought into service or its metric is decreased.
Firstly, when a packet reaches a router R that has already updated its FIB, all the routers on the path from R to X will also have updated their FIBs, so that the packet will reach X and be forwarded along X->Y, ultimately reaching its destination.
Secondly, a packet cannot loop between routers that have not yet updated their FIBs, because those routers still forward consistently along the loop-free paths of the old topology. Together, these two observations show that no packet can loop.
3.2.  Multi-link events
The following sections describe the required ordering for single events that may manifest as multiple link events. For example, the failure of a router may be notified to the rest of the network as the individual failure of all its attached links. The means of identifying the event type from the collection of received link events is described in Section 4.1 (Deducing the topology change).
3.2.1.  Router Down events
In the case of the non-urgent shut-down of a router, a router R MUST NOT update its FIB until all other routers that send traffic via R and the affected router have first updated their FIBs.
Using a proof similar to that for link failure, it can be shown that no loops will occur if this ordering is respected [3].
3.2.2.  Router Up events
In the case of a router being brought into service, a router R MUST update its FIB BEFORE all other routers that WILL use R to reach the affected router.
A proof similar to that for link up shows that no loops will occur if this ordering is respected [3].
3.2.3.  Linecard Failure/Restoration Events
The failure of a line card involves the failure of a set of links all of which have a single node in common, i.e. the parent router. The ordering to be applied is the same as if it were the failure of the parent router.
In a similar way, the restoration of an entire linecard to service as a single event can be treated as if the parent router were returning to service.
4.  Applying ordered FIB updates
4.1.  Deducing the topology change
As has been described, a single event such as the failure or restoration of a single link, single router or a linecard may be notified to the rest of the network as a set of individual link change events. It is necessary to deduce from this collection of link state notifications the type of event that has occurred in the network and hence the required ordering.
When a link change event is received which impacts the receiving router's FIB, the routers at the near and far end of the link are noted.
If all events received within some hold-down period have a single router (R) in common, then it is assumed that the change reflects an event (line-card or router change) concerning the common router (R).
In the case of a link change event, the router at the far end of the link is deemed to be the common router (R).
All ordering computations are based on treating the common router R as the root for both link and node events.
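The deduction described above can be sketched as a set intersection over the endpoints of the link events collected during the hold-down period. This is a minimal Python sketch: the function name deduce_common_router and the (near_end, far_end) tuple representation are assumptions made for illustration, and for a single link event the far end is taken as reported in the first notification.

```python
def deduce_common_router(link_events):
    """Return the common router R for the events seen in one hold-down period.

    `link_events` is a list of (near_end, far_end) router-ID pairs.
    Returns None when the events share no router.
    """
    if not link_events:
        return None
    common = set.intersection(*({near, far} for near, far in link_events))
    if len(common) == 2:
        # Every event concerns the same link: treat it as a link event and
        # use the far end of the link as the root.
        return link_events[0][1]
    if len(common) == 1:
        # Line-card or router event concerning the single shared router.
        return next(iter(common))
    return None
```

For example, the events Y-A, Y-B and C-Y share only router Y, so Y is used as the root, whereas the events X-Y and A-B share no router and yield None.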
4.2.  Deciding if ordered FIB updates applies
There are some events (for example a subsequent failure with conflicting repair requirements occurring before the ordered FIB process has completed) that cannot be correctly processed by this mechanism. In these cases it is necessary to ensure that convergence falls back to the conventional mode of operation (see Section 7 (Fall back to Conventional Convergence)).
In all cases it is necessary to wait some hold-down period after receiving the first notification to ensure that all routers have received the complete set of link state notifications associated with the single event.
At any time, if a link change notification is received which would have no effect on the receiving router's FIB, then it may be ignored.
If no other event is received during the hold-down time, the event is treated as a link event. Note that the reverse connectivity check means that only the first failure event, or the second up event, has an effect on the FIB.
If an event is received within the hold-down period which does NOT reference the common router (R), then in this version of the specification normal convergence is invoked immediately (see Section 7 (Fall back to Conventional Convergence)).
The sudden failure of a link or a set of links that are not protected using a FRR mechanism must be processed using the conventional mode of operation.
In summary, an ordered FIB process is applicable if and only if the set of link state notifications received between the first event and the end of the hold-down period reference a common router R, and one of the following assertions holds:
. The set of notifications refers to link down events concerning protected links and metric increase events.
. The set of notifications refers to link up events and metric decrease events.
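The summary rule can be expressed as a small predicate. The sketch below is hypothetical: the LinkChange record, its kind strings and its protected flag are names introduced for this example only, and a real implementation would derive this information from the received link-state packets and the local FRR configuration.

```python
from dataclasses import dataclass

UP_KINDS = {"up", "metric_decrease"}

@dataclass(frozen=True)
class LinkChange:
    near_end: str
    far_end: str
    kind: str                # "down", "metric_increase", "up" or "metric_decrease"
    protected: bool = False  # assumed flag: an FRR repair covers this link

def ofib_applicable(changes):
    """True if the notifications gathered in the hold-down period can be ordered."""
    if not changes:
        return False
    common = set.intersection(*({c.near_end, c.far_end} for c in changes))
    if not common:
        return False  # no common router R: fall back to normal convergence
    all_down = all(
        c.kind == "metric_increase" or (c.kind == "down" and c.protected)
        for c in changes
    )
    all_up = all(c.kind in UP_KINDS for c in changes)
    return all_down or all_up
```

A mix of up and down notifications, an unprotected link failure, or a set of notifications without a common router all make the predicate return False, which corresponds to falling back to conventional convergence.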
5.  Computation of the ordering
This section describes how the required ordering is computed.
5.1.  Link or Router Down or Metric Increase
To respect the proposed ordering, routers compute a rank that will be used to determine the time at which they are permitted to perform their FIB update. In the case of a failure event rooted at router Y or an increase of the metric of link X->Y, router R computes the reverse Shortest Path Tree in the topology before the failure (rSPT_OLD) rooted at Y. This rSPT gives the shortest paths to reach Y before the failure. The branch of the reverse SPT that is below R corresponds to the set of shortest paths to R that are used by the routers that reach Y via R.
The rank of router R is defined as the depth (in number of hops) of this branch. In the case of ECMP, the maximum depth of the ECMP path set is used.
Router R is required to update its FIB at time
T0 + H + rank * MAX_FIB
where T0 is the arrival time of the link-state packet containing the topology change, H is the hold-down time and MAX_FIB is a network-wide constant that reflects the maximum time required to update a FIB irrespective of the change required. The value of MAX_FIB is network specific and its determination is out of the scope of this document. This value must be agreed upon by all the routers in the network. This agreement can be reached by using a capability TLV as defined in [8].
All the routers that use R to reach Y will compute a lower rank than R, and hence the correct order will be respected. It should be noted that only the routers that used Y before the event need to compute their rank.
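As an illustration of the rank computation for down events, the following Python sketch computes the distances towards Y in the old topology, derives the branch of rSPT_OLD(Y) below each router, and evaluates T0 + H + rank * MAX_FIB. The function names, the adjacency-map representation and the timer values are assumptions chosen for the example. Metrics are assumed to be positive integers so that the equality test on path costs is exact.

```python
import heapq

def dist_to(graph, root):
    """Shortest distance from every router to `root` over directed metrics."""
    reverse = {}
    for a, nbrs in graph.items():
        for b, metric in nbrs.items():
            reverse.setdefault(b, {})[a] = metric
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for prev, metric in reverse.get(node, {}).items():
            if d + metric < dist.get(prev, float("inf")):
                dist[prev] = d + metric
                heapq.heappush(heap, (d + metric, prev))
    return dist

def down_rank(graph_old, root, router):
    """Depth, in hops, of the branch of rSPT_OLD(root) that hangs below `router`."""
    dist = dist_to(graph_old, root)
    below = {}  # below[v] = routers that use v as a next hop towards root
    for u, nbrs in graph_old.items():
        for v, metric in nbrs.items():
            if u != root and u in dist and v in dist and dist[u] == metric + dist[v]:
                below.setdefault(v, []).append(u)

    def depth(node):
        # With ECMP a router hangs below several parents; max() keeps the deepest.
        return max((1 + depth(child) for child in below.get(node, [])), default=0)

    return depth(router)

def update_time(t0, hold_down, rank, max_fib):
    """T0 + H + rank * MAX_FIB."""
    return t0 + hold_down + rank * max_fib

# Figure 1 before the failure of link X->Y (root Y).
FIGURE_1 = {
    "X": {"Y": 1, "S": 1},
    "Y": {"X": 1, "R": 1},
    "S": {"X": 1, "R": 2},
    "R": {"Y": 1, "S": 2},
}
ranks = {router: down_rank(FIGURE_1, "Y", router) for router in "SXR"}
```

In Figure 1, S reaches Y via X, so S is a leaf with rank 0 and X has rank 1. With the illustrative values H = 1.0 and MAX_FIB = 0.5, S updates at T0 + 1.0 and X at T0 + 1.5.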
5.2.  Link or Router Up or Metric Decrease
In the case of a link or router up event rooted at Y or a link metric decrease affecting link Y->W, a router R must have a rank that is higher than the rank of the routers that it will use to reach Y, according to the rule described in Section 3 (The required FIB update order). The rank of R is thus the number of hops between R and Y in its renewed Shortest Path Tree. When R has multiple equal cost paths to Y, the rank is the length in hops of the longest ECMP path to Y.
Router R is required to update its FIB at time
T0 + H + rank * MAX_FIB
It should be noted that only the routers that use Y after the event have to compute a rank, i.e. only the routers that have Y in their SPT after the link-state change.
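For up events the rank is a hop count in the new topology. The sketch below is a hedged illustration: the function names and the small ECMP topology are assumptions made for this example, and metrics are assumed to be positive integers.

```python
import heapq

def dist_to(graph, root):
    """Shortest distance from every router to `root` over directed metrics."""
    reverse = {}
    for a, nbrs in graph.items():
        for b, metric in nbrs.items():
            reverse.setdefault(b, {})[a] = metric
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for prev, metric in reverse.get(node, {}).items():
            if d + metric < dist.get(prev, float("inf")):
                dist[prev] = d + metric
                heapq.heappush(heap, (d + metric, prev))
    return dist

def up_rank(graph_new, root, router):
    """Hops from `router` to `root` in the new topology (longest ECMP path)."""
    dist = dist_to(graph_new, root)
    if router not in dist:
        return None  # root is not in this router's new SPT: no rank is needed

    def hops(node):
        if node == root:
            return 0
        return max(
            1 + hops(nbr)
            for nbr, metric in graph_new[node].items()
            if nbr in dist and dist[node] == metric + dist[nbr]
        )

    return hops(router)

# Figure 1 after link X-Y has been brought into service (root Y).
FIGURE_1 = {
    "X": {"Y": 1, "S": 1},
    "Y": {"X": 1, "R": 1},
    "S": {"X": 1, "R": 2},
    "R": {"Y": 1, "S": 2},
}
# Hypothetical topology: R reaches Y directly (1 hop) or via A (2 hops), both cost 2.
ECMP = {
    "R": {"Y": 2, "A": 1},
    "A": {"R": 1, "Y": 1},
    "Y": {"R": 2, "A": 1},
}
rank_x = up_rank(FIGURE_1, "Y", "X")
rank_s = up_rank(FIGURE_1, "Y", "S")
rank_ecmp = up_rank(ECMP, "Y", "R")
```

In Figure 1, X is one hop from Y and therefore updates before S, which is two hops away. In the ECMP topology R takes the rank of its longest equal-cost path.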
6.  Acceleration of Ordered Convergence
The mechanism described above is conservative, and hence may be relatively slow. The purpose of this section is to describe a method of accelerating the controlled convergence in such a way that ordered loop-free convergence is still guaranteed.
In many cases a router will complete its required FIB changes in a time much shorter than MAX_FIB and in many other cases, a router will not have to perform any FIB change at all.
This section describes the use of completion messages to speed up the convergence by providing a means for a router to inform the routers waiting for it that it has completed any required FIB changes. When a router has been advised of completion by all the routers for which it is waiting, it can safely update its own FIB without further delay. In most cases this can result in a sub-second re-convergence time comparable with that of normal convergence.
Routers maintain a waiting list of the neighbours from which a completion message must be received. Upon reception of a completion message from a neighbour, a router removes this neighbour from its waiting list. Once its waiting list becomes empty, the router is allowed to update its FIB immediately even if its ranking timer has not yet expired. Once this is done, the router sends a completion message to the neighbours that are waiting for it to complete. Those routers are listed in a list called the Notification List. Completion messages contain an identification of the event to which they refer.
Note that, since this is only an optimization, any loss of completion messages will result in the routers waiting their defined ranking time and hence the loop-free properties will be preserved.
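The interaction between the waiting list and the ranking timer can be sketched as follows. The class PendingUpdate and its method names are hypothetical and exist only for this illustration. The sketch records the FIB update as a flag and returns the neighbours to which a completion message would be sent.

```python
class PendingUpdate:
    """Tracks one router's ordered FIB update for a single event."""

    def __init__(self, waiting, notification, deadline):
        self.waiting = set(waiting)            # neighbours we still wait for
        self.notification = set(notification)  # neighbours waiting for us
        self.deadline = deadline               # T0 + H + rank * MAX_FIB
        self.done = False

    def _update_fib(self):
        """Mark the FIB as updated and return the neighbours to notify."""
        self.done = True
        return sorted(self.notification)

    def on_completion(self, sender):
        """Handle a completion message; returns neighbours to notify, if any."""
        self.waiting.discard(sender)
        if not self.done and not self.waiting:
            return self._update_fib()
        return []

    def on_clock(self, now):
        """Ranking-timer check: update once the deadline has passed, even if
        some completion messages were lost."""
        if not self.done and now >= self.deadline:
            return self._update_fib()
        return []
```

If the completion message from the last awaited neighbour arrives before the deadline, the update happens early. If it is lost, on_clock still triggers the update at the ranking time, which is why the loop-free property does not depend on message delivery.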
6.1.  Construction of the waiting list and notification list
6.1.1.  Down events
Consider a link or node down event rooted at router Y or the cost increase of the link X->Y. A router R will compute rSPT_OLD(Y) to determine its rank. When doing this, R also computes the set of neighbors that R uses to reach the failing node or link, and the set of neighbors that are using R to reach the failing node or link. The Notification list of R is equal to the former set and the Waiting list of R is equal to the latter.
Note that R could include all its neighbors except those in the Waiting list in the Notification list. This has no impact on the correctness of the protocol, but would be unnecessarily inefficient.
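A possible construction of the two lists for a down event is sketched below. It assumes that the distances towards Y in the old topology are already available from the rank computation. The function name and the dictionary inputs are assumptions of this example.

```python
def down_event_lists(graph_old, dist_old, root, router):
    """Waiting and Notification lists of `router` for a down event rooted at `root`.

    graph_old: {router: {neighbour: metric}} before the event.
    dist_old:  {router: distance to root} in that topology (from rSPT_OLD(root)).
    """
    def uses(u, v):
        # True if a shortest path from u to root goes through neighbour v.
        return u != root and dist_old[u] == graph_old[u][v] + dist_old[v]

    neighbours = graph_old[router]
    notification = {n for n in neighbours if uses(router, n)}
    waiting = {n for n in neighbours if router in graph_old[n] and uses(n, router)}
    return waiting, notification

# Figure 1, failure of link X->Y (root Y); distances as in the old topology.
FIGURE_1 = {
    "X": {"Y": 1, "S": 1},
    "Y": {"X": 1, "R": 1},
    "S": {"X": 1, "R": 2},
    "R": {"Y": 1, "S": 2},
}
DIST_OLD = {"Y": 0, "X": 1, "R": 1, "S": 2}
lists_x = down_event_lists(FIGURE_1, DIST_OLD, "Y", "X")
lists_s = down_event_lists(FIGURE_1, DIST_OLD, "Y", "S")
```

In this example S waits for nobody and notifies X, while X waits for S before it updates its own FIB.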
6.1.2.  Up Events
Consider a link or node up event rooted at router Y or the cost decrease of the link Y->X. A router R will compute its new SPT (SPT_new(R)). The Waiting list is the set of nexthop routers that R uses to reach Y in SPT_new(R).
In a simple implementation the notification list of R is all the neighbours of R excluding those in the Waiting list. This may be further optimized by computing rSPT_new(Y) to determine those routers that are waiting for R to complete.
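Both variants of the up-event lists can be sketched in a few lines. The function names are hypothetical, and the distances towards Y in the new topology are assumed to be available from the rank computation.

```python
def up_event_lists(graph_new, dist_new, root, router):
    """Waiting and simple Notification lists of `router` for an up event at `root`."""
    waiting = {
        nbr
        for nbr, metric in graph_new[router].items()
        if router != root and dist_new[router] == metric + dist_new[nbr]
    }
    # Simple variant: notify every neighbour that is not in the Waiting list.
    notification = set(graph_new[router]) - waiting
    return waiting, notification

def up_notification_optimised(graph_new, dist_new, root, router):
    """Optimised variant: only neighbours that use `router` to reach `root`."""
    return {
        nbr
        for nbr in graph_new[router]
        if nbr != root
        and router in graph_new[nbr]
        and dist_new[nbr] == graph_new[nbr][router] + dist_new[router]
    }

# Figure 1 after link X-Y has been brought into service (root Y).
FIGURE_1 = {
    "X": {"Y": 1, "S": 1},
    "Y": {"X": 1, "R": 1},
    "S": {"X": 1, "R": 2},
    "R": {"Y": 1, "S": 2},
}
DIST_NEW = {"Y": 0, "X": 1, "R": 1, "S": 2}
lists_s = up_event_lists(FIGURE_1, DIST_NEW, "Y", "S")
lists_x = up_event_lists(FIGURE_1, DIST_NEW, "Y", "X")
```

With the simple variant S notifies R although R does not wait for S. The optimised variant removes such unnecessary completion messages.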
6.2.  Format of Completion Messages
The format of completion messages and the means of their delivery are routing protocol dependent and are outside the scope of this document. An encoding of completion messages for IS-IS is proposed in [2].
The following information is required:
. Identity of the sender.
. A list of routing notifications being considered in the associated FIB change. Each notification is defined as:
   . Node ID of the near end of the link
   . Node ID of the far end of the link
   . Old Metric
   . New Metric
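The information listed above can be modelled with two small records. This is only an in-memory sketch: the class and field names are assumptions, the metric values are arbitrary, and the wire encoding remains protocol specific and out of scope.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RoutingNotification:
    near_end: str     # Node ID of the near end of the link
    far_end: str      # Node ID of the far end of the link
    old_metric: int
    new_metric: int

@dataclass(frozen=True)
class CompletionMessage:
    sender: str
    notifications: Tuple[RoutingNotification, ...]

    def event_id(self):
        """Order-independent identification of the event the message refers to."""
        return frozenset(self.notifications)

change = RoutingNotification("X", "Y", old_metric=1, new_metric=10)
from_s = CompletionMessage("S", (change,))
from_r = CompletionMessage("R", (change,))
```

Two completion messages from different senders that list the same notifications share one event identifier, which lets a receiver match them against the waiting list of the corresponding event.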
7.  Fall back to Conventional Convergence
In circumstances where a router detects that it is dealing with incomplete or inconsistent link state information, or when a further topology event is received before completion of the current ordered FIB update process, it may be expedient to abandon the controlled convergence process. Fall back mechanisms are investigated in [9]. The state machine defined in this version of the draft does not make an assumption on which fall back mechanism will be used.
8.  oFIB state machine
An oFIB-capable router maintains an oFIB state value which can be one of: OFIB_STABLE, OFIB_HOLDING_DOWN, OFIB_HOLDING_UP, OFIB_ABANDONED, OFIB_ONGOING.
An oFIB-capable router maintains a timer, Hold_down_timer. An oFIB-capable router is configured with a value referred to as HOLD_DOWN_DURATION. This configuration can be performed manually or using [8].
An oFIB-capable router maintains a timer, rank_timer.
8.1.  OFIB_STABLE
OFIB_STABLE is the state of a router which is not currently involved in any convergence process. This router is ready to process an event by applying oFIB.
EVENT : Reception of a link-state packet describing an event of the type link X--Y down or metric increase to be processed using oFIB.
ACTION :
Set state to OFIB_HOLDING_DOWN.
Start Hold_down_timer.
ofib_current_common_set = {X,Y}
Compute rank with respect to the event, as defined in Section 5 (Computation of the ordering).
Store Waiting List and Notification List for X--Y obtained from the rank computation.
EVENT : Reception of a link-state packet describing an event of the type link X--Y up or metric decrease to be processed using oFIB.
ACTION :
Set state to OFIB_HOLDING_UP.
Start Hold_down_timer.
ofib_current_common_set = {X,Y}
Compute rank with respect to the event, as defined in Section 5 (Computation of the ordering).
Store Waiting List and Notification List for X--Y obtained from the rank computation.
8.2.  OFIB_HOLDING_DOWN
OFIB_HOLDING_DOWN is the state of a router that is collecting a set of link down or metric increase link-state packets to be processed together using controlled convergence.
EVENT : Reception of a link-state packet describing an event of the type link up or metric decrease which in itself can be processed using oFIB.
ACTION :
Set state to OFIB_ABANDONED.
Reset Hold_down_timer.
Trigger AAH mechanism
EVENT : Reception of a link-state packet describing an event of the type link A--B down or metric increase which in itself can be processed using oFIB.
ACTION :
ofib_current_common_set = intersection(ofib_current_common_set,{A,B}).
If ofib_current_common_set is empty, then there is no longer a node in common in all the pending link-state changes:
   Set state to OFIB_ABANDONED.
   Reset Hold_down_timer.
   Trigger AAH mechanism.
If ofib_current_common_set is not empty, update waiting list and notification list as defined in Section 5 (Computation of the ordering). Note that in the case of a single link event, the link-state packet received when the router is in this state describes the state change of the other direction of the link, hence no changes will be made to the waiting and notification lists.
EVENT : Hold_down_timer expires.
ACTION :
Set state to OFIB_ONGOING.
Start rank_timer with computed rank.
EVENT : Reception of a completion message
ACTION : Remove the sender from waiting list associated with the event identified in the completion message.
8.3.  OFIB_HOLDING_UP
OFIB_HOLDING_UP is the state of a router that is collecting a set of link up or metric decrease link-state packets to be processed together using controlled convergence.
EVENT : Reception of a link-state packet describing an event of the type link down or metric increase to be processed using oFIB.
ACTION :
Set state to OFIB_ABANDONED.
Reset Hold_down_timer.
Trigger AAH mechanism.
EVENT : Reception of a link-state packet describing an event of the type link A--B up or metric decrease to be processed using oFIB.
ACTION :
ofib_current_common_set = intersection(ofib_current_common_set,{A,B}).
If ofib_current_common_set is empty, then there is no longer a common node in the set of pending link-state changes:
   Set state to OFIB_ABANDONED.
   Reset Hold_down_timer.
   Trigger AAH mechanism.
If ofib_current_common_set is not empty, update waiting list and notification list as defined in Section 5 (Computation of the ordering). Note that in the case of a single link event, the link-state packet received when the router is in this state describes the state change of the other direction of the link, hence no changes will be made to the waiting and notification lists.
EVENT : Reception of a completion message
ACTION : Remove the sender from the waiting list associated with the event identified in the completion message.
EVENT : Hold_down_timer expires.
ACTION :
Set state to OFIB_ONGOING.
Start rank_timer with computed rank.
8.4.  OFIB_ONGOING
OFIB_ONGOING is the state of a router that is applying the ordering mechanism with respect to the set of LSPs collected when in OFIB_HOLDING_DOWN or OFIB_HOLDING_UP state.
EVENT : rank_timer expires or waiting list becomes empty.
ACTION :
Perform FIB updates according to the change.
Send completion message to each member of the notification list.
Set State to OFIB_STABLE.
EVENT : Reception of a completion message
ACTION : Remove the sender from the waiting list.
EVENT : Reception of a link-state packet describing a link state change event.
ACTION :
Set state to OFIB_ABANDONED.
Trigger AAH.
Start Hold_down_timer.
8.5.  OFIB_ABANDONED
OFIB_ABANDONED is the state of a router that has fallen back to fast convergence due to the reception of link-state packets that cannot be dealt with together using oFIB.
EVENT : Reception of a link-state packet describing a link-state change event.
ACTION : Trigger AAH, reset Hold_down_timer.
EVENT : Hold_down_timer expires.
ACTION : Set state to OFIB_STABLE
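The transitions of Sections 8.1 to 8.5 can be summarised in a compact sketch. The class OfibStateMachine and its method names are hypothetical. Side effects such as timers, the AAH mechanism and FIB updates are only recorded in a log, completion message handling is omitted, and every link-state change passed in is assumed to be one that oFIB can process in itself.

```python
from enum import Enum

class State(Enum):
    STABLE = "OFIB_STABLE"
    HOLDING_DOWN = "OFIB_HOLDING_DOWN"
    HOLDING_UP = "OFIB_HOLDING_UP"
    ONGOING = "OFIB_ONGOING"
    ABANDONED = "OFIB_ABANDONED"

HOLDING = {"down": State.HOLDING_DOWN, "up": State.HOLDING_UP}

class OfibStateMachine:
    def __init__(self):
        self.state = State.STABLE
        self.common_set = set()  # ofib_current_common_set
        self.log = []            # side effects, recorded instead of executed

    def _abandon(self):
        self.state = State.ABANDONED
        self.log += ["trigger AAH", "restart Hold_down_timer"]

    def on_link_state_change(self, a, b, direction):
        """direction: "down" (down / metric increase) or "up" (up / metric decrease)."""
        if self.state is State.STABLE:
            self.state = HOLDING[direction]
            self.common_set = {a, b}
            self.log += ["start Hold_down_timer", "compute rank and lists"]
        elif self.state is HOLDING[direction]:
            self.common_set &= {a, b}
            if self.common_set:
                self.log.append("update waiting and notification lists")
            else:
                self._abandon()  # no router in common any more
        else:
            # Opposite direction while holding, or any change while
            # OFIB_ONGOING or OFIB_ABANDONED.
            self._abandon()

    def on_hold_down_expiry(self):
        if self.state in (State.HOLDING_DOWN, State.HOLDING_UP):
            self.state = State.ONGOING
            self.log.append("start rank_timer")
        elif self.state is State.ABANDONED:
            self.state = State.STABLE

    def on_rank_expiry_or_waiting_list_empty(self):
        if self.state is State.ONGOING:
            self.log += ["update FIB", "send completion messages"]
            self.state = State.STABLE
```

A router failure reported as the link events Y--A down and Y--B down keeps Y in the common set, so the machine proceeds through OFIB_HOLDING_DOWN and OFIB_ONGOING back to OFIB_STABLE. Two unrelated link events, or an up event received while holding for down events, lead to OFIB_ABANDONED.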
9.  IANA considerations
There are no IANA considerations which arise from this document. Any such considerations will be called out in protocol specific documents such as [8] and [2].
10.  Security considerations
This draft requires only minor modifications to existing routing protocols and therefore does not add significant additional security risks. However a full security analysis would need to be provided within the protocol specific specifications proposed for deployment.
11.  Acknowledgments
We would like to thank Jean-Philippe Vasseur for his useful suggestions and comments.
12.  Informative References
[1]  International Organization for Standardization, “Intermediate system to Intermediate system intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode Network Service (ISO 8473),” ISO/IEC 10589:2002, Second Edition, Nov 2002.
[2]  Bonaventure, O., “ISIS extensions for ordered FIB updates,” draft-bonaventure-isis-ordered-00 (work in progress), February 2006.
[3]  P. Francois and O. Bonaventure, “Avoiding transient loops during IGP convergence in IP Networks,” in IEEE/ACM Transactions on Networking, http://inl.info.ucl.ac.be/publications, December 2007.
[4]  Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997.
[5]  Moy, J., “OSPF Version 2,” STD 54, RFC 2328, April 1998.
[6]  Pan, P., Swallow, G., and A. Atlas, “Fast Reroute Extensions to RSVP-TE for LSP Tunnels,” RFC 4090, May 2005.
[7]  Shand, M. and S. Bryant, “IP Fast Reroute Framework,” RFC 5714, January 2010.
[8]  K, A. and S. Bryant, “Synchronisation of Loop Free Timer Values,” draft-atlas-bryant-shand-lf-timers-04 (work in progress), February 2008.
[9]  Shand, M., Bryant, S., and P. Francois, “Mechanisms for safely abandoning loop-free convergence (AAH),” draft-bryant-francois-shand-ipfrr-aah-01 (work in progress), October 2008.
Authors' Addresses

Pierre Francois
Universite catholique de Louvain
Place Ste Barbe, 2
Louvain-la-Neuve 1348
BE
URI: http://inl.info.ucl.ac.be/

Olivier Bonaventure
Universite catholique de Louvain
Place Ste Barbe, 2
Louvain-la-Neuve 1348
BE
URI: http://inl.info.ucl.ac.be/

Mike Shand
Cisco Systems
Green Park, 250, Longwater Avenue,
Reading RG2 6GB
UK
Email: mshand@cisco.com

Stewart Bryant
Cisco Systems
Green Park, 250, Longwater Avenue,
Reading RG2 6GB
UK
Email: stbryant@cisco.com

Stefano Previdi
Cisco Systems
Via Del Serafico 200
00142 Roma
Italy
Email: sprevidi@cisco.com

Clarence Filsfils
Cisco Systems
Brussels,
Belgium
Email: cfilsfil@cisco.com