TOC 
Network Working GroupK. Ogawa
Internet-DraftNTT Corporation
Intended status: Standards TrackW. Wang
Expires: November 22, 2010Zhejiang Gongshang University
 E. Haleplidis
 University of Patras
 May 21, 2010


ForCES Intra-NE High Availability
draft-ogawa-forces-ceha-00

Abstract

This document discusses CE High Availability within a ForCES NE.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on November 22, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



Table of Contents

1.  Definitions
2.  Introduction
    2.1.  Document Scope
    2.2.  Quantifying Problem Scope
3.  CE HA Framework
    3.1.  Current CE High Availability Support
        3.1.1.  Cold Standby Interaction with ForCES Protocol
        3.1.2.  Responsibilities for HA
4.  CE HA Hot Standby
5.  CE Fr Interface Communication
    5.1.  Basic Scope for Fr Interface
        5.1.1.  Fr Interface Operational Approach
        5.1.2.  Fr Interface Liveliness Protocol
        5.1.3.  Fr Interface Data Synchronization
        5.1.4.  Fr Interface Election
6.  Contributors
7.  IANA Considerations
8.  Security Considerations
9.  References
    9.1.  Normative References
    9.2.  Informative References
§  Authors' Addresses




 TOC 

1.  Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

The following definitions are taken from [RFC3654] (Khosravi, H. and T. Anderson, “Requirements for Separation of IP Control and Forwarding,” November 2003.)and [RFC3746] (Yang, L., Dantu, R., Anderson, T., and R. Gopal, “Forwarding and Control Element Separation (ForCES) Framework,” April 2004.):

Logical Functional Block (LFB) -- A template that represents a fine-grained, logically separate aspects of FE processing.

ForCES Protocol -- The protocol used at the Fp reference point in the ForCES Framework in [RFC3746] (Yang, L., Dantu, R., Anderson, T., and R. Gopal, “Forwarding and Control Element Separation (ForCES) Framework,” April 2004.).

ForCES Protocol Layer (ForCES PL) -- A layer in the ForCES architecture that embodies the ForCES protocol and the state transfer mechanisms as defined in [RFC5810] (Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang, W., Dong, L., Gopal, R., and J. Halpern, “Forwarding and Control Element Separation (ForCES) Protocol Specification,” March 2010.).

ForCES Protocol Transport Mapping Layer (ForCES TML) -- A layer in ForCES protocol architecture that specifically addresses the protocol message transportation issues, such as how the protocol messages are mapped to different transport media (like SCTP, IP, TCP, UDP, ATM, Ethernet, etc), and how to achieve and implement reliability, security, etc.



 TOC 

2.  Introduction



Figure 1 (ForCES Architecture) illustrates a ForCES NE controlled by a set of redundant CEs with CE1 being active and CE2 and CEn-1 being a backup.

                        -----------------------------------------
                        | ForCES Network Element                |
                        |                        +-----------+  |
                        |                        |  CEn-1    |  |
                        |                        |  (Backup) |  |
  --------------   Fc   | +------------+      +------------+ |  |
  | CE Manager |--------+-|     CE1    |------|    CE2     |-+  |
  --------------        | |  (Active)  |  Fr  |  (Backup)  |    |
        |               | +-------+--+-+      +---+---+----+    |
        | Fl            |         |  |    Fp      /   |         |
        |               |         |  +---------+ /    |         |
        |               |       Fp|            |/     |Fp       |
        |               |         |            |      |         |
        |               |         |      Fp   /+--+   |         |
        |               |         |  +-------+    |   |         |
        |               |         |  |            |   |         |
  --------------    Ff  | --------+--+--      ----+---+----+    |
  | FE Manager |--------+-|     FE1    |  Fi  |     FE2    |    |
  --------------        | |            |------|            |    |
                        | --------------      --------------    |
                        |   |  |  |  |          |  |  |  |      |
                        ----+--+--+--+----------+--+--+--+-------
                            |  |  |  |          |  |  |  |
                            |  |  |  |          |  |  |  |
                              Fi/f                   Fi/f

       Fp: CE-FE interface
       Fi: FE-FE interface
       Fr: CE-CE interface
       Fc: Interface between the CE Manager and a CE
       Ff: Interface between the FE Manager and an FE
       Fl: Interface between the CE Manager and the FE Manager
       Fi/f: FE external interface
 Figure 1: ForCES Architecture 

The ForCES architecture allows FEs to be aware of multiple CEs but enforces that only one CE be the master controller. This is known in the industry as 1+N redundancy [refxxxx]. The master CE controls the FEs via the ForCES protocol operating in the Fp interface. If the master CE becomes faulty, a backup CE takes over and NE operation continues. By definition, the current documented setup is known as cold-standby [refxxxx]. The CE set is static and is passed to the FE by the FE Manager (FEM) via the Ff interface and to each CE by the CE Manager (CEM) in the Fc interface during the pre-association phase.

From an FE perspective, the knobs of control for a CE set are defined by the FEPO LFB in [RFC5810] (Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang, W., Dong, L., Gopal, R., and J. Halpern, “Forwarding and Control Element Separation (ForCES) Protocol Specification,” March 2010.), Appendix B. Section 3.1 (Current CE High Availability Support) details these knobs further.



 TOC 

2.1.  Document Scope

By current definition, the Fr interface is out of scope for the ForCES architecture. However, it is expected that organizations implementing a set of CEs may need to have the CEs communicate to each other via the Fr interface in order to achieve the synchronization necessary for controlling the FEs.

The problem scope addressed by this document falls into 3 areas:

  1. To describe with more clarity (than [RFC5810] (Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang, W., Dong, L., Gopal, R., and J. Halpern, “Forwarding and Control Element Separation (ForCES) Protocol Specification,” March 2010.)) how current cold-standby approach operates within the NE cluster.
  2. To describe how to evolve the cold-standby setup to a hot-standby redundancy setup so as to improve the failover time and NE availability.
  3. To describe a minimalist approach for Fr plane communication which interacting CEs MAY use for both cold and hot standby.



 TOC 

2.2.  Quantifying Problem Scope

The NE recovery and availability is dependent on several time-sensitive metrics:

  1. How fast the CE plane failure is detected.
  2. How fast a backup CE becomes operational.
  3. How fast the FEs associate with the new master CE.
  4. How fast the FEs recover their state and become operational.

The design goals of the current [RFC5810] (Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang, W., Dong, L., Gopal, R., and J. Halpern, “Forwarding and Control Element Separation (ForCES) Protocol Specification,” March 2010.) choices to meet the above goals are driven by desire for simplicity.

To quantify the above criteria with the current prescribed ForCES CE setup:

  1. How fast the CE side detects a CE failure is left undefined. To illustrate an extreme scenario, we could have a human operator acting as the monitoring entity to detect faulty CEs. How fast such detection happens could be in the range of seconds to days. A more active monitor on the Fr interface could improve this detection. In Section 5 (CE Fr Interface Communication) we define a behavior on Fr interface to detect CE failures in order to improve things.
  2. How fast the backup CE becomes operational is also currently out of scope. In the current setup, a backup CE need not be operational at all (for example, to save power) and therefore it is feasible for a monitoring entity to boot up a backup CE after it detects the failure of the master CE. In this document Section 4 (CE HA Hot Standby) we suggest that at least one backup CE be online so as to improve this metric.
  3. How fast an FE associates with new master CE is also currently undefined. The cost of an FE connecting and associating adds to the recovery overhead. As mentioned above we suggest having at least one backup CE online. In Section 4 (CE HA Hot Standby) we propose to zero out the connection and association cost on failover by having each FE associate with all online backup CEs after associating to the active CE. Note that if an FE pre-associates with backup CEs, then the system will be technically operating in hot-standby mode.
  4. And last: How fast an FE recovers its state depends on how much NE state exists. By ForCES current definition, the new master CE assumes zero state on the FE and starts from scratch to update the FE. So the larger the state, the longer the recovery. In Section 5 (CE Fr Interface Communication) we propose to improve this metric by having the master CE and backup CEs synchronizing in the Fr plane.



 TOC 

3.  CE HA Framework

To achieve CE High Availability, FEs and CEs MUST inter-operate per [RFC5810] (Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang, W., Dong, L., Gopal, R., and J. Halpern, “Forwarding and Control Element Separation (ForCES) Protocol Specification,” March 2010.) definition which is repeated for contextual reasons in Section 3.1 (Current CE High Availability Support). It should be noted that in this default setup, which MUST be implemented by CEs and FEs needing HA, the Fr plane is out of scope (and if available is proprietary to an implementation).



 TOC 

3.1.  Current CE High Availability Support

As mentioned earlier, there can be multiple redundant CEs controlling FEs in a ForCES NE (although in practice there may be only one backup CE). At any one time only one master CE can control the FEs. In addition, the FE connects and associates to only the master CE. The FE and the CE PL are aware of the primary and secondary CEs. This information (primary, secondary CEs) is configured on the FE and the CE PLs during pre-association by the FEM and the CEM respectively.

Figure 2 (CE Failover for Cold Standby) below illustrates the Forces message sequences that the FE uses to recover the connection in current defined cold-standby scheme.





      FE                   CE Primary        CE Secondary
      |                       |                    |
      |  Asso Estb,Caps exchg |                    |
    1 |<--------------------->|                    |
      |                       |                    |
      |       state update    |                    |
    2 |<--------------------->|                    |
      |                       |                    |
      |                       |                    |
      |                   FAILURE                  |
      |                                            |
      |         Asso Estb,Caps exchange            |
    3 |<------------------------------------------>|
      |                                            |
      |              Event Report (pri CE down)    |
    4 |------------------------------------------->|
      |                                            |
      |         state update from scratch          |
    5 |<------------------------------------------>|

 Figure 2: CE Failover for Cold Standby 



 TOC 

3.1.1.  Cold Standby Interaction with ForCES Protocol

High Availability parameterization in an FE is driven by configuring the FE Protocol Object (FEPO) LFB.

The FEPO CEID component identifies the current master CE and the component table BackupCEs identifies the backup CEs. The FEPO FE Heartbeat Interval, CE Heartbeat Dead Interval, and CE Heartbeat policy help in detecting connectivity problems between an FE and CE. The CE Failover policy defines how the FE should react on a detected failure.

Figure 3 (FE State Machine considering HA) illustrates the defined state machine that facilitates connection recovery.

The FE connects to the CE specified on FEPO CEID component. If it fails to connect to the defined CE, it moves it to the bottom of table BackupCEs and sets its CEID component to be the first CE retrieved from table BackupCEs. The FE then attempts to associate with the CE designated as the new primary CE. The FE continues through this procedure until it successfully connects to one of the CEs.




    (CE issues Teardown ||    +-----------------+
       Lost association) &&   | Pre-Association |
      CE failover policy = 0  | (Association    |
          +------------------>|   in            |<----+
          |                   | progress)       |     |
          |   CE Issues       +--------+--------+     |
          |   Association              |              | CFTI
          |   Setup Response = Success |              | timer
          |     +----------------------+              | expires
          |     |                                     |
          |     V                                     |
        +-+-----------+                          +----+--------+
        |             |                          |  Not        |
        |             |  (CE issues Teardown ||  |  Associated |
        |             |    Lost association) &&  |             |
        | Associated  |  CE Failover Policy = 1  | (May        |
        |             |                          | Continue    |
        |             +------------------------->|  Forwarding)|
        |             |                          |             |
        +-------------+                          +-----+-------+
             ^                                         |
             |                                         |
             |            CE Issues                    |
             |            Association                  |
             |            Setup Response = Success     |
             +-----------------------------------------+

 Figure 3: FE State Machine considering HA 

When communication fails between the FE and CE (which can be caused by either the CE or link failure but not FE related), either the TML on the FE will trigger the FE PL regarding this failure or it will be detected using the HB messages between FEs and CEs. The communication failure, regardless of how it is detected, MUST be considered as a loss of association between the CE and corresponding FE.

If the FE's FEPO CE Failover Policy is configured to mode 0 (the default), it will immediately transition to the pre-association phase. This means that if association is again established, all FE state will need to be re-established.

If the FE's FEPO CE Failover Policy is configured to mode 1, it indicates that the FE is capable of HA restart recovery. In such a case, the FE transitions to the not associated state and the CEFTI timer is started. The FE MAY continue to forward packets during this state. It MAY also recycle through any configured backup CEs in a round-robin fashion. It first adds its primary CE to the bottom of table BackupCEs and sets its CEID component to be the first secondary retrieved from table BackupCEs. The FE then attempts to associate with the CE designated as the new primary CE. If it fails to re-associate with any CE and the CEFTI expires, the FE then transitions to the pre-association state.

If the FE, while in the not associated state, manages to reconnect to a new primary CE before CEFTI expires it transitions to the Associated state. Once re-associated, the FE tries to recover any state that may have been lost during the not associated state. How the FE achieves to re-synchronize its state is out of scope for the current ForCES architecture.

An explicit message (a Config message setting Primary CE component in ForCES Protocol object) from the primary CE, can also be used to change the Primary CE for an FE during normal protocol operation.

Also note that the FEs in a ForCES NE could also use a multicast CE ID, i.e., they could be associated with a group of CEs (this assumes the use of a CE-CE synchronization protocol, which is out of scope for this specification). In this case, the loss of association would mean that communication with the entire multicast group of CEs has been lost. The mechanisms described above will apply for this case as well during the loss of association. If, however, the secondary CE was also using the multicast CE ID that was lost, then the FE will need to form a new association using a different CE ID. If the capability exists, the FE MAY first attempt to form a new association with original primary CE using a different non multicast CE ID.



 TOC 

3.1.2.  Responsibilities for HA

XXX: we may remove this section (not much value to overall discussion)

TML Level:

  1. The TML controls logical connection availability and failover.
  2. The TML also controls peer HA management.

At this level, control of all lower layers, for example transport level (such as IP addresses, MAC addresses etc) and associated links going down are the role of the TML.

PL Level:
All other functionality, including configuring the HA behavior during setup, the CE IDs used to identify primary and secondary CEs, protocol messages used to report CE failure (Event Report), Heartbeat messages used to detect association failure, messages to change the primary CE (Config), and other HA related operations described before, are the PL responsibility.

To put the two together, if a path to a primary CE is down, the TML would take care of failing over to a backup path, if one is available. If the CE is totally unreachable then the PL would be informed and it would take the appropriate actions described before.



 TOC 

4.  CE HA Hot Standby

In this section we make some small extensions to the existing scheme to enable it to achieve hot standby HA. With these suggested changes we achieve some of the goals defined in Section 2.2 (Quantifying Problem Scope), namely:

As described in Section 3.1 (Current CE High Availability Support), the FEM configures the FE to make it aware of all the CEs in the NE. The FEM also configures the FE to make it aware of which CE is the master and which are backup(s). The FE's FEPO LFB CEID component identifies the current master CE and table BackupCEs identifies the backup CEs. The FE only connects to the master CE and then proceeds to associate with it. The master thereafter controls the FE and receives events from it. This continues until there is communication failure between the FE and CE at which point the FE attempts to connect to a CE from the BackupCEs table until it succeeds to connect and associate with one listed CE.

It is recommended that at least one backup CE should be online. Doing so will improve how fast the backup CE will take to be operational (as opposed to bringing up a backup CE when we detect a master CE fault). If we assume that a CE implementation does state synchronization between CEs (proprietary or as discussed in Section 5 (CE Fr Interface Communication)), then we can zero out the cost of making the backup CE operational and ready to serve FEs; in such a case an associating FE could immediately become operational.

If we assume the presence of at least one backup CE online, we can improve how fast the FEs associate with a new master CE by making two changes:

The first change that needs to be made is to have the FE, soon after successfully connecting and associating with the master CE, to proceed and connect as well as associate with the rest of the CEs listed in the BackupCEs table.

By virtue of having multiple CE connections, the FE switchover to a new master CE will be relatively much faster. The overall effect is improving the NE recovery time in case of communication failure or faults of the master CE.

The second change is to have the FE respond to messages issued by any CE (including a backup CE) it is associated with. This keeps the FE simple and as dumb as it is in the current definition.

Again for the sake of simplicity, asynchronous events and packet redirects continue to be sent only to the master CE. XXXX: We need to rethink perhaps and discuss possibility of events being sent to ALLCEIDs CEID (which the TML can translate to mean send-to-all-online-CES).

XXXX: We need to have an extra state for each CE (master, connected, associated, stats etc) on the FEPO - so probably another change to current FEPO components.

XXXX: What about FEs each assuming a different master CE - is that a problem? It doesnt seem to be because what matters is how the CEs agree between themselves who the master is. The FE responds to all CEs.

XXXX: What other kind of traffic needs to be running between FE and backup CEs? Heartbeats?



 TOC 

5.  CE Fr Interface Communication

In this section, we define activities in the Fr interface in order to achieve the other two goals defined Section 2.2 (Quantifying Problem Scope)



 TOC 

5.1.  Basic Scope for Fr Interface

In the Fr plane we expect to see liveliness detection and configuration.

In the case of a fault of a master CE being detected by liveliness, we expect there is going to be an election to choose a new master CE.

It is also expected that the master CE will be updating the backup CEs via configuration on any necessary NE state changes.

Our goal is to keep the Fr interface simple. For this reason, our scope is not very ambitious and tries as much as possible to maintain current ForCES architecture:

In this section, we start by assuming the ForCES architecture (protocol and model) and then extend it when necessary.



 TOC 

5.1.1.  Fr Interface Operational Approach

Each CE on bootup knows the NE CE set as configured by the CEM. This static approach greatly simplifies discovery. It is expected in most operational setups, there will be one active and one backup CE.

Each backup CE does a ForCES association to the listed master CE.

The master CE updates backup CEs with configuration necessary to mantain ForCES related NE state.



 TOC 

5.1.2.  Fr Interface Liveliness Protocol

The ForCES protocol already has built-in heartbeats for liveliness detection. If we define a CEPO LFB, in the same spirit as the FEPO LFB, it should be sufficient to have ForCES act as the liveliness protocol in the Fr plane.

XXX: We need to be very clear on what is needed and reused from ForCES protocol. XXX: What details does the CEPO carry? Example that seems to make sense: What CE type (eg master/slave), Status (connected etc), Connectivity parameters, Dead intervals etc



 TOC 

5.1.3.  Fr Interface Data Synchronization

Most existing NE implementations in the industry run some hot standby proprietary scheme. They synchronize many things using such a scheme. Example they keep protocol state of things like OSPF, BGP, IKE etc. We dont want to do that.

We focus on a scope that specifies only the need to migrate state and maybe configuration that is maintained by the CE on behalf of CE-FE plane. Not anything else. To be specific: A master CE synchronizes to backup CEs any state updates that happen on the CE-FE plane that it controls.

One challenge that will require an extension to the ForCES protocol is on how to communicate (from the master CE to a backup CE) the details about an LFB component state change that happened in a specific FE.

We propose to introduce a new protocol TLV at the same hierarchy level as LFB selector. Operationally, this TLV will define that a set of state changes that happened apply to a specific FE. For this reason it will encompass the FEID on which the update happened on. We call it the applies-to TLV.

Lets say an update has happened (or depending on update scheme needs to happen) on FE z, LFB-a/instance-b/path-c from the controlling CE x, then the synchronization method to backup CE y will be in the form of a config message from master CE x to backup CE y that will have a message source CEID of x and destination CEID of y. The applies-to TLV will contain FEID z. The rest of the message will be exactly as if the CE x had sent a config message to FE z and will contain the path LFB-a/instance-b/path-c

XXX: Refer to IETF 77 presentation slide 11 for choices on how to do CECE synchronization in conjunction with FEs. The consensus seems to lean on the second scheme..



 TOC 

5.1.4.  Fr Interface Election

Upon failure detection of the master CE, a very simple election occurs. The CE with the lowest CEID wins. Operationally, all CEs associate to the next lowest CEID. This is easy to execute since the static CE list never changes.

XXX: Optimize - the master CE could keep tabs on which backup CEs are alive and update the associated CEs CEPO table with status info so this way if the next lowest CE is not alive, theres no point in connecting to it when the master fails...



 TOC 

6.  Contributors

Jamal Hadi Salim has contributed to discussions that created this document.



 TOC 

7.  IANA Considerations

TBA



 TOC 

8.  Security Considerations

TBA



 TOC 

9.  References



 TOC 

9.1. Normative References

[RFC5810] Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang, W., Dong, L., Gopal, R., and J. Halpern, “Forwarding and Control Element Separation (ForCES) Protocol Specification,” RFC 5810, March 2010 (TXT).


 TOC 

9.2. Informative References

[RFC3654] Khosravi, H. and T. Anderson, “Requirements for Separation of IP Control and Forwarding,” RFC 3654, November 2003 (TXT).
[RFC3746] Yang, L., Dantu, R., Anderson, T., and R. Gopal, “Forwarding and Control Element Separation (ForCES) Framework,” RFC 3746, April 2004 (TXT).
[RFC5812] Halpern, J. and J. Hadi Salim, “Forwarding and Control Element Separation (ForCES) Forwarding Element Model,” RFC 5812, March 2010 (TXT).


 TOC 

Authors' Addresses

  Kentaro Ogawa
  NTT Corporation
  3-9-11 Midori-cho
  Musashino-shi, Tokyo 180-8585
  Japan
Email:  ogawa.kentaro@lab.ntt.co.jp
  
  Weiming Wang
  Zhejiang Gongshang University
  18, Xuezheng Str., Xiasha University Town
  Hangzhou 310018
  P.R.China
Email:  wmwang@mail.zjgsu.edu.cn
  
  Evangelos Haleplidis
  University of Patras
  Patras
  Greece
Email:  ehalep@ece.upatras.gr