This document discusses CE High Availability within a ForCES NE.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
This Internet-Draft will expire on November 22, 2010.
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1.  Definitions
2.  Introduction
    2.1.  Document Scope
    2.2.  Quantifying Problem Scope
3.  CE HA Framework
    3.1.  Current CE High Availability Support
        3.1.1.  Cold Standby Interaction with ForCES Protocol
        3.1.2.  Responsibilities for HA
4.  CE HA Hot Standby
5.  CE Fr Interface Communication
    5.1.  Basic Scope for Fr Interface
        5.1.1.  Fr Interface Operational Approach
        5.1.2.  Fr Interface Liveliness Protocol
        5.1.3.  Fr Interface Data Synchronization
        5.1.4.  Fr Interface Election
6.  Contributors
7.  IANA Considerations
8.  Security Considerations
9.  References
    9.1.  Normative References
    9.2.  Informative References
Authors' Addresses
1.  Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
The following definitions are taken from [RFC3654] and [RFC3746]:
Logical Functional Block (LFB) -- A template that represents a fine-grained, logically separate aspect of FE processing.
ForCES Protocol -- The protocol used at the Fp reference point in the ForCES Framework in [RFC3746].
ForCES Protocol Layer (ForCES PL) -- A layer in the ForCES architecture that embodies the ForCES protocol and the state transfer mechanisms as defined in [RFC5810].
ForCES Protocol Transport Mapping Layer (ForCES TML) -- A layer in the ForCES protocol architecture that specifically addresses protocol message transportation issues, such as how the protocol messages are mapped to different transport media (like SCTP, IP, TCP, UDP, ATM, Ethernet, etc.), and how to achieve and implement reliability, security, etc.
2.  Introduction
Figure 1 (ForCES Architecture) illustrates a ForCES NE controlled by a set of redundant CEs, with CE1 being active and CE2 and CEn-1 being backups.
                       -----------------------------------------
                       | ForCES Network Element                |
                       |                        +-----------+  |
                       |                        |   CEn-1   |  |
                       |                        | (Backup)  |  |
 --------------   Fc   | +------------+      +------------+ |  |
 | CE Manager |--------+-|    CE1     |------|    CE2     |-+  |
 --------------        | |  (Active)  |  Fr  |  (Backup)  |    |
       |               | +-------+--+-+      +---+---+----+    |
       | Fl            |         |  |    Fp      /   |         |
       |               |         |  +---------+ /    |         |
       |               |       Fp|            |/     |Fp       |
       |               |         |            |      |         |
       |               |         |     Fp    /+--+   |         |
       |               |         |  +-------+    |   |         |
       |               |         |  |            |   |         |
 --------------   Ff   | --------+--+--      ----+---+----+    |
 | FE Manager |--------+-|    FE1     |  Fi  |    FE2     |    |
 --------------        | |            |------|            |    |
                       | --------------      --------------    |
                       |   |  |  |  |          |  |  |  |      |
                       ----+--+--+--+----------+--+--+--+-------
                           |  |  |  |          |  |  |  |
                           |  |  |  |          |  |  |  |
                              Fi/f                Fi/f

 Fp: CE-FE interface
 Fi: FE-FE interface
 Fr: CE-CE interface
 Fc: Interface between the CE Manager and a CE
 Ff: Interface between the FE Manager and an FE
 Fl: Interface between the CE Manager and the FE Manager
 Fi/f: FE external interface

Figure 1: ForCES Architecture
The ForCES architecture allows FEs to be aware of multiple CEs but enforces that only one CE be the master controller. This is known in the industry as 1+N redundancy [refxxxx]. The master CE controls the FEs via the ForCES protocol operating over the Fp interface. If the master CE becomes faulty, a backup CE takes over and NE operation continues. By definition, the currently documented setup is known as cold-standby [refxxxx]. The CE set is static and is passed to the FE by the FE Manager (FEM) via the Ff interface and to each CE by the CE Manager (CEM) via the Fc interface during the pre-association phase.
From an FE perspective, the knobs of control for a CE set are defined by the FEPO LFB in [RFC5810], Appendix B. Section 3.1 (Current CE High Availability Support) details these knobs further.
2.1.  Document Scope
By current definition, the Fr interface is out of scope for the ForCES architecture. However, it is expected that organizations implementing a set of CEs may need the CEs to communicate with each other via the Fr interface in order to achieve the synchronization necessary for controlling the FEs.
The problem scope addressed by this document falls into 3 areas:
2.2.  Quantifying Problem Scope
The NE recovery and availability is dependent on several time-sensitive metrics:
The design choices made in the current [RFC5810] to meet the above goals are driven by a desire for simplicity.
To quantify the above criteria with the current prescribed ForCES CE setup:
3.  CE HA Framework
To achieve CE High Availability, FEs and CEs MUST inter-operate per the [RFC5810] definition, which is repeated for context in Section 3.1 (Current CE High Availability Support). It should be noted that in this default setup, which MUST be implemented by CEs and FEs needing HA, the Fr plane is out of scope (and, if available, is proprietary to an implementation).
3.1.  Current CE High Availability Support
As mentioned earlier, there can be multiple redundant CEs controlling FEs in a ForCES NE (although in practice there may be only one backup CE). At any one time only one master CE can control the FEs. In addition, the FE connects and associates to only the master CE. The FE and the CE PL are aware of the primary and secondary CEs. This information (primary, secondary CEs) is configured on the FE and the CE PLs during pre-association by the FEM and the CEM respectively.
Figure 2 (CE Failover for Cold Standby) below illustrates the ForCES message sequences that the FE uses to recover the connection in the currently defined cold-standby scheme.
   FE                 CE Primary          CE Secondary
   |                       |                    |
   |  Asso Estb,Caps exchg |                    |
 1 |<--------------------->|                    |
   |                       |                    |
   |     state update      |                    |
 2 |<--------------------->|                    |
   |                       |                    |
   |                       |                    |
   |                    FAILURE                 |
   |                                            |
   |          Asso Estb,Caps exchange           |
 3 |<------------------------------------------>|
   |                                            |
   |         Event Report (pri CE down)         |
 4 |------------------------------------------->|
   |                                            |
   |         state update from scratch          |
 5 |<------------------------------------------>|

Figure 2: CE Failover for Cold Standby
3.1.1.  Cold Standby Interaction with ForCES Protocol
High Availability parameterization in an FE is driven by configuring the FE Protocol Object (FEPO) LFB.
The FEPO CEID component identifies the current master CE and the component table BackupCEs identifies the backup CEs. The FEPO FE Heartbeat Interval, CE Heartbeat Dead Interval, and CE Heartbeat policy help in detecting connectivity problems between an FE and CE. The CE Failover policy defines how the FE should react on a detected failure.
Figure 3 (FE State Machine considering HA) illustrates the defined state machine that facilitates connection recovery.
The FE connects to the CE specified in the FEPO CEID component. If it fails to connect to that CE, it moves that CE to the bottom of table BackupCEs and sets its CEID component to the first CE retrieved from table BackupCEs. The FE then attempts to associate with the CE designated as the new primary CE. The FE repeats this procedure until it successfully connects to one of the CEs.
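The rotation through the CE set described above can be sketched as follows. This is only an illustration, not part of the FEPO definition: the function names, the `try_connect` callback, and the `max_attempts` cap (added so the example terminates) are assumptions, and CE IDs are shown as small integers for readability.

```python
from collections import deque


def rotate_to_next_ce(ceid, backup_ces):
    """Move the unreachable CE to the bottom of BackupCEs and promote the
    first backup to be the new CEID. Inputs are not modified."""
    table = deque(backup_ces)
    table.append(ceid)           # failed CE goes to the bottom of the table
    new_ceid = table.popleft()   # first entry becomes the new primary CE
    return new_ceid, list(table)


def connect_with_failover(ceid, backup_ces, try_connect, max_attempts=10):
    """Cycle through the CE set until try_connect(ceid) succeeds.

    try_connect is a hypothetical callback that returns True when the FE
    manages to connect and associate with the given CE.
    Returns (connected, ceid, backup_ces).
    """
    for _ in range(max_attempts):
        if try_connect(ceid):
            return True, ceid, backup_ces
        ceid, backup_ces = rotate_to_next_ce(ceid, backup_ces)
    return False, ceid, backup_ces
```

For example, with CEID 1 and BackupCEs [2, 3], a failure to reach CE 1 leaves the FE with CEID 2 and BackupCEs [3, 1].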
 (CE issues Teardown || +-----------------+
   Lost association) && | Pre-Association |
 CE failover policy = 0 | (Association    |
    +------------------>|   in            |<----+
    |                   | progress)       |     |
    |  CE Issues        +--------+--------+     |
    |  Association               |              | CFTI
    |  Setup Response = Success  |              | timer
    |     +----------------------+              | expires
    |     |                                     |
    |     V                                     |
  +-+-----------+                          +----+--------+
  |             |                          | Not         |
  |             |  (CE issues Teardown ||  | Associated  |
  |             |    Lost association) &&  |             |
  | Associated  |  CE Failover Policy = 1  | (May        |
  |             |                          | Continue    |
  |             +------------------------->|  Forwarding)|
  |             |                          |             |
  +-------------+                          +-----+-------+
       ^                                         |
       |                                         |
       |  CE Issues                              |
       |  Association                            |
       |  Setup Response = Success               |
       +-----------------------------------------+

Figure 3: FE State Machine considering HA
When communication fails between the FE and CE (which can be caused by either a CE or a link failure, but is not FE related), either the TML on the FE will notify the FE PL of the failure or the failure will be detected using the HB messages between FEs and CEs. The communication failure, regardless of how it is detected, MUST be considered a loss of association between the CE and the corresponding FE.
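A minimal sketch of the two detection paths, assuming the FE PL keeps a timestamp of the last message heard from the CE and compares it against the CE Heartbeat Dead Interval. The class and method names are hypothetical, and treating any received CE message as proof of liveness is an assumption of this sketch.

```python
class CELivenessMonitor:
    """Tracks whether the association with a CE should be considered lost."""

    def __init__(self, dead_interval_s, now):
        self.dead_interval_s = dead_interval_s  # CE Heartbeat Dead Interval
        self.last_heard = now                   # time of last CE message
        self.tml_failed = False                 # set when the TML reports failure

    def on_ce_message(self, now):
        # Assumption: heartbeats and other CE messages both reset the timer.
        self.last_heard = now

    def on_tml_failure(self):
        # The TML notified the PL that the CE is unreachable.
        self.tml_failed = True

    def association_lost(self, now):
        # Either path leads to the same conclusion: loss of association.
        silent_for = now - self.last_heard
        return self.tml_failed or silent_for >= self.dead_interval_s
```

Timestamps are passed in explicitly so that the logic is independent of any particular clock source.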
If the FE's FEPO CE Failover Policy is configured to mode 0 (the default), it will immediately transition to the pre-association phase. This means that if association is again established, all FE state will need to be re-established.
If the FE's FEPO CE Failover Policy is configured to mode 1, it indicates that the FE is capable of HA restart recovery. In such a case, the FE transitions to the not associated state and the CEFTI timer is started. The FE MAY continue to forward packets during this state. It MAY also recycle through any configured backup CEs in a round-robin fashion. It first adds its primary CE to the bottom of table BackupCEs and sets its CEID component to be the first secondary retrieved from table BackupCEs. The FE then attempts to associate with the CE designated as the new primary CE. If it fails to re-associate with any CE and the CEFTI expires, the FE then transitions to the pre-association state.
If the FE, while in the not associated state, manages to reconnect to a new primary CE before CEFTI expires, it transitions to the Associated state. Once re-associated, the FE tries to recover any state that may have been lost during the not associated state. How the FE re-synchronizes its state is out of scope for the current ForCES architecture.
An explicit message (a Config message setting the Primary CE component in the ForCES Protocol object) from the primary CE can also be used to change the Primary CE for an FE during normal protocol operation.
Also note that the FEs in a ForCES NE could also use a multicast CE ID, i.e., they could be associated with a group of CEs (this assumes the use of a CE-CE synchronization protocol, which is out of scope for this specification). In this case, the loss of association would mean that communication with the entire multicast group of CEs has been lost. The mechanisms described above will apply for this case as well during the loss of association. If, however, the secondary CE was also using the multicast CE ID that was lost, then the FE will need to form a new association using a different CE ID. If the capability exists, the FE MAY first attempt to form a new association with the original primary CE using a different non-multicast CE ID.
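The transitions described in the preceding paragraphs can be sketched as a small state machine. This is an illustration under stated assumptions rather than a normative definition: the class and method names are hypothetical, the default CEFTI value is arbitrary, and time is passed in explicitly.

```python
from enum import Enum


class FEState(Enum):
    PRE_ASSOCIATION = "pre-association"
    ASSOCIATED = "associated"
    NOT_ASSOCIATED = "not associated"


class FEHAStateMachine:
    def __init__(self, failover_policy=0, cefti_s=30.0):
        self.failover_policy = failover_policy  # FEPO CE Failover Policy (0 or 1)
        self.cefti_s = cefti_s                  # CEFTI; 30.0 is an arbitrary example
        self.state = FEState.PRE_ASSOCIATION
        self.cefti_deadline = None

    def on_association_setup_success(self):
        # CE issued Association Setup Response = Success.
        self.state = FEState.ASSOCIATED
        self.cefti_deadline = None

    def on_association_lost(self, now):
        # CE issued Teardown, or the association was lost.
        if self.state is not FEState.ASSOCIATED:
            return
        if self.failover_policy == 1:
            self.state = FEState.NOT_ASSOCIATED
            self.cefti_deadline = now + self.cefti_s  # start the CEFTI timer
        else:
            self.state = FEState.PRE_ASSOCIATION      # all FE state is rebuilt

    def on_tick(self, now):
        # CEFTI expiry while not associated sends the FE back to pre-association.
        if self.state is FEState.NOT_ASSOCIATED and now >= self.cefti_deadline:
            self.state = FEState.PRE_ASSOCIATION
            self.cefti_deadline = None

    def may_forward(self):
        # The FE MAY continue forwarding while in the not associated state.
        return self.state in (FEState.ASSOCIATED, FEState.NOT_ASSOCIATED)
```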
3.1.2.  Responsibilities for HA
XXX: we may remove this section (not much value to overall discussion)
TML Level:
The TML is responsible for controlling all lower layers, for example the transport level (IP addresses, MAC addresses, etc.), and for handling failures of the associated links.
PL Level:
All other functionality, including configuring the HA behavior during setup, the CE IDs used to identify primary and secondary CEs, protocol messages used to report CE failure (Event Report), Heartbeat messages used to detect association failure, messages to change the primary CE (Config), and other HA related operations described before, is the PL's responsibility.
To put the two together, if a path to a primary CE is down, the TML would take care of failing over to a backup path, if one is available. If the CE is totally unreachable then the PL would be informed and it would take the appropriate actions described before.
4.  CE HA Hot Standby
In this section we make some small extensions to the existing scheme to enable it to achieve hot standby HA. With these suggested changes we achieve some of the goals defined in Section 2.2 (Quantifying Problem Scope), namely:
As described in Section 3.1 (Current CE High Availability Support), the FEM configures the FE to make it aware of all the CEs in the NE. The FEM also configures the FE to make it aware of which CE is the master and which are backup(s). The FE's FEPO LFB CEID component identifies the current master CE and table BackupCEs identifies the backup CEs. The FE only connects to the master CE and then proceeds to associate with it. The master thereafter controls the FE and receives events from it. This continues until there is a communication failure between the FE and CE, at which point the FE attempts to connect to a CE from the BackupCEs table until it successfully connects to and associates with one of the listed CEs.
It is recommended that at least one backup CE be online. Doing so reduces the time the backup CE takes to become operational (as opposed to bringing up a backup CE only when a master CE fault is detected). If we assume that a CE implementation does state synchronization between CEs (proprietary or as discussed in Section 5 (CE Fr Interface Communication)), then we can zero out the cost of making the backup CE operational and ready to serve FEs; in such a case an associating FE could immediately become operational.
If we assume the presence of at least one backup CE online, we can improve how fast the FEs associate with a new master CE by making two changes:
The first change is to have the FE, soon after successfully connecting to and associating with the master CE, also connect to and associate with the rest of the CEs listed in the BackupCEs table.
Because the FE already holds multiple CE connections, its switchover to a new master CE will be considerably faster. The overall effect is an improved NE recovery time in case of communication failure or faults of the master CE.
The second change is to have the FE respond to messages issued by any CE (including a backup CE) it is associated with. This keeps the FE simple and as dumb as it is in the current definition.
Again for the sake of simplicity, asynchronous events and packet redirects continue to be sent only to the master CE. XXXX: We need to rethink perhaps and discuss possibility of events being sent to ALLCEIDs CEID (which the TML can translate to mean send-to-all-online-CEs).
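A minimal sketch of the two proposed FE changes, assuming a hypothetical `try_associate` callback and small integer CE IDs. The `on_master_failure` step, which promotes the first backup that is already associated, is an assumption of this sketch; how the CEs agree on the new master is left to the Fr interface.

```python
class HotStandbyFE:
    def __init__(self, master_ceid, backup_ceids):
        self.master_ceid = master_ceid          # FEPO CEID
        self.backup_ceids = list(backup_ceids)  # FEPO BackupCEs
        self.associated = set()

    def associate_all(self, try_associate):
        """First change: associate with the master, then with every backup."""
        if not try_associate(self.master_ceid):
            return self.associated
        self.associated.add(self.master_ceid)
        for ceid in self.backup_ceids:
            if try_associate(ceid):
                self.associated.add(ceid)
        return self.associated

    def accepts_message_from(self, ceid):
        """Second change: respond to any CE the FE is associated with."""
        return ceid in self.associated

    def event_destinations(self):
        """Events and packet redirects still go only to the master CE."""
        return [self.master_ceid] if self.master_ceid in self.associated else []

    def on_master_failure(self):
        """Switch to the first backup that is already associated, if any."""
        self.associated.discard(self.master_ceid)
        for ceid in list(self.backup_ceids):
            if ceid in self.associated:
                self.backup_ceids.remove(ceid)
                self.backup_ceids.append(self.master_ceid)
                self.master_ceid = ceid
                return ceid
        return None
```

Because the backup association already exists, the switchover in `on_master_failure` involves no new connection setup, which is the source of the faster recovery argued for above.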
XXXX: We need to have an extra state for each CE (master, connected, associated, stats etc) on the FEPO - so probably another change to current FEPO components.
XXXX: What about FEs each assuming a different master CE - is that a problem? It does not seem to be, because what matters is how the CEs agree between themselves who the master is. The FE responds to all CEs.
XXXX: What other kind of traffic needs to be running between FE and backup CEs? Heartbeats?
5.  CE Fr Interface Communication
In this section, we define activities in the Fr interface in order to achieve the other two goals defined in Section 2.2 (Quantifying Problem Scope).
5.1.  Basic Scope for Fr Interface
In the Fr plane we expect to see liveliness detection and configuration.
In the case of a fault of a master CE being detected by liveliness, we expect there is going to be an election to choose a new master CE.
It is also expected that the master CE will be updating the backup CEs via configuration on any necessary NE state changes.
Our goal is to keep the Fr interface simple. For this reason, our scope is not very ambitious and tries as much as possible to maintain current ForCES architecture:
In this section, we start by assuming the ForCES architecture (protocol and model) and then extend it when necessary.
5.1.1.  Fr Interface Operational Approach
Each CE on bootup knows the NE CE set as configured by the CEM. This static approach greatly simplifies discovery. It is expected that in most operational setups there will be one active and one backup CE.
Each backup CE does a ForCES association to the listed master CE.
The master CE updates backup CEs with the configuration necessary to maintain ForCES related NE state.
5.1.2.  Fr Interface Liveliness Protocol
The ForCES protocol already has built-in heartbeats for liveliness detection. If we define a CEPO LFB, in the same spirit as the FEPO LFB, it should be sufficient to have ForCES act as the liveliness protocol in the Fr plane.
XXX: We need to be very clear on what is needed and reused from ForCES protocol. XXX: What details does the CEPO carry? Example that seems to make sense: What CE type (e.g., master/slave), Status (connected etc.), Connectivity parameters, Dead intervals etc.
5.1.3.  Fr Interface Data Synchronization
Most existing NE implementations in the industry run some proprietary hot standby scheme. They synchronize many things using such a scheme. For example, they keep the protocol state of things like OSPF, BGP, IKE, etc. We do not want to do that.
We focus on a scope that specifies only the need to migrate state, and possibly configuration, that is maintained by the CE on behalf of the CE-FE plane, and nothing else. To be specific: A master CE synchronizes to backup CEs any state updates that happen on the CE-FE plane that it controls.
One challenge that will require an extension to the ForCES protocol is on how to communicate (from the master CE to a backup CE) the details about an LFB component state change that happened in a specific FE.
We propose to introduce a new protocol TLV at the same hierarchy level as the LFB selector. Operationally, this TLV will define that a set of state changes apply to a specific FE. For this reason it will carry the FEID on which the update happened. We call it the applies-to TLV.
Suppose an update has happened (or, depending on the update scheme, needs to happen) on FE z, LFB-a/instance-b/path-c, from the controlling CE x. The synchronization to backup CE y will then take the form of a config message from master CE x to backup CE y, with a message source CEID of x and a destination CEID of y. The applies-to TLV will contain FEID z. The rest of the message will be exactly as if CE x had sent a config message to FE z and will contain the path LFB-a/instance-b/path-c.
XXX: Refer to IETF 77 presentation slide 11 for choices on how to do CE-CE synchronization in conjunction with FEs. The consensus seems to lean on the second scheme.
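The following sketch shows one way the applies-to TLV could be laid out. It is purely illustrative: the type code `0xFF01` is hypothetical (no code point is assigned here), the 32-bit FEID value and the convention that the length covers the T and L fields but not the padding are assumptions of this sketch, and the LFB selector TLV is treated as opaque bytes.

```python
import struct

APPLIES_TO_TLV_TYPE = 0xFF01  # hypothetical type code, not assigned anywhere


def encode_tlv(tlv_type, value):
    """Encode a TLV with 16-bit type, 16-bit length, value, 32-bit alignment."""
    length = 4 + len(value)   # assumption: T and L fields plus value, no padding
    padding = (-length) % 4   # pad the TLV to a 32-bit boundary
    return struct.pack("!HH", tlv_type, length) + value + b"\x00" * padding


def encode_applies_to(feid):
    """applies-to TLV carrying the FEID on which the update happened."""
    return encode_tlv(APPLIES_TO_TLV_TYPE, struct.pack("!I", feid))


def build_sync_body(feid, lfb_select_tlv):
    """Body of the CE x -> CE y config message: the applies-to TLV followed by
    the same LFB selector TLV that CE x would have sent to the FE."""
    return encode_applies_to(feid) + lfb_select_tlv


def decode_applies_to(body):
    """Return (feid, remaining bytes) from a body built by build_sync_body."""
    tlv_type, length = struct.unpack_from("!HH", body, 0)
    if tlv_type != APPLIES_TO_TLV_TYPE or length != 8:
        raise ValueError("not an applies-to TLV")
    (feid,) = struct.unpack_from("!I", body, 4)
    return feid, body[8:]
```

A backup CE receiving such a body would read FEID z from the first TLV and then process the remaining bytes exactly as it would a config message addressed to FE z.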
5.1.4.  Fr Interface Election
Upon failure detection of the master CE, a very simple election occurs. The CE with the lowest CEID wins. Operationally, all CEs associate to the next lowest CEID. This is easy to execute since the static CE list never changes.
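The election rule is simple enough to state in a few lines. The sketch below assumes each CE holds the static CE list and a set of CEs it currently considers failed; the function name and the use of small integer CE IDs are illustrative only.

```python
def elect_master(ce_set, failed):
    """Return the lowest CEID in the static CE set that has not failed."""
    candidates = sorted(ceid for ceid in ce_set if ceid not in failed)
    if not candidates:
        raise RuntimeError("no CE left to elect")
    return candidates[0]
```

Since every CE evaluates the same rule over the same static list, all surviving CEs reach the same answer without exchanging election messages, provided they agree on which CEs have failed.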
XXX: Optimize - the master CE could keep tabs on which backup CEs are alive and update the associated CEs' CEPO table with status info, so that if the next lowest CE is not alive, there is no point in connecting to it when the master fails...
6.  Contributors
Jamal Hadi Salim has contributed to discussions that created this document.
7.  IANA Considerations
TBA
8.  Security Considerations
TBA
9.  References
9.1.  Normative References
[RFC5810]  Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang, W., Dong, L., Gopal, R., and J. Halpern, “Forwarding and Control Element Separation (ForCES) Protocol Specification,” RFC 5810, March 2010.
9.2.  Informative References
[RFC3654]  Khosravi, H. and T. Anderson, “Requirements for Separation of IP Control and Forwarding,” RFC 3654, November 2003.
[RFC3746]  Yang, L., Dantu, R., Anderson, T., and R. Gopal, “Forwarding and Control Element Separation (ForCES) Framework,” RFC 3746, April 2004.
[RFC5812]  Halpern, J. and J. Hadi Salim, “Forwarding and Control Element Separation (ForCES) Forwarding Element Model,” RFC 5812, March 2010.
Authors' Addresses
Kentaro Ogawa
NTT Corporation
3-9-11 Midori-cho
Musashino-shi, Tokyo 180-8585
Japan
Email: ogawa.kentaro@lab.ntt.co.jp

Weiming Wang
Zhejiang Gongshang University
18, Xuezheng Str., Xiasha University Town
Hangzhou 310018
P.R.China
Email: wmwang@mail.zjgsu.edu.cn

Evangelos Haleplidis
University of Patras
Patras
Greece
Email: ehalep@ece.upatras.gr