For resiliency, a PCC has redundant PCEP sessions towards multiple
PCEs. In such a case, a PCC gives control of an LSP to a single PCE
only, and only this PCE is responsible for the path computation for
the delegated LSP: the PCC achieves this by setting the D flag only
towards the active PCE [RFC8231] selected for delegation. The election
of the active PCE to which an LSP is delegated is controlled by each
PCC. The PCC usually elects the active PCE based on a locally
configured policy (by setting a priority). Upon PCEP session failure
or active PCE failure, the PCC may decide to elect a new active PCE by
sending a new PCRpt message with the D flag set to this new active
PCE. When the failed PCE or PCEP session comes back online, it is up
to the implementation whether to preempt. Preemption may lead to some
disruption on the existing path if the path results from both PCEs are
not exactly the same. In a network with multiple PCCs and multiple
stateful PCEs deployed for redundancy, there is no guarantee that at
any time all the PCCs delegate their LSPs to the same PCE.

           +----------+
           |   PCC1   |  LSP1
           +----------+
             /      \
            /        \
   +---------+      +---------+
   |  PCE1   |      |  PCE2   |
   +---------+      +---------+
            \        /
     *fail*  \      /
           +----------+
           |   PCC2   |  LSP2
           +----------+

In the example above, we consider that by configuration, both PCCs
first delegate their LSPs to PCE1. So, PCE1 is responsible for
computing a path for both LSP1 and LSP2. If the PCEP session between
PCC2 and PCE1 fails, PCC2 will delegate LSP2 to PCE2. So PCE1 becomes
responsible only for LSP1 path computation while PCE2 is responsible
for the path computation of LSP2. When the PCC2-PCE1 session is back
online, PCC2 will keep using PCE2 as the active PCE (assume no
preemption in this example). So the result is a permanent situation
where each PCE is responsible for a subset of the path computations.

This situation is called a split-brain scenario, as there are
multiple computation brains running at the same time, whereas some
deployments and use cases require a single central computation unit.
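
The PCC-side behaviour described above (priority-based election,
failover, and optional preemption) can be sketched as follows. This is
a minimal illustration under stated assumptions, not an implementation
of PCEP: the `PceSession` class, its `priority` field (lower value is
preferred), and the `preempt` switch are hypothetical names chosen for
the example.

```python
from dataclasses import dataclass


@dataclass
class PceSession:
    name: str
    priority: int      # hypothetical local policy: lower value is preferred
    up: bool = True    # state of the PCEP session


def elect_active_pce(sessions, current=None, preempt=False):
    """Return the name of the PCE that should receive D=1, or None."""
    alive = [s for s in sessions if s.up]
    if not alive:
        return None
    # Without preemption, stick to the current active PCE while it is up.
    if not preempt and any(s.name == current for s in alive):
        return current
    return min(alive, key=lambda s: s.priority).name


def delegation_flags(sessions, active):
    """D flag the PCC would report on each session that is up."""
    return {s.name: int(s.name == active) for s in sessions if s.up}


# PCC2 from the example: PCE1 preferred, PCE2 as backup.
pce1, pce2 = PceSession("PCE1", 10), PceSession("PCE2", 20)
sessions = [pce1, pce2]

active = elect_active_pce(sessions)                          # nominal case
pce1.up = False
after_failure = elect_active_pce(sessions, active)           # PCE1 session fails
pce1.up = True
after_recovery = elect_active_pce(sessions, after_failure)   # no preemption
print(active, after_failure, after_recovery)
```

With `preempt=False` the PCC stays on PCE2 after the PCE1 session
recovers, which is the permanent split described above; calling
`elect_active_pce(sessions, "PCE2", preempt=True)` would move the
delegation back to PCE1.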

Further, there are use cases where a particular LSP path
computation is linked to another LSP path computation: the most common
use case is path disjointness (see [RFC8800]). The LSPs that depend on
each other may start from different head-ends.

      _________________________________________
     /                                         \
    /        +------+            +------+       \
   |         | PCE1 |            | PCE2 |        |
   |         +------+            +------+        |
   |                                             |
   | +------+                          +------+  |
   | | PCC1 | ---------------------->  | PCC2 |  |
   | +------+                          +------+  |
   |                                             |
   |                                             |
   | +------+                          +------+  |
   | | PCC3 | ---------------------->  | PCC4 |  |
   | +------+                          +------+  |
   |                                             |
    \                                           /
     \_________________________________________/

      _________________________________________
     /                                         \
    /        +------+            +------+       \
   |         | PCE1 |            | PCE2 |        |
   |         +------+            +------+        |
   |                                             |
   | +------+           10             +------+  |
   | | PCC1 | ----- R1 ---- R2 ------- | PCC2 |  |
   | +------+       |       |          +------+  |
   |                |       |                    |
   |                |       |                    |
   | +------+       |       |          +------+  |
   | | PCC3 | ----- R3 ---- R4 ------- | PCC4 |  |
   | +------+                          +------+  |
   |                                             |
    \                                           /
     \_________________________________________/

In the figure above, the requirement is to create two link-disjoint
LSPs: PCC1->PCC2 and PCC3->PCC4. Every link in the topology has a
metric of 1 except for the link 'R1-R2', which has a metric of 10. The
PCEs are responsible for the path computation, and PCE1 is the active
primary PCE for all PCCs in the nominal case.

Scenario 1:

In the normal case (PCE1 as active primary PCE), consider that the
PCC1->PCC2 LSP is configured first with the link disjointness
constraint. PCE1 sends a PCUpd message to PCC1 with the ERO:
R1->R3->R4->R2->PCC2 (shortest path). PCC1 signals and installs the
path. When PCC3->PCC4 is configured, the PCEs already know the path of
PCC1->PCC2 and can compute a link-disjoint path: the solution requires
moving PCC1->PCC2 onto a new path to make room for the new LSP. PCE1
sends a PCUpd message to PCC1 with the new ERO: R1->R2->PCC2 and a
PCUpd to PCC3 with the following ERO: R3->R4->PCC4. In the normal
case, PCE1 has no difficulty computing a link-disjoint path.
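
The two computations in Scenario 1 can be reproduced with a small
brute-force sketch. It is only an illustration of the expected result
on this eight-link topology, not how a PCE computes paths: the helper
names (`simple_paths`, `disjoint_pair`, and so on) are hypothetical,
and selecting the pair with the lowest total metric is an assumption
made for the example.

```python
from itertools import product

# Link metrics from the figure: every link costs 1 except R1-R2 (10).
LINKS = {
    ("PCC1", "R1"): 1, ("R1", "R2"): 10, ("R2", "PCC2"): 1,
    ("R1", "R3"): 1, ("R2", "R4"): 1,
    ("PCC3", "R3"): 1, ("R3", "R4"): 1, ("R4", "PCC4"): 1,
}
COST = {frozenset(link): metric for link, metric in LINKS.items()}


def simple_paths(src, dst, path=None):
    """Yield every loop-free path from src to dst as a list of nodes."""
    path = path or [src]
    if path[-1] == dst:
        yield path
        return
    for link in COST:
        if path[-1] in link:
            (nxt,) = link - {path[-1]}
            if nxt not in path:
                yield from simple_paths(src, dst, path + [nxt])


def links_of(path):
    """Set of undirected links used by a path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def cost(path):
    return sum(COST[link] for link in links_of(path))


def disjoint_pair(lsp_a, lsp_b):
    """Pair of link-disjoint paths with the lowest total metric, or None."""
    pairs = (
        (a, b)
        for a, b in product(list(simple_paths(*lsp_a)), list(simple_paths(*lsp_b)))
        if not links_of(a) & links_of(b)
    )
    return min(pairs, key=lambda p: cost(p[0]) + cost(p[1]), default=None)


# First LSP alone: plain shortest path (the ERO omits the head-end).
first = min(simple_paths("PCC1", "PCC2"), key=cost)
print("->".join(first[1:]))

# Both LSPs known to the same PCE: link-disjoint pair.
pair = disjoint_pair(("PCC1", "PCC2"), ("PCC3", "PCC4"))
print("->".join(pair[0][1:]), "|", "->".join(pair[1][1:]))
```

On this topology there is exactly one link-disjoint pair, which is why
PCC1->PCC2 has to leave its shortest path once PCC3->PCC4 is added.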

Scenario 2:

Consider that PCC1 has lost its PCEP session with PCE1 (all other PCEP
sessions are up). PCC1 delegates its LSP to PCE2.

           +----------+
           |   PCC1   |  LSP: PCC1->PCC2
           +----------+
                    \
                     \ D=1
   +---------+      +---------+
   |  PCE1   |      |  PCE2   |
   +---------+      +---------+
        D=1 \        / D=0
             \      /
           +----------+
           |   PCC3   |  LSP: PCC3->PCC4
           +----------+

Consider that the PCC1->PCC2 LSP is configured first with the link
disjointness constraint. PCE2 (which is the new active primary PCE for
PCC1) sends a PCUpd message to PCC1 with the ERO: R1->R3->R4->R2->PCC2
(shortest path). When PCC3->PCC4 is configured, PCE1 is no longer
aware of the LSPs from PCC1, so it cannot compute a disjoint path for
PCC3->PCC4 and will send a PCUpd message to PCC3 with the shortest
path ERO: R3->R4->PCC4. When the PCC3->PCC4 LSP is reported to PCE2 by
PCC3, PCE2 will perform the disjointness computation and will
correctly move PCC1->PCC2 (as it owns the delegation for this LSP)
onto the following path: R1->R2->PCC2. With this sequence of events
and these PCEP sessions, disjointness is ensured.

Scenario 3:

           +----------+
           |   PCC1   |  LSP: PCC1->PCC2
           +----------+
             /      \
        D=1 /        \ D=0
   +---------+      +---------+
   |  PCE1   |      |  PCE2   |
   +---------+      +---------+
                     / D=1
                    /
           +----------+
           |   PCC3   |  LSP: PCC3->PCC4
           +----------+

Consider the above PCEP sessions and that the PCC1->PCC2 LSP is
configured first with the link disjointness constraint. PCE1 computes
the shortest path, as it is the only LSP in the disjoint association
group that it is aware of: R1->R3->R4->R2->PCC2 (shortest path). When
PCC3->PCC4 is configured, PCE2 must compute a disjoint path for this
LSP. The only solution found is to move the PCC1->PCC2 LSP onto
another path, but PCE2 cannot do so, as it does not have delegation
for this LSP. In this setup, the PCEs are not able to find a disjoint
path.
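
The delegation limit in Scenario 3 can be illustrated with the
following sketch. It assumes the topology where link 'R1-R2' has a
metric of 10 and hard-codes the two loop-free candidate paths of each
LSP; `CANDIDATES` and `disjoint_updates` are hypothetical names, and
trying candidates in cheapest-first order is a simplification rather
than a real CSPF.

```python
from itertools import product

# Loop-free candidate paths per LSP on the 'R1-R2 = 10' topology,
# listed cheapest first (assumed precomputed for this illustration).
CANDIDATES = {
    "PCC1->PCC2": [
        ("PCC1", "R1", "R3", "R4", "R2", "PCC2"),
        ("PCC1", "R1", "R2", "PCC2"),
    ],
    "PCC3->PCC4": [
        ("PCC3", "R3", "R4", "PCC4"),
        ("PCC3", "R3", "R1", "R2", "R4", "PCC4"),
    ],
}


def links_of(path):
    """Set of undirected links used by a path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def disjoint_updates(delegated, reported):
    """Paths a PCE may push to reach link disjointness.

    delegated: LSPs for which this PCE holds the delegation (D=1).
    reported:  current path of every LSP this PCE knows about.
    Returns {lsp: path} for the delegated LSPs, or None when no choice
    is link-disjoint from the LSPs the PCE is not allowed to modify.
    """
    delegated = sorted(delegated)
    fixed = [links_of(p) for lsp, p in reported.items() if lsp not in delegated]
    for combo in product(*(CANDIDATES[lsp] for lsp in delegated)):
        link_sets = fixed + [links_of(p) for p in combo]
        # Pairwise disjoint iff no link is counted twice in the union.
        if len(set().union(*link_sets)) == sum(len(s) for s in link_sets):
            return dict(zip(delegated, combo))
    return None


lsp1_path = CANDIDATES["PCC1->PCC2"][0]   # shortest path installed by PCE1

# PCE2 in Scenario 3: sees PCC1->PCC2 but only owns PCC3->PCC4.
print(disjoint_updates({"PCC3->PCC4"}, {"PCC1->PCC2": lsp1_path}))

# A single PCE owning both LSPs (Scenario 1) finds the disjoint pair.
print(disjoint_updates({"PCC1->PCC2", "PCC3->PCC4"}, {"PCC1->PCC2": lsp1_path}))
```

The first call returns `None` because both candidates for PCC3->PCC4
share at least one link with the path that PCE2 is not allowed to
move.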

Scenario 4:

           +----------+
           |   PCC1   |  LSP: PCC1->PCC2
           +----------+
             /      \
        D=1 /        \ D=0
   +---------+      +---------+
   |  PCE1   |      |  PCE2   |
   +---------+      +---------+
        D=0 \        / D=1
             \      /
           +----------+
           |   PCC3   |  LSP: PCC3->PCC4
           +----------+

Consider the above PCEP sessions and that the PCEs are configured to
fall back to the shortest path if disjointness cannot be found, as
described in [RFC8800]. The PCC1->PCC2 LSP is configured first. PCE1
computes the shortest path, as it is the only LSP in the disjoint
association group that it is aware of: R1->R3->R4->R2->PCC2 (shortest
path). When PCC3->PCC4 is configured, PCE2 must compute a disjoint
path for this LSP. The only solution found is to move the PCC1->PCC2
LSP onto another path, but PCE2 cannot do so, as it does not have
delegation for this LSP. PCE2 then provides the shortest path for
PCC3->PCC4: R3->R4->PCC4. When PCC3 receives the ERO, it reports it
back to both PCEs. When PCE1 becomes aware of the PCC3->PCC4 path, it
reruns the Constrained Shortest Path First (CSPF) algorithm and
provides a new path for PCC1->PCC2: R1->R2->PCC2. The new path is
reported back to all PCEs by PCC1. PCE2 also reruns CSPF to take the
newly reported path into account. The new computation does not lead to
any path update.

Scenario 5:

      _____________________________________
     /                                     \
    /        +------+        +------+       \
   |         | PCE1 |        | PCE2 |        |
   |         +------+        +------+        |
   |                                         |
   | +------+         100          +------+  |
   | |      | -------------------- |      |  |
   | | PCC1 | ----- R1 ----------- | PCC2 |  |
   | +------+       |              +------+  |
   |    |           |                 |      |
   |  6 |           | 2               | 2    |
   |    |           |                 |      |
   | +------+       |              +------+  |
   | | PCC3 | ----- R3 ----------- | PCC4 |  |
   | +------+              10      +------+  |
   |                                         |
    \                                       /
     \_____________________________________/

Now, consider a new network topology with the same PCEP sessions as
the previous example. Suppose that both LSPs are configured at almost
the same time. PCE1 will compute a path for PCC1->PCC2 while PCE2
will compute a path for PCC3->PCC4. As each PCE is not aware of the
path of the second LSP in the association group (not reported yet),
each PCE computes the shortest path for its LSP. PCE1 computes
ERO: R1->PCC2 for PCC1->PCC2, and PCE2 computes ERO:
R3->R1->PCC2->PCC4 for PCC3->PCC4. When these shortest paths are
reported to each PCE, each PCE recomputes disjointness. PCE1 provides
a new path for PCC1->PCC2 with ERO: PCC1->PCC2. PCE2 also provides a
new path for PCC3->PCC4 with ERO: R3->PCC4. When those new paths are
reported to both PCEs, this triggers CSPF again. PCE1 provides a new,
more optimal path for PCC1->PCC2 with ERO: R1->PCC2, and PCE2 also
provides a more optimal path for PCC3->PCC4 with ERO:
R3->R1->PCC2->PCC4. So we come back to the initial state. When those
paths are reported to both PCEs, this triggers CSPF again. An infinite
loop of CSPF computation then occurs, with paths flapping permanently
because of the split-brain situation.

This permanent computation loop comes from the inconsistency
between the state of the LSPs as seen by each PCE due to the
split-brain: each PCE tries to modify its delegated path at the same
time, based on the last received path information, while the other
PCE's simultaneous update de facto invalidates that information.
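
The loop described above can be reproduced with a short simulation.
This is a sketch under stated assumptions rather than a model of a
real PCE: unlabelled links in the Scenario 5 figure are assumed to
have a metric of 1, both PCEs are assumed to recompute in lock-step
rounds, and disjointness is modelled by excluding the links of the
other LSP's last reported path. The names `cspf` and `simulate` are
hypothetical.

```python
import heapq

# Link metrics from the Scenario 5 figure (unlabelled links assumed = 1).
LINKS = {
    ("PCC1", "PCC2"): 100, ("PCC1", "R1"): 1, ("R1", "PCC2"): 1,
    ("PCC1", "PCC3"): 6, ("R1", "R3"): 2, ("PCC2", "PCC4"): 2,
    ("PCC3", "R3"): 1, ("R3", "PCC4"): 10,
}
COST = {frozenset(link): metric for link, metric in LINKS.items()}


def links_of(path):
    """Set of undirected links used by a path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def cspf(src, dst, excluded=frozenset()):
    """Dijkstra shortest path that avoids the excluded links."""
    heap = [(0, [src])]
    visited = set()
    while heap:
        dist, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for link, metric in COST.items():
            if node in link and link not in excluded:
                (nxt,) = link - {node}
                if nxt not in visited:
                    heapq.heappush(heap, (dist + metric, path + [nxt]))
    return None


def simulate(rounds):
    """Each round, both PCEs recompute from the previously reported state."""
    lsps = {"PCE1": ("PCC1", "PCC2"), "PCE2": ("PCC3", "PCC4")}
    peer = {"PCE1": "PCE2", "PCE2": "PCE1"}
    state, history = {}, []
    for _ in range(rounds):
        # The comprehension reads the old state; the new one replaces it
        # only after both PCEs have computed, i.e. updates are simultaneous.
        state = {
            pce: cspf(src, dst, links_of(state[peer[pce]]) if state else frozenset())
            for pce, (src, dst) in lsps.items()
        }
        history.append(state)
    return history


for step, paths in enumerate(simulate(4)):
    print(step, paths["PCE1"], paths["PCE2"])
```

In this model the state alternates between two configurations with a
period of two rounds and never settles, which matches the flap
described in Scenario 5.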

Scenario 6: multi-domain

        Domain/Area 1                Domain/Area 2
      ________________             ________________
     /                \           /                \
    /        +------+ |           |   +------+      \
   |         | PCE1 | |           |   | PCE3 |       |
   |         +------+ |           |   +------+       |
   |                  |           |                  |
   |         +------+ |           |   +------+       |
   |         | PCE2 | |           |   | PCE4 |       |
   |         +------+ |           |   +------+       |
   |                  |           |                  |
   | +------+         |           |         +------+ |
   | | PCC1 |         |           |         | PCC2 | |
   | +------+         |           |         +------+ |
   |                  |           |                  |
   |                  |           |                  |
   | +------+         |           |         +------+ |
   | | PCC3 |         |           |         | PCC4 | |
   | +------+         |           |         +------+ |
    \                 |           |                  |
     \_______________/             \________________/

In the example above, suppose that the disjoint LSPs from PCC1 to
PCC2 and from PCC4 to PCC3 are created. All the PCEs have knowledge of
both domain topologies (e.g., using BGP-LS [RFC7752]). For operational
and management reasons, each domain uses its own group of redundant
PCEs. PCE1/PCE2 in domain 1 have PCEP sessions with PCC1 and PCC3,
while PCE3/PCE4 in domain 2 have PCEP sessions with PCC2 and PCC4. As
PCE1/PCE2 do not know about LSPs from PCC2/PCC4 and PCE3/PCE4 do not
know about LSPs from PCC1/PCC3, it is not possible to compute the
disjointness constraint. This scenario can also be seen as a
split-brain scenario. This multi-domain architecture (with multiple
groups of PCEs) can also be used in a single domain, where an operator
wants to limit the failure domain by creating multiple groups of PCEs,
each managing a subset of PCCs. As in the multi-domain example, it
will not be possible to compute disjoint paths starting from head-ends
managed by different PCE groups.

In this document, we propose a solution that makes it possible to
compute LSP-association-based constraints (like disjointness) in
split-brain scenarios while preventing computation loops.