3.6. Handling Maintenance Windows
Whenever network components are under maintenance, the operator wants to inhibit the emission of symptoms from those components. A typical use case is device maintenance, during which the device is not supposed to be operational. As such, symptoms related to the device health should be ignored. Symptoms related to the device-specific subservices, such as the interfaces, might be ignored as well, since their state changes are probably a consequence of the maintenance.
The ietf-service-assurance model proposed in [I-D.ietf-opsawg-service-assurance-yang] enables flagging subservices as under maintenance and, in that case, requires a string that identifies the person or process that requested the maintenance. When a service or subservice is flagged as under maintenance, it must report a generic "Under Maintenance" symptom, for propagation towards subservices that depend on this specific subservice. Any other symptom from this service, or from one of its impacting dependencies, must not be reported.
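The reporting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class, its field names (such as maintenance_contact) and the sample symptom are hypothetical and are not taken from the ietf-service-assurance YANG module.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNDER_MAINTENANCE = "Under Maintenance"


@dataclass
class Subservice:
    """Hypothetical in-memory view of a subservice (not the YANG model)."""
    name: str
    symptoms: List[str] = field(default_factory=list)
    # Assumed field: identifies the person or process that requested the
    # maintenance. None means the subservice is not under maintenance.
    maintenance_contact: Optional[str] = None

    def flag_under_maintenance(self, contact: str) -> None:
        # The identifying string is required when flagging.
        if not contact:
            raise ValueError("a maintenance contact string is required")
        self.maintenance_contact = contact

    def reported_symptoms(self) -> List[str]:
        # Under maintenance: only the generic symptom is reported.
        if self.maintenance_contact is not None:
            return [UNDER_MAINTENANCE]
        return list(self.symptoms)
```

For instance, a subservice holding a "High CPU" symptom reports that symptom until flag_under_maintenance("ops-team") is called, after which it reports only "Under Maintenance".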
We illustrate this mechanism with three independent examples based on the assurance graph depicted in Figure 2:
- Device maintenance, for instance upgrading the device OS. The operator flags the subservice "Peer1" device as under maintenance. This inhibits the emission of symptoms, except "Under Maintenance", from "Peer1 Physical Interface", "Peer1 Tunnel Interface" and "Tunnel Service Instance". All other subservices are unaffected.
- Interface maintenance, for instance replacing a broken optic. The operator flags the subservice "Peer1 Physical Interface" as under maintenance. This inhibits the emission of symptoms, except "Under Maintenance", from "Peer1 Tunnel Interface" and "Tunnel Service Instance". All other subservices are unaffected.
- Routing protocol maintenance, for instance modifying parameters or redistribution. The operator marks the subservice "IS-IS Routing Protocol" as under maintenance. This inhibits the emission of symptoms, except "Under Maintenance", from "IP connectivity" and "Tunnel Service Instance". All other subservices are unaffected.
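The three examples can be reproduced with a small traversal of the dependency graph. The sketch below is illustrative only: the DEPENDS_ON edges are a simplified assumption chosen to be consistent with the three examples, not a reproduction of Figure 2, and the function name inhibited_by is hypothetical.

```python
from collections import deque

# Assumed, simplified edges: each key depends on the listed subservices.
# Chosen to match the three examples; not copied from Figure 2.
DEPENDS_ON = {
    "Tunnel Service Instance": [
        "Peer1 Tunnel Interface",
        "Peer2 Tunnel Interface",
        "IP connectivity",
    ],
    "Peer1 Tunnel Interface": ["Peer1 Physical Interface"],
    "Peer2 Tunnel Interface": ["Peer2 Physical Interface"],
    "Peer1 Physical Interface": ["Peer1 Device"],
    "Peer2 Physical Interface": ["Peer2 Device"],
    "IP connectivity": ["IS-IS Routing Protocol"],
}


def inhibited_by(flagged):
    """Return the subservices that depend, directly or not, on `flagged`.

    These are the subservices whose symptoms, except "Under Maintenance",
    are inhibited while `flagged` is under maintenance.
    """
    # Invert the edges: dependency -> subservices that depend on it.
    dependents = {}
    for node, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    # Breadth-first walk from the flagged subservice towards the service.
    seen = set()
    queue = deque([flagged])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Under these assumed edges, inhibited_by("Peer1 Physical Interface") yields "Peer1 Tunnel Interface" and "Tunnel Service Instance", matching the second example; every subservice outside the returned set is unaffected.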
In each example above, the subservice under maintenance completely impacts the service, putting it under maintenance as well. In more complex cases, for instance with a primary and backup path for the connectivity, the service might still be working in a degraded way if a subservice impacting the primary path is under maintenance. In such cases, the status of the service might include the "Under Maintenance" symptom as well as other symptoms (e.g., from the backup path) to explain the lower health score. In general, the computation of the service status from the subservices is done in the SAIN collector, whose implementation is out of scope for this document.
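As a purely illustrative sketch of the primary/backup case, one possible way to combine symptoms is shown below. Since the collector implementation is out of scope, everything here is an assumption: the function name service_symptoms, the grouping of dependencies into named paths, and the sample symptom are all hypothetical.

```python
from typing import Dict, List

UNDER_MAINTENANCE = "Under Maintenance"


def service_symptoms(paths: Dict[str, List[str]]) -> List[str]:
    """Combine per-path symptoms into the symptoms reported by the service.

    `paths` maps an assumed path name (e.g. "primary", "backup") to the
    symptoms propagated by that path.
    """
    in_maintenance = [p for p, syms in paths.items() if UNDER_MAINTENANCE in syms]
    if not in_maintenance:
        # No maintenance anywhere: report every symptom as usual.
        return [sym for syms in paths.values() for sym in syms]
    if len(in_maintenance) == len(paths):
        # Every path is under maintenance: the service is too.
        return [UNDER_MAINTENANCE]
    # Degraded case: report "Under Maintenance" plus the symptoms of the
    # paths that are still in service.
    result = [UNDER_MAINTENANCE]
    for name, syms in paths.items():
        if name not in in_maintenance:
            result.extend(syms)
    return result
```

With the primary path under maintenance and a backup path reporting a hypothetical "High latency" symptom, this sketch returns both "Under Maintenance" and "High latency", which is the kind of combined status described above.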
The maintenance of a subservice might modify or hide modifications of the structure of the assurance graph. Therefore, unflagging a subservice as under maintenance should trigger an update of the assurance graph.