Internet-Draft Problem Statement and Requirements October 2023
Yuan, et al. Expires 25 April 2024 [Page]
Workgroup:
CATS
Internet-Draft:
draft-yuan-cats-end-to-end-problem-requirement-01
Published:
Intended Status:
Standards Track
Expires:
Authors:
D. Yuan
ZTE Corporation
H. Yao
China Mobile
Z. Li
China Mobile
F. Zhou
ZTE Corporation
X. Wang
Ruijie Networks Co.,Ltd

Problem Statement and Requirements of end-to-end CATS

Abstract

This document describes and proposes problem statement and incremental requirements of an end-to-end computing aware traffic steering (CATS) process illustrated in [I-D.ldbc-cats-framework]. Particularly, this document analyzes the significance of appropriate aggregation algorithms and the necessity of hierarchical forwarding mechanisms.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 25 April 2024.

Table of Contents

1. Introduction

1.1. Usecase: Service scheduling for ubiquitous instances

Taking computing-aware AR or VR mentioned in [I-D.ietf-cats-usecases-requirements] for instance, a specific condition is further discussed in this draft with ubiquitous service instances.


+--------------+     +--------------+     +--------------+
|Instance 01-20|     |Instance 21-40|     |Instance 41-60|
+--------------+     +--------------+     +--------------+
        |                    |                    |
        |                    |                    |
   +----+-----+        +-----+----+         +-----+----+
   |  Egress  |        |  Egress  |         |  Egress  |
   | Router 1 |        | Router 2 |         | Router 3 |
   +----+-----+        +-----+----+         +-----+----+
        |                    |                    |
        |                    |                    |
        |           +--------+--------+           |
        +-----------|  Infrastructure |-----------+
                    +--------+--------+
                             |
                        +----+----+
                        | Ingress |
        +---------------|  Router |--------------+
        |               +----+----+              |
        |                    |                   |
    +--+--+              +--+---+           +---+--+
  +------+|            +------+ |         +------+ |
  |Client|+            |Client|-+         |Client|-+
  +------+             +------+           +------+

Figure 1: Service scheduling for ubiquitous instances

As illustrated in [I-D.ietf-cats-usecases-requirements], it requires to dynamically steer traffic to the appropriate instance to meet the E2E delay requirements considering both network and computing resource status. However, a large amount of ubiquitous service instances connected to different Egress Routers make the work complicated which includes the following two aspects:

1. Large amount of dynamic entries recorded and maintained in the control plane for CATS routers.


                         (  Instance 1     Metrics
                         |    ......
                         |  Instance 20    Metrics
                         |
     CATS  Routers       |  Instance 21    Metrics
           x            <     ......
    Average amount of    |  Instance 40    Metrics
instances per CATS Router|
                         |  Instance 41    Metrics
                         |    ......
                         (  Instance 60    Metrics

Figure 2: Large amount of dynamic entries

2. Quick updates for FIB.

Suppose a simple load balance strategy is implemented with a Round-Robin scheme. During time slot 000 - 200 s, the traffic would be steered to a definite CATS Router 1 which connects service instance 1 - 20, however, a FIB with instance granularity is required to update quite frequently.


Round Robin among    Time Slot
   Instance 1        000-010 s
    ......           ......
   Instance 20       190-200 s

   Instance 21       200-210 s
    ......           ......
   Instance 40       390-400 s

   Instance 41       400-410 s
    ......           ......
   Instance 60       590-600 s

Figure 3: Large amount of dynamic entries

1.2. Requirements: Hierarchical service routing

Based on the migration of computing resources towards the edge and the presentation of distributed and heterogeneous deployment patterns, the boundaries between computing resources and the underlay network are no longer exactly separated. Computing-related services and corresponding scheduling schemes have gained their popularity and necessity in current and future circumstances. Considering the large number of future service instances and the dynamic and continuous nature of computing state variations, reflecting detailed instance information in the system will result in an incalculable number of entries and computational complexity. This will be catastrophic, especially for a circumstance of distributed control plane.

As illustrated in [I-D.ldbc-cats-framework], the Ingress CATS-Router steers service-specific traffic along a CATS-computed path that leads to an Egress CATS-Router that connects to the most suitable edge site that host the service contact instance selected to satisfy the initial service demand. In some cases, the choice of the service contact instance may be left open to the Egress CATS-Router and the Egress CATS-Router selects a service contact instance using its knowledge of service and network capabilities as well as the current load as observed by the CATS router among other considerations. Therefore, the end-to-end CATS process is implemented in a hierarchical manner.

A corresponding hierarchical two segment routing scheme is also proposed and analyzed in [I-D.huang-cats-two-segment-routing] which separates the service routing path into two segments regarding the highly dynamic computing status. Aggregated per-site computing related metrics appears as a privileged option scalability-wise. High frequency of computing status updates is restrained in a higher level decision region which is the GCRS segment described in this draft.

The introduction of hierarchical segment routing for end-to-end CATS brings about the following significant superiority:

Comparatively, in a non-hierarchical scheme, a specific instance may be directly selected in a sole decison. Thus, a qualitative comparison between hierarchical and non-hierarchical schemes is shown below.


+----------------+----------------------+--------------------+
|                |     Hierarchical     |  Non-hierarchical  |
+----------------+----------------------+--------------------+
| Entries in     | Local instances      | All instances      |
| the control    | and global PEs       | with paths         |
| plane          | with paths           |                    |
| (Service RIB)  |                      |                    |
+----------------+----------------------+--------------------+
| Entries in     | Single 'best'        | Single 'best'      |
| the forwarding | entry or multiple    | entry or multiple  |
| plane          | equal entries        | equal entries      |
| (Service FIB)  |                      |                    |
+----------------+----------------------+--------------------+
| Frequency of   | When a relatively    | When a choice      |
| updates of     | stable choice among  | among all possible |
| Service FIB    | PEs updates or       | instances updates  |
|                | local updates happen |                    |
+----------------+----------------------+--------------------+

Figure 4: Comparison between Hierarchical and Non-hierarchical Schemes

Regarding "R6 MUST realize means for rate control for distributing of metrics" mentioned in [I-D.ietf-cats-usecases-requirements], rate control MAY not be enough for future conditions due to distributed and ubiquitous instances. Thus, an incremental requirement is proposed:

R: SHOULD provide mechanisms to aggregate service metrics for distribution.

However, a hierarchical segment routing implys a multi-device decision-making manner which leads to possible problems:

Thus, this draft proposes a problem statement and incremental requirements of hierarchical segment routing schemes in the end-to-end CATS process.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Terminology

4. Problem 1: Design of a Cohesive Scheme to Make Consistent Decisions

As shown below, Instance 1 to Instance 7 locate at PE 1 to PE 3 respectively and their explicit information is displayed with typical computing attributes of CPU cores, memory remains and load.


                              +----------+          +------------+
                              |   PE 3   +----------+ Instance 7 |
                              +----++----+          +------------+
                                   ||                6C,200G,60%.
                                   ||
                                   ||
                      +------------++------------+
                     /                            \
                    /                              \
                   /                                \
                  /                                  \
                 /                                    \
           +----------+                          +----------+
           |   PE 1   +--------------------------+   PE 2   |
           +----++----+                          +----++----+
                ||                                    ||
                ||                                    ||
            (---++---)                            (---++---)
         (---        ---)                      (---        ---)
   -------              --------         -------              --------
  (+------------+               )       (+------------+               )
 ( | Instance 1 |  4C,160G,40%.  )     ( | Instance 4 |  6C,100G,20%.  )
 ( +------------+                )     ( +------------+                )
 ( +------------+                )     ( +------------+                )
(  | Instance 2 |  8C,300G,60%.   )   (  | Instance 5 |  8C,200G,20%.   )
(  +------------+                 )   (  +------------+                 )
 ( +------------+                )     ( +------------+                )
 ( | Instance 3 |  6C,200G,50%.  )     ( | Instance 6 |  8C,250G,80%.  )
  (+------------+               )       (+------------+               )
   -----------------------------         -----------------------------

                                         CPU Cores, Memory Remains, Load.

Figure 5: Primitive Metadata with a Hierarchical Scheme

Without a hierarchical scheme, PE 1, PE 2 and PE 3 learn and maintain the information of each service instance as shown below. Entries generated and maintained at PE 1, PE 2 and PE 3 are completely identical. Suppose a computing-related service is sensitive to a memory attribute and other constraints are temporarily ignored, then Instance 2 should be selected as the most appropriate instance.


Instance 1    4C,160G,40%.
Instance 2    8C,300G,60%.
Instance 3    6C,200G,50%.
Instance 4    6C,100G,20%.
Instance 5    8C,200G,20%.
Instance 6    8C,250G,80%.
Instance 7    6C,200G,60%.

Figure 6: Explicit Entries without a Hierarchical Scheme

Then, a hierarchical scheme is applied and suppose a summation method is utilized to express the aggregated information of all instances connected to a PE while explicit and detailed information is concealed. Entries maintained at PE 1, PE 2 and PE 3 vary from each other. Suppose a service request accesses at PE 3, the request is steered to PE 1 according to the entries stored at PE 3. Similar decision procedures are made at PE 1 and PE 2, then the request is steered between PE 1 and PE 2 continuously which generates a permanent loop.


At PE 1:          At PE 2:          At PE 3:
Instance 1  160G  Instance 4  100G  Instance 7  200G
Instance 2  300G  Instance 5  200G  PE 1        660G
Instance 3  200G  Instance 6  250G  PE 2        550G
PE 2        550G  PE 1        660G
PE 3        200G  PE 3        200G

                         |
                         V
                    +----------+
                    |   PE 3   |
                    +---+--+---+
                    /  /    \
                   /  /      \
                  /  /        \
                 /  /          \
                V  /            \
             +----------+  +----------+
             |   PE 1   +--+   PE 2   |
             +----------+  +----------+
                    ----------->
                    <-----------

Figure 7: Cohesive Entries with a Hierarchical Scheme

Based on the above example, inappropriate aggregation algorithms lead to inconsistent decisions among multiple PEs in the global region. The fundamental reason is that the application of summation leads to the disappearance of comparability between aggregated information and detailed information and a summation value of a cluster of instances is always preferred and thus selected. Therefore, a requirement is proposed for Problem 1 that the cohesive information needs to be comparable to the information of any explicit single instance. In the condition illustrated in Figure 2, taking the maximum value may be an appropriate method to unify the decision-making process. No matter a service request accesses at PE 1, PE 2 or PE 3, a forementioned loop problem is avoided.


At PE 1:          At PE 2:          At PE 3:
Instance 1  160G  Instance 4  100G  Instance 7  200G
Instance 2  300G  Instance 5  200G  PE 1        300G
Instance 3  200G  Instance 6  250G  PE 2        250G
PE 2        250G  PE 1        300G
PE 3        200G  PE 3        200G

                    +----------+
                    |   PE 3   |
                    +---+--+---+
                       /    \
                      /      \
                     /        \
                    /          \
                   /            \
             +----------+  +----------+
             |   PE 1   +--+   PE 2   |
             +----------+  +----------+
              PE 1-->Instance 2
              PE 2-->PE 1-->Instance 2
              PE 3-->PE 1-->Instance 2

Figure 8: Adjusted Cohesive Entries with a Hierarchical Scheme

5. Problem 2: Lack of Detailed Information Affects Decisions

Undoubtedly, there is an information gap to utilize aggregated information to generally represent the ability among multiple instances at a PE. Therefore, profound analysis is required to determine whether decisions made based on aggregated information can adapt to various scheduling strategies. Taking a preferred strategy and a balanced strategy as examples, the use case and analysis are presented below.


           +----------+
Access---->|   PE 3   |
           +---+--+---+
              /    \
             /      \
            /        \
           /          \
          /            \
    +----------+  +----------+
    |   PE 1   +--+   PE 2   |
    +----------+  +----------+
At PE 1:          At PE 2:
Instance 1  100G  Instance 4  160G
Instance 2  100G  Instance 5  200G
Instance 3  400G  Instance 6  300G
Average     200G  Average     220G

Figure 9: Average Value as the Cohesive Information

Under a preferred strategy, suppose an average value is selected to represent the instances behind a PE, when a service request accesses at PE 3, it finds that the performance of PE 2 seems better than PE 1 since the aggregated value is higher at PE 2. However, the 'best' instance which is Instance 3 locates at PE 1.

Therefore, a maximum value or a minimum value is suggested to be the aggregated information to adapt a preferred strategy. However, practical problems may be sophisticated.

In a more complicated occasion below, more attributes are required to expose to fulfill a multi-factor optimization with constraints. As shown below, under scheme 1, scheme 2 and scheme 3, the traffic is misled to the incorrect PE with the knowledge of the exposed information, however, Instance 6 and Instance 7 prove to be the 'best' respectively which satisfies the requirement and maintains the most memory resource with a global vision.

Taking scheme 1 as an example, a computing related service requires the end-to-end delay is no more than 50ms and it is reckoned as a memory sensitive service. The best value among instances for each attribute is selected as the aggregated value. Thus, 400G and 20ms represent the performance at PE 1 while 260G and 30ms represent the performance at PE 2. Referring to the aggregated value, PE 1 seems to be the 'better' selection. However, Instance 3 at PE 1 fails to satisfy the service requirement while Instance 1 and Instance 2 have lower memory values than Instance 6 at PE 2.


                Requirement: <50ms

                   +----------+
        Access---->|   PE 3   |
                   +----------+
                   /          \
                  /            \
                 / 10ms         \ 10ms
                /                \
               /                  \
         +----------+        +----------+
         |   PE 1   +--------+   PE 2   |
         +----------+        +----------+
At PE 1:                     At PE 2:
Instance 1  100G,20ms        Instance 4  200G,30ms
Instance 2  220G,20ms        Instance 5  200G,30ms
Instance 3  400G,40ms        Instance 6  260G,30ms *

Scheme 1    400G,20ms *                  260G,30ms
Best Memory, Best Delay

Scheme 2    240G,27ms *                  220G,30ms
Average Memory, Average Delay

                   +----------+
        Access---->|   PE 3   |
                   +----------+
                   /          \
                  /            \
                 / 10ms         \ 10ms
                /                \
               /                  \
         +----------+        +----------+
         |   PE 1   +--------+   PE 2   |
         +----------+        +----------+
At PE 1:                     At PE 2:
Instance 1  100G,20ms        Instance 4  200G,30ms
Instance 2  220G,20ms        Instance 5  200G,30ms
Instance 3  400G,40ms        Instance 6  260G,30ms
Instance 7  350G,30ms *

Scheme 3    400G,40ms                    260G,30ms *
Instance with Best Memory

Figure 10: Cohesive Entries with Multiple Factors

Since the aggregated information shields explicit and detailed entries, it is difficult to locate the 'best' instance under a preferred strategy. However, appropriate aggregation algorithm can indicate part of the satisfied instances which could adapts to a balanced strategy. It is suggested that an aggregation algorithm should be designated with the knowledge of distribution of computing resources and corresponding scheduling strategies.

With future considerations, the number of service instances may increase to a large amount, the explicit information of running status may be displayed in a continous manner. 50,000 instances with load from 40% to 60% for instance. With the instances sorted, the values in the 'best' section may be 40%, 40.1%, 40.2%, and a great number of similar values. Thus, it is difficult to distinguish and it also may not be neccessary to select the 'best' one. Under the mentioned circumstances, a hierarchical method with aggregation algorithms introduced can adapt well.

6. Problem 3: Microloop Caused by Interruptions

As show below, suppose a service client accesses at PE 2 and Instance 2 located at a cloud site connected to PE 1 loses efficacy at a sudden. This event has not yet been notified to PE 2 in a timely manner which leads to a mistake that PE 2 still steers the traffic to PE 1 which is still reckoned as the best selection. However, whenever PE 1 discovers that Instance 2 becomes invalid and updates its FIB, PE 2 turns out to be the most appropriate choice decided at PE 1. Thus, the traffic is steered back to PE 2 which forms a microloop.


                                 +----------+          +------------+
                                 |   PE 3   +----------+ Instance 7 |
                                 +----++----+          +------------+
                                      ||                6C,200G,60%.
                                      ||
                                      ||
   At PE 1:              +------------++------------+      At PE 2:
   Instance 1  160G     /                            \     Instance 4  100G
x  Instance 2  300G    /                              \    Instance 5  200G
   Instance 3  200G   /                                \   Instance 6  250G
*  PE 2        250G  /                                  \  PE 1        300G *
   PE 3        200G /                                    \ PE 3        200G
              +----------+   ------------------->   +----------+
              |   PE 1   +--------------------------+   PE 2   +<-------- Access
              +----++----+   <-------------------   +----++----+
                   ||                                    ||
                   ||                                    ||
               (---++---)                            (---++---)
            (---        ---)                      (---        ---)
      -------              --------         -------              --------
     (+------------+               )       (+------------+               )
    ( | Instance 1 |  4C,160G,40%.  )     ( | Instance 4 |  6C,100G,20%.  )
    ( +------------+                )     ( +------------+                )
    ( +------------+x               )     ( +------------+                )
   (  | Instance 2 |x 8C,300G,60%.   )   (  | Instance 5 |  8C,200G,20%.   )
   (  +------------+x                )   (  +------------+                 )
    ( +------------+                )     ( +------------+                )
    ( | Instance 3 |  6C,200G,50%.  )     ( | Instance 6 |  8C,250G,80%.  )
     (+------------+               )       (+------------+               )
      -----------------------------         -----------------------------

Figure 11: Interrupt Notification with Time Difference

Without a hierarchical scheme, the destination is explicitly indicated with a single decision process. Without introducing or configuring instance or path protection schemes, packets will be directly discarded.

Compared with microloop occured in IGP, a microloop mentioned here has similarities. Distributed status storage and disorderly convergence lead to inconsistent decision-making among distributed devices. SR provides a scheme to eliminate potential loops in the network with minimal impact on the network, creating an acyclic SRv6 Segment List for instance. However, unlike conventional problems and solutions, an explicit destination is not determined within previous decisions in a hierarchical scheme. Therefore, explicit routing path indicated by SR is not enough to solve a hierarchical segment routing problem. Actually, a higher level decision has been made within the last entry lookup. Thus, entries should be distinguished according to their gradation in the overall hierarchical scheme. And finally a requirement is proposed for Problem 3 that hierarchical entries should be designed and a corresponding forwarding behaviour should be introduced to adapt the forementioned entries.

7. Incremental Requirements for end-to-end CATS

As illustrated in [I-D.ietf-cats-usecases-requirements], the requirements of CATS are listed. As analyzed in this draft, the incremental requirements for a hierarchical segment routing scheme for CATS are further concluded as follows:

  1. The cohesive information calculated by an aggregation algorithm SHOULD be comparable to a explicit entry of any service instance to guarantee the consistency and basic correctness of decision-making process.

  2. The aggregation algorithm SHOULD be appropriately designed according to the distribution of computing resources and a corresponding applied scheduling strategy to reflect the performance and compatibility of local instances. Qualified service instances MUST be distinguished.

  3. Hierarchical forwarding information entries SHOULD be separated within the lookup process and a corresponding hierarchical forwarding mechanism SHOULD be introduced accordingly.

8. Security Considerations

TBA.

9. Acknowledgements

TBA.

10. IANA Considerations

TBA.

11. Normative References

[I-D.huang-cats-two-segment-routing]
Huang, D., Du, Z., and C. Zhang, "Hierarchical segment routing solution of CATS", Work in Progress, Internet-Draft, draft-huang-cats-two-segment-routing-01, , <https://datatracker.ietf.org/doc/html/draft-huang-cats-two-segment-routing-01>.
[I-D.ietf-cats-usecases-requirements]
Yao, K., Trossen, D., Boucadair, M., Contreras, L. M., Shi, H., Li, Y., and S. Zhang, "Computing-Aware Traffic Steering (CATS) Problem Statement, Use Cases, and Requirements", Work in Progress, Internet-Draft, draft-ietf-cats-usecases-requirements-00, , <https://datatracker.ietf.org/doc/html/draft-ietf-cats-usecases-requirements-00>.
[I-D.ldbc-cats-framework]
Li, C., Du, Z., Boucadair, M., Contreras, L. M., Drake, J., Huang, D., and G. S. Mishra, "A Framework for Computing-Aware Traffic Steering (CATS)", Work in Progress, Internet-Draft, draft-ldbc-cats-framework-03, , <https://datatracker.ietf.org/doc/html/draft-ldbc-cats-framework-03>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

Dongyu Yuan
ZTE Corporation
No.50 Software Avenue
Nanjing
Jiangsu, 210012
China
Huijuan Yao
China Mobile
China
Zhiqiang Li
China Mobile
China
Fenlin Zhou
ZTE Corporation
No.50 Software Avenue
Nanjing
Jiangsu, 210012
China
Xuewei Wang
Ruijie Networks Co.,Ltd
China