Internet-Draft | Congestion Control Convergence | September 2023 |
Kuhn, et al. | Expires 31 March 2024 |
This document specifies a cautious method for IETF transports that enables fast startup of congestion control for a wide range of connections or reconnections.¶
It reuses a set of computed congestion control parameters that are based on previously observed path characteristics between the same pair of transport endpoints. These parameters are stored, allowing them to be later used to modify the congestion control behavior of a subsequent connection.¶
It discusses assumptions and defines requirements for how a sender utilizes these parameters to provide opportunities for a connection to more rapidly get up to speed and rapidly utilize available capacity. It discusses how use of the method impacts the capacity at a shared network bottleneck and the safe response that is needed after any indication that the new rate is inappropriate.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 31 March 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
All Internet transports are required to either use a Congestion Control (CC) method, or to constrain their rate of transmission [RFC8085]. In 2010, a survey of alternative CC methods [RFC5783] noted that there are challenges when a CC method operates across an Internet path with a high and/or varying Bandwidth-Delay Product (BDP). The mechanism described in this document targets these challenges.¶
A CC method typically takes time to ramp-up the sending rate, called the "slow-start phase", informally known as the time to "Get up to speed". This slow-start phase defines a time in which a sender intentionally uses less capacity than might be available, with the intention to avoid or limit overshooting the available capacity for the path. The slow-start design can increase queuing (latency/jitter) and/or congestion packet loss to the flow. Any overshoot can have a detrimental effect on other flows sharing a common bottleneck. In the extreme case, persistent congestion could result in unwanted starvation of other flows [RFC8867] (i.e., preventing other flows from successfully sharing capacity at a common bottleneck).¶
This document proposes a CC method that is expected to reduce the time to complete a transfer when the transfer sends significantly more data than allowed by the Initial congestion Window (IW), and where the BDP of the path is also significantly more than the product of the IW and the Round Trip Time (RTT).¶
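To illustrate why a transfer much larger than the IW takes many round trips with a conventional start, the following minimal sketch counts the RTTs needed under an idealised slow start. It assumes the window doubles every RTT, with no loss, no delayed acknowledgements, and no receiver limit; the function name and the IW of 10 segments of 1460 bytes are assumptions made for this example and are not taken from this document.

```python
def slow_start_rtts(transfer_bytes: int, iw_bytes: int) -> int:
    """Count the RTTs needed to send transfer_bytes when the window starts
    at iw_bytes and doubles every RTT (idealised slow start, no loss)."""
    if iw_bytes <= 0:
        raise ValueError("iw_bytes must be positive")
    cwnd = iw_bytes
    sent = 0
    rtts = 0
    while sent < transfer_bytes:
        sent += cwnd   # one window of data is sent per round trip
        cwnd *= 2      # idealised doubling of the window each RTT
        rtts += 1
    return rtts


ASSUMED_IW = 10 * 1460  # assumption: 10 segments of 1460 bytes

print(slow_start_rtts(5_300_000, ASSUMED_IW))  # round trips for a 5.3 MB transfer
print(slow_start_rtts(1_000_000, ASSUMED_IW))  # round trips for a 1 MB transfer
```

Under these assumptions the count depends only on the ratio of the transfer size to the IW, so the elapsed time grows in proportion to the RTT of the path.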
It introduces an alternative method to select initial CC parameters that seeks to more rapidly and safely grow the sending rate controlled by the congestion window, CWND. (CC methods that are rate-based can make similar adjustments to their target sending rate.)¶
This method is based on temporal sharing (sometimes known as caching) of a saved set of CC parameters that relate to previous observations of the same path. The saved CC parameters include: the available capacity found on the path and the RTT. These parameters are stored and used to modify the CC behavior of a subsequent connection between the same endpoints.¶
When used with the QUIC transport, this provides transport services that resemble those currently available in TCP, using methods such as TCP Control Block (TCB) [RFC9040] caching.¶
CC parameters are used by Careful Resume for two functions:¶
"Generally, implementations are advised to be cautious when using saved CC parameters on a new path", as stated in [RFC9000]. While this statement has been proposed in the context of QUIC standardization, this advice is appropriate for any IETF transport protocol. Care is therefore needed to assure safe use and to be robust to changes in traffic patterns, network routing, and link/node conditions. There are cases where using the saved parameters of a previous connection is not appropriate.¶
Whilst a sender could take optimization decisions without considering the receiver's preference, there are cases where a receiver could have information that is not available at the sender, or might benefit from understanding that Careful Resume might be used. In these cases, a receiver could explicitly ask to enable or inhibit tuning of the CC when an application initiates a new session or resumes an existing one.¶
An indication from the sender that Careful Resume is available could also allow a receiver to tune policies for using the connection (e.g., managing the receiver window or flow credit).¶
Examples where a receiver could request not to use Careful Resume include:¶
QUIC introduces the concept of transport parameters (Section 4 of [RFC9000]). A related document proposes an extension for QUIC that requests the sender-generated CC parameters to be stored at the receiver [I-D.kuhn-quic-bdpframe-extension]. Transferring the information to a receiver removes the need for a sender to retain transport state for each receiver. That document also evaluates the potential for malicious use of this exchange.¶
This section provides a set of examples where Careful Resume is expected to improve performance.¶
Either endpoint can assume the role of a sender or a receiver. Careful Resume also supports a bidirectional data transfer, where both endpoints simultaneously send data (e.g., remote execution of an application, or a bidirectional video conference call).¶
In one example, an application uses a series of connections over a path (i.e., resumes a connection to the same endpoint). Without a new method, each connection would need to individually discover appropriate CC parameters, whereas Careful Resume allows the flow to continue at a rate that resembles the observed rate.¶
In another example, an application reconnects after a disruption had temporarily reduced the path capacity (e.g., after a link propagation impairment, or where a user on a train journey travels through different areas of connectivity). When the endpoint returns to use a path with the original characteristics, it can resume a transmission rate based on the previously observed CC parameters.¶
There is particular benefit for any path with an RTT that is much larger than typical Internet paths. In a specific example, an application connected via a satellite access network [IJSCN] could require 9 seconds to complete a 5.3 MB transfer using standard CC, whereas using Careful Resume this transfer time could be reduced to 4 seconds. The time to complete a 1 MB transfer could similarly be reduced by 62 % [MAPRG111]. This benefit is also expected for other sizes of transfer and for different path characteristics when a path has a large BDP.¶
{XXX-Editor note: A future revision would helpfully provide further Path Examples here.}¶
This subsection provides a brief summary of key terms and the requirements language.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The document uses language drawn from a range of IETF RFCs. It defines current, and saved values for a set of CC parameters:¶
The Endpoint Token is described in Appendix A.¶
This section defines a series of phases that the CC algorithm moves through as a connection uses Careful Resume, as shown in Figure 1.¶
Connect -> Reconnaissance --------------------> Normal
                 |                                 ^
                 v                                 |
            Unvalidated --> Validating ------------+
                 |              |                  |
                 |              |                  |
                 +--------------+--> Safe Retreat -+¶
Figure 1: Phases when a connection uses Careful Resume. The Observe Phase is later performed by an established connection.¶
During a previous connection, CC parameters for the specific path to an endpoint are saved. These are used to characterize the path and to measure the capacity that was used. They include the minimum RTT (saved_RTT), the path capacity (saved_capacity) and the receiver Endpoint Token (saved_endpoint_token). An implementation can store this information at the server (or could exchange this information with a receiver, as detailed in [I-D.kuhn-quic-bdpframe-extension]).¶
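The saved parameters could be held in a small per-endpoint cache. The following is a minimal sketch under stated assumptions, not a definitive implementation: the three saved fields mirror the values named above, while the class names, the saved_at timestamp, the choice of bytes and seconds as units, and the one-hour LIFETIME_S are hypothetical (this document leaves the cache lifetime to a future revision).

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical lifetime: this document does not yet specify how long
# saved CC parameters remain valid.
LIFETIME_S = 3600.0


@dataclass(frozen=True)
class SavedCCParams:
    saved_rtt: float             # minimum RTT observed, in seconds (assumed unit)
    saved_capacity: int          # capacity observed on the path, in bytes (assumed unit)
    saved_endpoint_token: bytes  # receiver Endpoint Token (see Appendix A)
    saved_at: float              # hypothetical: when the observation was stored


class ParamStore:
    """Sender-side cache keyed by the Endpoint Token."""

    def __init__(self) -> None:
        self._by_token: Dict[bytes, SavedCCParams] = {}

    def save(self, params: SavedCCParams) -> None:
        self._by_token[params.saved_endpoint_token] = params

    def lookup(self, token: bytes, now: Optional[float] = None) -> Optional[SavedCCParams]:
        """Return the saved parameters, or None if absent or expired."""
        now = time.time() if now is None else now
        params = self._by_token.get(token)
        if params is None or now - params.saved_at > LIFETIME_S:
            return None
        return params
```

A lookup that returns None corresponds to the case where there are no currently valid saved parameters, so the sender would not attempt Careful Resume.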
When a sender resumes transmission between the same pair of endpoints (i.e., it believes that it is using the same path), it enters the Reconnaissance Phase. The sender only enters this phase when there are saved CC parameters for the same pair of endpoints and this information is currently valid (i.e., the saved parameters have not expired). A receiver can use a method (such as the QUIC BDP Frame [I-D.kuhn-quic-bdpframe-extension]) to request that the sender does not enter this phase.¶
In this phase, the sender transmits initial data, limited by the IW, and monitors its reception. This phase measures the current path characteristics to confirm these are consistent with the previously observed CC parameters.¶
When a sender confirms the path and it receives an acknowledgement for the initial data without reported congestion, it MAY then enter the Unvalidated Phase. This transition occurs when a sender has more data than permitted by the current CWND.¶
Implementation requirements are provided in Section 4.2.¶
When the path is not confirmed, Careful Resume is not used and the sender enters the Normal Phase.¶
The Unvalidated Phase is designed to enable the CWND to more rapidly get up to speed, but this requires data to send. If the application is data-limited, the sender sends insufficient data to be able to validate transmission at the tentative higher rate. Careful Resume therefore remains in the Reconnaissance Phase and does not transition to the Unvalidated Phase until the sender has more data ready to send in the transmission buffer than is permitted by the current CWND. In some implementations, the decision to enter the Unvalidated Phase could require coordination with the management of buffers in the interface to the higher layers.¶
This phase paces transmission using an increased CWND (jump_cwnd) that is calculated based on the saved CC parameters and the current RTT.¶
Implementation requirements are provided in Section 4.3.¶
The Validating Phase checks that the packets sent in the Unvalidated Phase were received without inducing congestion. The sender typically remains in this phase for 1 RTT. The CWND remains unvalidated. (Note: When the jump_cwnd is not fully utilised, a smaller capacity is validated.)¶
This phase is entered when the sender detects that a jump in the Unvalidated Phase has overshot the currently available capacity. It starts when the first loss/ECN-CE marking is detected. (This trigger is the same as used by a QUIC sender to transition from Slow Start to Recovery [RFC9002].)¶
Implementation requirements are provided in Section 4.5.¶
Unacknowledged packets that were sent in the Unvalidated Phase can be lost when there is congestion. Loss recovery commences using the reduced CWND that was set on entry to the Safe Retreat Phase.¶
The sender leaves the Safe Retreat Phase when an acknowledgement is received for the last packet number (or higher) sent in the Unvalidated Phase. If the last packet number is not cumulatively acknowledged, then additional packets might need to be retransmitted.¶
CC methods using a slow-start threshold need to update this from the CWND (i.e., ssthresh = CWND).¶
The Normal Phase is then entered.¶
In the Normal Phase, the sender transitions to using the normal CC method (e.g., in congestion avoidance).¶
Implementation requirements are provided in Section 4.6.¶
A sender that experiences a Retransmission Time Out (RTO) expiry ceases to use Careful Resume. The sender continues using normal CC.¶
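The phase transitions described in this section can be summarised as a small state machine. This is an illustrative sketch only: the event names are hypothetical labels for the conditions described in the text, and the treatment of congestion reported during the Reconnaissance Phase (return to the Normal Phase) is an assumption drawn from the statement that Careful Resume is not used when the path is not confirmed.

```python
from enum import Enum, auto


class Phase(Enum):
    RECONNAISSANCE = auto()
    UNVALIDATED = auto()
    VALIDATING = auto()
    SAFE_RETREAT = auto()
    NORMAL = auto()


class Event(Enum):  # hypothetical event names
    PATH_CONFIRMED = auto()      # IW acknowledged without congestion, RTT consistent,
                                 # and more data queued than the current CWND permits
    PATH_NOT_CONFIRMED = auto()  # saved parameters do not match the current path
    JUMP_COMPLETED = auto()      # jump_cwnd has been sent, or one RTT has elapsed
    CONGESTION = auto()          # first loss or ECN-CE mark detected
    JUMP_ACKED = auto()          # data sent in the Unvalidated Phase acknowledged
    RETREAT_ACKED = auto()       # last Unvalidated packet number (or higher) acknowledged
    RTO = auto()                 # retransmission timer expired


TRANSITIONS = {
    (Phase.RECONNAISSANCE, Event.PATH_CONFIRMED): Phase.UNVALIDATED,
    (Phase.RECONNAISSANCE, Event.PATH_NOT_CONFIRMED): Phase.NORMAL,
    (Phase.RECONNAISSANCE, Event.CONGESTION): Phase.NORMAL,  # assumption, see lead-in
    (Phase.UNVALIDATED, Event.JUMP_COMPLETED): Phase.VALIDATING,
    (Phase.UNVALIDATED, Event.CONGESTION): Phase.SAFE_RETREAT,
    (Phase.VALIDATING, Event.CONGESTION): Phase.SAFE_RETREAT,
    (Phase.VALIDATING, Event.JUMP_ACKED): Phase.NORMAL,
    (Phase.SAFE_RETREAT, Event.RETREAT_ACKED): Phase.NORMAL,
}


def next_phase(phase: Phase, event: Event) -> Phase:
    """Return the phase that follows `phase` when `event` occurs."""
    if event is Event.RTO:
        return Phase.NORMAL  # an RTO expiry ends the use of Careful Resume
    return TRANSITIONS.get((phase, event), phase)  # otherwise stay in the same phase
```

Keeping the transitions in a single table makes it easy to check them against Figure 1: every arrow in the figure appears as one entry, and any event not listed leaves the phase unchanged.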
This section provides requirements for implementation and guidance on use.¶
There are various approaches to measuring the capacity that is used by a connection. Congestion controllers, such as CUBIC or Reno, can estimate the capacity by utilizing a combination of the CWND/flight_size and the RTT. A different approach could estimate the same parameters for a rate-based congestion controller, such as BBR [I-D.cardwell-iccrg-bbr-congestion-control].¶
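As one hypothetical way to combine these values for a window-based controller, the sketch below takes the smaller of the CWND and the flight size as the capacity that was actually used, and divides it by the RTT to obtain a rate. The function name, the use of the minimum, and the units (bytes and seconds) are assumptions made for this example; this document does not prescribe a particular estimator.

```python
from typing import Tuple


def estimate_capacity(cwnd_bytes: int, flight_size_bytes: int, rtt_s: float) -> Tuple[int, float]:
    """Return (capacity in bytes, rate in bytes per second).

    Assumption: only data that was actually in flight counts as used
    capacity, so the estimate is capped by both CWND and flight size.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    capacity_bytes = min(cwnd_bytes, flight_size_bytes)
    return capacity_bytes, capacity_bytes / rtt_s
```

Capping by the flight size avoids saving a capacity that an application-limited connection never actually exercised.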
In the Reconnaissance Phase a sender initiates a connection and starts sending initial data. It measures the RTT to confirm the path it wishes to use.¶
A sender must limit the initial data, sent in the first RTT of transmitted data, to not more than the IW [RFC9000]. This transmission using the IW is assumed to be a safe starting point for any path to avoid adding excessive load to a potentially congested path. (When used in a controlled network, additional information about local path characteristics could be known, which might be used to configure a non-standard IW.)¶
Path characteristics can change over time for many reasons, resulting in the previously observed CC parameters becoming irrelevant. The sender therefore compares the saved_RTT with each of a series of measured RTT samples.¶
If the current RTT sample is less than half of the saved_RTT, this is regarded as too small, and is an indicator of a path change. (The factor of two arises because the jump_cwnd is calculated as half the measured capacity: sending this over an RTT of at least half the saved_RTT ensures that the rate does not exceed the rate observed when the capacity was measured.)¶
A current RTT larger than that at the time the capacity was measured results in a proportionally lower resumed rate, because the transmission using the CR method is paced based on the current RTT. An RTT sample more than ten times the saved_RTT is regarded as too large; such a high RTT is indicative of a path change. (The factor of ten accommodates both increases in latency from buffering on a path and any variation between samples.)¶
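The two RTT bounds described above can be expressed as a simple range check. This is a minimal sketch: the function names are hypothetical, and treating any single out-of-range sample as a failure to confirm the path is an assumption about how the series of samples is combined.

```python
from typing import Iterable


def rtt_sample_in_range(current_rtt: float, saved_rtt: float) -> bool:
    """True when the sample is neither less than half of the saved RTT
    nor more than ten times the saved RTT."""
    if current_rtt <= 0 or saved_rtt <= 0:
        return False
    return saved_rtt / 2 <= current_rtt <= 10 * saved_rtt


def path_confirmed(rtt_samples: Iterable[float], saved_rtt: float) -> bool:
    """Assumption: the path is confirmed only if at least one sample
    exists and every sample falls within the permitted range."""
    samples = list(rtt_samples)
    return bool(samples) and all(rtt_sample_in_range(s, saved_rtt) for s in samples)
```

For example, with a saved RTT of 1.0 second, samples between 0.5 and 10.0 seconds are accepted, while a sample of 0.4 seconds would indicate a path change.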
NOTE: Some transport protocols implement methods that infer potential congestion from an increase in the RTT. In the Reconnaissance Phase, this indication occurs earlier than congestion which is reported by loss or by ECN marking. Designs need to consider if this is a suitable trigger for changing the phase of CR.¶
This section defines the safety requirements for using saved CC parameters to tentatively update the CWND. These safety guidelines mitigate the risk of adding excessive congestion to an already congested path.¶
{XXX-Editor NOTE: A future revision of this document needs to specify how long CC Parameters can be cached, possibly based on TCP-new-CWV or TCB, RFC9040.}¶
Unvalidated and Reconnaissance Phases: Careful Resume MUST be robust to changes in network conditions due to variations in the forwarding path, reconfiguration of equipment, or changes in the link conditions.¶
The sender must avoid sending a burst of packets greater than IW as a result of a step-increase in the congestion window [RFC8085], [RFC9000]. Pacing sent packets as a function of the current RTT provides additional safety during the Unvalidated Phase. Other sender mitigations have also been suggested to avoid line-rate bursts (e.g., [I-D.hughes-restart]).¶
The following example provides a relevant pacing rhythm using the RTT and the saved_capacity. The Inter-packet Transmission Time (ITT) is the current Maximum Message Size (MMS) divided by the sending rate, where the rate is the ratio of the saved_capacity to the RTT. A safety margin can avoid sending more than a recommended maximum (max_jump):¶
This follows the idea presented in [RFC4782], [I-D.irtf-iccrg-sallantin-initial-spreading] and [CONEXT15].¶
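The pacing calculation can be sketched as follows. This is an illustration under stated assumptions rather than the normative formula: it takes the jump_cwnd to be half of the saved_capacity (as described for the RTT check in the Reconnaissance Phase), caps it at max_jump, and uses bytes and seconds as units. The function names and the example values are hypothetical.

```python
def compute_jump_cwnd(saved_capacity: int, max_jump: int) -> int:
    """Assumption: jump_cwnd is half the saved capacity, capped at max_jump (bytes)."""
    return min(max_jump, saved_capacity // 2)


def inter_packet_time(mms: int, current_rtt: float, saved_capacity: int, max_jump: int) -> float:
    """Return the Inter-packet Transmission Time (ITT) in seconds.

    ITT = MMS / (jump_cwnd / current_rtt), i.e. the time to send one
    maximum-sized message at the paced rate.
    """
    jump_cwnd = compute_jump_cwnd(saved_capacity, max_jump)
    if jump_cwnd <= 0 or current_rtt <= 0:
        raise ValueError("jump_cwnd and current_rtt must be positive")
    rate = jump_cwnd / current_rtt  # paced sending rate, bytes per second
    return mms / rate


# Example with hypothetical values: 1 MB saved capacity, 0.5 s RTT,
# 1250-byte messages, and a 10 MB max_jump that does not bind here.
itt = inter_packet_time(mms=1250, current_rtt=0.5, saved_capacity=1_000_000, max_jump=10_000_000)
print(f"ITT = {itt * 1000:.2f} ms")
```

Because the rate is computed from the current RTT, a longer current RTT lengthens the ITT and so lowers the resumed rate in proportion.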
When a sender completes the Unvalidated Phase, either by sending the jump_cwnd or after one RTT, it ceases to use the unvalidated CWND. That is, CWND is reset to the flight size, and the sender awaits reception of the acknowledgments to validate the use of this capacity. During this phase, new packets are sent when previously sent data is newly acknowledged. The purpose of this phase is to trigger a safe retreat in the case when the capacity is not validated.¶
This section defines the safety requirements after congestion has been detected during the Unvalidated Phase.¶
The Safe Retreat reaction MUST differ from a traditional reaction to detected congestion, because the jump_cwnd can result in a significantly higher rate than would be allowed by the slow-start mechanism. This could aggressively feed a congested bottleneck, resulting in overshoot where a disproportionate number of packets from existing flows are displaced from the buffer at the congested bottleneck. For this reason, a sender needs to react to detected congestion by reducing the CWND significantly below the saved_capacity.¶
Note: Proportional Rate Reduction (PRR) assumes that it is safe to reduce the rate gradually when in congestion avoidance. The method specified by PRR [RFC6937] is therefore not appropriate when there might be significant overshoot in the use of the capacity.¶
The CWND is reduced on entry to the Safe Retreat Phase to no more than the IW.¶
This provides some examples of how to implement the Safe Retreat Phase:¶
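As a minimal illustrative sketch (not taken from this document), the entry and exit rules for the Safe Retreat Phase could be expressed as below. The class and function names are hypothetical; the sketch assumes packet numbers increase monotonically, as in QUIC, and that the CC method uses a slow-start threshold.

```python
from dataclasses import dataclass


@dataclass
class SenderState:  # hypothetical container for the relevant sender variables
    cwnd: int                  # congestion window, in bytes
    ssthresh: int              # slow-start threshold, in bytes
    last_unvalidated_pn: int   # last packet number sent in the Unvalidated Phase
    in_safe_retreat: bool = False


def enter_safe_retreat(state: SenderState, iw: int) -> None:
    """On the first loss or ECN-CE mark: reduce CWND to no more than the IW."""
    state.cwnd = min(state.cwnd, iw)
    state.in_safe_retreat = True


def on_ack(state: SenderState, largest_acked_pn: int) -> bool:
    """Return True when this acknowledgement ends the Safe Retreat Phase.

    The phase ends once the last packet number sent in the Unvalidated
    Phase (or a higher one) is acknowledged; ssthresh is then set from CWND.
    """
    if state.in_safe_retreat and largest_acked_pn >= state.last_unvalidated_pn:
        state.ssthresh = state.cwnd
        state.in_safe_retreat = False
        return True
    return False
```

Loss recovery for packets sent in the Unvalidated Phase is outside this sketch; it would proceed using the reduced CWND while in_safe_retreat is set.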
After using Careful Resume, the CC controller returns to the Normal Phase.¶
The implementation details for the transition to the Normal Phase depend on the design of the CC method.¶
{XXX-Editor note: A future revision should discuss updating the saved parameters, whether used or not, after reaching normal operation for use the next time even if that update is to just refresh the expiration time.}¶
A sender is limited by any rate-limitation of the transport protocol that is used.¶
The implementation details for different transports depend on the design of the transport.¶
For QUIC this includes flow control mechanisms or preventing amplification attacks. In particular, a QUIC receiver might need to issue proactive MAX_DATA frames to increase the flow control limits of a connection that is started when using Careful Resume to gain the expected benefit.¶
A TCP sender is limited by the receiver window (rwnd). Unless configured at a receiver, the rwnd constrains the rate of increase for a connection and reduces the benefit of Careful Resume.¶
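The effect of a receiver-imposed limit can be pictured with a small calculation. This is a simplified sketch: the function names are hypothetical, and a single byte limit stands in for either the TCP rwnd or the credit remaining under QUIC flow control, which in practice is tracked as a cumulative offset.

```python
def usable_window(jump_cwnd: int, receiver_limit: int) -> int:
    """Bytes the sender may actually have in flight: the smaller of the
    congestion window and the limit imposed by the receiver."""
    return min(jump_cwnd, receiver_limit)


def flow_control_shortfall(jump_cwnd: int, receiver_limit: int) -> int:
    """Additional credit, in bytes, that the receiver would need to grant
    for the sender to make full use of the jump_cwnd."""
    return max(0, jump_cwnd - receiver_limit)
```

For instance, with a jump_cwnd of 500,000 bytes and a receiver limit of 65,535 bytes, the sender can use only 65,535 bytes per RTT until the receiver grants a further 434,465 bytes of credit.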
The authors would like to thank John Border, Gabriel Montenegro, Patrick McManus, Ian Swett, Igor Lubashev, Robin Marx, Roland Bless, Franklin Simo, and Raffaello Secchi for their fruitful comments on earlier versions of this document.¶
The authors would like to particularly thank Tom Jones for co-authoring several previous versions of this document.¶
No current parameters are required to be registered by IANA.¶
This document does not exhibit specific security considerations. Security considerations for the interactions with the receiver are discussed in [I-D.kuhn-quic-bdpframe-extension].¶
This annex proposes an Endpoint Token to allow a sender to identify its own view of the network path that it is using. In [I-D.kuhn-quic-bdpframe-extension], this Endpoint Token could be shared with other parties as an opaque path identifier, and the sender can verify whether it identifies one of its current paths.¶
When computing the Endpoint Token, the sender includes information to identify the path on which it sends, for example:¶
When creating an Endpoint Token, the sender has to ensure the following:¶
Previous individual submissions were discussed in TSVWG and QUIC.¶