This autumn I was taking TCP/IP course taught by professor Le Boudec. This time I learned something new, such as IPv6, interoperability between IPv4 and IPv6, more details about RIP and BGP, etc. Of which the most important I think are flow control and congestion control in TCP.
Flow control and congestion control are two different mechanisms for different purposes.
- Flow Control adapts sending rate of source to the receiving buffer and processing speed of receiver
- Congestion Control adapts rate of source to state of network
In another word, flow control is used to coordinate sending and receiving rate. If the sender is sending too fast, while the receiving application is processing slowly, the receiving buffer may not have enough space to hold the incoming data. Thus, there should be a mechanism to tell the sender to slow down.
Congestion control is a mechanism to avoid congestion collapse in the network. In the real world, if there’s heavy traffic, and every driver wants to rush, the congestion will be more severe. So there is need for a mechanism to regulate traffic in case of congestion.
Flow control is implemented in TCP through sliding window. In the three-way handshake period of TCP connection, the sender and receiver will negotiate the initial size of sliding window for each direction. The sliding window size will be updated on the fly.
lower edge = smallest non acknowledged sequence number upper edge = lower edge + window size
Note that the difference between two sequence number indicates how many bytes are sent or received. The rule of sliding window is as follows:
Only sequence numbers that are in the window may be sent
If the receiving buffer is full, it would advertise sliding window size of 0 to the sender, to prohibit the sender from sending packets temporarily.
Congestion control is about resource(bandwidth) allocation and traffic regulation.
Max-Min and Proportional Fairness
As bandwidth is a kind of limited resource, there should be a way to justly allocate the bandwidth resources. Theoretically, there are three approaches to allocate bandwidth resources:
- Egalitarianism - if each one is allocated the same bandwidth
- Max‐Min Fairness - if any increase would benefit the rich but damage the poor
- Proportional Fairness - if any increase would make the sum of proportional gains negative
The TCP allocation is between max‐min fairness and proportional fairness. One trait of TCP protocol is that it favors connections with low round trip time(RTT).
Additive Increase Multiplicative Decrease(AIMD)
Another theoretical result is that, among the linear controls, only additive-increase–multiplicative-decrease (AIMD) should be considered, that’s what the Internet is based on.
TCP uses an end-to-end congestion mechanism, i.e. hosts sender increases its sending window until loss occurs, then decreases.
cwnd and ssthresh
To implement congestion control for TCP, the protocol maintains a congestion window(cwnd) for each connection. Following rule is enforced by the protocol:
At any given time, a TCP MUST NOT send data with a sequence number higher than the sum of the highest acknowledged sequence number and the minimum of cwnd and rwnd.
NOTE: rwnd means the size of flow control window advertised by the receiver.
The protocol also maintains another variable named slow start threshold(ssthresh). ssthresh is used to determine whether the protocol should enter slow start state or congestion control state. ssthresh is initialized to a large value.
TCP has two mechanisms for packet retransmit:
- Fast Retransmit: when 3 duplicate ACKs are received, declare a loss
- Retransmit Timeout: TCP also uses a retransmit timer, set for every sent packet
When timeout happens, the values of ssthresh and cwnd are updated and slow start begins:
ssthresh = cwnd/2 cwnd = 1
When fast retransmit happens, ssthresh and cwnd are updated and fast recovery begins:
ssthresh = cwnd/2 cwd = ssthresh + 3
Following state machine diagram illustrates the state transformation:
- Start with cwnd = 1 (slow start)
- On each successful ACK increment cwnd: cwnd ← cnwd + 1
- Exponential growth of cwnd: cwnd ← 2 x cwnd (for each RTT)
- Enter Congestion Avoidance when cwnd >= ssthresh
Slow start is actually not slow, it’s exponential(multiplicative increase). Exponential back-off is used in case of successive timeouts.
- Starts when cwnd ≥ ssthresh
- On each successful ACK: cwnd ← cwnd + 1/cwnd
- Linear growth of cwnd: cwnd ← cwnd + 1 (for each RTT)
- Starts when fast retransmit happens
- Set cwnd ← cwnd + 1 for each additional duplicate ACK
- On non-duplicate ACK
- Set cwnd ← ssthresh
- Enter Congestion Avoidance