When users want to send data over the Internet faster than the network can handle, congestion can occur, in the same way that traffic congestion messes up the morning commute to a big city.
Computers and devices that transmit data over the Internet break the data into smaller packets and use a special algorithm to decide how fast to send those packets. These congestion control algorithms seek to discover and fully utilize available network capacity while sharing it fairly with other users who may be sharing the same network. These algorithms try to minimize the delay caused by data waiting in network queues.
Over the last decade, researchers in industry and academia have developed various algorithms that attempt to achieve high rates while controlling delays. Some of these, like the BBR algorithm developed by Google, are now widely used by many websites and apps.
But a team of researchers at MIT found that these algorithms can be profoundly unfair. In a new study, they show that there will always be a network scenario where at least one sender receives almost no bandwidth compared to other senders; that is, a problem known as starvation cannot be avoided.
“What’s really amazing about this paper and the results is that when you take into account the real complexity of network paths and everything they can do with data packets, it’s basically impossible for the congestion control algorithms that control lag avoid starvation using current methods,” says Mohammad Alizadeh, associate professor of electrical and computer engineering (EECS).
While Alizadeh and her co-authors were unable to find a traditional congestion control algorithm that could prevent starvation, there may be algorithms in a different class that could prevent this problem. Their analysis also suggests that changing the way these algorithms work, so that they allow for larger variations in delay, could help prevent starvation in some network situations.
Alizadeh wrote the paper with first author and EECS graduate student Venkat Arun and lead author Hari Balakrishnan, a Fujitsu professor of computer science and artificial intelligence. The research will be presented at the ACM Special Interest Group on Data Communications (SIGCOMM) conference.
Congestion control is a fundamental problem in networks that researchers have been trying to address since the 1980s.
A user’s computer doesn’t know how fast to send data packets over the network because it lacks information, such as the quality of the network connection or how many other senders are using the network. Sending packets too slowly makes poor use of available bandwidth. But sending them too fast can overload the network and by doing so the packets will start to get lost. These packets must be resent, which results in longer delays. Delays can also be caused by packets waiting in queues for a long time.
Congestion control algorithms use packet loss and delay as signals to infer congestion and decide how fast to send data. But the Internet is complicated, and packets can be delayed and lost for reasons unrelated to network congestion. For example, data could be held in a queue en route and then released with a burst of other packets, or receiver acknowledgment could be delayed. The authors call delays that aren’t caused by congestion “jitters.”
Even if a congestion control algorithm measures delay perfectly, it cannot differentiate between delay caused by congestion and delay caused by jitter. The delay caused by jitter is unpredictable and confuses the sender. Because of this ambiguity, users begin to estimate delay differently, causing them to send packets at uneven speeds. Eventually this leads to a situation where starvation sets in and someone is completely left out, Arun explains.
“We started the project because we lacked a theoretical understanding of the behavior of congestion control in the presence of fluctuations. To put it on a firmer theoretical footing, we built a mathematical model that was simple enough to think about, but capable of capturing some of the the complexities of the Internet. It’s been very rewarding to have mathematics tell us things we didn’t know and that have practical relevance,” he says.
The researchers fed their mathematical model into a computer, gave it a series of commonly used congestion control algorithms, and asked the computer to find an algorithm that could prevent starvation, using their model.
“We couldn’t do it. We tried all the algorithms we knew about and some new ones we created. Nothing worked. The computer always found a situation where some people get all the bandwidth and at least one person gets basically nothing,” says Arun.
The researchers were surprised by this result, especially since these algorithms are believed to be reasonably fair. They began to suspect that it may not be possible to prevent starvation, an extreme form of injustice. This motivated them to define a class of algorithms that they call “delay convergent algorithms” which showed that they will always starve under their network model. All existing congestion control algorithms that control delay (that researchers know of) are convergent to delay.
The fact that the very simple failure modes of these widely used algorithms remained unknown for so long illustrates how difficult it is to understand the algorithms through empirical testing alone, adds Arun. He stresses the importance of a solid theoretical foundation.
But all hope is not lost. While all of the algorithms they tested failed, there may be other algorithms that aren’t convergent with delay and could prevent starvation. This suggests that one way around the problem could be to design congestion control algorithms that vary the delay range more widely. so the range is greater than any delay that may occur due to jitter in the network.
“To control delays, algorithms have also tried to limit variations in delay about a desired balance, but there’s nothing wrong with potentially creating more variation in delay to get better measurements of congestive delays. It’s just a new design philosophy that you would have to adopt,” adds Balakrishnan.
Now, the researchers want to keep pushing to see if they can find or build an algorithm that will eliminate starvation. They also want to apply this mathematical modeling and computational testing approach to other thorny and unsolved problems in networked systems.
“We rely more and more on computer systems for very critical things, and we need to put their reliability on a firmer conceptual footing. We’ve shown the amazing things you can discover when you take the time to come up with these formal specifications of what really is the problem,” says Alizadeh.