Speeding up the Internet

The internet is an aggregation of multiple interconnected terrestrial, submarine and radio networks.

Tier 1 networks are like global freeways, able to deliver traffic to destinations across multiple continents. Tier 1 networks exchange traffic with each other without payment, through peering agreements based on reciprocity. Tier 2/3 networks cover a smaller geographical area and get global access by making transit payments to tier 1 networks. These networks form the backbone of the internet.

The last mile is the part of the network that connects people's homes and businesses to the larger internet. The last mile is typically part of the ISP’s network. The middle mile is the segment linking the ISP’s core network to the local, last mile networks.

When comparing the internet to a tree, the tier 1 and 2 networks form the trunk and the middle mile the branches, connecting a very large number of “last mile” leaves. These last mile links are the most numerous and most expensive part of the system. This is illustrated by the chart below.

The internet as an inverted tree

In each of the next sections we look into a major issue affecting latency and ways to improve performance.

Outdated protocols

The core internet protocols and routing mechanisms have not changed significantly in the last 40 years. TCP was built for reliability and congestion avoidance, and it served those purposes wonderfully well, but it is starting to show its age. Major issues include:

  1. TCP is a round-trip protocol, and the only way to reduce latency is to reduce the number of round trips.
  2. There is high overhead from acknowledging every window of data packets sent. This particularly affects streaming video, where the distance between server and client constrains download speeds.
  3. Larger networks and greater distances increase Round Trip Time (RTT) and packet loss, and decrease effective bandwidth.
  4. TCP is not very efficient when handling large payloads like streaming video, which in the US accounted for over 70% of internet traffic in late 2015 and is expected to grow to 80% by 2019.

The chart below, derived from a report published by Akamai, illustrates the inverse relationship between RTT and bandwidth: as latency increases, throughput decreases exponentially.

Increase in latency decreases throughput exponentially
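
The same relationship falls out of how TCP works: a window-limited connection can deliver at most one window of data per round trip, so best-case throughput is roughly window size divided by RTT. A minimal sketch of that bound (the 64 KB window and the RTT values are illustrative assumptions):

```python
# Rough illustration: window-limited TCP throughput is bounded by window / RTT.
# The 64 KB window and the RTT values below are illustrative assumptions.

WINDOW_BYTES = 64 * 1024  # a typical unscaled TCP receive window

for rtt_ms in (10, 50, 100, 200, 400):
    throughput_mbps = (WINDOW_BYTES * 8) / (rtt_ms / 1000) / 1_000_000
    print(f"RTT {rtt_ms:>3} ms -> at most {throughput_mbps:6.1f} Mbit/s")
```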

There are a number of optimizations that mitigate these problems. These include:

  1. Using pools of persistent connections to eliminate setup and teardown overhead (see the sketch after this list).
  2. Compressing content to reduce the number of TCP round trips.
  3. Sizing the TCP window based on real-time network latency. Under good conditions, an entire payload can be sent by setting a large initial window, eliminating the wait for an acknowledgement. Larger windows can result in increased throughput over longer distances.
  4. Intelligent retransmission after packet loss by leveraging network latency information, instead of relying on the standard TCP timeout and retransmission protocols. This could mean shorter, more aggressive retransmission timeouts under good network conditions.
  5. QUIC, a replacement for TCP built on top of UDP, and HTTP/2 include some of these optimizations from the ground up.
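
Some of these optimizations are already available to applications. As a minimal sketch of items 1 and 2, assuming Python's third-party requests library, a pooled keep-alive session reuses connections and asks for compressed responses:

```python
# Minimal sketch (assumes the third-party "requests" library): a Session keeps a
# pool of persistent connections, so repeated requests to the same host reuse one
# TCP (and TLS) connection instead of paying setup and teardown costs each time.
import requests

session = requests.Session()
session.headers["Accept-Encoding"] = "gzip"  # ask the server for compressed responses

for _ in range(2):  # the second request reuses the pooled connection
    resp = session.get("https://example.com/", timeout=5)
    print(resp.status_code, resp.headers.get("Content-Encoding"), len(resp.content))
```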

Congestion 

Congestion happens at many different points on the Internet, including peering points and the middle and last miles.

Peering points are the interconnections where different networks exchange traffic. Border Gateway Protocol (BGP) is used to exchange routing information between networks. This older protocol has a number of limitations affecting its ability to keep up with the growth in routes and traffic. Peering points can also be deliberately used to throttle traffic and charge for increases in transit capacity, as in the case of Netflix vs Comcast (and Verizon).
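
Part of the problem is that BGP's best-path selection is driven by policy and AS-path length rather than by measured latency or congestion. A toy illustration of that behavior (all routes and numbers below are made up):

```python
# Toy illustration of BGP-style path selection: the shortest AS path wins even
# if a longer path is currently faster. All routes and numbers are made up.
routes = [
    {"as_path": ["AS100", "AS200"],          "measured_rtt_ms": 180},  # congested peering
    {"as_path": ["AS100", "AS300", "AS400"], "measured_rtt_ms": 40},   # longer but faster
]

best = min(routes, key=lambda r: len(r["as_path"]))  # BGP ignores measured_rtt_ms
print("Chosen path:", " -> ".join(best["as_path"]), f"({best['measured_rtt_ms']} ms)")
```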

The middle mile connects the last mile to the greater internet. The last mile, which is typically part of the ISP’s network, is the segment connecting homes and businesses. As noted above, these last mile links are the most numerous and most expensive part of the system. Goldman Sachs estimated it would cost Google $140 billion to build out a nationwide network, or about $1,000 per home on average.

The large capital investment limits competition and creates an imperfect market dominated by a few players. Almost a third of US households have no choice of broadband internet provider. With so little competition, these corporations have no incentive to improve their service. In locations where Google Fiber has entered the market, other providers have matched Google Fiber’s prices and levels of service.

Minimizing long-haul content and computing

Distributing content and computing to the edge presents a unique set of challenges, especially for mobile devices.

  1. Mobile devices are likely to switch between cellular and wi-fi networks, resulting in disconnections.
  2. Mobile applications are personalized and interact through APIs rather than static HTML web pages accessed through computer browsers; API responses cannot be cached as efficiently by CDNs.
  3. Streaming content, especially video, has to be optimized for different devices and resolutions. The downloadable stream is a function of both the device resolution and the bandwidth available under real-time network conditions (see the sketch after this list).
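
Point 3 is usually handled with adaptive bitrate streaming: the client picks the highest rendition that fits both the device's resolution and the currently measured bandwidth. A minimal sketch, where the rendition ladder and the sample measurements are illustrative assumptions:

```python
# Minimal sketch of adaptive bitrate selection: choose the highest rendition
# that fits both the device resolution and the measured bandwidth.
# The rendition ladder and the sample inputs are illustrative assumptions.
RENDITIONS = [  # (height in pixels, required bandwidth in kbit/s)
    (240, 400), (360, 800), (480, 1500), (720, 3000), (1080, 6000),
]

def pick_rendition(device_height_px, measured_kbps):
    candidates = [r for r in RENDITIONS
                  if r[0] <= device_height_px and r[1] <= measured_kbps]
    return max(candidates) if candidates else RENDITIONS[0]

print(pick_rendition(device_height_px=1080, measured_kbps=3500))  # -> (720, 3000)
print(pick_rendition(device_height_px=480,  measured_kbps=8000))  # -> (480, 1500)
```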

These issues can be addressed by bringing content and computing as close to the users as possible.

Content delivery networks are effective in reducing latency and increasing throughput by taking content close to the users. Ideally, content servers are located within each user’s ISP and geography. This minimizes reliance on inter-network and long-distance communication, especially through the middle-mile bottleneck of the internet. Better end-user experiences over cellular networks can be enabled by distributing content and delivery directly inside mobile operators’ core networks.
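
At its core, an edge cache keeps requests inside the user's ISP or metro area and only crosses the middle mile to the origin on a cache miss. A simplified sketch of that flow (the URL and the origin fetch are stand-ins):

```python
# Simplified sketch of edge caching: serve from the local cache when possible,
# and only cross the middle mile to the origin on a miss. Names are made up.
edge_cache = {}

def fetch_from_origin(url):
    return f"<content of {url}>"                # stand-in for the real origin fetch

def fetch(url):
    if url in edge_cache:                       # cache hit: stays inside the ISP/metro
        return edge_cache[url]
    content = fetch_from_origin(url)            # cache miss: long-haul request to origin
    edge_cache[url] = content
    return content

print(fetch("https://example.com/video/seg1"))  # miss: goes to the origin
print(fetch("https://example.com/video/seg1"))  # hit: served from the edge cache
```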

When computing moves to the edge, data goes along with it. Data distributed to the edges has to be managed carefully to prevent conflicting concurrent modifications. Solutions have to be created to merge data and resolve conflicts automatically, or to hold the source data in escrow and update it when the computation is complete.
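
One simple, if lossy, strategy is last-writer-wins merging, where each replica tags writes with a timestamp and the newest value survives; real systems often need application-specific merge or escrow logic. A minimal sketch under that assumption:

```python
# Minimal sketch of last-writer-wins (LWW) conflict resolution between two edge
# replicas. Keys and timestamps are illustrative; real systems often need
# application-specific merge logic instead of simply discarding older writes.
def lww_merge(replica_a, replica_b):
    """Each replica maps key -> (timestamp, value); the newer write wins."""
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

edge_us = {"cart:42": (1001, ["book"])}
edge_eu = {"cart:42": (1005, ["book", "pen"])}
print(lww_merge(edge_us, edge_eu))  # {'cart:42': (1005, ['book', 'pen'])}
```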

Summary

Latency over the internet is fundamentally limited by the speed of light, aging infrastructure (networks and protocols), lack of competition and the growth of streaming video traffic. These problems can be addressed by moving content and computing closer to the edge, building smarter applications on newer protocols, and increasing competition to drive investment in the last mile.