Cloud computing

Compute is ephemeral while data has gravity

The shift from compute-centric to data-centric computing is driven by a confluence of two trends: the first is the increase in the amount of data collected, and the second is the use of this data to unlock additional value across the supply chain.

The explosion in data collection is driven by the increase in the number of computing devices. Historically, device counts have increased by orders of magnitude with each generation, from mainframes to PCs to mobile devices. While there were only a handful of mainframe computers and a PC for every few people, mobile devices are ubiquitous; two thirds of the adults on the planet possess one. The growth of IoT devices will follow the same exponential trend set by their predecessors: there will be many IoT devices per person. But unlike their predecessors, IoT devices will be specialized and mostly autonomous.

Autonomous edge computing

Traditionally in computing, the value of data increased when it was shared. Excel spreadsheets became more useful when shared with co-workers, photos and videos when shared with family and friends. However, specialized devices collect data about their immediate environment, which may not be useful to another device in a different environment. For example, autonomous cars collect about 10 GB of data for every mile; it is neither necessary nor possible to transfer all this data over the internet and back for real-time decision making. As data about a car's current environment changes in real time, data from the past becomes irrelevant and does not need to be stored. Additionally, this raw data is not useful to another car at a different location. Enabled by higher bandwidth at lower latencies, edge computing facilitates faster extraction of value from large amounts of data.

The inability to transfer large amounts of data over the internet will drive collaborative machine learning models like federated learning. Under this model, data collection and processing agents run at the edge and transfer a summary of their learnings to the cloud. The cloud is responsible for merging (averaging) the learnings and distributing them back to the edges. In the case of autonomous cars, the learnings from each vehicle are shared with the cloud, which merges them and redistributes the result to all the other vehicles.
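As a rough illustration, the merge (averaging) step can be a weighted average of the model parameters reported by each edge device. The sketch below is a minimal federated-averaging example; the weighting by sample count and the toy numbers are illustrative assumptions, not a description of any particular vendor's implementation.

```python
import numpy as np

def federated_average(client_weights, client_sample_counts):
    """Merge model parameters from edge devices by weighted averaging."""
    total = sum(client_sample_counts)
    merged = np.zeros_like(client_weights[0], dtype=float)
    for weights, count in zip(client_weights, client_sample_counts):
        # devices that trained on more local data get a larger say
        merged += (count / total) * weights
    return merged  # redistributed to every device for the next round

# Example: three vehicles report locally trained weights and sample counts
vehicles = [np.array([0.2, 0.5]), np.array([0.4, 0.3]), np.array([0.1, 0.6])]
samples = [1000, 4000, 2500]
print(federated_average(vehicles, samples))
```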

This trend has already started at Google, where engineers are working on federated, collaborative machine learning without centralizing training data. Apple released the Basic Neural Network Subroutines (BNNS) library, enabling neural nets to run on the client. Currently BNNS does not train the neural net, but this is the next logical step. Specialized computers and systems will be built that are data-centric, i.e. able to move large amounts of data for processing at very high rates. One of the first examples is Google's Tensor Processing Unit (TPU), which outperforms standard processors by an order of magnitude. In the near future every mobile device will have an SoC capable of running a fairly complex neural network. Data and the applications that consume it will be collocated, creating autonomous edge computing systems.

The gravity of data

As the cost of compute continues to fall, the big three cloud vendors (AWS, Azure and Google) are providing more services around data. Larger amounts of data need more compute and higher bandwidth at lower latencies to extract value, and it is easier to bring compute to the data than the other way around.

These vendors now provide AI and machine learning as a service to extract value from this data. In the near future, it will be possible to automatically analyze and transform the data to provide actionable insights. Think of the raw data as database tables and the transformed data as indexes that co-exist with the tables. The vendors will automate data transformation and analysis, locking the data in and making it non-portable. Organizations should ensure that the process of value extraction is not dependent on a vendor's proprietary technology, and that the transformed data stays portable.

So in summary

  1. We have shifted from compute-centric to data-centric computing.
  2. Large, temporal data will drive autonomous edge computing and federated machine learning. 
  3. Enterprises should not use proprietary technology for extracting value from their data.

References

  1. https://www.wired.com/2017/04/building-ai-chip-saved-google-building-dozen-new-data-centers/
  2. http://a16z.com/2016/12/16/the-end-of-cloud-computing/
  3. http://www.zdnet.com/article/data-gravity-the-reason-for-a-clouds-success/

Speeding up the Internet

The internet is an aggregation of multiple interconnected terrestrial, submarine and radio networks.

Tier 1 networks are like global freeways, able to deliver traffic to destinations across multiple continents. Multiple tier 1 networks exchange traffic without payment through peering agreements based on reciprocity. Tier 2/3 networks cover a smaller geographical area and get global access by making transit payments to tier 1 networks. Together these networks form the backbone of the internet.

The last mile is the part of the network that connects people's homes and businesses to the larger internet; it is typically part of the ISP's network. The middle mile is the segment linking the ISP's core network to the local, last mile networks.

Comparing the internet to a tree, the tier 1 and 2 networks are the trunk and the middle mile the branches, connecting a very large number of "last mile" leaves. These last mile links are the most numerous and most expensive part of the system, as illustrated by the chart below.

The internet as an inverted tree

In each of the next sections we look into a major issue affecting latency and ways to improve performance.

Outdated protocols

The core internet protocols and routing mechanisms have not changed significantly in the last 40 years. TCP was built for reliability and congestion avoidance and has served those purposes well, but it is starting to show its age. Major issues include:

  1. TCP is a round trip protocol and the only way to reduce latency is to reduce round trips.
  2. High overhead to acknowledge every window of data packets sent. This particularly affects streaming video where the distance between server and client constrains download speeds.
  3. Larger networks and greater distances increase Round Trip Time (RTT), packet loss and decrease bandwidth. 
  4. It is not very efficient when handling large payloads like streaming video, which accounted for over 70% of US internet traffic in late 2015 and is expected to grow to 80% by 2019.

The chart below, derived from a report published by Akamai, illustrates the inverse relationship between RTT and bandwidth: as RTT grows, throughput drops off steeply.

Increase in latency decreases throughput exponentially
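One way to see why throughput collapses as RTT grows is the widely used Mathis et al. approximation, which bounds steady-state TCP throughput by MSS / (RTT x sqrt(loss)). The sketch below applies it for a few RTT values; the MSS and loss figures are assumptions chosen for illustration, not numbers taken from the Akamai report.

```python
import math

MSS = 1460      # bytes per segment (typical Ethernet payload; assumption)
LOSS = 0.0001   # packet loss probability (assumption)

def tcp_throughput_bps(rtt_seconds, mss=MSS, loss=LOSS):
    """Mathis approximation: throughput <= MSS / (RTT * sqrt(loss))."""
    return (mss * 8) / (rtt_seconds * math.sqrt(loss))

for rtt_ms in (10, 50, 100, 200):
    mbps = tcp_throughput_bps(rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:>3} ms -> ~{mbps:,.0f} Mbps ceiling")
```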

There are a number of optimizations to mitigate these problems. These include:

  1. Using pools of persistent connections to eliminate setup and teardown overhead (see the sketch after this list).
  2. Compressing content to reduce the number of TCP roundtrips.
  3. Sizing TCP window based on real-time network latency. Under good conditions, an entire payload can be sent by setting a large initial window, eliminating the wait for an acknowledgement. Larger windows can result in increased throughput over longer distances.
  4. Intelligent retransmission after packet loss by leveraging network latency information, instead of relying on the standard TCP timeout and retransmission protocols. This could mean shorter, more aggressive retransmission timeouts under good network conditions.
  5. QUIC, a replacement for TCP built on top of UDP and designed to carry HTTP/2 traffic, includes some of these optimizations from the ground up.
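As a small illustration of the first optimization, a client that reuses a pooled, persistent connection avoids repeated TCP (and TLS) handshakes; with the Python requests library, a Session also advertises gzip support so responses come back compressed. The endpoint below is a placeholder.

```python
import requests

# A Session pools and reuses TCP connections, so repeated requests to the
# same host skip connection setup and teardown. It also sends
# Accept-Encoding: gzip by default, so large responses arrive compressed.
session = requests.Session()

URL = "https://example.com/api/items"   # placeholder endpoint

def fetch_pages(n):
    statuses = []
    for page in range(n):
        resp = session.get(URL, params={"page": page}, timeout=5)
        statuses.append(resp.status_code)
    return statuses

if __name__ == "__main__":
    print(fetch_pages(3))
```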

Congestion 

Congestion happens at many different points on the Internet including peering points, middle and last miles.

Peering points are the interconnections where different networks exchange traffic. The Border Gateway Protocol (BGP) is used to exchange routing information between networks. This older protocol has a number of limitations affecting its ability to keep up with the increase in routes and traffic. Peering points can also be deliberately used to throttle traffic and charge for increases in transit capacity, as in the case of Netflix vs Comcast (and Verizon).

The middle mile connects the last mile to the greater internet. The last mile, which is typically part of the ISP's network, is the segment connecting homes and businesses. These last mile links are the most numerous and most expensive part of the system; Goldman Sachs estimated it would cost Google $140 billion to build out a nationwide network, or roughly $1,000 per home on average.

The large capital investment limits competition and creates an imperfect market dominated by a few players. Almost a third of US households have no choice of broadband internet provider, and without competition these corporations have little incentive to improve service. In locations where Google Fiber has entered the market, other providers have matched Google Fiber's prices and levels of service.

Minimizing long-haul content and computing

Distributing content and computing to the edge presents a unique set of challenges, especially for mobile devices.

  1. Mobile devices are likely to switch between cellular and Wi-Fi networks, resulting in disconnections.
  2. Mobile applications are personalized and interact through APIs (not static HTML web pages accessed through computer browsers) which cannot be cached as efficiently by CDNs.
  3. Streaming content, especially video, has to be optimized for different devices and resolutions. The downloadable stream is a function of both the device resolution and the bandwidth available under real-time network conditions.

These issues can be addressed by bringing content and computing as close to the users as possible.

Content delivery networks are effective in reducing latency and increasing throughput by bringing content close to the users. Ideally, content servers are located within each user's ISP and geography. This minimizes reliance on inter-network and long-distance communications, especially through the middle-mile bottleneck of the internet. Better end-user experiences over cellular networks can be enabled by distributing content and delivery directly inside mobile operators' core networks.

When computing is moved to the edge, data goes along with it. Data distributed to the edges has to be managed carefully to prevent conflicting concurrent modifications. Solutions have to be created to merge data and resolve conflicts automatically, or to keep the source data in escrow at the source and update it when the computing is complete.
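One of the simplest approaches mentioned above is a last-writer-wins merge keyed on update timestamps. The sketch below illustrates that idea only; it ignores clock skew and the richer conflict-resolution or escrow schemes a production system would need.

```python
def merge_records(local, remote):
    """Last-writer-wins merge of two edge copies of the same record.

    Each record maps field -> (value, updated_at_epoch_seconds).
    """
    merged = {}
    for field in set(local) | set(remote):
        candidates = [copy[field] for copy in (local, remote) if field in copy]
        # keep whichever copy of the field was written most recently
        merged[field] = max(candidates, key=lambda pair: pair[1])
    return merged

edge_a = {"speed_limit": (65, 1700000000), "lane_count": (3, 1700000100)}
edge_b = {"speed_limit": (70, 1700000200)}
# keeps speed_limit from edge_b (newer) and lane_count from edge_a
print(merge_records(edge_a, edge_b))
```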

Summary

Latency over the internet is fundamentally limited by the speed of light, aging infrastructure (networks and protocols), lack of competition and the increase in streaming video traffic. These problems can be addressed by moving content and computing closer to the edge, building smarter applications using newer protocols, and increasing competition to drive investment in the last mile.

Cloudonomics - maximizing return on cloud investment

In this post, I address the question "Is my organization maximizing ROI from the cloud?" by exploring various options with practical examples.

Use reservations effectively

Cloud vendors like AWS and Azure provide substantial discounts for longer term commitments. For example, AWS provides significant discounts on EC2 instances, DynamoDB, RDS and ElastiCache in exchange for a longer commitment. Aim to maximize reservation coverage by reserving any resource whose projected utilization meets or exceeds the break-even point, which is 100% minus the discount percentage.

Here is a simple example illustrating the savings from reserved instances over on-demand pricing using this technique. Let's assume there are a total of 200 instances: 100 are used 24 hours a day, 50 for 18 hours and 50 for 12 hours daily. Let's further assume that one-year reservations provide a 35% discount over on-demand pricing, so the monthly on-demand cost is $100 per instance while the reserved cost is $65. The rule of thumb is to reserve any capacity that is used more than 65% of the time, which is the break-even point for reservations.

Cost savings through reservations

Instances / Costs | On-demand costs ($) | Reserved costs ($) | To reserve? | Actual costs ($) | Savings ($, On-demand - Actual)
100 instances used 100% of the time | 10,000 | 6,500 | Yes | 6,500 | 3,500
50 instances used 75% of the time | 3,750 | 3,250 | Yes | 3,250 | 500
50 instances used 50% of the time | 2,500 | 3,250 | No | 2,500 | N.A.
Total | 16,250 | N.A. | N.A. | 12,250 | 4,000

Simply reserving 150 of the 200 instances results in savings of about 24%. Keep in mind that instance types change frequently, so the benefits of reserving instances for 3 years are less clear. However, for managed services like DynamoDB it is more economical to invest in 3-year reservations than 1-year, since customers are insulated from underlying hardware changes.
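The break-even rule can be expressed directly in code: reserve a group of instances when its projected utilization is at least the reserved-to-on-demand price ratio. The sketch below reproduces the totals from the table using the illustrative prices assumed earlier, not real AWS rates.

```python
ON_DEMAND_MONTHLY = 100.0   # $ per instance per month (assumed above)
RESERVED_MONTHLY = 65.0     # $ per instance per month with a 1-year reservation

def plan(count, utilization):
    """Return (reserve?, actual monthly cost) for a group of instances."""
    break_even = RESERVED_MONTHLY / ON_DEMAND_MONTHLY          # 0.65
    on_demand_cost = count * ON_DEMAND_MONTHLY * utilization
    reserved_cost = count * RESERVED_MONTHLY
    reserve = utilization >= break_even
    return reserve, reserved_cost if reserve else on_demand_cost

groups = [(100, 1.00), (50, 0.75), (50, 0.50)]
total = sum(plan(count, util)[1] for count, util in groups)
print(total)   # 12250.0, matching the table above
```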

There are cloud service management platforms like CloudHealth Technologies which automate these processes to maximize ROI from the cloud.

Scaling through hardware and not optimizing enough

While the cloud is elastic and can scale with demand, costs can get out of hand quickly. The cloud can encourage uneconomical behavior: throwing compute resources at problems instead of applying efficient engineering practices.

Too often, scaling problems are solved by adding hardware instead of improving the design. An efficient design minimizes the amount of data transferred between computing systems, which lowers processing requirements and results in cost savings.

Servers and databases become more efficient when they process smaller volumes of data, requiring less compute, memory and networking. Data transfer costs are also reduced, and the savings can be passed on to consumers.

Data compression can achieve similar results. The CPU overhead to compress and decompress is offset by the gains from transferring less data.
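As a rough illustration, compressing a text-heavy payload before sending it trades a small amount of CPU for far fewer bytes on the wire. The JSON payload below is synthetic.

```python
import gzip
import json

# Synthetic, repetitive JSON standing in for a typical API response
payload = json.dumps(
    [{"user_id": i, "status": "active", "region": "us-east-1"} for i in range(1000)]
).encode("utf-8")

compressed = gzip.compress(payload)
print(f"original: {len(payload):,} bytes, gzipped: {len(compressed):,} bytes")

# The receiver pays a small CPU cost to restore the original bytes
assert gzip.decompress(compressed) == payload
```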

Breaking the data transferred into smaller chunks involves a higher level of engineering effort. For example, it is programmatically simpler to save an entire user profile when a single attribute has changed, but doing so is inefficient and expensive, especially when repeated.
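For example, with DynamoDB and boto3, a single changed attribute can be written with update_item instead of rewriting the whole profile with put_item. The table and attribute names below are hypothetical.

```python
import boto3

table = boto3.resource("dynamodb").Table("user_profiles")   # hypothetical table

def save_full_profile(profile):
    # Simpler to write, but resends the entire item when one field changes
    table.put_item(Item=profile)

def update_email(user_id, new_email):
    # More engineering effort, but only the changed attribute crosses the wire
    table.update_item(
        Key={"user_id": user_id},
        UpdateExpression="SET email = :e",
        ExpressionAttributeValues={":e": new_email},
    )
```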

Use resources only when needed

Follow the business hours cost model, where resources are active only when they are being used. This is usually 12 hours a day, Monday through Friday: 60 of the 168 hours in a week, or 35.7% of the time. Resources for software development and testing are excellent candidates for this type of usage.
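A minimal way to enforce the business hours model is a scheduled job that stops tagged development instances outside working hours. The Schedule tag name below is an assumption, and a real setup would also handle time zones, holidays and a matching start job.

```python
import boto3

ec2 = boto3.client("ec2")

def stop_business_hours_instances():
    """Stop running instances tagged Schedule=business-hours (run nightly)."""
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Schedule", "Values": ["business-hours"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    instance_ids = [
        inst["InstanceId"] for res in reservations for inst in res["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids
```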

Batch compute tasks, both scheduled and ad hoc, run and then release all resources when complete. A word of caution: on-demand compute capacity may not always be available, so plan for variable schedules.

Containerize

When an application is containerized, it can be deployed consistently across different hardware. When new hardware is introduced, the application can be benchmarked on it and moved over if it proves more efficient (new hardware usually is).

Containerization future-proofs applications. As applications mature, the engineers who built them may no longer be around and the technology will have evolved, so redeploying the application on newer hardware can get complicated. For example, there is no clear migration path for legacy applications running on AWS from PV AMIs to the newer HVM AMIs without knowing how the application is configured or deployed. There are applications running on the older AWS m1 and m2 instance families that cannot be moved to newer instances without significant engineering effort. A containerized application will not have this problem, resulting in savings over the entire lifecycle of the application.

In addition, containerization facilitates on-demand usage, supporting use cases like the business hours cost model and batch computing.

Use managed services

Managed services tend to be more elastic, and their costs correlate with usage. For example, AWS DynamoDB follows an elastic pricing model where capacity can be changed as needed, multiple times a day. In addition, these services are managed programmatically, lending themselves to a DevOps culture with lower administrative overhead. Other administrative tasks like backing up data and setting up cross-region replication for high availability are greatly simplified. Managed services therefore improve productivity and require fewer personnel.
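For example, DynamoDB's provisioned throughput can be raised before peak hours and lowered afterwards with a single API call. The table name and capacity numbers below are placeholders, and DynamoDB limits how often capacity can be decreased in a day.

```python
import boto3

dynamodb = boto3.client("dynamodb")

def set_capacity(table_name, reads, writes):
    """Adjust provisioned throughput to track expected load."""
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    )

set_capacity("orders", reads=2000, writes=500)   # scale up for business hours
set_capacity("orders", reads=200, writes=50)     # scale back down overnight
```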

In summary, to get the maximum ROI from the cloud: use reservations effectively, optimize applications to minimize data transfer, follow the business hours cost model to schedule resources only during working hours, containerize to future-proof applications and take advantage of newer, cheaper hardware, and use managed services, which are elastic and simpler to manage.

Appendix

Here is a presentation on the same topic.

On-demand computing with AWS Lambda

Amazon introduced Lambda, an event-driven compute service, in November 2014, simplifying how applications are built and run. Lambda forces developers to be more modular by writing small, elegant functions instead of monolithic applications. Think micro-microservices.
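A Lambda function is just a small handler invoked once per event. The sketch below is a generic Python handler for an S3 object-created notification; the actual processing is left as a placeholder.

```python
import json

def handler(event, context):
    """Invoked per event; no servers to provision or manage."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # placeholder for real work, e.g. generating a thumbnail
        results.append(f"processed s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps(results)}
```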

But before adopting it as the computing platform for future applications, consider the following issues, which prevent broader adoption across a wider variety of use cases.

  1. Expensive - The pricing model is coarse, billed in units of 100 ms. For example, requests that take 5 ms to process are billed as a full unit, wasting the other 95 ms. This could be fixed by billing for the actual time used or, if billing in 100 ms increments, by allowing 20 such requests to be processed within one unit.
  2. Statelessness - There is no guarantee of state between invocations. While this is a good model for building applications, it is inefficient for functions: application contexts and database connection pools have to be reinitialized repeatedly (a common mitigation is sketched after this list).
  3. Primitive support for development, debugging and logging. 
  4. Inability to access resources behind VPCs. This limits how Lambda interacts with other services and systems.
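A common mitigation for issue 2 is to initialize expensive resources at module scope so that warm containers reuse them across invocations; cold starts still pay the full cost. The database client, environment variables and query below are placeholders, not part of the Lambda API.

```python
import os
import pymysql   # any client with reusable connections works; pymysql is illustrative

# Created once per container and reused by every warm invocation
connection = pymysql.connect(
    host=os.environ["DB_HOST"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database=os.environ["DB_NAME"],
)

def handler(event, context):
    with connection.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM orders")   # placeholder query
        (count,) = cursor.fetchone()
    return {"order_count": count}
```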

Lambda is the first step towards a fully elastic everything, where compute, memory, storage and network are independently scalable and costs are tied to actual usage. Currently these resources are tightly coupled, limiting the flexibility needed to run modern applications. GP2 EBS volumes, for example, couple capacity with IOPS - simple but inflexible and uneconomical. In mature applications, the data size has grown, requiring larger volumes, but usage has stabilized, reducing the IOPS needed to handle current levels of traffic; migrating to magnetic volumes or S3 is not automatic and is disruptive. As another example, Lambda couples memory with compute, network and storage bandwidth, so it is not possible to tune these resources independently for an actual use case.

These problems will be addressed as Lambda continues to evolve; tools will get better and pricing models will become more flexible. For organizations that have not fully embraced the cloud yet, Lambda offers an opportunity to bypass AWS infrastructure services around instances and containers by starting directly with event-driven computing. This is analogous to developing countries bypassing landlines in favor of mobile.