Compute is ephemeral while data has gravity

The shift from compute to data centric computing is driven by a confluence of two trends; The first is an increase in the data collected and the second is using this data to unpack additional value in the supply chain. 

The explosion in data collection is driven by the increase in the number of computing devices. Historically, this has increased by orders of magnitude with each generation, starting with mainframes, to PC’s and mobile devices. While there were only a handful of mainframe computers and a PC for every few people, mobile devices are ubiquitous; two thirds of adult human beings on the planet possess one. The growth of IOT devices will follow the same exponential trend set by ancestor devices, there will be many IOT devices per person. But unlike ancestor devices, IOT devices will be specialized and mostly autonomous. 

Autonomous edge computing

Traditionally in computing, the value of data increased when it was shared. Excel spreadsheets became more useful when shared with with co-workers, photos and videos when shared with family and friends. However specialized devices collect data about their immediate environment, which may not be useful to another device in a different environment. For example, autonomous cars collect 10GB data for every mile; it neither necessary nor possible to transfer all this data over the internet and back for real time decision making. As data about a car's current environment changes in real time, the data from the past is no longer relevant, and does not need to be stored. Additionally, this raw data is not useful to another car at a different location. Enabled by higher bandwidth at lower latencies, edge computing facilitates faster extraction of value from large amounts of data.

The inability to transfer large amounts of data over the internet will drive collaborative machine learning models like federated learning. Under this model, data collection and processing agents will run at the edge and transfer a summary of their learnings to the cloud. The cloud is responsible for merging (averaging) the learnings and distributing it back to the edges. In the case of autonomous cars, the learnings from each autonomous vehicle will be shared with the cloud. The cloud merges the learnings and redistributes it to all the other autonomous vehicles. 

This trend has already started at Google, where their engineers are working on federated collaborative machine learning without centralizing training data. Apple released a Basic Neural Network Subroutine (BNNS) library enabling neural nets to run on the client. Currently BNNS does not train the neural net, but this is a next logical step. Specialized computers and systems will be built that are data centric, i.e. with the ability to move large amounts of data for processing at very high rates. One of the first examples is Google’s Tensor Processing Unit (or TPU) which outperforms standard processors by an order of magnitude. In the near future every mobile device will have a SOC that is capable of running a fairly complex neural network. Data and applications that consume this data will be collocated, creating autonomous edge computing systems. 

The gravity of data

As the cost of compute has been going down, the big three cloud vendors (AWS, Azure and Google) provide more services around data. Larger amounts of data will need more compute and higher bandwidth at lower latencies to extract value. It is easier to bring compute to the data than the other way around.

These vendors now provide AI and machine learning as a service to extract value from this data. In the near future, it will be possible to automatically analyze and transform the data to provide actionable insights. Think of the raw data as database tables and transformed data as indexes which co-exist with the tables. The vendors will automate data transformation and analysis, locking the data in and making it non portable. Organizations should ensure that the process of value extraction is not dependent on a vendor’s proprietary technology, and the transformed data stays portable.

So in summary

  1. We have shifted from being compute to data centric.
  2. Large, temporal data will drive autonomous edge computing and federated machine learning. 
  3. Enterprises should not use proprietary technology for extracting value from their data.