About six years ago, Google (NASDAQ:GOOG) realized it was facing a serious challenge that would have cost it dearly had it not found an innovative way around it. The situation was this: if every Android user used Google’s voice recognition services for just three minutes per day, the company would have needed twice as many data centers as it then had (spanning 15 locations on four continents) to handle all the requests to the machine learning system powering those services. That would, of course, have entailed a significant investment.
Rather than purchase more land and build more data centers, the search giant opted to create new hardware designed specifically to run machine-learning systems, including voice recognition. Google’s solution is known as the Tensor Processing Unit (TPU).
Google first talked about this custom-made processor in May 2016. Now, almost a year later, details about the project have finally been revealed. The company recently released a paper discussing how TPUs work and what specific problems they can solve.
In simplest terms, a TPU is a computer chip designed to speed up the ‘thinking phase’ of deep neural networks, known as inference, in which an already-trained network is applied to new data. Deep neural networks are complicated mathematical systems that learn to perform certain tasks and solve problems by analyzing huge amounts of data.
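To make the ‘thinking phase’ concrete, here is a minimal sketch of inference in plain Python: a tiny two-layer network whose weights are already fixed, applied to a new input. The weights, layer sizes, and function names (`dense`, `relu`, `infer`) are hypothetical and chosen only for illustration; they do not come from Google’s paper, and real networks are vastly larger.

```python
# Illustrative sketch only: a tiny network with made-up, already-trained weights.
# Inference means running inputs forward through fixed weights; nothing is learned here.

def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters, assumed to have been produced by an earlier training phase.
W1 = [[1.0, -1.0], [0.5, 0.5]]
B1 = [0.0, -1.0]
W2 = [[2.0, 1.0]]
B2 = [0.5]

def infer(x):
    """Forward pass: hidden layer with ReLU, then a linear output layer."""
    hidden = relu(dense(x, W1, B1))
    return dense(hidden, W2, B2)

print(infer([3.0, 1.0]))  # prints [5.5]
```

Almost all of the work in `dense` is multiplying and adding numbers, which is the kind of arithmetic a chip built for inference is meant to accelerate.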
In Google’s tests, the TPU did considerably better than comparable chips such as Intel’s Haswell CPU and NVIDIA’s K80 GPU. Specifically, it ran 15 to 30 times faster at machine-learning inference tasks and delivered 30 to 80 times better performance per watt. And that’s just the beginning: Google claims there’s still room for improvement.
To be clear, Google isn’t the first to use a dedicated chip for executing neural networks. Microsoft and China’s Internet giant Baidu are doing it as well. What differentiates Google’s effort is that it built its chip from scratch.
Google’s chip is technically known as an application-specific integrated circuit (ASIC). The company chose this route instead of using programmable FPGA chips (like those used by Microsoft) because it prioritized speed over programmability. It built the chip for one specific task: executing neural networks. This doesn’t mean, however, that the chip can’t do anything else.
As Norman Jouppi (one of Google’s distinguished hardware engineers who developed the chip) explained to NextPlatform: “The TPU is programmable like a CPU or GPU. It isn’t designed for just one neural network model; it executes CISC instructions on many networks (convolutional, LSTM models, and large, fully connected models). So it is still programmable, but uses a matrix as a primitive instead of a vector or scalar.”
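Jouppi’s point about using “a matrix as a primitive” can be sketched in plain Python. The example below computes the same matrix product three ways and counts how many operations the program has to issue at each granularity. The function names and the counting scheme are assumptions made for illustration; this is not the TPU’s actual instruction set, and the matrix version simply reuses the vector code as a stand-in for dedicated hardware.

```python
# Illustrative sketch only: the same product at scalar, vector, and matrix granularity.
# "issued" counts operations from the program's point of view, not hardware cycles.

def matmul_scalar(a, b):
    """Scalar primitive: one multiply-add is issued per step."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    issued = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += a[i][k] * b[k][j]
                issued += 1
    return out, issued

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def matmul_vector(a, b):
    """Vector primitive: one dot product is issued per output element."""
    columns = list(zip(*b))
    out = [[dot(row, col) for col in columns] for row in a]
    return out, len(a) * len(columns)

def matmul_matrix(a, b):
    """Matrix primitive: the whole product counts as a single operation."""
    out, _ = matmul_vector(a, b)  # stand-in for what hardware would do internally
    return out, 1

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2

for name, fn in [("scalar", matmul_scalar),
                 ("vector", matmul_vector),
                 ("matrix", matmul_matrix)]:
    result, issued = fn(A, B)
    print(name, result, "operations issued:", issued)
```

All three versions return the same result, but the scalar version issues 12 operations, the vector version 4, and the matrix version 1. The gap widens quickly as the matrices grow, which is one way to read the appeal of a matrix-level primitive.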
For now, the TPU’s sole purpose is to execute neural networks; training them isn’t part of its function. As limited and specific as its role is, however, the chip saved Google the cost of building additional data centers. It has also established another area where Google can potentially excel: neural networking.
In a blog post, Jouppi said: “We’re committed to building the best infrastructure and sharing those benefits with everyone. We look forward to sharing more updates in the coming weeks and months.”