Why we need GPUs for Deep Learning
What I Learned: Why we need GPUs for Deep Learning

For the past few months I’ve been working on several deep learning projects. At the beginning of my course on the subject, my class was told that we would need to use GPUs, and would therefore be working with Google Colab.

Since then, I’ve been wondering why. As someone without a computer science background, the hardware side of data science is all quite new to me, so I wanted to look into this a little bit.

A GPU is a processor designed to handle specialized computations, in contrast to a CPU, which is a better fit for general-purpose computations. CPUs power most of the typical computations on our electronic devices. A GPU can be much faster than a CPU, but it depends on the type of computation being performed: for computations that can be parallelized, GPUs excel.

Parallel computing is a type of computation in which a larger computation is broken down into independent, smaller computations that can be carried out simultaneously. The partial results are then recombined (or “synchronized”) to form the result of the original computation. The number of tasks a computation can be broken into depends on the number of cores on a particular piece of hardware. CPUs typically have 4, 8, or 16 cores, while GPUs can have thousands.
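As a toy illustration of that break-apart-and-recombine pattern, here's a minimal sketch using Python's standard multiprocessing module (the chunk count of 4 is just a stand-in for the number of CPU cores):

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker computes its piece independently of the others.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(10_000_000))
    n_workers = 4  # e.g., one per CPU core

    # Break the computation into independent, smaller pieces.
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]

    # Carry out the smaller computations simultaneously.
    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, chunks)

    # "Synchronize": recombine the partial results into the final answer.
    print(sum(partials))
```

With 4 cores the work splits 4 ways; the same pattern on a GPU splits across thousands of cores, which is where the speedup comes from.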

Neural networks are “embarrassingly parallel,” meaning that little or no effort is required to divvy up the work into smaller computations. This is because NNs are structured in a very uniform way: at each layer of the network, thousands of identical artificial neurons perform the same computation. So the structure of a NN fits quite well with the kinds of computation a GPU can perform efficiently.
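To make that concrete, here's a minimal sketch of a single dense layer in NumPy (the layer sizes are made up). Every one of the 1,000 output neurons performs the exact same weighted-sum-plus-activation computation, just with its own row of weights, and none of them depends on the others:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal(4096)          # activations from the previous layer
W = rng.standard_normal((1000, 4096))  # one row of weights per neuron
b = rng.standard_normal(1000)          # one bias per neuron

# Each neuron computes dot(weights, x) + bias followed by the same
# activation. One matrix multiply expresses all 1,000 neurons at once,
# and each one is independent: ideal work to spread across GPU cores.
z = W @ x + b
out = np.maximum(z, 0)  # ReLU
print(out.shape)  # (1000,)
```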

Additionally, CPUs are latency optimized, while GPUs are bandwidth optimized. This means a CPU is good at fetching small amounts of data from memory quickly, while a GPU is superior at moving large amounts of data, as is needed for things like matrix multiplication. So the more memory a computational operation requires, the more significant the advantage of GPUs over CPUs. And while we might expect latency to hurt GPU performance, thread parallelism hides this cost: while some threads wait on memory, others compute.
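A rough way to see this for yourself is to time the same large matrix multiplication on both devices. This sketch uses PyTorch and assumes a CUDA GPU is available (e.g., on Colab); the synchronize calls matter because GPU kernels launch asynchronously, so without them we'd only time the launch, not the work:

```python
import time
import torch

def time_matmul(device, n=4096, repeats=10):
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    _ = a @ b  # warm-up, so one-time setup costs aren't measured
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for async GPU work to finish
    start = time.perf_counter()
    for _ in range(repeats):
        _ = a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

print(f"CPU: {time_matmul(torch.device('cpu')):.4f} s per matmul")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul(torch.device('cuda')):.4f} s per matmul")
```

At this size the GPU should win comfortably; shrink n to something small like 64 and the CPU can come out ahead, which previews the caveat below.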

There is of course much more nuance to the advantages of one or the other, and there are certain use cases in which CPUs may actually be quicker than GPUs for neural networks, particularly for smaller tasks, where the overhead of moving data to the GPU outweighs the parallel speedup.