One of my big motivations for getting an NVIDIA 1060 GPU was to start playing with deep learning neural networks, using tools like Google’s TensorFlow. A quick skim of the installation instructions didn’t look too bad, but once I got hands-on it was clear that version mismatches are a recurring issue when running the TensorFlow binaries.
Ubuntu 16.04.4 Operating System
Since I had only a fresh installation of Ubuntu, it was not a big loss (aside from time) to restart from scratch. Goodbye, new hotness Ubuntu 18.04 LTS, we’re leaving you to return to the known quantity Ubuntu 16.04.4 LTS.
NVIDIA Graphics Driver 390
Once Ubuntu 16.04 was up and running with all its latest operating system updates, I proceeded to the only part of TensorFlow setup that doesn’t require a specific (old) version: the NVIDIA graphics drivers. Going straight to NVIDIA’s website, I found this suggestion:
Note that many Linux distributions provide their own packages of the NVIDIA Linux Graphics Driver in the distribution’s native package management format. This may interact better with the rest of your distribution’s framework, and you may want to use this rather than NVIDIA’s official package.
Hmm… OK. Checking Ubuntu’s user support site AskUbuntu for NVIDIA drivers, I found a set of instructions for installing drivers from a graphics drivers project, staffed by volunteers who repackage NVIDIA’s raw binaries into Ubuntu’s installer-friendly “personal package archive” (PPA) infrastructure. Following the instructions, I told my installation of Ubuntu about them via:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
Then I had to deviate from the directions, because this command:
sudo apt-get install nvidia-driver-390
returned an error:
E: Unable to locate package nvidia-driver-390
A little searching with the apt search command found the correct name in the PPA:
sudo apt-get install nvidia-390
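For reference, the package-name lookup can be sketched roughly as below. This is my own illustration rather than the exact command I typed: the helper name filter_driver_names is hypothetical, and the sketch assumes the PPA has already been added and the package lists refreshed.

```shell
# Hypothetical helper: read `apt-cache search` style lines ("name - description")
# on stdin and print only package names that start with "nvidia-" and contain
# the requested driver series.
filter_driver_names() {
    awk -v series="$1" '$1 ~ /^nvidia-/ && index($1, series) > 0 { print $1 }'
}

# Only query apt when it is available on this machine.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache search nvidia | filter_driver_names 390
fi
```

Whatever name this prints for your PPA snapshot is the one to hand to apt-get install.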
After driver installation, the graphics driver project page recommends that I test my system by running the Phoronix graphics test suite. It gave my system a good workout and provided some pretty visuals while the tests ran. After the tests were complete, I was happy to upload my system information to their database of installation successes.
NVIDIA CUDA Toolkit 9.0
As of this writing, the TensorFlow binaries released by Google require NVIDIA CUDA toolkit version 9.0. The latest version of the CUDA toolkit is 9.2, but sadly it will not work with the TensorFlow binaries, so we have to go find the older version 9.0 in NVIDIA’s past version archives. Again we have to deviate from the instructions, because this command would install the latest 9.2:
sudo apt-get install cuda
We have to be explicit about the older version with:
sudo apt-get install cuda-9.0
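Since the whole exercise hinges on getting exactly 9.0, a quick check of what actually got installed may be worthwhile. The sketch below is an assumption-laden illustration: the helper name cuda_release is made up, and it assumes nvcc is on PATH (on my understanding the toolkit lives under /usr/local/cuda-9.0/bin, which may need to be added to PATH first).

```shell
# Hypothetical helper: pull the "release X.Y" number out of `nvcc --version`
# output supplied on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Only meaningful once the toolkit is installed and nvcc is on PATH.
if command -v nvcc >/dev/null 2>&1; then
    found="$(nvcc --version | cuda_release)"
    if [ "$found" = "9.0" ]; then
        echo "CUDA $found matches what the TensorFlow binaries expect"
    else
        echo "CUDA $found found, but 9.0 is required" >&2
    fi
fi
```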
After the toolkit was installed, I followed the instructions to compile and run the bandwidthTest tool to verify the installation.
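A rough sketch of that verification step follows. It assumes the samples were copied into the home directory with the toolkit’s cuda-install-samples-9.0.sh script and that the toolkit’s bin directory is on PATH. The helper name sample_passed is hypothetical.

```shell
# Hypothetical helper: succeed if the sample output on stdin contains the
# "Result = PASS" marker that the CUDA samples print on success.
sample_passed() {
    grep -q 'Result = PASS'
}

# Assumed location: where cuda-install-samples-9.0.sh puts the samples when
# given the home directory as its target.
SAMPLE_DIR="$HOME/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest"

if [ -d "$SAMPLE_DIR" ]; then
    # Build and run in a subshell so the caller's working directory is unchanged.
    if (cd "$SAMPLE_DIR" && make && ./bandwidthTest) | sample_passed; then
        echo "bandwidthTest passed"
    else
        echo "bandwidthTest did not report a pass" >&2
    fi
fi
```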
NVIDIA cuDNN SDK 7.0.5
TensorFlow’s instructions specify cuDNN v7. The latest version is 7.1.4 as of this writing and, based on the experience so far, probably wouldn’t work, so off we go into the old version archives again. There we find two versions corresponding to CUDA 9.0: 7.0.4 and 7.0.5. Which one to use?
The hint comes from elsewhere in the TensorFlow installation documentation, in the discussion of an optional component, NVIDIA TensorRT 3.0. The instructions there reference a 7.0.5 package version built for CUDA 9.0 (the version string ends in -1+cuda9.0), so that swayed the vote toward 7.0.5.
After the SDK was installed, I followed the instructions to compile and run the mnistCUDNN tool to verify the installation.
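The cuDNN check looks much like the CUDA one. The sketch below assumes the samples were installed from NVIDIA’s Debian packages into /usr/src/cudnn_samples_v7. The helper name cudnn_test_passed is hypothetical.

```shell
# Hypothetical helper: succeed if the output on stdin contains the
# "Test passed!" line that mnistCUDNN prints on success.
cudnn_test_passed() {
    grep -q 'Test passed!'
}

# Assumed location of the samples installed by the cuDNN v7 Debian packages.
CUDNN_SAMPLES=/usr/src/cudnn_samples_v7

if [ -d "$CUDNN_SAMPLES" ]; then
    # Copy to a writable location, then build and run in a subshell.
    cp -r "$CUDNN_SAMPLES" "$HOME/"
    if (cd "$HOME/cudnn_samples_v7/mnistCUDNN" && make clean && make && ./mnistCUDNN) | cudnn_test_passed; then
        echo "mnistCUDNN passed"
    else
        echo "mnistCUDNN did not report a pass" >&2
    fi
fi
```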
After all those prerequisites were satisfied, it was finally time to install TensorFlow itself! I decided to go with Python 3 for my TensorFlow experimentation and followed the recommendation to use virtualenv before installing the GPU-enabled variant of TensorFlow:
pip3 install --upgrade tensorflow-gpu
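To put the virtualenv step and a first smoke test in one place, here is a minimal sketch. The function names setup_tf_env and tf_hello and the environment path passed to them are my own choices for illustration, and the embedded Python assumes the TensorFlow 1.x API (tf.Session) that these binaries provide.

```shell
# Hypothetical helper: create (if needed) and activate a Python 3 virtualenv,
# then install the GPU build of TensorFlow into it.
# $1 is the environment directory, e.g. ~/tensorflow (an arbitrary choice).
setup_tf_env() {
    [ -d "$1" ] || virtualenv -p python3 "$1"
    . "$1/bin/activate"
    pip3 install --upgrade tensorflow-gpu
}

# Hypothetical helper: run a TensorFlow 1.x style "hello" program.
tf_hello() {
    python3 - <<'EOF'
import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
with tf.Session() as sess:
    print(sess.run(hello))
EOF
}
```

Calling setup_tf_env ~/tensorflow followed by tf_hello should print the greeting if the driver, CUDA and cuDNN versions all line up. If they don’t, this is typically where the version mismatch errors surface.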
After hours of bumping my head in the dark against version mismatch errors, it felt great to finally cross the finish line. Hello, TensorFlow!