Compiling Tensorflow under Debian Linux with GPU support and CPU extensions

Tensorflow is a wonderful tool for Differentiable Neural Computing (DNC) and has enjoyed great success and market share in the Deep Learning arena. We usually use it with python in a prebuild fashion using Anaconda or pip repositories. What we miss that way is the chance to enable optimizations to better use our processing capabilities as well as do some lower level computing using C/C++.

The purpose of this post is to be a guide for compiling Tensorflow r1.4 on Linux with CUDA GPU support and the high performance AVX and SSE CPU extensions.

This guide is largely based on the official Tensorflow Guide and this snippet with some bug fixes from my side.

1. Install python adependencies:

sudo apt-get install python-numpy python-dev python-pip python-wheel python-setuptools

2. Install GPU prerequisites:
  • CUDA developer and drivers
  • CUDNN developer and runtime
Make sure cuddn libs are copied inside the cuda/lib64 directory usually found under /usr/local/cuda.

sudo apt-get install libcupti-dev

3. Install Bazel google's custom build tool:

sudo apt-get install openjdk-8-jdk

echo "deb [arch=amd64] stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list

curl | sudo apt-key add -

sudo apt-get update && sudo apt-get install bazel

sudo apt-get upgrade bazel

4. Configure Tensorflow:

git clone

cd tensorflow

git checkout r1.4

## don't use clang for nvcc backend [] 
## when asked for the path to the gcc compiler, make sure it points to a version <= 5 

5. Compile with the SSE and AVX flags and install using pip:

# set locale to en_us []

export LC_ALL=en_us.UTF-8

export LANG=en_us.UTF-8

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --incompatible_load_argument_is_label=false --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

sudo pip install /tmp/tensorflow_pkg/tensorflow-1.4.1*

6. Test that everything works:

cd ~/


>>> import tensorflow as tf

>>> session = tf.InteractiveSession()
>>> init = tf.global_variables_initializer()
## At this point if your get a malloc.c assertion failure, it is due to a wrong CUDA configuration (ie not using the runtime version)

At this point there should not be any CPU warning and the GPU should be initialized.