Build tensorflow-gpu on GNU/Linux Debian
I am running GNU/Linux Debian with the Nvidia drivers, CUDA 9.1, and cuDNN 7.1 installed from Debian packages. Two things complicate the setup:
- Tensorflow-gpu pip packages are built against specific CUDA versions
- CUDA Toolkit files from Debian packages are placed in different locations than those from the conventional installer
That means that with the GNU/Linux Debian packages I cannot use pip directly, since the library versions might not match. Nor can I easily compile the tensorflow source against the latest packaged CUDA Toolkit, because of the relocated header files and shared objects.
I solve this by installing a CUDA Toolkit that matches the version from the packages, using the local installer downloaded from the Nvidia website. Only the toolkit is required, without the drivers or examples. Then I build the package for Python.
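To decide whether building from source is necessary, compare the CUDA version the official wheel expects with the one installed locally. A minimal sketch; both version strings below are assumptions for my system (read yours from /usr/local/cuda/version.txt or `nvcc --version`, and the tensorflow release notes):

```shell
# CUDA version the official pip wheel was built against (assumed; check the
# tensorflow release notes for your target version)
wheel_cuda="9.0"
# CUDA version installed from Debian packages (assumed; on my machine)
local_cuda="9.1"

if [ "$wheel_cuda" = "$local_cuda" ]; then
    echo "pip wheel matches local CUDA"
else
    echo "mismatch: build tensorflow from source against CUDA $local_cuda"
fi
```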
Instructions
Download tensorflow
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v1.8.0 # Check out the latest release tag
# Patch to compile with gcc-6, at the time of writing the latest supported compiler
git cherry-pick e489b60
Download CUDA Toolkit
Download the version of CUDA that matches the version from the packages; in my case it is 9.1. Install the files, except for the drivers and samples, into the standard location.
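With the runfile installer this can be done non-interactively; a sketch, where the installer filename is illustrative (use the file you actually downloaded):

```shell
# --toolkit installs only the toolkit, skipping the driver and samples;
# --override skips the compiler check. Filename is illustrative.
sudo sh cuda_9.1.85_387.26_linux.run --silent --toolkit --override
```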
Download CuDNN
Download and install the CUDA Deep Neural Network library matching the CUDA Toolkit:
- cuDNN v7.1.3 Runtime Library for Ubuntu16.04 (Deb)
- cuDNN v7.1.3 Developer Library for Ubuntu16.04 (Deb)
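The Deb packages can then be installed with dpkg; a sketch with illustrative filenames (use the exact names of the files you downloaded from Nvidia):

```shell
# Runtime library first, then the developer library (headers).
# Filenames are illustrative for cuDNN 7.1.3 / CUDA 9.1.
sudo dpkg -i libcudnn7_7.1.3.16-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.3.16-1+cuda9.1_amd64.deb
```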
Download bazel
Bazel is required to compile and build tensorflow. To install it:
# Add repository
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
# Add key
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
# Update package information
sudo apt-get update
# Install bazel
sudo apt-get install bazel
Compile tensorflow-gpu
Run the configure script and give the default answers to most questions, except for CUDA support, which should be enabled. Pick gcc-6 as the compiler. If you don't have gcc-6 installed:
sudo apt-get install gcc-6 g++-6
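The configure script can also read its answers from the environment, which avoids retyping them on rebuilds. A sketch, assuming the variable names read by TF 1.8's configure.py (verify against your checkout):

```shell
# Pre-seed the interactive configure script. These variables are read by
# configure.py in the tensorflow 1.8 source tree.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.1
export TF_CUDNN_VERSION=7
export GCC_HOST_COMPILER_PATH=$(which gcc-6)
./configure
```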
Run bazel
bazel build --config=opt --config=cuda --incompatible_load_argument_is_label=false //tensorflow/tools/pip_package:build_pip_package
Building the binaries took roughly 2 hours on an 8-core i7-6700HQ CPU @ 2.60GHz. Next, build and install the Python package:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
cp /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl ~/
pip3 install ~/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
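To quickly check that the wheel installed correctly and that a GPU is visible (`tf.test.is_gpu_available()` exists in TF 1.8; it requires the just-built wheel, so this is only runnable on the target machine):

```python
import tensorflow as tf

print(tf.__version__)              # should print 1.8.0 for this build
print(tf.test.is_gpu_available())  # True only if a usable GPU was found
```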
Run example
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

Epoch 1/10
2018-04-29 13:03:10.454166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-29 13:03:10.454636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.91GiB
2018-04-29 13:03:10.454652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Ignoring visible gpu device (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0) with Cuda compute capability 5.0. The minimum required Cuda capability is 5.2.
2018-04-29 13:03:10.454660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-29 13:03:10.454664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-04-29 13:03:10.454669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
1000/1000 [==============================] - 0s 293us/step - loss: 0.7168 - acc: 0.4890
Epoch 2/10
1000/1000 [==============================] - 0s 31us/step - loss: 0.7045 - acc: 0.4940
Epoch 3/10
1000/1000 [==============================] - 0s 26us/step - loss: 0.6993 - acc: 0.5100
Epoch 4/10
1000/1000 [==============================] - 0s 27us/step - loss: 0.6937 - acc: 0.5180
Epoch 5/10
1000/1000 [==============================] - 0s 28us/step - loss: 0.6886 - acc: 0.5410
Epoch 6/10
1000/1000 [==============================] - 0s 27us/step - loss: 0.6806 - acc: 0.5560
Epoch 7/10
1000/1000 [==============================] - 0s 26us/step - loss: 0.6814 - acc: 0.5600
Epoch 8/10
1000/1000 [==============================] - 0s 28us/step - loss: 0.6736 - acc: 0.5630
Epoch 9/10
1000/1000 [==============================] - 0s 26us/step - loss: 0.6717 - acc: 0.6010
Epoch 10/10
1000/1000 [==============================] - 0s 27us/step - loss: 0.6698 - acc: 0.5990
Out[2]: <keras.callbacks.History at 0x7fd1ef63eeb8>
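Note that in the log the GTX 960M (compute capability 5.0) is ignored because this build requires at least 5.2, so the training above actually ran on the CPU. The check tensorflow performs amounts to a version comparison; a sketch with the values from the log (listing 5.0 among the CUDA compute capabilities when answering ./configure should let a rebuilt tensorflow use this GPU):

```python
# Values from the log above: the device reports compute capability 5.0,
# while this build requires at least 5.2.
def gpu_usable(device_capability, required_minimum):
    """Compare (major, minor) compute capability tuples."""
    return device_capability >= required_minimum

print(gpu_usable((5, 0), (5, 2)))  # -> False: device is ignored
print(gpu_usable((5, 0), (5, 0)))  # -> True: usable after rebuilding with 5.0 listed
```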
It works now!