Build tensorflow-gpu on GNU/Linux Debian
I am running GNU/Linux Debian with the Nvidia drivers, CUDA 9.1, and cuDNN 7.1 installed from Debian packages. Two things complicate the setup:
- Tensorflow-gpu pip packages are built against specific CUDA versions
- CUDA Toolkit files from Debian packages are placed in different locations than those from the conventional installer
That means that with the GNU/Linux Debian packages I cannot use pip directly, since the library versions might not match. Nor can I easily compile the tensorflow source against the latest packaged CUDA Toolkit, because of the relocated header files and shared objects.
I solve this by installing a CUDA Toolkit that matches the version from the packages, using the local installer downloaded from the Nvidia website. Only the toolkit is required, without the drivers or examples. Then I build the package for Python.
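To decide whether building from source is necessary, compare the CUDA version the official wheel expects with the one installed locally. A minimal sketch; both version strings below are assumptions for my system (read yours from /usr/local/cuda/version.txt or `nvcc --version`, and the tensorflow release notes):

```shell
# CUDA version the official pip wheel was built against (assumed; check the
# tensorflow release notes for your target version)
wheel_cuda="9.0"
# CUDA version installed from Debian packages (assumed; on my machine)
local_cuda="9.1"

if [ "$wheel_cuda" = "$local_cuda" ]; then
    echo "pip wheel matches local CUDA"
else
    echo "mismatch: build tensorflow from source against CUDA $local_cuda"
fi
```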
Instructions
Download tensorflow
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v1.8.0 # Check out the latest release tag
# Patch to compile with gcc-6, at the time of writing the latest supported compiler
git cherry-pick e489b60
Download CUDA Toolkit
Download the version of CUDA that matches the version from the packages; in my case it is 9.1. Install the files, except for the drivers and samples, into the standard location.
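With the runfile installer this can be done non-interactively; a sketch, where the installer filename is illustrative (use the file you actually downloaded):

```shell
# --toolkit installs only the toolkit, skipping the driver and samples;
# --override skips the compiler check. Filename is illustrative.
sudo sh cuda_9.1.85_387.26_linux.run --silent --toolkit --override
```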
Download CuDNN
Download and install the CUDA Deep Neural Network library matching the CUDA Toolkit:
- cuDNN v7.1.3 Runtime Library for Ubuntu16.04 (Deb)
- cuDNN v7.1.3 Developer Library for Ubuntu16.04 (Deb)
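The Deb packages can then be installed with dpkg; a sketch with illustrative filenames (use the exact names of the files you downloaded from Nvidia):

```shell
# Runtime library first, then the developer library (headers).
# Filenames are illustrative for cuDNN 7.1.3 / CUDA 9.1.
sudo dpkg -i libcudnn7_7.1.3.16-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.3.16-1+cuda9.1_amd64.deb
```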
Download bazel
Bazel is required to compile and build tensorflow. To install it:
# Add repository
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
# Add key
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
# Update package information
sudo apt-get update
# Install bazel
sudo apt-get install bazel
Compile tensorflow-gpu
Run the configure script and give the default answers to most questions, except for CUDA support, which should be enabled. Pick gcc-6 as the compiler. If you don't have gcc-6 installed:
sudo apt-get install gcc-6 g++-6
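The configure script can also read its answers from the environment, which avoids retyping them on rebuilds. A sketch, assuming the variable names read by TF 1.8's configure.py (verify against your checkout):

```shell
# Pre-seed the interactive configure script. These variables are read by
# configure.py in the tensorflow 1.8 source tree.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.1
export TF_CUDNN_VERSION=7
export GCC_HOST_COMPILER_PATH=$(which gcc-6)
./configure
```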
Run bazel
bazel build --config=opt --config=cuda --incompatible_load_argument_is_label=false //tensorflow/tools/pip_package:build_pip_package
Building the binaries took roughly 2 hours on an 8-core i7-6700HQ CPU @ 2.60GHz. Next, build and install the Python package:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
cp /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl ~/
pip3 install ~/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
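To quickly check that the wheel installed correctly and that a GPU is visible (`tf.test.is_gpu_available()` exists in TF 1.8; it requires the just-built wheel, so this is only runnable on the target machine):

```python
import tensorflow as tf

print(tf.__version__)              # should print 1.8.0 for this build
print(tf.test.is_gpu_available())  # True only if a usable GPU was found
```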
Run example
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

Epoch 1/10
2018-04-29 13:03:10.454166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-29 13:03:10.454636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.91GiB
2018-04-29 13:03:10.454652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Ignoring visible gpu device (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0) with Cuda compute capability 5.0. The minimum required Cuda capability is 5.2.
2018-04-29 13:03:10.454660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-29 13:03:10.454664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-04-29 13:03:10.454669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
1000/1000 [==============================] - 0s 293us/step - loss: 0.7168 - acc: 0.4890
Epoch 2/10
1000/1000 [==============================] - 0s 31us/step - loss: 0.7045 - acc: 0.4940
Epoch 3/10
1000/1000 [==============================] - 0s 26us/step - loss: 0.6993 - acc: 0.5100
Epoch 4/10
1000/1000 [==============================] - 0s 27us/step - loss: 0.6937 - acc: 0.5180
Epoch 5/10
1000/1000 [==============================] - 0s 28us/step - loss: 0.6886 - acc: 0.5410
Epoch 6/10
1000/1000 [==============================] - 0s 27us/step - loss: 0.6806 - acc: 0.5560
Epoch 7/10
1000/1000 [==============================] - 0s 26us/step - loss: 0.6814 - acc: 0.5600
Epoch 8/10
1000/1000 [==============================] - 0s 28us/step - loss: 0.6736 - acc: 0.5630
Epoch 9/10
1000/1000 [==============================] - 0s 26us/step - loss: 0.6717 - acc: 0.6010
Epoch 10/10
1000/1000 [==============================] - 0s 27us/step - loss: 0.6698 - acc: 0.5990
Out[2]: <keras.callbacks.History at 0x7fd1ef63eeb8>
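Note that in the log the GTX 960M (compute capability 5.0) is ignored because this build requires at least 5.2, so the training above actually ran on the CPU. The check tensorflow performs amounts to a version comparison; a sketch with the values from the log (listing 5.0 among the CUDA compute capabilities when answering ./configure should let a rebuilt tensorflow use this GPU):

```python
# Values from the log above: the device reports compute capability 5.0,
# while this build requires at least 5.2.
def gpu_usable(device_capability, required_minimum):
    """Compare (major, minor) compute capability tuples."""
    return device_capability >= required_minimum

print(gpu_usable((5, 0), (5, 2)))  # -> False: device is ignored
print(gpu_usable((5, 0), (5, 0)))  # -> True: usable after rebuilding with 5.0 listed
```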
It works now!