Since our server is currently shared among numerous lab members, I usually have to run my code as a non-root user. All too often I need to install new packages and tune the runtime environment, which is extremely inconvenient without root access. Therefore, I plan to use Docker, which isolates each user's environment from the others. Here I record how I built a deep learning Docker image from a basic Ubuntu image.
First, I check which versions of CUDA and the NVIDIA driver are already installed on the server (a stable version is preferred, and checking first avoids unnecessary pitfalls):
cat /proc/driver/nvidia/version
cat /usr/local/cuda/version.txt
The command nvcc --version gives the CUDA compiler version (which matches the toolkit version).
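The version checks above can be wrapped in a small script that also works on machines where the driver or toolkit is missing. This is only a sketch: the helper name cuda_release is made up for illustration, and the parsing assumes nvcc prints a line containing "release X.Y", which is the usual format.

```shell
#!/bin/sh
# Extract the "release X.Y" number from `nvcc --version` output read on stdin.
# (cuda_release is a hypothetical helper name, not part of any NVIDIA tool.)
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Driver version as reported by the kernel module, if the file exists.
if [ -r /proc/driver/nvidia/version ]; then
    cat /proc/driver/nvidia/version
else
    echo "NVIDIA driver not loaded (no /proc/driver/nvidia/version)"
fi

# Toolkit version as reported by the compiler, if nvcc is on PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit release: $(nvcc --version | cuda_release)"
else
    echo "nvcc not found on PATH"
fi
```

Knowing the release number up front makes it easier to pick a matching CUDA base image later.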
Then, we should use nvidia-docker to let Docker containers use the server's GPUs (see the QuickStart):
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker
# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
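If the test container fails to start, one thing worth checking is whether the nvidia runtime was actually registered with the Docker daemon. The sketch below assumes that nvidia-docker2 wrote its runtime entry to /etc/docker/daemon.json; the helper name has_nvidia_runtime is hypothetical, and the check is a plain text match rather than a real JSON parse.

```shell
#!/bin/sh
# Succeed if the given Docker daemon config mentions an "nvidia" runtime.
# Assumption: nvidia-docker2 registers its runtime in /etc/docker/daemon.json.
has_nvidia_runtime() {
    config=${1:-/etc/docker/daemon.json}
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered"
else
    echo "nvidia runtime not found in /etc/docker/daemon.json"
fi
```

When the entry is present but containers still cannot see the GPU, reloading or restarting the Docker daemon is the next thing to try.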
Next, I model my Dockerfile on one written previously by a machine learning server provider:
Dockerfile.gpu1
As the base image I use the one built by NVIDIA (find the version that suits you on this website), which is 9.0-cudnn7-devel-ubuntu16.04 (Dockerfile).
apt-get update
apt-get install -y bc \
build-essential \
cmake \
curl \
g++ \
gfortran \
git \
libopenblas-dev \
software-properties-common \
vim \
wget
apt-get clean
apt-get autoremove -y
rm -rf /var/lib/apt/lists/*
Set the BLAS library to use OpenBLAS via the alternatives mechanism (https://www.scipy.org/scipylib/building/linux.html#debian-ubuntu):
update-alternatives --set libblas.so.3 /usr/lib/openblas-base/libblas.so.3
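To confirm that the switch took effect, the currently selected target can be read back with update-alternatives --query. The following is a sketch under the assumption that the query output contains a single "Value:" line, as it does on Debian and Ubuntu; selected_alternative is a hypothetical helper name.

```shell
#!/bin/sh
# Read `update-alternatives --query NAME` output on stdin and print the
# currently selected target (the "Value:" field).
selected_alternative() {
    sed -n 's/^Value: //p'
}

if command -v update-alternatives >/dev/null 2>&1; then
    target=$(update-alternatives --query libblas.so.3 2>/dev/null | selected_alternative)
    echo "libblas.so.3 -> ${target:-not registered}"
else
    echo "update-alternatives not available on this system"
fi
```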
curl -O https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py && \
rm get-pip.py
pip --no-cache-dir install tensorflow-gpu
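Putting the steps together, here is a minimal sketch of how they might be assembled into a Dockerfile. It is not the provider's original file. The function name write_dockerfile, the temporary output directory, and the image tag in the usage note are all hypothetical. The python3 package is added as an assumption, because get-pip.py needs an interpreter and the package list above does not install one.

```shell
#!/bin/sh
# Write a Dockerfile that strings the build steps together.
# Assumptions: the output path and the extra python3 package are
# illustrative and not taken from the original Dockerfile.
write_dockerfile() {
    cat > "$1" <<'EOF'
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

# System packages; python3 is an added assumption so get-pip.py can run.
RUN apt-get update && apt-get install -y \
        bc build-essential cmake curl g++ gfortran git \
        libopenblas-dev python3 software-properties-common vim wget && \
    apt-get clean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

# Point the BLAS alternative at OpenBLAS.
RUN update-alternatives --set libblas.so.3 /usr/lib/openblas-base/libblas.so.3

# Install pip, then TensorFlow with GPU support.
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py && \
    rm get-pip.py
RUN pip --no-cache-dir install tensorflow-gpu
EOF
}

out_dir=$(mktemp -d)
write_dockerfile "$out_dir/Dockerfile"
echo "Dockerfile written to $out_dir/Dockerfile"
```

The image could then be built with docker build -t dl-gpu "$out_dir" and tried out with docker run --runtime=nvidia --rm dl-gpu nvidia-smi, where dl-gpu is just a placeholder tag. Chaining the apt-get commands into a single RUN keeps the package cache cleanup in the same image layer as the install.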