Assume we are working with Ubuntu 16.04 Xenial and Docker 17.09.1-ce:
```shell
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
$ docker --version
Docker version 17.09.1-ce, build 19e2cf6
```
1. Go to ngc.nvidia.com and get the NGC TensorFlow image link. Then log Docker in to the NGC registry:
```shell
$ docker login nvcr.io
Username: $oauthtoken
Password: <Your Key>
```
We are prompted here to type a username, which is the literal string «$oauthtoken», and a password, which is an API key. The API key is generated in the NGC user account, in the «Configuration» section.
Now we get the NGC TensorFlow image. Go to the «Registry» section, choose the nvidia/tensorflow image, scroll down and copy the pull command for the needed (latest) tag. Currently it is the 17.12 tag and the command is the following:
```shell
$ docker pull nvcr.io/nvidia/tensorflow:17.12
```
Now we have the image in the local image registry:
```shell
$ docker images
REPOSITORY                  TAG      IMAGE ID      CREATED      SIZE
hello-world                 latest   f2a91732366c  5 weeks ago  1.85kB
nvcr.io/nvidia/tensorflow   17.12    19afd620fc8e  5 weeks ago  2.88GB
ubuntu                      latest   20c44cd7596f  5 weeks ago  123MB
```
Now we will make our own image based on the NVIDIA one.
1. Get the Miniconda installer:
```shell
$ wget https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh
```
I am downloading the specific 4.2.12 version because it is the last one that ships Python 3.5 by default.
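Before baking the installer into the image, it is worth verifying its checksum against the value published on the Miniconda download page. A minimal sketch (the expected hash below is a placeholder, not the real value):

```python
# Streaming checksum of a downloaded file; compare the result against the
# hash published for Miniconda3-4.2.12-Linux-x86_64.sh on the download page.
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 of a file without loading it into memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Example usage (placeholder hash -- substitute the published one):
# assert file_md5("Miniconda3-4.2.12-Linux-x86_64.sh") == "<expected_md5_here>"
```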
2. Now create a Dockerfile for the first stage:
```dockerfile
FROM nvcr.io/nvidia/tensorflow:17.12
MAINTAINER your_name_here <your_email_here>

ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu

RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 \
    git mercurial subversion

COPY ./Miniconda3-4.2.12-Linux-x86_64.sh /root/miniconda.sh
RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
    /bin/bash /root/miniconda.sh -b -p /opt/conda && \
    rm /root/miniconda.sh
ENV PATH=/opt/conda/bin:$PATH

RUN apt-get install -y libgtk2.0 && \
    rm -rf /var/lib/apt/lists/*

RUN conda update --all -y && \
    conda install numpy=1.11.0 pandas matplotlib scikit-learn seaborn scipy tqdm jupyter ipython -y && \
    conda install -c conda-forge ipywidgets netcdf4 basemap graphviz -y && \
    conda install -c jaikumarm hyperopt -y

RUN conda config --add channels conda-forge && \
    conda config --add channels intel && \
    conda config --add channels menpo && \
    conda install -c menpo opencv3=3.1.0 -y

RUN conda clean -y --all && \
    rm -rf /opt/conda/pkgs/*

COPY ./jupyter_notebook_config.py /root/.jupyter/

RUN conda install h5py -y

VOLUME [ "/app", "/data" ]
EXPOSE 8888
WORKDIR /app

ENTRYPOINT [ "/usr/local/bin/nvidia_entrypoint.sh" ]
CMD ["jupyter", "notebook", "--ip='*'", "--port=8888", "--no-browser", "--allow-root", "--debug"]
```
jupyter_notebook_config.py is a file that will configure Jupyter Notebook parameters on start. Right now its content is the following:
```python
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False

## Hashed password to use for web authentication.
#
#  To generate, type in a python/IPython shell:
#
#    from notebook.auth import passwd; passwd()
#
#  The string should be of the form type:salt:hashed-password.
c.NotebookApp.password = 'sha1:your_password_hash_here'

c.NotebookApp.port = 8888
```
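For reference, here is a sketch of how the `type:salt:hashed-password` string is built. This mirrors what `notebook.auth.passwd()` does to the best of my understanding; in practice just call that function as the config comment says, this is only to make the format transparent:

```python
# Build a Jupyter-style password token: "<algorithm>:<salt>:<hexdigest>",
# where the digest is hash(passphrase_bytes + salt_bytes).
import hashlib
import random


def notebook_passwd(passphrase, algorithm="sha1"):
    salt = "%012x" % random.getrandbits(48)  # 12 hex characters of salt
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return "%s:%s:%s" % (algorithm, salt, h.hexdigest())
```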
Note here that we are installing numpy=1.11.0: it is a TensorFlow 1.4 requirement. Also note that we are installing opencv3=3.1.0: it is the version that will not conflict with matplotlib, numpy=1.11.0 and others in terms of dependencies. The repository for opencv3 is menpo: it is the one containing opencv3 with the contrib modules.
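Since these pins matter, a tiny helper like the following (illustrative only, not part of the original setup) can be run inside the container to confirm that an installed version string actually satisfies a pin:

```python
# Check an installed version string against a pin, comparing only as many
# components as the pin specifies (e.g. "3.1" matches "3.1.0").
def check_pin(version, pin):
    """True if `version` matches `pin` up to the pin's precision."""
    pin_parts = pin.split(".")
    return version.split(".")[:len(pin_parts)] == pin_parts


# Inside the container one could then do, e.g.:
# import numpy; assert check_pin(numpy.__version__, "1.11.0")
# import cv2;   assert check_pin(cv2.__version__, "3.1.0")
```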
3. From this step we will compose the Docker image of stage 1 (I've named it «nvcr.io/mk/py35_nv_tf:int01»). Remember that we have to do it in the directory where the Dockerfile, jupyter_notebook_config.py and Miniconda3-4.2.12-Linux-x86_64.sh are located.
```shell
$ docker build -t nvcr.io/mk/py35_nv_tf:int01 .
```
We may now check whether a container can be run from this image:
```shell
$ nvidia-docker run -it --rm nvcr.io/mk/py35_nv_tf:int01 /bin/bash
```
4. Now we will launch a Docker container using this image and build TF inside it. We use this approach because the NVIDIA Docker runtime injects some necessary libraries only in the nvidia-docker run/exec case, in particular libcuda.so.1 of the proper version:
(inside container)
```shell
# ll /usr/lib/x86_64-linux-gnu | grep libcuda.so
lrwxrwxrwx 1 root root       17 Dec 25 21:41 libcuda.so -> libcuda.so.387.34
lrwxrwxrwx 1 root root       17 Dec 25 21:41 libcuda.so.1 -> libcuda.so.387.34
-rw-r--r-- 1 root root 11008248 Nov 21 10:18 libcuda.so.387.34
```
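The same check can be done from Python. This is an illustrative helper, not part of the original setup; it simply returns False on a machine where the NVIDIA runtime has not injected the driver library:

```python
# Try to dlopen the CUDA driver library that the NVIDIA Docker runtime
# injects into the container; OSError means it is not visible.
import ctypes


def libcuda_available(name="libcuda.so.1"):
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False
```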
So we build TF inside the container, and then commit all the changes made into the stage-2 image.
Launch the container (this time without --rm, since we will commit it and remove it by hand afterwards):
```shell
$ nvidia-docker run -it nvcr.io/mk/py35_nv_tf:int01 /bin/bash
```
and inside this container:
```shell
# cd /opt/tensorflow/
# export TF_UNOFFICIAL_SETTING=1
# yes "" | ./configure
# bazel build -c opt --config=cuda tensorflow/tools/pip_package:build_pip_package
```
This will take a while, about 20 minutes.
(still inside container)
```shell
# bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tf_pip
# pip install --upgrade /tmp/tf_pip/tensorflow-*.whl
# rm -rf /tmp/tf_pip/tensorflow-*.whl
# bazel clean --expunge
# pip install keras==2.0.8
```
And now one more step: I am going to make Keras download the VGG16 convolutional base (the «notop» weights, without the classifier head):
```shell
# ipython
```
```python
import keras
from keras.applications import VGG16

conv_base1 = VGG16(weights='imagenet', include_top=False, input_shape=(100, 100, 3))
```
Here Keras will download the file vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 to the directory /root/.keras/models/.
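Keras caches downloaded weights under ~/.keras/models, which for root inside the container resolves to /root/.keras/models. A small helper (illustrative, not part of the original setup) to compute the expected cache path of a weights file:

```python
# Resolve the path Keras uses to cache a downloaded weights file:
# ~/.keras/models/<filename> for the current user.
import os


def keras_cache_path(filename):
    return os.path.join(os.path.expanduser("~"), ".keras", "models", filename)
```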
After this we detach from the container using the <Ctrl>+<P>, <Ctrl>+<Q> key sequence.
And now commit all changes from the container to the new image:
```shell
$ docker commit 87bcd85ec902 nvcr.io/mk/py35_nv_tf
```
Here 87bcd85ec902 is the ID of the Docker container where we have built TF (yours will differ; see docker ps).
After this we may stop and remove this container:
```shell
$ docker stop 87bcd85ec902
$ docker rm 87bcd85ec902
```
5. Some polish. Create a new Dockerfile containing the following:
```dockerfile
FROM nvcr.io/mk/py35_nv_tf:int02
MAINTAINER YOUR NAME HERE <your@email.here>

VOLUME [ "/app", "/data" ]
EXPOSE 8888
WORKDIR /app

ENTRYPOINT [ "/usr/local/bin/nvidia_entrypoint.sh" ]
CMD ["jupyter", "notebook", "--ip='*'", "--port=8888", "--no-browser", "--allow-root", "--debug"]
```
Rename nvcr.io/mk/py35_nv_tf:latest to nvcr.io/mk/py35_nv_tf:int02 and build the new image:
```shell
$ docker tag nvcr.io/mk/py35_nv_tf:latest nvcr.io/mk/py35_nv_tf:int02
$ docker build -t nvcr.io/mk/py35_nv_tf .
```
After these 5 steps we are able to use the NVIDIA-optimized TensorFlow 1.4 build for Python 3.5, and there is opencv3 in the image as well.
The container may be run with the following (the Jupyter notebook server will start automatically):
```shell
$ nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --mount type=bind,source=$HOME/py35docker_app/,target=/app -p 8888:8888 nvcr.io/mk/py35_nv_tf
```
or the following:
```shell
$ nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --mount type=bind,source=$HOME/py35docker_app/,target=/app -p 8888:8888 nvcr.io/mk/py35_nv_tf /bin/bash
```
In this case we will get the command line of the container instead.