NVIDIA NGC TensorFlow image: from py27 to py35 with OpenCV (opencv3 for Python)

Assume we are working with Ubuntu 16.04 (xenial) and Docker 17.09.1-ce:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:    16.04
Codename:   xenial
 
$ docker --version
Docker version 17.09.1-ce, build 19e2cf6

First, go to ngc.nvidia.com and find the NGC TensorFlow image. Then log in to the NGC Docker registry:

docker login nvcr.io
 
Username: $oauthtoken
Password: <Your Key>

At the prompts, enter the literal username «$oauthtoken» and, as the password, your API key. The API key is generated in the «Configuration» section of your NGC user account.
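If you want to script this step, the same login can be done without interactive prompts. The sketch below is a minimal example that assumes Docker 17.07 or newer, where `docker login --password-stdin` is available (17.09.1-ce qualifies). The function names and the `NGC_API_KEY` environment variable are hypothetical and used only for illustration.

```python
import subprocess
from typing import List

NGC_REGISTRY = "nvcr.io"
# NGC expects this literal username. The API key acts as the password.
NGC_USERNAME = "$oauthtoken"


def ngc_login_command(registry: str = NGC_REGISTRY) -> List[str]:
    """Build a non-interactive `docker login` command for the NGC registry."""
    # --password-stdin makes docker read the password from standard input,
    # so the key does not appear in the process list or in shell history.
    # The command is passed as a list, so no shell expands "$oauthtoken".
    return ["docker", "login", registry,
            "--username", NGC_USERNAME, "--password-stdin"]


def ngc_login(api_key: str) -> None:
    """Log in to NGC by feeding the API key to docker on stdin."""
    subprocess.run(ngc_login_command(), input=api_key.encode(), check=True)


# Usage (hypothetical variable name):
#   import os; ngc_login(os.environ["NGC_API_KEY"])
```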
Now pull the NGC TensorFlow image. Go to the «Registry» section, choose the nvidia/tensorflow image, scroll down and copy the pull command for the tag you need (the latest one). At the time of writing this is the 17.12 tag, and the command is:

$ docker pull nvcr.io/nvidia/tensorflow:17.12

The image is now in the local image store:

$ docker images
REPOSITORY                  TAG                 IMAGE ID            CREATED             SIZE
hello-world                 latest              f2a91732366c        5 weeks ago         1.85kB
nvcr.io/nvidia/tensorflow   17.12               19afd620fc8e        5 weeks ago         2.88GB
ubuntu                      latest              20c44cd7596f        5 weeks ago         123MB

Now we will build our own image on top of the NVIDIA one.
1. Download the Miniconda installer:

$ wget https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh

I download this specific 4.2.12 version because it is the last one that uses python=3.5 by default.

2. Now create the Dockerfile for the first stage:

FROM nvcr.io/nvidia/tensorflow:17.12
LABEL maintainer="your_name_here <your_email_here>"
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu
 
# Download tools, shared X/GTK libraries commonly needed by conda packages and OpenCV, and VCS clients
# (libgtk2.0-0 is the actual package name; a bare "libgtk2.0" is treated by apt-get as a regex)
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 libgtk2.0-0 \
    git mercurial subversion && \
    rm -rf /var/lib/apt/lists/*
 
# Miniconda 4.2.12 installer downloaded in step 1 (Python 3.5 by default)
COPY ./Miniconda3-4.2.12-Linux-x86_64.sh /root/miniconda.sh
 
RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
    /bin/bash /root/miniconda.sh -b -p /opt/conda && \
    rm /root/miniconda.sh
 
ENV PATH=/opt/conda/bin:$PATH
 
# Scientific Python stack. Every install needs -y because docker build cannot answer prompts.
RUN conda update --all -y && \
    conda install numpy=1.11.0 pandas matplotlib scikit-learn seaborn scipy tqdm jupyter ipython -y && \
    conda install -c conda-forge ipywidgets netcdf4 basemap graphviz -y && \
    conda install -c jaikumarm hyperopt -y
 
# OpenCV 3.1.0 with contrib modules from the menpo channel
RUN conda config --add channels conda-forge && \
    conda config --add channels intel && \
    conda config --add channels menpo && \
    conda install -c menpo opencv3=3.1.0 -y
 
RUN conda install h5py -y
 
# Remove conda caches last so that they do not end up in the image
RUN conda clean -y --all && \
    rm -rf /opt/conda/pkgs/*
 
COPY ./jupyter_notebook_config.py /root/.jupyter/
 
VOLUME [ "/app", "/data" ]
EXPOSE 8888
WORKDIR /app
ENTRYPOINT [ "/usr/local/bin/nvidia_entrypoint.sh" ]
# Exec form runs without a shell, so values must not be quoted. 0.0.0.0 listens on all interfaces.
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root", "--debug"]

jupyter_notebook_config.py configures the Jupyter notebook server at startup. Its current content is:

c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
 
## Hashed password to use for web authentication.
#  To generate, type in a python/IPython shell:
#    from notebook.auth import passwd; passwd()
#  The string should be of the form type:salt:hashed-password.
c.NotebookApp.password = 'sha1:your_password_hash_here'
 
c.NotebookApp.port = 8888
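The supported way to produce the `c.NotebookApp.password` value is `notebook.auth.passwd()`, as the comment in the config says. The sketch below only shows what the `type:salt:hashed-password` format contains. It assumes the legacy (pre-7) Notebook scheme, in which the digest is computed over the passphrase followed by a 12-hex-digit random salt. Both function names are hypothetical, and the code is not the library's implementation.

```python
import hashlib
import hmac
import random


def make_password_hash(passphrase: str, algorithm: str = "sha1") -> str:
    """Return a 'type:salt:hashed-password' string (illustrative sketch)."""
    # 48 random bits rendered as exactly 12 hex digits
    salt = "%012x" % random.SystemRandom().getrandbits(48)
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_password_hash(hashed: str, passphrase: str) -> bool:
    """Verify a passphrase against a 'type:salt:hashed-password' string."""
    algorithm, salt, digest = hashed.split(":", 2)
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(h.hexdigest(), digest)


# Usage: print(make_password_hash("my secret")) and paste the result into
# c.NotebookApp.password. Prefer notebook.auth.passwd() in practice.
```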

Note that we install numpy=1.11.0 because TensorFlow 1.4 requires it.
Also note that we install opencv3=3.1.0: this version does not conflict with matplotlib, numpy=1.11.0 and the other packages in terms of dependencies. opencv3 comes from the menpo channel, which provides an OpenCV 3 build that includes the contrib modules.
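The package resolver can silently move pinned packages, so it is worth confirming the pins inside a container once the image is built. Below is a minimal sketch. The `PINS` table restates the versions used in this article, and the helper names are hypothetical. A pin matches when the installed version begins with the same dot-separated components, so "3.5.2" satisfies "3.5". The code avoids f-strings so that it also runs on the container's Python 3.5.

```python
from typing import Dict, List

# Versions this article expects in the image
PINS = {"python": "3.5", "numpy": "1.11.0", "opencv": "3.1.0"}


def version_matches(installed: str, pinned: str) -> bool:
    """True if `installed` starts with all dot-separated components of `pinned`."""
    pin = pinned.split(".")
    return installed.split(".")[: len(pin)] == pin


def pin_mismatches(installed: Dict[str, str],
                   pins: Dict[str, str] = PINS) -> List[str]:
    """Return a message for every pinned package that is missing or differs."""
    problems = []
    for name, pinned in sorted(pins.items()):
        found = installed.get(name)
        if found is None:
            problems.append("%s: not installed (expected %s)" % (name, pinned))
        elif not version_matches(found, pinned):
            problems.append("%s: %s installed, expected %s" % (name, found, pinned))
    return problems


# Inside the container:
#   import platform, numpy, cv2
#   print(pin_mismatches({"python": platform.python_version(),
#                         "numpy": numpy.__version__,
#                         "opencv": cv2.__version__}))
```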

3. Now build the stage 1 Docker image (I've named it «nvcr.io/mk/py35_nv_tf:int01»).
Remember to run the build in the directory that contains the Dockerfile, jupyter_notebook_config.py and Miniconda3-4.2.12-Linux-x86_64.sh. The Dockerfile copies the last two into the image.

$ docker build -t nvcr.io/mk/py35_nv_tf:int01 .

Next, check that a container starts from this image:

$ nvidia-docker run -it --rm nvcr.io/mk/py35_nv_tf:int01 /bin/bash

4. Now we launch a container from this image and build TF inside it. We take this approach because the NVIDIA Docker runtime provides some necessary driver libraries only when the container is started with nvidia-docker run/exec ... In particular, this applies to libcuda.so.1 of the version that matches the host driver:
(inside the container)

# ls -l /usr/lib/x86_64-linux-gnu | grep libcuda.so
lrwxrwxrwx  1 root root        17 Dec 25 21:41 libcuda.so -> libcuda.so.387.34
lrwxrwxrwx  1 root root        17 Dec 25 21:41 libcuda.so.1 -> libcuda.so.387.34
-rw-r--r--  1 root root  11008248 Nov 21 10:18 libcuda.so.387.34
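The same check can be scripted. The sketch below follows the `libcuda.so.1 -> libcuda.so.<version>` symlink chain shown above and returns the driver version, or None when the library is absent (for example, in a container started with plain `docker run`). The function name is hypothetical. The default directory is the one listed above.

```python
import os
from typing import Optional

LIB_DIR = "/usr/lib/x86_64-linux-gnu"


def libcuda_driver_version(lib_dir: str = LIB_DIR) -> Optional[str]:
    """Return the version libcuda.so.1 resolves to (e.g. '387.34'), or None."""
    link = os.path.join(lib_dir, "libcuda.so.1")
    # exists() is False both for a missing file and for a dangling symlink
    if not os.path.exists(link):
        return None
    target = os.path.basename(os.path.realpath(link))
    prefix = "libcuda.so."
    # A plain file named libcuda.so.1 carries no full version in its name
    if not target.startswith(prefix) or target == "libcuda.so.1":
        return None
    return target[len(prefix):]


# Inside a container started with nvidia-docker:
#   print(libcuda_driver_version())   # the host driver version, if mounted
```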

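So we build TF inside the container and then commit all the changes to the stage 2 image.
Launch the container. This time we omit --rm, so the container still exists after we detach from it, can be committed, and can then be removed explicitly:

$ nvidia-docker run -it nvcr.io/mk/py35_nv_tf:int01 /bin/bash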

Inside this container, run:

# cd /opt/tensorflow/
# export TF_UNOFFICIAL_SETTING=1
# yes "" | ./configure
# bazel build -c opt --config=cuda tensorflow/tools/pip_package:build_pip_package

This takes a while, about 20 minutes.
(still inside the container)

# bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tf_pip
# pip install --upgrade /tmp/tf_pip/tensorflow-*.whl
# rm -rf /tmp/tf_pip/tensorflow-*.whl
# bazel clean --expunge
 
# pip install keras==2.0.8
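Before committing, you can confirm that the freshly installed wheel sees the GPU. The sketch below filters TensorFlow's local device list by device type. It assumes TensorFlow 1.x, where `tensorflow.python.client.device_lib.list_local_devices()` returns objects with `name` and `device_type` fields. That module is internal rather than public API. The helper name is hypothetical.

```python
from typing import Iterable, List


def gpu_device_names(devices: Iterable) -> List[str]:
    """Names of the GPU entries in a list of TensorFlow device descriptions."""
    return [d.name for d in devices if d.device_type == "GPU"]


# Inside the container (TensorFlow 1.x):
#   from tensorflow.python.client import device_lib
#   print(gpu_device_names(device_lib.list_local_devices()))
# An empty list means the build (or the container) does not see the GPU.
```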

One more step: make Keras download the VGG16 ImageNet weights without the top classifier layers, so that the weights are baked into the image:

# ipython
import keras
from keras.applications import VGG16
conv_base1 = VGG16(weights='imagenet', include_top=False, input_shape=(100, 100, 3))

Keras then downloads the file vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 to the directory /root/.keras/models/.
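To confirm the download before committing, check that the file is in the cache directory named above. The sketch below is minimal. The helper name is hypothetical, and the default path is the one given in this article.

```python
import os
from typing import Optional

VGG16_NOTOP = "vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5"


def cached_keras_weights(filename: str = VGG16_NOTOP,
                         models_dir: str = "/root/.keras/models") -> Optional[str]:
    """Return the path of a cached Keras weights file, or None if missing or empty."""
    path = os.path.join(models_dir, filename)
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path
    return None


# Still inside the container, before detaching:
#   print(cached_keras_weights())
```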

After this, exit IPython and detach from the container with the <Ctrl>+<P>, <Ctrl>+<Q> key sequence. The container keeps running.

Now commit all changes from the container to a new image:

$ docker commit 87bcd85ec902 nvcr.io/mk/py35_nv_tf

Here 87bcd85ec902 is the ID of the container in which we built TF. docker ps lists the IDs of running containers.
After this, we can stop and remove the container:

$ docker stop 87bcd85ec902
$ docker rm 87bcd85ec902
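Instead of copying the container ID by hand, you can look it up by image. The sketch below parses the output of `docker ps --format '{{.ID}} {{.Image}}'`, a Go template that prints one "ID IMAGE" pair per running container. It assumes the image column shows the name used at `run` time (here nvcr.io/mk/py35_nv_tf:int01). The helper names are hypothetical. In the shell, `docker ps -q --filter ancestor=nvcr.io/mk/py35_nv_tf:int01` gives a similar result.

```python
import subprocess
from typing import List

PS_FORMAT = "{{.ID}} {{.Image}}"


def container_ids_for_image(ps_output: str, image: str) -> List[str]:
    """Return the IDs of containers whose image column equals `image`."""
    ids = []
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == image:
            ids.append(parts[0])
    return ids


def running_container_ids(image: str) -> List[str]:
    """Run `docker ps` and return the IDs of running containers of `image`."""
    out = subprocess.run(["docker", "ps", "--format", PS_FORMAT],
                         stdout=subprocess.PIPE, check=True,
                         universal_newlines=True).stdout
    return container_ids_for_image(out, image)


# Usage: running_container_ids("nvcr.io/mk/py35_nv_tf:int01")
```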

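5. Some polish. The image committed in step 4 keeps the container's /bin/bash as its default command. To restore the notebook command, create a new Dockerfile with the following content: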

FROM nvcr.io/mk/py35_nv_tf:int02
 
LABEL maintainer="YOUR NAME HERE <your@email.here>"
 
VOLUME [ "/app", "/data" ]
EXPOSE 8888
 
WORKDIR /app
 
ENTRYPOINT [ "/usr/local/bin/nvidia_entrypoint.sh" ]
 
# Replaces the /bin/bash CMD recorded by docker commit. Exec form, so no quotes around the IP.
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root", "--debug"]

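Next, tag nvcr.io/mk/py35_nv_tf:latest (the image committed in step 4) as nvcr.io/mk/py35_nv_tf:int02, which is the base that the new Dockerfile builds FROM. Then build the final image in the directory that contains the new Dockerfile: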

$ docker tag nvcr.io/mk/py35_nv_tf:latest nvcr.io/mk/py35_nv_tf:int02
$ docker build -t nvcr.io/mk/py35_nv_tf .
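To confirm that step 5 had the intended effect, compare the image configuration before and after the build. The sketch below reads `Config.Entrypoint` and `Config.Cmd` from the JSON array that `docker inspect <image>` prints. The expectation is that int02 shows `["/bin/bash"]` and the final image shows the jupyter command. The helper names are hypothetical.

```python
import json
import subprocess
from typing import List, Optional, Tuple

EXPECTED_ENTRYPOINT = ["/usr/local/bin/nvidia_entrypoint.sh"]


def image_entrypoint_and_cmd(
        inspect_json: str) -> Tuple[Optional[List[str]], Optional[List[str]]]:
    """Extract Config.Entrypoint and Config.Cmd from `docker inspect` output."""
    config = json.loads(inspect_json)[0]["Config"]
    return config.get("Entrypoint"), config.get("Cmd")


def starts_notebook(cmd: Optional[List[str]]) -> bool:
    """True if the default command launches the Jupyter notebook server."""
    return bool(cmd) and cmd[:2] == ["jupyter", "notebook"]


def inspect_image(image: str) -> str:
    """Return the raw `docker inspect` JSON for `image`."""
    return subprocess.run(["docker", "inspect", image], stdout=subprocess.PIPE,
                          check=True, universal_newlines=True).stdout


# Usage:
#   ep, cmd = image_entrypoint_and_cmd(inspect_image("nvcr.io/mk/py35_nv_tf"))
#   print(ep == EXPECTED_ENTRYPOINT, starts_notebook(cmd))
```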

After these 5 steps, one image contains Python 3.5, an NVIDIA-optimized TensorFlow 1.4 build for Python 3.5, and opencv3.
Run the container as follows (the Jupyter notebook server starts automatically):

$ nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --mount type=bind,source=$HOME/py35docker_app/,target=/app -p 8888:8888 nvcr.io/mk/py35_nv_tf

Or run it like this:

$ nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --mount type=bind,source=$HOME/py35docker_app/,target=/app -p 8888:8888 nvcr.io/mk/py35_nv_tf /bin/bash

In this case, we get a shell inside the container instead of the notebook server.
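If you start the container from a script, assemble the same command line as an argument list. The sketch below rebuilds the two commands above, with `shell=True` appending /bin/bash. The stack limit 67108864 equals 64 * 1024 * 1024 bytes (64 MiB). The helper names are hypothetical. The directory is created first because `--mount type=bind`, unlike `-v`, fails when the source path does not exist.

```python
import os
import subprocess
from typing import List

IMAGE = "nvcr.io/mk/py35_nv_tf"
STACK_BYTES = 64 * 1024 * 1024  # 67108864, the --ulimit stack value used above


def nvidia_docker_run_args(app_dir: str, shell: bool = False,
                           image: str = IMAGE) -> List[str]:
    """Build the nvidia-docker run command from this article as a list."""
    args = [
        "nvidia-docker", "run",
        "--shm-size=1g",                  # larger /dev/shm than the 64 MB default
        "--ulimit", "memlock=-1",         # unlimited locked memory
        "--ulimit", "stack=%d" % STACK_BYTES,
        "-it", "--rm",
        "--mount", "type=bind,source=%s,target=/app" % os.path.abspath(app_dir),
        "-p", "8888:8888",
        image,
    ]
    if shell:
        args.append("/bin/bash")  # overrides CMD: a shell instead of the notebook
    return args


def run_container(app_dir: str, shell: bool = False) -> None:
    """Create the bind-mount source if needed and start the container."""
    os.makedirs(app_dir, exist_ok=True)
    subprocess.run(nvidia_docker_run_args(app_dir, shell), check=True)


# Usage: run_container(os.path.expanduser("~/py35docker_app"))
```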