Building a deep learning model environment based on a Dockerfile (OpenPCDet)

Table of contents

Installation of docker (gpu)

Installing the graphics card driver on Linux

Install docker

Install NVIDIA Container Toolkit

Write dockerfile

Docker image and container startup, packaging and export

Some problems encountered:

No module named 'pcdet'

libGL.so.1

This article takes OpenPCDet as an example.

Reference: "[Docker] Packaging a trained model into an image", Emery_learning's blog on CSDN

Installation of docker (gpu)

Reference: "Deploying a deep learning server with Docker: CUDA + cuDNN + SSH", Causeway burning Anan's blog on CSDN

The installation steps are as follows:

Installing the graphics card driver on Linux

Reference: "Three methods of installing the graphics card driver in Ubuntu", u014682691's column on CSDN

The first method is recommended if your network is fast. After installation, run nvidia-smi; if it reports an error, reboot the machine and it should return to normal.

If a working driver is already installed, you can also consider not updating it.

Install docker

1. If an old version of docker is installed, you need to uninstall it first!

sudo apt-get remove docker docker-engine docker.io containerd runc

2. Update the apt package index and install the packages that allow apt to use a repository over HTTPS

sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

3. Add Docker's official GPG key:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

4. Use the following command to set up the stable repository.

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

5. Install Docker engine

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

6. Docker is now installed. Run the following command to test it

sudo docker run hello-world

Install NVIDIA Container Toolkit

reference resources: Installation Guide — NVIDIA Cloud Native Technologies documentation

1. Set the repository and GPG key of the stable version

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

2. Update the source and install NVIDIA container toolkit

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

3. Restart the Docker daemon so the toolkit takes effect and the installation is complete

sudo systemctl restart docker

4. You can now verify the setup by running a basic CUDA container

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

PS: this is where I often went wrong. Earlier installs may have failed because I did not follow the steps strictly. (Sometimes everything works one day, and after a reboot the next day it fails with the error below.)

could not select device driver "" with capabilities: [[gpu]].

If this error occurs, uninstall Docker and repeat the steps above, or simply reboot the machine and try again.
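Before reinstalling everything, a few quick checks can narrow down whether the host driver or the Docker side is at fault. This is a sketch that assumes a systemd-based Ubuntu host:

```shell
# 1. Is the host driver itself healthy? If this fails, fix the driver first.
nvidia-smi

# 2. Is the NVIDIA runtime registered with Docker?
docker info | grep -i nvidia

# 3. Restart the daemon so it picks up the toolkit configuration.
sudo systemctl restart docker

# 4. Retry the basic GPU test container.
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```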

If this works, the GPU-enabled Docker installation is complete. The following are the key steps for building the image, the container, and the environment from a Dockerfile.

Write dockerfile

Select a base Dockerfile (taking OpenPCDet as an example)

Just a few days ago, the author of OpenPCDet provided a Dockerfile; I modified it to fit my own environment.

OpenPCDet docker address: https://github.com/open-mmlab/OpenPCDet/tree/master/docker

dockerfile address: https://github.com/open-mmlab/OpenPCDet/blob/master/docker/Dockerfile

It is fairly long and covers everything related to the installation environment. Create a folder locally, then create a Dockerfile in it:

touch Dockerfile

Copy the upstream content into it, use its structure as a reference, and modify it for your own environment.

For example, my local machine has an RTX 3090, which is typically paired with CUDA 11.1 and cuDNN 8. So I changed the base image and installed the matching PyTorch 1.8. I also replaced the apt sources inside the image with the Tsinghua mirror, and put my already-downloaded OpenPCDet directory into the build context.

The revised key parts are as follows:

FROM nvidia/cuda:11.1.1-cudnn8-devel-ubuntu18.04
MAINTAINER Wjh<xxx@xxx.mail.com>
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

RUN echo "export LANG=C.UTF-8" >>/etc/profile \
&& mkdir -p /.script \
&& cp /etc/apt/sources.list /etc/apt/sources.list.bak \
&& echo "deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse\n\
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse\n\
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse\n\
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse" >/etc/apt/sources.list

# Install basics

RUN apt-get update -y \
    && apt-get install build-essential \
    && apt-get install -y apt-utils git curl ca-certificates bzip2 tree htop wget \
    && apt-get install -y libglib2.0-0 libsm6 libxext6 libxrender-dev bmon iotop g++ python3.7 python3.7-dev python3.7-distutils
...

# Install torch and torchvision
# See https://pytorch.org/ for other options if you use a different version of CUDA
RUN pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

WORKDIR /root
COPY . /root
...

RUN pip install spconv-cu111
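The elided parts ("...") also set up OpenPCDet itself. A hedged sketch of what that step might look like, assuming the repository was copied into /root by the COPY instruction above and that the project installs via its setup.py:

```dockerfile
# Continuation sketch: install dependencies and compile OpenPCDet's CUDA ops.
# Assumes the repository was copied to /root/OpenPCDet by the COPY step above.
RUN pip install -r /root/OpenPCDet/requirements.txt
RUN cd /root/OpenPCDet && python setup.py develop
```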

Explain the core content:

FROM nvidia/cuda:11.1.1-cudnn8-devel-ubuntu18.04 uses that image as the base image (think of it as the parent image)

MAINTAINER sets the author of the image and can be any string (newer Docker versions prefer LABEL maintainer=... instead)

RUN xxx executes the given command inside the image at build time

WORKDIR: sets the working directory. The example above sets it to /root; you can choose another path

COPY: the usage is COPY [src] [dest]. In the example above src is ".", which copies everything in the current build context into dest (a directory inside the image)

For the base image, you need to find a suitable one on Docker Hub: search for cuda first, find the official image, and then pick the CUDA version you need

For example, according to the environment I need, I chose the following one

(attach link: Docker Hub)

The devel variant is generally recommended because it is complete (it includes the build toolchain). Copy the image name from the docker pull command shown on Docker Hub and put it in the first FROM line of the Dockerfile

Execute the build command in the current directory to build the docker image

docker build ./ -t xxx #xxx is your own name 
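After the build finishes, a quick smoke test confirms the image can see the GPU and that PyTorch was installed with CUDA support. This is a sketch; openpcdet:v1 stands in for whatever tag you passed to -t:

```shell
# Check that PyTorch inside the image can reach the GPU.
docker run --rm --gpus all openpcdet:v1 \
    python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```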

PS: For basic Dockerfile syntax, see "How to make a container image" in the Huawei Cloud SWR (container image service) FAQ

Docker image and container startup, packaging and export

View existing images

docker images

After the image is built, create a container from it and enter it

docker run -it --name xxx --gpus all xxxx
# --name sets the container name (if omitted, Docker assigns a random one);
# the final xxxx is the image ID or name

Ctrl+D exits the current container

Note that you cannot reuse the command above to re-enter the newly created container: each name is bound to one container. To re-enter it, first look up its container ID

View the generated container ID(CONTAINER ID)

docker ps -a

Run the container and enter the bash interface

docker start container_ID   # if the container is not running, start it first
docker exec -it container_ID /bin/bash   # enter a bash shell in the container

Trained model files and related code can be copied from the local machine into the container

docker cp local_path container_ID:container_path
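For example (hypothetical paths and a made-up container ID, just to show the argument order):

```shell
# Copy a trained checkpoint from the host into the container.
docker cp ./checkpoints/model.pth 3f2a1b9c:/root/OpenPCDet/checkpoints/

# Copying in the other direction works too.
docker cp 3f2a1b9c:/root/OpenPCDet/output/result.pkl ./result.pkl
```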

The container is effectively a virtual machine with the environment already configured; you can run and test code inside it.

Generate a new image of the container

docker commit container_ID your_image_name   # e.g. your_image_name = my/image:v1
# or just an image name without a tag

Get new image id:

docker images

Package the new image and generate a tar file

docker save image_ID > XXX.tar
## docker save -o my_example.tar my_image_name:latest   (latest is the default image tag)

Import the above image on another host (or locally)

docker load -i my_example.tar

Then run docker images to see the imported image

Then run the container according to the above steps.

Some problems encountered:

No module named 'pcdet'

After compiling inside the container, the above error appears when running the code, so the project root must be added to PYTHONPATH inside the container:

export PYTHONPATH=$HOME/Cicv_task1_223/:$PYTHONPATH

After that it runs normally.
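The export only lasts for the current shell session; to make it persistent inside the container, one option (a sketch reusing the same project path as above) is to append it to ~/.bashrc:

```shell
# Append the PYTHONPATH export so every new shell in the container gets it.
echo 'export PYTHONPATH=$HOME/Cicv_task1_223/:$PYTHONPATH' >> ~/.bashrc

# Apply it to the current shell as well.
export PYTHONPATH=$HOME/Cicv_task1_223/:$PYTHONPATH
```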

libGL.so.1

Error message: libGL.so.1: cannot open shared object file: No such file or directory

Inside the container, run:

apt update
apt install libgl1-mesa-glx


Keywords: AI Deep Learning

Added by shai1 on Mon, 07 Mar 2022 11:49:39 +0200