ubuntu16.04 one step installation of cuda10 02, PyTorch, PaddlePaddle, TensorRT

Since many frameworks and suites have stopped supporting CUDA 10.0, CUDA 10.0 will be installed as originally 0's server is reconfigured cuda10 2 and new drive

Original drive unloading

1. Stop X Server

sudo service lightdm stop

2. Uninstall the previous Driver

sudo /usr/bin/nvidia-uninstall

After uninstallation is completed, enter NVIDIA SMI to prompt that the command cannot be found

nvidia-smi: command not found

CUDA installation

Click here Enter CUDA download page

Select the version you want

According to the installation prompt, enter the following command

wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run

Enter accept

┌──────────────────────────────────────────────────────────────────────────────┐
│  End User License Agreement                                                  │
│  --------------------------                                                  │
│                                                                              │
│                                                                              │
│  Preface                                                                     │
│  -------                                                                     │
│                                                                              │
│  The Software License Agreement in Chapter 1 and the Supplement              │
│  in Chapter 2 contain license terms and conditions that govern               │
│  the use of NVIDIA software. By accepting this agreement, you                │
│  agree to comply with all the terms and conditions applicable                │
│  to the product(s) included herein.                                          │
│                                                                              │
│                                                                              │
│  NVIDIA Driver                                                               │
│                                                                              │
│                                                                              │
│  Description                                                                 │
│                                                                              │
│  This package contains the operating system driver and                       │
│──────────────────────────────────────────────────────────────────────────────│
│ Do you accept the above EULA? (accept/decline/quit):                         │
│ accept                                                                       │
└──────────────────────────────────────────────────────────────────────────────┘

We install the driver and cuda separately, and uncheck the driver through Enter during installation, as follows:

After installation, modify the environment variables and activate them as follows

vim ~/.bashrc

export PATH="/usr/local/cuda-10.2/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH"
export CUDA_HOME=usr/local/cuda-10.2$CUDA_HOME

source ~/.bashrc

Enter nvcc -V verification

nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

10.2 indicates successful installation!!!

CUDNN configuration

Download cudnn for CUDA version. I download version 8.2 here to match with TensorRT later. Click here to enter Official download link , select the Library package

Right click Copy link

After downloading, use the mv command to rename (remove the fields after tgz)

mv cudnn-10.2-linux-x64-v8.2.4.15.tgz\?QCogpoDS_Z1q53hHFgWmiQWSZqnkTJQ7rwBVj-b9kZc0nkobdCw11ri1t9znzIPSTKUiijoxTwJBGOCxYUj_3QHjlTERLhG8lOFu63haEzcKbGjnUAQDxG6SiJJvoySYHG-8Su7EpOGKoo5iJySeHSnvdJrXp2BhnIKmYKqokryTSw9WVWL5lkZ9AeoSuRr4nPltqp_jV  cudnn-10.2-linux-x64-v8.2.4.15.tgz

In this way, you should have the following documents

Unzip. The cuda folder appears after unzip

tar zxvf cudnn-10.2-linux-x64-v8.2.4.15.tgz

Copy the corresponding files to the CUDA directory of the previous installation

sudo cp cuda/include/* /usr/local/cuda-10.2/include
sudo cp cuda/lib64/* /usr/local/cuda-10.2/lib64
sudo chmod a+r /usr/local/cuda-10.2/include/cudnn.h 
sudo chmod a+r /usr/local/cuda-10.2/lib64/libcudnn*

In addition, the remaining three packages need to be installed

sudo dpkg -i libcudnn8_8.2.4.15-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn8-dev_8.2.4.15-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn8-samples_8.2.4.15-1+cuda10.2_amd64.deb

Use the following command to view the versions. If there are many other versions, use sudo apt get remove libcudnn7, sudo apt get remove libcudnn7 dev, sudo apt get remove libcudnn7 samples to delete them in turn

dpkg -l | grep cudnn

ii  libcudnn8                                                    8.2.4.15-1+cuda10.2                             amd64        cuDNN runtime libraries
ii  libcudnn8-dev                                                8.2.4.15-1+cuda10.2                             amd64        cuDNN development libraries and headers
ii  libcudnn8-samples                                            8.2.4.15-1+cuda10.2                             amd64        cuDNN documents and samples

Drive installation

Enter the official website , select your own configuration, start searching and download the latest version

Query whether X-server exists

ps aux | grep X

If the display is as follows, it indicates that there is an X-server

root     18714  0.0  0.0 415608 25916 tty7     Ssl+ Dec28   0:04 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
ubuntu   28400  0.0  0.0  12940  1012 pts/16   S+   03:57   0:00 grep --color=auto X

There are many ways to turn off X. just enter all of them:

sudo service lightdm stop
sudo service gdm stop
sudo service kdm stop

Query again

ps aux | grep X

appear

ubuntu    5217  0.0  0.0  12940   928 pts/16   S+   04:05   0:00 grep --color=auto X

Closed successfully

install

sudo sh NVIDIA-Linux-x86_64-495.46.run

Follow the prompt and click by default
After installation, enter NVIDIA SMI and the following screen will appear

Fri Dec 31 04:09:03 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 19%   30C    P0    58W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 21%   37C    P0    58W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:82:00.0 Off |                  N/A |
| 17%   32C    P0    57W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:83:00.0 Off |                  N/A |
| 18%   34C    P0    56W / 250W |      0MiB / 11178MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

PyTorch installation

Select the version suitable for cuda-10.2

conda create -n pytorch python=3.7
conda activate pytorch
pip install torch torchvision torchaudio

Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>

PaddlePaddle installation

**Since nccl has been installed in the previous environment, I directly support multiple cards. For more installation, see paddlepaddle official website

conda create -n paddle python=3.7
conda activate paddle
pip install paddlepaddle-gpu==2.2.1 -i https://mirror.baidu.com/pypi/simple

Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W1231 04:38:06.920902 20825 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.5, Runtime API Version: 10.2
W1231 04:38:06.922672 20825 device_context.cc:465] device: 0, cuDNN Version: 8.3.
PaddlePaddle works well on 1 GPU.
W1231 04:38:08.923141 20825 parallel_executor.cc:617] Cannot enable P2P access from 0 to 2
W1231 04:38:08.923198 20825 parallel_executor.cc:617] Cannot enable P2P access from 0 to 3
W1231 04:38:09.725281 20825 parallel_executor.cc:617] Cannot enable P2P access from 1 to 2
W1231 04:38:09.725327 20825 parallel_executor.cc:617] Cannot enable P2P access from 1 to 3
W1231 04:38:09.725330 20825 parallel_executor.cc:617] Cannot enable P2P access from 2 to 0
W1231 04:38:09.725334 20825 parallel_executor.cc:617] Cannot enable P2P access from 2 to 1
W1231 04:38:10.782135 20825 parallel_executor.cc:617] Cannot enable P2P access from 3 to 0
W1231 04:38:10.782181 20825 parallel_executor.cc:617] Cannot enable P2P access from 3 to 1
W1231 04:38:14.150104 20825 fuse_all_reduce_op_pass.cc:76] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
PaddlePaddle works well on 4 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
>>>

TensorRT installation

Click here Go to the official website to download the appropriate version. Try to choose the version of TensorRT 8 and install it with tar package, as shown below

according to Official installation routine , edit variables according to your own system and version, and output the following commands in turn

version="8.2.2.1"
arch="x86_64"
cuda="cuda-10.2"
cudnn="cudnn8.2"
tar xzvf TensorRT-${version}.Linux.${arch}-gnu.${cuda}.${cudnn}.tar.gz

View extracted files

ls TensorRT-${version}

bin  data  doc  graphsurgeon  include  lib  onnx_graphsurgeon  python  samples  targets  TensorRT-Release-Notes.pdf  uff

Copy to / usr/local/cuda-10.2/lib64. Of course, you can also add an environment variable export LD for this directory_ LIBRARY_ PATH=$LD_ LIBRARY_ PATH:<TensorRT-${version}/lib>

sudo cp -r TensorRT-${version}/lib/* /usr/local/cuda-10.2/lib64

Install TensorRT

cd TensorRT-${version}/python
pip install tensorrt-*-cp37-none-linux_x86_64.whl

Install the uff using TensorRT and TensorFlow

cd TensorRT-${version}/uff
pip install uff-0.6.9-py2.py3-none-any.whl

Verify the installation using the following command

which convert-to-uff

/home/ubuntu/anaconda3/envs/pytorch/bin/convert-to-uff

Installing graphsurgeon

cd TensorRT-${version}/graphsurgeon
pip install graphsurgeon-0.4.5-py2.py3-none-any.whl

Install onnx graphsurgeon

cd TensorRT-${version}/onnx_graphsurgeon
pip install onnx_graphsurgeon-0.3.12-py2.py3-none-any.whl

verification

cd TensorRT-${version}/samples/sampleMNIST
make
cd ../../bin
./sample_mnist

Finally, the following results show that the installation is successful

&&&& RUNNING TensorRT.sample_mnist [TensorRT v8202] # ./sample_mnist
[12/31/2021-07:28:34] [I] Building and running a GPU inference engine for MNIST
[12/31/2021-07:28:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +160, GPU +0, now: CPU 162, GPU 215 (MiB)
[12/31/2021-07:28:36] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 162 MiB, GPU 215 MiB
[12/31/2021-07:28:36] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 183 MiB, GPU 215 MiB
[12/31/2021-07:28:37] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2
[12/31/2021-07:28:37] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +115, GPU +46, now: CPU 300, GPU 261 (MiB)
[12/31/2021-07:28:37] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +119, GPU +62, now: CPU 419, GPU 323 (MiB)
[12/31/2021-07:28:37] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[12/31/2021-07:28:42] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[12/31/2021-07:28:42] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/31/2021-07:28:42] [I] [TRT] Total Host Persistent Memory: 5440
[12/31/2021-07:28:42] [I] [TRT] Total Device Persistent Memory: 0
[12/31/2021-07:28:42] [I] [TRT] Total Scratch Memory: 0
[12/31/2021-07:28:42] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 16 MiB
[12/31/2021-07:28:42] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.084057ms to assign 3 blocks to 11 nodes requiring 57857 bytes.
[12/31/2021-07:28:42] [I] [TRT] Total Activation Memory: 57857
[12/31/2021-07:28:42] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 591, GPU 397 (MiB)
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 591, GPU 407 (MiB)
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 592, GPU 369 (MiB)
[12/31/2021-07:28:42] [I] [TRT] Loaded engine size: 1 MiB
[12/31/2021-07:28:42] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 592, GPU 379 (MiB)
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 592, GPU 387 (MiB)
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1, now: CPU 0, GPU 1 (MiB)
[12/31/2021-07:28:42] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 568, GPU 379 (MiB)
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 568, GPU 387 (MiB)
[12/31/2021-07:28:42] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1 (MiB)
[12/31/2021-07:28:42] [I] Input:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@.*@@@@@@@@@@
@@@@@@@@@@@@@@@@.=@@@@@@@@@@
@@@@@@@@@@@@+@@@.=@@@@@@@@@@
@@@@@@@@@@@% #@@.=@@@@@@@@@@
@@@@@@@@@@@% #@@.=@@@@@@@@@@
@@@@@@@@@@@+ *@@:-@@@@@@@@@@
@@@@@@@@@@@= *@@= @@@@@@@@@@
@@@@@@@@@@@. #@@= @@@@@@@@@@
@@@@@@@@@@=  =++.-@@@@@@@@@@
@@@@@@@@@@       =@@@@@@@@@@
@@@@@@@@@@  :*## =@@@@@@@@@@
@@@@@@@@@@:*@@@% =@@@@@@@@@@
@@@@@@@@@@@@@@@% =@@@@@@@@@@
@@@@@@@@@@@@@@@# =@@@@@@@@@@
@@@@@@@@@@@@@@@# =@@@@@@@@@@
@@@@@@@@@@@@@@@* *@@@@@@@@@@
@@@@@@@@@@@@@@@= #@@@@@@@@@@
@@@@@@@@@@@@@@@= #@@@@@@@@@@
@@@@@@@@@@@@@@@=.@@@@@@@@@@@
@@@@@@@@@@@@@@@++@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

[12/31/2021-07:28:42] [I] Output:
0:
1:
2:
3:
4: **********
5:
6:
7:
8:
9:

&&&& PASSED TensorRT.sample_mnist [TensorRT v8202] # ./sample_mnist

remarks

Note: if nvidia download cannot be linked using wget, modify the following configuration

sudo chmod 777 /etc/resolv.conf
vim /etc/resolv.conf

nameserver 8.8.8.8 #google domain name server
nameserver 8.8.4.4 #google domain name server

Remember to comment it out after downloading to avoid sudo: unable to resolve host XXXX when using sudo

Keywords: Pytorch Deep Learning paddlepaddle

Added by ricardo.leite on Fri, 31 Dec 2021 21:07:17 +0200

Programming VIP