Technical background
As model computation grows and hardware advances, running workloads on GPUs has become the mainstream way of implementing algorithms. To understand how a program occupies the GPU at runtime, for example how much video memory each step consumes, we need tools that expose detailed GPU information. Here we mainly recommend py3nvml for monitoring the GPU usage of running Python code.
General information reading
nvidia-smi is commonly used to read GPU utilization, video memory usage, driver version and other information:
$ nvidia-smi
Wed Jan 12 15:52:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0  On |                  N/A |
| 30%   39C    P8    20W / 125W |    538MiB /  7979MiB |     16%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     On   | 00000000:A6:00.0 Off |                  N/A |
| 30%   32C    P8     7W / 125W |      6MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                412MiB |
|    0   N/A  N/A      2940      G   /usr/bin/gnome-shell               76MiB |
|    0   N/A  N/A     47102      G   ...AAAAAAAAA= --shared-files       35MiB |
|    0   N/A  N/A    172424      G   ...AAAAAAAAA= --shared-files       11MiB |
|    1   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
However, relying only on the output of the nvidia-smi command, without a profiler, there is no way to analyze in detail how GPU usage changes while a program runs. As an aside, here is a neat little tool whose usage is very similar to nvidia-smi: gpustat. It can be installed and managed directly with pip:
$ python3 -m pip install gpustat
Collecting gpustat
  Downloading gpustat-0.6.0.tar.gz (78 kB)
     |████████████████████████████████| 78 kB 686 kB/s
Requirement already satisfied: six>=1.7 in /home/dechin/.local/lib/python3.8/site-packages (from gpustat) (1.16.0)
Collecting nvidia-ml-py3>=7.352.0
  Downloading nvidia-ml-py3-7.352.0.tar.gz (19 kB)
Requirement already satisfied: psutil in /home/dechin/.local/lib/python3.8/site-packages (from gpustat) (5.8.0)
Collecting blessings>=1.6
  Downloading blessings-1.7-py3-none-any.whl (18 kB)
Building wheels for collected packages: gpustat, nvidia-ml-py3
  Building wheel for gpustat (setup.py) ... done
  Created wheel for gpustat: filename=gpustat-0.6.0-py3-none-any.whl size=12617 sha256=4158e741b609c7a1bc6db07d76224db51cd7656a6f2e146e0b81185ce4e960ba
  Stored in directory: /home/dechin/.cache/pip/wheels/0d/d9/80/b6cbcdc9946c7b50ce35441cc9e7d8c5a9d066469ba99bae44
  Building wheel for nvidia-ml-py3 (setup.py) ... done
  Created wheel for nvidia-ml-py3: filename=nvidia_ml_py3-7.352.0-py3-none-any.whl size=19191 sha256=70cd8ffc92286944ad9f5dc4053709af76fc0e79928dc61b98a9819a719f1e31
  Stored in directory: /home/dechin/.cache/pip/wheels/b9/b1/68/cb4feab29709d4155310d29a421389665dcab9eb3b679b527b
Successfully built gpustat nvidia-ml-py3
Installing collected packages: nvidia-ml-py3, blessings, gpustat
Successfully installed blessings-1.7 gpustat-0.6.0 nvidia-ml-py3-7.352.0
Its usage is also very similar to nvidia-smi:
$ watch --color -n1 gpustat -cpu
The returned results are as follows:
Every 1.0s: gpustat -cpu                          ubuntu2004: Wed Jan 12 15:58:59 2022

ubuntu2004            Wed Jan 12 15:58:59 2022  470.42.01
[0] Quadro RTX 4000 | 39'C,   3 % |   537 / 7979 MB | root:Xorg/1643(412M) dechin:gnome-shell/2940(75M) dechin:slack/47102(35M) dechin:chrome/172424(11M)
[1] Quadro RTX 4000 | 32'C,   0 % |     6 / 7982 MB | root:Xorg/1643(4M)
The results returned by gpustat contain general information such as the GPU model, utilization rate, video memory usage and current GPU temperature.
Installation and use of py3nvml
Next, let's take a proper look at the installation and use of py3nvml, a library for viewing and monitoring GPU information in real time from Python. It can be installed and managed through pip:
$ python3 -m pip install py3nvml
Collecting py3nvml
  Downloading py3nvml-0.2.7-py3-none-any.whl (55 kB)
     |████████████████████████████████| 55 kB 650 kB/s
Requirement already satisfied: xmltodict in /home/dechin/anaconda3/lib/python3.8/site-packages (from py3nvml) (0.12.0)
Installing collected packages: py3nvml
Successfully installed py3nvml-0.2.7
Binding GPU cards with py3nvml
To maximize performance, some frameworks grab all GPU cards in the resource pool by default when they initialize. The following case demonstrates this with Jax:
In [1]: import py3nvml

In [2]: from jax import numpy as jnp

In [3]: x = jnp.ones(1000000000)

In [4]: !nvidia-smi
Wed Jan 12 16:08:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0  On |                  N/A |
| 30%   41C    P0    38W / 125W |   7245MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     On   | 00000000:A6:00.0 Off |                  N/A |
| 30%   35C    P0    35W / 125W |    101MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                412MiB |
|    0   N/A  N/A      2940      G   /usr/bin/gnome-shell               75MiB |
|    0   N/A  N/A     47102      G   ...AAAAAAAAA= --shared-files       35MiB |
|    0   N/A  N/A    172424      G   ...AAAAAAAAA= --shared-files       11MiB |
|    0   N/A  N/A    812125      C   /usr/local/bin/python            6705MiB |
|    1   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A    812125      C   /usr/local/bin/python              93MiB |
+-----------------------------------------------------------------------------+
In this case we only allocated video memory for a single vector, yet Jax occupied both local GPU cards as soon as it initialized. Following the approach officially documented by Jax, we can set an environment variable so that Jax sees only one card and does not spread across devices:
In [1]: import os

In [2]: os.environ["CUDA_VISIBLE_DEVICES"] = "1"

In [3]: from jax import numpy as jnp

In [4]: x = jnp.ones(1000000000)

In [5]: !nvidia-smi
Wed Jan 12 16:10:36 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0  On |                  N/A |
| 30%   40C    P8    19W / 125W |    537MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     On   | 00000000:A6:00.0 Off |                  N/A |
| 30%   35C    P0    35W / 125W |   7195MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                412MiB |
|    0   N/A  N/A      2940      G   /usr/bin/gnome-shell               75MiB |
|    0   N/A  N/A     47102      G   ...AAAAAAAAA= --shared-files       35MiB |
|    0   N/A  N/A    172424      G   ...AAAAAAAAA= --shared-files       11MiB |
|    1   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A    813030      C   /usr/local/bin/python            7187MiB |
+-----------------------------------------------------------------------------+
We can see that only one GPU card is used in the result, which achieves our goal. However, setting environment variables by hand is not always convenient, so py3nvml also provides a function that lets you specify which GPU cards a task may use:
In [1]: import py3nvml

In [2]: from jax import numpy as jnp

In [3]: py3nvml.grab_gpus(num_gpus=1, gpu_select=[1])
Out[3]: 1

In [4]: x = jnp.ones(1000000000)

In [5]: !nvidia-smi
Wed Jan 12 16:12:37 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0  On |                  N/A |
| 30%   40C    P8    20W / 125W |    537MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     On   | 00000000:A6:00.0 Off |                  N/A |
| 30%   36C    P0    35W / 125W |   7195MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                412MiB |
|    0   N/A  N/A      2940      G   /usr/bin/gnome-shell               75MiB |
|    0   N/A  N/A     47102      G   ...AAAAAAAAA= --shared-files       35MiB |
|    0   N/A  N/A    172424      G   ...AAAAAAAAA= --shared-files       11MiB |
|    1   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A    814673      C   /usr/local/bin/python            7187MiB |
+-----------------------------------------------------------------------------+
It can be seen that only one GPU card is used, achieving the same effect as the environment-variable approach in the previous step.
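As far as I can tell, grab_gpus achieves this by checking which cards are free and then masking the others for the current process, so it is essentially an automated version of the environment-variable approach. The following minimal sketch rests on that assumption (it is not something the library guarantees here) and simply inspects what the call did:

import os
import py3nvml

# Reserve one card, restricted to GPU 1, exactly as in the session above.
num_grabbed = py3nvml.grab_gpus(num_gpus=1, gpu_select=[1])
print("GPUs grabbed:", num_grabbed)

# Assumption: grab_gpus masks the other cards via CUDA_VISIBLE_DEVICES,
# so on the machine used in this article this is expected to print "1".
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))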
View idle GPUs
py3nvml can also report which GPUs in the environment are available. Its criterion is that a GPU with no processes running on it counts as an available card:
In [1]: import py3nvml

In [2]: free_gpus = py3nvml.get_free_gpus()

In [3]: free_gpus
Out[3]: [True, True]
Note that system applications (such as the Xorg processes shown earlier) are not counted here; py3nvml apparently ignores these daemon-style processes when deciding whether a GPU is free.
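Building on get_free_gpus(), one possible pattern is to pick a free card yourself before importing a framework such as Jax. The helper below is only an illustration (bind_first_free_gpu is a name invented here, not part of py3nvml); it combines the call above with the CUDA_VISIBLE_DEVICES approach shown earlier:

import os
import py3nvml

def bind_first_free_gpu():
    # get_free_gpus() returns one boolean per card, e.g. [True, True]
    free_gpus = py3nvml.get_free_gpus()
    for idx, is_free in enumerate(free_gpus):
        if is_free:
            # Restrict this process to the first free card before any
            # GPU framework is imported and initialized.
            os.environ["CUDA_VISIBLE_DEVICES"] = str(idx)
            return idx
    raise RuntimeError("No free GPU available")

gpu_id = bind_first_free_gpu()
print("Bound to GPU", gpu_id)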
Command line information acquisition
Much like nvidia-smi, py3nvml can also be used from the command line through py3smi. It is worth mentioning that to monitor GPU usage in real time with nvidia-smi you usually have to combine it with watch -n; with py3smi this is unnecessary, since py3smi -l provides the same kind of refreshing output directly.
$ py3smi -l 5
Wed Jan 12 16:17:37 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI                                        Driver Version: 470.42.01 |
+---------------------------------+---------------------+---------------------+
| GPU Fan  Temp Perf Pwr:Usage/Cap|     Memory-Usage    | GPU-Util Compute M. |
+=================================+=====================+=====================+
|   0 30%   39C    8   19W / 125W |   537MiB /  7979MiB |       0%    Default |
|   1 30%   33C    8    7W / 125W |     6MiB /  7982MiB |       0%    Default |
+---------------------------------+---------------------+---------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU  Owner      PID      Uptime  Process Name                        Usage |
+=============================================================================+
+-----------------------------------------------------------------------------+
The slight difference is that fewer processes are listed than in nvidia-smi; the system processes appear to be filtered out automatically.
View the driver version and graphics card model separately
py3nvml exposes separate functions for querying the driver version and the device model:
In [1]: from py3nvml.py3nvml import *

In [2]: nvmlInit()
Out[2]: <CDLL 'libnvidia-ml.so.1', handle 560ad4d07a60 at 0x7fd13aa52340>

In [3]: print("Driver Version: {}".format(nvmlSystemGetDriverVersion()))
Driver Version: 470.42.01

In [4]: deviceCount = nvmlDeviceGetCount()
   ...: for i in range(deviceCount):
   ...:     handle = nvmlDeviceGetHandleByIndex(i)
   ...:     print("Device {}: {}".format(i, nvmlDeviceGetName(handle)))
   ...:
Device 0: Quadro RTX 4000
Device 1: Quadro RTX 4000

In [5]: nvmlShutdown()
This way we do not have to filter the information out of a larger report ourselves, which is more flexible and easier to build on.
View video memory information separately
Similarly, video memory usage can be queried on its own, so users do not need to extract it from other output; the information is quite detailed:
In [1]: from py3nvml.py3nvml import *

In [2]: nvmlInit()
Out[2]: <CDLL 'libnvidia-ml.so.1', handle 55ae42aadd90 at 0x7f39c700e040>

In [3]: handle = nvmlDeviceGetHandleByIndex(0)

In [4]: info = nvmlDeviceGetMemoryInfo(handle)

In [5]: print("Total memory: {}MiB".format(info.total >> 20))
Total memory: 7979MiB

In [6]: print("Free memory: {}MiB".format(info.free >> 20))
Free memory: 7441MiB

In [7]: print("Used memory: {}MiB".format(info.used >> 20))
Used memory: 537MiB
If you insert these calls into your program, you can track how the occupied video memory changes at each step.
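As an illustration, such calls could be wrapped in a small logging helper and dropped between the steps of a program. The sketch below uses only the nvml calls shown above; the function name log_gpu_memory is just an invented example:

from py3nvml.py3nvml import (nvmlInit, nvmlShutdown,
                             nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo)

def log_gpu_memory(step_name, gpu_index=0):
    # Query the current memory usage of one GPU and print it with a label.
    handle = nvmlDeviceGetHandleByIndex(gpu_index)
    info = nvmlDeviceGetMemoryInfo(handle)
    print("[{}] used {}MiB / total {}MiB".format(
        step_name, info.used >> 20, info.total >> 20))

nvmlInit()
log_gpu_memory("before step 1")
# ... run one step of your program here ...
log_gpu_memory("after step 1")
nvmlShutdown()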
Summary
Monitoring GPU information is a very common need in deep learning and other kinds of GPU computing. System-level monitoring tools alone cannot track in detail how video memory and utilization change at each step of a program, while a full profiler is often too heavy, and its setup, output and filtering are not particularly convenient. In such cases, a tool like py3nvml lets you analyze the GPU usage of a task in detail as it runs, which helps improve both GPU utilization and program performance.
Copyright notice
The original link of this article: https://www.cnblogs.com/dechinphy/p/py3nvml.html
Author ID: DechinPhy
For more original articles, please visit: https://www.cnblogs.com/dechinphy/
Link for tips and rewards: https://www.cnblogs.com/dechinphy/gallery/image/379634.html
Tencent Cloud column (synced): https://cloud.tencent.com/developer/column/91958