Engineering a model: TensorRT-accelerated inference under Linux, from environment installation to training and deployment

1. Environment and Version Description

● Ubuntu 18.04
● CUDA 10.0
● cuDNN 7.6.5
● gcc 7.3
● Python 3.6
● TensorFlow 1.14
● TensorRT 6.0.1

2. Environment installation

2.1 CUDA installation

Install the NVIDIA driver
ubuntu-drivers devices  # View the drivers supported by the device


My machine supports driver version 410, so install that driver first.

# 1. Update apt get source list
sudo apt-get update
sudo apt-get upgrade
# 2. Add driver source
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-410  # install
sudo nvidia-smi   # View the installation status of the driver

Choose the CUDA version that matches your driver; here we use CUDA 10.0. The driver-to-CUDA compatibility table is listed on the CUDA official website.

Switch gcc

For CUDA 10.0, also check the matching gcc version; the supported gcc versions are listed on the CUDA official website.

My gcc is already 7.3, so there is nothing to install or replace. If you do need to install or switch versions, use the following commands:

# Install gcc5
sudo apt-get install gcc-5 gcc-5-multilib 	
sudo apt-get install g++-5 g++-5-multilib
# set priority
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50	
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
# See the optional version of gcc and the current version
sudo update-alternatives --config gcc
# View gcc version
gcc --version
Install CUDA

Download the CUDA 10.0 runfile installer from the official website.

After downloading, install by running sudo sh *.run on the downloaded file.
Press Enter until you have scrolled through the license statement.
If the driver has already been installed separately, be sure to choose NOT to install the driver bundled with the CUDA installer.
Some warnings may appear during installation; they can be ignored. The following configuration is required after installation.

# Add environment variable
sudo gedit ~/.bashrc

# Add to last
export PATH=$PATH:/usr/local/cuda/bin  
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64  
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64  

# Save exit
source ~/.bashrc
 
# Point /usr/local/cuda to CUDA 10.0 and verify the toolkit
sudo rm -rf /usr/local/cuda  
sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda  
nvcc --version  

cd /usr/local/cuda/samples/1_Utilities/deviceQuery 
sudo make
./deviceQuery

# restart
reboot

# After rebooting, check the driver again
nvidia-smi  # View the latest installation status

Install cudnn

Select and download the cuDNN version matching CUDA 10.0 from the official website.

Delete old version, skip if none

sudo rm -rf /usr/local/cuda/include/cudnn.h
sudo rm -rf /usr/local/cuda/lib64/libcudnn*

Install the new version: cd into the cuda folder you just extracted and copy the files:

sudo cp include/cudnn.h /usr/local/cuda/include/	
sudo cp lib64/lib* /usr/local/cuda/lib64/

Create the soft links

cd /usr/local/cuda/lib64/	
sudo chmod +r libcudnn.so.7.6.5	
sudo ln -sf libcudnn.so.7.6.5 libcudnn.so.7	
sudo ln -sf libcudnn.so.7 libcudnn.so   	
sudo ldconfig

Test verification

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

If the cuDNN version numbers (CUDNN_MAJOR and following lines) are printed, the installation succeeded.

2.2 installation of TensorRT

Download TensorRT from the official NVIDIA website.

Select the tar package of version 6.0 for download, and extract it as follows:

# Create a new folder under home, name it tensorrt, and then copy and unzip the downloaded compressed file
tar xzvf TensorRT-6.0.1.5.Ubuntu-18.04.x86_64-gnu.cuda-10.0.cudnn7.6.tar.gz 
 
# Unzip the TensorRT-6.0.1.5 folder and add the lib absolute path inside to the environment variable
gedit ~/.bashrc

# add to
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/jason/tensorrt/TensorRT-6.0.1.5/lib

# take effect
source ~/.bashrc
	
#Install TensorRT
cd TensorRT-6.0.1.5/python/
pip install tensorrt-6.0.1.5-cp36-none-linux_x86_64.whl
 
#Install UFF, which is needed for converting TensorFlow models; you can skip it if you do not use TensorFlow
cd TensorRT-6.0.1.5/uff
pip install uff-0.6.5-py2.py3-none-any.whl 
 
#Installing graphsurgeon
cd TensorRT-6.0.1.5/graphsurgeon
pip install graphsurgeon-0.4.1-py2.py3-none-any.whl

# Enter the python environment and test. If no error is reported, the installation is successful
import tensorrt
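
A slightly fuller check in the same Python environment is to import all three packages and print the TensorRT version (a small sketch; the exact version string depends on the wheel you installed):

import tensorrt as trt
import uff
import graphsurgeon
print(trt.__version__)  # expect something like 6.0.1.5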

2.3 TensorFlow and other Python package installation

sudo pip3 install tensorflow-gpu==1.14.0
sudo pip3 install numpy==1.19.5
sudo pip3 install h5py==3.1.0
sudo pip3 install onnx==1.4.1
sudo pip3 install Pillow==6.1.0
sudo pip3 install pycuda==2019.1.1
sudo pip3 install uff==0.6.5
sudo pip3 install wget==3.2
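
After installing tensorflow-gpu, it is worth a quick check that TensorFlow actually sees the GPU. A minimal sketch for TF 1.14, run inside Python:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)                # 1.14.0
print(tf.test.is_gpu_available())    # True if CUDA/cuDNN are set up correctly
print([d.name for d in device_lib.list_local_devices()])  # should include /device:GPU:0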

3. Model training (taking Inception V3 as an example): omitted

Omitted; see the references at the end of this post.

4. Inference acceleration with TensorRT

4.1 model saving

In TensorFlow, saving and restoring a model should be familiar: the key calls are saver = tf.train.Saver() and saver.save().

What you may be less aware of is that TensorFlow saves the model's structure and weight data separately through the checkpoint file format, which is inconvenient in some usage scenarios.

Therefore, we need a way to combine the model structure and weight data in one file. TensorFlow provides the freeze_graph function and the pb file format to solve this problem.

After saving, the model is stored as ckpt files. The checkpoint file keeps a list of all model files in the directory, and the events file is for the TensorBoard visualizer.

Directly related to the saved model are the following three files:

● The .data file holds the current parameter values
● The .index file holds the index of the parameter names
● The .meta file holds the current graph structure

When you load the model with saver.restore(), this group of checkpoint files is used together. When we need to merge the model structure and weights into a single file, however, the following steps are required.
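
For reference, here is a minimal TF 1.x saving sketch with a toy graph, just to show which files end up on disk (the variable names are hypothetical, not part of the Inception model):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 224, 224, 3], name='input')
w = tf.Variable(tf.truncated_normal([3]), name='w')

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, './input_model/model.ckpt', global_step=10000)
# Produces: checkpoint, model.ckpt-10000.data-00000-of-00001,
#           model.ckpt-10000.index, model.ckpt-10000.meta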

4.2 Generate the pb file

TensorFlow provides the freeze_graph function to generate pb files. The code below converts a checkpoint file into a pb file. The steps are:

● Load your model structure
● Provide the checkpoint file path
● Use tf.train.write_graph to save the graph, which is then fed to freeze_graph
● Use freeze_graph to generate the pb file

The specific implementation code is as follows

# -*-coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import cv2
from tensorflow.python.framework import graph_util

resize_height = 224  
resize_width = 224 
depths = 3


def read_image(filename, resize_height, resize_width, normalization=False):
    bgr_image = cv2.imread(filename)
    if len(bgr_image.shape) == 2:
        print("Warning:gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)

    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    if resize_height > 0 and resize_width > 0:
        rgb_image = cv2.resize(rgb_image, (resize_width, resize_height))
    rgb_image = np.asanyarray(rgb_image)
    if normalization:
        rgb_image = rgb_image / 255.0
    return rgb_image


def freeze_graph_test(pb_path, image_path):
    with tf.Graph().as_default():
        output_graph_def = tf.GraphDef()
        with open(pb_path, "rb") as f:
            output_graph_def.ParseFromString(f.read())
            tf.import_graph_def(output_graph_def, name="")
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            # The tensor names below must match the names used when the model graph was built
            input_image_tensor = sess.graph.get_tensor_by_name("input:0")
            input_keep_prob_tensor = sess.graph.get_tensor_by_name("keep_prob:0")
            input_is_training_tensor = sess.graph.get_tensor_by_name("is_training:0")

            output_tensor_name = sess.graph.get_tensor_by_name("InceptionV3/Logits/SpatialSqueeze:0")

            im = read_image(image_path, resize_height, resize_width, normalization=True)
            im = im[np.newaxis, :]
            out = sess.run(output_tensor_name, feed_dict={input_image_tensor: im,
                                                          input_keep_prob_tensor: 1.0,
                                                          input_is_training_tensor: False})
            print("out:{}".format(out))
            score = tf.nn.softmax(out, name='pre')
            class_id = tf.argmax(score, 1)
            print("pre class_id:{}".format(sess.run(class_id)))


def freeze_graph(input_checkpoint, output_graph):
    # Name of the output node; for multiple output nodes, separate them with commas
    output_node_names = "InceptionV3/Logits/SpatialSqueeze"
    # Load the graph structure from the .meta file
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=True)

    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)  # restore the trained weights
        # Freeze: turn variables into constants so graph and weights live in a single GraphDef
        output_graph_def = graph_util.convert_variables_to_constants(
            sess=sess,
            input_graph_def=sess.graph_def,
            output_node_names=output_node_names.split(","))

        with tf.gfile.GFile(output_graph, "wb") as f:  
            f.write(output_graph_def.SerializeToString())  
        print("%d ops in the final graph." % len(output_graph_def.node))  

if __name__ == '__main__':
    # transformation
    input_checkpoint = './input_model/model.ckpt-10000'
    out_pb_path = "./output_model/frozen_model.pb"
    freeze_graph(input_checkpoint, out_pb_path)
    # test
    image_path = './test_img/guitar.jpg'
    freeze_graph_test(pb_path=out_pb_path, image_path=image_path)
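
If you are not sure of the exact input and output node names needed for the uff conversion in the next section, a quick way to list them from the frozen pb is the following sketch (TF 1.x GraphDef API):

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('./output_model/frozen_model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
for node in graph_def.node:
    print(node.name, node.op)  # look for the input placeholder and the final logits node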

4.3 conversion of pb to uff

Preparation
import tensorflow as tf
import uff
import tensorrt as trt
import pycuda.driver as cuda 
import pycuda.autoinit
from tensorrt.parsers import uffparser
● uff: converts the pb model into the uff format supported by the TensorRT engine; the result can be serialized to a file or passed along directly in memory.
● pycuda: used for CUDA programming on the GPU; it is required if you want to use the TensorRT Python API.
● uffparser: used to parse the uff model.
Parameter setting
# Data customization
MODEL_DIR = './output_model/frozen_model.pb'
CHANNEL = 3
HEIGHT = 224
WIDTH = 224
ENGINE_PATH = './model_seg/model_.pb.plan'
INPUT_NODE = 'input'
OUTPUT_NODE = ['InceptionV3/Logits/SpatialSqueeze']
INPUT_SIZE = [CHANNEL, HEIGHT ,WIDTH] 
MAX_BATCH_SIZE = 1 
MAX_WORKSPACE = 1<<30
● MODEL_DIR: path of the pb model generated in the previous step
● CHANNEL, HEIGHT, WIDTH: channels, height, and width of the input image, determined by the model's input size
● ENGINE_PATH: path where the TensorRT engine will be saved later
● INPUT_NODE: input node of the model
● OUTPUT_NODE: output node(s) of the model, given as a list; if there are several output nodes, list all their names here
● INPUT_SIZE: size of the input image; pay attention to whether the channel comes first or last. Here the order is CHANNEL, HEIGHT, WIDTH
● MAX_BATCH_SIZE: how many images are fed per inference call
● MAX_WORKSPACE: GPU memory workspace size; 1<<30 is 1 GB. If the program reports an out-of-memory error at runtime, reduce MAX_WORKSPACE, for example to 2<<10
Parse the model: pb to uff conversion
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)
uff_model = uff.from_tensorflow_frozen_model(MODEL_DIR, OUTPUT_NODE)
parser = uffparser.create_uff_parser()
parser.register_input(INPUT_NODE, INPUT_SIZE, 0)
parser.register_output(OUTPUT_NODE)

What we do here is convert the pb file into the uff file format. One concept to know: UFF (Universal Framework Format) is a data format that describes a DNN execution graph. The execution graph is bound to its inputs and outputs, so what parser.register_input and parser.register_output do is record the inputs and outputs of the TensorFlow model in the uff file.

Note that for multiple outputs, since OUTPUT_NODE is a list, you simply append each output node name to the list in turn.
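
For example, a hypothetical model with two output heads would be converted like this (the second node name is invented for illustration):

OUTPUT_NODE = ['InceptionV3/Logits/SpatialSqueeze', 'InceptionV3/AuxLogits/SpatialSqueeze']
uff_model = uff.from_tensorflow_frozen_model(MODEL_DIR, OUTPUT_NODE)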

If there are multiple inputs, you need to register each input node name one by one. register_input() takes three parameters:

name – the input name.
shape – the input shape.
order – the order in which this input appeared in the original framework model.
Suppose your model takes three images at the same time in the input layer: you then need to register three input nodes and specify their order as 0, 1 and 2 respectively. The order here refers to the order of the model inputs in the uff structure, and it is reflected again in the binding step (see the sketch after the list below).

● parser.register_input(INPUT_NODE1, INPUT_SIZE, 0)
● parser.register_input(INPUT_NODE2, INPUT_SIZE, 1)
● parser.register_input(INPUT_NODE3, INPUT_SIZE, 2)
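
A hypothetical sketch of what this implies for the binding step later: the device pointers are listed in the same order the inputs were registered, followed by the output(s).

# d_input1, d_input2, d_input3 and d_output are assumed to be cuda.mem_alloc() buffers
bindings = [int(d_input1), int(d_input2), int(d_input3), int(d_output)]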
Save model
engine = trt.utils.uff_to_trt_engine(
    G_LOGGER,
    uff_model,
    parser,
    MAX_BATCH_SIZE,
    MAX_WORKSPACE,
    datatype=trt.infer.DataType.FLOAT)
		
# The code above creates the TensorRT engine, which is responsible for the model's forward pass.
# TensorRT is an inference acceleration tool, so the forward computation is all it needs.
# Once the engine is created it can be used directly, but it is worth saving the result:
# although the code so far is short, successfully converting a pb file into a working engine is not trivial.
# The statement below saves a PLAN file: the serialized data the runtime engine uses to execute the network.
# It contains the weights, the steps the network performs, and the information needed to bind the input and output buffers.

trt.utils.write_engine_to_file('./model_.pb.plan', engine.serialize())

4.4 Inference steps

Now let's load the previously saved PLAN file, enable the engine, and start inference with TensorRT.

engine = trt.utils.load_engine(G_LOGGER, './model_.pb.plan')

The engine object is called engine, and the context in which the engine runs is called context. Both are necessary during inference, and they are related as follows:

context = engine.create_execution_context()
engine = context.get_engine() 

Before running the forward pass, one sanity check: get_nb_bindings() returns the number of input and output tensors bound to this engine. For the single-input, single-output model in this example the number is 2; if your model had, say, three inputs and outputs in total, you would confirm the count is 3. We need this count in order to allocate GPU memory for each binding later.

print(engine.get_nb_bindings())
assert(engine.get_nb_bindings() == 2)

Now prepare an input image, img.jpg, and convert it to fp32:

img = cv2.imread('img.jpg')
img = img.astype(np.float32)
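
One caveat: cv2.imread returns an HWC, BGR array, while the input we registered is CHANNEL, HEIGHT, WIDTH. A hedged preprocessing sketch that mirrors read_image from section 4.2 (resize, BGR to RGB, optional normalization, then HWC to CHW as a contiguous float32 buffer); whether you normalize must match how the model was trained:

img = cv2.imread('img.jpg')
img = cv2.resize(img, (WIDTH, HEIGHT))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img.astype(np.float32) / 255.0                        # only if the model expects normalized input
img = np.ascontiguousarray(img.transpose(2, 0, 1).ravel())  # HWC -> CHW, flattened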

At the same time, create an array to "catch" the output data. "Catch" because, as you will see, when the engine runs the forward pass it produces a data stream that gets written into this output array.

#create output array to receive data
OUTPUT_SIZE = 10  # must match the number of elements the model outputs (e.g. the number of classes)
output = np.zeros(OUTPUT_SIZE, dtype=np.float32)

We need to allocate video memory for input and output and bind it.

# Use PyCUDA to allocate GPU memory and register it with the engine.
# The size allocated is the input for the whole batch and the expected output size.
d_input = cuda.mem_alloc(1 * img.size * img.dtype.itemsize)
d_output = cuda.mem_alloc(1 * output.size * output.dtype.itemsize)

# The engine needs the GPU memory pointers as a list of ints; casting a PyCUDA allocation to int gives its device address.
bindings = [int(d_input), int(d_output)]

Now, we can start the reasoning calculation on TensorRT!

# Create a CUDA stream
stream = cuda.Stream()
# Copy the input to the GPU
cuda.memcpy_htod_async(d_input, img, stream)
# Run the forward pass
context.enqueue(1, bindings, stream.handle, None)
# Copy the result back to the host
cuda.memcpy_dtoh_async(output, d_output, stream)
# Synchronize the stream
stream.synchronize()

At this point, if you print output, you will see that the array now holds values: the result of the TensorRT computation.
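
For a classifier, the raw output can be turned into a prediction with plain NumPy. A small sketch, assuming output holds the logits of OUTPUT_SIZE classes:

probs = np.exp(output - np.max(output))  # numerically stable softmax
probs /= probs.sum()
print('predicted class:', int(np.argmax(probs)), 'confidence:', float(probs.max()))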

You can feed the same input to the TensorFlow model and compare the results: there will be small numerical differences, but in general TensorFlow and TensorRT produce consistent outputs.
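
A simple check is to run the same preprocessed image through the frozen pb in a TensorFlow session (as in freeze_graph_test above) and compare the two arrays; tf_out below stands for that hypothetical TensorFlow result:

print('max abs diff:', np.max(np.abs(tf_out.ravel() - output)))
print('consistent:', np.allclose(tf_out.ravel(), output, atol=1e-3))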

4.5 complete code

The paths used above are placeholders; the code below uses the real, working paths.

import time

import cv2
import numpy as np
import uff
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
from tensorrt.parsers import uffparser

MODEL_DIR = '/home/liyang/ly/inception2trt/tf_pb/output_model/frozen_model.pb'
CHANNEL = 3
HEIGHT = 224
WIDTH = 224
ENGINE_PATH = './model_.pb.plan'
INPUT_NODE = 'input'
OUTPUT_NODE = 'InceptionV3/Logits/SpatialSqueeze'
INPUT_SIZE = [CHANNEL, HEIGHT, WIDTH]
MAX_BATCH_SIZE = 1
MAX_WORKSPACE = 1 << 30

# pb -> uff
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)
uff_model = uff.from_tensorflow_frozen_model(MODEL_DIR, [OUTPUT_NODE])
parser = uffparser.create_uff_parser()
parser.register_input(INPUT_NODE, INPUT_SIZE, 0)
parser.register_output(OUTPUT_NODE)

# Build the engine
engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model,
                     parser, MAX_BATCH_SIZE,
                     MAX_WORKSPACE, datatype=trt.infer.DataType.FLOAT)
# Save the serialized engine (PLAN file)
trt.utils.write_engine_to_file(ENGINE_PATH, engine.serialize())




# Load the engine and run TensorRT inference
engine = trt.utils.load_engine(G_LOGGER, ENGINE_PATH)
context = engine.create_execution_context()

input_img = cv2.imread('./1.jpg')
input_img = cv2.resize(input_img, (WIDTH, HEIGHT))  # match the model's input resolution

def infer32(batch_size):
    # Sanity check: one input binding and one output binding
    engine = context.get_engine()
    assert(engine.get_nb_bindings() == 2)
    start = time.time()
    # Output dimensions, used to size the host output buffer
    dims = engine.get_binding_dimensions(1).to_DimsCHW()
    elt_count = dims.C() * dims.H() * dims.W() * batch_size
    # HWC -> CHW, float32, contiguous, to match the registered INPUT_SIZE
    img = np.ascontiguousarray(input_img.transpose(2, 0, 1).astype(np.float32))
    output = cuda.pagelocked_empty(elt_count, dtype=np.float32)
    # Allocate device memory for input and output and collect the pointers
    d_input = cuda.mem_alloc(batch_size * img.size * img.dtype.itemsize)
    d_output = cuda.mem_alloc(batch_size * output.size * output.dtype.itemsize)
    bindings = [int(d_input), int(d_output)]
    stream = cuda.Stream()
    cuda.memcpy_htod_async(d_input, img, stream)
    context.enqueue(batch_size, bindings, stream.handle, None)
    cuda.memcpy_dtoh_async(output, d_output, stream)
    stream.synchronize()
    end = time.time()
    print('inference time: %.4f s' % (end - start))
    return output


infer32(1)


References:
https://blog.csdn.net/qq_34067821/article/details/90710192
https://zhuanlan.zhihu.com/p/139767249
https://zhuanlan.zhihu.com/p/64099452
https://zhuanlan.zhihu.com/p/64114667

Keywords: TensorFlow Deep Learning Object Detection TensorRT
