OpenVINO computer vision model acceleration - from zero to several major case studies

OpenVINO computer vision model acceleration tutorial

1, Foundation

1. OpenVINO introduction

OpenVINO's first version was released in May 2018. Why that timing? Deep learning was already very popular, and there were plenty of computer vision frameworks. Frameworks that predate deep learning, such as OpenCV and MATLAB, faced an awkward problem: they were designed for traditional computer vision. Although they support deep learning to varying degrees, they are not, at heart, frameworks built for deploying deep learning models. Intel saw this gap in the market and the needs of developers, and released the computer vision framework OpenVINO.

What advantages does this framework have over traditional vision frameworks such as OpenCV and MATLAB? It supports accelerated computing on a variety of edge hardware platforms. A simple example: running a trained face recognition CNN through OpenCV on a CPU or other vision hardware might give 5 frames per second, but with OpenVINO acceleration 20 or even 30 frames per second is possible. On average, OpenVINO can deliver a 5-10x speedup.

Why such a large speedup? OpenVINO is developed by Intel, and it accelerates the trained model from the instruction-set level up, using hardware-specific instructions, multithreading, and layer fusion.

The latest version is 2021.2, and Intel ships a large number of pre-trained models with the OpenVINO framework, so it supports rapid demonstration of vision tasks in many common scenarios. For example, for face recognition it provides four lightweight models of only a few MB each that run at more than 100 frames per second and are particularly stable. Pedestrian detection, license plate detection and recognition, scene text detection and recognition: for these common vision tasks, demos can be run quickly on OpenVINO. This rapid-demonstration capability is another big advantage of OpenVINO: you can skip collecting data and training a model, and the time saved enables rapid delivery.

Official documentation: https://docs.openvinotoolkit.org/2021.1/index.html

OpenVINO has two key components: the Model Optimizer (MO), which converts models, and the Inference Engine (IE), which accelerates them. With these two pieces, OpenVINO can take a model trained in PyTorch, convert it to ONNX, and hand it to OpenVINO for accelerated inference. Likewise, for a model trained with TensorFlow, once the .pb file is generated, the Model Optimizer can convert it into OpenVINO's intermediate format and accelerate it. In other words, models produced by almost all deep learning frameworks can be deployed with OpenVINO, and then run on the CPU and on the various edge hardware Intel supports, such as integrated graphics, FPGAs, and the Neural Compute Stick, as well as on ordinary PCs.
Its advantage is that it can eliminate the need for an NVIDIA card: when the model is not very large, the CPU alone can meet production requirements, so there is no need to buy a high-end N card (an NVIDIA graphics card), which is a significant cost advantage. It is therefore a good deployment framework for deep learning models.
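For reference, converting an ONNX model to IR with the Model Optimizer mentioned above typically looks like the sketch below; the file names are placeholders and the exact flags should be checked against the MO documentation of the installed version.

cd "C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\model_optimizer"
python mo.py --input_model resnet18.onnx --output_dir resnet18_ir --data_type FP32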

At the same time, Intel has integrated the traditional computer vision functionality of OpenCV into OpenVINO. So once you have OpenVINO, you have all the functions of OpenCV plus a deep learning deployment and acceleration framework. That is what OpenVINO offers as a computer vision framework.

2. What the course covers

OpenVINO C++/Python development environment configuration, OpenVINO C++/Python SDK development skills, support for the ONNX model format, using TensorFlow models with OpenVINO, and a code demonstration combining the latest YOLOv5, exported to ONNX, with OpenVINO.

3. Installation and construction of OpenVINO development environment

The point of OpenVINO is to quickly build a vision application prototype and solution and to deploy and run models in real time on edge devices, i.e. on our PCs and on various boards. The version used here is 2021.2, the latest version.

Figure 1

3.1. Uninstalling old Visual Studio, installing VS2019, OpenVINO_2021.2.185, Python 3.6.5, CMake

Uninstalling an old VS installation: https://www.cnblogs.com/kuangqiu/p/7760281.html

VS removal tool: https://github.com/Microsoft/VisualStudioUninstaller/releases

regedit can be used to check the registry for leftover VS entries to delete: https://blog.csdn.net/inch2006/article/details/102372940

VS2019 download: https://visualstudio.microsoft.com/zh-hans/ Run the downloaded exe and select the C++ workload for installation

OpenVINO_2021.2.185: register, download, and double-click to install. https://docs.openvinotoolkit.org/2021.2/openvino_docs_install_guides_installing_openvino_windows.html lists the installation steps and the corresponding environment setup steps; pay attention to running the setup scripts.

Python 3.6.5 installation guide: https://www.jb51.net/article/147615.htm Official download: https://www.python.org/downloads/windows/

PyCharm IDE tutorial: https://www.runoob.com/python/python-ide.html Official download: https://www.jetbrains.com/pycharm/download/#section=windows

CMake: version 3.10 or higher, 64-bit. NOTE: if you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14.
CMake download: https://cmake.org/download/ Remember to configure the environment variables (add CMake to PATH).

3.2. Getting familiar with the OpenVINO installation directory

deployment_tools
	inference_engine	Inference Engine
	model_optimizer	Model Optimizer
	ngraph	Newer component, also part of the inference engine
	open_model_zoo	Open Model Zoo
		models
			intel	Models pre-trained by Intel, ready for us to use. Only the model descriptions are here;
				to actually use a model it must be downloaded with tools -> downloader
			public	Various models contributed by third parties
		tools	Helpers for downloading the models we need
			downloader	Download and conversion tools, e.g. pytorch_to_onnx conversion

3.3 installation environment test

Run openvino_2021.2.185\deployment_tools\demo\demo_security_barrier_camera.bat to test the C++ environment. This demo is a vehicle and license plate recognition model. If the environment is correct, the final result looks like the figure below.

Figure 2

3.4 basic overview of OpenVINO

Figure 3

Figure 4

3.5 C++ Visual Studio development environment configuration test

Configure include directory
C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\inference_engine\include
C:\Program Files (x86)\Intel\openvino_2021.2.185\opencv\include\opencv2

Configure library directories
C:\Program Files (x86)\Intel\openvino_2021.2.185\opencv\lib
C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\inference_engine\lib\intel64\Release

Configure linker dependencies (additional dependencies)
The list of .lib names can be generated with a small Python script.
C:\Program Files (x86)\Intel\openvino_2021.2.185\opencv\lib: the *451.lib libraries (release versions, without the d suffix)
C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\inference_engine\lib\intel64\Release: the .lib libraries in this directory

Figure 5

Configure environment variables
C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\inference_engine\external\tbb\bin
C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\inference_engine\bin\intel64\Release
C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\ngraph\lib
C:\Program Files (x86)\Intel\openvino_2021.2.185\opencv\bin

Test code

Figure 6

For example, one member of the study group ran into a problem: an error saying the procedure entry point could not be located in the dynamic link library inference_engine.dll, as shown in the figure.

Figure 9

Solution: when setting the environment variables, move the four OpenVINO-related entries to the front of the list; this resolves the problem.

Figure 10

3.6. OpenVINO SDK learning

We start with the IE (Inference Engine) module, the inference module, which is the most commonly used one; once you are familiar with it, the conversion module is easy to pick up.
First, the OpenVINO development process: given a model, how do you run it to complete inference, then display the results and handle the remaining steps.
Second, querying hardware support: there are functions to query whether your current device is supported and which devices can accelerate the model.
Third, an introduction to the SDK functions you integrate into your own application.
OpenVINO IE development process:

Figure 7

1. Initialize the Core. The Core is an object, a class exposed at the API level of the IE inference engine. Through this object we load OpenVINO's underlying CPU/GPU dependencies.
2. Next, read the model. The model is called an intermediate file, or IR model.
Why call the model an intermediate file? There are many deep learning frameworks, with static graphs, dynamic graphs, JIT execution modes, and so on. We train and generate these models on the GPU, but in the end they need to run on all kinds of embedded devices. We want everything in the model converted into instructions the target supports so that it executes as fast as possible (this is not done by parsing the original training format directly; it goes through a common format that the various back ends understand). OpenVINO's compilation intermediate layer is called IR. Many deep learning deployment frameworks have their own intermediate representation; OpenVINO converts models to IR through the Model Optimizer (MO), and we then read the IR.
3. Model inference has inputs and outputs, so the next step is to configure the inputs and outputs.
4. After configuring the input and output, load the model.
5. Create an inference request.
6. Then prepare the input data.
7. Then run inference.
8. Then post-process and output the results.
9. The last step is to loop over input, inference, and output post-processing. This is the development process of OpenVINO's inference engine.

Hardware support query:

Include the header file: #include "inference_engine.hpp"
Introduce the namespace: using namespace InferenceEngine;
Device query functions (a complete minimal example follows below):
Create a core class object: InferenceEngine::Core ie;
Query available devices: std::vector<std::string> devices = ie.GetAvailableDevices();
Get the full name of a device: ie.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
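Putting these calls together, a minimal self-contained device query program might look like the sketch below (the device names printed depend on your machine):

#include "inference_engine.hpp"
#include <iostream>

using namespace InferenceEngine;

int main()
{
    InferenceEngine::Core ie;                                     // creating the core loads the available device plugins
    std::vector<std::string> devices = ie.GetAvailableDevices();  // e.g. CPU, GPU, MYRIAD
    for (const auto& device : devices)
    {
        // query the full, human-readable name of each device
        std::string full_name = ie.GetMetric(device, METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
        std::cout << device << " : " << full_name << std::endl;
    }
    return 0;
}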

IE-related API classes:

Core class: InferenceEngine::Core ie;
Data containers: InferenceEngine::Blob, InferenceEngine::TBlob, InferenceEngine::NV12Blob
Blob collection: InferenceEngine::BlobMap
Reading and setting the input/output formats: InferenceEngine::InputsDataMap, InferenceEngine::InputInfo, InferenceEngine::OutputsDataMap
Some wrapper classes:
The CNN network obtained after reading through the Core: InferenceEngine::CNNNetwork
Turning the CNN network into an executable network: InferenceEngine::ExecutableNetwork
The inference request, from which inference is run and the output finally parsed: InferenceEngine::InferRequest
(a short sketch of how these classes chain together follows below)
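As a quick orientation, here is a minimal sketch of how these classes chain together in the flow described above; the model file names are placeholders, and full working examples follow in the project practice part:

#include "inference_engine.hpp"
using namespace InferenceEngine;

int main()
{
    Core ie;                                                        // 1. initialize the core
    CNNNetwork network = ie.ReadNetwork("model.xml", "model.bin");  // 2. read the IR model
    InputsDataMap inputs = network.getInputsInfo();                 // 3. configure inputs/outputs (precision, layout)
    OutputsDataMap outputs = network.getOutputsInfo();
    ExecutableNetwork exec_net = ie.LoadNetwork(network, "CPU");    // 4. load onto a device
    InferRequest request = exec_net.CreateInferRequest();           // 5. create an inference request
    // 6-8. fill the input Blob via request.GetBlob(...), call request.Infer(),
    //      then read the output Blob and post-process the results
    return 0;
}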

2, Project practice

1. ResNet18 image classification

1.1 Introduction to the pre-trained model: we use a ResNet18 image classification model. How is this image classification model turned into IR? ResNet18 itself is a PyTorch model; how is it converted into ONNX format, and then into IR format, to finally complete the demonstration? There is a standard process for this conversion, which we will cover later.
For now, given a ready-made model, how do we use the OpenVINO IE SDK to deploy it for image classification inference?
1.2 Image preprocessing: every image fed to this model must first be scaled to the 0-1 range. A normal image has values between 0-255 in three RGB channels; the model expects values between 0 and 1.
1.3 mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]: subtract the mean and divide by the standard deviation. This part is handled with the OpenCV API.
1.4 Input and output formats. Each deep learning framework defines its own input and output format. The input format is NCHW (N number of images, C number of channels, H height, W width) = 1x3x224x224, i.e. one color image of 224x224 per inference.
Output format after inference: 1x1000 (this model is trained on the ImageNet dataset, which has 1000 classes, so its output is 1x1000). We find the class string corresponding to the index of the largest value, which is the predicted class of the image.

1.5 what to do at the code level:

1. Initialize the Core (ie)
2. ie.ReadNetwork reads the CNN network; the model needs two files: resnet18.bin and resnet18.xml
3. Get the input and output formats and set the precision
4. Get the executable network and bind it to the hardware
5. Create an inference request; inference can then be attempted, but there is still work to do first, such as format setup
6. Get the input Blob (the format conversion class object)
7. Preprocess the input image data (including BGR->RGB, resize, conversion to floating point, and HWC->NCHW layout conversion)
8. Run inference
9. Get the output
10. Finally, obtain the output dimension information, parse the data, and the index with the largest value is the ResNet18 image recognition result

Figure 8

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
std::string labels_txt_file = "D:/code/OpenVINO/OpenVINO_SupportMode/resnet18_ir/imagenet_classes.txt"; //Path to the ImageNet class labels file 
std::vector<std::string> readClassNames();//Functions to read files
int main(int argc, char** argv)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork reads the CNN network; the model needs two files: resnet18.bin and resnet18.xml
    std::string xmlFilename = "D:/code/OpenVINO/OpenVINO_SupportMode/resnet18_ir/resnet18.xml";
    std::string binFilename = "D:/code/OpenVINO/OpenVINO_SupportMode/resnet18_ir/resnet18.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially a map from input name to InputInfo; if the model has multiple inputs, there is one entry per input
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//item.second is the InputInfo; C++11 auto is used here. Since this model has only one input, we handle it directly in the loop and set its precision
        //This model was trained in PyTorch on the ImageNet dataset with full FP32 precision, so we set FP32 here
        input_data->setPrecision(Precision::FP32);
        //The layout is also set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is likewise set according to the model's expected input
        input_data->getPreProcess().setColorFormat(ColorFormat::RGB);
        std::cout << "input name = " << input_name << std::endl;
    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;//item.second is the output DataPtr; since there is only one output, we set its precision directly
        //The model's output is FP32 full precision, so we set FP32 here
        output_data->setPrecision(Precision::FP32);
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto infer_request = executable_network.CreateInferRequest();

    //6. Gets the input Blob format conversion class object
    auto input = infer_request.GetBlob(input_name);//Get the blob of input (class object for input formatting)
    size_t num_channels = input->getTensorDesc().getDims()[1];
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;

    //7. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
    cv::Mat srcOriginal = cv::imread("D:/succoBlar.png");//Original drawing to be parsed
    cv::Mat src;
    cv::cvtColor(srcOriginal, src, cv::COLOR_BGR2RGB);// Because the receiving order of resnet18 model is RGB and the order of opencv is BGR, it is necessary to convert and then preprocess the image of resnet18 model
    cv::Mat blob_image;//Convert to network, you can parse the picture format
    cv::resize(src, blob_image, cv::Size(w, h));//Conversion size
    blob_image.convertTo(blob_image, CV_32F);//Convert to floating point number
    blob_image = blob_image / 255.0;//Convert to 0-1
    cv::subtract(blob_image, cv::Scalar(0.485, 0.456, 0.406), blob_image);//The value of each channel is subtracted from the mean
    cv::divide(blob_image, cv::Scalar(0.229, 0.224, 0.225), blob_image);// The value of each channel is divided by the variance
   
    //8. Write the prepared data into the input Blob -> the memory for the input image data was already allocated by GetBlob()
    float* data = static_cast<float*>(input->buffer());//This is to directly fill the data into the specified space of input after data conversion
    //be careful; The order of mat images returned by opencv is HWC. To convert it to NCHW is to convert a matrix of HWC type to NCHW type, which is the problem of matrix filling
    // HWC = NCHW conversion
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                //blob_image is the HWC format from opencv - "conversion to NCHW" means that each channel becomes a graph according to the channel order. The of the first few channels is the storage of the first few pictures
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3f>(row, col)[ch];
            }
        }
    }

    //8. Executive reasoning
    infer_request.Infer();

    //9. Get our output
    auto output = infer_request.GetBlob(output_name);
    const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Convert output data to Precision
    
    //10. Finally, we need to obtain the output dimension information, analyze the data and output the largest one, which is the result of resnet18 image recognition
    const SizeVector outputDims = output->getTensorDesc().getDims();
    std::cout << outputDims[0] << "X" << outputDims[1] << std::endl;
    float max = probs[0];
    int max_index = 0;
    for (int i = 1; i < outputDims[1]; i++)
    {
        if (max < probs[i])
        {
            max = probs[i];
            max_index = i;
        }
    }
    std::cout << "class index:" << max_index << std::endl;
    //Look up the class name from imagenet_classes.txt and write the recognized class onto the picture
    std::vector<std::string> labels = readClassNames();
    std::cout << "class name:" << labels[max_index] << std::endl;
    //Write text on the picture
    cv::putText(srcOriginal, labels[max_index], cv::Point(50,50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
    cv::imshow("input image ", srcOriginal);
    cv::waitKey(0);

    return 0;
}
//Parsing imagenet_classes.txt file to get 1000 categories
std::vector<std::string> readClassNames()
{
    std::vector<std::string> classNames;

    std::ifstream fp(labels_txt_file);
    if (!fp.is_open())
    {
        printf("could not open file...\n");
        exit(-1);
    }
    std::string name;
    while (!fp.eof())
    {
        std::getline(fp, name);
        if (name.length())
            classNames.push_back(name);
    }
    fp.close();
    return classNames;
}

2. SSD vehicle and license plate detection

In the previous project we used the ResNet18 model to implement image classification. ResNet18, a residual network, is also a common backbone network, and image classification is one of its basic uses. In computer vision, besides classification, the other most common task is object detection. So next we will deploy and accelerate an object detection network with OpenVINO. Here we use a model from the OpenVINO model zoo that quickly gives us object detection for a specific application scenario: detecting vehicles and license plates, which is very common at highway checkpoints. This case shows how to implement fast vehicle and license plate detection with the OpenVINO framework.

2.1 practical cases of vehicle license plate detection

2.1.1 introduction to the model

Figure 11

A note here: there are general-purpose models that can detect an entire open scene, but in practical applications your scenario may be a vertical domain where you only need to do one thing well. In that case the model can be built very small and still achieve a high recognition rate in that specific scene. For example, the vehicle and license plate detection model used here is very small, yet it performs very well in the highway checkpoint scenario.
You can open "C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\open_model_zoo\models\intel\vehicle-license-plate-detection-barrier-0106\description\vehicle-license-plate-detection-barrier-0106.html" to view the official description of the model vehicle-license-plate-detection-barrier-0106.


Figure 12 13

2.1.2 Calling sequence of the code

The flow is similar to the previous ResNet18 image classification:
Loading model
Set input and output
Build input
Execute inference
Parse output
Display results
The previous code can be taken directly and adapted.

2.1.3. Download the model in OpenVINO model library

As we saw earlier, openvino_2021.2.185\deployment_tools\open_model_zoo\models\intel is the directory containing all of Intel's model libraries supported by OpenVINO, but only the descriptions live there; the actual models still need to be downloaded with the downloader tool.
Downloading Intel's official models:
Use the script in this directory: "C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\open_model_zoo\tools\downloader\downloader.py".
Execute the download command (a sketch of a typical invocation follows below).
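This is a sketch of the invocation, assuming the output location used later in the code; the exact options can be checked with downloader.py --help (--name selects the model, -o sets the output directory):

cd "C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\open_model_zoo\tools\downloader"
python downloader.py --name vehicle-license-plate-detection-barrier-0106 -o D:/code/OpenVINO/OpenVINO_SupportMode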

A permission problem was encountered during download: the solution is to open cmd.exe as administrator.
After the download succeeds, the vehicle-license-plate-detection-barrier-0106 folder is generated.
vehicle-license-plate-detection-barrier-0106 is downloaded in three variants with different quantization levels:
FP16 half precision
FP16-INT8 8-bit quantized
FP32 full precision
We use FP32 full precision on PC.

2.1.4 code demonstration

Figure 14

Example code

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;

int main(int argc, char** argv)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork reads the CNN network; the model needs two files, the .xml and the .bin of vehicle-license-plate-detection-barrier-0106
    std::string xmlFilename = "D:/code/OpenVINO/OpenVINO_SupportMode/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.xml";
    std::string binFilename = "D:/code/OpenVINO/OpenVINO_SupportMode/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto infer_request = executable_network.CreateInferRequest();

    //6. Gets the input Blob format conversion class object
    auto input = infer_request.GetBlob(input_name);//Get the blob of input (class object for input formatting)
    size_t num_channels = input->getTensorDesc().getDims()[1];
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;

    //7. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
    cv::Mat src = cv::imread("D:/Vihecle.jpg");//Original drawing to be parsed
    //cv::namedWindow("input",cv::WINDOW_FREERATIO);// Set the free scale of the window to ensure that the picture can be displayed normally even if it is too large
    int im_h = src.rows;
    int im_w = src.cols;
    cv::Mat blob_image;//Convert to network, you can parse the picture format
    cv::resize(src, blob_image, cv::Size(w, h));//Conversion size
   
    //8. Set the set data into the input Blob - > in fact, the memory space for storing the input image data has been opened up when GetBlob() 
    unsigned char* data = static_cast<unsigned char*>(input->buffer());//This is to directly fill the data into the specified space of input after data conversion
    //be careful; The order of mat images returned by opencv is HWC. To convert it to NCHW is to convert a matrix of HWC type to NCHW type, which is the problem of matrix filling
    // HWC = NCHW conversion
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                //blob_image is the HWC format from opencv - "conversion to NCHW" means that each channel becomes a graph according to the channel order. The of the first few channels is the storage of the first few pictures
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }

    //8. Executive reasoning
    infer_request.Infer();

    //9. Get our output
    auto output = infer_request.GetBlob(output_name);
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //10. Finally, you need to obtain the output dimension information and parse the data
    const SizeVector outputDims = output->getTensorDesc().getDims();
    std::cout << outputDims[2] << "X" << outputDims[3] << std::endl;
    const int max_num = outputDims[2];//Is the output N
    const int object_size = outputDims[3];//It's the output 7
    for (int n = 0; n < max_num; n++)
    {
        float lable = detection_out[n*object_size + 1];// +1 indicates that the output is the second lableID of the seven
        float confidence = detection_out[n * object_size + 2];
        float xmin = detection_out[n * object_size + 3] * im_w;  //The floating point coordinates obtained from the output are 0-1, and the actual coordinates are multiplied by the original width and height
        float ymin = detection_out[n * object_size + 4] * im_h;
        float xmax = detection_out[n * object_size + 5] * im_w;
        float ymax = detection_out[n * object_size + 6] * im_h;
        if (confidence > 0.5)
        {
            printf("lable id = %d\n", static_cast<int>(lable));
            cv::Rect box;
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);
            if (lable == 2)//License plate
            {
                cv::rectangle(src, box, cv::Scalar(0, 255, 0), 2, 8, 0);
            }
            else
            {
                cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8, 0);
            }
            cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
        }
    }

    cv::imshow("input", src);
    cv::waitKey(0);

    return 0;
}

2.2 Chinese license plate recognition

Why Chinese license plate recognition? Because the model provided by OpenVINO is very convenient: it can directly recognize Chinese license plates and output the result. It is, of course, targeted at the specific scenario of license plate detection and recognition at highway checkpoints and works in that fixed application scenario, but you can retrain it for other scenarios; a training framework and scripts are available on GitHub, developed with TensorFlow 1.x.

2.2.1 model introduction

Figure 15

View license-plate-recognition-barrier-0001.html for the official description of the provided model.

Figure 16, 17

2.2.2 call execution process

Load the models (two models, detection and recognition, each loaded only once)
Detect vehicles and license plates. If a detection is a license plate, run recognition on it; if it is a vehicle, do nothing further.

2.2.3 code demonstration

Figure 18

Practice code

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;

static const char* const items[] = {
    "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
    "<Anhui>", "<Beijing>", "<Chongqing>", "<Fujian>",
    "<Gansu>", "<Guangdong>", "<Guangxi>", "<Guizhou>",
    "<Hainan>", "<Hebei>", "<Heilongjiang>", "<Henan>",
    "<HongKong>", "<Hubei>", "<Hunan>", "<InnerMongolia>",
    "<Jiangsu>", "<Jiangxi>", "<Jilin>", "<Liaoning>",
    "<Macau>", "<Ningxia>", "<Qinghai>", "<Shaanxi>",
    "<Shandong>", "<Shanghai>", "<Shanxi>", "<Sichuan>",
    "<Tianjin>", "<Tibet>", "<Xinjiang>", "<Yunnan>",
    "<Zhejiang>", "<police>",
    "A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
    "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
    "U", "V", "W", "X", "Y", "Z"
};


void load_plate_recog_model(InferenceEngine::InferRequest &plate_request, std::string &plate_input_name1, std::string &plate_input_name2, std::string &plate_output_name);
void fetch_plate_text(InferenceEngine::InferRequest& plate_request, std::string& plate_input_name1, std::string& plate_input_name2, std::string& plate_output_name, cv::Mat &image, cv::Mat plateROI);

int main(int argc, char** argv)
{
    InferenceEngine::InferRequest plate_request;
    std::string plate_input_name1;//Although the official documentation gives the input names, they are sometimes wrong, so the code retrieves the names itself
    std::string plate_input_name2;
    std::string plate_output_name;
    load_plate_recog_model(plate_request, plate_input_name1, plate_input_name2, plate_output_name);//Load license plate recognition model

    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork reads the CNN network; the model needs two files, the .xml and the .bin of vehicle-license-plate-detection-barrier-0106
    std::string xmlFilename = "D:/code/OpenVINO/OpenVINO_SupportMode/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.xml";
    std::string binFilename = "D:/code/OpenVINO/OpenVINO_SupportMode/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto infer_request = executable_network.CreateInferRequest();

    //6. Gets the input Blob format conversion class object
    auto input = infer_request.GetBlob(input_name);//Get the blob of input (class object for input formatting)
    size_t num_channels = input->getTensorDesc().getDims()[1];
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;

    //7. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
    cv::Mat src = cv::imread("D:/Vihecle.jpg");//Original drawing to be parsed
    //cv::namedWindow("input",cv::WINDOW_FREERATIO);// Set the free scale of the window to ensure that the picture can be displayed normally even if it is too large
    int im_h = src.rows;
    int im_w = src.cols;
    cv::Mat blob_image;//Convert to network, you can parse the picture format
    cv::resize(src, blob_image, cv::Size(w, h));//Conversion size

    //8. Set the set data into the input Blob - > in fact, the memory space for storing the input image data has been opened up when GetBlob() 
    unsigned char* data = static_cast<unsigned char*>(input->buffer());//This is to directly fill the data into the specified space of input after data conversion
    //be careful; The order of mat images returned by opencv is HWC. To convert it to NCHW is to convert a matrix of HWC type to NCHW type, which is the problem of matrix filling
    // HWC = NCHW conversion
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                //blob_image is the HWC format from opencv - "conversion to NCHW" means that each channel becomes a graph according to the channel order. The of the first few channels is the storage of the first few pictures
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }

    //8. Executive reasoning
    infer_request.Infer();

    //9. Get our output
    auto output = infer_request.GetBlob(output_name);
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //10. Finally, you need to obtain the output dimension information and parse the data
    const SizeVector outputDims = output->getTensorDesc().getDims();
    std::cout << outputDims[2] << "X" << outputDims[3] << std::endl;
    const int max_num = outputDims[2];//Is the output N
    const int object_size = outputDims[3];//It's the output 7
    for (int n = 0; n < max_num; n++)
    {
        float lable = detection_out[n * object_size + 1];// +1 indicates that the output is the second lableID of the seven
        float confidence = detection_out[n * object_size + 2];
        float xmin = detection_out[n * object_size + 3] * im_w;  //The floating point coordinates obtained from the output are 0-1, and the actual coordinates are multiplied by the original width and height
        float ymin = detection_out[n * object_size + 4] * im_h;
        float xmax = detection_out[n * object_size + 5] * im_w;
        float ymax = detection_out[n * object_size + 6] * im_h;
        if (confidence > 0.5)
        {
            printf("lable id = %d\n", static_cast<int>(lable));
            cv::Rect box;
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);
            if (lable == 2)//License plate
            {
                // recognize plate
                cv::Rect plate_roi;//Each side is a little more than the detected rectangle for easy identification
                plate_roi.x = box.x - 5;
                plate_roi.y = box.y - 5;
                plate_roi.width = box.width + 10;
                plate_roi.height = box.height + 10;
                fetch_plate_text(plate_request, plate_input_name1, plate_input_name2, plate_output_name, src, src(plate_roi));
                cv::rectangle(src, box, cv::Scalar(0, 255, 0), 2, 8, 0);
            }
            else
            {
                cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8, 0);
            }
            cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
        }
    }

    cv::imshow("input", src);
    cv::waitKey(0);

    return 0;
}

void load_plate_recog_model(InferenceEngine::InferRequest& plate_request, std::string& plate_input_name1, std::string& plate_input_name2, std::string& plate_output_name)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork reads the CNN network; the model needs two files, the .xml and the .bin of license-plate-recognition-barrier-0001
    std::string xmlFilename = "D:/code/OpenVINO/license-plate-recognition-barrier-0001/FP32/license-plate-recognition-barrier-0001.xml";
    std::string binFilename = "D:/code/OpenVINO/license-plate-recognition-barrier-0001/FP32/license-plate-recognition-barrier-0001.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    int cnt = 0;
    for (auto item : inputs)//Because this model has two inputs, it will cycle twice
    {
        if (cnt == 0)
        {
            plate_input_name1 = item.first;
            auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
            input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
            //This is also the input data mode set according to the model
            input_data->setLayout(Layout::NCHW);
            //ColorFormat is also input according to the model settings
            input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        }
        if (cnt == 1)
        {
            plate_input_name2 = item.first;
            auto input_data = item.second;
            input_data->setPrecision(Precision::FP32);
        }
        //input_name = item.first;
        
        std::cout << "input name " << cnt+1 << " = " << item.first << std::endl;
        cnt++;
    }
    for (auto item : outputs)
    {
        plate_output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << plate_output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    plate_request = executable_network.CreateInferRequest();
}

void fetch_plate_text(InferenceEngine::InferRequest& plate_request, std::string& plate_input_name1, std::string& plate_input_name2, std::string& plate_output_name, cv::Mat& image, cv::Mat plateROI)
{
    //Set input
    auto input1 = plate_request.GetBlob(plate_input_name1);
    size_t num_channels = input1->getTensorDesc().getDims()[1];
    size_t h = input1->getTensorDesc().getDims()[2];
    size_t w = input1->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;//Convert to network, you can parse the picture format
    cv::resize(plateROI, blob_image, cv::Size(w, h));//Conversion size 
    unsigned char* data = static_cast<unsigned char*>(input1->buffer());//This is to directly fill the data into the specified space of input after data conversion
    //be careful; The order of mat images returned by opencv is HWC. To convert it to NCHW is to convert a matrix of HWC type to NCHW type, which is the problem of matrix filling
    // HWC = NCHW conversion
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                //blob_image is the HWC format from opencv - "conversion to NCHW" means that each channel becomes a graph according to the channel order. The of the first few channels is the storage of the first few pictures
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }
    auto input2 = plate_request.GetBlob(plate_input_name2);
    int max_sequence = input2->getTensorDesc().getDims()[0];//The first dimension of the second input is the length of the sequence
    float* blob2 = input2->buffer().as<float*>();
    blob2[0] = 0.0f;
    std::fill(blob2 + 1, blob2 + max_sequence, 1.0f);

    //Executive reasoning
    plate_request.Infer();

    //Output results
    auto output = plate_request.GetBlob(plate_output_name);
    const float* plate_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection
    std::string result;
    for (int i = 0; i < max_sequence; i++)
    {
        if (plate_data[i] == -1)
            break;
        result += items[std::size_t(plate_data[i])];
    }
    cv::putText(image, result.c_str(), cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
}

3. Pedestrian detection

3.1 model introduction



Figure 19 20 21

3.2 procedure execution process

Because this is also an SSD model with the same input and output format as the vehicle detection model (NCHW input, 1x1xNx7 output), the overall vehicle detection code can be reused directly; only the model path needs to change.

3.3 program demonstration

Figure 22

3.4 pedestrian detection in video

The code flow is similar, but the inference part is extracted into a function that receives the frame and the request as parameters. The input is changed to read a video, and each frame is displayed after inference.
You can see that running the OpenVINO model on video does not hurt the speed; for this video with small width and height, the OpenVINO model processes hundreds of frames per second. This also shows that OpenVINO has a significant acceleration effect on the model.
Practical effect

Figure 23

Code practice

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
void infer_process(cv::Mat &src, InferenceEngine::InferRequest & infer_request, std::string& input_name, std::string& output_name);
//Image pedestrian detection
int main(int argc, char** argv)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork reads the CNN network; the model needs two files, the .xml and the .bin of pedestrian-detection-adas-0002
    std::string xmlFilename = "D:/code/OpenVINO/pedestrian-detection-adas-0002/FP32/pedestrian-detection-adas-0002.xml";
    std::string binFilename = "D:/code/OpenVINO/pedestrian-detection-adas-0002/FP32/pedestrian-detection-adas-0002.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto infer_request = executable_network.CreateInferRequest();

    //6. Create video stream / load video file
    cv::VideoCapture capture("D:/padestrian_detection.mp4");
    cv::Mat src;
    while (true)
    {
        bool ret = capture.read(src);
        if (ret == false)
            break;
        //7. Loop execution reasoning
        infer_process(src, infer_request, input_name, output_name);
        cv::imshow("src", src);
        char c = cv::waitKey(1);
        if (c == 27)//Press esc
            break;
    }

    //cv::imshow("pedestrian_detection_demo", src);
    cv::waitKey(0);

    return 0;
}

void infer_process(cv::Mat& src, InferenceEngine::InferRequest& infer_request, std::string& input_name, std::string& output_name)
{
    //1. Through the incoming reasoning engine; Gets the input Blob format conversion class object
    auto input = infer_request.GetBlob(input_name);//Get the blob of input (class object for input formatting)
    size_t num_channels = input->getTensorDesc().getDims()[1];
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;

    //2. Convert input image
    int im_h = src.rows;
    int im_w = src.cols;
    cv::Mat blob_image;//Convert to network, you can parse the picture format
    cv::resize(src, blob_image, cv::Size(w, h));//Conversion size

    //3. Set the set data into the input Blob - > in fact, the memory space for storing the input image data has been opened up when GetBlob() 
    unsigned char* data = static_cast<unsigned char*>(input->buffer());//This is to directly fill the data into the specified space of input after data conversion
    //be careful; The order of mat images returned by opencv is HWC. To convert it to NCHW is to convert a matrix of HWC type to NCHW type, which is the problem of matrix filling
    // HWC = NCHW conversion
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                //blob_image is the HWC format from opencv - "conversion to NCHW" means that each channel becomes a graph according to the channel order. The of the first few channels is the storage of the first few pictures
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }

    //4. Executive reasoning
    infer_request.Infer();

    //5. Get our output
    auto output = infer_request.GetBlob(output_name);
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //6. Finally, you need to obtain the output dimension information and parse the data
    const SizeVector outputDims = output->getTensorDesc().getDims();
    //std::cout << outputDims[2] << "X" << outputDims[3] << std::endl;
    const int max_num = outputDims[2];//Is the output N
    const int object_size = outputDims[3];//It's the output 7
    for (int n = 0; n < max_num; n++)
    {
        float lable = detection_out[n * object_size + 1];// +1 indicates that the output is the second lableID of the seven
        float confidence = detection_out[n * object_size + 2];
        float xmin = detection_out[n * object_size + 3] * im_w;  //The floating point coordinates obtained from the output are 0-1, and the actual coordinates are multiplied by the original width and height
        float ymin = detection_out[n * object_size + 4] * im_h;
        float xmax = detection_out[n * object_size + 5] * im_w;
        float ymax = detection_out[n * object_size + 6] * im_h;
        if (confidence > 0.5)
        {
            //printf("lable id = %d\n", static_cast<int>(lable));
            cv::Rect box;
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);
            cv::rectangle(src, box, cv::Scalar(0, 255, 0), 2, 8, 0);
            cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 1, 8);
        }
    }
}

4. Real-time face detection with asynchronous inference

4.1 model introduction

OpenVINO comes with a series of lightweight face detection models, most of which can reach 100 fps or higher, making them well suited for deployment on edge devices.


Figure 24, 25, 26

This can be compared with OpenCV: OpenCV's face detection is a traditional algorithm module, while in OpenVINO each algorithm is a module based on deep learning. Around each provided model you can develop and obtain different algorithm performance.

4.2. Synchronous / asynchronous execution

The previous model practices all used infer_request.Infer(), which is synchronous: the call blocks until inference finishes. Asynchronous inference is different: the request is started and the application keeps working until the result is ready.
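A minimal sketch of the difference, assuming an infer_request created as in the earlier examples; StartAsync() returns immediately and Wait() blocks until the result is ready, so the application can do other work (for example, read and preprocess the next frame) in between:

// Synchronous: blocks until inference is finished
infer_request.Infer();

// Asynchronous: start inference, do other work, then wait for the result
infer_request.StartAsync();
// ... read / preprocess the next frame here ...
infer_request.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);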

4.3 code demonstration

Figure 30

Synchronous implementation

Asynchronous implementation

Figure 27

Code practice

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
//Convert the U8 type data from mat to Blob. Because Blob is of any type, it is also declared as a template
template <typename T> void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0)
{
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
            << "but by fact we were not able to cast inputBlob to MemoryBlob";
    }
    // locked memory holder should be alive all time while access to its buffer happens
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name);
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name);

int main(int argc, char** argv)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//ie is the Inference Engine Core object used to read and load networks

    //2. ie.ReadNetwork needs the model's two IR files: the .xml (topology) and the .bin (weights)
    std::string xmlFilename = "D:/code/OpenVINO/face-detection-0202/FP32/face-detection-0202.xml";
    std::string binFilename = "D:/code/OpenVINO/face-detection-0202/FP32/face-detection-0202.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap maps each input name to its InputInfo; a model with several inputs has one entry per input
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//InputInfo::Ptr; C++11 auto keeps this short since we only use it to set the input precision and layout
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;//DataPtr; C++11 auto keeps this short since we only use it to set the output precision
        output_data->setPrecision(Precision::FP32);//the detection output is floating point
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. Create two inference requests so that one frame can run while the next one is being prepared (async pipeline)
    auto curr_infer_request = executable_network.CreateInferRequestPtr();
    auto next_infer_request = executable_network.CreateInferRequestPtr();

    //6. Open the video source; each frame is preprocessed (resized to the model input size and converted HWC -> NCHW) by frameToBlob
    cv::VideoCapture capture(0);
    cv::Mat curr_frame;
    cv::Mat next_frame;
    capture.read(curr_frame);
    frameToBlob(curr_frame, curr_infer_request, input_name);
    bool first_frame = true;
    bool last_frame = false;
    while (true)
    {
        bool ret = capture.read(next_frame);
        if (ret == false)//It's the last frame
        {
            last_frame = true;//No more settings entered
        }
        if (!last_frame)//The next frame is not set at the last frame
        {
            frameToBlob(next_frame, next_infer_request, input_name);
        }
        if (first_frame)
        {
            curr_infer_request->StartAsync();//Start reasoning
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else
        {
            if (!last_frame)//start inference on the next frame so it runs while the current frame's result is handled
            {
                next_infer_request->StartAsync();
            }
        }
        //7. Wait for the current frame's inference to finish, then parse and draw its result
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY))
        {
            infer_process(curr_frame, curr_infer_request, output_name);
        }
        cv::imshow("src", curr_frame);
        char c = cv::waitKey(1);
        if (c == 27)//Press esc
            break;
        if (last_frame)
        {
            break;
        }
        // Asynchronous interaction
        next_frame.copyTo(curr_frame);
        curr_infer_request.swap(next_infer_request);//swap the smart pointers: the next request becomes the current one
    }

    return 0;
}
// Sets the face detection model's input; the infer request is passed as a smart-pointer reference because it is modified
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name)
{
    //1. Get the input Blob from the infer request (the object the input data is written into)
    auto input = infer_request->GetBlob(input_name);//Get the blob of input (class object for input formatting)
    matU8ToBlob<uchar>(src, input);
    return;
}
// Parses the face detection model's output and draws the detected boxes
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name)
{
    //1. Run inference
    //infer_request->Infer();    not needed here - inference was already started asynchronously with StartAsync()

    //2. Get our output
    auto output = infer_request->GetBlob(output_name);
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //3. Finally, you need to obtain the output dimension information and parse the data
    const SizeVector outputDims = output->getTensorDesc().getDims();
    //std::cout << outputDims[2] << "X" << outputDims[3] << std::endl;
    const int max_num = outputDims[2];//Is the output N
    const int object_size = outputDims[3];//It's the output 7
    int im_h = src.rows;
    int im_w = src.cols;
    for (int n = 0; n < max_num; n++)
    {
        float lable = detection_out[n * object_size + 1];// +1 indicates that the output is the second lableID of the seven
        float confidence = detection_out[n * object_size + 2];
        float xmin = detection_out[n * object_size + 3] * im_w;  //The floating point coordinates obtained from the output are 0-1, and the actual coordinates are multiplied by the original width and height
        float ymin = detection_out[n * object_size + 4] * im_h;
        float xmax = detection_out[n * object_size + 5] * im_w;
        float ymax = detection_out[n * object_size + 6] * im_h;
        if (confidence > 0.5)
        {
            //printf("lable id = %d\n", static_cast<int>(lable));
            cv::Rect box;
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);
            cv::rectangle(src, box, cv::Scalar(0, 255, 0), 2, 8, 0);
            cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 1, 8);
        }
    }
}

5. Real time facial expression recognition

On top of face detection, this example loads a facial expression recognition model and feeds it the detected face to recognize five common expressions.

5.1 model introduction


Figure 28, 29

5.2 procedure execution process

After face detection, the detected face ROI is cropped and fed to the expression recognition model, and the recognition result is then drawn on the frame.
Note: add some protection to make sure the rectangle returned by face detection lies inside the image, and check its size as well as its coordinates; otherwise setting the input for expression recognition will crash.
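A minimal sketch of such a guard, using only OpenCV (the helper name safe_face_roi is my own, not part of the original code):

//Returns the detection rectangle clamped to the image; an empty Rect means "skip this detection".
cv::Rect safe_face_roi(const cv::Rect& face_box, const cv::Mat& frame, int min_size = 8)
{
    cv::Rect clamped = face_box & cv::Rect(0, 0, frame.cols, frame.rows); // intersection stays inside the image
    if (clamped.width < min_size || clamped.height < min_size)
        return cv::Rect();                                                // too small or fully outside
    return clamped;
}

The full listing below uses a simpler inline size/coordinate check plus clamping inside infer_process(); either approach works as long as the ROI handed to image(face_roi) lies entirely inside the frame.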

5.3 code demonstration

Figure 31

Code practice

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
//Convert the U8 type data from mat to Blob. Because Blob is of any type, it is also declared as a template
template <typename T> void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0)
{
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
            << "but by fact we were not able to cast inputBlob to MemoryBlob";
    }
    // locked memory holder should be alive all time while access to its buffer happens
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

//The five expression classes output by the model
static const char* const items[] = {
    "neutral", "happy", "sad", "surprise", "anger"
};

//Load the facial expression recognition model and pass back its infer request and input/output names
void load_face_emotion_model(std::shared_ptr<InferenceEngine::InferRequest>& face_emotion_request, std::string& face_emotion_input_name, std::string& face_emotion_output_name);

//Set the expression model's input from the face ROI, run inference, and draw/return the recognized expression
std::string text_recognization_text(std::shared_ptr<InferenceEngine::InferRequest>& face_emotion_request, std::string& face_emotion_input_name, std::string& face_emotion_output_name, cv::Mat& image, cv::Rect& face_roi);

//After asynchronous reasoning of face detection, the results are processed
cv::Rect infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name);

//Convert the input src into the expected format and write it into the infer request's input Blob
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name);

int main(int argc, char** argv)
{
    //Loading the model of facial expression recognition
    std::shared_ptr<InferenceEngine::InferRequest> face_emotion_request;
    std::string face_emotion_input_name;
    std::string face_emotion_output_name;
    load_face_emotion_model(face_emotion_request, face_emotion_input_name, face_emotion_output_name);

    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork needs two files in the CNN network model to read resnet18 bin resnet18. xml
    std::string xmlFilename = "D:/code/OpenVINO/face-detection-0202/FP32/face-detection-0202.xml";
    std::string binFilename = "D:/code/OpenVINO/face-detection-0202/FP32/face-detection-0202.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto curr_infer_request = executable_network.CreateInferRequestPtr();
    auto next_infer_request = executable_network.CreateInferRequestPtr();

    //6. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
    cv::VideoCapture capture(0);
    cv::Mat curr_frame;
    cv::Mat next_frame;
    capture.read(curr_frame);
    frameToBlob(curr_frame, curr_infer_request, input_name);
    bool first_frame = true;
    bool last_frame = false;
    while (true)
    {
        bool ret = capture.read(next_frame);
        if (ret == false)//It's the last frame
        {
            last_frame = true;//No more settings entered
        }
        if (!last_frame)//The next frame is not set at the last frame
        {
            frameToBlob(next_frame, next_infer_request, input_name);
        }
        if (first_frame)
        {
            curr_infer_request->StartAsync();//Start reasoning
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else
        {
            if (!last_frame)//Each start is the next frame ready for asynchronous exchange
            {
                next_infer_request->StartAsync();
            }
        }
        //7. Loop execution reasoning, each reasoning is the current frame, and it will be parsed after the reasoning is successful
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY))
        {
            //The reasoning of face detection returns the face rectangle
            cv::Rect face_box = infer_process(curr_frame, curr_infer_request, output_name);
            //Set the expression recognition input for the returned face position rectangle, and output the reasoning analysis
            if(face_box.width > 64 && face_box.height > 64 && face_box.x > 0 && face_box.y > 0)//guard: an invalid or tiny ROI would crash the expression model's input setup
                text_recognization_text(face_emotion_request, face_emotion_input_name, face_emotion_output_name, curr_frame, face_box);
        }
        cv::imshow("src", curr_frame);
        char c = cv::waitKey(1);
        if (c == 27)//Press esc
            break;
        if (last_frame)
        {
            break;
        }
        // Asynchronous interaction
        next_frame.copyTo(curr_frame);
        curr_infer_request.swap(next_infer_request);//Function in pointer field
    }

    return 0;
}
// SSD MobileNetV2 model input reasoning setting function encapsulation because the infer needs to be modified_ All requests still need to pass pointers, smart pointers
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name)
{
    //1. Through the incoming reasoning engine; Gets the input Blob format conversion class object
    auto input = infer_request->GetBlob(input_name);//Get the blob of input (class object for input formatting)
    matU8ToBlob<uchar>(src, input);
    return;
}
// SSD MobileNetV2 model output reasoning function encapsulation
cv::Rect infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name)
{
    //1. Executive reasoning
   //infer_ request->Infer();    Asynchronous reasoning does not need this step
    cv::Rect box;
    //2. Get our output
    auto output = infer_request->GetBlob(output_name);
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //3. Finally, you need to obtain the output dimension information and parse the data
    const SizeVector outputDims = output->getTensorDesc().getDims();
    //std::cout << outputDims[2] << "X" << outputDims[3] << std::endl;
    const int max_num = outputDims[2];//Is the output N
    const int object_size = outputDims[3];//It's the output 7
    int im_h = src.rows;
    int im_w = src.cols;
    for (int n = 0; n < max_num; n++)
    {
        float lable = detection_out[n * object_size + 1];// +1 indicates that the output is the second lableID of the seven
        float confidence = detection_out[n * object_size + 2];
        float xmin = detection_out[n * object_size + 3] * im_w;  //The floating point coordinates obtained from the output are 0-1, and the actual coordinates are multiplied by the original width and height
        float ymin = detection_out[n * object_size + 4] * im_h;
        float xmax = detection_out[n * object_size + 5] * im_w;
        float ymax = detection_out[n * object_size + 6] * im_h;
        if (confidence > 0.9)
        {
            //printf("lable id = %d\n", static_cast<int>(lable));,
            //Do some protection to ensure that the size of the rectangle is judged in the image class rather than the coordinates
            xmin = std::min(std::max(0.0f, xmin), static_cast<float>(im_w));
            ymin = std::min(std::max(0.0f, ymin), static_cast<float>(im_h));
            xmax = std::min(std::max(0.0f, xmax), static_cast<float>(im_w));
            ymax = std::min(std::max(0.0f, ymax), static_cast<float>(im_h));
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);
            //std::cout << box.x << " " << box.y << " " << box.width << " " << box.height;
            cv::rectangle(src, box, cv::Scalar(0, 255, 0), 2, 8, 0);
            return box;
            //cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 1, 8);
        }
    }
    return box;
}
void load_face_emotion_model(std::shared_ptr<InferenceEngine::InferRequest>& face_emotion_request, std::string& face_emotion_input_name, std::string& face_emotion_output_name)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork needs two files in the CNN network model to read resnet18 bin resnet18. xml
    std::string xmlFilename = "D:/code/OpenVINO/emotions-recognition-retail-0003/FP32/emotions-recognition-retail-0003.xml";
    std::string binFilename = "D:/code/OpenVINO/emotions-recognition-retail-0003/FP32/emotions-recognition-retail-0003.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)//set the precision, layout and color format for each input of this model
    {
        face_emotion_input_name = item.first;
        auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR

    }
    for (auto item : outputs)
    {
        face_emotion_output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << face_emotion_output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    face_emotion_request = executable_network.CreateInferRequestPtr();
}

std::string text_recognization_text(std::shared_ptr<InferenceEngine::InferRequest>& face_emotion_request, std::string& face_emotion_input_name, std::string& face_emotion_output_name, cv::Mat& image, cv::Rect& face_roi)
{
    //Set input
    cv::Mat faceROI = image(face_roi);
    frameToBlob(faceROI, face_emotion_request, face_emotion_input_name);

    //Executive reasoning
    face_emotion_request->Infer();

    //Output string result return
    auto output = face_emotion_request->GetBlob(face_emotion_output_name);
    const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Convert output data to Precision

    //Get the output dimensions and pick the class with the highest probability - that is the recognized expression
    const SizeVector outputDims = output->getTensorDesc().getDims();

    float max = probs[0];
    int max_index = 0;
    for (int i = 1; i < outputDims[1]; i++)
    {
        if (max < probs[i])
        {
            max = probs[i];
            max_index = i;
        }
    }
    cv::putText(image, items[max_index], face_roi.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
    return items[max_index];
}

6. Face key point landmark detection

6.1 model introduction


Figure 32, 33

6.2 procedure execution process

Run face detection first, take the detected face ROI, and perform landmark detection on it. The flow is the same as for facial expression recognition.

6.3 code demonstration

Figure 34

The essential points are: check how many inputs and outputs the model has and what their shapes are, understand how to parse each output, how to call several networks together, and when to use synchronous versus asynchronous execution. A short way to inspect a model's inputs and outputs is sketched below.
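A small sketch of that inspection step, assuming a CNNNetwork called network has already been read with ie.ReadNetwork():

for (auto& item : network.getInputsInfo()) {
    const auto& dims = item.second->getTensorDesc().getDims();   // e.g. [1, 3, H, W] for an image input
    std::cout << "input  " << item.first << " : ";
    for (auto d : dims) std::cout << d << " ";
    std::cout << std::endl;
}
for (auto& item : network.getOutputsInfo()) {
    const auto& dims = item.second->getTensorDesc().getDims();
    std::cout << "output " << item.first << " : ";
    for (auto d : dims) std::cout << d << " ";
    std::cout << std::endl;
}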
Code practice

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
//Convert the U8 type data from mat to Blob. Because Blob is of any type, it is also declared as a template
template <typename T> void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0)
{
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
            << "but by fact we were not able to cast inputBlob to MemoryBlob";
    }
    // locked memory holder should be alive all time while access to its buffer happens
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

//Several values of expression
static const char* const items[] = {
    "neutral", "happy", "sad", "surprise", "anger"
};

//Load a model from the given xml/bin files and pass back its infer request and input/output names
void load_textRecognization_model(std::string xmlFilename, std::string binFilename, std::shared_ptr<InferenceEngine::InferRequest>& face_request, std::string& face_input_name, std::string& face_output_name);

//Set the landmark model's input from the face ROI, run inference, and draw the parsed key points
void text_recognization_text(std::shared_ptr<InferenceEngine::InferRequest>& face_emotion_request, std::string& face_emotion_input_name, std::string& face_emotion_output_name, cv::Mat& image, cv::Rect& face_roi);

//After asynchronous reasoning of face detection, the results are processed
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name, std::vector<cv::Rect>& vc_RoiRect);

//The input reasoning setting function of SSD MobileNetV2 model is to convert the input src into the corresponding format and set it to infer_ Blob of request
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name);

int main(int argc, char** argv)
{
    //Loading the model of facial expression recognition
    std::shared_ptr<InferenceEngine::InferRequest> face_landmarks_request;
    std::string face_landmarks_input_name;
    std::string face_landmarks_output_name;
    std::string facial_landmarks_xmlFilename = "D:/code/OpenVINO/facial-landmarks-35-adas-0002/FP32/facial-landmarks-35-adas-0002.xml";
    std::string facial_landmarks_binFilename = "D:/code/OpenVINO/facial-landmarks-35-adas-0002/FP32/facial-landmarks-35-adas-0002.bin";
    load_textRecognization_model(facial_landmarks_xmlFilename, facial_landmarks_binFilename, face_landmarks_request, face_landmarks_input_name, face_landmarks_output_name);

    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork needs two files in the CNN network model to read resnet18 bin resnet18. xml
    std::string xmlFilename = "D:/code/OpenVINO/face-detection-0202/FP32/face-detection-0202.xml";
    std::string binFilename = "D:/code/OpenVINO/face-detection-0202/FP32/face-detection-0202.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto curr_infer_request = executable_network.CreateInferRequestPtr();
    auto next_infer_request = executable_network.CreateInferRequestPtr();

    //6. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
    //Picture reading
#if 1
    cv::Mat curr_frame = cv::imread("D:/face_emotion.jpg");
    std::vector<cv::Rect> vc_RoiRect;
    frameToBlob(curr_frame, curr_infer_request, input_name);
    curr_infer_request->Infer();
    infer_process(curr_frame, curr_infer_request, output_name, vc_RoiRect);
    //Set the expression recognition input for the returned face position rectangle, and output the reasoning analysis
    for (int  i = 0; i < vc_RoiRect.size(); i++)
    {
        if (vc_RoiRect[i].width > 64 && vc_RoiRect[i].height > 64 && vc_RoiRect[i].x > 0 && vc_RoiRect[i].y > 0)//guard: an invalid or tiny ROI would crash the landmark model's input setup
            text_recognization_text(face_landmarks_request, face_landmarks_input_name, face_landmarks_output_name, curr_frame, vc_RoiRect[i]);
    }
    cv::imshow("src", curr_frame);
    char c = cv::waitKey(0);
#endif
#if 0
    //Video reading
    cv::VideoCapture capture(0);
    cv::Mat curr_frame;
    cv::Mat next_frame;
    capture.read(curr_frame);
    frameToBlob(curr_frame, curr_infer_request, input_name);
    bool first_frame = true;
    bool last_frame = false;
    std::vector<cv::Rect> vc_RoiRect;
    while (true)
    {
        vc_RoiRect.clear();
        bool ret = capture.read(next_frame);
        if (ret == false)//It's the last frame
        {
            last_frame = true;//No more settings entered
        }
        if (!last_frame)//The next frame is not set at the last frame
        {
            frameToBlob(next_frame, next_infer_request, input_name);
        }
        if (first_frame)
        {
            curr_infer_request->StartAsync();//Start reasoning
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else
        {
            if (!last_frame)//Each start is the next frame ready for asynchronous exchange
            {
                next_infer_request->StartAsync();
            }
        }
        //7. Loop execution reasoning, each reasoning is the current frame, and it will be parsed after the reasoning is successful
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY))
        {
            //The reasoning of face detection returns the face rectangle
            infer_process(curr_frame, curr_infer_request, output_name, vc_RoiRect);
            //Set the expression recognition input for the returned face position rectangle, and output the reasoning analysis
            for (int  i = 0; i < vc_RoiRect.size(); i++)
            {
                if (vc_RoiRect[i].width > 64 && vc_RoiRect[i].height > 64 && vc_RoiRect[i].x > 0 && vc_RoiRect[i].y > 0)//guard: an invalid or tiny ROI would crash the landmark model's input setup
                    text_recognization_text(face_landmarks_request, face_landmarks_input_name, face_landmarks_output_name, curr_frame, vc_RoiRect[i]);
            }
        }
        cv::imshow("src", curr_frame);
        char c = cv::waitKey(1);
        if (c == 27)//Press esc
            break;
        if (last_frame)
        {
            break;
        }
        // Asynchronous interaction
        next_frame.copyTo(curr_frame);
        curr_infer_request.swap(next_infer_request);//Function in pointer field
    }
#endif
    return 0;
}

// SSD MobileNetV2 model input reasoning setting function encapsulation because the infer needs to be modified_ All requests still need to pass pointers, smart pointers
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name)
{
    //1. Through the incoming reasoning engine; Gets the input Blob format conversion class object
    auto input = infer_request->GetBlob(input_name);//Get the blob of input (class object for input formatting)
    matU8ToBlob<uchar>(src, input);
    return;
}
// SSD MobileNetV2 model output reasoning function encapsulation
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name, std::vector<cv::Rect> &vc_RoiRect)
{
    //1. Executive reasoning
   //infer_ request->Infer();    Asynchronous reasoning does not need this step
    cv::Rect box;
    //2. Get our output
    auto output = infer_request->GetBlob(output_name);
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //3. Finally, you need to obtain the output dimension information and parse the data
    const SizeVector outputDims = output->getTensorDesc().getDims();
    //std::cout << outputDims[2] << "X" << outputDims[3] << std::endl;
    const int max_num = outputDims[2];//Is the output N
    const int object_size = outputDims[3];//It's the output 7
    int im_h = src.rows;
    int im_w = src.cols;
    for (int n = 0; n < max_num; n++)
    {
        float lable = detection_out[n * object_size + 1];// +1 indicates that the output is the second lableID of the seven
        float confidence = detection_out[n * object_size + 2];
        float xmin = detection_out[n * object_size + 3] * im_w;  //The floating point coordinates obtained from the output are 0-1, and the actual coordinates are multiplied by the original width and height
        float ymin = detection_out[n * object_size + 4] * im_h;
        float xmax = detection_out[n * object_size + 5] * im_w;
        float ymax = detection_out[n * object_size + 6] * im_h;
        if (confidence > 0.95)
        {
            //printf("lable id = %d\n", static_cast<int>(lable));,
            //Do some protection to ensure that the size of the rectangle is judged in the image class rather than the coordinates
            xmin = std::min(std::max(0.0f, xmin), static_cast<float>(im_w));
            ymin = std::min(std::max(0.0f, ymin), static_cast<float>(im_h));
            xmax = std::min(std::max(0.0f, xmax), static_cast<float>(im_w));
            ymax = std::min(std::max(0.0f, ymax), static_cast<float>(im_h));
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);
            //std::cout << box.x << " " << box.y << " " << box.width << " " << box.height;
            cv::rectangle(src, box, cv::Scalar(0, 255, 0), 2, 8, 0);
            vc_RoiRect.push_back(box);
            //cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 1, 8);
        }
    }
    return ;
}

void load_textRecognization_model(std::string xmlFilename, std::string binFilename, std::shared_ptr<InferenceEngine::InferRequest>& face_emotion_request, std::string& face_emotion_input_name, std::string& face_emotion_output_name)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork needs two files in the CNN network model to read resnet18 bin resnet18. xml
    /*std::string xmlFilename = "D:/code/OpenVINO/facial-landmarks-35-adas-0002/FP32/facial-landmarks-35-adas-0002.xml";
    std::string binFilename = "D:/code/OpenVINO/facial-landmarks-35-adas-0002/FP32/facial-landmarks-35-adas-0002.bin";*/
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)//set the precision, layout and color format for each input of this model
    {
        face_emotion_input_name = item.first;
        auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR

    }
    for (auto item : outputs)
    {
        face_emotion_output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << face_emotion_output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    face_emotion_request = executable_network.CreateInferRequestPtr();
}

void text_recognization_text(std::shared_ptr<InferenceEngine::InferRequest>& face_lanmarks_request, std::string& face_lanmarks_input_name, std::string& face_lanmarks_output_name, cv::Mat& image, cv::Rect& face_roi)
{
    //Set input
    cv::Mat faceROI = image(face_roi);
    frameToBlob(faceROI, face_lanmarks_request, face_lanmarks_input_name);

    //Executive reasoning
    face_lanmarks_request->Infer();

    //Output string result return
    auto output = face_lanmarks_request->GetBlob(face_lanmarks_output_name);
    const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Convert output data to Precision

    //Get the output dimensions and parse the data: the (x, y) pairs are normalized to the face ROI, so convert them to full-image coordinates before drawing
    const SizeVector outputDims = output->getTensorDesc().getDims();//
    int i_width = face_roi.width;
    int i_height = face_roi.height;
    for (int i = 0; i < outputDims[1]; i+=2)
    {
        float x = probs[i] * i_width + face_roi.x;
        float y = probs[i + 1] * i_height + face_roi.y;
        cv::circle(image, cv::Point(x, y), 3, cv::Scalar(255, 0, 0), 2, 8, 0);//Draw circle
    }
    //cv::putText(image, items[max_index], face_roi.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
    return ;
}

7. Real time semantic road segmentation model

OpenVINO also provides a semantic road segmentation model, which classifies every pixel of the scene into one of four classes: background, road, curb, and marking (lane lines, etc.).

7.1 model introduction


Figure 35, 36

7.2 procedure flow

Figure 37

7.3 code demonstration

The key point is how to parse the multi-channel output of the segmentation network: for each pixel, keep the class whose channel has the highest score. A standalone sketch of this decoding step follows; the full program after it does the same thing inline.
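The sketch below assumes the output is laid out as NCHW, i.e. C planes of H*W scores, which is how the code below reads outputDims for road-segmentation-adas-0001:

//Turn a [1, C, H, W] score map into a per-pixel class index by taking the argmax over the C channels.
cv::Mat decode_segmentation(const float* scores, int C, int H, int W)
{
    cv::Mat class_map(H, W, CV_8UC1);
    const int plane = H * W;                               // one channel occupies one H*W plane
    for (int r = 0; r < H; r++) {
        for (int c = 0; c < W; c++) {
            int best = 0;
            float best_score = scores[r * W + c];          // channel 0
            for (int ch = 1; ch < C; ch++) {
                float s = scores[ch * plane + r * W + c];
                if (s > best_score) { best_score = s; best = ch; }
            }
            class_map.at<uchar>(r, c) = static_cast<uchar>(best);
        }
    }
    return class_map;                                      // map each index to a display color afterwards
}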

Figure 38

Code practice

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
//Convert the U8 type data from mat to Blob. Because Blob is of any type, it is also declared as a template
template <typename T> void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0)
{
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
            << "but by fact we were not able to cast inputBlob to MemoryBlob";
    }
    // locked memory holder should be alive all time while access to its buffer happens
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}
// Parses the road segmentation model's output and overlays the class colors on the frame
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name);
// Sets the model's input; the infer request is passed as a smart-pointer reference because it is modified
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name);
std::vector<cv::Vec3b> color_tab;
int main(int argc, char** argv)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork needs two files in the CNN network model to read resnet18 bin resnet18. xml
    std::string xmlFilename = "D:/code/OpenVINO/road-segmentation-adas-0001/FP32/road-segmentation-adas-0001.xml";
    std::string binFilename = "D:/code/OpenVINO/road-segmentation-adas-0001/FP32/road-segmentation-adas-0001.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;//It is a data structure. Auto auto inference of C++11 is temporarily used. Because there is only one input, it is temporarily defined in this way to set its accuracy
        output_data->setPrecision(Precision::FP32);//Output or floating point output
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto curr_infer_request = executable_network.CreateInferRequestPtr();
    auto next_infer_request = executable_network.CreateInferRequestPtr();

    //6. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
    cv::VideoCapture capture("D:/lane.avi");
    cv::Mat curr_frame;
    cv::Mat next_frame;
    capture.read(curr_frame);
    //Set inference input
    frameToBlob(curr_frame, curr_infer_request, input_name);
    bool first_frame = true;
    bool last_frame = false;
    
    color_tab.push_back(cv::Vec3b(0, 0, 0));//Corresponding BG
    color_tab.push_back(cv::Vec3b(255, 0, 0));//Corresponding road
    color_tab.push_back(cv::Vec3b(0, 0, 255));//Corresponding curb
    color_tab.push_back(cv::Vec3b(0, 255, 255));//Corresponding mark
    while (true)
    {
        bool ret = capture.read(next_frame);
        if (ret == false)//It's the last frame
        {
            last_frame = true;//No more settings entered
        }
        if (!last_frame)//The next frame is not set at the last frame
        {
            frameToBlob(next_frame, next_infer_request, input_name);
        }
        if (first_frame)
        {
            curr_infer_request->StartAsync();//Start reasoning
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else
        {
            if (!last_frame)//Each start is the next frame ready for asynchronous exchange
            {
                next_infer_request->StartAsync();
            }
        }
        //7. Loop execution reasoning, each reasoning is the current frame, and it will be parsed after the reasoning is successful
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY))
        {
            infer_process(curr_frame, curr_infer_request, output_name);
        }
        cv::imshow("Asynchronous display of road segmentation", curr_frame);
        char c = cv::waitKey(1);
        if (c == 27)//Press esc
            break;
        if (last_frame)
        {
            break;
        }
        // Asynchronous interaction
        next_frame.copyTo(curr_frame);
        curr_infer_request.swap(next_infer_request);//Function in pointer field
    }

    return 0;
}

// Sets the road segmentation model's input; the infer request is passed as a smart-pointer reference because it is modified
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name)
{
    //1. Through the incoming reasoning engine; Gets the input Blob format conversion class object
    auto input = infer_request->GetBlob(input_name);//Get the blob of input (class object for input formatting)
    matU8ToBlob<uchar>(src, input);
    return;
}

// Parses the road segmentation model's output and overlays the class colors on the frame
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name)
{
    //1. Executive reasoning
   //infer_ request->Infer();    Asynchronous reasoning does not need this step

    //2. Get our output
    auto output = infer_request->GetBlob(output_name);
    
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //3. Finally, you need to obtain the output dimension information and parse the data
    //detection_out holds 4*H*W values (one score per class per pixel); for each pixel keep the class with the highest score
    const SizeVector outputDims = output->getTensorDesc().getDims();
    const int out_c = outputDims[1];//Four types of values per pixel
    const int out_h = outputDims[2];
    const int out_w = outputDims[3];
    cv::Mat result = cv::Mat::zeros(cv::Size(out_w, out_h), CV_8UC3);//Storage results
    int step = out_h * out_w;
    for (int row = 0; row < out_h; row++)
    {
        for (int col = 0; col < out_w; col++)
        {
            int max_index = 0;
            float max_pord = detection_out[row * out_w + col];//The weight of the 0th category of the pixel
            for (int cn = 1; cn < out_c; cn++)
            {
                float pord = detection_out[cn * step + row * out_w + col];//The weight of the ith category of the pixel
                if (pord > max_pord)
                {
                    max_index = cn;
                    max_pord = pord;
                }
            }
            result.at<cv::Vec3b>(row, col) = color_tab[max_index];//Sets the color value of pixels
        }
    }
    cv::resize(result, result, cv::Size(src.cols, src.rows));//Size conversion for easy stacking
    cv::addWeighted(src, 0.5, result, 0.5, 0, src);//superposition
    return;
}

8. Instance segmentation

The most common instance segmentation model is Mask R-CNN, and OpenVINO provides several Mask R-CNN variants at different model sizes and input resolutions.

8.1 instance segmentation model


Figure 39 40

8.2 procedure execution process

The key to the process is handling the output, in particular parsing the raw_masks blob; turning it into a mask matrix and overlaying it on the image relies on the relevant OpenCV operations (see the sketch below).
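As a minimal sketch, under the assumption that mask_data points at one instance's mask_h x mask_w float probability map (taken from the raw_masks blob for that instance's class) and box is the instance's rectangle already clamped to the image, the OpenCV side looks roughly like this:

//Resize one instance's raw mask to its box, threshold it, and blend a color over the masked pixels.
void overlay_instance_mask(cv::Mat& image, const float* mask_data,
                           int mask_h, int mask_w, const cv::Rect& box, const cv::Scalar& color)
{
    cv::Mat raw(mask_h, mask_w, CV_32FC1, const_cast<float*>(mask_data)); // wrap without copying
    cv::Mat resized;
    cv::resize(raw, resized, cv::Size(box.width, box.height));
    cv::Mat mask = resized > 0.5;                                         // binary mask (CV_8U, 0 or 255)

    cv::Mat roi = image(box);
    cv::Mat colored(roi.size(), CV_8UC3, color);                          // solid color patch
    cv::Mat blended;
    cv::addWeighted(roi, 0.5, colored, 0.5, 0.0, blended);
    blended.copyTo(roi, mask);                                            // write back only where the mask is on
}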

8.3 code demonstration

Figure 41

Code practice

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
//Convert the U8 type data from mat to Blob. Because Blob is of any type, it is also declared as a template
template <typename T> void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0)
{
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
            << "but by fact we were not able to cast inputBlob to MemoryBlob";
    }
    // locked memory holder should be alive all time while access to its buffer happens
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}
void read_coco_labels(std::vector<std::string>& labels) {
    std::string label_file = "coco_labels.txt";
    std::ifstream fp(label_file);
    if (!fp.is_open())
    {
        printf("could not open file...\n");
        exit(-1);
    }
    std::string name;
    while (!fp.eof())
    {
        std::getline(fp, name);
        if (name.length())
            labels.push_back(name);
    }
    fp.close();
}
// Runs inference on the instance segmentation model and parses/draws its outputs
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request);
// Sets the model's image input; the infer request is passed as a smart-pointer reference because it is modified
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name);
std::vector<std::string> coco_labels;
int main(int argc, char** argv)
{
    
    read_coco_labels(coco_labels);
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork needs two files in the CNN network model to read resnet18 bin resnet18. xml
    std::string xmlFilename = "D:/code/OpenVINO/instance-segmentation-security-0050/FP32/instance-segmentation-security-0050.xml";
    std::string binFilename = "D:/code/OpenVINO/instance-segmentation-security-0050/FP32/instance-segmentation-security-0050.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After reading and loading the CNN network, the network structure will be automatically parsed, and then the input and output can be obtained

    //3. Gets the input and output formats and sets the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is essentially an array of vector s. If you have multiple inputs, it corresponds to each
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string info_name = "";
    
    int in_index = 0;

    for (auto item : inputs)
    {
        if (in_index == 0)
        {
            input_name = item.first;
            auto input_data = item.second;
            input_data->setPrecision(Precision::U8);
            input_data->setLayout(Layout::NCHW);
            input_data->getPreProcess().setColorFormat(ColorFormat::BGR);
            std::cout << "input name = " << input_name << std::endl;
        }
        else
        {
            info_name = item.first;
            auto input_data = item.second;
            input_data->setPrecision(Precision::FP32);
            std::cout << "info_name name = " << info_name << std::endl;
        }
        in_index++;
    }
    for (auto item : outputs)//the output names are fixed by the model, so they can be accessed directly by the names given in the official documentation
    {
        std::string output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);//the outputs are floating point
        std::cout << "output name = " << output_name << std::endl;
    }

    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto curr_infer_request = executable_network.CreateInferRequestPtr();

    //6. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
    cv::Mat curr_frame = cv::imread("D:/objects.jpg");
    std::vector<cv::Rect> vc_RoiRect;
    frameToBlob(curr_frame, curr_infer_request, input_name);//Set first input
    //Set second input
    auto input2 = curr_infer_request->GetBlob(info_name);
    auto imInfoDim = inputs.find(info_name)->second->getTensorDesc().getDims()[1];
    InferenceEngine::MemoryBlob::Ptr minput2 = InferenceEngine::as<InferenceEngine::MemoryBlob>(input2);
    auto minput2Holder = minput2->wmap();
    float* p = minput2Holder.as<InferenceEngine::PrecisionTrait<InferenceEngine::Precision::FP32>::value_type*>();
    p[0] = static_cast<float>(inputs[input_name]->getTensorDesc().getDims()[2]);
    p[1] = static_cast<float>(inputs[input_name]->getTensorDesc().getDims()[3]);
    p[2] = 1.0f;
    std::cout << p[0] << "  " << p[1] << std::endl;

    infer_process(curr_frame, curr_infer_request);
    cv::imshow("src", curr_frame);
    char c = cv::waitKey(0);


    return 0;
}

// Input-setting helper: converts the frame and writes it into the request's input blob; the InferRequest is modified, so its shared pointer is passed by reference
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name)
{
    //1. Through the incoming reasoning engine; Gets the input Blob format conversion class object
    auto input = infer_request->GetBlob(input_name);//Get the blob of input (class object for input formatting)
    matU8ToBlob<uchar>(src, input);
    return;
}

// Instance segmentation inference and output parsing
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request)
{
    //1. Executive reasoning
   infer_request->Infer();   //Asynchronous reasoning does not need this step

    //2. Get our output
    auto scores = infer_request->GetBlob("scores");
    auto boxes = infer_request->GetBlob("boxes");
    auto clazzes = infer_request->GetBlob("classes");
    auto raw_masks = infer_request->GetBlob("raw_masks");
    const float* scores_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(scores->buffer());//Output result of detection
    const float* boxes_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(boxes->buffer());// The output coordinate size of this rectangle is based on 480 * 480
    const float* clazzes_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(clazzes->buffer());
    const auto raw_masks_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(raw_masks->buffer());//The output is raw_masks structure
    //3. Finally, you need to obtain the output dimension information and parse the data
    const SizeVector scores_outputDims = scores->getTensorDesc().getDims();
    const SizeVector boxes_outputDims = boxes->getTensorDesc().getDims();
    const SizeVector mask_outputDims = raw_masks->getTensorDesc().getDims();

    const int max_scoresSize = scores_outputDims[0];//100
    const int max_boxesSize = boxes_outputDims[1];//4
    int mask_h = mask_outputDims[2];//the raw_masks width and height differ from the official documentation; here they are 14 * 14
    int mask_w = mask_outputDims[3];
    size_t box_stride = mask_h * mask_w * mask_outputDims[1];//stride between instances: 81 * 14 * 14 elements; each of the 100 instances is separated by this many floats
    int im_w = src.cols;
    int im_h = src.rows;
    float w_rate = static_cast<float>(im_w) / 480.0;
    float h_rate = static_cast<float>(im_h) / 480.0;
    cv::RNG rng(12345);
    for (int  i = 0; i < max_scoresSize; i++)
    {
        float confidence = scores_out[i];
        float xmin = boxes_out[i * max_boxesSize] * w_rate;  //The output coordinate size of this rectangle is based on 480 * 480
        float ymin = boxes_out[i * max_boxesSize + 1] * h_rate;
        float xmax = boxes_out[i * max_boxesSize + 2] * w_rate;
        float ymax = boxes_out[i * max_boxesSize + 3] * h_rate;
        if (confidence > 0.5)
        {
            cv::Scalar color(rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255));//Get a random color
            cv::Rect box;
            float x1 = std::min(std::max(0.0f, xmin), static_cast<float>(im_w));
            float y1 = std::min(std::max(0.0f, ymin), static_cast<float>(im_h));
            float x2 = std::min(std::max(0.0f, xmax), static_cast<float>(im_w));
            float y2 = std::min(std::max(0.0f, ymax), static_cast<float>(im_h));
            box.x = static_cast<int>(x1);
            box.y = static_cast<int>(y1);
            box.width = static_cast<int>(x2 - x1);
            box.height = static_cast<int>(y2 - y1);
            int label = static_cast<int>(clazzes_data[i]);//class index of this instance
            // Locate the mask of the i-th instance: (box_stride * i) skips to the instance, then mask_h * mask_w * label skips to its class among the 81 classes
            float* mask_arr = raw_masks_data + box_stride * i + mask_h * mask_w * label;//address of this instance's class mask (size 14 * 14)
            cv::Mat mask_mat(mask_h, mask_w, CV_32FC1, mask_arr);//Create data in mat with size of 14 * 14
            cv::Mat roi_img = src(box);
            cv::Mat resized_mask_mat(box.height, box.width, CV_32FC1);
            cv::resize(mask_mat, resized_mask_mat, cv::Size(box.width, box.height));//Size conversion

            cv::Mat uchar_resized_mask(box.height, box.width, CV_8UC3, color);

            roi_img.copyTo(uchar_resized_mask, resized_mask_mat <= 0.5);//copy the original roi_img pixels where the mask value is <= 0.5, so the color remains only on the instance mask
            cv::addWeighted(uchar_resized_mask, 0.7, roi_img, 0.3, 0.0f, roi_img);//blend: uchar_resized_mask weighted 0.7, roi_img weighted 0.3, written back to roi_img
            cv::putText(src, coco_labels[label].c_str(), box.tl() + (box.br() - box.tl()) / 2, cv::FONT_HERSHEY_PLAIN, 1.0, cv::Scalar(0, 0, 255), 1, 8);
        }
    }

    return;
}

9. Scene text detection

9.1 model introduction


Figure 42 43

9.2 procedure execution steps

Load the model
Set the input
Parse the output (a minimal skeleton of these steps is sketched below; the full program follows in 9.3)
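
Before the full program, here is a minimal sketch of these three steps against the Inference Engine API. It is a sketch only: the model and image file names are placeholders, error handling is omitted, and the input copy does by hand what the matU8ToBlob helper does in 9.3.

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <iostream>

using namespace InferenceEngine;

int main() {
    // 1. Load the model: Core reads the .xml/.bin IR pair and LoadNetwork compiles it for a device
    Core ie;
    CNNNetwork network = ie.ReadNetwork("text-detection-0003.xml", "text-detection-0003.bin");
    InputsDataMap inputs = network.getInputsInfo();
    std::string input_name = inputs.begin()->first;
    inputs.begin()->second->setPrecision(Precision::U8);
    inputs.begin()->second->setLayout(Layout::NCHW);
    OutputsDataMap outputs = network.getOutputsInfo();
    for (auto& item : outputs) item.second->setPrecision(Precision::FP32);
    auto executable_network = ie.LoadNetwork(network, "CPU");
    auto request = executable_network.CreateInferRequestPtr();

    // 2. Set the input: copy the BGR image into the NCHW input blob
    cv::Mat image = cv::imread("input.jpg");
    auto input_blob = request->GetBlob(input_name);
    SizeVector in_dims = input_blob->getTensorDesc().getDims();   // [N, C, H, W]
    size_t C = in_dims[1], H = in_dims[2], W = in_dims[3];
    cv::Mat resized;
    cv::resize(image, resized, cv::Size(static_cast<int>(W), static_cast<int>(H)));
    unsigned char* blob_data = static_cast<unsigned char*>(input_blob->buffer());
    for (size_t c = 0; c < C; c++)
        for (size_t h = 0; h < H; h++)
            for (size_t w = 0; w < W; w++)
                blob_data[c * H * W + h * W + w] = resized.at<cv::Vec3b>(h, w)[c];

    // 3. Parse the output: run inference, then read each output blob and its dimensions
    request->Infer();
    for (auto& item : outputs) {
        auto out = request->GetBlob(item.first);
        const float* data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(out->buffer());
        SizeVector out_dims = out->getTensorDesc().getDims();
        std::cout << item.first << ": " << out_dims.size() << " dims, first value = " << data[0] << std::endl;
    }
    return 0;
}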

9.3 code demonstration

Figure 44

Code practice

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
//Convert the U8 type data from mat to Blob. Because Blob is of any type, it is also declared as a template
template <typename T> void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0)
{
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
            << "but by fact we were not able to cast inputBlob to MemoryBlob";
    }
    // locked memory holder should be alive all time while access to its buffer happens
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}
// Text detection inference output parsing; the InferRequest is modified, so its shared pointer is passed by reference
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name);
// Input-setting helper: converts the frame and writes it into the request's input blob
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name);
std::vector<cv::Vec3b> color_tab;
int main(int argc, char** argv)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork loads the CNN model from its two IR files (.xml topology and .bin weights)
    std::string xmlFilename = "D:/code/OpenVINO/text-detection-0003/FP32/text-detection-0003.xml";
    std::string binFilename = "D:/code/OpenVINO/text-detection-0003/FP32/text-detection-0003.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After loading, the network structure is parsed automatically and the inputs and outputs can be queried

    //3. Get the input and output info and set the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is a map of input name to InputInfo; a model with multiple inputs has one entry per input
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name1 = "";
    std::string output_name2 = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//C++11 auto; this model has a single input, so set its precision directly
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    int out_index = 0;
    for (auto item : outputs)
    {
        if (out_index == 1)
        {
            output_name2 = item.first;
            auto output_data = item.second;
            output_data->setPrecision(Precision::FP32);//the outputs are floating point
            std::cout << "output name = " << output_name2 << std::endl;
        }
        else
        {
            output_name1 = item.first;
            auto output_data = item.second;
            output_data->setPrecision(Precision::FP32);//the outputs are floating point
            std::cout << "output name = " << output_name1 << std::endl;
        }
        out_index++;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto curr_infer_request = executable_network.CreateInferRequestPtr();
    auto next_infer_request = executable_network.CreateInferRequestPtr();

    //6. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
#if 1
    cv::Mat curr_frame = cv::imread("D:/openvino_ocr.jpg");
    std::vector<cv::Rect> vc_RoiRect;
    frameToBlob(curr_frame, curr_infer_request, input_name);
    curr_infer_request->Infer();
    infer_process(curr_frame, curr_infer_request, output_name2);//For the time being, only the second output is needed to accurately locate it
    //cv::imshow("src", curr_frame);
    char c = cv::waitKey(0);

#endif 


#if 0
    cv::VideoCapture capture("D:/lane.avi");
    cv::Mat curr_frame;
    cv::Mat next_frame;
    capture.read(curr_frame);
    //Set inference input
    frameToBlob(curr_frame, curr_infer_request, input_name);
    bool first_frame = true;
    bool last_frame = false;

    color_tab.push_back(cv::Vec3b(0, 0, 0));//Corresponding BG
    color_tab.push_back(cv::Vec3b(255, 0, 0));//Corresponding road
    color_tab.push_back(cv::Vec3b(0, 0, 255));//Corresponding curb
    color_tab.push_back(cv::Vec3b(0, 255, 255));//Corresponding mark
    while (true)
    {
        bool ret = capture.read(next_frame);
        if (ret == false)//It's the last frame
        {
            last_frame = true;//No more settings entered
        }
        if (!last_frame)//The next frame is not set at the last frame
        {
            frameToBlob(next_frame, next_infer_request, input_name);
        }
        if (first_frame)
        {
            curr_infer_request->StartAsync();//Start reasoning
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else
        {
            if (!last_frame)//Each start is the next frame ready for asynchronous exchange
            {
                next_infer_request->StartAsync();
            }
        }
        //7. Loop execution reasoning, each reasoning is the current frame, and it will be parsed after the reasoning is successful
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY))
        {
            infer_process(curr_frame, curr_infer_request, output_name2);
        }
        cv::imshow("Asynchronous scene text detection", curr_frame);
        char c = cv::waitKey(1);
        if (c == 27)//Press esc
            break;
        if (last_frame)
        {
            break;
        }
        // Asynchronous interaction
        next_frame.copyTo(curr_frame);
        curr_infer_request.swap(next_infer_request);//Function in pointer field
    }
#endif

    return 0;
}

// Input-setting helper: converts the frame and writes it into the request's input blob; the InferRequest is modified, so its shared pointer is passed by reference
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name)
{
    //1. Through the incoming reasoning engine; Gets the input Blob format conversion class object
    auto input = infer_request->GetBlob(input_name);//Get the blob of input (class object for input formatting)
    matU8ToBlob<uchar>(src, input);
    return;
}

// Text detection inference output parsing
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name)
{
    //1. Executive reasoning
    //infer_request->Infer();   //not needed here: Infer() is already called in main, and asynchronous inference would use StartAsync() instead

    //2. Get our output
    auto output = infer_request->GetBlob(output_name);
    //The output value is a floating-point number between 0 and 1, indicating the confidence of this category. We can select the category with high confidence
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //3. Finally, you need to obtain the output dimension information and parse the data
    //detection_out is a set of 2*w*h values. We want to analyze that the 2 categories of each corresponding pixel of h*w are text and non text
    const SizeVector outputDims = output->getTensorDesc().getDims();
    const int out_c = outputDims[1];//2 types of values per pixel
    const int out_h = outputDims[2];
    const int out_w = outputDims[3];
    cv::Mat mask = cv::Mat::zeros(cv::Size(out_w, out_h), CV_32F);//Storing results 0-1
    int step = out_h * out_w;
    for (int row = 0; row < out_h; row++)
    {
        for (int col = 0; col < out_w; col++)
        {
            //The text detection model has only two categories: non-text and text
            float p1 = detection_out[row * out_w + col];//probability of the first category (non-text) for this pixel
            float p2 = detection_out[step + row * out_w + col];//probability of the second category (text) for this pixel
            if (p1 < p2)
            {
                mask.at<float>(row, col) = p2;
            }
        }
    }
    cv::resize(mask, mask, cv::Size(src.cols, src.rows));//Size conversion for easy stacking
    //Become a binary image
    mask = mask * 255;//The value multiplied by 255 changes from 0-1 to 0-255
    mask.convertTo(mask, CV_8U);//Data type conversion
    cv::threshold(mask, mask, 200, 255, cv::THRESH_BINARY);
    //Draw outline
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    for (int i = 0; i < contours.size(); i++)
    {
        cv::Rect box = cv::boundingRect(contours[i]);//Gets the bounding rectangle of the profile
        cv::rectangle(src, box, cv::Scalar(255, 0, 0), 2, 8, 0);
    }
    cv::imshow("mask",mask);
    cv::imshow("Scene text detection", src);
   // cv::addWeighted(src, 0.5, mask, 0.5, 0, src);// superposition
    return;
}

10. Character recognition

10.1 model introduction

Figure 45

10.2 procedure execution steps

After text detection, each detected text region is passed to the text recognition model as its input for recognition (a minimal sketch of this hand-off is shown below; the full program follows in 10.3).
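
The sketch below illustrates only the hand-off step, assuming the detection stage already produced a text rectangle and that an inference request for the recognition model (text-recognition-0012, which takes a grayscale input) has been created as in 10.3. The helper and variable names (recognize_roi, reco_request, reco_input_name) are illustrative only, not the ones used in the full program.

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"

using namespace InferenceEngine;

// Feed one detected text rectangle to the recognition model as a grayscale NCHW blob and run inference
void recognize_roi(const cv::Mat& frame, const cv::Rect& box,
                   std::shared_ptr<InferRequest>& reco_request, const std::string& reco_input_name)
{
    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::Rect safe_box = box & cv::Rect(0, 0, gray.cols, gray.rows);   // clamp the ROI to the image
    cv::Mat text_roi = gray(safe_box);

    auto input_blob = reco_request->GetBlob(reco_input_name);
    SizeVector dims = input_blob->getTensorDesc().getDims();          // [N, C, H, W], C == 1 for this model
    size_t H = dims[2], W = dims[3];
    cv::Mat resized;
    cv::resize(text_roi, resized, cv::Size(static_cast<int>(W), static_cast<int>(H)));

    unsigned char* data = static_cast<unsigned char*>(input_blob->buffer());
    for (size_t r = 0; r < H; r++)
        for (size_t c = 0; c < W; c++)
            data[r * W + c] = resized.at<uchar>(r, c);

    reco_request->Infer();
    // the output blob is then decoded with the CTC greedy decoder shown in 10.3
}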

10.3 code demonstration

Figure 46

Code practice

#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <fstream>

using namespace InferenceEngine;
//Convert the U8 type data from mat to Blob. Because Blob is of any type, it is also declared as a template
template <typename T> void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0)
{
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
            << "but by fact we were not able to cast inputBlob to MemoryBlob";
    }
    // locked memory holder should be alive all time while access to its buffer happens
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

//CTC algorithm for character recognition
std::string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz#";
std::string ctc_decode(const float* blob_out, int seq_w, int seq_l) {
    printf("seq width: %d, seq length: %d \n", seq_w, seq_l);
    std::string res = "";
    bool prev_pad = false;
    const int num_classes = alphabet.length();
    int seq_len = seq_w * seq_l;
    for (int i = 0; i < seq_w; i++) {
        int argmax = 0;
        float max_prob = blob_out[i * seq_l];//use float: the scores are floating point, truncating to int would corrupt the argmax
        for (int j = 0; j < num_classes; j++) {
            if (blob_out[i * seq_l + j] > max_prob) {//for each of the seq_w time steps, keep the class with the highest score among the seq_l (37) classes
                max_prob = blob_out[i * seq_l + j];
                argmax = j;
            }
        }
        auto symbol = alphabet[argmax];
        if (symbol == '#') {
            prev_pad = true;
        }
        else {
            if (res.empty() || prev_pad || (!res.empty() && symbol != res.back())) {
                prev_pad = false;
                res += symbol;
            }
        }
    }
    return res;
}

//Load the character recognition model; returns the inference request and the input/output names
void load_textRecognization_model(std::string xmlFilename, std::string binFilename, std::shared_ptr<InferenceEngine::InferRequest>& request, std::string& input_name, std::string& output_name);

//Character recognition: set the input, run inference, and parse the output for one text region
void text_recognization_text(std::shared_ptr<InferenceEngine::InferRequest>& request, std::string& input_name, std::string& output_name, cv::Mat& image, cv::Rect& text_roi);

//Parse the text detection output and collect the detected text rectangles
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name, std::vector<cv::Rect>& vc_RoiRect);

//Input-setting helper: converts src into the required format and writes it into the request's input blob
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name);

int main(int argc, char** argv)
{
    //Load the character recognition model (text-recognition-0012)
    std::shared_ptr<InferenceEngine::InferRequest> text_recognition_request;
    std::string text_recognition_input_name;
    std::string text_recognition_output_name;
    std::string text_recognition_xmlFilename = "D:/code/OpenVINO/text-recognition-0012/FP32/text-recognition-0012.xml";
    std::string text_recognition_binFilename = "D:/code/OpenVINO/text-recognition-0012/FP32/text-recognition-0012.bin";
    load_textRecognization_model(text_recognition_xmlFilename, text_recognition_binFilename, text_recognition_request, text_recognition_input_name, text_recognition_output_name);

    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork loads the CNN model from its two IR files (.xml topology and .bin weights)
    std::string xmlFilename = "D:/code/OpenVINO/text-detection-0003/FP32/text-detection-0003.xml";
    std::string binFilename = "D:/code/OpenVINO/text-detection-0003/FP32/text-detection-0003.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After loading, the network structure is parsed automatically and the inputs and outputs can be queried

    //3. Get the input and output info and set the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is a map of input name to InputInfo; a model with multiple inputs has one entry per input
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    std::string output_name1 = "";
    std::string output_name2 = "";
    for (auto item : inputs)
    {
        input_name = item.first;
        auto input_data = item.second;//C++11 auto; this model has a single input, so set its precision directly
        input_data->setPrecision(Precision::U8);//The input image data format is unsigned char, which is the precision of 8 bits
        //This is also the input data mode set according to the model
        input_data->setLayout(Layout::NCHW);
        //ColorFormat is also input according to the model settings
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);//The model specifies that the input picture is BGR
        std::cout << "input name = " << input_name << std::endl;
    }
    int out_index = 0;
    for (auto item : outputs)
    {
        if (out_index == 1)
        {
            output_name2 = item.first;
            auto output_data = item.second;
            output_data->setPrecision(Precision::FP32);//the outputs are floating point
            std::cout << "output name = " << output_name2 << std::endl;
        }
        else
        {
            output_name1 = item.first;
            auto output_data = item.second;
            output_data->setPrecision(Precision::FP32);//the outputs are floating point
            std::cout << "output name = " << output_name1 << std::endl;
        }
        out_index++;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    auto curr_infer_request = executable_network.CreateInferRequestPtr();

    //6. Preprocessing of input image data (including BGR - > RGB, size floating point calculation conversion, image sequence conversion HWC - > nchw)
    //Picture reading
    cv::Mat curr_frame = cv::imread("D:/openvino_ocr.jpg");
    std::vector<cv::Rect> vc_RoiRect;
    frameToBlob(curr_frame, curr_infer_request, input_name);
    curr_infer_request->Infer();
    infer_process(curr_frame, curr_infer_request, output_name2, vc_RoiRect);
    //For each returned text rectangle, set the recognition input, run inference, and parse the output
    for (int i = 0; i < vc_RoiRect.size(); i++)
    {
        //if (vc_RoiRect[i].width > 32 && vc_RoiRect[i].height > 120 && vc_RoiRect[i].x > 0 && vc_RoiRect[i].y > 0) //guard against invalid ROIs, otherwise the crop below can crash
            text_recognization_text(text_recognition_request, text_recognition_input_name, text_recognition_output_name, curr_frame, vc_RoiRect[i]);
    }
    cv::imshow("src", curr_frame);
    char c = cv::waitKey(0);

    return 0;
}

// Input-setting helper: converts the frame and writes it into the request's input blob; the InferRequest is modified, so its shared pointer is passed by reference
void frameToBlob(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& input_name)
{
    //1. Through the incoming reasoning engine; Gets the input Blob format conversion class object
    auto input = infer_request->GetBlob(input_name);//Get the blob of input (class object for input formatting)
    matU8ToBlob<uchar>(src, input);
    return;
}

// Text detection output parsing; the detected text rectangles are returned in vc_RoiRect
void infer_process(cv::Mat& src, std::shared_ptr<InferenceEngine::InferRequest>& infer_request, std::string& output_name, std::vector<cv::Rect>& vc_RoiRect)
{
    //1. Executive reasoning
    //infer_request->Infer();   //not needed here: Infer() is already called in main, and asynchronous inference would use StartAsync() instead
    cv::Rect box;
    //2. Get our output
    auto output = infer_request->GetBlob(output_name);
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());//Output result of detection

    //3. Finally, you need to obtain the output dimension information and parse the data
    //detection_out is a set of 2*w*h values. We want to analyze that the 2 categories of each corresponding pixel of h*w are text and non text
    const SizeVector outputDims = output->getTensorDesc().getDims();
    const int out_c = outputDims[1];//2 types of values per pixel
    const int out_h = outputDims[2];
    const int out_w = outputDims[3];
    cv::Mat mask = cv::Mat::zeros(cv::Size(out_w, out_h), CV_8U);//stores 0 or 255 per pixel (text mask)
    int step = out_h * out_w;
    for (int row = 0; row < out_h; row++)
    {
        for (int col = 0; col < out_w; col++)
        {
            //The text detection model has only two categories: non-text and text
            float p1 = detection_out[row * out_w + col];//score of the first category (non-text) for this pixel
            float p2 = detection_out[step + row * out_w + col];//score of the second category (text) for this pixel
            if (p2 > 1.0) {
                mask.at<uchar>(row, col) = 255;//store 255 to mark this pixel as text
            }
        }
    }
    cv::resize(mask, mask, cv::Size(src.cols, src.rows));//Size conversion for easy stacking
    //Become a binary image
    //mask = mask * 255;// The value multiplied by 255 changes from 0-1 to 0-255
    //mask.convertTo(mask, CV_8U);// Data type conversion
    //cv::threshold(mask, mask, 200, 255, cv::THRESH_BINARY);
    //Draw outline
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    for (int i = 0; i < contours.size(); i++)
    {
        cv::Rect box = cv::boundingRect(contours[i]);//Gets the bounding rectangle of the profile
        box.x = box.x - 4;
        box.y = box.y - 4;
        box.width = box.width + 8;
        box.height = box.height + 8;
        cv::rectangle(src, box, cv::Scalar(255, 0, 0), 2, 8, 0);
        vc_RoiRect.push_back(box);
    }
    cv::imshow("mask", mask);
    return;
}

void load_textRecognization_model(std::string xmlFilename, std::string binFilename, std::shared_ptr<InferenceEngine::InferRequest>& request, std::string& input_name, std::string& output_name)
{
    //1. Initialize Core ie
    InferenceEngine::Core ie;//The core class of IE is actually ie

    //2. ie.ReadNetwork loads the CNN model from its two IR files (.xml topology and .bin weights), passed in as parameters
    /*std::string xmlFilename = "D:/code/OpenVINO/facial-landmarks-35-adas-0002/FP32/facial-landmarks-35-adas-0002.xml";
    std::string binFilename = "D:/code/OpenVINO/facial-landmarks-35-adas-0002/FP32/facial-landmarks-35-adas-0002.bin";*/
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xmlFilename, binFilename);//After loading, the network structure is parsed automatically and the inputs and outputs can be queried

    //3. Get the input and output info and set the precision
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();//InputsDataMap is a map of input name to InputInfo; a model with multiple inputs has one entry per input
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    /*std::string input_name = "";
    std::string output_name = "";*/
    for (auto item : inputs)//text-recognition-0012 has a single image input, so this loop runs once
    {
        input_name = item.first;
        auto input_data = item.second;
        input_data->setPrecision(Precision::U8);//the input image data are unsigned 8-bit values
        //The layout is set according to the model
        input_data->setLayout(Layout::NCHW);
        //Color format set as in the other demos; this model actually takes a grayscale image (see how the input is filled below)
        input_data->getPreProcess().setColorFormat(ColorFormat::BGR);

    }
    for (auto item : outputs)
    {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);//the outputs are floating point
        std::cout << "output name = " << output_name << std::endl;
    }
    //4. Get executable network and link hardware
    auto executable_network = ie.LoadNetwork(network, "CPU");//The network will be loaded into the CPU hardware, which can also be set to GPU

    //5. After creating a reasoning request, you can try reasoning, but there are still many things to do before reasoning, such as format setting
    request = executable_network.CreateInferRequestPtr();
}

void text_recognization_text(std::shared_ptr<InferenceEngine::InferRequest>& request, std::string& input_name, std::string& output_name, cv::Mat& image, cv::Rect& text_roi)
{
    //Set input
    cv::Mat gray;
    cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);
    cv::Mat textROI = gray(text_roi);//the text recognition model expects a grayscale image as input

    auto reco_input_blob = request->GetBlob(input_name);
    size_t num_channels = reco_input_blob->getTensorDesc().getDims()[1];
    size_t h = reco_input_blob->getTensorDesc().getDims()[2];
    size_t w = reco_input_blob->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;
    cv::resize(textROI, blob_image, cv::Size(w, h));
    // HWC =>NCHW
    unsigned char* data = static_cast<unsigned char*>(reco_input_blob->buffer());
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            data[row * w + col] = blob_image.at<uchar>(row, col);//copy the grayscale uchar values into the input blob
        }
    }

    //Executive reasoning
    request->Infer();

    //Output string result return
    auto reco_output = request->GetBlob(output_name);
    const float* blob_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(reco_output->buffer());
    const SizeVector reco_dims = reco_output->getTensorDesc().getDims();
    const int RW = reco_dims[0];
    const int RB = reco_dims[1];
    const int RL = reco_dims[2];


    //Finally, decode the output blob with the CTC greedy decoder and draw the recognized text at the top-left corner of the text region
    std::string ocr_txt = ctc_decode(blob_out, RW, RL);
    std::cout << ocr_txt << std::endl;
    cv::putText(image, ocr_txt, text_roi.tl(), cv::FONT_HERSHEY_PLAIN, 1.0, cv::Scalar(0, 255, 0), 1);

    return;
}

11. PyTorch model transformation and deployment

For the previous detection and recognition tasks we could quickly build demos on top of OpenVINO's pre-trained models, but there are also networks for which no pre-trained model exists. Those we train with another deep learning framework and then deploy the resulting model to OpenVINO for accelerated inference. This needs underlying support in OpenVINO: the Model Optimizer (MO) for conversion and the Inference Engine (IE) for inference, both introduced earlier. This example shows how to convert a PyTorch model into an ONNX model, then either let OpenVINO load the ONNX model directly or convert it into an IR model and load that, and finally run inference and demonstrate the model.

Some links on environment setup shared by group members. Since I am learning openVINO only as a reserve technology for now, this part will not be studied and practiced for the time being.
https://www.cnblogs.com/qianwangxingfu/p/13582884.html
https://www.bilibili.com/video/BV1UE411N7gS
Use Anaconda to configure PyTorch and TensorFlow, install PyCharm, and select the interpreter in PyCharm.

11.1 ONNX and support

If a model is trained with PyTorch, it can be converted to ONNX. Given a trained PyTorch model saved as a .pth file, how do we turn it into ONNX or IR? (PyTorch provides an API for the export.) In the previous cases we loaded the .bin and .xml files, i.e. the IR intermediate format; in fact, OpenVINO can also load the model's ONNX file directly and run inference with the same effect.
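
A minimal sketch of the two loading paths is shown below; the file names are placeholders and the sketch assumes the exported files exist. Either way, the loaded network is then compiled and used exactly as in the earlier examples.

#include "inference_engine.hpp"
#include <iostream>

int main() {
    InferenceEngine::Core ie;

    // Option 1: load the exported ONNX file directly, without running the Model Optimizer
    InferenceEngine::CNNNetwork onnx_network = ie.ReadNetwork("model.onnx");

    // Option 2: load the IR produced by the Model Optimizer (.xml topology + .bin weights)
    InferenceEngine::CNNNetwork ir_network = ie.ReadNetwork("model.xml", "model.bin");

    // Either network is compiled and used exactly as in the earlier examples
    auto executable_network = ie.LoadNetwork(onnx_network, "CPU");
    auto infer_request = executable_network.CreateInferRequestPtr();
    std::cout << "inputs: " << onnx_network.getInputsInfo().size()
              << ", outputs: " << onnx_network.getOutputsInfo().size() << std::endl;
    return 0;
}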

11.2 Converting a .pth file to an ONNX file

11.3 Converting an ONNX file to IR files (.bin and .xml)


Figure 47 48

12. Transformation and deployment of tensorflow model

13. YOLOv5 model deployment and reasoning

Summary:

Introduction of OpenVINO: the need to deploy models in production, especially on edge devices

Environment construction for C++ practice: configuring the header file path, library path, library dependencies, and the environment variables of the application

ResNet18 image classification model: set the model input and parse the two-dimensional output [1, 1000]

Vehicle and license plate detection and recognition: the key is parsing the [1, 1, N, 7] output, multi-model inference support, and passing ROI regions between models

Pedestrian detection: the key is real-time inference on video, encapsulating the input setting and output parsing, and generalizing the input interface with C++ templates

Face detection: asynchronous inference on video
Expression recognition: multi-model inference support, with asynchronous inference to keep it real time
Face landmark detection (35 key points): pay attention to validating the returned ROI region and to extended applications built on the facial landmarks

Real-time semantic road segmentation: the key is parsing the multi-channel output of the segmentation network, in [1, 4, H, W] format, locating the value of the corresponding class for each pixel, and using OpenCV techniques to render the result
Instance segmentation: the R-CNN based model has two input layers to set; parse the class mask of each instance ROI and render the instances with OpenCV

Scene text detection: the key is parsing the PixelLink model output and converting between floating point and uchar types
Scene character recognition: the key points are the grayscale image input requirement, parsing the output with the CTC greedy decoding algorithm, and cropping the ROI of each text region.

The follow-up topics (model conversion, Python, and related content) are left here for now, since I do not yet have the relevant technical background; I will come back to the OpenVINO-related knowledge after studying deep learning models and the Python language more systematically.

The code projects have been tested, and the materials will be uploaded to the CSDN resources.
These blog notes are based on teacher Jia Zhigang's openVINO course.
