https://zhuanlan.zhihu.com/p/191569603
Deploying a PyTorch model with C++
Preface
The project needs to call a network trained with PyTorch from C++. Before starting, I looked through various materials online and found three ways to do this:
* Implement the network directly in C++, starting from the most basic CNN building blocks.
* Save the network model and parameters, then load them with the DNN module of OpenCV. This approach also works for other frameworks such as TensorFlow and Torch; the details are given below.
* Use LibTorch, the official C++ interface provided by PyTorch. The idea is the same: save the network model and parameters, then load them with LibTorch.
The first option, building everything in C++ from the layer level up, is too hard-core for my limited skill level, so it is not covered here; experts are welcome to try it on their own. Only the OpenCV and LibTorch approaches are introduced below.
Operating environment: Windows 10 (64-bit), CUDA 10.2, PyTorch 1.6.0, torchvision 0.7, OpenCV 4.3, VS2019, LibTorch 1.6. PS: the PyTorch-related software was the latest version at the time, downloaded directly from the official website.
Training a simple PyTorch network
First, following the "Training a Classifier" tutorial in the official PyTorch documentation, train a simple image classifier. The code is as follows:
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.onnx
import torchvision
import torchvision.transforms as transforms

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=0)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=0)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # the built-in next() also works where .next() is unavailable
print(images.shape)
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 12, 3)
        self.conv3 = nn.Conv2d(12, 32, 3)
        self.fc1 = nn.Linear(32 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        x = x.view(-1, 32 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
for epoch in range(100):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print(outputs)
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
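A note on the `32 * 4 * 4` input size of `fc1`: it follows from the layer shapes. As a minimal sketch (the helper names `conv_out` and `pool_out` are illustrative and not part of PyTorch), the spatial size of a 32x32 CIFAR-10 image can be traced through the three 3x3 convolutions (stride 1, no padding) and the two 2x2 max-pools:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output width/height of a max-pool with no padding (floor mode)."""
    return (size - kernel) // stride + 1


size = 32                                  # CIFAR-10 images are 32x32
size = pool_out(conv_out(size, 3), 2, 2)   # conv1 + pool: 32 -> 30 -> 15
size = pool_out(conv_out(size, 3), 2, 2)   # conv2 + pool: 15 -> 13 -> 6
size = conv_out(size, 3)                   # conv3:         6 -> 4
flat_features = 32 * size * size           # conv3 has 32 output channels

print(size, flat_features)  # 4 512
```

If a kernel size or the number of layers changes, the first argument of `fc1` and the `view` call in `forward` have to change with it.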
Compared with the code in the official tutorial, the training script only adds a convolution layer and trains on the GPU. The output is not post-processed: the network simply returns a raw score for each of the ten classes rather than normalized probabilities. After training, save the network. The code is as follows:
# Save the network structure and parameters
# Method 1: save the whole network (structure and parameters)
PATH = './cifar_net.pth'
torch.save(net, PATH)
# Method 2: save only the network parameters
# (note: this writes to the same PATH as method 1 and overwrites it, so use one or the other)
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
# Method 3: export the network to ONNX
dummy_input = torch.randn(1, 3, 32, 32).to(device)
torch.onnx.export(net, dummy_input, "torch.onnx")
# Method 4: save the network as TorchScript
dummy_input = torch.randn(1, 3, 32, 32).to(device)
traced_cell = torch.jit.trace(net, dummy_input)
traced_cell.save("tests.pth")
Of these four ways of saving, methods 3 and 4 are the ones mainly used here; how to use them is described in detail below. One small point: I initially thought that a network saved with method 1 could be loaded directly with the load function, as in TensorFlow, with the original architecture rebuilt automatically. After experimenting, however, I found that for the load to succeed, the class that defines the network also has to be present in the loading .py file, which is a little... Sample code for loading a method 1 file is as follows:
import torch
import torchvision
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
import cv2
import torchvision.transforms as transforms
PATH = './cifar_net.pth'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 12, 3)
        self.conv3 = nn.Conv2d(12, 32, 3)
        self.fc1 = nn.Linear(32 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        x = x.view(-1, 32 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = torch.load(PATH)
By now, all preparations have been completed.
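The behaviour observed with method 1 comes from how `torch.save(net, PATH)` works: it pickles the module object, and pickle stores only a reference to the class (module name plus class name), not its source code. The sketch below reproduces this with plain `pickle` and no PyTorch at all; the module name `toy_model_def` and the stand-in `Net` class are hypothetical and exist only for illustration.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script that defines the network class.
mod = types.ModuleType("toy_model_def")
exec(
    "class Net:\n"
    "    def __init__(self):\n"
    "        self.weights = [0.1, 0.2, 0.3]\n",
    mod.__dict__,
)
sys.modules["toy_model_def"] = mod

# "Save" an instance: the payload records where to find the class, not its code.
payload = pickle.dumps(mod.Net())
print(b"toy_model_def" in payload)  # True

# Loading works while the class definition is importable.
restored = pickle.loads(payload)
print(restored.weights)  # [0.1, 0.2, 0.3]

# Remove the class definition, as if loading from a script that lacks it.
del mod.Net
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc
print(type(load_error).__name__)  # AttributeError
```

Method 2 still needs the class in order to rebuild the network before calling `load_state_dict`, whereas the ONNX and TorchScript files of methods 3 and 4 carry the computation graph themselves, which is what makes them usable from C++.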
Using OpenCV to load the trained model
Reference links:
* OpenCV 4.0: running fast style transfer (Torch)
* OpenCV official documentation
* Fixing "Unsupported Lua type" when loading a PyTorch model with OpenCV
* Exporting a model from PyTorch to ONNX and running it using ONNX Runtime (official website link)
This is a commonly used approach that works for many deep learning frameworks. According to the official OpenCV documentation, the DNN module supports Caffe, Darknet, ONNX, TensorFlow, Torch, and others. Unfortunately, the PyTorch I use is not on the list, but following the third reference link, ONNX can be used as a workaround. First, save the network and parameters in ONNX format as shown in method 3 above. Then read the saved network with the function Net cv::dnn::readNetFromONNX(const String &onnxFile) provided by OpenCV. The code is as follows:
// Test loading a PyTorch model with OpenCV
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <fstream>
#include <iostream>
#include <cstdlib>
using namespace cv;
using namespace cv::dnn;
using namespace std;

int main()
{
    String modelFile = "./torch.onnx";
    String imageFile = "./dog.jpg";
    dnn::Net net = cv::dnn::readNetFromONNX(modelFile); // Read network and parameters
    Mat image = imread(imageFile); // Read test picture
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
    // Convert the image to the network input format: scale by 1/256 and resize to 32x32
    Mat inputBlob = blobFromImage(image, 0.00390625f, Size(32, 32), Scalar(), false, false);
    net.setInput(inputBlob); // Input image
    Mat result = net.forward(); // Forward calculation
    cout << result << endl;
    return 0;
}
The C++ program simplifies the code from the first reference link and changes the format of the input model from Torch to ONNX. Running it prints:
[-0.19793352, -4.0697966, 1.2769811, 2.7011304, 0.22390884, 1.9039617, -0.47333384, -0.15912014, 0.32441139, -2.4327304]
To deploy a network from another deep learning framework, the steps are essentially the same.
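One detail worth checking when moving preprocessing to C++: the training script normalizes inputs with `ToTensor()` followed by `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))`, while the `blobFromImage` call only multiplies by 0.00390625 (1/256). The sketch below, in plain Python with hypothetical helper names, compares the two on single pixel values and shows a scale/mean pair that would reproduce the training transform, relying on `blobFromImage` computing `scalefactor * (pixel - mean)`. This is an observation about the arithmetic, not a change made in the original program, and the output printed in this article was produced with the 1/256 scaling.

```python
import math


def training_transform(pixel):
    """ToTensor() then Normalize(mean=0.5, std=0.5) for one 8-bit value."""
    return (pixel / 255.0 - 0.5) / 0.5


def blob_value(pixel, scalefactor, mean=0.0):
    """Per-pixel arithmetic of blobFromImage: scalefactor * (pixel - mean)."""
    return scalefactor * (pixel - mean)


for pixel in (0, 128, 255):
    as_trained = training_transform(pixel)
    as_deployed = blob_value(pixel, 0.00390625)            # call used in the article
    matched = blob_value(pixel, 1.0 / 127.5, mean=127.5)   # assumed matching settings
    print(pixel, round(as_trained, 4), round(as_deployed, 4), round(matched, 4))
```

With the original call the network sees values in roughly [0, 1], whereas during training it saw values in [-1, 1]; a scale factor of 1/127.5 with a mean of 127.5 per channel would close that gap.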
Using LibTorch, provided by PyTorch, to load the trained model
Reference links:
* Introduction to configuring and using Windows + VS2019 + PyTorchLib
* Calling PyTorch from C++: LibTorch VS configuration under Win10 and CMake configuration
* Loading a TorchScript model in C++ (official website link)
First of all, there are two ways to convert a PyTorch model to TorchScript: tracing and scripting; see the official documentation for details. In theory, a model saved either way can be loaded with this approach. This article uses tracing, as in method 4. First configure the LibTorch environment as described in the first reference link, then copy and paste its sample code for testing. When I ran it, however, the statement ToTensor(image).to(at::kCUDA); raised an error saying that ToTensor() is not defined. The purpose of that statement is simple: it converts an ordinary image into the format the model expects as input. I therefore rewrote the conversion code following the second reference link. The code is as follows:
#include <torch/script.h>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <torch/torch.h>
// Some people say the include order matters; it did not seem to make any difference for me.

int main()
{
    torch::DeviceType device_type;
    if (torch::cuda::is_available()) {
        std::cout << "CUDA available! Predicting on GPU." << std::endl;
        device_type = torch::kCUDA;
    }
    else {
        std::cout << "Predicting on CPU." << std::endl;
        device_type = torch::kCPU;
    }
    torch::Device device(device_type);

    // Init model
    std::string model_pb = "tests.pth";
    auto module = torch::jit::load(model_pb);
    module.to(device); // use the device selected above instead of hard-coding CUDA

    auto image = cv::imread("dog.jpg", cv::ImreadModes::IMREAD_COLOR);
    cv::Mat image_transformed;
    cv::resize(image, image_transformed, cv::Size(32, 32));

    // Convert to tensor: HWC uint8 -> NCHW float in [0, 1]
    torch::Tensor tensor_image = torch::from_blob(image_transformed.data,
        { image_transformed.rows, image_transformed.cols, 3 }, torch::kByte);
    tensor_image = tensor_image.permute({ 2, 0, 1 });
    tensor_image = tensor_image.toType(torch::kFloat);
    tensor_image = tensor_image.div(255);
    tensor_image = tensor_image.unsqueeze(0);
    tensor_image = tensor_image.to(device);

    torch::Tensor output = module.forward({ tensor_image }).toTensor();
    auto max_result = output.max(1, true);
    auto max_index = std::get<1>(max_result).item<int64_t>(); // index of the highest score
    std::cout << output << std::endl;
    //return max_index;
    return 0;
}
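The tensor conversion in the LibTorch program relies on a layout change: `cv::Mat` stores pixels interleaved as height x width x channel (HWC), while the traced network expects batch x channel x height x width (NCHW). A minimal pure-Python sketch of what `permute({2, 0, 1})`, `div(255)` and `unsqueeze(0)` do together, using a made-up 2x2 image and a hypothetical helper `hwc_to_nchw`:

```python
def hwc_to_nchw(image):
    """Rearrange a nested list [H][W][C] of 0-255 values into [1][C][H][W] floats."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    chw = [
        [[image[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return [chw]  # leading batch dimension, like unsqueeze(0)


# Made-up 2x2 image; each pixel is (B, G, R) because cv::imread returns BGR.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
batch = hwc_to_nchw(image)

print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))  # 1 3 2 2
print(batch[0][0])  # first channel plane: [[1.0, 0.0], [0.0, 1.0]]
```

Note that the LibTorch program keeps OpenCV's BGR channel order, whereas the OpenCV DNN program converts to RGB first. Together with the 1/255 versus 1/256 scaling, this is one possible reason the two programs print slightly different scores for the same image.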
Running the LibTorch program prints:
CUDA available! Predicting on GPU.
1.0824 -4.6106 1.0189 2.9937 1.4570 1.4964 -1.3164 -0.7753 0.4567 -3.2543
[ CUDAFloatType{1,10} ]
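The ten numbers printed by both programs are raw class scores, not probabilities. As a small sketch of the post-processing step that the article leaves out, the scores can be passed through a softmax and the highest one mapped back to the `classes` tuple from the training script. The score lists are copied from the two outputs shown in this article; the helper names `softmax` and `top_class` are illustrative.

```python
import math

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

# Outputs copied from the two runs reported in this article.
opencv_scores = [-0.19793352, -4.0697966, 1.2769811, 2.7011304, 0.22390884,
                 1.9039617, -0.47333384, -0.15912014, 0.32441139, -2.4327304]
libtorch_scores = [1.0824, -4.6106, 1.0189, 2.9937, 1.4570,
                   1.4964, -1.3164, -0.7753, 0.4567, -3.2543]


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(scores):
    """Return (label, probability) of the highest-scoring class."""
    probs = softmax(scores)
    index = max(range(len(probs)), key=probs.__getitem__)
    return classes[index], probs[index]


for name, scores in (("OpenCV DNN", opencv_scores), ("LibTorch", libtorch_scores)):
    label, prob = top_class(scores)
    print(f"{name}: {label} ({prob:.2f})")
```

Both runs select index 3 (`cat`) for this image, so the two deployment paths agree with each other even though the individual scores differ.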