Deploying YOLOX using libtorch

Preface

The last article introduced how to deploy YOLOX on the Nano with ncnn. Because the ncnn deployment code for YOLOX is open source, we only had to configure the environment according to the process, modify the relevant code and compile it. The YOLOX authors have open-sourced code for four deployment methods, but libtorch is not among them. Libtorch, the C++ version of PyTorch, is also very efficient for deployment and relatively easy to use, so this article describes how to deploy YOLOX using libtorch.

1. Model conversion: PyTorch -> TorchScript

First, we need to serialize the PyTorch model into a model that C++ can call. Here, imitating the torch-to-ONNX export code in the original repo, I create an export_libtorch.py file and put it in the same tools/ directory as export_onnx.py. The conversion code is as follows:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.

import argparse
import os
from loguru import logger
import torch
from torch import nn

from yolox.exp import get_exp
from yolox.models.network_blocks import SiLU
from yolox.utils import replace_module


def make_parser():
    parser = argparse.ArgumentParser("YOLOX libtorch deploy")
    parser.add_argument(
        "--output-name", type=str, default="yolox.pt", help="output name of models"
    )
    parser.add_argument("--input", default="images", type=str, help="input name of onnx model")
    parser.add_argument("--output", default="output", type=str, help="output name of onnx model")
    parser.add_argument("-o", "--opset", default=11, type=int, help="onnx opset version")

    parser.add_argument(
        "-f",
        "--exp_file",
        default=None,
        type=str,
        help="expriment description file",
    )
    parser.add_argument("-expn", "--experiment-name", type=str, default=None)
    parser.add_argument("-n", "--name", type=str, default=None, help="model name")
    parser.add_argument("-c", "--ckpt", default=None, type=str, help="ckpt path")
    parser.add_argument(
        "opts",
        help="Modify config options using the command-line",
        default=None,
        nargs=argparse.REMAINDER,
    )

    return parser


@logger.catch
def main():
    args = make_parser().parse_args()
    logger.info("args value: {}".format(args))
    exp = get_exp(args.exp_file, args.name)
    exp.merge(args.opts)

    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    model = exp.get_model()
    if args.ckpt is None:
        file_name = os.path.join(exp.output_dir, args.experiment_name)
        ckpt_file = os.path.join(file_name, "best_ckpt.pth.tar")
    else:
        ckpt_file = args.ckpt

    ckpt = torch.load(ckpt_file, map_location="cpu")
    # load the model state dict

    model.eval()
    if "model" in ckpt:
        ckpt = ckpt["model"]
    model.load_state_dict(ckpt)
    model = replace_module(model, nn.SiLU, SiLU)
    model.head.decode_in_inference = False

    logger.info("loaded checkpoint done.")
    dummy_input = torch.randn(1, 3, exp.test_size[0], exp.test_size[1])

    traced_script_module = torch.jit.trace(model, dummy_input)
    output1 = traced_script_module(torch.ones(1, 3, 640, 640))
    output2 = model(torch.ones(1, 3, 640, 640))
    print(output1)                             # Check that inference results are consistent after conversion
    print(output2)
    traced_script_module.save(args.output_name)
    logger.info("generate jit::torch named {}".format(args.output_name))

if __name__ == "__main__":
    main()

Then run the command directly:

python tools/export_libtorch.py -n yolox-s -c weights/best_ckpt.pth.tar

Similarly, if you are exporting weights from your own training, you need to modify the exps/default/yolox_s.py file and change the number of classes (to 1 in my case), as sketched below:
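A minimal sketch of the relevant lines (your copy may differ slightly between YOLOX versions; the key attribute is num_classes):

# exps/default/yolox_s.py (sketch)
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33        # yolox-s depth multiplier
        self.width = 0.50        # yolox-s width multiplier
        self.num_classes = 1     # change this to the number of classes in your dataset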

After running, you get the yolox.pt file (if GPU inference is required, the model needs to be moved to CUDA during conversion).
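As a rough sketch (assuming a CUDA-enabled PyTorch build), the tracing step in export_libtorch.py would then become something like:

# GPU export sketch -- move both the model and the dummy input to CUDA before tracing
model = model.cuda()
dummy_input = torch.randn(1, 3, exp.test_size[0], exp.test_size[1]).cuda()
traced_script_module = torch.jit.trace(model, dummy_input)
traced_script_module.save(args.output_name)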

2. Libtorch environment configuration

In fact, there is hardly anything to configure, because libtorch is distributed pre-compiled. You can download it directly from PyTorch's official website and use it:

Copy the link directly into a browser (or download it with Xunlei). Note that the build should match your PyTorch and CUDA versions as closely as possible, otherwise there may be problems. Since I use torch 1.8, I changed the 1.9.0 in the link to 1.8.0 and it downloaded normally. The file is fairly large, so be patient; after downloading, unzip it directly to get a roughly 4 GB folder (too big...) with the following structure:
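For reference, the unzipped libtorch folder typically contains something like this (the exact contents vary slightly by version):

libtorch/
├── bin/
├── include/
├── lib/
├── share/
├── build-hash
└── build-version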

First, record the folder path, which will be needed when compiling. Then you can write a simple program to test whether it works normally:

#include <iostream>
#include "torch/torch.h"
#include "torch/jit.h"

int main() {
    std::cout << "Hello, World!" << std::endl;
    auto a = torch::tensor({{1, 2}, {3, 4}});
    std::cout << a << std::endl;
    return 0;
}

The CMakeLists.txt file is as follows:

cmake_minimum_required(VERSION 3.16)
project(Libtorch_test)
find_package(PythonInterp REQUIRED)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_PREFIX_PATH /home/cai/github/libtorch)      # Set this to the libtorch path recorded earlier

find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(Libtorch_test  libtorch_test.cpp)
target_link_libraries(Libtorch_test ${TORCH_LIBRARIES})

If compilation produces the following output, libtorch is working normally. (If you develop on Ubuntu, I recommend using CLion to write and build the C++ code; when you create a new C++ project it generates both of these files for you, so you can paste the code in and run directly. The downside of CLion is that it is not free... you will have to find your own way around that.)
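If you build from the command line instead of CLion, the usual out-of-source CMake flow (assuming libtorch_test.cpp and CMakeLists.txt are in the current directory) is roughly:

mkdir build && cd build
cmake ..
make
./Libtorch_test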

3. Using libtorch to call the TorchScript model

Because YOLOX is anchor-free, its post-processing is very friendly and not as complex as YOLOv5's. For the 80-class COCO model, the output still has 85 channels, the same as YOLOv5 (center point (2) + width and height (2) + objectness confidence (1) + class scores (80) = 85). The difference is that without anchors it is easier to map the offsets back to the original image.
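To be concrete: with a 640×640 input and strides {8, 16, 32}, the feature-map grids are 80×80, 40×40 and 20×20, so the exported model (with decode_in_inference disabled) outputs 80×80 + 40×40 + 20×20 = 6400 + 1600 + 400 = 8400 prediction rows of 85 values each, i.e. a tensor of shape [1, 8400, 85], which the code below decodes, thresholds and feeds to NMS.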

First, the CMakeLists.txt content; because OpenCV is used, OpenCV should also be built and installed in advance:

cmake_minimum_required(VERSION 3.16)
project(Libtorch_yolox)
find_package(PythonInterp REQUIRED)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_PREFIX_PATH /home/cai/github/libtorch)          # Change this to your libtorch path

find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)

include_directories(${OpenCV_INCLUDE_DIRS})

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(Libtorch_yolox main.cpp)
target_link_libraries(Libtorch_yolox ${TORCH_LIBRARIES} ${OpenCV_LIBS})

Then create a new main.cpp and write the code directly. Compared with the ncnn and TensorRT versions, the image pre-processing, inference and post-processing logic is basically the same; however, because different frameworks output different data types, the corresponding parts have to be adapted:

#include <iostream>
#include "torch//jit.h"
#include "opencv2/opencv.hpp"
#include "torch/script.h"
#include "algorithm"

using namespace cv;
using namespace std;

static const int INPUT_W = 640;
static const int INPUT_H = 640;

#define NMS_THRESH 0.1
#define BBOX_CONF_THRESH 0.3

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
};

struct GridAndStride
{
    int grid0;
    int grid1;
    int stride;
};

Mat narrow_640_pad(Mat& img,float &scale) {
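    // Letterbox-style preprocessing: resize by the pre-computed scale (long side -> 640), then pad the bottom/right with gray (114) up to 640x640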

    int nw = int(img.cols * scale);
    int nh = int(img.rows * scale);

    resize(img, img, cv::Size(nw, nh), 0, 0, INTER_CUBIC);   // interpolation is the 6th argument (the 4th/5th are fx/fy)

    int top = 0;
    int bottom = INPUT_H - nh - top;
    int left = 0;
    int right = INPUT_W - nw - left;

    Mat resize_img;
    copyMakeBorder(img, resize_img, top, bottom, left, right, BORDER_CONSTANT, Scalar(114, 114, 114));

    return resize_img;
}

static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, const torch::Tensor &feat_ptr, float prob_threshold, std::vector<Object>& objects)
{
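    // Walk every prediction row: decode the box from its grid cell and stride, multiply objectness by each class score, and keep anything above prob_threshold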

    const int num_class = feat_ptr.sizes()[1] - 5;

    const int num_anchors = grid_strides.size();

//    const float* feat_ptr = (float *)feat_blob.data;

    for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
    {
        const int grid0 = grid_strides[anchor_idx].grid0;
        const int grid1 = grid_strides[anchor_idx].grid1;
        const int stride = grid_strides[anchor_idx].stride;

        // yolox/models/yolo_head.py decode logic
        //  outputs[..., :2] = (outputs[..., :2] + grids) * strides
        //  outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
        torch::Tensor x_center = (feat_ptr[anchor_idx][0] + grid0) * stride;
        torch::Tensor y_center = (feat_ptr[anchor_idx][1] + grid1) * stride;
        torch::Tensor w = exp(feat_ptr[anchor_idx][2]) * stride;
        torch::Tensor h = exp(feat_ptr[anchor_idx][3]) * stride;
        torch::Tensor x0 = x_center - w * 0.5f;
        torch::Tensor y0 = y_center - h * 0.5f;

        torch::Tensor box_objectness = feat_ptr[anchor_idx][4];

        for (int class_idx = 0; class_idx < num_class; class_idx++)
        {
            torch::Tensor box_cls_score = feat_ptr[anchor_idx][5 + class_idx];
            torch::Tensor box_prob = box_objectness * box_cls_score;
            if (box_prob.item().toFloat() > prob_threshold)
            {
                Object obj;
                obj.rect.x = x0.item().toFloat();
                obj.rect.y = y0.item().toFloat();
                obj.rect.width = w.item().toFloat();
                obj.rect.height = h.item().toFloat();
                obj.label = class_idx;
                obj.prob = box_prob.item().toFloat();

                objects.push_back(obj);
            }

        }

    } // point anchor loop
}

static int generate_grids_and_stride(const int target_size, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
{
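    // For each stride, enumerate every grid cell; this order must match the row order of the flattened network output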
    for (auto stride : strides)
    {
        int num_grid = target_size / stride;
        for (int g1 = 0; g1 < num_grid; g1++)
        {
            for (int g0 = 0; g0 < num_grid; g0++)
            {
                grid_strides.push_back((GridAndStride){g0, g1, stride});
            }
        }
    }
    return 0;
}

static void qsort_descent_inplace(std::vector<Object>& faceobjects, int left, int right) {
    int i = left;
    int j = right;
    float p = faceobjects[(left + right) / 2].prob;

    while (i <= j) {
        while (faceobjects[i].prob > p)
            i++;

        while (faceobjects[j].prob < p)
            j--;

        if (i <= j) {
            // swap
            std::swap(faceobjects[i], faceobjects[j]);

            i++;
            j--;
        }
    }
    // Recurse into both partitions (without this the list is only partially sorted)
    if (left < j) qsort_descent_inplace(faceobjects, left, j);
    if (i < right) qsort_descent_inplace(faceobjects, i, right);
}


static void qsort_descent_inplace(std::vector<Object>& objects)
{
    if (objects.empty())
        return;

    qsort_descent_inplace(objects, 0, objects.size() - 1);
}

static inline float intersection_area(const Object& a, const Object& b)
{
    cv::Rect_<float> inter = a.rect & b.rect;
    return inter.area();
}

static void nms_sorted_bboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
{
    picked.clear();

    const int n = faceobjects.size();

    std::vector<float> areas(n);
    for (int i = 0; i < n; i++)
    {
        areas[i] = faceobjects[i].rect.area();
    }

    for (int i = 0; i < n; i++)
    {
        const Object& a = faceobjects[i];

        int keep = 1;
        for (int j = 0; j < (int)picked.size(); j++)
        {
            const Object& b = faceobjects[picked[j]];

            // intersection over union
            float inter_area = intersection_area(a, b);
            float union_area = areas[i] + areas[picked[j]] - inter_area;
            // float IoU = inter_area / union_area
            if (inter_area / union_area > nms_threshold)
                keep = 0;
        }

        if (keep)
            picked.push_back(i);
    }
}


static void decode_outputs(torch::Tensor &prob, std::vector<Object>& objects, float &scale, const int img_w, const int img_h) {
    std::vector<Object> proposals;
    std::vector<int> strides = {8, 16, 32};
    std::vector<GridAndStride> grid_strides;
    generate_grids_and_stride(INPUT_W, strides, grid_strides);
    generate_yolox_proposals(grid_strides, prob,  BBOX_CONF_THRESH, proposals);
    std::cout << "num of boxes before nms: " << proposals.size() << std::endl;

    qsort_descent_inplace(proposals);

    std::vector<int> picked;
    nms_sorted_bboxes(proposals, picked, NMS_THRESH);


    int count = picked.size();

    std::cout << "num of boxes: " << count << std::endl;

    objects.resize(count);
    for (int i = 0; i < count; i++)
    {
        objects[i] = proposals[picked[i]];

        // adjust offset to original unpadded
        float x0 = (objects[i].rect.x) / scale;
        float y0 = (objects[i].rect.y) / scale;
        float x1 = (objects[i].rect.x + objects[i].rect.width) / scale;
        float y1 = (objects[i].rect.y + objects[i].rect.height) / scale;

        // clip
        x0 = std::max(std::min(x0, (float)(img_w - 1)), 0.f);
        y0 = std::max(std::min(y0, (float)(img_h - 1)), 0.f);
        x1 = std::max(std::min(x1, (float)(img_w - 1)), 0.f);
        y1 = std::max(std::min(y1, (float)(img_h - 1)), 0.f);

        objects[i].rect.x = x0;
        objects[i].rect.y = y0;
        objects[i].rect.width = x1 - x0;
        objects[i].rect.height = y1 - y0;
    }
}

const float color_list[80][3] =
        {
                {0.850, 0.325, 0.098},
                {0.000, 0.447, 0.741},
                {0.929, 0.694, 0.125},
                {0.494, 0.184, 0.556},
                {0.466, 0.674, 0.188},
                {0.301, 0.745, 0.933},
                {0.635, 0.078, 0.184},
                {0.300, 0.300, 0.300},
                {0.600, 0.600, 0.600},
                {1.000, 0.000, 0.000},
                {1.000, 0.500, 0.000},
                {0.749, 0.749, 0.000},
                {0.000, 1.000, 0.000},
                {0.000, 0.000, 1.000},
                {0.667, 0.000, 1.000},
                {0.333, 0.333, 0.000},
                {0.333, 0.667, 0.000},
                {0.333, 1.000, 0.000},
                {0.667, 0.333, 0.000},
                {0.667, 0.667, 0.000},
                {0.667, 1.000, 0.000},
                {1.000, 0.333, 0.000},
                {1.000, 0.667, 0.000},
                {1.000, 1.000, 0.000},
                {0.000, 0.333, 0.500},
                {0.000, 0.667, 0.500},
                {0.000, 1.000, 0.500},
                {0.333, 0.000, 0.500},
                {0.333, 0.333, 0.500},
                {0.333, 0.667, 0.500},
                {0.333, 1.000, 0.500},
                {0.667, 0.000, 0.500},
                {0.667, 0.333, 0.500},
                {0.667, 0.667, 0.500},
                {0.667, 1.000, 0.500},
                {1.000, 0.000, 0.500},
                {1.000, 0.333, 0.500},
                {1.000, 0.667, 0.500},
                {1.000, 1.000, 0.500},
                {0.000, 0.333, 1.000},
                {0.000, 0.667, 1.000},
                {0.000, 1.000, 1.000},
                {0.333, 0.000, 1.000},
                {0.333, 0.333, 1.000},
                {0.333, 0.667, 1.000},
                {0.333, 1.000, 1.000},
                {0.667, 0.000, 1.000},
                {0.667, 0.333, 1.000},
                {0.667, 0.667, 1.000},
                {0.667, 1.000, 1.000},
                {1.000, 0.000, 1.000},
                {1.000, 0.333, 1.000},
                {1.000, 0.667, 1.000},
                {0.333, 0.000, 0.000},
                {0.500, 0.000, 0.000},
                {0.667, 0.000, 0.000},
                {0.833, 0.000, 0.000},
                {1.000, 0.000, 0.000},
                {0.000, 0.167, 0.000},
                {0.000, 0.333, 0.000},
                {0.000, 0.500, 0.000},
                {0.000, 0.667, 0.000},
                {0.000, 0.833, 0.000},
                {0.000, 1.000, 0.000},
                {0.000, 0.000, 0.167},
                {0.000, 0.000, 0.333},
                {0.000, 0.000, 0.500},
                {0.000, 0.000, 0.667},
                {0.000, 0.000, 0.833},
                {0.000, 0.000, 1.000},
                {0.000, 0.000, 0.000},
                {0.143, 0.143, 0.143},
                {0.286, 0.286, 0.286},
                {0.429, 0.429, 0.429},
                {0.571, 0.571, 0.571},
                {0.714, 0.714, 0.714},
                {0.857, 0.857, 0.857},
                {0.000, 0.447, 0.741},
                {0.314, 0.717, 0.741},
                {0.50, 0.5, 0}
        };


static void draw_objects(const cv::Mat& bgr, const std::vector<Object>& objects)
{
//    static const char* class_names[] = {
//            "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
//            "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
//            "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
//            "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
//            "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
//            "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
//            "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
//            "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
//            "hair drier", "toothbrush"
//    };                               // For the 80-class COCO model, uncomment this list and comment out the list below
    static const char* class_names[] = {
            "label"                        // Modify according to the number of categories
    };

    cv::Mat image = bgr.clone();

    for (size_t i = 0; i < objects.size(); i++)
    {
        const Object& obj = objects[i];

        fprintf(stderr, "%d = %.5f at %.2f %.2f %.2f x %.2f\n", obj.label, obj.prob,
                obj.rect.x, obj.rect.y, obj.rect.width, obj.rect.height);

        cv::Scalar color = cv::Scalar(color_list[obj.label][0], color_list[obj.label][1], color_list[obj.label][2]);
        float c_mean = cv::mean(color)[0];
        cv::Scalar txt_color;
        if (c_mean > 0.5){
            txt_color = cv::Scalar(0, 0, 0);
        }else{
            txt_color = cv::Scalar(255, 255, 255);
        }

        cv::rectangle(image, obj.rect, color * 255, 2);

        char text[256];
        sprintf(text, "%s %.1f%%", class_names[obj.label], obj.prob * 100);

        int baseLine = 0;
        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);

        cv::Scalar txt_bk_color = color * 0.7 * 255;

        int x = obj.rect.x;
        int y = obj.rect.y + 1;
        //int y = obj.rect.y - label_size.height - baseLine;
        if (y > image.rows)
            y = image.rows;
        //if (x + label_size.width > image.cols)
        //x = image.cols - label_size.width;

        cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
                      txt_bk_color, -1);

        cv::putText(image, text, cv::Point(x, y + label_size.height),
                    cv::FONT_HERSHEY_SIMPLEX, 0.4, txt_color, 1);
    }

    cv::imwrite("det_res.jpg", image);
    fprintf(stderr, "save vis file\n");
    /* cv::imshow("image", image); */
    /* cv::waitKey(0); */
}


int main() {
    const char*imagepath = "../1.jpg";

    torch::jit::script::Module model;
    try{
        model = torch::jit::load("../yolox_best.pt");
//        model.to(at::kCUDA);
    } catch (const c10::Error&e) {
        cerr << "error loading the model\n";
        return -1;
    }

    Mat image = imread(imagepath);
    Mat image_copy = image.clone();

    if (image.empty()){
        cout << "Eroor: Could not load image" << endl;
        return -1;
    }
    double time0 = static_cast<double >(getTickCount());

    float scale = std::min(INPUT_W / (image.cols*1.0), INPUT_H / (image.rows*1.0));

    image = narrow_640_pad(image,scale);

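    // Wrap the HWC uint8 BGR image as a tensor, convert to NCHW float, then normalize to roughly [-1, 1] via (x/255 - 0.5) / 0.5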
    torch::Tensor tensor_image = torch::from_blob(image.data,{1,image.rows,image.cols,3},torch::kByte);
    tensor_image = tensor_image.permute({0,3,1,2});
    tensor_image = tensor_image.toType(torch::kFloat);
//    tensor_image = tensor_image.to(at::kCUDA);
    tensor_image = tensor_image.div(255)-0.5;
    tensor_image = tensor_image.div(0.5);

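    // Forward pass: the traced model returns the raw head output of shape [1, num_predictions, num_classes + 5]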
    torch::Tensor result = model.forward({tensor_image}).toTensor();
    result = result.squeeze(0);

    std::vector<Object> objects;
    decode_outputs(result,objects,scale,image_copy.cols,image_copy.rows);
    draw_objects(image_copy, objects);

    //[5] Calculate run time and output
    time0 = ((double)getTickCount() - time0) / getTickFrequency();   //End time - start time, converted to seconds
    cout << "\t This method runs on: " << time0 << "second" << endl;    //Output run time

    return 0;
}

After compilation, run the executable ./Libtorch_yolox directly to get the following results:

559 is the number of candidate boxes before NMS and 94 is the number of boxes kept after NMS; the lines that follow give the details of each detected box. I am still using the image tested before; the result is shown below and is basically the same as before. If you have time, try it with a camera or a video.
