How to export preprocessed image data for model inference from the DeepStream Infer Plugin and test it with TensorRT

During the process of integrating models into the DeepStream Infer Plugin, various problems may come up. One that puzzles people is that after a model is integrated into the DeepStream Infer Plugin, its inference accuracy decreases: it is worse than directly calling the original model from Python or C++, or than using the TensorRT API without DeepStream, even though both approaches use the same video or images and the same model engine file. The model simply performs worse inside the DeepStream Infer Plugin. After many investigations and experiments, it can be determined that this problem has nothing to do with the model implementation itself or with data preprocessing. The root cause is still under investigation, and I will post an update if there are definite results.

Troubleshooting this problem can be roughly divided into three aspects.

First, is there a problem with the implementation of the model network itself? For example, the CUDA stream used for inference should be the same from beginning to end, and it is better to create your own stream rather than use the default one (see the sketch below). Use Python or C++ code that calls the TensorRT API to test the model and confirm that its output matches the test results of the official original version. This part of the work depends on the model network you use or implement, so it is hard to give more specific steps and details.
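
As a minimal sketch of the dedicated-stream point above (the names `context`, `buffers`, `inputIndex` and `batchSize` are illustrative assumptions, not from the original post; `context` is taken to be an already-created nvinfer1::IExecutionContext):

    cudaStream_t stream;
    cudaStreamCreate(&stream);          // create a dedicated stream once

    // ... fill buffers[inputIndex] with cudaMemcpyAsync(..., stream) ...
    context->enqueue(batchSize, buffers, stream, nullptr);  // same stream end to end
    cudaStreamSynchronize(stream);      // wait before reading the output
    // ...
    cudaStreamDestroy(stream);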

After eliminating problems in the model network implementation itself, the second aspect to investigate is to compare the image data preprocessing results in the DeepStream Infer Plugin with the preprocessing results of your standalone Python or C++ code that calls the TensorRT API directly, without DeepStream. The preprocessed data from both can be restored into pictures and compared with the naked eye; this works, but it is not very accurate. A better way is to export the raw data that the DeepStream Infer Plugin feeds to inference after preprocessing into a binary file, then, in your code that calls the TensorRT API directly without DeepStream, read the raw data from the exported file in binary mode, copy it to the GPU buffer, and run inference on it. If the result is better than the one obtained inside the DeepStream Infer Plugin, that explains the problem: it lies somewhere in TensorRT inside DeepStream, or at an even lower level. That is the third aspect of troubleshooting, which needs the assistance of NVIDIA engineers, because the implementation of context->enqueue() is the core part of TensorRT and is not open source.

So how do you export the image data that the DeepStream Infer Plugin has preprocessed for inference?

In /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp, find the function:

    NvDsInferStatus InferPreprocessor::transform(
        NvDsInferContextBatchInput& batchInput, void* devBuf,
        CudaStream& mainStream, CudaEvent* waitingEvent)

This function is where the final preprocessing (channel conversion and 1/255.0 normalization) is completed. The processed data can be restored to color pictures by adding the following code at the end of this function:

    

        // outPtr is the per-frame destination pointer used inside transform();
        // the preprocessed data there is float, so copy it as float first.
        int numElements = m_NetworkInputLayer.inferDims.numElements;
        float* tfFloat = new float[numElements];
        uchar* tf = new uchar[numElements];
        cudaMemcpyAsync(tfFloat, outPtr, numElements * sizeof(float),
            cudaMemcpyDeviceToHost, *m_PreProcessStream);
        cudaStreamSynchronize(*m_PreProcessStream);  // wait for the copy to finish
        for (int p = 0; p < numElements; p++) {
            float v = tfFloat[p] * 255.0f;           // undo the 1/255.0 normalization
            if (v > 255.0f) v = 255.0f;
            if (v < 0.0f) v = 0.0f;
            tf[p] = (uchar)v;
        }

        cv::Mat img(m_NetworkInfo.height, m_NetworkInfo.width, CV_8UC3, tf,
            m_NetworkInfo.width * 3);
        cv::cvtColor(img, img, cv::COLOR_RGB2BGR);   // network input is RGB, OpenCV expects BGR
        static int g_idx = 0;
        cv::imwrite("input" + std::to_string(g_idx++) + ".jpg", img);
        delete[] tfFloat;
        delete[] tf;

The code can also be added to NvDsInferContextImpl::queueInputBatch(NvDsInferContextBatchInput& batchInput), after transform() is called:

    ...
    assert(m_BackendContext && backendBuffer);
    assert(m_InferStream && m_InputConsumedEvent && m_InferCompleteEvent);

    // The preprocessed input binding holds float data; copy it as float,
    // then convert to 8-bit so it can be saved as an image.
    float* outPtr = (float*)m_BindingBuffers[INPUT_LAYER_INDEX];
    int size = 3 * m_NetworkInfo.height * m_NetworkInfo.width;
    float* tfFloat = new float[size];
    uchar* tf = new uchar[size];
    cudaMemcpyAsync(tfFloat, outPtr, size * sizeof(float),
        cudaMemcpyDeviceToHost, *m_InferStream);
    cudaStreamSynchronize(*m_InferStream);       // wait for the copy to finish
    for (int p = 0; p < size; p++) {
        float v = tfFloat[p] * 255.0f;           // undo the 1/255.0 normalization
        if (v > 255.0f) v = 255.0f;
        if (v < 0.0f) v = 0.0f;
        tf[p] = (uchar)v;
    }
    cv::Mat img(m_NetworkInfo.height, m_NetworkInfo.width, CV_8UC3, tf,
        m_NetworkInfo.width * 3);
    static int g_idx = 0;
    cv::imwrite("input" + std::to_string(g_idx++) + ".jpg", img);
    delete[] tfFloat;
    delete[] tf;

Use the following code to export the raw preprocessed data to a binary file:

        float* outPtr = (float*)m_BindingBuffers[INPUT_LAYER_INDEX];
        int size = 3 * m_NetworkInfo.height * m_NetworkInfo.width;
        float* tf2 = new float[size];
        cudaMemcpyAsync(tf2, outPtr, size * sizeof(float),
            cudaMemcpyDeviceToHost, *m_InferStream);
        cudaStreamSynchronize(*m_InferStream);   // make sure the copy has finished before writing
        std::ofstream of("input_raw.bin", std::ios::binary);
        of.write(reinterpret_cast<char*>(tf2), size * sizeof(float));
        of.close();
        delete[] tf2;

Main code for reading the exported binary data and running inference on it directly with the TensorRT API:

    void* buffers[2];
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    ...
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float));
    cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float));
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));
    
    std::string raw_file = std::string("input_raw.bin");
    std::ifstream file(raw_file, std::ios::binary);
    if (!file.good()) {
          printf("read raw data file error!\n");
    }else {
          char *rawStream = nullptr;
          size_t size = 0;
          file.seekg(0, file.end);
          size = file.tellg();
          file.seekg(0, file.beg);
          rawStream = new char[size];
          assert(rawStream);
          file.read(rawStream, size);
          file.close();
          doInference(*context, stream, buffers,reinterpret_cast<float *>(rawStream),prob,BATCH_SIZE );
          ...
      }
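
The doInference() helper called above is not shown here; as a rough sketch of what it is assumed to do (copy the host data to the input binding, enqueue inference on the given stream, and copy the result back), it could look like the following, where INPUT_BLOB_NAME, OUTPUT_BLOB_NAME, INPUT_H, INPUT_W and OUTPUT_SIZE are the same constants assumed in the snippet above:

    // Hypothetical sketch of doInference(): copy host input to the GPU,
    // run the engine on the given stream, and copy the output back.
    void doInference(nvinfer1::IExecutionContext& context, cudaStream_t stream,
                     void** buffers, float* input, float* output, int batchSize)
    {
        const nvinfer1::ICudaEngine& engine = context.getEngine();
        const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
        const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);

        // Host -> device: the raw preprocessed floats read from input_raw.bin
        cudaMemcpyAsync(buffers[inputIndex], input,
            batchSize * 3 * INPUT_H * INPUT_W * sizeof(float),
            cudaMemcpyHostToDevice, stream);
        // Run inference on the same stream
        context.enqueue(batchSize, buffers, stream, nullptr);
        // Device -> host: copy the network output back
        cudaMemcpyAsync(output, buffers[outputIndex],
            batchSize * OUTPUT_SIZE * sizeof(float),
            cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);
    }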
    

Keywords: OpenCV Deep Learning TensorRT Jetson deepstream
