Preface
When deploying deep-learning-based CV algorithms, almost every model requires the input image to be preprocessed.
In Python, image preprocessing is handled almost entirely by NumPy.
But when the deployment has to be implemented in C++, a problem arises: NumPy is not available. How do we convert the image into the format the model accepts?
Bonus topic: cv::dnn::blobFromImage()
Image preprocessing in Python
First, we need to understand why images are preprocessed at all. The reasons include, but are not limited to, the following:
- Images in a dataset are not necessarily the same size, while the input size during training must be fixed
- The model input should not be too large, otherwise training occupies a lot of host memory and GPU memory
- The channels of a multi-channel image do not necessarily contribute equally to the training objective
- Bringing the training data onto a common distribution helps improve generalization
- Some CV tasks are sensitive to lighting, which has to be adjusted through image processing
- Data augmentation
The neural network learns a mapping that fits the dataset. At prediction time, the input data should therefore follow the same distribution as the dataset, which means it must go through the same preprocessing.
Common image preprocessing methods
- Resizing (resize)
- Color channel swapping (e.g. BGR to RGB)
- Normalization
- Dimension transposition (e.g. HWC to CHW)
- Adding a dimension (expand, e.g. a batch axis)
- Cropping (crop)
- Padding
- Reshaping the array (reshape)
Powerful NumPy
The image libraries commonly used in Python, OpenCV and PIL, interoperate seamlessly with NumPy arrays, and deep learning frameworks such as PyTorch, TensorFlow and PaddlePaddle can also convert NumPy data directly. On top of that, NumPy's slicing and array operations are very powerful, so NumPy has almost a monopoly on how preprocessing and post-processing are implemented in CV.
YOLOX preprocessing with NumPy
YOLOX is a recent member of the classic YOLO family of object detectors. Its image preprocessing steps, for both training and inference, are: resize => padding => swap => normalize => transpose => expand. The implementation is as follows:
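```python
import cv2
import numpy as np

def preproc(image, input_size, mean, std):
    # input_size is the (height, width) required by the model; image is the BGR result of cv2.imread
    image_padded = np.ones([input_size[0], input_size[1], 3], dtype=np.float32) * 114.0
    # resize: scale so the whole image fits into input_size while keeping the aspect ratio
    r = min(input_size[0] / image.shape[0], input_size[1] / image.shape[1])
    # interpolation is passed by keyword: the third positional parameter of cv2.resize is dst
    image_resized = cv2.resize(image, (int(image.shape[1] * r), int(image.shape[0] * r)),
                               interpolation=cv2.INTER_LINEAR)
    # padding: paste the resized image into the top-left corner of the 114-filled canvas
    image_padded[:int(image.shape[0] * r), :int(image.shape[1] * r), :] = image_resized
    # swap R and B channels (BGR -> RGB)
    img = image_padded[:, :, ::-1]
    # normalize (mean and std must be on the same pixel scale as img)
    img = (img - mean) / std
    # transpose HWC -> CHW, then expand to NCHW
    img = np.expand_dims(img.transpose(2, 0, 1), axis=0)
    model_input = np.ascontiguousarray(img, dtype=np.float32)
    return model_input
```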
Anyone familiar with NumPy slicing can implement the YOLOX preprocessing steps with little effort.
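For a C++ port, the first thing to reproduce is the letterbox geometry of the YOLOX code above: the ratio r, the truncated resized size, and how much of the 114-filled canvas remains on the right and at the bottom. The following is a minimal standard-C++ sketch; LetterboxGeometry and computeLetterbox are hypothetical names, and the actual pixel resize is left to something like cv::resize.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: sizes for a YOLOX-style letterbox
// (resize keeping the aspect ratio, then the bottom/right stays filled with 114).
struct LetterboxGeometry {
    double ratio;   // r in the Python code
    int resizedW;   // int(image.shape[1] * r)
    int resizedH;   // int(image.shape[0] * r)
    int padRight;   // canvas columns left at 114 on the right
    int padBottom;  // canvas rows left at 114 at the bottom
};

LetterboxGeometry computeLetterbox(int imgH, int imgW, int inputH, int inputW) {
    assert(imgH > 0 && imgW > 0 && inputH > 0 && inputW > 0);
    LetterboxGeometry g{};
    g.ratio = std::min(static_cast<double>(inputH) / imgH,
                       static_cast<double>(inputW) / imgW);
    // static_cast<int> truncates toward zero, like Python's int() for positive values
    g.resizedW = static_cast<int>(imgW * g.ratio);
    g.resizedH = static_cast<int>(imgH * g.ratio);
    g.padRight = inputW - g.resizedW;
    g.padBottom = inputH - g.resizedH;
    return g;
}
```

For a 480×640 (H×W) frame and a 640×640 model input, r is 1.0, the image keeps its size, and the bottom 160 rows of the canvas stay at 114.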
The overlooked cv2.dnn.blobFromImage()
In fact, OpenCV itself also provides a preprocessing interface, blobFromImage. Its prototype in Python is:
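```python
cv2.dnn.blobFromImage(image[, scalefactor[, size[, mean[, swapRB[, crop[, ddepth]]]]]]) -> retval
# scalefactor: multiplier applied to the pixel values
# size: output spatial size, given as (width, height)
# mean: per-channel mean subtracted from the image
# swapRB: whether to swap the R and B channels
# crop: if True, resize keeping the aspect ratio and then center-crop to size;
#       if False, resize directly to size without keeping the aspect ratio
# ddepth: depth of the output blob, cv2.CV_8U or cv2.CV_32F
```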
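The crop flag is the least obvious parameter. The sketch below computes the geometry the OpenCV documentation describes for crop=True: scale so that one side matches the target and the other is at least as large, then take a centered window. CenterCrop and computeCenterCrop are hypothetical names, and the rounding (std::lround for the resized size, integer halving for the offset) is an assumption that may differ from OpenCV's internals by a pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper describing crop=true geometrically:
// scale so the image covers the target size, then take a centered window.
struct CenterCrop {
    int resizedW, resizedH;  // size after the aspect-preserving resize
    int x, y;                // top-left corner of the crop window
    int w, h;                // crop window size (equals the target size)
};

CenterCrop computeCenterCrop(int imgW, int imgH, int targetW, int targetH) {
    assert(imgW > 0 && imgH > 0 && targetW > 0 && targetH > 0);
    // the larger of the two ratios makes both sides >= the target (assumed rounding below)
    const double s = std::max(static_cast<double>(targetW) / imgW,
                              static_cast<double>(targetH) / imgH);
    CenterCrop c{};
    c.resizedW = static_cast<int>(std::lround(imgW * s));
    c.resizedH = static_cast<int>(std::lround(imgH * s));
    c.x = (c.resizedW - targetW) / 2;  // centered horizontally
    c.y = (c.resizedH - targetH) / 2;  // centered vertically
    c.w = targetW;
    c.h = targetH;
    return c;
}
```

For example, a 640×480 (W×H) image with a 224×224 target is scaled to about 299×224 and the window starts at x = 37, so the left and right edges are discarded rather than padded.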
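In terms of processing order, the image is first resized (and cropped if crop is set), then R and B are swapped, then the mean is subtracted, and finally the result is multiplied by scalefactor.
The function returns a blob, i.e. a four-dimensional array in NCHW layout. Because size is given as (width, height), the shape is [1, channels, size[1], size[0]].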
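To make the order concrete, here is a plain C++ reference for the per-pixel arithmetic, blob[0][c][y][x] = (swapped(img)[y][x][c] - mean[c]) * scalefactor, on a 3-channel 8-bit HWC buffer. It skips resizing and cropping, and blobReference is a hypothetical name: a sketch for checking results, not OpenCV's implementation. Following the OpenCV documentation for swapRB, mean is given in the channel order after the swap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference: blobFromImage-style arithmetic (no resize/crop) for a
// 3-channel interleaved (HWC) 8-bit image. Returns a 1 x 3 x H x W blob, contiguous NCHW.
std::vector<float> blobReference(const std::vector<std::uint8_t>& hwc, int h, int w,
                                 double scalefactor, const std::array<double, 3>& mean,
                                 bool swapRB) {
    assert(hwc.size() == static_cast<std::size_t>(h) * w * 3);
    std::vector<float> blob(static_cast<std::size_t>(3) * h * w);
    for (int c = 0; c < 3; ++c) {
        // with swapRB, output channel 0 reads input channel 2 and vice versa
        const int srcC = swapRB ? 2 - c : c;
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                const double v = hwc[(static_cast<std::size_t>(y) * w + x) * 3 + srcC];
                // subtract the mean of the output channel, then scale
                blob[(static_cast<std::size_t>(c) * h + y) * w + x] =
                    static_cast<float>((v - mean[c]) * scalefactor);
            }
        }
    }
    return blob;
}
```

With a 1×2 BGR image whose pixels are (10, 20, 30) and (40, 50, 60), mean (1, 2, 3), scalefactor 0.5 and swapRB enabled, the R plane comes first and holds 14.5 and 29.5.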
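This blob is itself a NumPy ndarray, so everything blobFromImage does can also be done with NumPy, but not the other way round. For example, scalefactor is a single number, so a per-channel std division like the one in YOLOX is not covered, and blobFromImage has no letterbox padding option.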
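As an example of that gap, YOLOX divides by a per-channel std, which a single scalefactor cannot express. One option, sketched below under the assumption that the blob already has the mean subtracted and any global scale applied, is to divide each channel plane of the NCHW blob afterwards. divideByChannelStd is a hypothetical helper working on a contiguous float buffer (for example the data of a continuous blob Mat).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical post-step: divide each channel plane of a 1 x 3 x H x W blob by its own std,
// which a single scalefactor cannot express.
void divideByChannelStd(std::vector<float>& blob, int h, int w,
                        const std::array<float, 3>& stdev) {
    const std::size_t plane = static_cast<std::size_t>(h) * w;
    assert(blob.size() == 3 * plane);
    for (std::size_t c = 0; c < 3; ++c) {
        float* p = blob.data() + c * plane;  // channel planes are contiguous in NCHW
        for (std::size_t i = 0; i < plane; ++i) {
            p[i] /= stdev[c];
        }
    }
}
```

Because NCHW stores each channel as one contiguous plane, the division is a simple linear pass per channel with no index shuffling.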
Image preprocessing in C++
C++ has no array slicing and indexing as convenient as NumPy's. Element-level access to an OpenCV cv::Mat relies on pointers (or at<>()), so preprocessing that rearranges dimensions is particularly tedious.
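To see what relying on pointers looks like, here is NumPy's image[:, :, ::-1] (swap R and B) for a 3-channel interleaved buffer, written as explicit row and pixel pointer arithmetic. The std::vector stands in for a continuous CV_8UC3 cv::Mat, and swapRBInPlace is a hypothetical name; with a real Mat, mat.ptr<uchar>(y) would supply the row pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical example: image[:, :, ::-1] on a 3-channel HWC buffer via pointer walking.
void swapRBInPlace(std::vector<std::uint8_t>& hwc, int h, int w) {
    assert(hwc.size() == static_cast<std::size_t>(h) * w * 3);
    for (int y = 0; y < h; ++y) {
        // start of row y: each row holds w pixels of 3 bytes
        std::uint8_t* row = hwc.data() + static_cast<std::size_t>(y) * w * 3;
        for (int x = 0; x < w; ++x) {
            std::uint8_t* px = row + x * 3;  // start of pixel (y, x)
            std::swap(px[0], px[2]);         // B <-> R, G stays in the middle
        }
    }
}
```

What NumPy expresses as a negative stride on one axis becomes two nested loops plus manual offset arithmetic.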
A case in point is the Focus module in YOLOX and YOLOv5. Many AI chips do not support the slicing operations inside Focus (hardware operator support lags somewhat behind research), so this operation has to be moved into image preprocessing.
In Python, the Focus module is very simple:
```python
import numpy as np

# Focus module: stack the four pixel-parity sub-images of an HWC image along the channel axis
img_focus = np.concatenate([image[::2, ::2, :], image[1::2, ::2, :], image[::2, 1::2, :], image[1::2, 1::2, :]], axis=-1)
```
With an OpenCV cv::Mat in C++, however, the implementation is troublesome: the Mat data has to be rearranged by hand, element by element, through pointers or at<>().
Mat data rearrangement
The Focus operation splits an image into four sub-images of equal size, each taking every other pixel in both directions (equivalent to a 2× downsample with different offsets), and then stacks the four sub-images along the channel dimension. It therefore rearranges the data both spatially (downsampling) and across channels (stacking).
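The index arithmetic can first be checked on a plain buffer. The following sketch reproduces the NumPy Focus slicing on a float HWC buffer; focusHWC is a hypothetical name. Each output pixel (i, j) gathers source pixel (2i + dy, 2j + dx) for the four offsets in concatenation order, and sub-image k fills output channels k*C to k*C + C - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: Focus on a float HWC buffer, matching
// np.concatenate([img[::2, ::2], img[1::2, ::2], img[::2, 1::2], img[1::2, 1::2]], axis=-1).
// Output is (H/2) x (W/2) x (4*C), still interleaved (HWC).
std::vector<float> focusHWC(const std::vector<float>& src, int h, int w, int c) {
    assert(h % 2 == 0 && w % 2 == 0);
    assert(src.size() == static_cast<std::size_t>(h) * w * c);
    const int oh = h / 2, ow = w / 2, oc = 4 * c;
    // {row offset, column offset} of each sub-image, in concatenation order
    const int offsets[4][2] = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
    std::vector<float> dst(static_cast<std::size_t>(oh) * ow * oc);
    for (int i = 0; i < oh; ++i) {
        for (int j = 0; j < ow; ++j) {
            for (int k = 0; k < 4; ++k) {
                const int sy = 2 * i + offsets[k][0];
                const int sx = 2 * j + offsets[k][1];
                for (int s = 0; s < c; ++s) {
                    // sub-image k occupies output channels [k*c, k*c + c)
                    dst[(static_cast<std::size_t>(i) * ow + j) * oc + k * c + s] =
                        src[(static_cast<std::size_t>(sy) * w + sx) * c + s];
                }
            }
        }
    }
    return dst;
}
```

On a 2×2 single-channel image [[1, 2], [3, 4]] the result is the single 4-channel pixel [1, 3, 2, 4], matching np.concatenate on the four slices.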
The idea behind the rearrangement is to take the downsampled pixels from the width and height dimensions of the original Mat and write them into the channels of a new Mat. The C++ implementation is as follows:
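```cpp
#include <opencv2/core.hpp>
#include <vector>

// srcChannels: the single-channel CV_32F planes of the model-sized image (e.g. from cv::split),
// each H x W with even H and W
cv::Mat focusImage(const std::vector<cv::Mat>& srcChannels)
{
    CV_Assert(srcChannels.size() == 3);  // 3 input channels -> 12 output channels
    const int outH = srcChannels[0].rows / 2;
    const int outW = srcChannels[0].cols / 2;
    const int cn = static_cast<int>(srcChannels.size());
    cv::Mat focusMat = cv::Mat::zeros(outH, outW, CV_32FC(12));
    // {column offset, row offset} of each sub-image, in the same order as the Python slices:
    // [::2, ::2], [1::2, ::2], [::2, 1::2], [1::2, 1::2]
    const int startPt[4][2] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    for (int i = 0; i < outH; i++) {
        for (int j = 0; j < outW; j++) {
            for (int k = 0; k < 4; k++) {
                const int startX = startPt[k][0];
                const int startY = startPt[k][1];
                for (int s = 0; s < cn; s++) {
                    // concatenating along the last axis puts sub-image k in channel block k,
                    // so the output channel index is cn * k + s
                    focusMat.at<cv::Vec<float, 12>>(i, j)[cn * k + s] =
                        srcChannels[s].at<float>(2 * i + startY, 2 * j + startX);
                }
            }
        }
    }
    return focusMat;
}
```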
The code above needs four nested for loops, which is far more cumbersome than the NumPy version.
cv::dnn::blobFromImage(): an interface for lazy people
Although the Focus module is rare in other models, another preprocessing operation is very common: transposition, which converts the model input layout from NHWC to NCHW.
Following the Mat rearrangement idea above, the transposition also needs three nested for loops. This is where blobFromImage(), which sat on the sidelines in Python, comes in handy.
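The three-loop transposition mentioned above can be sketched in plain C++ as follows, on a float HWC buffer; hwcToChw is a hypothetical name. Adding the batch axis afterwards moves no data, since a 1×C×H×W array has the same memory layout as C×H×W.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the HWC -> CHW rearrangement with three nested loops.
std::vector<float> hwcToChw(const std::vector<float>& hwc, int h, int w, int c) {
    assert(hwc.size() == static_cast<std::size_t>(h) * w * c);
    std::vector<float> chw(hwc.size());
    for (int ch = 0; ch < c; ++ch) {
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                // element (y, x, ch) in HWC moves to (ch, y, x) in CHW
                chw[(static_cast<std::size_t>(ch) * h + y) * w + x] =
                    hwc[(static_cast<std::size_t>(y) * w + x) * c + ch];
            }
        }
    }
    return chw;
}
```

For a 1×2 image with pixels (1, 2, 3) and (4, 5, 6), the CHW output is [1, 4, 2, 5, 3, 6].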
Besides transposition, cv::dnn::blobFromImage() can also resize, crop, swap the R and B channels, subtract the mean and scale, which makes it the first choice for image preprocessing in C++.