Overview
In the previous chapter we covered the details of capture on iOS. This time, let's look at iOS hardware encoding.

First of all, why is encoding needed at all? Last time we mentioned CMSampleBuffer, a general-purpose structure that can wrap an ImageBuffer and carry raw stream data. Let's first look at what that image buffer actually is, and then it will become clear why encoding is necessary.
Getting to know CVPixelBufferRef
In the everyday world of color we all know the three RGB primaries; mixing them can produce the vast majority of colors we see.

In the computer, this format is called kCVPixelFormatType_32RGBA: every 32 bits of data contain 8 bits each of R, G, B and A. In practice we usually use a variant of it instead, namely kCVPixelFormatType_32BGRA.
If we use RGBA to represent an image, each pixel occupies 4 bytes, one byte per channel. To watch a 60-second, 30 fps, 720p video we would therefore need 1280 × 720 × 4 × 30 × 60 = 6,635,520,000 bytes of raw data, roughly 6.6 GB, or about 110 MB for every second of video. Most people's network bandwidth does not even reach 10 MB/s, let alone 110 MB/s. Without video encoding we could not deliver and watch video at all, never mind live streaming.
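As a quick sanity check on the arithmetic, here is a minimal, self-contained snippet (illustrative only) that reproduces the numbers above:

```objc
#import <Foundation/Foundation.h>

int main(void) {
    // Rough arithmetic for raw (uncompressed) 720p RGBA video.
    long long bytesPerFrame  = 1280LL * 720 * 4;    // 4 bytes per pixel -> 3,686,400 bytes
    long long bytesPerSecond = bytesPerFrame * 30;  // 30 fps            -> ~110 MB per second
    long long totalBytes     = bytesPerSecond * 60; // 60 seconds        -> 6,635,520,000 bytes (~6.6 GB)
    NSLog(@"%lld B/frame, %lld B/s, %lld B total", bytesPerFrame, bytesPerSecond, totalBytes);
    return 0;
}
```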
How is this data stored on iOS? The object that holds it is CVPixelBufferRef, which is essentially the pixel layout of one image. It can be one of the following types:
Image type | Meaning |
---|---|
kCVPixelFormatType_32RGBA | 32-bit RGBA, laid out as R, G, B, A |
kCVPixelFormatType_32BGRA | 32-bit BGRA, laid out as B, G, R, A |
kCVPixelFormatType_420YpCbCr8Planar / kCVPixelFormatType_420YpCbCr8PlanarFullRange | I420 layout: the first height × stride[0] bytes are the Y plane, the next (height / 2) × stride[1] bytes are the U plane, and the remaining data is the V plane |
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange | NV12 layout: height × stride[0] bytes of Y followed by (height / 2) × stride[1] bytes of interleaved UV. A commonly used type for encoding, decoding and rendering. Video range: luma = [16, 235], chroma = [16, 240] |
kCVPixelFormatType_420YpCbCr8BiPlanarFullRange | NV12 layout: height × stride[0] bytes of Y followed by (height / 2) × stride[1] bytes of interleaved UV. Also commonly used for encoding, decoding and rendering. Full range: luma = [0, 255], chroma = [0, 255] |
kCVPixelFormatType_OneComponent8 | A single gray (Y) component, often used as the input for on-device image-quality enhancement |
kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange / kCVPixelFormatType_420YpCbCr10BiPlanarFullRange | 10-bit data sources used for HDR10; how to decode them will be covered later |
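To make the planar layouts above concrete, here is a minimal sketch (the helper name is my own) that prints the per-plane geometry Core Video reports for a planar or bi-planar buffer such as NV12:

```objc
#import <CoreVideo/CoreVideo.h>

// Illustrative helper: dump the plane geometry of a planar/bi-planar pixel buffer.
static void DumpPlaneLayout(CVPixelBufferRef pixelBuffer) {
    size_t planeCount = CVPixelBufferGetPlaneCount(pixelBuffer);
    for (size_t plane = 0; plane < planeCount; ++plane) {
        size_t width  = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane);
        size_t height = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane);
        size_t stride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, plane);
        // For NV12, plane 0 is the Y plane and plane 1 holds the interleaved CbCr
        // data at half the vertical resolution of the Y plane.
        NSLog(@"plane %zu: %zux%zu, stride %zu, bytes %zu",
              plane, width, height, stride, stride * height);
    }
}
```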
Now we know what CVPixelBufferRef is, but how do we create one? Let's look at the relevant function prototype:
```objc
/*!
 * @param allocator              The memory allocator; the default kCFAllocatorDefault can be used.
 * @param width                  The width of the buffer to create.
 * @param height                 The height of the buffer to create.
 * @param pixelFormatType        The pixel format, one of the types listed above.
 * @param pixelBufferAttributes  Buffer attributes; here you can request IOSurface or OpenGL compatibility.
 * @param pixelBufferOut         Out parameter that receives the created buffer.
 */
CV_EXPORT CVReturn CVPixelBufferCreate(
    CFAllocatorRef CV_NULLABLE allocator,
    size_t width,
    size_t height,
    OSType pixelFormatType,
    CFDictionaryRef CV_NULLABLE pixelBufferAttributes,
    CV_RETURNS_RETAINED_PARAMETER CVPixelBufferRef CV_NULLABLE * CV_NONNULL pixelBufferOut)
        __OSX_AVAILABLE_STARTING(__MAC_10_4, __IPHONE_4_0);
```
Example code:
```objc
int width = 1280;
int height = 720;
OSType format = kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange;

const void *keys[] = {
    kCVPixelBufferOpenGLESCompatibilityKey,
    kCVPixelBufferIOSurfacePropertiesKey
};
const void *values[] = {
    (__bridge const void *)[NSNumber numberWithBool:YES],
    (__bridge const void *)[NSDictionary dictionary]
};
// Use the CF type callbacks so the dictionary retains its keys and values.
CFDictionaryRef attributes = CFDictionaryCreate(kCFAllocatorDefault, keys, values, 2,
                                                &kCFTypeDictionaryKeyCallBacks,
                                                &kCFTypeDictionaryValueCallBacks);

CVPixelBufferRef pixelBuffer = nil;
CVReturn ret = CVPixelBufferCreate(kCFAllocatorDefault, width, height, format,
                                   attributes, &pixelBuffer);
CFRelease(attributes);
```
At this point we can create a CVPixelBufferRef ourselves. Yeah, but it still seems unrelated to the topic of this chapter.

Actually it is not. In audio and video work we frequently need to crop, beautify or otherwise process images, and a solid understanding of CVPixelBufferRef helps when dealing with such image sources. Next, let's access the raw data inside a CVPixelBufferRef.
Function prototypes:
```objc
/*!
 * @brief Returns a pointer to the start of the given plane, for direct memory access.
 * @param pixelBuffer The buffer whose memory is being accessed.
 * @param planeIndex  The plane (component) to access.
 */
CV_EXPORT void * CV_NULLABLE CVPixelBufferGetBaseAddressOfPlane(
    CVPixelBufferRef CV_NONNULL pixelBuffer, size_t planeIndex)
        __OSX_AVAILABLE_STARTING(__MAC_10_4, __IPHONE_4_0);

/*!
 * @brief Returns the number of bytes per row of a plane (the stride). It may be larger
 *        than the plane width because the underlying storage pads each row for alignment.
 * @param pixelBuffer The buffer to access.
 * @param planeIndex  The plane to access.
 */
CV_EXPORT size_t CVPixelBufferGetBytesPerRowOfPlane(
    CVPixelBufferRef CV_NONNULL pixelBuffer, size_t planeIndex)
        __OSX_AVAILABLE_STARTING(__MAC_10_4, __IPHONE_4_0);

/*!
 * @brief Returns the width of a plane in the buffer.
 */
CV_EXPORT size_t CVPixelBufferGetWidthOfPlane(
    CVPixelBufferRef CV_NONNULL pixelBuffer, size_t planeIndex)
        __OSX_AVAILABLE_STARTING(__MAC_10_4, __IPHONE_4_0);

/*!
 * @brief Returns the height of a plane in the buffer.
 */
CV_EXPORT size_t CVPixelBufferGetHeightOfPlane(
    CVPixelBufferRef CV_NONNULL pixelBuffer, size_t planeIndex)
        __OSX_AVAILABLE_STARTING(__MAC_10_4, __IPHONE_4_0);
```
Example code:
```objc
// Get the pointer of the nth plane. For YUV420-style buffers, plane 0 is Y,
// plane 1 is U (or interleaved UV for NV12), plane 2 is V (I420 only).
CVPixelBufferLockBaseAddress(pixelBuffer, 0);

uint8_t *address = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, n);
size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, n);
size_t planeHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, n); // already the plane's own height

for (size_t i = 0; i < planeHeight; ++i) {
    for (size_t j = 0; j < bytesPerRow; ++j) {
        // todo: read or update address[i * bytesPerRow + j]
    }
}

CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
```
CVPixelBufferRef is, at heart, a C-level pointer type. It is not covered by Objective-C's automatic memory management, so the memory must be released manually. The release code is as follows:
```objc
CVPixelBufferRelease(pixelBuffer);
```
At this point you know, at a basic level, how to create a CVPixelBufferRef, how to access and modify it, and how to destroy it. This will matter again later when we hook up the x264 software encoder.
VTCompressionSessionRef
Now that we understand how iOS stores images, let's talk about the iOS hardware encoder.

In fact, no matter which encoder you use, the overall workflow does not change. The encoding process is as follows (a small interface sketch follows the list):
- Create encoder
- Configure encoder
- Start encoding
- Reset encoder
- Destroy encoder
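Mapped onto an Objective-C wrapper, those five steps might look like the following interface sketch; the class and method names here are my own, loosely mirroring the LWVideoEncoder used later:

```objc
#import <VideoToolbox/VideoToolbox.h>

// Hypothetical wrapper interface mirroring the five-step workflow above.
@interface MyVideoEncoder : NSObject
- (BOOL)createSession;                                // 1. create the encoder
- (void)configureSession;                             // 2. configure its properties
- (void)encodeSampleBuffer:(CMSampleBufferRef)buffer; // 3. feed captured frames
- (void)setBitrate:(int)bitrate;                      // 4. reconfigure at runtime
- (void)destroySession;                               // 5. tear down
@end
```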
This workflow is basically the same whether you are dealing with the Android encoder, a software encoder or any other encoder. As long as we can map each step to the corresponding API calls, we can put the whole pipeline together.
Create encoder
```objc
/*!
 * @brief Function prototype for creating a VideoToolbox encoder session.
 * @param allocator                   The memory allocator to associate with the session.
 * @param width                       Encoded width.
 * @param height                      Encoded height.
 * @param codecType                   The codec, e.g. AVC (kCMVideoCodecType_H264) or HEVC (kCMVideoCodecType_HEVC).
 * @param encoderSpecification        Lets you require a specific encoder; pass NULL to let VideoToolbox choose.
 * @param sourceImageBufferAttributes Attributes of the source pixel buffers, e.g. IOSurface usage and the
 *                                    input pixel format (see the kCVPixelFormatType values above).
 * @param compressedDataAllocator     Allocator for the compressed output; pass NULL for the default.
 * @param outputCallback              Callback that receives the encoded data.
 * @param outputCallbackRefCon        User pointer passed back to the callback.
 * @param compressionSessionOut       Out parameter that receives the encoder session.
 */
VT_EXPORT OSStatus VTCompressionSessionCreate(
    CM_NULLABLE CFAllocatorRef allocator,
    int32_t width,
    int32_t height,
    CMVideoCodecType codecType,
    CM_NULLABLE CFDictionaryRef encoderSpecification,
    CM_NULLABLE CFDictionaryRef sourceImageBufferAttributes,
    CM_NULLABLE CFAllocatorRef compressedDataAllocator,
    CM_NULLABLE VTCompressionOutputCallback outputCallback,
    void * CM_NULLABLE outputCallbackRefCon,
    CM_RETURNS_RETAINED_PARAMETER CM_NULLABLE VTCompressionSessionRef * CM_NONNULL compressionSessionOut)
        API_AVAILABLE(macosx(10.8), ios(8.0), tvos(10.2));
```
The above is the creation function prototype of the encoder. For how to use it, refer to the createSession function in LWVideoEncoder.m. The following is a concrete code example:
```objc
void compressSessionOutputCallback(void *opaque,
                                   void *sourceFrameRef,
                                   OSStatus compressStatus,
                                   VTEncodeInfoFlags infoFlags,
                                   CMSampleBufferRef sampleBuf) {
}

// ......

long pixelBufferType = self.config.pixelBufferType;
CFDictionaryRef ioSurfaceValue = CFDictionaryCreate(kCFAllocatorDefault, nil, nil, 0,
                                                    &kCFTypeDictionaryKeyCallBacks,
                                                    &kCFTypeDictionaryValueCallBacks);
CFTypeRef pixelBufferFormatValue = (__bridge CFTypeRef)@(pixelBufferType);
CFTypeRef keys[3] = {
    kCVPixelBufferOpenGLESCompatibilityKey,
    kCVPixelBufferIOSurfacePropertiesKey,
    kCVPixelBufferPixelFormatTypeKey
};
CFTypeRef values[3] = { kCFBooleanTrue, ioSurfaceValue, pixelBufferFormatValue };
CFDictionaryRef sourceAttributes = CFDictionaryCreate(kCFAllocatorDefault, keys, values, 3,
                                                      &kCFTypeDictionaryKeyCallBacks,
                                                      &kCFTypeDictionaryValueCallBacks);
if (ioSurfaceValue) {
    CFRelease(ioSurfaceValue);
    ioSurfaceValue = nil;
}

OSStatus status = VTCompressionSessionCreate(nil,
                                             self.config.width,
                                             self.config.height,
                                             self.config.codecType,
                                             0,                             // encoderSpecification
                                             sourceAttributes,              // sourceImageBufferAttributes
                                             0,                             // compressedDataAllocator
                                             compressSessionOutputCallback,
                                             (__bridge void * _Nullable)(self),
                                             &_encoderSession);
```
Configure encoder
Configuring the encoder is where most of the work lies. Of course, some developers need to modify and optimize the encoder core itself, but for client-side encoding SDK development the job is mainly tuning parameters. Since we are talking about tuning parameters, the configuration keys of the iOS hardware encoder are listed below; you can dig into each of them later.
Configuration variable | Meaning |
---|---|
kVTCompressionPropertyKey_RealTime | Real-time encoding: 1 to enable, 0 to disable. When enabled, the encoder favors speed over image quality; at low to medium bitrates this can cause blockiness, but encoding is somewhat faster, so it is common for low-latency live streaming. When disabled, encoding is slower but quality improves, which alleviates the low/medium-bitrate blockiness; this mode is common for on-device transcoding. |
kVTCompressionPropertyKey_AllowFrameReordering | Whether B-frame encoding is allowed. Enable to allow B-frames, disable to forbid them |
kVTCompressionPropertyKey_AllowTemporalCompression | Whether temporal compression is allowed. Defaults to YES, so P- and B-frames can be encoded; otherwise only I-frames are produced |
kVTCompressionPropertyKey_MaxKeyFrameInterval | Roughly the GOP size, i.e. how many frames pass before the next I-frame |
kVTCompressionPropertyKey_AllowOpenGOP | Enables open GOP encoding, where frames may reference across GOP boundaries |
kVTCompressionPropertyKey_NumberOfPendingFrames | The size of the pending-frame queue; with a value of 1, each frame is output immediately |
kVTCompressionPropertyKey_ProfileLevel | Profile and level of the session, typically kVTProfileLevel_H264_Baseline_AutoLevel, kVTProfileLevel_H264_Main_AutoLevel, kVTProfileLevel_H264_High_AutoLevel or kVTProfileLevel_HEVC_Main_AutoLevel |
kVTCompressionPropertyKey_ExpectedFrameRate | A hint of the expected frame rate; it does not directly drive output timing, which follows the pts/dts you pass in |
kVTEncodeFrameOptionKey_ForceKeyFrame | Forces the current frame to be an I-frame (a per-frame option passed to VTCompressionSessionEncodeFrame rather than a session property) |
kVTCompressionPropertyKey_AverageBitRate | Target average bitrate |
The above covers the configuration currently in use. Here are a few simple examples:
```objc
// Boolean property (CFBooleanRef)
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
// Numeric property
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(value));
```
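For a slightly fuller picture, here is a sketch of a configuration pass covering several of the keys from the table. The concrete values (no B-frames, High profile, GOP of 60, 30 fps, ~2 Mbps) are arbitrary examples of my own, not recommendations:

```objc
// Illustrative configuration only; error handling omitted for brevity.
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse); // no B-frames
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_High_AutoLevel);
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(60));  // GOP ≈ 60 frames
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_ExpectedFrameRate, (__bridge CFTypeRef)@(30));
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_AverageBitRate, (__bridge CFTypeRef)@(2 * 1000 * 1000)); // ~2 Mbps
```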
Start encoding
```objc
CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
CMTime duration = CMSampleBufferGetDuration(sampleBuffer);
CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
VTEncodeInfoFlags infoFlags = 0;

// Parameter 1: the encoder session.
// Parameter 2: the pixel buffer to encode.
// Parameter 3: the presentation timestamp; it affects rate control and is carried
//              through to the sample buffer delivered in the output callback.
// Parameter 4: the frame duration.
// Parameter 5: optional per-frame properties, e.g. forcing a keyframe.
// Parameter 6: an opaque pointer passed back untouched to the output callback.
// Parameter 7: receives info flags telling you whether the frame was dropped or is encoded asynchronously.
OSStatus status = VTCompressionSessionEncodeFrame(_encoderSession,
                                                  pixelBuffer,
                                                  pts,
                                                  duration,
                                                  nil,
                                                  sourceFrameRef,
                                                  &infoFlags);

// If it fails, check whether the current session has become invalid.
// When the app is sent to the background, iOS invalidates every session; in that
// case the encoder must be destroyed and recreated.
if (status != noErr) {
    if (status == kVTInvalidSessionErr) {
        [self destroySession];
        [self createSession:self.config];
    }
    return kLWVideoEncodeStatus_Err_Encode;
}
```
In this way, the CMSampleBufferRef obtained from capture can be fed in directly. Once encoding finishes, the corresponding output arrives in compressSessionOutputCallback, and at this point the hardware encoding pipeline runs end to end.
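The table above listed kVTEncodeFrameOptionKey_ForceKeyFrame. As a minimal sketch (reusing pts, duration, sourceFrameRef and infoFlags from the previous snippet), forcing an I-frame works by passing that key in the per-frame properties dictionary (parameter 5) rather than setting it on the session:

```objc
// Request that this particular frame be encoded as a keyframe.
NSDictionary *frameProperties = @{ (__bridge NSString *)kVTEncodeFrameOptionKey_ForceKeyFrame : @YES };
OSStatus status = VTCompressionSessionEncodeFrame(_encoderSession,
                                                  pixelBuffer,
                                                  pts,
                                                  duration,
                                                  (__bridge CFDictionaryRef)frameProperties,
                                                  sourceFrameRef,
                                                  &infoFlags);
```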
Reset encoder
Runtime reconfiguration is commonly needed, most often to adjust the bitrate; you can see it in the project as well:
```objc
// Dynamically set the encoding bitrate
- (void)setBitrate:(int)bitrate {
    bitrate = bitrate * 1024;
    OSStatus status = VTSessionSetProperty(_encoderSession,
                                           kVTCompressionPropertyKey_AverageBitRate,
                                           (__bridge CFTypeRef)@(bitrate));
    if (status != noErr) {
        NSLog(@"%s:%d error with %d", __func__, __LINE__, (int)status);
    }
}
```
Apart from a few adjustable keys like the bitrate, the hardware encoder can only change its properties by restarting the session.
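As a rough sketch of such a restart (assuming the createSession:/destroySession helpers from this project; the method and config class names here are hypothetical), it could look like this:

```objc
// Hypothetical helper: apply settings that cannot be changed on a live session
// (e.g. resolution or codec) by tearing the encoder down and creating it again.
- (void)reconfigureWithConfig:(LWVideoEncoderConfig *)config {
    [self destroySession];
    self.config = config;
    [self createSession:config];
}
```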
Destroy encoder
```objc
- (void)destroySession {
    if (_encoderSession) {
        VTCompressionSessionCompleteFrames(_encoderSession, kCMTimeInvalid);
        VTCompressionSessionInvalidate(_encoderSession);
        CFRelease(_encoderSession);
        _encoderSession = nil;
    }
    NSLog(@"%s:%d", __func__, __LINE__);
}
```
Summary
That wraps up the basics of iOS hardware encoding; let's put it into practice. In the next installment we will explain how to extract the encoded output and turn it into AVC or HEVC bitstreams. Interested readers can look into that material ahead of time.