iOS audio and video development II: Implementing iOS hardware encoding


In the previous chapter, we covered the details of video capture on iOS. This time, let's look at iOS hardware encoding.

First, why is encoding needed at all? Last time we mentioned CMSampleBuffer, a general-purpose structure that can wrap an image buffer holding raw frame data. I'll first explain what that image buffer is, and then it will be clear why encoding is necessary.

A first look at CVPixelBufferRef

We all know the three primary colors red, green and blue (RGB). Mixing these three can produce the vast majority of colors we see.
On a computer, this format is called kCVPixelFormatType_32RGBA: every 32 bits of data contain 8 bits each of R, G, B and A. In practice, a different layout is usually used instead: kCVPixelFormatType_32BGRA.

With RGBA, each pixel takes 4 bytes, one byte per channel. A 60-second 720p video at 30 fps therefore needs 720 * 1280 * 4 * 30 * 60 = 6635520000 bytes of raw data, about 6.2 GiB, or roughly 105 MiB for every second of video. Most people's network connection cannot sustain 10 MB/s, let alone 105 MiB/s. Without video encoding, we could not realistically transmit and watch the video, never mind stream it live.
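The raw-size arithmetic is easy to reproduce for other resolutions and frame rates. The following is a minimal C sketch of it; the function name raw_video_bytes is made up for illustration, and it assumes tightly packed pixels with no row padding.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for uncompressed video, assuming tightly packed pixels
 * (no row padding). The function name is illustrative only. */
static uint64_t raw_video_bytes(uint32_t width, uint32_t height,
                                uint32_t bytes_per_pixel,
                                uint32_t fps, uint32_t seconds) {
    assert(bytes_per_pixel > 0);
    /* Widen to 64 bits first: the product overflows 32 bits quickly. */
    return (uint64_t)width * height * bytes_per_pixel * fps * seconds;
}
```

Calling raw_video_bytes(1280, 720, 4, 30, 60) gives 6635520000 bytes for the whole clip, and passing 1 for the duration gives 110592000 bytes per second.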

How is this data stored on iOS? In a CVPixelBufferRef. A CVPixelBufferRef describes the pixel layout of a single image, and it can have one of the following types:

| Image type | Meaning |
| --- | --- |
| kCVPixelFormatType_32RGBA | 32-bit RGBA; byte layout is R, G, B, A |
| kCVPixelFormatType_32BGRA | 32-bit BGRA; byte layout is B, G, R, A |
| kCVPixelFormatType_420YpCbCr8Planar / kCVPixelFormatType_420YpCbCr8PlanarFullRange | I420 layout: the first height * strides[0] bytes are the Y plane, the next height / 2 * strides[1] bytes are the U plane, and the remaining data is the V plane |
| kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange | NV12 layout: height * strides[0] bytes of Y followed by height / 2 * strides[1] bytes of interleaved UV. A commonly used type for encoding, decoding and rendering. Data range: luma = [16, 235], chroma = [16, 240] |
| kCVPixelFormatType_420YpCbCr8BiPlanarFullRange | NV12 layout: height * strides[0] bytes of Y followed by height / 2 * strides[1] bytes of interleaved UV. A commonly used type for encoding, decoding and rendering. Data range: luma = [0, 255], chroma = [0, 255] |
| kCVPixelFormatType_OneComponent8 | A single gray component, i.e. the Y component. Often used as the input for on-device image quality enhancement |
| kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange / kCVPixelFormatType_420YpCbCr10BiPlanarFullRange | Data source for HDR10 content. How to decode this kind of source will be explained later |
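To make the NV12 layout concrete, here is a small C sketch that computes the byte size of each plane from the frame height and the per-plane strides. The type and function names (nv12_layout, nv12_plane_sizes) are hypothetical, and the sketch assumes an even frame height.

```c
#include <assert.h>
#include <stddef.h>

/* Byte sizes of the two NV12 planes. Strides are bytes per row,
 * including any alignment padding. Names are illustrative only. */
typedef struct {
    size_t y_bytes;   /* full-resolution luma plane */
    size_t uv_bytes;  /* interleaved CbCr plane with half as many rows */
} nv12_layout;

static nv12_layout nv12_plane_sizes(size_t height, size_t y_stride,
                                    size_t uv_stride) {
    assert(height % 2 == 0);  /* 4:2:0 sketch: assumes an even height */
    nv12_layout layout;
    layout.y_bytes = height * y_stride;
    layout.uv_bytes = (height / 2) * uv_stride;
    return layout;
}
```

For a 1280x720 frame whose strides both equal 1280, this yields 921600 bytes of Y and 460800 bytes of UV, which is 1.5 bytes per pixel compared with 4 for BGRA.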

Now that we know what a CVPixelBufferRef is, how do we create one? Let's look at the relevant function prototype:

/**
 * @param allocator The allocator used to create the buffer; kCFAllocatorDefault selects the default.
 * @param width The width of the pixel buffer to create.
 * @param height The height of the pixel buffer to create.
 * @param pixelFormatType The pixel format; one of the types listed above or another defined format.
 * @param pixelBufferAttributes Attributes of the pixel buffer; IOSurface or OpenGL compatibility can be requested here.
 * @param pixelBufferOut On return, points to the newly created pixel buffer.
 */
CV_EXPORT CVReturn CVPixelBufferCreate(
    CFAllocatorRef CV_NULLABLE allocator,
    size_t width,
    size_t height,
    OSType pixelFormatType,
    CFDictionaryRef CV_NULLABLE pixelBufferAttributes,
    CV_RETURNS_RETAINED_PARAMETER CVPixelBufferRef CV_NULLABLE * CV_NONNULL pixelBufferOut) __OSX_AVAILABLE_STARTING(__MAC_10_4,__IPHONE_4_0);

Example code:

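int width = 1280;
int height = 720;
OSType format = kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange;

// The key list was truncated in the original listing; these two keys are an
// assumption that matches the values below (a boolean and an empty dictionary).
const void *keys[] = {
    kCVPixelBufferOpenGLESCompatibilityKey,
    kCVPixelBufferIOSurfacePropertiesKey
};
const void *values[] = {
    (__bridge const void *)[NSNumber numberWithBool:YES],
    (__bridge const void *)[NSDictionary dictionary]
};
// Use the CFType callbacks so the dictionary retains its keys and values.
CFDictionaryRef attributes = CFDictionaryCreate(NULL, keys, values, 2, &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
CVPixelBufferRef pixelBuffer = NULL;
CVReturn ret = CVPixelBufferCreate(kCFAllocatorDefault, width, height, format, attributes, &pixelBuffer);
CFRelease(attributes);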

We can now create a CVPixelBufferRef ourselves. That may seem unrelated to the topic of this chapter, but it is not.
In audio and video work, images often need to be cropped or beautified, which means processing the pixel data directly. A solid understanding of CVPixelBufferRef helps when dealing with such image sources. Next, let's access the underlying data of a CVPixelBufferRef.
Function prototypes:

/**
 * @brief Gets the base address of one plane of the buffer, for direct memory access.
 * @param pixelBuffer The buffer whose memory is accessed.
 * @param planeIndex The plane (component) of the buffer to access.
 */
CV_EXPORT void * CV_NULLABLE CVPixelBufferGetBaseAddressOfPlane(CVPixelBufferRef CV_NONNULL pixelBuffer, size_t planeIndex ) __OSX_AVAILABLE_STARTING(__MAC_10_4,__IPHONE_4_0);

/**
 * @brief Gets the number of bytes in one row of a plane, i.e. its stride. This can exceed the visible width because rows are padded for alignment.
 * @param pixelBuffer The buffer to access.
 * @param planeIndex The plane to access.
 */
CV_EXPORT size_t CVPixelBufferGetBytesPerRowOfPlane( CVPixelBufferRef CV_NONNULL pixelBuffer, size_t planeIndex ) __OSX_AVAILABLE_STARTING(__MAC_10_4,__IPHONE_4_0);

/**
 * @brief Gets the width of one plane of the buffer.
 */
CV_EXPORT size_t CVPixelBufferGetWidthOfPlane( CVPixelBufferRef CV_NONNULL pixelBuffer, size_t planeIndex ) __OSX_AVAILABLE_STARTING(__MAC_10_4,__IPHONE_4_0);

/**
 * @brief Gets the height of one plane of the buffer.
 */
CV_EXPORT size_t CVPixelBufferGetHeightOfPlane( CVPixelBufferRef CV_NONNULL pixelBuffer, size_t planeIndex ) __OSX_AVAILABLE_STARTING(__MAC_10_4,__IPHONE_4_0);

Example code:

// Lock the buffer before touching its memory from the CPU.
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
// Get the pointer to the nth plane. For I420, plane 0 is Y, plane 1 is U, and plane 2 is V.
uint8_t *address = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, n);
size_t bytes = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, n);
// The plane height already reflects chroma subsampling, so no extra division is needed.
size_t height = CVPixelBufferGetHeightOfPlane(pixelBuffer, n);
for (size_t i = 0; i < height; ++i) {
    for (size_t j = 0; j < bytes; ++j) {
        // todo update address[i * bytes + j]
    }
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
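The key detail when walking a plane is to advance by the stride (bytes per row) rather than by the visible width, because rows may carry alignment padding. Here is a framework-free C sketch of that pattern on a plain memory block; invert_plane is a hypothetical helper, and on iOS the base pointer and stride would come from the CVPixelBuffer accessors shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invert every visible sample of one 8-bit plane and leave row padding alone.
 * base:   pointer to the first row of the plane
 * stride: bytes per row, padding included
 * width:  visible bytes per row
 * height: number of rows in the plane
 * The function name is illustrative only. */
static void invert_plane(uint8_t *base, size_t stride,
                         size_t width, size_t height) {
    assert(base != NULL && width <= stride);
    for (size_t row = 0; row < height; ++row) {
        uint8_t *line = base + row * stride;  /* step by stride, not width */
        for (size_t col = 0; col < width; ++col) {
            line[col] = (uint8_t)(255 - line[col]);
        }
    }
}
```

With a 2x2 plane stored at a stride of 4, only the first two bytes of each row change and the two padding bytes per row are left untouched.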

A CVPixelBufferRef is a C-level Core Foundation type. It is not covered by Objective-C's automatic memory management, so you must release it yourself by calling CVPixelBufferRelease(pixelBuffer) once you are done with it.


At this point you know the basics of how to create a CVPixelBufferRef, how to work with its contents, and how to destroy it. This will matter a great deal later, when we integrate the x264 software encoder.


Now that we understand how iOS stores images, let's talk about the iOS hardware encoder.
Whichever encoder you use, the overall workflow does not change. The encoding process is as follows:

  • Create encoder
  • Configure encoder
  • Start encoding
  • Reset encoder
  • Destroy encoder

This process is essentially the same for the Android encoder, software encoders and other encoders. Once you can find the API that corresponds to each step, you can implement the whole pipeline.
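One way to picture the shared lifecycle is as an interface with one entry point per step, which each backend (hardware, software, another platform) fills in. The following C sketch is purely illustrative: the video_encoder struct and the dummy_* functions are hypothetical and do not correspond to any real encoder API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoder interface: one function pointer per lifecycle step. */
typedef struct video_encoder {
    int  (*configure)(struct video_encoder *enc, int bitrate_kbps);
    int  (*encode)(struct video_encoder *enc, const void *frame);
    int  (*reset)(struct video_encoder *enc);
    void (*destroy)(struct video_encoder *enc);
    int bitrate_kbps;
    int frames_encoded;
} video_encoder;

/* A dummy backend that only counts frames, standing in for a real encoder. */
static int dummy_configure(video_encoder *enc, int bitrate_kbps) {
    assert(enc != NULL);
    if (bitrate_kbps <= 0) return -1;
    enc->bitrate_kbps = bitrate_kbps;
    return 0;
}

static int dummy_encode(video_encoder *enc, const void *frame) {
    if (frame == NULL) return -1;
    enc->frames_encoded++;
    return 0;
}

static int dummy_reset(video_encoder *enc) {
    enc->frames_encoded = 0;
    return 0;
}

static void dummy_destroy(video_encoder *enc) {
    enc->bitrate_kbps = 0;
    enc->frames_encoded = 0;
}

/* "Create" step: fill in the function table for this backend. */
static video_encoder dummy_encoder_create(void) {
    video_encoder enc = { dummy_configure, dummy_encode, dummy_reset,
                          dummy_destroy, 0, 0 };
    return enc;
}
```

Calling code then only talks to the function table (create, configure, encode, reset, destroy), so swapping a hardware backend for a software one would not change the call sites.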

Create encoder

/**
 * @brief Function prototype for creating a VideoToolbox (VTB) encoder.
 * @param allocator The allocator used for the session; pass NULL for the default.
 * @param width Encoded width.
 * @param height Encoded height.
 * @param codecType The codec type: AVC (kCMVideoCodecType_H264) or HEVC (kCMVideoCodecType_HEVC).
 * @param encoderSpecification Selects a specific encoder; pass NULL to let VideoToolbox choose.
 * @param sourceImageBufferAttributes Specifies whether to use IOSurface and the input image type. Refer to the kCVPixelFormatType values above.
 * @param compressedDataAllocator The allocator used for the compressed data; pass NULL for the default.
 * @param outputCallback The callback function that receives the encoded data.
 * @param outputCallbackRefCon A caller-defined pointer passed through to the callback.
 * @param compressionSessionOut On return, points to the new encoder session.
 */
VT_EXPORT OSStatus
VTCompressionSessionCreate(
	CM_NULLABLE CFAllocatorRef							allocator,
	int32_t												width,
	int32_t												height,
	CMVideoCodecType									codecType,
	CM_NULLABLE CFDictionaryRef							encoderSpecification,
	CM_NULLABLE CFDictionaryRef							sourceImageBufferAttributes,
	CM_NULLABLE CFAllocatorRef							compressedDataAllocator,
	CM_NULLABLE VTCompressionOutputCallback				outputCallback,
	void * CM_NULLABLE									outputCallbackRefCon,
	CM_RETURNS_RETAINED_PARAMETER CM_NULLABLE VTCompressionSessionRef * CM_NONNULL compressionSessionOut) API_AVAILABLE(macosx(10.8), ios(8.0), tvos(10.2));

The above is the prototype of the encoder creation function. For how to use it, see the createSession function in LWVideoEncoder.m. A concrete code example follows:

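void compressSessionOutputCallback(void *opaque,
                                   void *sourceFrameRef,
                                   OSStatus compressStatus,
                                   VTEncodeInfoFlags infoFlags,
                                   CMSampleBufferRef sampleBuf) {
    // Handle the encoded sampleBuf here.
}

// ......
long pixelBufferType = self.config.pixelBufferType;
// An empty IOSurface properties dictionary requests IOSurface backing.
CFDictionaryRef ioSurfaceValue = CFDictionaryCreate(kCFAllocatorDefault, NULL, NULL, 0, &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
CFTypeRef pixelBufferFormatValue = (__bridge CFTypeRef)@(pixelBufferType);
CFTypeRef keys[3] = {
    kCVPixelBufferOpenGLESCompatibilityKey, kCVPixelBufferIOSurfacePropertiesKey, kCVPixelBufferPixelFormatTypeKey
};
CFTypeRef values[3] = {
    kCFBooleanTrue, ioSurfaceValue, pixelBufferFormatValue
};
CFDictionaryRef sourceAttributes = CFDictionaryCreate(kCFAllocatorDefault, keys, values, 3, &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
if (ioSurfaceValue) {
    CFRelease(ioSurfaceValue);
    ioSurfaceValue = nil;
}

// The middle arguments were truncated in the original listing. width, height
// and codecType are placeholder names for values taken from the encoder config.
status = VTCompressionSessionCreate(nil,
                                    width, height, codecType,
                                    NULL,
                                    sourceAttributes,
                                    NULL,
                                    compressSessionOutputCallback,
                                    (__bridge void * _Nullable)(self),
                                    &_encoderSession);
CFRelease(sourceAttributes);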

Configure encoder

Configuring the encoder is where most of the work lies. Some developers go further and modify or optimize the encoder core itself, but when building an on-device encoding SDK, the job is essentially parameter tuning. Since we are talking about parameters, here are the parameters of the iOS hardware encoder, for later reference.

| Configuration variable | Meaning |
| --- | --- |
| kVTCompressionPropertyKey_RealTime | Real-time encoding, on or off. When on, the encoder prioritizes speed over image quality. Blocking (mosaic) artifacts may appear at low and medium bit rates, but encoding is somewhat faster than when off. Commonly used for live streaming, where low latency is required. When off, encoding is slower but image quality is better, which alleviates blocking at low and medium bit rates. Commonly used for on-device transcoding. |
| kVTCompressionPropertyKey_AllowFrameReordering | Whether B-frames are allowed. Enabled means allowed; otherwise B-frames are turned off |
| kVTCompressionPropertyKey_AllowTemporalCompression | Whether inter-frame compression is allowed. The default is yes, so P and B frames can be encoded. Otherwise, only I-frames are encoded |
| kVTCompressionPropertyKey_MaxKeyFrameInterval | Roughly the GOP length, that is, the maximum number of frames from one I-frame to the next |
| kVTCompressionPropertyKey_AllowOpenGOP | Whether open GOPs are allowed, that is, whether frames may reference across an I-frame boundary during encoding |
| kVTCompressionPropertyKey_NumberOfPendingFrames | The number of frames currently pending in the session. VideoToolbox documents this property as read-only; to limit how many frames the encoder may hold before it must produce output, use kVTCompressionPropertyKey_MaxFrameDelayCount |
| kVTCompressionPropertyKey_ProfileLevel | The profile and level of the session. Typical values are kVTProfileLevel_H264_Baseline_AutoLevel, kVTProfileLevel_H264_Main_AutoLevel, kVTProfileLevel_H264_High_AutoLevel and kVTProfileLevel_HEVC_Main_AutoLevel |
| kVTCompressionPropertyKey_ExpectedFrameRate | The expected frame rate. It is only a hint; the actual output follows the dts/pts supplied with each frame |
| kVTEncodeFrameOptionKey_ForceKeyFrame | Forces the frame to be encoded as an I-frame. This is a per-frame option passed with the frame, not a session property |
| kVTCompressionPropertyKey_AverageBitRate | The target average bit rate |
These are the configuration options currently in use. Here is some simple example code:

// Set a CFBooleanRef property
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);

// Set a numeric property
VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(value));
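To get a feel for what values of MaxKeyFrameInterval and AverageBitRate mean in practice, the arithmetic can be sketched in plain C. The helper names below (gop_seconds, avg_bytes_per_frame) are hypothetical, and the sketch assumes a constant frame rate.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between key frames for a given key frame interval (in frames)
 * and a constant frame rate. The name is illustrative only. */
static double gop_seconds(int32_t max_key_frame_interval, int32_t fps) {
    assert(fps > 0);
    return (double)max_key_frame_interval / fps;
}

/* Average encoded bytes per frame for a given average bit rate,
 * expressed in bits per second. Integer division rounds down. */
static int64_t avg_bytes_per_frame(int64_t average_bitrate_bps, int32_t fps) {
    assert(fps > 0);
    return average_bitrate_bps / 8 / fps;
}
```

For example, an interval of 60 frames at 30 fps means a key frame every 2 seconds, and a bit rate of 2048000 bits per second at 30 fps leaves about 8533 bytes per frame on average, compared with 3686400 bytes for a raw 720p BGRA frame.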

Start encoding

CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
CMTime duration = CMSampleBufferGetDuration(sampleBuffer);
CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
VTEncodeInfoFlags infoFlags = 0;
// Parameter 1: the encoder session.
// Parameter 2: the pixel buffer to encode.
// Parameter 3: the pts of the frame. It affects image quality and frame rate, and is carried through to the sample buffer delivered to the callback.
// Parameter 4: the duration of the frame.
// Parameter 5: per-frame properties, for example to force a key frame.
// Parameter 6: a caller-defined pointer passed through to the callback.
// Parameter 7: receives info flags that report whether the frame was dropped or is being encoded asynchronously.
OSStatus status = VTCompressionSessionEncodeFrame(_encoderSession, pixelBuffer, pts, duration, nil, sourceFrameRef, &infoFlags);

// On failure, check whether the current session has become invalid.
// When the app goes to the background, iOS invalidates all sessions. The encoder then has to be destroyed and recreated.
if (status != noErr) {
    if (status == kVTInvalidSessionErr) {
        [self destroySession];
        [self createSession:self.config];
    }
    return kLWVideoEncodeStatus_Err_Encode;
}

In this way, the CMSampleBufferRef obtained from capture can be used directly. After encoding, the corresponding output arrives in compressSessionOutputCallback. At this point, the hardware encoding pipeline runs end to end.

Reset encoder

This step is used often, typically to change the bit rate on the fly, as the project shows:

// Dynamically update the encoding bit rate.
// The property expects bits per second, so the input is presumably in kbit/s.
- (void)setBitrate:(int)bitrate {
    bitrate = bitrate * 1024;
    OSStatus status = VTSessionSetProperty(_encoderSession, kVTCompressionPropertyKey_AverageBitRate, (__bridge CFTypeRef)@(bitrate));
    if (status != noErr) {
        NSLog(@"%s:%d error with %d", __func__, __LINE__, (int)status);
    }
}

Beyond that, the properties of the hardware encoder can only be changed by destroying and recreating the encoder.

Destroy encoder

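- (void)destroySession {
    if (_encoderSession) {
        // Flush pending frames, then invalidate and release the session.
        VTCompressionSessionCompleteFrames(_encoderSession, kCMTimeInvalid);
        VTCompressionSessionInvalidate(_encoderSession);
        CFRelease(_encoderSession);
        _encoderSession = nil;
    }
    NSLog(@"%s:%d", __func__, __LINE__);
}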


That covers the essentials of iOS hardware encoding. In the next installment, we will explain how to extract the output data and convert it into an AVC or HEVC stream. Interested readers can also look into the material ahead of time.

The related code can be found here

Keywords: iOS

Added by rtsanderson on Fri, 03 Sep 2021 07:37:49 +0300