[Video and Audio Data Processing] H.264 video stream analysis

0. Basic concepts

0.1 The location of the video stream in the video player is as follows:


The raw H.264 bitstream (also known as the "naked stream") is composed of NALUs, one after another. Their structure is shown in the figure below.

0.2 More precisely, each NALU in the raw stream is composed of:

[start code] + [NALU header] + [NALU payload]

[start code] takes up 3 or 4 bytes, which is 0x000001 or 0x00000001.
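
As a minimal sketch of the rule above, the helper below checks whether a byte buffer begins with a 3-byte or a 4-byte start code. The name start_code_len is my own choice for illustration and is not part of the parser in section 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 3 or 4 if buf begins with a start code
 * of that length, or 0 if it does not begin with a start code. */
static int start_code_len(const unsigned char *buf, size_t size) {
    assert(buf != NULL);
    if (size >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1)
        return 3;   /* 0x000001 */
    if (size >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1)
        return 4;   /* 0x00000001 */
    return 0;
}
```

Unlike the file-based functions in section 1, this version takes the buffer size, so it never reads past the end of a short buffer.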

And [NALU header] is composed of the following:

forbidden_zero_bit(1bit) + nal_ref_idc(2bit) + nal_unit_type(5bit)
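
The bit layout above can be sketched in a few lines of C. The struct NaluHeader and the two function names are hypothetical and only serve this illustration. Note that this sketch shifts each field down to its plain value, whereas the parser in section 1 stores the masked bits and shifts nal_reference_idc later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct for illustration; the parser in section 1 uses NALU_t. */
typedef struct {
    unsigned forbidden_zero_bit; /* 1 bit  */
    unsigned nal_ref_idc;        /* 2 bits */
    unsigned nal_unit_type;      /* 5 bits */
} NaluHeader;

/* Split the one-byte NALU header into its three fields. */
static NaluHeader parse_nalu_header(uint8_t b) {
    NaluHeader h;
    h.forbidden_zero_bit = (b >> 7) & 0x01u;
    h.nal_ref_idc        = (b >> 5) & 0x03u;
    h.nal_unit_type      = b & 0x1fu;
    return h;
}

/* Build a header byte from the fields (forbidden_zero_bit is left at 0). */
static uint8_t make_nalu_header(unsigned nal_ref_idc, unsigned nal_unit_type) {
    assert(nal_ref_idc <= 3 && nal_unit_type <= 31);
    return (uint8_t)((nal_ref_idc << 5) | nal_unit_type);
}
```

For example, the byte 0x67 (binary 0110 0111) splits into forbidden_zero_bit 0, nal_ref_idc 3 and nal_unit_type 7, which is an SPS.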

0.3 NALU type

0.3.1 forbidden_zero_bit:

The forbidden bit is initialized to 0. When the network detects a bit error in the NAL unit, it can set this bit to 1
so that the receiver can correct or discard the unit.

0.3.2 nal_ref_idc:

nal_ref_idc indicates the importance of the NAL unit. The larger the value, the more important the unit is. When the decoder cannot cope,
it can discard NALUs with an importance of 0 (a value of 0 means the unit is not used as a reference).

0.3.3 nal_unit_type:

The syntax table of NALU is as follows:

Generally, the first two NALUs of an H.264 stream are the SPS and the PPS, and the third is an IDR. SPS, PPS and SEI are three kinds of NALU that do not belong to the frame category. The relevant terms are defined as follows:

SPS (Sequence Parameter Set): a set of sequence parameters, which applies to a series of consecutive coded pictures.
PPS (Picture Parameter Set): a set of picture parameters, which applies to one or more individual pictures in the coded video sequence.
SEI (Supplemental Enhancement Information): additional information such as picture timing. It is generally placed before the primary coded picture data and can be omitted in some applications.
IDR (Instantaneous Decoding Refresh): instantaneous decoding refresh.
HRD (Hypothetical Reference Decoder): hypothetical reference decoder.
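
To make the "frame category" distinction concrete, here is a small sketch that assumes the nal_unit_type values used by the enum in section 1, where types 1 to 5 (slices, data partitions and IDR) carry coded picture data and the remaining types, such as SEI, SPS and PPS, do not. The function name nalu_is_frame_data is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the type carries coded slice data
 * (types 1..5), 0 for parameter sets, SEI and other non-frame NALUs. */
static int nalu_is_frame_data(int nal_unit_type) {
    assert(nal_unit_type >= 0 && nal_unit_type <= 31); /* 5-bit field */
    return nal_unit_type >= 1 && nal_unit_type <= 5;
}
```

With this helper, an SPS (7), a PPS (8) and an SEI (6) are all reported as non-frame NALUs, while an IDR (5) is reported as frame data.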

1. Code

As before, this section studies the code by leixiaohua1020 (see the reference links), to which I have added more detailed comments.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
	NALU_TYPE_SLICE = 1,
	NALU_TYPE_DPA = 2,
	NALU_TYPE_DPB = 3,
	NALU_TYPE_DPC = 4,
	NALU_TYPE_IDR = 5,
	NALU_TYPE_SEI = 6,
	NALU_TYPE_SPS = 7,
	NALU_TYPE_PPS = 8,
	NALU_TYPE_AUD = 9,
	NALU_TYPE_EOSEQ = 10,
	NALU_TYPE_EOSTREAM = 11,
	NALU_TYPE_FILL = 12,
} NaluType;

typedef enum {
	NALU_PRIORITY_DISPOSABLE = 0,
	NALU_PRIRITY_LOW = 1,
	NALU_PRIORITY_HIGH = 2,
	NALU_PRIORITY_HIGHEST = 3
} NaluPriority;


typedef struct
{
	int startcodeprefix_len;
	//! 4 for parameter sets and first slice in picture, 3 for everything else (suggested)
	//In an H.264 bitstream, the start code is "0x00 0x00 0x01" or "0x00 0x00 0x00 0x01",
	//so startcodeprefix_len is either three or four bytes

	unsigned len;         
	//! Length of the NAL unit (Excluding the start code, which does not belong to the NALU)
	unsigned max_size;            //! Nal Unit Buffer size
	int forbidden_bit;            //! should be always FALSE
	//forbidden_zero_bit is initially 0; the network can set it to 1 when it finds a bit error in the NAL unit,
	//so that the receiver can correct or discard the unit
	int nal_reference_idc;        //! NALU_PRIORITY_xxxx
	//nal_reference_idc indicates the importance of the NAL unit. The higher the value, the more important it is.
	//When the decoder cannot cope, it can discard NALUs with an importance of 0.
	int nal_unit_type;            //! NALU_TYPE_xxxx    
	char* buf;                    //! contains the first byte followed by the EBSP
} NALU_t;

//h264bitstream is a global file pointer
FILE* h264bitstream = NULL;                //!< the bit stream file

int info2 = 0, info3 = 0;


//In an H.264 bitstream, the start code is "0x00 0x00 0x01" or "0x00 0x00 0x00 0x01"

//The first three bytes of data read out is 0x00 0x00 0x01, return true, otherwise return false
static int FindStartCode2(unsigned char* Buf) {
	if (Buf[0] != 0 || Buf[1] != 0 || Buf[2] != 1) return 0; //0x000001?
	else return 1;
}

//The first four bytes of data read out is 0x00 0x00 0x00 0x01, return true, otherwise return false
static int FindStartCode3(unsigned char* Buf) {
	if (Buf[0] != 0 || Buf[1] != 0 || Buf[2] != 0 || Buf[3] != 1) return 0;//0x00000001
	else return 1;
}


int GetAnnexbNALU(NALU_t* nalu) {
	int pos = 0;
	int StartCodeFound, rewind;
	unsigned char* Buf;

	//Allocate a temporary buffer of nalu->max_size bytes (100000 in this program)
	if ((Buf = (unsigned char*)calloc(nalu->max_size, sizeof(char))) == NULL) {
		printf("GetAnnexbNALU: Could not allocate Buf memory\n");
		return 0; //Without a buffer we cannot continue
	}

	//Default initialization is 3
	nalu->startcodeprefix_len = 3;

	//Read the first three bytes. If they cannot be read, the data is abnormal or the file has ended, so return directly.
	if (3 != fread(Buf, 1, 3, h264bitstream)) {
		free(Buf);
		return 0;
	}
	info2 = FindStartCode2(Buf);
	if (info2 != 1) { // The first three bytes are not 0x00 0x00 0x01, so check for a four-byte start code
		if (1 != fread(Buf + 3, 1, 1, h264bitstream)) { //Read one more byte
			free(Buf);
			return 0; //The fourth byte could not be read
		}
		info3 = FindStartCode3(Buf);
		//When info3 is 1, the first four bytes are 0x00 0x00 0x00 0x01
		if (info3 != 1) {
			free(Buf); //Neither start code matches, so the data is abnormal. Return directly
			return -1;
		}
		else {
			pos = 4;
			nalu->startcodeprefix_len = 4;//The code can go here to show that the start code is four bytes
		}
	}
	else {
		nalu->startcodeprefix_len = 3;//Otherwise, the start code is three bytes
		pos = 3;
	}
	StartCodeFound = 0;
	info2 = 0;
	info3 = 0;
	while (!StartCodeFound) {
		if (feof(h264bitstream)) {
			nalu->len = (pos - 1) - nalu->startcodeprefix_len;
			memcpy(nalu->buf, &Buf[nalu->startcodeprefix_len], nalu->len);
			nalu->forbidden_bit = nalu->buf[0] & 0x80; //1 bit
			nalu->nal_reference_idc = nalu->buf[0] & 0x60; // 2 bit
			nalu->nal_unit_type = (nalu->buf[0]) & 0x1f;// 5 bit
			free(Buf);
			return pos - 1;
		}

		Buf[pos++] = fgetc(h264bitstream);
		info3 = FindStartCode3(&Buf[pos - 4]);
		if (info3 != 1)
			info2 = FindStartCode2(&Buf[pos - 3]);

		StartCodeFound = (info2 == 1 || info3 == 1);
	}
	// Here, we have found another start code 
	//and read length of startcode bytes more than we should
	// have.  Hence, go back in the file
	rewind = (info3 == 1) ? -4 : -3;

	//Start code is 4 bytes, backward 4 positions, start code is 3 bytes, backward 3 positions

	if (0 != fseek(h264bitstream, rewind, SEEK_CUR)) {
		//Move the file pointer h264bitstream back by the length of the next start code, relative to the current position (SEEK_CUR).
		//fseek returns 0 on success and a non-zero value on failure, for example when the target position lies before the start of the file
		printf("GetAnnexbNALU: Cannot fseek in the bit stream file\n");
		free(Buf);
		return -1; //Buf has been freed, so it must not be used below
	}

	// Here the Start code, the complete NALU, and the next start code is in the Buf.  
	// The size of Buf is pos, pos+rewind are the number of bytes excluding the next
	// start code, and (pos+rewind)-startcodeprefix_len is the size of the NALU 
	// excluding the    start code

	nalu->len = (pos + rewind) - nalu->startcodeprefix_len;
	memcpy(nalu->buf, &Buf[nalu->startcodeprefix_len], nalu->len);
	//Copy the NALU data after the start code to the buff

	nalu->forbidden_bit = nalu->buf[0] & 0x80; //1 bit
	nalu->nal_reference_idc = nalu->buf[0] & 0x60; // 2 bit
	nalu->nal_unit_type = (nalu->buf[0]) & 0x1f;// 5 bit
	free(Buf);

	return (pos + rewind);
}

/**
 * Analysis H.264 Bitstream
 * @param url    Location of input H.264 bitstream file.
 */
int simplest_h264_parser(const char* url) {

	NALU_t* n;
	int buffersize = 100000;

	//FILE *myout=fopen("output_log.txt","wb+");
	FILE* myout = stdout;

	h264bitstream = fopen(url, "rb"); //Read-only binary mode is enough for parsing
	//h264bitstream is the global file pointer, initialized to NULL
	if (h264bitstream == NULL) {
		printf("Open file error\n");
		return 0;
	}

	n = (NALU_t*)calloc(1, sizeof(NALU_t));
	//Allocate one zero-initialized block of sizeof(NALU_t) bytes and assign its address to n
	if (n == NULL) {
		printf("Alloc NALU Error\n");
		return 0;
	}

	//Nal unit buffer size 100000
	n->max_size = buffersize;

	// buf is a 100000 byte buffer
	n->buf = (char*)calloc(buffersize, sizeof(char));

	// The following code avoids null pointer
	if (n->buf == NULL) {
		free(n);
		printf("AllocNALU: n->buf");
		return 0;
	}

	int data_offset = 0;
	int nal_num = 0;
	printf("-----+-------- NALU Table ------+---------+\n");
	printf(" NUM |    POS  |    IDC |  TYPE |   LEN   |\n");
	printf("-----+---------+--------+-------+---------+\n");

	while (!feof(h264bitstream))
	{
		int data_lenth;
		data_lenth = GetAnnexbNALU(n);

		char type_str[20] = { 0 };
		switch (n->nal_unit_type) {
		case NALU_TYPE_SLICE:sprintf(type_str, "SLICE"); break;
		case NALU_TYPE_DPA:sprintf(type_str, "DPA"); break;
		case NALU_TYPE_DPB:sprintf(type_str, "DPB"); break;
		case NALU_TYPE_DPC:sprintf(type_str, "DPC"); break;
		case NALU_TYPE_IDR:sprintf(type_str, "IDR"); break;
		case NALU_TYPE_SEI:sprintf(type_str, "SEI"); break;
		case NALU_TYPE_SPS:sprintf(type_str, "SPS"); break;
		case NALU_TYPE_PPS:sprintf(type_str, "PPS"); break;
		case NALU_TYPE_AUD:sprintf(type_str, "AUD"); break;
		case NALU_TYPE_EOSEQ:sprintf(type_str, "EOSEQ"); break;
		case NALU_TYPE_EOSTREAM:sprintf(type_str, "EOSTREAM"); break;
		case NALU_TYPE_FILL:sprintf(type_str, "FILL"); break;
		}
		char idc_str[20] = { 0 };
		switch (n->nal_reference_idc >> 5) {
			// 0x60 corresponds to 0110 0000, and this shift 5 bit to the right is nal_reference_idc value, value range is 0-3
		case NALU_PRIORITY_DISPOSABLE:sprintf(idc_str, "DISPOS"); break;
		case NALU_PRIRITY_LOW:sprintf(idc_str, "LOW"); break;
		case NALU_PRIORITY_HIGH:sprintf(idc_str, "HIGH"); break;
		case NALU_PRIORITY_HIGHEST:sprintf(idc_str, "HIGHEST"); break;
		}

		fprintf(myout, "%5d| %8d| %7s| %6s| %8d|\n", nal_num, data_offset, idc_str, 
		type_str, 
		n->len);
		//nal_num is the index of the NALU; data_offset is the byte offset of this NALU in the file, which keeps increasing.
		//idc_str indicates the importance of the NAL unit. The higher the value, the more important it is.
		//When the decoder cannot cope, it can discard NALUs with an importance of 0.
		//type_str is the type of the NALU
		//n->len is set inside GetAnnexbNALU (nalu->len). It is the number of bytes occupied by the NALU.
		//Note that this does not include the start code


		data_offset = data_offset + data_lenth;

		nal_num++;
	}

	//Free the NALU buffer and close the input file
	if (n) {
		if (n->buf) {
			free(n->buf);
			n->buf = NULL;
		}
		free(n);
	}
	fclose(h264bitstream);
	h264bitstream = NULL;
	return 0;
}




int main()
{
	simplest_h264_parser("sintel.h264");
	return 0;
}

The code compiles successfully with Visual Studio 2019.

2. Key explanation

while (!StartCodeFound) {
	if (feof(h264bitstream)) {
		nalu->len = (pos - 1) - nalu->startcodeprefix_len;
		memcpy(nalu->buf, &Buf[nalu->startcodeprefix_len], nalu->len);
		nalu->forbidden_bit = nalu->buf[0] & 0x80; //1 bit
		nalu->nal_reference_idc = nalu->buf[0] & 0x60; // 2 bit
		nalu->nal_unit_type = (nalu->buf[0]) & 0x1f;// 5 bit
		free(Buf);
		return pos - 1;
	}

	Buf[pos++] = fgetc(h264bitstream); 
	info3 = FindStartCode3(&Buf[pos - 4]);
	if (info3 != 1)
		info2 = FindStartCode2(&Buf[pos - 3]);

	StartCodeFound = (info2 == 1 || info3 == 1);
}

This code is the key point, and it took me a while to understand. The assignments to nalu->forbidden_bit, nalu->nal_reference_idc and nalu->nal_unit_type need little explanation: the NALU header information is stored in the one byte right after the start code.

if (feof(h264bitstream)) {

This branch is taken only once the end of the file has been reached. It handles the last NALU, which has no start code after it, so the code that extracts the NALU header information and copies the data, already written for the general case, is written once more here.
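
The pos - 1 in this branch follows from how feof works in C: the end-of-file flag is set only after a read has failed. The loop has therefore already stored one extra byte (the EOF value returned by fgetc) and incremented pos before the branch is taken. The sketch below reproduces this off-by-one with a temporary file created by tmpfile(). The function name count_fgetc_calls is my own.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical demo: write n bytes to a temporary file, then count how many
 * times fgetc is called in a loop guarded by feof, the same pattern the
 * parser uses. */
static int count_fgetc_calls(int n) {
    FILE *fp = tmpfile();
    int calls = 0;
    assert(fp != NULL);
    for (int i = 0; i < n; i++)
        fputc(0xAB, fp);
    rewind(fp);             /* back to the start for reading */
    while (!feof(fp)) {
        (void)fgetc(fp);    /* the last call returns EOF and sets the flag */
        calls++;
    }
    fclose(fp);
    return calls;           /* n data bytes plus 1 failed read */
}
```

For a file of 3 bytes the function returns 4, which is why the parser subtracts 1 from pos when it reaches the end of the file.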

Buf[pos++] = fgetc(h264bitstream); 
info3 = FindStartCode3(&Buf[pos - 4]);
if (info3 != 1)
	info2 = FindStartCode2(&Buf[pos - 3]);

StartCodeFound = (info2 == 1 || info3 == 1);

These lines are the key points. In this example the start code is four bytes long, as the earlier analysis shows (see my comments in the code for details),
so pos is 4 just before this code runs.

Buf[0], Buf[1], Buf[2] and Buf[3] already hold the four bytes of the start code, read into Buf through the h264bitstream file pointer. Buf[pos++] = fgetc(h264bitstream); then reads the next byte into Buf[4].

This byte is the NALU header.

After that statement pos is 5, so &Buf[pos - 4] points to Buf[1]. The four bytes starting there are 0x00 0x00 0x01 followed by the header byte, which cannot be the start code

0x00 0x00 0x00 0x01. The three bytes starting at &Buf[pos - 3] cannot be 0x00 0x00 0x01 either, so the exit condition of the while (!StartCodeFound) { loop is not met.

A further worked example

// Here the Start code, the complete NALU, and the next start code is in the Buf.
// The size of Buf is pos, pos+rewind are the number of bytes excluding the next
// start code, and (pos+rewind)-startcodeprefix_len is the size of the NALU excluding the start code

Let's assume that the first four bytes are the start code and that the NALU after it is 98 bytes long, so it occupies Buf[4] to Buf[101]. Before entering the while loop,
pos is 4, and Buf[0], Buf[1], Buf[2] and Buf[3] already hold the four bytes of the start code.

info3 = FindStartCode3(&Buf[pos - 4]);

For info3 to become 1, the loop must reach the start code in front of the second NALU, that is, &Buf[pos - 4] must be &Buf[102].
By then Buf[pos++] = fgetc(h264bitstream) has read Buf[102], Buf[103],
Buf[104] and Buf[105], and pos is 106. Note that Buf[102] is the first byte after the first start code and the first NALU.
These four bytes are the next start code.

So, as the code comment says, pos is the current length of the data in Buf (106), and pos+rewind is the number of bytes excluding the next start code
(106 - 4 = 102; rewind is -4 because that start code is four bytes long).

pos+rewind-startcodeprefix_len is the length of the NALU without its own start code: 102 - 4 = 98.
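
The arithmetic can be checked with a small sketch that runs the same scan over a memory buffer instead of a file. It assumes a 4-byte start code in Buf[0] to Buf[3], NALU data in Buf[4] to Buf[101] and the next start code in Buf[102] to Buf[105]. The function scan_nalu_len is a hypothetical stand-in for the loop in GetAnnexbNALU and only handles the case where a following start code exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char SC3[3] = {0, 0, 1};
static const unsigned char SC4[4] = {0, 0, 0, 1};

/* Hypothetical stand-in for the scan loop of GetAnnexbNALU, reading from
 * memory. stream must begin with a start code of prefix_len bytes and must
 * contain a following start code. Returns the NALU length without the start
 * code; *pos_out and *rewind_out receive the values the loop ends with. */
static int scan_nalu_len(const unsigned char *stream, size_t size,
                         int prefix_len, int *pos_out, int *rewind_out) {
    size_t pos = (size_t)prefix_len;  /* bytes consumed so far */
    int found3 = 0, found4 = 0;
    assert(stream != NULL && (prefix_len == 3 || prefix_len == 4));
    while (!found3 && !found4 && pos < size) {
        pos++;                        /* "read" one more byte */
        found4 = memcmp(&stream[pos - 4], SC4, 4) == 0;
        if (!found4)
            found3 = memcmp(&stream[pos - 3], SC3, 3) == 0;
    }
    assert(found3 || found4);         /* this sketch skips the end-of-file case */
    *rewind_out = found4 ? -4 : -3;
    *pos_out = (int)pos;
    return (int)pos + *rewind_out - prefix_len;
}
```

For the 106-byte layout described above, the scan stops with pos equal to 106 and rewind equal to -4, and the function returns 106 - 4 - 4 = 98.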

3. Reference link

Thank you for reading. The reference links for this article are as follows:

  1. https://blog.csdn.net/leixiaohua1020/article/details/50534369
  2. https://www.jianshu.com/p/5ec31394649a
  3. https://github.com/leixiaohua1020/simplest_mediadata_test


Added by Termina on Wed, 24 Jun 2020 10:42:32 +0300