A record of debugging an intermittent MP4 recording problem

Preface

LG found an intermittent recording problem that had to be fixed before release. This is a record of how it was tracked down.

The symptom: recorded videos occasionally cannot be played, and a given broken video plays on some phones but not on others. Very strange.

Approach

The file fails to play through MediaPlayer, including on a Pixel 3 running an Android 11 build I compiled myself. That means I can debug on that device and find the error. The error should point to the specific piece of MP4 data whose parsing fails. From there, read the code that generates that data, understand its logic, and work out what went wrong. With that plan in mind, I got started.

1. Locate which piece of data in the MP4 is broken

An MP4 contains a lot of data, so checking it piece by piece is impractical. Instead, locate the problem from the error log.

a. View the adb error log

$ adb logcat *:E 

02-24 17:59:30.569   352   352 E Utils   : did not find width and/or height
02-24 17:59:30.570   352   352 E Utils   : did not find width and/or height
02-24 17:59:30.583   354  1906 E Utils   : b/23680780
02-24 17:59:30.584  1878  1896 E MediaPlayerNative: error (1, -22)
02-24 17:59:30.634  1878  1878 E MediaPlayer: Error (1,-22)

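The relevant error is "E Utils : b/23680780". (The -22 in the MediaPlayer error is -EINVAL, which Android uses as the status code BAD_VALUE.)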

Looking up process ID 354 shows that it is the mediaserver process:

$ adb shell ps | grep 354
media           354      1   93744  26112 binder_thread_read  0 S mediaserver
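Matching the PID by eye is fine for a handful of lines. For a longer capture (for example one saved with adb logcat -d), a small filter can do the job. The sketch below assumes the default threadtime output format of adb logcat. The function names are illustrative and not part of any Android tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed `adb logcat` threadtime layout:
# MM-DD HH:MM:SS.mmm  PID  TID LEVEL TAG: message
LINE_RE = re.compile(
    r"^(?P<date>\d\d-\d\d)\s+(?P<time>\d\d:\d\d:\d\d\.\d{3})\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEF])\s+"
    r"(?P<tag>.*?)\s*: (?P<msg>.*)$"
)


class LogLine(NamedTuple):
    pid: int
    tid: int
    level: str
    tag: str
    msg: str


def parse_line(line: str) -> Optional[LogLine]:
    """Split one threadtime-format logcat line; None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return LogLine(int(m["pid"]), int(m["tid"]), m["level"], m["tag"], m["msg"])


def pids_logging(lines: Iterable[str], needle: str) -> List[int]:
    """Sorted PIDs of all processes whose message contains `needle`."""
    hits = {p.pid for p in map(parse_line, lines) if p is not None and needle in p.msg}
    return sorted(hits)
```

Run against the log above, pids_logging(lines, "b/23680780") singles out PID 354. The resulting PIDs can then be fed to adb shell ps as shown.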

b. Locate the error in the AOSP source

Searching the AOSP source for b/23680780 leads to /home/Kevin/extraspace/AOSP/frameworks/av/media/libstagefright/Utils.cpp.

This message is logged in several places, so the log alone does not tell which one fired. Just set a breakpoint at every one of them.

Then start debugging. Playing the problematic MP4 stops at the breakpoint in the code that parses the hvcC box.

c. Print the hvcC in the MP4 with gdb

You could also inspect the hvcC with an MP4 analysis tool instead of gdb. Since I was already in a debugger, gdb could dump it directly. I do use MP4 analysis tools a lot, though. On Linux I recommend MediaParser. Its original source had a problem compiling with Qt6, which I fixed, so if you need it, download and compile it yourself.
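You can also pull the hvcC straight out of the file instead of a debugger by walking the ISO base media file format box tree. Below is a minimal sketch, independent of MediaParser. It assumes the video track uses an hvc1 or hev1 sample entry under the usual moov/trak/mdia/minf/stbl/stsd nesting, and it reads the whole file into memory. The function names are illustrative.

```python
import struct
from typing import Iterator, Optional, Tuple

# Boxes that only contain other boxes on the path to the sample description.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}
# Bytes between the end of an hvc1/hev1 box header and its first child box:
# SampleEntry (8) + fixed VisualSampleEntry fields (70).
VISUAL_SAMPLE_ENTRY_BODY = 78


def iter_boxes(buf: bytes, start: int, end: int) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (type, payload_start, box_end) for each box in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", buf[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit largesize follows the type
            size = struct.unpack(">Q", buf[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"bad box size {size} at offset {pos}")
        yield box_type, pos + header, pos + size
        pos += size


def find_hvcc(buf: bytes, start: int = 0, end: Optional[int] = None) -> Optional[bytes]:
    """Return the payload of the first hvcC box found, or None."""
    end = len(buf) if end is None else end
    for box_type, body, box_end in iter_boxes(buf, start, end):
        if box_type == b"hvcC":
            return buf[body:box_end]
        if box_type in CONTAINERS:
            found = find_hvcc(buf, body, box_end)
        elif box_type == b"stsd":
            # FullBox: version+flags (4) and entry_count (4) precede the entries.
            found = find_hvcc(buf, body + 8, box_end)
        elif box_type in (b"hvc1", b"hev1"):
            found = find_hvcc(buf, body + VISUAL_SAMPLE_ENTRY_BODY, box_end)
        else:
            found = None
        if found is not None:
            return found
    return None
```

For example, find_hvcc(open(path, "rb").read()) returns the same bytes that gdb prints below, without a debugger.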

All right, back to the point. For convenience in what follows:

good denotes the hvcC of a normal MP4, and bad denotes the hvcC of a problematic MP4.

Both are printed below for comparison.

(gdb) x/110x data
good hvcC

0xee100380:	0x01	0x01	0x40	0x00	0x00	0x00	0x80	0x00
0xee100388:	0x00	0x00	0x00	0x00	0x7b	0xf0	0x00	0xfc
														(numOfArrays==3)
0xee100390:	0xfd	0xf8	0xf8	0x00	0x00	0x0f	0x03	0x20
0xee100398:	0x00	0x01	0x00	0x17	0x40	0x01	0x0c	0x01

bad hvcC

0xea580e10:	0x01	0x00	0x01	0x03	0x00	0x00	0x00	0x18
0xea580e18:	0x00	0x10	0x00	0x00	0x2d	0x00	0x00	0x00
														(numOfArrays==255 overflow)
0xea580e20:	0xff	0xff	0xff	0xff	0xff	0xff	0xff	0x20
0xea580e28:	0x00	0x01	0x00	0x18	0x40	0x01	0x0c	0x01

bad2 hvcC (from another problematic MP4 file reproduced later)
0xe6d40c10:	0x01	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xe6d40c18:	0x00	0x00	0x00	0x00	0x00	0x49	0x00	0xe7
														(numOfArrays==0)
0xe6d40c20:	0xff	0xa4	0x00	0x00	0x00	0x00	0x00	0x20
0xe6d40c28:	0x00	0x01	0x00	0x18	0x40	0x01	0x0c	0x01
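To compare dumps like these in a script instead of by eye, gdb's output can be turned back into bytes. This sketch assumes byte-sized units, as printed here (for example x/110xb). Lines that do not start with an address, such as the hand-written annotations, are skipped. The function name is illustrative.

```python
import re

ADDR_RE = re.compile(r"^\s*0x[0-9a-fA-F]+:\s*(.*)$")  # "0xADDR:  ..." lines only
BYTE_RE = re.compile(r"0x([0-9a-fA-F]{2})\b")          # one 0xNN byte value


def parse_gdb_bytes(dump: str) -> bytes:
    """Turn byte-format gdb `x` output into bytes, ignoring annotation lines."""
    out = bytearray()
    for line in dump.splitlines():
        m = ADDR_RE.match(line)
        if m is None:
            continue  # labels and hand-written annotations
        out.extend(int(h, 16) for h in BYTE_RE.findall(m.group(1)))
    return bytes(out)
```

Feeding in the first three lines of the good dump gives 24 bytes, with numOfArrays (byte offset 22) equal to 3.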

d. Parse the hvcC according to ISO/IEC 14496-15

The comparison confirms that the hvcC of the bad MP4 is indeed broken.

This is the corresponding structure in the ISO specification:

aligned(8) class HEVCDecoderConfigurationRecord { 
    unsigned int(8) configurationVersion = 1; 

    //1 byte (2nd byte)
    unsigned int(2) general_profile_space; 
    unsigned int(1) general_tier_flag; 
    unsigned int(5) general_profile_idc; 

    //4 bytes (3rd to 6th bytes)
    unsigned int(32) general_profile_compatibility_flags; 

    //6 bytes (7th to 12th bytes)
    unsigned int(48) general_constraint_indicator_flags; 

    //(13th byte)
    unsigned int(8) general_level_idc; 

    //(14th to 15th bytes)
    bit(4) reserved = '1111'b; 
    unsigned int(12) min_spatial_segmentation_idc;

    //(16th byte)
    bit(6) reserved = '111111'b; 
    unsigned int(2) parallelismType; 

    //(17th byte)
    bit(6) reserved = '111111'b; 
    unsigned int(2) chroma_format_idc; 

    //(18th byte)
    bit(5) reserved = '11111'b; 
    unsigned int(3) bit_depth_luma_minus8; 

    //(19th byte)
    bit(5) reserved = '11111'b; 
    unsigned int(3) bit_depth_chroma_minus8; 

    //(20th to 21st bytes)
    bit(16) avgFrameRate; 

    //(22nd byte)
    bit(2) constantFrameRate; 
    bit(3) numTemporalLayers; 
    bit(1) temporalIdNested; 
    unsigned int(2) lengthSizeMinusOne; 

    //(23rd byte)
    unsigned int(8) numOfArrays; 

    for (j=0; j < numOfArrays; j++) { 
        //1 byte
        bit(1) array_completeness; 
        unsigned int(1) reserved = 0; 
        unsigned int(6) NAL_unit_type; 

        //2 bytes
        unsigned int(16) numNalus; 

        for (i=0; i< numNalus; i++) { 
            //2 bytes
            unsigned int(16) nalUnitLength; 

            bit(8*nalUnitLength) nalUnit; 
        }
    } 
}
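The bit layout above maps directly onto shifts and masks. Below is a minimal decoder for the fixed 23-byte head, keeping the ISO field names as dictionary keys. The function name is illustrative, and byte offsets are 0-based (offset 22 is the "23rd byte" in the comments above).

```python
from typing import Dict


def parse_hvcc_header(rec: bytes) -> Dict[str, int]:
    """Decode the fixed 23-byte head of an HEVCDecoderConfigurationRecord."""
    if len(rec) < 23:
        raise ValueError("hvcC shorter than 23 bytes")
    b = rec
    return {
        "configurationVersion": b[0],
        "general_profile_space": b[1] >> 6,
        "general_tier_flag": (b[1] >> 5) & 0x01,
        "general_profile_idc": b[1] & 0x1F,
        "general_profile_compatibility_flags": int.from_bytes(b[2:6], "big"),
        "general_constraint_indicator_flags": int.from_bytes(b[6:12], "big"),
        "general_level_idc": b[12],
        # The low 12 bits follow 4 reserved bits.
        "min_spatial_segmentation_idc": int.from_bytes(b[13:15], "big") & 0x0FFF,
        "parallelismType": b[15] & 0x03,
        "chroma_format_idc": b[16] & 0x03,
        "bit_depth_luma_minus8": b[17] & 0x07,
        "bit_depth_chroma_minus8": b[18] & 0x07,
        "avgFrameRate": int.from_bytes(b[19:21], "big"),
        "constantFrameRate": b[21] >> 6,
        "numTemporalLayers": (b[21] >> 3) & 0x07,
        "temporalIdNested": (b[21] >> 2) & 0x01,
        "lengthSizeMinusOne": b[21] & 0x03,
        "numOfArrays": b[22],
    }
```

For the good record this yields general_profile_idc 1 (Main), general_level_idc 123 (level 4.1, since level_idc is 30 times the level number), chroma_format_idc 1 (4:2:0) and numOfArrays 3. For bad it yields general_profile_idc 0, general_level_idc 45 and numOfArrays 255.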

Parsing the hvcC against this definition shows that in bad, the fields from general_profile_space through numOfArrays are abnormal.
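The reserved bits give a cheap sanity check on that region, because ISO/IEC 14496-15 requires them all to be 1 (the '1111'b, '111111'b and '11111'b values in the structure above). A minimal sketch using 0-based byte offsets. The table and function names are illustrative.

```python
from typing import List, Tuple

# (0-based byte offset, mask of the reserved bits, field that follows them)
RESERVED_BITS = [
    (13, 0xF0, "min_spatial_segmentation_idc"),
    (15, 0xFC, "parallelismType"),
    (16, 0xFC, "chroma_format_idc"),
    (17, 0xF8, "bit_depth_luma_minus8"),
    (18, 0xF8, "bit_depth_chroma_minus8"),
]


def reserved_bit_violations(rec: bytes) -> List[Tuple[int, str]]:
    """Return (offset, following field) for each reserved run not set to all ones."""
    if len(rec) < 23:
        raise ValueError("hvcC shorter than 23 bytes")
    return [(off, field) for off, mask, field in RESERVED_BITS
            if (rec[off] & mask) != mask]
```

good passes. bad fails at offsets 13 and 15, and bad2 fails at offsets 13, 15, 17 and 18. So neither broken header can have been written by a conforming muxer from valid values.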

numOfArrays should be 3 (one array each for VPS, SPS and PPS), but in bad it is 255 (0xff).
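Why does 255 turn into error (1, -22)? A parser that trusts numOfArrays reads the three arrays that are really present and then runs out of data. Below is a simplified model of that walk. It is modeled loosely on the kind of size checks that log b/23680780 in Utils.cpp, not copied from them. Accepting configurationVersion 0 as well as 1 is an assumption. Only BAD_VALUE == -EINVAL (-22) is a known Android value.

```python
OK = 0
BAD_VALUE = -22  # Android's status_t BAD_VALUE is -EINVAL


def check_hvcc(rec: bytes) -> int:
    """Walk an hvcC payload with bounds checks; return OK or BAD_VALUE."""
    # Fixed 23-byte head; the accepted versions are an assumption.
    if len(rec) < 23 or rec[0] not in (0, 1):
        return BAD_VALUE
    num_arrays = rec[22]
    pos = 23
    for _ in range(num_arrays):
        # Each array starts with 1 byte of type/flags and a 16-bit numNalus.
        if len(rec) - pos < 3:
            return BAD_VALUE
        num_nalus = int.from_bytes(rec[pos + 1:pos + 3], "big")
        pos += 3
        for _ in range(num_nalus):
            if len(rec) - pos < 2:  # room for nalUnitLength?
                return BAD_VALUE
            length = int.from_bytes(rec[pos:pos + 2], "big")
            pos += 2
            if len(rec) - pos < length:  # room for the NAL unit itself?
                return BAD_VALUE
            pos += length
    return OK
```

With numOfArrays == 255 the fourth iteration finds no bytes left and returns BAD_VALUE. A record like bad2 (numOfArrays == 0) passes these purely structural checks; it simply carries no parameter sets.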

good

0xee101350:	0x01	0x01	0x40	0x00	0x00	0x00	0x80	0x00
0xee101358:	0x00	0x00	0x00	0x00	0x7b	0xf0	0x00	0xfc
													  (numOfArrays)(32,vps)
0xee101360:	0xfd	0xf8	0xf8	0x00	0x00	0x0f	0x03	0x20
           (numNalus == 1) (nalUnitLength==23)(nalUnit
0xee101368:	0x00	0x01	0x00	0x17	0x40	0x01	0x0c	0x01
0xee101370:	0xff	0xff	0x01	0x40	0x00	0x00	0x03	0x00
0xee101378:	0x80	0x00	0x00	0x03	0x00	0x00	0x03	0x00
								)(33,sps)  (numNalus == 1)(nalUnitLength==33)
0xee101380:	0x7b	0xac	0x09	0x21	0x00	0x01	0x00	0x21
			(nalUnit
0xee101388:	0x42	0x01	0x01	0x01	0x40	0x00	0x00	0x03
0xee101390:	0x00	0x80	0x00	0x00	0x03	0x00	0x00	0x03
0xee101398:	0x00	0x7b	0xa0	0x02	0x80	0x80	0x2d	0x16
0xee1013a0:	0x5a	0xe4	0xb2	0xb6	0x6b	0x95	0x44	0xd8
				) (34,pps)(numNalus == 1)(nalUnitLength==8)(nalUnit
0xee1013a8:	0x02	0x22	0x00	0x01	0x00	0x3d	0x44	0x01
														)
0xee1013b0:	0xc0	0xe3	0x0f	0x03	0x32	0x40


bad
				  (Abnormal data
0xea580a90:	0x01	0x00	0x01	0x03	0x00	0x00	0x00	0x18
0xea580a98:	0x00	0x10	0x00	0x00	0x2d	0x00	0x00	0x00
														 (numOfArrays))(32,vps)
0xea580aa0:	0xff	0xff	0xff	0xff	0xff	0xff	0xff	0x20
		   (numNalus==1)(nalUnitLength==24)(nalUnit
0xea580aa8:	0x00	0x01	0x00	0x18	0x40	0x01	0x0c	0x01
0xea580ab0:	0xff	0xff	0x01	0x60	0x00	0x00	0x03	0x00
0xea580ab8:	0x00	0x03	0x00	0x00	0x03	0x00	0x00	0x03
										)  (33,sps)(numNalus == 1)(nalUnitLength==41
0xea580ac0:	0x00	0x96	0xac	0x09	0x21	0x00	0x01	0x00
				) (nalUnit
0xea580ac8:	0x29	0x42	0x01	0x01	0x01	0x60	0x00	0x00
0xea580ad0:	0x03	0x00	0x00	0x03	0x00	0x00	0x03	0x00
0xea580ad8:	0x00	0x03	0x00	0x96	0xa0	0x05	0x02	0x01
0xea580ae0:	0x69	0x63	0x6b	0x92	0x4c	0x9a	0xe5	0x9c
0xea580ae8:	0x02	0x00	0x00	0x07	0xd2	0x00	0x00	0x9c
						) (34,pps) (numNalus==1)(nalUnitLength==7)(nalUnit
0xea580af0:	0x68	0x10	0x22	0x00	0x01	0x00	0x07	0x44
														)
0xea580af8:	0x01	0xe0	0x76	0xb0	0x26	0x40
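The (32,vps), (33,sps) and (34,pps) annotations can be cross-checked. Each array's NAL_unit_type should equal the nal_unit_type in the 2-byte HEVC NAL unit header (ISO/IEC 23008-2) at the start of its nalUnit. That is why every VPS above starts with 0x40 0x01, every SPS with 0x42 0x01 and every PPS with 0x44 0x01. A small sketch of the check follows; the function names are illustrative.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    nuh_temporal_id_plus1: int


def parse_nal_header(nal: bytes) -> NalHeader:
    """Decode the 2-byte HEVC NAL unit header: 1 + 6 + 6 + 3 bits."""
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    v = int.from_bytes(nal[:2], "big")
    return NalHeader(v >> 15, (v >> 9) & 0x3F, (v >> 3) & 0x3F, v & 0x07)


def array_type_matches(array_byte: int, nal: bytes) -> bool:
    """True if an hvcC array's NAL_unit_type (low 6 bits) matches its NAL."""
    return (array_byte & 0x3F) == parse_nal_header(nal).nal_unit_type
```

As far as this check goes, the arrays in bad are consistent (0x20/0x40, 0x21/0x42, 0x22/0x44), so the damage is confined to the fixed header.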

2. Locate the problem in the code

Locating the problem in the code took a long time. I won't walk through the source analysis here, since it has little reference value for other projects. The process can be summarized as follows:

  1. Read the relevant code to understand how numOfArrays is generated and written into the MP4. No problem found.

  2. Suspected bad input data. Had the testers reproduce the issue and capture the raw data, but an MP4 generated from that data played fine. (It took the testers almost two days to reproduce it. Tough.)

  3. Suspected a memory-related error. (This did not occur to me at first; the idea came gradually while reading the source, until I concluded it might be the cause.) Typical candidates:

    1. Reading uninitialized variables
    2. Reading/writing through wild or dangling pointers
    3. Bad pointer type conversions
    4. Reading/writing past the end of an allocated block (out-of-bounds array access and the like)
    5. Mismatched malloc/new/new[] and free/delete/delete[]
    6. etc.
  4. So I first tried valgrind. It reported reads of uninitialized variables exactly where the hvcC fields are assigned!!!

    The variable highlighted in the valgrind output and one more variable (not captured) were never initialized. (The last highlighted variable is actually also wrong, but it is not what caused this problem.)

  5. This had to be it, but which value triggers the failure? After carefully reading the code and verifying with tests: the problem occurs when the uint8 m_spscount ends up as 255 (0xff).
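The direct fix is to initialize the counter. A complementary defensive pattern, sketched here as an assumption rather than what the original recorder does, is to derive numOfArrays and numNalus from the parameter sets actually collected, and to range-check every count and length before it is narrowed to 8 or 16 bits. The function name and its inputs (the 22 header bytes preceding numOfArrays, and a dict from NAL type to NAL units) are hypothetical. array_completeness defaults to 1 here, although the good file above writes 0.

```python
from typing import Dict, List


def build_hvcc(head22: bytes, param_sets: Dict[int, List[bytes]],
               array_completeness: int = 1) -> bytes:
    """Assemble an hvcC from its first 22 bytes plus parameter-set arrays.

    numOfArrays and numNalus are computed from the lists themselves, so no
    separately stored counter can disagree with what is actually written.
    """
    if len(head22) != 22:
        raise ValueError("expected the 22 bytes that precede numOfArrays")
    if array_completeness not in (0, 1):
        raise ValueError("array_completeness is a single bit")
    arrays = [(t, nals) for t, nals in sorted(param_sets.items()) if nals]
    if len(arrays) > 0xFF:
        raise ValueError("numOfArrays does not fit in 8 bits")
    out = bytearray(head22)
    out.append(len(arrays))  # numOfArrays, derived from the data
    for nal_type, nals in arrays:
        if not 0 <= nal_type <= 0x3F or len(nals) > 0xFFFF:
            raise ValueError(f"bad NAL type {nal_type} or NAL count")
        out.append((array_completeness << 7) | nal_type)  # reserved bit = 0
        out += len(nals).to_bytes(2, "big")  # numNalus
        for nal in nals:
            if len(nal) > 0xFFFF:
                raise ValueError("NAL unit too long for a 16-bit length")
            out += len(nal).to_bytes(2, "big")  # nalUnitLength
            out += nal
    return bytes(out)
```

Empty lists are dropped rather than written as zero-length arrays, so numOfArrays always equals the number of arrays that follow.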

Summary

This took a long time, mainly because reading an uninitialized variable did not occur to me at first; I lacked experience with this kind of bug. In the end, though, the investigation found the real cause.

Combining this case with my past experience of intermittent problems: an intermittent problem is caused either by some special input, or by the state of the process at the moment it occurs (that is, the variables the code depends on). That state makes the program behave differently from how the code appears to read.

In general, when you hit a problem, keep eliminating possible causes one by one, and the real cause will not be far away.

Added by le007 on Thu, 24 Feb 2022 17:03:09 +0200