A practical way to obtain MP4 file information (duration, etc.) and an exploration of the MP4 storage structure

1, Getting to know MP4

The MP4 file format is mainly defined by MPEG-4 Part 12 and MPEG-4 Part 14. MPEG-4 Part 12 defines the ISO base media file format, which stores time-based media content. MPEG-4 Part 14 defines the MP4 file format itself, as an extension of MPEG-4 Part 12.

Composition of MP4

An MP4 file is composed of multiple boxes arranged in a tree structure. Different boxes store different information.

Main box types:

  • 1. ftyp: File Type Box, which describes the MP4 specification and version that the file complies with;
  • 2. moov: Movie Box, which holds the metadata of the media. There is only one;
  • 3. mdat: Media Data Box, which stores the actual media data. A file may contain more than one;

Introduction to the box

  • 1. A box is mainly composed of two parts: the box header and the box body.
    • box header: metadata of the box, such as the box type and box size.
    • box body: the data part of the box. What is actually stored depends on the box type; for example, the body of mdat stores media data.
  • 2. In the box header, only type and size are required fields. When size==1, the largesize field is present. Some boxes also carry version and flags fields; such boxes are called full boxes. A box whose body contains other nested boxes is called a container box.


The fields are defined as follows:

  • type: the box type, 4 bytes; either a "predefined type" or a "user-defined extension type";
    • Predefined types: ftyp, moov, mdat and so on;
    • User-defined extension type: if type==uuid, the box has a user-defined extension type, and the 16 bytes that follow size (or largesize) hold the extended_type value;
  • size: the size of the entire box including the box header, in bytes, stored in 4 bytes. The values 0 and 1 need special handling:
    • size equals 0: the current box is the last box in the file and extends to the end of the file (this is usually an mdat box);
    • size equals 1: the actual size of the box is stored in the largesize field that follows (generally, only an mdat box holding a large amount of media data uses largesize);
  • largesize: the size of the box as a 64-bit value, 8 bytes;
  • extended_type: the user-defined extension type, 16 bytes;
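
The header rules above can be sketched in Go roughly as follows. This is a minimal illustration, not code from any library: the names boxHeader and parseBoxHeader are made up for this example, and it follows the ISO base media file format convention that size == 1 means "read largesize" while size == 0 means "the box runs to the end of the file".

```go
package main

import (
	"encoding/binary"
	"errors"
)

// boxHeader is a hypothetical struct for this sketch, not a library type.
type boxHeader struct {
	Type         string   // four-character box type, e.g. "moov"
	Size         uint64   // total box size in bytes; 0 means "runs to end of file"
	HeaderLen    int      // number of bytes occupied by the header itself
	ExtendedType [16]byte // filled in only when Type == "uuid"
}

// parseBoxHeader decodes a box header from the start of data.
func parseBoxHeader(data []byte) (boxHeader, error) {
	var h boxHeader
	if len(data) < 8 {
		return h, errors.New("need at least 8 bytes for size and type")
	}
	h.Size = uint64(binary.BigEndian.Uint32(data[0:4]))
	h.Type = string(data[4:8])
	h.HeaderLen = 8
	if h.Size == 1 {
		// size == 1: the real size is in the 8-byte largesize field.
		if len(data) < 16 {
			return h, errors.New("need 16 bytes to read largesize")
		}
		h.Size = binary.BigEndian.Uint64(data[8:16])
		h.HeaderLen = 16
	}
	if h.Type == "uuid" {
		// The 16-byte extended_type follows size (or largesize).
		if len(data) < h.HeaderLen+16 {
			return h, errors.New("need 16 more bytes for extended_type")
		}
		copy(h.ExtendedType[:], data[h.HeaderLen:h.HeaderLen+16])
		h.HeaderLen += 16
	}
	return h, nil
}
```

For example, the 8 bytes 00 00 00 18 'f' 't' 'y' 'p' decode to a 24-byte ftyp box with an 8-byte header, so its body is the 16 bytes that follow.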

The Movie Box (moov) stores the metadata of the MP4 file. It is often located near the beginning of the file, but it can also come after mdat, at the end of the file.

The two most important boxes in moov are mvhd and trak:

  • mvhd: Movie Header Box, the overall information of the MP4 file, such as creation time, file duration, etc;
  • trak: Track Box. An MP4 file can contain one or more tracks (such as a video track and an audio track), and the information about each track is stored in its trak box. trak is a container box that includes at least two boxes, tkhd and mdia;
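
Because container boxes simply hold other boxes in their body, descending the tree is the same scan repeated one level down. The following is a rough sketch of that idea; findBox and makeBox are hypothetical helper names invented for this example, and only the compact 32-bit size form is handled.

```go
package main

import "encoding/binary"

// findBox scans the sibling boxes in data and returns the body (the bytes
// after the 8-byte header) of the first box whose type matches boxType.
// It returns nil when no such box exists. largesize and size == 0 are left
// out to keep the sketch short.
func findBox(data []byte, boxType string) []byte {
	for len(data) >= 8 {
		size := int(binary.BigEndian.Uint32(data[0:4]))
		if size < 8 || size > len(data) {
			return nil // malformed or unsupported size
		}
		if string(data[4:8]) == boxType {
			return data[8:size]
		}
		data = data[size:]
	}
	return nil
}

// makeBox builds sample data for experiments: it prepends a box header
// (size and type) to body.
func makeBox(boxType string, body []byte) []byte {
	box := make([]byte, 8, 8+len(body))
	binary.BigEndian.PutUint32(box[0:4], uint32(8+len(body)))
	copy(box[4:8], boxType)
	return append(box, body...)
}
```

With the whole file in memory as fileBytes (an assumed variable), findBox(findBox(fileBytes, "moov"), "mvhd") would return the body of mvhd, and findBox(findBox(fileBytes, "moov"), "trak") the body of the first track.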

The fields of mvhd have the following meanings:

  • creation_time: file creation time;
  • modification_time: file modification time;
  • timescale: the number of time units in one second (integer). For example, if timescale is 1000, one second contains 1000 time units. Durations of the movie and its tracks are converted with this value: a duration of 10000 corresponds to 10000 / 1000 = 10s;
  • duration: the duration of the movie (integer), in timescale units. It is derived from the track information in the file and equals the duration of the longest track;
  • rate: the recommended playback rate, a 32-bit fixed-point number. The high 16 bits and the low 16 bits hold the integer part and the fractional part respectively ([16.16]). For example, 0x00010000 represents 1.0, the normal playback speed;
  • volume: the playback volume, a 16-bit fixed-point number. The upper 8 bits and the lower 8 bits hold the integer part and the fractional part respectively ([8.8]). For example, 0x0100 represents 1.0, that is, full volume;
  • matrix: the transformation matrix of the video, which can generally be ignored;
  • next_track_ID: a 32-bit non-zero integer that can generally be ignored when only reading. It gives a track ID to use for the next track added to the movie, and it must be larger than every track ID currently in use. If it cannot be relied on, you need to traverse all tracks to find an available track ID before adding a new one;
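
To make the field layout and the fixed-point conversions concrete, here is a small sketch that decodes the body of an mvhd box. The names movieHeader and parseMvhd are invented for this example. It assumes the standard layout in which version 0 stores the times and the duration as 32-bit values and version 1 stores them as 64-bit values (timescale is 32 bits in both).

```go
package main

import (
	"encoding/binary"
	"errors"
)

// movieHeader is a hypothetical struct for this sketch.
type movieHeader struct {
	Timescale uint32
	Duration  uint64 // in timescale units
	Rate      float64
	Volume    float64
}

// Seconds converts Duration from timescale units to seconds.
func (m movieHeader) Seconds() float64 {
	return float64(m.Duration) / float64(m.Timescale)
}

// parseMvhd decodes the body of an mvhd box, i.e. the bytes after its
// 8-byte size/type header, starting with 1 byte of version and 3 of flags.
func parseMvhd(body []byte) (movieHeader, error) {
	var m movieHeader
	if len(body) < 4 {
		return m, errors.New("mvhd body too short")
	}
	pos := 4 // skip version and flags
	switch body[0] {
	case 0: // creation_time, modification_time, timescale, duration: 4 bytes each
		if len(body) < pos+16+6 {
			return m, errors.New("mvhd version 0 body too short")
		}
		m.Timescale = binary.BigEndian.Uint32(body[pos+8:])
		m.Duration = uint64(binary.BigEndian.Uint32(body[pos+12:]))
		pos += 16
	case 1: // 8-byte times, 4-byte timescale, 8-byte duration
		if len(body) < pos+28+6 {
			return m, errors.New("mvhd version 1 body too short")
		}
		m.Timescale = binary.BigEndian.Uint32(body[pos+16:])
		m.Duration = binary.BigEndian.Uint64(body[pos+20:])
		pos += 28
	default:
		return m, errors.New("unknown mvhd version")
	}
	if m.Timescale == 0 {
		return m, errors.New("timescale is zero")
	}
	// rate is [16.16] fixed point, volume is [8.8] fixed point.
	m.Rate = float64(int32(binary.BigEndian.Uint32(body[pos:]))) / 65536
	m.Volume = float64(int16(binary.BigEndian.Uint16(body[pos+4:]))) / 256
	return m, nil
}
```

With timescale 1000 and duration 10000, Seconds() gives 10, matching the arithmetic in the list above. Keeping the result as a float avoids the truncation that integer division would introduce.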

2, Obtain mp4 duration and meta information through tools

Analyze the video through ffprobe

Implementation code

// Requires imports: errors, os/exec, path/filepath, strings.

// ParseEndTime runs ffprobe on the file and returns its duration.
func ParseEndTime(dir, filename string) (int, error) {
	arg := []string{"-show_format", filepath.Join(dir, filename)}
	cmd := exec.Command("ffprobe", arg...)
	stout, err := cmd.CombinedOutput()
	if err != nil {
		return 0, err
	}
	// ffprobe prints a "duration=<seconds>" line followed by a "size=" line
	stoutSplit := strings.Split(string(stout), "duration=")
	if len(stoutSplit) != 2 {
		return 0, errors.New("no duration found in ffprobe output")
	}
	endTime := strings.TrimSpace(strings.Split(stoutSplit[1], "size")[0])
	// stringToInt is a helper that is not shown here; it converts the
	// duration string to an int
	return stringToInt(endTime)
}
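
As a variation on the string splitting above, the key=value text that ffprobe -show_format prints can be scanned line by line. The sketch below is only an illustration: parseFFprobeDuration is a made-up name, and it assumes the caller passes in ffprobe's standard output as a string.

```go
package main

import (
	"bufio"
	"errors"
	"strconv"
	"strings"
)

// parseFFprobeDuration extracts the "duration=" value (in seconds) from the
// key=value text that "ffprobe -show_format" writes to standard output.
func parseFFprobeDuration(output string) (float64, error) {
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, "duration=") {
			// Fails with an error if the value is not numeric, e.g. "N/A".
			return strconv.ParseFloat(strings.TrimPrefix(line, "duration="), 64)
		}
	}
	return 0, errors.New("no duration= line in ffprobe output")
}
```

Matching whole lines keeps the parser independent of which key happens to follow duration, and returning a float keeps the fractional seconds that ffprobe reports.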


3, Parsing mp4 to obtain key BOX meta information

Obtain the duration by parsing the file format
// Requires imports: bytes, encoding/binary, errors, io, os, path/filepath.

// BoxHeader holds the first 16 bytes of a box: the 32-bit size, the 4-byte
// type, and the 8 bytes that carry largesize when size == 1.
type BoxHeader struct {
	Size       uint32
	FourccType [4]byte
	Size64     uint64
}

// GetVideoTime opens the file and returns the video duration in seconds
func GetVideoTime(dir, files string) (uint32, error) {
	file, err := os.Open(filepath.Join(dir, files))
	if err != nil {
		return 0, err
	}
	defer file.Close()
	return GetMP4Duration(file)
}

// GetMP4Duration gets the video duration, in whole seconds
func GetMP4Duration(reader io.ReaderAt) (lengthOfTime uint32, err error) {
	var info = make([]byte, 0x10)
	var boxHeader BoxHeader
	var offset int64 = 0
	// Walk the top-level boxes until moov is found
	for {
		_, err = reader.ReadAt(info, offset)
		if err != nil {
			return
		}
		boxHeader = getHeaderBoxInfo(info)
		fourccType := getFourccType(boxHeader)
		if fourccType == "moov" {
			break
		}
		// size == 1: the real size is in the 64-bit largesize field
		// (typically a very large mdat)
		if boxHeader.Size == 1 {
			offset += int64(boxHeader.Size64)
			continue
		}
		// size == 0: the box runs to the end of the file, so no moov follows
		if boxHeader.Size == 0 {
			return 0, errors.New("moov box not found")
		}
		offset += int64(boxHeader.Size)
	}
	// Read the start of moov: its 8-byte header followed by the mvhd box
	moovStartBytes := make([]byte, 0x24)
	_, err = reader.ReadAt(moovStartBytes, offset)
	if err != nil {
		return
	}
	// Offsets of timescale and duration from the start of moov, assuming
	// mvhd is the first child box and uses version 0 (32-bit fields)
	timeScaleOffset := 0x1C
	durationOffset := 0x20
	timeScale := binary.BigEndian.Uint32(moovStartBytes[timeScaleOffset : timeScaleOffset+4])
	duration := binary.BigEndian.Uint32(moovStartBytes[durationOffset : durationOffset+4])
	if timeScale == 0 {
		return 0, errors.New("invalid timescale 0 in mvhd")
	}
	lengthOfTime = duration / timeScale
	return
}

// getHeaderBoxInfo decodes the big-endian box header
func getHeaderBoxInfo(data []byte) (boxHeader BoxHeader) {
	buf := bytes.NewBuffer(data)
	binary.Read(buf, binary.BigEndian, &boxHeader)
	return
}

// getFourccType returns the box type as a string
func getFourccType(boxHeader BoxHeader) (fourccType string) {
	fourccType = string(boxHeader.FourccType[:])
	return
}

Keywords: Go s3

Added by sunnypal on Wed, 23 Feb 2022 14:20:26 +0200