
MP4 Box Introduction

A box consists of two parts: the box header and the box body.
box header: the box metadata, such as the type and size of the box.
box body: the data part of the box. The actual content stored depends on the box type; for example, the body of mdat stores the media data.
In the box header, only type and size are required fields. When size==1, a largesize field follows. Some boxes also carry version and flags fields; these boxes are called Full Boxes. When other boxes are nested in the body of a box, that box is called a container box.
box header
The fields are defined as follows:
type: the box type, either a predefined type or a custom extension type, occupying 4 bytes;
Predefined types: such as ftyp, moov, and mdat;
Custom extension type: if type==uuid, the box is a custom extension type. The size and type fields (and largesize, if present) are followed by 16 bytes holding the value of the custom type (extended_type);
size: the size of the entire box, including the box header, in bytes, occupying 4 bytes. When the size is 0 or 1, special handling is required:
size equal to 1: the size of the box is given by the subsequent largesize field (generally only the mdat box, which carries the media data, uses largesize);
size equal to 0: the current box is the last box in the file and extends to the end of the file; this is usually an mdat box;
largesize: the size of the box, occupying 8 bytes;
extended_type: the custom extension type, occupying 16 bytes;
The Box pseudocode is as follows:
aligned(8) class Box (unsigned int(32) boxtype, optional unsigned int(8)[16] extended_type) {
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) {
        unsigned int(64) largesize;
    } else if (size==0) {
        // box extends to end of file
    }
    if (boxtype=='uuid') {
        unsigned int(8)[16] usertype = extended_type;
    }
}
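As a rough illustration of the header layout above, here is a minimal Python sketch that reads one box header from a byte buffer. The function name parse_box_header and its return shape are hypothetical choices for this example, and the buffer is treated as if it were the whole file when size==0.

```python
import struct

def parse_box_header(buf, offset=0):
    """Return (box_type, box_size, header_size) for the box at offset.

    box_size covers the header plus the body. Hypothetical helper,
    not part of any library.
    """
    # size and type are both 32-bit, big-endian
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    header_size = 8
    if size == 1:
        # the real size is in the 64-bit largesize field after type
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_size = 16
    elif size == 0:
        # box extends to the end of the file (assumed: end of buf)
        size = len(buf) - offset
    if box_type == b"uuid":
        # 16-byte extended_type follows size/type/largesize
        header_size += 16
    return box_type.decode("latin-1"), size, header_size

# A hand-built ftyp box: 8-byte header + 8-byte body = 16 bytes.
ftyp = struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 512)
print(parse_box_header(ftyp))
```

The body of the box then starts at offset + header_size and runs for box_size - header_size bytes.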
box body
The data body of the box. Different boxes contain different content, so you need to refer to the definition of the specific box. Some box bodies are very simple, like ftyp. Others are more complex and may contain nested boxes, like moov.
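To make the nesting concrete, the following sketch walks a buffer and descends into container boxes. The names walk_boxes, make_box, and CONTAINER_TYPES are hypothetical, the container set is only an illustrative subset, and the sample data is hand-built rather than taken from a real file.

```python
import struct

# Illustrative subset of boxes whose body is a sequence of child boxes.
CONTAINER_TYPES = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

def walk_boxes(buf, start=0, end=None, depth=0):
    """Yield (depth, type, size) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            if pos + 16 > end:
                break  # truncated largesize field
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:
            size = end - pos  # box runs to the end of the enclosing range
        if size < header:
            break  # malformed box; stop rather than loop forever
        yield depth, box_type.decode("latin-1"), size
        if box_type in CONTAINER_TYPES:
            # the body of a container is itself a run of boxes
            yield from walk_boxes(buf, pos + header, pos + size, depth + 1)
        pos += size

def make_box(box_type, body=b""):
    """Build a box with a 32-bit size field (test data only)."""
    return struct.pack(">I4s", 8 + len(body), box_type) + body

data = (make_box(b"ftyp", b"isom\x00\x00\x02\x00")
        + make_box(b"moov", make_box(b"trak", make_box(b"tkhd", b"\x00" * 4))))
for depth, name, size in walk_boxes(data):
    print("  " * depth + f"{name} ({size} bytes)")
```

Leaf boxes such as ftyp or tkhd are reported but not descended into, since their bodies hold fields rather than child boxes.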
Box vs Full Box
FullBox extends Box. Compared to Box, FullBox adds the version and flags fields.
version: the version of the current box format, reserved for future extension, occupying 1 byte;
flags: flag bits, occupying 24 bits (3 bytes); their meaning is defined by the specific box itself;
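Since version and flags together fill exactly 4 bytes, they can be read as one 32-bit big-endian word and split with a shift and a mask. The sketch below assumes the input starts right after the size/type (and largesize/extended_type, if present) fields; parse_fullbox_fields is a hypothetical name.

```python
import struct

def parse_fullbox_fields(payload):
    """Split the first 4 bytes after the box header into (version, flags)."""
    (word,) = struct.unpack_from(">I", payload, 0)
    version = word >> 24       # top 8 bits
    flags = word & 0xFFFFFF    # low 24 bits
    return version, flags

# version = 1, flags = 0x000007
print(parse_fullbox_fields(bytes([1, 0, 0, 7])))
```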






