
Some Basic Differences Between VBR and CBR in MP3 Files, Part 3

In a VBR-encoded MP3 file the bit rate is not fixed from frame to frame, so the data size of each frame varies.

As a result, the amount of data played back per second also varies. The total duration of the audio therefore cannot be calculated with the formula above, and additional data fields are needed. This is one of the shortcomings of VBR: calculating the duration of the audio is comparatively difficult and complicated.
VBR has a second disadvantage. When playing an audio file, the player will inevitably need to jump to a specified time and play from there (the so-called seek operation). To do this it must convert the target time into a position in the file, then jump to that file offset to read and decode. In streaming playback (playing while downloading), a seek must first compute the file position, then jump to that position and download a segment before playback can continue. For CBR encoding, converting a time to a file offset is very simple and uses the following formula:
file position (bytes) = target time (s) * bitrate (kbps) * 1000 / 8 + ID3v2 tag size (if any)
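As a minimal sketch, the formula can be written as a small Python function. The function and parameter names are illustrative assumptions, not part of any library, and the result is not aligned to a frame boundary.

```python
def cbr_seek_offset(target_seconds: float, bitrate_kbps: int, id3v2_size: int = 0) -> int:
    """Byte offset for a target time in a CBR file, following the formula above.

    id3v2_size is the size in bytes of the ID3v2 tag at the start of the
    file, or 0 if the file has none.
    """
    audio_bytes = int(target_seconds * bitrate_kbps * 1000 / 8)
    return audio_bytes + id3v2_size


# Seeking to 60 s in a 128 kbps file with a 2048-byte ID3v2 tag.
print(cbr_seek_offset(60, 128, 2048))  # 962048
```

At 128 kbps each second of audio occupies 16000 bytes, so 60 seconds is 960000 bytes, plus the tag size.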
For VBR encoding, this formula clearly cannot be used to convert a time into a file position. The reason is simple: the bit rate of each frame is not fixed, so the amount of data per second is not uniform. Just as with the duration calculation, additional data fields are needed.
Calculating the audio duration and implementing seek for VBR encoding
To solve these two problems, VBR-encoded files carry some additional data fields. There are currently two main specifications for these fields: the Xing specification, proposed by the Xing company, and the VBRI specification used by the Fraunhofer encoder. This article only covers how the Xing specification supports the duration calculation and the seek operation.
The core of the Xing specification is the Xing header. The first audio frame of a VBR-encoded MP3 does not store actual audio data; instead it stores additional information about the audio. This information begins with the four characters "Xing" (some files use the four characters "Info" instead to mark the start of the Xing header).
The Xing header sits in the first audio frame, after the standard 4-byte MP3 frame header. Between the frame header and the Xing header there is a gap filled entirely with zeros, and the length of this gap is fixed. After parsing the frame header of the first audio frame, the decoder skips a gap of that length and then checks whether the next four characters are "Xing" or "Info" to decide whether the audio is VBR-encoded.
The length of the gap is determined by the MPEG version and the number of channels, as shown in the following table (in bytes):

| MPEG version | Mono | Non-mono |
| --- | --- | --- |
| MPEG 1 | 17 | 32 (most common) |
| MPEG 2 | 9 | 17 |
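The lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete parser: the function name is hypothetical, the frame is assumed to have no CRC after the frame header, and the gap lengths used are the Layer III side-information sizes (17 or 32 bytes for MPEG 1, 9 or 17 bytes for MPEG 2, mono or non-mono respectively).

```python
# Gap in bytes between the 4-byte frame header and the Xing/Info tag,
# keyed by (is_mpeg1, is_mono). Assumed values: Layer III side-info sizes.
GAP_BYTES = {
    (True, False): 32,   # MPEG 1, non-mono (most common)
    (True, True): 17,    # MPEG 1, mono
    (False, False): 17,  # MPEG 2, non-mono
    (False, True): 9,    # MPEG 2, mono
}


def find_vbr_tag(frame: bytes):
    """Return (tag, offset) if the first frame carries a Xing/Info tag, else None."""
    # A frame header starts with 11 set bits (the frame sync).
    if len(frame) < 4 or frame[0] != 0xFF or (frame[1] & 0xE0) != 0xE0:
        return None
    version_bits = (frame[1] >> 3) & 0x03  # 0b11 means MPEG 1
    channel_bits = (frame[3] >> 6) & 0x03  # 0b11 means mono
    is_mpeg1 = version_bits == 0b11        # anything else uses the MPEG 2 gaps
    is_mono = channel_bits == 0b11
    offset = 4 + GAP_BYTES[(is_mpeg1, is_mono)]
    tag = frame[offset:offset + 4]
    if tag in (b"Xing", b"Info"):
        return tag.decode("ascii"), offset
    return None


# A synthetic MPEG 1 stereo frame: header, 32 zero bytes, then the tag.
demo = bytes([0xFF, 0xFB, 0x90, 0x00]) + bytes(32) + b"Xing" + bytes(8)
print(find_vbr_tag(demo))
```

If the tag is not found at the expected offset, the function returns None, and a player would typically treat the file as CBR or look for a VBRI header instead.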
The following figure is an example of the field structure of the first VBR-encoded mp3 data frame:
The field structure and the content of the information stored in the Xing header are as follows:
| Offset (from the 'Xing' tag) | Length (bytes) | Description | Example |
| --- | --- | --- | --- |
| 0 | 4 | VBR header tag: 4 ASCII characters, either 'Xing' or 'Info' | 'Xing' |
| 4 | 4 | Flags indicating which fields follow in the VBR header, combined with logical OR. This field is mandatory. 0x0001: total frame count field is present; 0x0002: file length field is present (the length excludes tags); 0x0004: TOC index field is present; 0x0008: quality indicator field is present | 0x0007 (total frame count, file length, and TOC are all present) |
| 8 | 4 | Total number of frames, stored as a big-endian value | 7344 |
| 8 or 12 | 4 | File length in bytes, stored as a big-endian value | 45000 |
| 8, 12, or 16 | 100 | TOC: a 100-byte array used as a position index for fast addressing within the file, mainly to implement the seek operation | |
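To make the field layout concrete, here is a hedged Python sketch that parses these fields and uses them for the two tasks discussed in this article. All function names are hypothetical. Two conventions are assumed rather than taken from the table: the duration is computed as frame count times samples per frame divided by sample rate (1152 samples per frame for MPEG 1 Layer III, 576 for MPEG 2 Layer III), and TOC entry i is read as the file position at i percent of the duration, scaled so that 256 corresponds to the full file length. The seek uses the nearest lower TOC entry without interpolation.

```python
import struct


def parse_xing(buf: bytes) -> dict:
    """Parse a Xing/Info header; buf must start at the 4-byte tag."""
    tag = buf[0:4]
    if tag not in (b"Xing", b"Info"):
        raise ValueError("not a Xing/Info header")
    flags = struct.unpack_from(">I", buf, 4)[0]
    info = {"tag": tag.decode("ascii"), "flags": flags}
    pos = 8  # optional fields follow the flags, in this order
    if flags & 0x0001:  # total number of frames
        info["frames"] = struct.unpack_from(">I", buf, pos)[0]
        pos += 4
    if flags & 0x0002:  # file length in bytes
        info["bytes"] = struct.unpack_from(">I", buf, pos)[0]
        pos += 4
    if flags & 0x0004:  # 100-entry TOC
        info["toc"] = list(buf[pos:pos + 100])
        pos += 100
    return info


def vbr_duration_seconds(frames: int, sample_rate: int, samples_per_frame: int = 1152) -> float:
    """Duration from the frame count (assumed convention, see lead-in)."""
    return frames * samples_per_frame / sample_rate


def vbr_seek_offset(info: dict, target_seconds: float, duration_seconds: float,
                    audio_start: int = 0) -> int:
    """Approximate byte offset for a target time using the TOC.

    audio_start is the offset of the first frame, e.g. the ID3v2 tag size.
    """
    fraction = min(max(target_seconds / duration_seconds, 0.0), 1.0)
    index = min(int(fraction * 100), 99)
    return audio_start + info["toc"][index] * info["bytes"] // 256


# Usage with the example values from the table and a synthetic linear TOC.
toc = bytes(i * 256 // 100 for i in range(100))
header = b"Xing" + struct.pack(">III", 0x0007, 7344, 45000) + toc
info = parse_xing(header)
duration = vbr_duration_seconds(info["frames"], 44100)
print(info["frames"], info["bytes"], round(duration, 2))
print(vbr_seek_offset(info, duration / 2, duration))
```

Because each optional field is present only when its flag is set, the offsets of later fields shift, which is why the table lists "8 or 12" and "8, 12, or 16".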





