MP3 File Structure Analysis Part 2





Sounds in nature are very complex, with extremely intricate waveforms.


To digitize them, we usually use pulse code modulation (PCM). PCM converts a continuously varying analog signal into digital codes in three steps: sampling, quantizing, and encoding.

Decoding:

The reverse of the encoding process.
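The three PCM steps, and decoding as their reverse, can be sketched in Python. This is a minimal illustration, assuming 16-bit signed samples stored little-endian (common choices, but assumptions here); the names `pcm_encode`, `pcm_decode` and `tone` are hypothetical. Decoding cannot undo the rounding done during quantization, so each restored sample differs from the original by at most half a quantization step.

```python
import math
import struct

MAX_LEVEL = 32767  # largest positive value of a signed 16-bit sample (assumed bit depth)


def pcm_encode(signal, sample_rate, n_samples):
    """Sample, quantize and encode `signal` (a function of time returning values in [-1, 1])."""
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                       # 1. sampling: evaluate the signal at t = n / fs
        x = max(-1.0, min(1.0, signal(t)))        # clip to the representable range
        codes.append(round(x * MAX_LEVEL))        # 2. quantizing: nearest integer level
    return struct.pack(f"<{n_samples}h", *codes)  # 3. encoding: little-endian int16 bytes


def pcm_decode(data):
    """Reverse the encoding: unpack int16 codes and scale them back to [-1, 1]."""
    count = len(data) // 2
    return [code / MAX_LEVEL for code in struct.unpack(f"<{count}h", data)]


def tone(t):
    return 0.5 * math.sin(2 * math.pi * 440 * t)  # a 440 Hz test tone at half amplitude


data = pcm_encode(tone, 44100, 100)
restored = pcm_decode(data)
print(len(data))  # 200: 100 samples x 2 bytes each
```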

1.1.2 Brief introduction to MP3
The full name of MP3 is MPEG Audio Layer 3. It is an efficient audio coding scheme that converts audio into much smaller files with the .mp3 extension, using a high compression ratio while essentially preserving the sound quality of the source. MP3 is part of the ISO/MPEG standard.

The ISO/MPEG standard describes audio compression using a high-performance perceptual coding scheme. The standard has been continually refined in pursuit of high quality at a low data rate, and it defines three audio codec schemes, MPEG Layer 1, Layer 2, and Layer 3, which correspond to the MP1, MP2, and MP3 file types respectively.

MPEG (Moving Picture Experts Group) is a working group of moving picture experts under ISO. Its MPEG standards are widely used in multimedia and cover both video and audio. The audio standards include MPEG-1, MPEG-2, MPEG-2 AAC, and MPEG-4. MPEG-1 and MPEG-2 use the same family of Layer 1, 2, and 3 audio codecs, and most MP3 files use the MPEG-1 standard.

MP3 audio compression consists of two parts: encoding and decoding. Encoding converts the original signal into a compressed bitstream, and decoding is the reverse process. MP3 uses a lossy perceptual audio coding algorithm. The human ear perceives sounds from about 20 Hz to 20 kHz, and MP3 discards much of the redundant and perceptually irrelevant signal. The encoder transforms the original sound into the frequency domain through a hybrid filter bank and uses a psychoacoustic model to estimate the noise level that will remain just imperceptible. The spectrum is then quantized accordingly and Huffman coded to form the MP3 bitstream. The decoder is much simpler: it recovers the sound signal from the encoded spectral components through inverse quantization and the inverse transform.

MP3 file data consists of multiple frames, and a frame is the smallest unit of an MP3 file. Each frame in turn consists of a frame header, side information, and sound data. At a 44.1 kHz sampling rate each frame plays for about 0.026 seconds, while the length of a frame in bytes varies with the bit rate. Some MP3 files also have extra bytes at the end containing descriptive, non-audio information.
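The two quantities distinguished above, playback time and byte length per frame, can be computed directly. A sketch assuming MPEG-1 Layer III (1152 samples per frame) and the usual frame-size formula with an optional one-byte padding slot; the helper names are illustrative.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III frames each carry 1152 samples per channel


def frame_duration_s(sample_rate):
    """Playback time of one frame: it depends on the sampling rate only."""
    return SAMPLES_PER_FRAME / sample_rate


def frame_length_bytes(bitrate_bps, sample_rate, padding=0):
    """Frame size in bytes: 1152 samples / 8 bits per byte = 144, hence the constant."""
    return 144 * bitrate_bps // sample_rate + padding


print(round(frame_duration_s(44100), 4))   # 0.0261 seconds
print(frame_length_bytes(128_000, 44100))  # 417 bytes (418 when the padding bit is set)
print(frame_length_bytes(320_000, 44100))  # 1044 bytes: higher bit rate, longer frame
```

At 48 kHz the same frame lasts 0.024 seconds, which is why the 0.026-second figure applies to 44.1 kHz material.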



MP3 file structure analysis


ID3:


ID3 data is usually stored in a few bytes at the beginning or end of an MP3 file and records the artist, title, album, year, genre, and other information about the track.

ID3 comes in two versions. ID3v1 occupies a fixed 128 bytes at the end of the file and begins with the characters "TAG"; if those characters are absent, the file has no ID3v1 tag. ID3v2 is located at the beginning of the file and has a variable length.
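A minimal sketch of how a reader might locate both tags, assuming the standard ID3v1 layout (30-byte title, artist and album, 4-byte year, 30-byte comment, 1-byte genre) and the 10-byte ID3v2 header whose size field is stored in four "syncsafe" bytes with 7 usable bits each. The optional ID3v2 footer and other details are ignored, and the function names are hypothetical.

```python
def read_id3v1(data: bytes):
    """Return the ID3v1 fields from the last 128 bytes, or None if there is no tag."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None  # no "TAG" marker: the file has no ID3v1 tag

    def text(start, length):
        # fields are fixed-width and padded with NUL bytes or spaces
        return tag[start:start + length].split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(3, 30),
        "artist": text(33, 30),
        "album": text(63, 30),
        "year": text(93, 4),
        "comment": text(97, 30),
        "genre": tag[127],  # genre is a one-byte index into a predefined list
    }


def id3v2_total_size(data: bytes):
    """Return the size of a leading ID3v2 tag in bytes (header included), or 0 if absent."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for b in data[6:10]:  # four syncsafe bytes: only the low 7 bits of each are used
        size = (size << 7) | (b & 0x7F)
    return size + 10      # the 10-byte header is not counted in the size field (footer ignored)


print(id3v2_total_size(b"ID3\x04\x00\x00\x00\x00\x02\x01"))  # 267: 257-byte tag body + 10-byte header
```

An MP3 reader would skip `id3v2_total_size(data)` bytes before looking for the first audio frame.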

Sampling rate:

The sampling rate is the number of samples taken per second from a continuous signal to form a discrete signal, expressed in hertz (Hz). In other words, it is the sampling frequency used when converting an analog signal to a digital one, that is, how many points are sampled per unit of time. The higher the sampling rate, the more faithful and natural the sound. Mainstream sound cards generally offer three sampling rates: 22.05 kHz, 44.1 kHz, and 48 kHz. 22.05 kHz gives only roughly FM radio quality, 44.1 kHz is the rate used for CD audio, and 48 kHz is finer still.
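Why these rates are sufficient follows from the Nyquist-Shannon sampling theorem: a signal sampled at fs can only represent frequencies below fs/2. The short sketch below applies that rule to the three levels above; it shows, for example, that 44.1 kHz covers the roughly 20 kHz upper limit of human hearing.

```python
def nyquist_hz(sample_rate):
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate / 2


for fs in (22050, 44100, 48000):
    print(f"{fs / 1000:g} kHz sampling -> frequencies up to {nyquist_hz(fs):g} Hz")
# 22.05 kHz sampling -> frequencies up to 11025 Hz
# 44.1 kHz sampling -> frequencies up to 22050 Hz
# 48 kHz sampling -> frequencies up to 24000 Hz
```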

Bit rate:

Bit rate is the number of bits transmitted per second, measured in bps (bits per second). The higher the bit rate, the more information is transmitted. In audio and video, the bit rate is also called the code rate or data rate: it indicates how many bits are used to represent each second of encoded (compressed) audio or video, a bit (0 or 1) being the smallest unit of binary data. The relationship between bit rate and compression is simple: the higher the bit rate, the better the audio and video quality, but the larger the encoded file; a lower bit rate gives the opposite result.

Bit rate (uncompressed PCM) = sample rate * bits per sample * number of channels
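A worked example of the formula, using CD audio parameters (44.1 kHz, 16 bits per sample, two channels); dividing the result by the rate of a 128 kbps MP3 gives the compression ratio.

```python
def pcm_bit_rate(sample_rate, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate * bits_per_sample * channels


cd_rate = pcm_bit_rate(44100, 16, 2)  # CD audio: 44.1 kHz, 16-bit, stereo
print(cd_rate)            # 1411200 bits per second, about 1411 kbps
print(cd_rate / 128_000)  # 11.025: roughly how much smaller a 128 kbps MP3 stream is
```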

Data stream (bit rate):

The data stream is the amount of data an audio or video file uses per unit of time. In everyday usage it simply means the bit rate, and it is the most important factor for controlling quality in audio and video encoding. Common units are kb/s and Mb/s. Generally speaking, the higher the data rate, the lower the compression ratio and the higher the quality: more data per unit of time means greater accuracy, and the processed file comes closer to the original.
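Because the data rate is data per unit of time, the size of a file follows directly from it. A sketch that ignores headers, tags and any container overhead (an assumption), comparing a four-minute track at 128 kbps with the same track as uncompressed CD-quality PCM:

```python
def stream_size_bytes(bit_rate_bps, seconds):
    """Size of the audio data alone: bits per second x seconds / 8 bits per byte."""
    return bit_rate_bps * seconds // 8


four_minutes = 4 * 60
print(stream_size_bytes(128_000, four_minutes))    # 3840000 bytes (about 3.8 MB) as a 128 kbps MP3
print(stream_size_bytes(1_411_200, four_minutes))  # 42336000 bytes (about 42 MB) as CD-quality PCM
```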

Coding:

From the point of view of information theory, the data describing an information source consists of the information itself plus redundancy, namely: data = information + data redundancy. Audio signals are correlated in both the time domain and the frequency domain, so they contain data redundancy. Treating audio as the source, the essence of audio encoding is to reduce that redundancy.
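The equation data = information + data redundancy can be made concrete with Shannon entropy, which measures the average information per symbol; any bits spent beyond it are redundancy that a lossless coder can remove. This is an illustrative sketch on a made-up sequence of quantized values, not part of any MP3 encoder.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy: average information per symbol, in bits."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def fixed_length_bits(symbols):
    """Bits per symbol needed by a fixed-length code for the distinct symbols present."""
    return max(1, math.ceil(math.log2(len(set(symbols)))))


samples = [0, 0, 0, 0, 1, 1, 2, 3]  # hypothetical quantized audio values
info = entropy_bits(samples)
data = fixed_length_bits(samples)
print(info, data - info)  # 1.75 0.25: bits of information and of redundancy per symbol
```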

MP3 – the most popular digital audio format


Initial release 1986

MPEG-1 Audio Layer 3, better known as MP3, is a lossy compressed digital audio format developed by the Moving Picture Experts Group (MPEG) as part of the MPEG-1 standard (and later extended in MPEG-2). A typical MP3 uses a 44.1 kHz sampling rate and a bit rate of 128 kbps as a compromise between quality and size. Its name is an abbreviation of MPEG-1 Audio Layer 3, and the term should not be confused with MP3 player.

Mp3 – History

This format was developed mainly by Karlheinz Brandenburg, director of electronic media technologies at the Fraunhofer IIS institute, part of the Fraunhofer-Gesellschaft (a network of German research centers), which together with Thomson Multimedia controls the bulk of MP3-related patents. The first of these patents was registered in 1986, and several more followed in 1991. However, it was not until July 1995 that Brandenburg first used the .mp3 extension for the MP3-related files he kept on his computer. A year later, his institute earned 1.2 million euros in patent royalties; ten years later, this amount had reached 26.1 million.

The MP3 format became the standard for streaming audio and for compressing high-quality audio (with perceptible loss on hi-fi equipment) thanks to the ability to adjust the compression quality in proportion to the data per second (bit rate), and therefore the final file size, which could be 12 or even 15 times smaller than the original uncompressed file.

It was the first compressed audio format to become popular through the Internet, because it made exchanging music files practical. The legal proceedings against companies such as Napster and AudioGalaxy resulted from how easily this type of file can be shared.

With the development of standalone and portable players, and of players integrated into music systems (stereos), the MP3 format reached beyond the world of computing.

At the beginning of 2002, other compressed audio formats such as Windows Media Audio and Ogg Vorbis began to be widely included in programs, operating systems, and standalone players, which led many to predict that MP3 would gradually fall into disuse in favor of such formats, which offer much better quality. One factor behind the expected decline of MP3 is that it is patented. Technically, this does not make its quality inferior or superior, but it prevents the community from continuing to improve it and can require paying license fees to use a codec, as happens with MP3 players. Even so, as of late 2009, MP3 remained the most widely used and most successful format.


Mp3 – Technical details

Compared with the other layers of the MPEG-1 and MPEG-2 standards, this layer introduces several differences, among them the so-called hybrid filter bank, which makes its design more complex. Its improved frequency resolution worsens temporal resolution and introduces pre-echo problems, which are predicted and corrected. This layer also makes acceptable audio quality possible at rates as low as 64 kbps.

Mp3 Filter bank

The filter bank used in this layer is the so-called hybrid polyphase/MDCT filter bank. It maps the time domain to the frequency domain in the encoder, and its inverse serves as the reconstruction filter bank in the decoder. The filter bank output samples are quantized and provide variable frequency resolution, 6×32 or 18×32 frequency lines, which match the critical bands at different frequencies much better. With 18 lines per subband, the maximum number of frequency components is 32 × 18 = 576, giving a frequency resolution of 24000/576 = 41.67 Hz (if fs = 48 kHz). With 6 lines per subband, the frequency resolution is lower but the temporal resolution is higher; this mode is applied where pre-echo effects are expected (abrupt transitions from silence to high energy levels).
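The trade-off between the two block types can be checked numerically. A sketch at fs = 48 kHz, as in the text; the time span is counted as the number of new samples per transform block (576 for long blocks, 192 for each short block), a simplification that ignores window overlap.

```python
SUBBANDS = 32  # polyphase filter bank subbands


def line_spacing_hz(sample_rate, lines_per_subband):
    """Width of one frequency line: 0 .. fs/2 split into 32 x lines_per_subband lines."""
    return (sample_rate / 2) / (SUBBANDS * lines_per_subband)


def block_span_ms(sample_rate, lines_per_subband):
    """New samples covered by one transform block, in milliseconds (overlap ignored)."""
    return 1000 * SUBBANDS * lines_per_subband / sample_rate


for lines, name in ((18, "long"), (6, "short")):
    print(name, round(line_spacing_hz(48000, lines), 2), "Hz,", block_span_ms(48000, lines), "ms")
# long 41.67 Hz, 12.0 ms
# short 125.0 Hz, 4.0 ms
```

Short blocks give up frequency resolution (125 Hz lines instead of about 41.67 Hz) in exchange for a three times shorter time span, which limits how far pre-echo can spread.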

The psychoacoustic model

Compression is based on reducing the irrelevant dynamic range, that is, on the inability of the auditory system to detect quantization errors under masking conditions. The standard divides the signal into frequency bands that approximate the critical bands, and then quantizes each subband according to the noise detection threshold within that band. The psychoacoustic model is a modification of the one used in Layer II and uses a method called polynomial prediction. It analyzes the audio signal and calculates how much noise can be introduced as a function of frequency, that is, it calculates the “amount of masking” or masking threshold as a function of frequency.

The encoder uses this information to decide how best to spend the available bits. The standard provides two psychoacoustic models of different complexity: model I is less complex than model II and greatly simplifies the calculations. Studies show that from 256 kbps upward the distortion is imperceptible even to an experienced ear, both in an optimal listening environment and under normal conditions. For an inexperienced or casual listener, 128 kbps or even 96 kbps is enough to sound “good” (unless the listener has high-quality audio equipment, on which the missing bass is very noticeable and the treble takes on a “sizzling” quality). For people who listen to a lot of music or have trained ears, 192 or 256 kbps is enough to sound good. Most of the music circulating on the Internet is encoded at between 128 and 192 kbps.
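How an encoder might spend the available bits using the masking information can be sketched with a classic greedy rule: repeatedly give one more bit to the band whose noise-to-mask ratio (NMR = SMR minus the signal-to-noise ratio the bits buy) is highest. This simplified illustration is closer to the bit allocation of Layers I and II than to the Layer III iteration cycles; it assumes about 6.02 dB of noise reduction per bit, the SMR values are made up, and the budget counts bits per sample rather than bits in the stream.

```python
DB_PER_BIT = 6.02  # each extra bit of resolution lowers quantization noise by about 6 dB


def allocate_bits(smr_db, budget, max_bits=15):
    """Greedy allocation: give one bit at a time to the band whose noise is most audible."""
    bits = [0] * len(smr_db)
    for _ in range(budget):
        # noise-to-mask ratio per band: positive means the noise is still audible
        nmr = [smr - b * DB_PER_BIT for smr, b in zip(smr_db, bits)]
        open_bands = [i for i, b in enumerate(bits) if b < max_bits]
        if not open_bands:
            break
        worst = max(open_bands, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all quantization noise is already masked; keep the remaining budget
        bits[worst] += 1
    return bits


print(allocate_bits([20.0, 5.0, -3.0, 12.0], budget=6))  # [3, 1, 0, 2]
```

The band with an SMR of -3 dB gets nothing: its signal already lies below the masking threshold, so any noise there is inaudible.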

Coding and quantization

The standard distributes bits (or noise) in an iteration cycle consisting of an internal and an external cycle. It examines both the filter bank output samples and the signal-to-mask ratio (SMR) provided by the psychoacoustic model, and adjusts the bit or noise allocation, depending on the scheme used, to satisfy the bit rate and masking requirements simultaneously. These cycles work as follows:

Internal cycle

The internal cycle performs non-uniform quantization, in the manner of a floating-point system (each MDCT spectral value is raised to the 3/4 power). The cycle chooses a quantization step size, and Huffman coding is applied to the quantized data in the next block. If the result needs too many bits, a larger step size is tried. The cycle ends when the Huffman-coded quantized values use no more than the maximum number of bits allowed.
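The internal cycle can be sketched as a search over the step size. In this minimal version the following are assumptions for illustration: each step increment scales the quantizer by 2^(1/4), plain rounding replaces the encoder's exact rounding rule, and the hypothetical `count_bits` function (an Elias-gamma code length) stands in for the standard's Huffman tables; sign bits are ignored.

```python
def quantize(xr, step):
    """Non-uniform quantizer: scale by 2^(-step/4), then raise to the 3/4 power and round."""
    return [round((abs(x) * 2 ** (-step / 4)) ** 0.75) for x in xr]


def count_bits(values):
    """Hypothetical stand-in for the Huffman tables: Elias-gamma length of each value + 1."""
    return sum(2 * (v + 1).bit_length() - 1 for v in values)


def inner_loop(xr, max_bits, max_step=255):
    """Increase the step size until the coded spectrum fits in max_bits."""
    step = 0
    while True:
        ix = quantize(xr, step)
        bits = count_bits(ix)
        if bits <= max_bits or step == max_step:
            return step, ix, bits
        step += 1  # coarser quantization -> smaller values -> fewer bits


spectrum = [1000.0, -250.0, 40.0, 3.0, 0.5, 0.0]  # made-up MDCT values
print(inner_loop(spectrum, max_bits=30))
```

Because a larger step can only shrink the quantized values, the bit count never grows as the step increases, so the first step that fits is also the finest quantization the budget allows.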

External cycle

The external cycle then checks whether any scale factor band has more distortion (noise in the encoded signal) than allowed, comparing each scale factor band with the allowed distortion calculated earlier in the psychoacoustic analysis. Bands with too much noise are amplified and the internal cycle runs again. The external cycle ends when one of the following conditions is met:

No scale factor band has more noise than allowed.
The next iteration would amplify one of the bands more than is allowed.
All bands have been amplified at least once.
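The external cycle and its three stopping conditions can be sketched as follows. This is a simplified, self-contained illustration rather than a reference encoder: it repeats a toy quantizer and a hypothetical `count_bits` stand-in for Huffman coding, assumes each scale factor step amplifies a band by sqrt(2), and takes the allowed noise per band (which a real encoder gets from the psychoacoustic model) as an input. The amplification limit `max_amp` is an arbitrary illustrative value.

```python
import math


def quantize(xr, step):
    """Non-uniform quantizer: scale by 2^(-step/4), raise to the 3/4 power, round."""
    return [round((abs(x) * 2 ** (-step / 4)) ** 0.75) for x in xr]


def dequantize(ix, signs, step):
    """Inverse quantizer: raise to the 4/3 power, rescale, restore the original signs."""
    return [math.copysign(q ** (4 / 3) * 2 ** (step / 4), s) for q, s in zip(ix, signs)]


def count_bits(values):
    """Hypothetical stand-in for Huffman coding (Elias-gamma length of each value + 1)."""
    return sum(2 * (v + 1).bit_length() - 1 for v in values)


def inner_loop(xr, max_bits, max_step=255):
    """Internal cycle: raise the step size until the spectrum fits in max_bits."""
    step = 0
    while True:
        ix = quantize(xr, step)
        if count_bits(ix) <= max_bits or step == max_step:
            return step, ix
        step += 1


def outer_loop(xr, bands, allowed_noise, max_bits, max_amp=4):
    """External cycle: amplify bands whose noise exceeds the allowed level."""
    scalefac = [0] * len(bands)
    while True:
        gain = [1.0] * len(xr)
        for b, (lo, hi) in enumerate(bands):
            for i in range(lo, hi):
                gain[i] = 2 ** (0.5 * scalefac[b])  # assumed: sqrt(2) per scale factor step
        amplified = [x * g for x, g in zip(xr, gain)]
        step, ix = inner_loop(amplified, max_bits)
        recon = dequantize(ix, amplified, step)
        # noise energy per band, measured back in the unamplified domain
        noise = [sum((xr[i] - recon[i] / gain[i]) ** 2 for i in range(lo, hi)) for lo, hi in bands]
        result = (step, list(scalefac), ix, noise)
        too_noisy = [b for b, n in enumerate(noise) if n > allowed_noise[b]]
        if not too_noisy:
            return result  # condition 1: no band has more noise than allowed
        if any(scalefac[b] + 1 > max_amp for b in too_noisy):
            return result  # condition 2: the next iteration would over-amplify a band
        for b in too_noisy:
            scalefac[b] += 1
        if all(s > 0 for s in scalefac):
            return result  # condition 3: all bands have been amplified at least once


spectrum = [900.0, -300.0, 120.0, 40.0, -15.0, 6.0]  # made-up MDCT values
bands = [(0, 3), (3, 6)]                            # two hypothetical scale factor bands
print(outer_loop(spectrum, bands, allowed_noise=[50.0, 5.0], max_bits=60))
```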
Bitstream packaging or formatter

This block takes the quantized samples from the filter bank, together with the bit/noise allocation data, and stores the encoded audio and some additional data in frames. Each frame contains information for 1152 audio samples and consists of a header, the audio data, CRC error checking, and auxiliary data (the last two are optional). The header describes which layer, bit rate, and sampling rate are used for the encoded audio. Every frame starts with the same synchronization word, and frame lengths may vary. This block also applies variable-length Huffman coding, a lossless entropy coding method that removes redundancy; it acts at the end of compression to encode the information. Variable-length methods generally assign short codewords to the most frequent events and long codewords to the least frequent.
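The principle of variable-length coding, short codewords for frequent events, can be illustrated by building a Huffman code from symbol frequencies with Python's `heapq`. Note that MP3 encoders do not build codes like this per file: Layer III selects among fixed Huffman tables defined in the standard. The sketch below, with a hypothetical `huffman_code` function and made-up data, only demonstrates the idea.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get short codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, unique tiebreaker, {symbol: codeword suffix})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


data = [0, 0, 0, 0, 1, 1, 2, 3]  # hypothetical quantized values
code = huffman_code(data)
encoded = "".join(code[s] for s in data)
print(code, len(encoded))  # {0: '0', 1: '10', 2: '110', 3: '111'} 14
```

The eight symbols take 14 bits instead of the 16 a fixed 2-bit code would need.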

Structure of an MP3 file

An MP3 file is made up of a sequence of MP3 frames, each consisting of an MP3 header and MP3 data. This data stream is called an “elementary stream”. The frames are largely independent, so an MP3 file can be cut at frame boundaries and the pieces still played on any MP3 player on the market. The header begins with a sync word that marks the start of a valid frame. It is followed by a series of bits indicating which MPEG version the file uses and whether it uses Layer 3.
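Reading the header fields can be sketched as a small parser. It handles only MPEG-1 Layer III headers, uses the MPEG-1 Layer III bit-rate and sampling-rate tables, and rejects free-format or reserved values; the bit layout (11-bit sync word, 2-bit version, 2-bit layer, protection bit, 4-bit bit-rate index, 2-bit sampling-rate index, padding bit, private bit, 2-bit channel mode) follows the standard frame header. The function name and the example header bytes are illustrative.

```python
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]  # MPEG-1 sampling rates by index
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_header(b: bytes):
    """Parse a 4-byte MPEG-1 Layer III frame header; return None if it is not one."""
    if len(b) < 4:
        return None
    h = int.from_bytes(b[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        return None  # sync word: 11 bits all set to 1
    if ((h >> 19) & 0b11) != 0b11 or ((h >> 17) & 0b11) != 0b01:
        return None  # only MPEG-1 (version bits 11) and Layer III (layer bits 01)
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        return None  # free-format, bad or reserved index
    padding = (h >> 9) & 1
    return {
        "crc_protected": ((h >> 16) & 1) == 0,  # protection bit 0 means a CRC follows
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "padding": padding,
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
    }


print(parse_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

For this example header the parser reports 128 kbps, 44.1 kHz, joint stereo, and a 417-byte frame; a reader would then jump 417 bytes ahead to find the next sync word.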