
What is inside an MP3?
MP3 is short for MPEG-1 Audio Layer III. It is a lossy digital audio format developed by MPEG (Moving Picture Experts Group) together with the Fraunhofer Institute, originally as the audio format for MPEG video. It is an ISO (International Organization for Standardization) standard. It became popular because it offers high sound quality in very little storage space: about 150 to 175 songs fit on a 650 MB CD-ROM, compared with the 15 or so that fit on a traditional audio CD. The quality of the output file can be adjusted by choosing the bitrate (the number of kilobits stored for each second of audio, for example 128 or 320 kbps), and the file size is proportional to the bitrate. Thanks to its small size, high quality and versatility, it also became a common format for streaming.
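The relationship between bitrate and file size is simple arithmetic. A constant-bitrate MP3 stores a fixed number of bits for every second of audio. The sketch below assumes a 4-minute song encoded at 128 kbps. These are typical values and do not come from the text. It ignores tags and frame headers, and the helper names are hypothetical.

```python
def mp3_size_bytes(bitrate_kbps: int, duration_s: float) -> float:
    """Approximate size of a constant-bitrate MP3 (ignores tags and headers)."""
    return bitrate_kbps * 1000 * duration_s / 8


def songs_that_fit(capacity_bytes: int, bitrate_kbps: int, duration_s: float) -> int:
    """How many songs of the given length and bitrate fit in capacity_bytes."""
    return int(capacity_bytes // mp3_size_bytes(bitrate_kbps, duration_s))


# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels
CD_AUDIO_KBPS = 44100 * 16 * 2 / 1000

if __name__ == "__main__":
    song = mp3_size_bytes(128, 240)  # assumed: 4-minute song at 128 kbps
    print(f"one song: {song / 1e6:.2f} MB")
    print(f"reduction vs CD audio: {CD_AUDIO_KBPS / 128:.1f}:1")
    print("songs on a 650 MB CD-ROM:", songs_that_fit(650 * 1024 * 1024, 128, 240))
```

Under these assumptions a song takes about 3.8 MB. That is roughly an 11:1 reduction relative to CD audio, and it gives about 177 songs per CD-ROM, consistent with the figures quoted in this article.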
As mentioned at the beginning, MP3 is a lossy format, which means that the original and the encoded sound are not exactly the same. To discard information without the listener noticing, MP3 takes advantage of the “deficiencies” of the human ear, specifically three of them:
Limits of hearing in frequency: The human ear can only hear frequencies between roughly 20 Hz and 20,000 Hz (20 kHz). Frequencies outside this range are filtered out and discarded, since they would not add relevant information to the encoded signal. Sensitivity is not uniform within the range either. The ear is most sensitive around 2-4 kHz, and sounds become harder to hear as their frequency approaches the extremes of the hearing range.
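Psychoacoustic models usually describe both effects with a "threshold in quiet" curve: the lowest level, in dB SPL, at which a tone of a given frequency can be heard. The sketch below uses Terhardt's widely cited approximation of that curve. This is a substitution, because the text does not say which curve MP3 encoders use. The sketch treats a component as inaudible if it lies outside 20 Hz to 20 kHz or falls below the curve. The function names are hypothetical.

```python
import math


def absolute_threshold_db(freq_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz: float, level_db: float) -> bool:
    """Hypothetical helper: outside 20 Hz-20 kHz or below the curve counts as inaudible."""
    if not 20 <= freq_hz <= 20000:
        return False
    return level_db > absolute_threshold_db(freq_hz)


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 15000):
        print(f"{f:>6} Hz: {absolute_threshold_db(f):6.1f} dB SPL")
```

The curve dips to about -5 dB SPL near 3.3 kHz. It rises to about 23 dB at 100 Hz and about 51 dB at 15 kHz, which matches the "most sensitive around 2-4 kHz" behaviour described above.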
Masking effect: When two signals of similar frequency overlap, the human ear only hears the one with the higher power (volume). The quieter one can therefore be eliminated without appreciable loss of quality.
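"Similar frequency" is usually measured on the critical-band (Bark) scale rather than in hertz. The toy sketch below takes the sentence above literally. It converts each component's frequency to Bark with the Zwicker and Terhardt approximation, then keeps only the loudest component in each 1-Bark band. This is a deliberate oversimplification for illustration, since real models compute a graded masking threshold. The function names are hypothetical.

```python
import math


def bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def keep_loudest_per_band(components):
    """components: list of (freq_hz, level_db).

    Toy masking rule: within each 1-Bark band only the loudest component
    survives; everything quieter in the same band is dropped.
    """
    loudest = {}
    for freq, level in components:
        band = int(bark(freq))
        if band not in loudest or level > loudest[band][1]:
            loudest[band] = (freq, level)
    return sorted(loudest.values())  # sorted by frequency


if __name__ == "__main__":
    tones = [(1000, 70), (1050, 50), (4000, 40)]
    print(keep_loudest_per_band(tones))
```

In this example the 1,050 Hz tone falls in the same critical band as the much louder 1,000 Hz tone and is dropped. The 4,000 Hz tone lies in a different band and survives.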
Stereo redundancy: The two channels often carry largely the same information. Furthermore, above a certain frequency the ear can no longer follow the exact waveform differences between the channels and judges direction mainly from their relative loudness. For that range a single channel can be encoded, plus a small amount of complementary information that preserves the spatial sensation of the other channel.
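MP3's joint-stereo modes exploit this redundancy. The mode that is simplest to show is mid/side (M/S) coding. It stores the sum and the difference of the two channels instead of left and right, so when the channels are similar the "side" signal is close to zero and cheap to encode. The sketch below works on plain sample lists and uses an orthonormal 1/√2 scaling so that the same formulas invert the transform. It does not show intensity stereo, the other joint-stereo mode, and the function names are hypothetical.

```python
import math

SCALE = 1 / math.sqrt(2)  # orthonormal scaling: the transform is its own inverse


def ms_encode(left, right):
    """Left/right samples -> (mid, side) samples."""
    mid = [(l + r) * SCALE for l, r in zip(left, right)]
    side = [(l - r) * SCALE for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """(mid, side) samples -> left/right samples."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [0.50, -0.25, 1.00, 0.30]
    right = [0.48, -0.25, 0.97, 0.31]  # nearly identical channels
    mid, side = ms_encode(left, right)
    print("side channel:", [round(s, 4) for s in side])  # small values, cheap to code
```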
To exploit these three effects, MP3 uses a subband-based system: the signal passes through several filters that separate it into sub-signals, each covering one frequency range. Each of these bands is then compared against a psychoacoustic model that determines which bands are important and which can be removed.
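In MPEG-1 audio, the first filtering stage splits the range from 0 Hz up to half the sampling rate into 32 subbands of equal width. The sketch below only computes those band edges and maps a frequency to its subband. It does not implement the filters themselves, and the function names are hypothetical.

```python
NUM_SUBBANDS = 32  # MPEG-1 polyphase filter bank


def subband_edges(sample_rate_hz: float, num_subbands: int = NUM_SUBBANDS):
    """(low_hz, high_hz) edges of each equal-width subband up to Nyquist."""
    width = sample_rate_hz / 2 / num_subbands
    return [(i * width, (i + 1) * width) for i in range(num_subbands)]


def subband_index(freq_hz: float, sample_rate_hz: float,
                  num_subbands: int = NUM_SUBBANDS) -> int:
    """Index of the subband that contains freq_hz."""
    nyquist = sample_rate_hz / 2
    if not 0 <= freq_hz < nyquist:
        raise ValueError("frequency must be in [0, sample_rate / 2)")
    return int(freq_hz / (nyquist / num_subbands))


if __name__ == "__main__":
    edges = subband_edges(44100)
    print("band width:", edges[0][1], "Hz")
    print("1 kHz is in subband", subband_index(1000, 44100))
    print("subband 30 starts at", edges[30][0], "Hz")
```

At 44.1 kHz each subband is about 689 Hz wide. Subbands 30 and 31 lie entirely above 20 kHz, so under the hearing-range limit described earlier they carry nothing audible.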
Specifically, MP3 uses a hybrid polyphase/MDCT (Modified Discrete Cosine Transform) filter bank. A filter bank is a set of band-pass filters that separate the original signal into several frequency bands. A hybrid polyphase/MDCT filter bank is simply an ordinary polyphase filter bank followed by a block that applies the MDCT to the output of each band. In MP3 the polyphase stage produces 32 equal-width subbands, and for normal (long) blocks the MDCT divides each of them further into 18 frequency lines, for 576 lines in total.
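The MDCT stage can be sketched directly from the textbook definition. The code below is a slow O(N²) MDCT/IMDCT pair with a sine window and 50% overlap-add, a scheme known as time-domain aliasing cancellation. It uses N = 18 to mirror the 18 lines per subband. Real encoders use fast algorithms and switch between window shapes, which this sketch does not attempt. The function names are hypothetical.

```python
import math


def sine_window(length: int) -> list[float]:
    # Satisfies the Princen-Bradley condition w[i]^2 + w[i + length/2]^2 = 1
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block: list[float]) -> list[float]:
    """2N windowed samples -> N coefficients, straight from the definition."""
    n = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i, x in enumerate(block))
        for k in range(n)
    ]


def imdct(coeffs: list[float]) -> list[float]:
    """N coefficients -> 2N time-aliased samples.

    Scaled by 2/N so that windowing with a Princen-Bradley window on both
    sides and overlap-adding consecutive blocks cancels the aliasing exactly.
    """
    n = len(coeffs)
    return [
        2.0 / n * sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                      for k, c in enumerate(coeffs))
        for i in range(2 * n)
    ]


def analyze(signal: list[float], n: int = 18) -> list[list[float]]:
    """Windowed frames of 2n samples with hop n (50% overlap) -> MDCT frames."""
    if len(signal) % n:
        raise ValueError("signal length must be a multiple of n")
    padded = [0.0] * n + list(signal) + [0.0] * n  # so every sample gets two frames
    w = sine_window(2 * n)
    return [
        mdct([x * wi for x, wi in zip(padded[start:start + 2 * n], w)])
        for start in range(0, len(padded) - n, n)
    ]


def synthesize(frames: list[list[float]], n: int = 18) -> list[float]:
    """Inverse of analyze(): IMDCT, window again, overlap-add."""
    w = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for f, coeffs in enumerate(frames):
        for i, y in enumerate(imdct(coeffs)):
            out[f * n + i] += y * w[i]
    return out[n:-n]  # drop the zero padding added in analyze()


if __name__ == "__main__":
    x = [math.sin(0.3 * t) + 0.5 * math.cos(1.7 * t) for t in range(72)]
    frames = analyze(x)
    y = synthesize(frames)
    print(len(frames), "frames of", len(frames[0]), "coefficients")
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, y)))
```

Without any quantization the round trip is exact up to floating-point error. An encoder saves space by quantizing these coefficients coarsely, and that step introduces the noise the masking threshold is meant to hide.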
The choice of which bands are kept and which are removed is made by calculating the masking threshold. The encoder analyzes each audio sub-signal and calculates, as a function of frequency, how much noise can be introduced without being noticeable to the human ear. Coarser quantization saves storage space but replaces part of the signal with noise. The calculation takes into account that a sound masks frequencies above its own more strongly than frequencies below it.
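The asymmetry described above can be illustrated with a simple two-slope "spreading" model on the Bark scale. Each masker raises the threshold around it. The threshold falls off steeply toward lower frequencies and gently toward higher ones, and the threshold at any frequency is the highest contribution from any masker. The slopes (25 and 10 dB per Bark) and the 10 dB offset are illustrative assumptions, not values from the MP3 standard or from any particular encoder's psychoacoustic model. The function names are hypothetical.

```python
import math

# Illustrative parameters (assumptions, not taken from the MP3 standard)
SLOPE_BELOW_DB_PER_BARK = 25.0  # masking fades quickly below the masker
SLOPE_ABOVE_DB_PER_BARK = 10.0  # ...and slowly above it
MASKING_OFFSET_DB = 10.0        # threshold peak sits this far below the masker


def bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def masking_threshold_db(freq_hz: float, maskers) -> float:
    """Masking threshold at freq_hz; maskers is a list of (freq_hz, level_db).

    Under this model, quantization noise below this level goes unnoticed.
    """
    z = bark(freq_hz)
    threshold = -math.inf
    for m_freq, m_level in maskers:
        dz = z - bark(m_freq)  # positive: probe lies above the masker
        slope = SLOPE_ABOVE_DB_PER_BARK if dz >= 0 else SLOPE_BELOW_DB_PER_BARK
        threshold = max(threshold, m_level - MASKING_OFFSET_DB - slope * abs(dz))
    return threshold


def is_masked(freq_hz: float, level_db: float, maskers) -> bool:
    return level_db <= masking_threshold_db(freq_hz, maskers)


if __name__ == "__main__":
    maskers = [(1000.0, 70.0)]  # one loud 1 kHz tone
    for f in (800, 1200):
        print(f, "Hz threshold:", round(masking_threshold_db(f, maskers), 1), "dB")
```

With a 70 dB tone at 1 kHz, a 45 dB tone at 1.2 kHz falls under the threshold and can be discarded. A 45 dB tone at 800 Hz, about the same distance below, stays audible.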
The following figure outlines the process described above:
The following figure represents the structure of an mp3 file:
As can be seen, an MP3 file is made up of frames, each consisting of an MP3 header followed by MP3 data. Each frame starts with its own header, so an MP3 file can be cut at frame boundaries and the pieces can still be played back. The figure shows that the header begins with a sync word, a run of bits all set to 1 that marks the beginning of a valid frame. It is followed by bits that identify the MPEG version and the layer (for example, MPEG-1 Layer III), and then by fields such as the bitrate, sampling rate, padding and channel mode.
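The header is a 32-bit word, so it can be decoded with shifts and masks. The sketch below handles only MPEG-1 Layer III frames. It checks the sync and version/layer bits, looks up the bitrate and sampling rate in the MPEG-1 Layer III tables, and computes the frame length in bytes as 144 × bitrate / sampling rate + padding. Free-format streams, MPEG-2/2.5 and the CRC check are out of scope. The function name is hypothetical, and the example header bytes were chosen by hand for illustration.

```python
# MPEG-1 Layer III lookup tables (bitrate index 0 = free format, 15 = invalid)
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]  # index 3 is reserved
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_mpeg1_layer3_header(data: bytes) -> dict:
    """Decode the 4-byte frame header at the start of `data`."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(data[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:     # top 11 bits: frame sync, all ones
        raise ValueError("no frame sync")
    version_bits = (h >> 19) & 0b11      # 0b11 = MPEG-1
    layer_bits = (h >> 17) & 0b11        # 0b01 = Layer III
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("not an MPEG-1 Layer III frame")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or invalid bitrate/sampling rate")
    padding = (h >> 9) & 1
    return {
        "crc_protected": ((h >> 16) & 1) == 0,  # protection bit 0: a CRC follows
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "padding": bool(padding),
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        # Layer III frame size in bytes, header included
        "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
    }


if __name__ == "__main__":
    print(parse_mpeg1_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

For the example bytes `FF FB 90 64`, the sketch reports a 128 kbps, 44.1 kHz, joint-stereo frame of 417 bytes. A player can jump to the next frame by skipping that many bytes, which is what makes cutting at frame boundaries practical.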
MP3 undoubtedly owes its success to Internet music downloads and to portable audio players capable of playing the format. First came MP3-compatible Discman-style CD players, which let users carry about 175 songs per CD instead of the usual 15 or so. Later came MP3 players based on flash memory, which was small at the time. These players were much smaller and lighter than portable CD players, but their initial disadvantage was that flash memory was small and expensive. The first devices had 64 or 128 MB of memory, enough for between 16 and 32 songs. Today these devices are sold with 1, 2, 4 or even 8 GB, which allows them to store between 256 songs (1 GB model) and 2,048 songs (8 GB model), at about 4 MB per song.





