
MP3 digital audio format

High-quality digitized audio requires a large amount of disk space.
Attempts to shrink such files with general-purpose archivers (RAR, GZIP, etc.) yield little gain because of the nature of sound data. However, audio can be compressed substantially with specialized methods that analyze the structure of the data and then apply lossy compression.
Digital sound processing comparable in quality to its analog counterparts did not become a real possibility until the late 1980s.
In 1988, the International Organization for Standardization (ISO) formed the MPEG (Moving Picture Experts Group) committee, whose main task is to develop standards for encoding moving pictures, sound, and their combination. In the ten years of its existence, the committee has developed a series of standards on this subject. Drawing on extensive research in this area, it recommended several specific formats for storing data, which perform very well in both output quality and bit rate.
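To put rough numbers on both statements, the sketch below computes the size of uncompressed PCM audio and then runs zlib (the DEFLATE algorithm used by GZIP) over a synthetic test signal. The CD-style parameters (44.1 kHz, 16 bits, 2 channels), the helper names, and the tone-plus-noise signal are illustrative assumptions, not part of the original text; real recordings will differ in detail.

```python
import math
import random
import struct
import zlib


def pcm_size_bytes(seconds, sample_rate=44100, bits=16, channels=2):
    """Size of uncompressed PCM audio in bytes."""
    return seconds * sample_rate * (bits // 8) * channels


def synthetic_pcm(seconds=1, sample_rate=44100, seed=0):
    """Mono 16-bit test signal: a 440 Hz tone plus random noise.

    This is only an assumed stand-in for real music.
    """
    rng = random.Random(seed)
    samples = []
    for n in range(int(seconds * sample_rate)):
        tone = 10000 * math.sin(2 * math.pi * 440 * n / sample_rate)
        noise = rng.uniform(-2000, 2000)
        samples.append(int(tone + noise))
    # Pack as little-endian signed 16-bit integers.
    return struct.pack("<%dh" % len(samples), *samples)


one_minute = pcm_size_bytes(60)
raw = synthetic_pcm()
packed = zlib.compress(raw, 9)
ratio = len(packed) / len(raw)

print("One minute of CD-style audio: %.1f MB" % (one_minute / 1e6))
print("zlib output is %.0f%% of the raw size" % (100 * ratio))
```

Because the low-order bits of each sample behave almost like random data, a general-purpose compressor finds few repeated patterns to exploit.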
There are currently three video storage standards: MPEG-1, MPEG-2, and MPEG-4.
The first two standards also define formats for storing audio information: Layer-1, Layer-2, and Layer-3. These three audio formats are defined in MPEG-1 and are used with minor extensions in MPEG-2. They are similar to one another but strike different trade-offs between compression and complexity.
Layer-1 is the simplest: it requires little computation to compress, but it also provides the lowest compression ratio of the three.
Layer-3 is the most computationally demanding and provides the best compression. This format has recently gained immense popularity and is often called MP3, a name that comes from the extension of audio files stored in this format.
The idea underlying all lossy audio compression techniques is to discard subtle details of the original sound that the human ear cannot perceive. Several points can be highlighted here.
Noise level. Sound compression relies on a simple fact: a person standing near a loud siren is unlikely to hear the conversation of people nearby. This happens not so much because the person is paying attention to the loud sound as because the human ear actually fails to register sounds that lie in the same frequency range as a louder sound. This effect is called masking; its strength depends on the difference in volume and frequency between the two sounds.
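The dependence of masking on level and frequency distance can be illustrated with a toy rule. The sketch below is not the model used by any MPEG encoder: the function names, the 10 dB offset, and the 25 dB-per-octave slope are made-up values chosen only to show the shape of the idea.

```python
import math


def masking_threshold_db(masker_freq_hz, masker_level_db, probe_freq_hz,
                         offset_db=10.0, slope_db_per_octave=25.0):
    """Toy masking threshold: level below which a probe tone is assumed inaudible.

    offset_db and slope_db_per_octave are invented illustrative values.
    """
    distance_octaves = abs(math.log2(probe_freq_hz / masker_freq_hz))
    return masker_level_db - offset_db - slope_db_per_octave * distance_octaves


def is_masked(masker_freq_hz, masker_level_db, probe_freq_hz, probe_level_db):
    """True if the probe tone falls under the masker's threshold."""
    threshold = masking_threshold_db(masker_freq_hz, masker_level_db, probe_freq_hz)
    return probe_level_db < threshold


# A 90 dB siren around 1 kHz against 60 dB components of nearby speech.
print(is_masked(1000, 90, 1100, 60))  # probe close to the siren in frequency
print(is_masked(1000, 90, 4000, 60))  # probe two octaves away
```

Under these assumed constants, the component close to the siren's frequency is masked, while the one two octaves away is not.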
The second point is the division of the audio frequency range into subbands, each of which is then processed separately. The encoder finds the loudest sounds in each band and uses this information to determine an acceptable noise level for that band. The best encoders also take into account the influence of adjacent bands: a very loud sound in one band can produce masking in nearby bands as well.
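The per-band procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: the spectrum is given as a list of levels in dB, the bands are of equal width, and the constants `offset_db` and `neighbor_drop_db` are hypothetical values, not figures from the MPEG standard.

```python
def band_peaks_db(spectrum_db, num_bands):
    """Split per-bin levels (dB) into equal-width bands and return each band's peak.

    Bins left over when the length is not divisible by num_bands are ignored.
    """
    size = len(spectrum_db) // num_bands
    return [max(spectrum_db[i * size:(i + 1) * size]) for i in range(num_bands)]


def allowed_noise_db(peaks_db, offset_db=15.0, neighbor_drop_db=20.0):
    """Tolerable noise level per band.

    Each band tolerates noise up to offset_db below its own peak, or below
    a neighboring band's peak reduced by neighbor_drop_db, whichever is
    higher. Both constants are illustrative assumptions.
    """
    allowed = []
    for i, peak in enumerate(peaks_db):
        candidates = [peak]
        if i > 0:
            candidates.append(peaks_db[i - 1] - neighbor_drop_db)
        if i + 1 < len(peaks_db):
            candidates.append(peaks_db[i + 1] - neighbor_drop_db)
        allowed.append(max(candidates) - offset_db)
    return allowed


# Eight spectral bins grouped into four bands; the second band holds a loud sound.
spectrum = [20, 25, 80, 70, 30, 10, 15, 5]
peaks = band_peaks_db(spectrum, 4)
noise = allowed_noise_db(peaks)
print(peaks)  # [25, 80, 30, 15]
print(noise)  # [45.0, 65.0, 45.0, 0.0]
```

The loud second band raises the tolerable noise in both of its neighbors, so those bands can be quantized more coarsely than their own content would suggest.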
Another aspect of encoding is the use of a psychoacoustic model based on the characteristics of human sound perception. Compression under this model removes frequencies known to be inaudible while preserving more carefully the sounds that the human ear hears easily. Unfortunately, there can be no exact mathematical formulas here.
Human perception of sound is a complex process that is not fully understood, so compression methods are chosen on the basis of listening tests in which teams of experts compare sounds compressed in different ways. This leaves practically limitless room for improving psychoacoustic models. Most existing algorithms for encoding the human voice rely on the high predictability of the speech signal; general-purpose MPEG compression algorithms have tried to apply this technique with varying success.
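Although no exact formula exists, empirical approximations fitted to listening data are in common use. One widely cited example is the approximation of the threshold of hearing in quiet usually attributed to Terhardt, sketched below. It is shown only to illustrate what "removing inaudible frequencies" can mean; the function names are assumptions, and nothing here is claimed to be the model a particular MPEG encoder applies.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in quiet (dB SPL), Terhardt-style fit."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible(freq_hz, level_db):
    """True if a tone at this level lies above the approximate threshold."""
    return level_db > threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 3300, 18000):
    print(freq, "Hz:", round(threshold_in_quiet_db(freq), 1), "dB SPL")

print(audible(1000, 40))   # mid-range tone at a moderate level
print(audible(18000, 40))  # very high tone at the same level
```

The curve is lowest in the range of a few kilohertz, where hearing is most sensitive, and rises steeply toward the extremes, so a component that is clearly audible at 1 kHz may be discarded at 18 kHz.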
Another compression technique is so-called joint stereo. The human auditory system can determine the direction only of mid-range frequencies; very high and very low sounds are perceived, so to speak, independently of their source. These non-directional frequencies can therefore be encoded as a mono signal. In addition, compression exploits the difference in complexity between the streams in the two channels.
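One common way to exploit the similarity between channels is mid/side coding, in which the encoder stores the average of the two channels and their difference instead of left and right. The sketch below is a minimal illustration with made-up sample values and hypothetical function names; it does not reproduce the actual MP3 bitstream format.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Recover left/right samples from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(signal):
    """Sum of squared samples, a rough measure of how much there is to encode."""
    return sum(x * x for x in signal)


# Two nearly identical channels, as is typical for centered sound sources.
left = [100, 220, -140, 60]
right = [96, 224, -144, 64]

mid, side = to_mid_side(left, right)
print(mid)   # [98.0, 222.0, -142.0, 62.0]
print(side)  # [2.0, -2.0, 2.0, -2.0]
print(energy(side), "versus", energy(right))
```

When the channels are similar, the side signal carries very little energy and needs far fewer bits than a full second channel, while the original channels can still be reconstructed from mid and side.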










