MP3: how it compresses sound and why it needs to be normalized

Free Download Mp4Gain

MP3 owes its effectiveness to the way it models human hearing. By knowing the limitations and behavior of the human ear, its designers managed to eliminate information without affecting perceived quality, provided that other parameters, such as bitrate and sample rate, are kept at adequate levels.

Characteristics of human hearing

Human hearing is not perfect. Besides the physical limitations of the ear itself, sound must be carried by the nerves to the auditory cortex of the brain, where it is transformed into the perceptions we are aware of.

Volume

Two sounds with the same amplitude can be perceived as having different loudness depending on their frequency: the perceived intensity of a sound is not constant across frequencies. The human ear is most sensitive to sounds between 1000 and 5000 Hz. All the points on an equal-loudness curve are perceived as equally loud, but the sound pressure needed to reach that loudness differs from one frequency to another.
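The frequency dependence of loudness can be approximated numerically. The sketch below uses the standard A-weighting curve (IEC 61672), which is roughly based on the 40-phon equal-loudness contour. It is not the psychoacoustic model that MP3 encoders actually use, and the function name `a_weighting_db` is our own. It returns how many dB a tone at a given frequency is effectively boosted or attenuated by the ear relative to 1 kHz.

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """Approximate A-weighting gain in dB at frequency f_hz (IEC 61672 formula).

    Negative values mean the ear is less sensitive at that frequency than at 1 kHz.
    """
    f2 = f_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00

for f in (50, 100, 1000, 3000, 10000):
    print(f"{f:>6} Hz: {a_weighting_db(f):+6.1f} dB")
```

Low frequencies are strongly attenuated (about -19 dB at 100 Hz), while the curve rises slightly above 0 dB between roughly 1 and 5 kHz, which matches the sensitivity peak described above.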

Frequency range

Because of the physical limitations of the ear, human beings can perceive sounds only in the frequency range of roughly 20 Hz to 20 kHz. This range changes with age: as we grow older, we lose the ability to hear the higher frequencies.
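The audible range ties directly to the sample rate mentioned earlier: by the sampling theorem, a digital signal can only represent frequencies below half its sample rate (the Nyquist frequency). The helper names below are our own. The sketch checks which common sample rates cover the 20 kHz upper limit and shows where an out-of-range tone would fold back (alias) if it were not filtered out before sampling.

```python
AUDIBLE_MAX_HZ = 20_000  # upper limit of human hearing used in the text

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def covers_audible_range(sample_rate_hz: float) -> bool:
    """True if the sample rate can represent the whole audible range."""
    return nyquist_hz(sample_rate_hz) >= AUDIBLE_MAX_HZ

def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which an unfiltered tone appears after sampling."""
    return abs(tone_hz - sample_rate_hz * round(tone_hz / sample_rate_hz))

for fs in (22_050, 32_000, 44_100, 48_000):
    print(fs, nyquist_hz(fs), covers_audible_range(fs))

print(alias_hz(25_000, 44_100))  # a 25 kHz tone folds back to 19100 Hz
```

This is why CD-quality audio uses 44.1 kHz: its 22.05 kHz Nyquist frequency leaves some room above 20 kHz for the anti-aliasing filter.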

Dynamic range

The smallest variation in air pressure that a human can detect at the frequencies where we are most sensitive, 20 micropascals, is used as the reference (0 dB) for measuring the intensity of other sounds.

Power in dB (decibels) = $10 \log_{10}(P / P_0)$, where $P$ is the power considered and $P_0$ is the power corresponding to 20 micropascals.

A normal conversation is between 50 and 60 dB, and car traffic is approximately 80 dB. The loudest sound the ear can tolerate is about 130 dB, which gives a dynamic range of 0 to 130 dB.
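Because acoustic power is proportional to the square of sound pressure, the power ratio 10·log10(P/P0) is equivalent to 20·log10(p/p0) expressed in pressure, with p0 = 20 micropascals. The short sketch below (function names are our own) converts between pressure and sound pressure level and shows how large the ratios behind these figures really are.

```python
import math

P0_PA = 20e-6  # reference pressure: 20 micropascals = 0 dB

def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / P0_PA)

def pressure_pa(level_db: float) -> float:
    """Inverse of spl_db: RMS pressure in pascals for a level in dB."""
    return P0_PA * 10.0 ** (level_db / 20.0)

print(spl_db(20e-6))                      # threshold of hearing: 0.0 dB
print(round(pressure_pa(130), 1))         # about 63.2 Pa at 130 dB
print(pressure_pa(80) / pressure_pa(60))  # traffic vs. conversation: about 10x the pressure
```

So the 130 dB span of human hearing corresponds to a pressure ratio of more than three million to one, and every 20 dB step multiplies pressure by 10 (and power by 100).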

Auditory masking

Auditory masking is defined as the “decreased audibility of one sound due to the presence of another.” It comes in two forms, frequency masking and temporal masking:

Frequency masking:

Frequency masking, also called simultaneous masking, is best explained with an example. Suppose there is a loud tone at 1000 Hz and, at the same time, a tone at 1100 Hz that is 18 dB quieter. The 1100 Hz tone cannot be heard, because it is masked by the louder 1000 Hz tone, which is close to it in frequency. The closer two sounds are in frequency, the louder the quieter sound can be while still being masked. (Figure 2)
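The masking rule in this example can be sketched with a very simple model: convert frequencies to the Bark scale (Zwicker and Terhardt's critical-band approximation), then let the masking threshold fall off linearly with distance in Bark from the masker. The 10 dB offset and both slopes below are illustrative assumptions chosen only to reproduce the 1000/1100 Hz example; they are not values from the MP3 standard, and real psychoacoustic models are considerably more elaborate.

```python
import math

def bark(f_hz: float) -> float:
    """Zwicker & Terhardt approximation of critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Illustrative parameters (assumptions, not taken from the MP3 standard).
OFFSET_DB = 10.0       # threshold sits this far below the masker at its own frequency
SLOPE_BELOW_DB = 27.0  # dB per Bark for probes below the masker's frequency
SLOPE_ABOVE_DB = 10.0  # dB per Bark for probes above it (masking spreads upward)

def masking_threshold_db(masker_hz: float, masker_db: float, probe_hz: float) -> float:
    """Level below which a probe tone at probe_hz is hidden by the masker."""
    dz = bark(probe_hz) - bark(masker_hz)
    slope = SLOPE_ABOVE_DB if dz >= 0 else SLOPE_BELOW_DB
    return masker_db - OFFSET_DB - slope * abs(dz)

def is_masked(masker_hz: float, masker_db: float, probe_hz: float, probe_db: float) -> bool:
    """True if the probe falls under the masker's threshold."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)

# The example from the text: a 1100 Hz tone 18 dB below an 80 dB, 1000 Hz tone.
print(is_masked(1000, 80, 1100, 62))  # True: too close in frequency to be heard
print(is_masked(1000, 80, 1500, 62))  # False: far enough away to be audible
```

The specific numbers matter less than the shape: the threshold drops as the probe moves away from the masker in frequency, which is exactly the behavior described above.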

Temporal masking:

Temporal masking occurs just before and just after a loud sound. When a sound that follows a louder one is masked, it is called post-masking; when a sound that precedes it is masked, it is called pre-masking. Pre-masking lasts only a brief moment (20 ms), whereas post-masking can last up to 200 ms. (Figure 3)
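The two time windows can be expressed as a small classifier. The sketch below, with hypothetical names, uses the approximate durations from the text (20 ms of pre-masking, 200 ms of post-masking) and treats them as hard cut-offs. In reality the masking threshold decays gradually within each window, so this is only a rough approximation.

```python
PRE_MASKING_MS = 20.0    # approximate duration before the masker starts
POST_MASKING_MS = 200.0  # approximate duration after the masker ends

def masking_region(t_ms: float, masker_start_ms: float, masker_end_ms: float):
    """Return which kind of masking (if any) can affect a sound at time t_ms."""
    if masker_start_ms <= t_ms <= masker_end_ms:
        return "simultaneous"
    if masker_start_ms - PRE_MASKING_MS <= t_ms < masker_start_ms:
        return "pre-masking"
    if masker_end_ms < t_ms <= masker_end_ms + POST_MASKING_MS:
        return "post-masking"
    return None

# A loud sound lasting from 100 ms to 300 ms:
for t in (70, 90, 200, 450, 550):
    print(t, masking_region(t, 100, 300))
```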

By exploiting both kinds of masking (in frequency and in time), it is possible to reduce the audio information substantially without an audible change.

In short, at least four facts allow information to be reduced without the ear detecting it:

1.- The human ear cannot detect stereo at low frequencies.

2.- If two or more sounds occur at nearby frequencies, the human ear will hear only the loudest one.

3.- Sounds just before, and especially just after, a loud sound are also masked, or “covered”, by it.

4.- The ear does not perceive all frequencies at the same volume.

All this allows MP3 to discard information, a great deal of it, that the human ear will not detect, provided a suitable bitrate and sample rate are used.
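The first point above, about stereo, relates to redundancy between the two channels. One common way to exploit it, available among others in MP3's joint-stereo mode, is mid/side coding: instead of storing left and right, the encoder stores their sum and difference. The sketch below uses a simple average/half-difference (the exact scaling used by MP3 may differ; function names are our own) and shows that for a nearly mono signal the side channel carries very little energy, so it can be coded with few bits.

```python
import math

def to_mid_side(left, right):
    """Convert L/R sample lists to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def to_left_right(mid, side):
    """Exact inverse of to_mid_side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def energy(samples):
    """Sum of squared samples."""
    return sum(x * x for x in samples)

# A nearly mono signal: the right channel is the left one, very slightly quieter.
left = [math.sin(2 * math.pi * 440 * n / 44_100) for n in range(1_000)]
right = [0.98 * x for x in left]

mid, side = to_mid_side(left, right)
print(energy(side) / energy(mid))  # tiny ratio: the side channel is almost silent
```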

Waveform and perceptual encoders

There are two types of audio encoders. Waveform encoders try to reconstruct the signal as exactly as possible after encoding and decoding.

Perceptual encoders do not attempt to keep the signal exactly as it was before the encoding and decoding step. Instead, they seek to ensure that the human ear perceives the output as identical to the original. Using what is known about the properties and limitations of human hearing, a perceptual encoder removes the parts of the signal that we cannot perceive.
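The difference between the two approaches shows up in how quality is measured. A waveform encoder is judged by how closely the decoded samples match the original, typically with the signal-to-noise ratio (SNR). The sketch below, with names of our own choosing, applies a plain uniform quantizer (a stand-in for a simple waveform coder, not MP3) to a test tone and measures the SNR; for a near-full-scale sine it lands close to the textbook figure of about 6 dB per bit. A perceptual encoder can score a much lower SNR and still sound identical, because it places its error where the ear cannot hear it.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Uniform mid-rise quantizer for samples in [-1, 1]."""
    step = 2.0 / (2 ** bits)
    index = math.floor(x / step)
    # Clamp so that x = 1.0 maps to the top level instead of overflowing.
    index = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, index))
    return (index + 0.5) * step

def snr_db(original, decoded) -> float:
    """Signal-to-noise ratio of decoded samples against the original, in dB."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return 10.0 * math.log10(signal / noise)

fs = 44_100
tone = [0.999 * math.sin(2 * math.pi * 997 * n / fs) for n in range(fs)]

for bits in (8, 12, 16):
    decoded = [quantize(x, bits) for x in tone]
    print(bits, round(snr_db(tone, decoded), 1))
```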

Almost all perceptual encoders transform the sound from the time domain to the frequency domain and then split the frequencies into subbands. They then use knowledge of how the ear works to remove unnecessary information. Masking is the most commonly exploited hearing phenomenon.
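The transform-then-discard idea can be illustrated with a toy example. MP3 itself uses a hybrid filterbank (a 32-band polyphase filterbank followed by an MDCT); the sketch below instead uses a plain discrete Fourier transform for clarity, groups the positive-frequency bins into equal-width subbands, and keeps only the subbands whose energy is within a fixed margin of the strongest one. The 40 dB margin is a hypothetical stand-in for a real psychoacoustic model, and real encoders allocate bits to subbands rather than simply dropping them.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def subband_energies_db(spectrum, num_bands):
    """Energy (dB) of equal-width bands over the positive-frequency bins."""
    half = len(spectrum) // 2
    width = half // num_bands
    energies = []
    for b in range(num_bands):
        e = sum(abs(c) ** 2 for c in spectrum[b * width:(b + 1) * width])
        energies.append(10 * math.log10(e) if e > 0 else float("-inf"))
    return energies

def subbands_to_keep(energies_db, margin_db=40.0):
    """Hypothetical rule: drop bands more than margin_db below the loudest."""
    peak = max(energies_db)
    return [e >= peak - margin_db for e in energies_db]

N = 64
# Loud tone in bin 5, medium tone in bin 12, very quiet tone (-60 dB) in bin 20.
signal = [1.0 * math.sin(2 * math.pi * 5 * t / N)
          + 0.3 * math.sin(2 * math.pi * 12 * t / N)
          + 0.001 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]

energies = subband_energies_db(dft(signal), num_bands=8)
print(subbands_to_keep(energies))
# -> [False, True, False, True, False, False, False, False]
```

Only the two subbands holding the loud and medium tones survive; the band with the very quiet tone and the empty bands are discarded.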


Mp4Gain Main Window

Mp4Gain Features


Author: R. Arias

R. Arias is the author of this article and has more than 30 years of experience as a recording engineer and audio specialist, as well as more than 20 years of experience creating algorithms related to audio and video.