

Characteristics of human hearing
Human hearing is not perfect. Besides the physical limitations of the ear itself, sound must travel along the auditory nerve to the auditory cortex of the brain, where it is transformed into the perceptions we are aware of.

Volume:
Two sounds with the same amplitude can be perceived with different intensities depending on their frequencies: the perceived intensity (loudness) of a sound is not constant across frequency. The human ear is most sensitive to sounds between 1000 and 5000 Hz. All points on an equal-loudness curve are perceived as equally loud, even though the sound pressure needed to reach each point differs.
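The frequency dependence of sensitivity can be illustrated with a minimal Python sketch. It assumes Terhardt's widely used approximation of the absolute threshold of hearing (the quietest audible level at each frequency, in dB SPL), which is not part of the original text; the function name `absolute_threshold_db` is hypothetical. Scanning the audible range for the lowest threshold should land inside the 1000 to 5000 Hz band described above.

```python
import math


def absolute_threshold_db(freq_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL) using Terhardt's formula.

    Assumption: this empirical curve stands in for the ear's frequency-dependent
    sensitivity; a lower value means the ear is more sensitive at that frequency.
    """
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


# Scan the audible range in 10 Hz steps and pick the most sensitive frequency.
most_sensitive_hz = min(range(20, 20001, 10), key=absolute_threshold_db)
print(most_sensitive_hz, round(absolute_threshold_db(most_sensitive_hz), 1))
```

In this model a quiet tone at 100 Hz needs a much higher sound pressure than one near 3 kHz to be heard at all, the same effect the equal-loudness curves describe at higher levels.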
Frequency range
Human beings can perceive sounds in the frequency range of roughly 20 Hz to 20 kHz; this limit comes from the physical limitations of the ear. The range also changes with age: as we grow older, we lose the ability to hear the higher frequencies.
Dynamic range
The smallest variation in air pressure that a human can detect (20 micropascals), measured at the frequencies where we are most sensitive, is used as the reference (0 dB) for measuring the intensity of other sounds.
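Power in dB (decibels) = $10 \log_{10}(P / P_0)$, where $P$ is the power considered and $P_0$ is the power corresponding to 20 micropascals. Because power is proportional to the square of pressure, the same level can be written in terms of pressures as $20 \log_{10}(p / p_0)$, with $p_0 = 20$ micropascals.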
A normal conversation is about 50 to 60 dB, and car traffic is approximately 80 dB. The loudest sound the ear can tolerate (the threshold of pain) is about 130 dB, which gives a dynamic range of 0 to 130 dB.
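The decibel arithmetic in this section can be checked with a short Python sketch. Because sound power is proportional to the square of pressure, a level computed from pressures uses 20 log10 rather than 10 log10. The helper names are hypothetical; only the 20 micropascal reference and the 130 dB figure come from the text.

```python
import math

P_REF_PA = 20e-6  # 20 micropascals: the 0 dB reference pressure


def power_level_db(power: float, ref_power: float) -> float:
    """Level of a power relative to a reference power: 10*log10(P/P0)."""
    return 10 * math.log10(power / ref_power)


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level: 20*log10(p/p0), since power scales with pressure squared."""
    return 20 * math.log10(pressure_pa / P_REF_PA)


def pressure_from_db(level_db: float) -> float:
    """Inverse of spl_db: pressure in pascals for a given level in dB."""
    return P_REF_PA * 10 ** (level_db / 20)


print(spl_db(P_REF_PA))          # the reference itself is 0 dB
print(pressure_from_db(130))     # pressure at the 130 dB limit, in Pa
print(10 ** (130 / 20))          # pressure ratio spanned by 0 to 130 dB
print(power_level_db(2.0, 1.0))  # doubling the power adds about 3 dB
```

The last lines show why decibels are convenient: the 0 to 130 dB dynamic range corresponds to a pressure ratio of roughly three million to one (about 63 Pa versus 20 micropascals).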
Auditory masking
Auditory masking is defined as the “decreased audibility of one sound due to the presence of another.” It takes two forms, frequency masking and temporal masking:
Frequency masking:
Also called simultaneous masking, it is best explained with an example. Suppose there is a loud sound at 1000 Hz and, at the same time, a sound at 1100 Hz that is 18 dB quieter. The 1100 Hz sound cannot be heard because it is masked by the 1000 Hz sound, which is both louder and close in frequency. The closer the two frequencies are, the louder a weaker sound can be and still be masked (Figure 2).
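The example can be reproduced with a toy frequency-masking model in Python. This is a deliberately simplified sketch, not the psychoacoustic model of any real encoder: it converts frequencies to the Bark critical-band scale with the Zwicker and Terhardt approximation, then lets the masking threshold fall off linearly (in dB per Bark) on either side of the masker. The offset, the slopes and the 80 dB masker level are hypothetical values chosen for illustration, with a shallower slope above the masker because masking spreads more toward higher frequencies.

```python
import math

# Hypothetical toy-model constants (illustrative assumptions, not measured data).
OFFSET_DB = 10.0    # threshold at the masker's own frequency, below the masker level
SLOPE_BELOW = 27.0  # dB per Bark of fall-off for probes below the masker
SLOPE_ABOVE = 10.0  # dB per Bark above the masker (masking spreads upward more)


def bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of the Bark critical-band scale."""
    return 13 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500) ** 2)


def masked_threshold_db(masker_hz: float, masker_db: float, probe_hz: float) -> float:
    """Level below which a simultaneous probe tone is inaudible in this toy model."""
    dz = bark(probe_hz) - bark(masker_hz)  # distance in critical bands
    slope = SLOPE_ABOVE if dz >= 0 else SLOPE_BELOW
    return masker_db - OFFSET_DB - slope * abs(dz)


def is_masked(masker_hz: float, masker_db: float, probe_hz: float, probe_db: float) -> bool:
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)


# The example from the text (masker level of 80 dB assumed), probe 18 dB quieter.
print(is_masked(1000, 80, 1100, 62))  # close in frequency
print(is_masked(1000, 80, 2000, 62))  # same level, farther away
```

With these assumed constants, the 1100 Hz tone falls under the threshold while the same tone moved to 2000 Hz does not, matching the qualitative behavior described above.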
Temporal masking:
Temporal masking occurs just before and just after a loud sound. A sound masked after a louder one undergoes post-masking; a sound masked before a louder one undergoes pre-masking. Pre-masking lasts only a brief moment (20 ms), while post-masking can last up to 200 ms (Figure 3).
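Temporal masking can be sketched in the same spirit. In this minimal Python model, the 20 ms pre-masking and 200 ms post-masking windows come from the text, while the 10 dB offset and the straight-line decay of the threshold across each window are simplifying assumptions, not a reproduction of measured masking curves. The function names are hypothetical.

```python
import math

PRE_MASK_MS = 20.0    # pre-masking window, from the text
POST_MASK_MS = 200.0  # post-masking window, from the text
OFFSET_DB = 10.0      # hypothetical: threshold starts this far below the masker


def temporal_threshold_db(masker_db: float, dt_ms: float) -> float:
    """Masking threshold for a probe dt_ms away from a loud masker.

    dt_ms < 0: probe occurs before the masker starts (pre-masking).
    dt_ms >= 0: probe occurs after the masker ends (post-masking).
    Assumption: the threshold decays linearly to 0 dB across each window.
    """
    window = PRE_MASK_MS if dt_ms < 0 else POST_MASK_MS
    elapsed = abs(dt_ms)
    if elapsed >= window:
        return -math.inf  # outside the window: no masking at all
    return (masker_db - OFFSET_DB) * (1 - elapsed / window)


def is_temporally_masked(masker_db: float, dt_ms: float, probe_db: float) -> bool:
    return probe_db < temporal_threshold_db(masker_db, dt_ms)


# A 40 dB probe near an 80 dB masker:
print(is_temporally_masked(80, 50, 40))   # 50 ms after: inside the post-masking window
print(is_temporally_masked(80, -50, 40))  # 50 ms before: outside the pre-masking window
```

The asymmetry between the two windows is why, in this model, a quiet sound just after a loud one is far more likely to be discarded than one just before it.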
By exploiting both kinds of masking (frequency and temporal), it is possible to reduce the audio information substantially without an audible change.
In other words, at least four facts allow the information to be reduced without the ear detecting it:
1.- The human ear cannot perceive stereo separation at low frequencies.
2.- If two or more sounds occur at nearby frequencies, the human ear will hear only the loudest one.
3.- Sounds just before and, especially, just after a loud sound are also masked or “covered” by it.
4.- The ear is not equally sensitive at all frequencies.
Together, these facts allow MP3 to discard a great deal of information that the human ear will not detect, provided a suitable bitrate and sample rate are used.
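The role of bitrate and sample rate can be made concrete with a little arithmetic. This Python sketch assumes CD-quality PCM input (44.1 kHz, 16 bits, stereo) and a 128 kbit/s MP3 as an example target; neither figure comes from the text, and the helper names are hypothetical.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


cd_bps = pcm_bitrate_bps(44_100, 16, 2)  # assumed CD-quality input
mp3_bps = 128_000                        # assumed example MP3 bitrate
print(cd_bps)              # 1411200
print(cd_bps / mp3_bps)    # compression ratio
print(nyquist_hz(44_100))  # must exceed the ~20 kHz upper limit of hearing
```

A 128 kbit/s stream is therefore roughly an eleventh of the uncompressed data. The sample rate must be at least twice the highest frequency to be reproduced, so 44.1 kHz leaves some headroom above 20 kHz.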
Waveform and perceptual encoders
There are two types of audio encoders. Waveform encoders try to reconstruct the signal as exactly as possible after encoding and decoding.
Perceptual encoders, by contrast, do not attempt to keep the signal exactly as it was before encoding and decoding. Instead, they aim for an output that the human ear perceives as identical to the original. Exploiting knowledge of the properties and limitations of human hearing, a perceptual encoder removes the parts of the signal that we cannot perceive.
Almost all perceptual encoders transform the sound from the time domain to the frequency domain and split the spectrum into subbands. The encoder then uses its model of how the ear works to remove unnecessary information. Masking is the most commonly exploited hearing phenomenon.
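The time-to-frequency, subband and discard steps can be illustrated with a toy Python codec. It is a minimal sketch under loud assumptions: a naive DFT stands in for MP3's actual filterbank (a polyphase filterbank followed by an MDCT), the subbands are equal-width, and a fixed rule that drops any subband more than 40 dB below the loudest one stands in for a real psychoacoustic model. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT: time domain -> frequency domain."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part because the input signal is real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def toy_perceptual_codec(frame, n_subbands=8, threshold_db=-40.0):
    """Hypothetical toy codec: DFT -> equal-width subbands -> drop quiet bands."""
    n = len(frame)
    spectrum = dft(frame)
    half = n // 2 + 1  # non-negative frequency bins of a real signal
    bands = [[] for _ in range(n_subbands)]
    for k in range(half):
        bands[k * n_subbands // half].append(k)
    energies = [sum(abs(spectrum[k]) ** 2 for k in band) for band in bands]
    peak = max(energies)
    keep_ratio = 10 ** (threshold_db / 10)  # -40 dB -> 1e-4 in energy
    for band, energy in zip(bands, energies):
        if energy < peak * keep_ratio:  # crude stand-in for a masking decision
            for k in band:
                spectrum[k] = 0
                spectrum[(n - k) % n] = 0  # mirror bin keeps the output real
    return idft(spectrum)


# A loud tone (bin 3) plus a tone 60 dB quieter (bin 20) in a 64-sample frame.
n = 64
loud = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
quiet = [1e-3 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
frame = [a + b for a, b in zip(loud, quiet)]
out = toy_perceptual_codec(frame)
print(max(abs(o - a) for o, a in zip(out, loud)))   # tiny: the loud tone survives
print(max(abs(o - f) for o, f in zip(out, frame)))  # about 1e-3: not waveform-exact
```

The output keeps the loud tone essentially unchanged but no longer matches the input sample by sample, because the quiet component was discarded; that is the trade a perceptual encoder makes and a waveform encoder does not.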



