audio masking perception github Archives

How to make canned sound more tasty? – Talking about audio compression and encoding Part 2

SMR (signal-to-mask ratio): the distance, within a critical band, from the level of the masking signal to the masking threshold that it generates.

SNR (signal-to-noise ratio): the signal-to-noise ratio after the signal is quantized to m bits, equal to the ratio of the variance of the signal before quantization to the variance of the quantization noise.

MNR (mask-to-noise ratio): a measure of how much distortion the human ear can perceive after processing. Its value equals the difference between SNR and SMR (MNR = SNR - SMR, in dB); when it is positive, the quantization noise lies below the masking threshold.
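As a rough illustration of how the three quantities relate, the sketch below measures the SNR of a uniformly quantized full-scale sine wave and subtracts an SMR value to get the MNR. The function names, the test tone, and the 20 dB SMR figure are assumptions made for this example, not values taken from a real encoder.

```python
import math

def quantization_snr_db(bits, n_samples=20000):
    """Measure the SNR (dB) of a full-scale sine quantized to `bits` bits."""
    step = 2.0 / (2 ** bits)  # uniform step over the range [-1, 1]
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = math.sin(2 * math.pi * 0.01234 * n)   # hypothetical test tone
        q = (math.floor(x / step) + 0.5) * step   # mid-rise uniform quantizer
        signal_power += x * x                      # sine is ~zero-mean, so power ~ variance
        noise_power += (x - q) ** 2                # quantization noise power
    return 10 * math.log10(signal_power / noise_power)

def mnr_db(snr_db, smr_db):
    """MNR is the difference between SNR and SMR, all in dB."""
    return snr_db - smr_db

snr = quantization_snr_db(8)
print(round(snr, 1), round(mnr_db(snr, 20.0), 1))
```

For 8 bits the measured SNR should land near the textbook estimate of 6.02 × 8 + 1.76 ≈ 49.9 dB, so an assumed SMR of 20 dB leaves an MNR of roughly 30 dB.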

In the figure, the gray area marks the critical band, "Masking Threshold" is the masking threshold curve, and "SMR" is the maximum value of the SMR within the critical band.

It is worth noting that SMR, SNR, and MNR as discussed above are all defined per critical band. However, a masker affects not only its own critical band but also neighboring ones, an effect known as the spread of masking. The figure shows the masking curve for a single critical band; in real signals there are usually several maskers at once. The overall masking curve can then be regarded as the superposition of the masking curves of the individual tones. For more detail on this topic, see books on psychoacoustics.
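One simple way to picture this superposition is to add the individual thresholds as powers rather than as decibel values. The sketch below assumes plain intensity addition, which is only one of several combination rules used in practice; the function name and the 40 dB inputs are made up for the example.

```python
import math

def combine_thresholds_db(thresholds_db):
    """Power-sum several masking thresholds (in dB) at a single frequency."""
    # Assumption: thresholds add as intensities, not as dB values.
    total_power = sum(10 ** (t / 10) for t in thresholds_db)
    return 10 * math.log10(total_power)

# Two maskers that each raise the threshold to 40 dB at the same frequency.
print(round(combine_thresholds_db([40.0, 40.0]), 2))  # 43.01
```

Under this assumption two equal thresholds raise the combined threshold by about 3 dB, not by a factor of two in decibels.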

“Temporal masking” means that when a loud sound occurs shortly before or after a weak sound, the weak sound is masked. The main reason for temporal masking is that the human brain needs some time to process the information. The effect is illustrated in the following figure.

Figure 3: Schematic diagram of temporal masking, showing the different masking effects that arise depending on when the sounds occur

Another important concept for compression algorithms is the “critical bandwidth” of masking. Harvey Fletcher conducted an experiment [3] to find out how bands of noise affect the masking of a tone. In the experiment, a tone of fixed frequency was surrounded by noise bands of various widths centered on it. His research shows that masking grows as the noise band widens up to the critical bandwidth, and that noise energy outside this band does not add to the masking. This can be explained by an auditory system that has an auditory filter centered on the frequency of the tone: the part of the masker that falls inside this filter effectively masks the tone, while the part outside the filter has no effect.
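To get a feel for how wide a critical band is, the sketch below uses the analytic approximation published by Zwicker and Terhardt. This formula is not part of the article and is brought in here only as an assumption for illustration; the function name is also made up.

```python
def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth (Hz) around a center frequency (Hz).

    Assumption: Zwicker and Terhardt's analytic fit,
    CB = 25 + 75 * (1 + 1.4 * f^2)^0.69 with f in kHz.
    """
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

for f in (100, 1000, 5000):
    print(f, round(critical_bandwidth_hz(f)))
```

With this fit the band is roughly 100 Hz wide at low frequencies and about 160 Hz wide at 1 kHz, and it keeps widening as the center frequency rises.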

Some compression algorithms use this effect to reduce the total amount of data by lowering the precision with which they represent the part of the signal outside the critical bandwidth. Note that common algorithms such as MP3 do not discard masked sound entirely; instead, they reduce the precision of the masked sound so that the result does not sound unnatural.

Figure 4: Schematic diagram of the masking effect, showing the relationship between critical bandwidth (Critical Bandwidth) and frequency (Signal Tone)

hearing threshold

Under specific conditions, the minimum sound intensity at which the subject can perceive more than half of the stimulus signals presented in the test is called the threshold of hearing [4]. The “hearing threshold” shown in the figure is the minimum sound intensity that humans can perceive at the corresponding frequency. Some compression encoding formats remove frequency components below the threshold of hearing in order to compress the audio data.
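The idea of discarding components below the hearing threshold can be sketched as follows. The threshold curve here is Terhardt's widely quoted approximation of the threshold in quiet, which is an assumption added for illustration and not something the article specifies; the function names and the sample components are hypothetical.

```python
import math

def hearing_threshold_db_spl(freq_hz):
    """Approximate threshold in quiet (dB SPL) at a given frequency (Hz).

    Assumption: Terhardt's analytic approximation, with f in kHz.
    """
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def drop_inaudible(components):
    """Keep only (frequency_hz, level_db_spl) pairs at or above the threshold."""
    return [(f, spl) for f, spl in components
            if spl >= hearing_threshold_db_spl(f)]

components = [(100, 10.0), (1000, 10.0), (3300, -10.0)]
print(drop_inaudible(components))  # [(1000, 10.0)]
```

A real encoder would combine this threshold with the masking thresholds before deciding what to keep; the sketch only covers the threshold in quiet.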

How to make canned sound more tasty? – Talking about compression and audio encoding

Last time, in [How to make canned goods sound more delicious? A brief talk about audio compression and encoding (Part 1)], I explained that the main job of an audio compression algorithm is to compress the frequency content beyond the range of the human ear and the content that is “inefficient” for human perception. To deal with that content, we need to understand auditory masking effects.

auditory masking effect

In 1894, the American physicist Alfred M. Mayer reported that a tone could be masked by another tone of lower frequency. In 1959, Richard Ehmer described a complete set of auditory curves for this phenomenon. Between 1967 and 1974, Eberhard Zwicker worked on the tuning and masking of critical frequency bands. Auditory masking effects can be divided into “simultaneous masking” and “temporal masking”.

“Simultaneous masking” occurs when two sounds of different frequencies are played at the same time: because of the way humans perceive sound, the weaker signal is masked by the other. In summary, pure-tone masking has the following characteristics: low-frequency tones easily mask high-frequency tones, whereas high-frequency tones mask low-frequency tones less readily; pure tones of similar frequency easily mask each other; and when the sound pressure level of the masking sound increases, the masking threshold rises and the range of masked frequencies widens. The following schematic diagram illustrates this phenomenon.

As shown in the figure, the masking effect masks the signals at approximately 0.7 kHz, 1.6 kHz, and 2.3 kHz. The SPL of the 0.7 kHz signal is below the threshold of hearing, so it could not be heard even without masking.

On this basis, three important concepts are introduced, SMR, SNR, and MNR, together with the relationship between them.