
How to make canned sound more tasty? – Talking about audio compression and encoding Part 2

SMR (signal-to-mask ratio): the distance, within a critical band, from the level of the masking signal to the masking threshold that it generates.

SNR (signal-to-noise ratio): the signal-to-noise ratio after the signal has been quantized to m bits, equal to the ratio of the variance of the signal before quantization to the variance of the quantization noise.
MNR (mask-to-noise ratio): a measure of how much of the distortion introduced by processing the human ear can perceive. Its value is SNR minus SMR (in dB); when MNR is positive, the quantization noise lies below the masking threshold and is inaudible.
In the figure, the gray area labeled Critical Band is the critical band, Masking Threshold is the masking threshold curve, and SMR is the maximum value of the SMR within the critical band.
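As a rough illustration of how the three quantities above fit together, the sketch below quantizes a test tone to m bits, measures SNR as the variance ratio described above, and subtracts an SMR to get MNR. The test tone, the quantizer, and the SMR value of 20 dB are assumptions made for this example only; a real encoder obtains the SMR of each band from its psychoacoustic model.

```python
import numpy as np

def quantize(x, m):
    """Uniform mid-rise quantizer with 2**m levels for samples in [-1, 1)."""
    step = 2.0 / (2 ** m)
    return (np.floor(x / step) + 0.5) * step

def snr_db(x, xq):
    """SNR in dB: variance of the signal over variance of the quantization noise."""
    return 10.0 * np.log10(np.var(x) / np.var(x - xq))

def mnr_db(snr, smr):
    """MNR in dB is SNR minus SMR; a positive value means the noise is masked."""
    return snr - smr

# Test signal: a 997 Hz sine, one second at 48 kHz (illustrative values).
t = np.arange(48000) / 48000.0
x = 0.9 * np.sin(2.0 * np.pi * 997.0 * t)

# Hypothetical SMR for this band, not taken from any real psychoacoustic model.
SMR_DB = 20.0

for m in (2, 4, 8):
    snr = snr_db(x, quantize(x, m))
    print(f"{m} bits: SNR = {snr:.1f} dB, MNR = {mnr_db(snr, SMR_DB):.1f} dB")
```

Each extra bit adds roughly 6 dB of SNR, so an encoder only needs to spend enough bits in a band to push its MNR above zero; bits beyond that point buy precision the ear cannot use.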
It is worth noting that SMR, SNR, and MNR as discussed above are all defined per critical band. However, masking affects not only the masker's own critical band but also neighboring critical bands, which is called the spread of masking. The figure shows the masking curve produced by a single masker in one critical band. Real signals usually contain several maskers at once, so there are multiple masking curves; the overall masking curve can be regarded as the superposition of the masking curves of the individual tones. For more detail on this part, see books on psychoacoustics.
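The superposition of masking curves can be sketched as follows. This is a simplified model, not the method of any particular codec: it assumes the Schroeder spreading function on the Bark scale, the Zwicker and Terhardt Hz-to-Bark approximation, a fixed masker-to-threshold offset (`MASK_OFFSET_DB`, a hypothetical value), and that the individual curves add as powers.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark (critical band) scale."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function in dB; dz = bark(maskee) - bark(masker)."""
    dz = np.asarray(dz, dtype=float)
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)

# Hypothetical gap between masker level and the peak of its masking curve.
MASK_OFFSET_DB = 15.0

def combined_threshold_db(freqs_hz, maskers):
    """Power-sum the masking curves of all (frequency_hz, level_db) maskers."""
    z = hz_to_bark(np.atleast_1d(freqs_hz))
    total_power = np.zeros_like(z)
    for f_m, level_db in maskers:
        curve_db = level_db - MASK_OFFSET_DB + spreading_db(z - hz_to_bark(f_m))
        total_power += 10.0 ** (curve_db / 10.0)  # assumption: intensities add
    return 10.0 * np.log10(total_power)

freqs = np.array([500.0, 1000.0, 2000.0])
single = combined_threshold_db(freqs, [(1000.0, 70.0)])
double = combined_threshold_db(freqs, [(1000.0, 70.0), (1500.0, 65.0)])
print(single)
print(double)
```

With this spreading function the curve falls off more slowly above the masker than below it, so a 1 kHz masker raises the threshold at 2 kHz more than at 500 Hz, and adding a second masker raises the combined threshold everywhere.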
“Temporal masking” means that when a loud sound occurs shortly before or after a weak sound, the weak sound is masked. The main reason for temporal masking is that the human brain needs some time to process the information. Figure 3 illustrates the effect.
Figure 3: Schematic diagram of temporal masking, showing the different masking effects that result from different sound onset times
Another important concept for compression algorithms is the “critical bandwidth” of masking. Harvey Fletcher conducted an experiment [3] to determine how bands of noise mask a tone. In the experiment, a tone of fixed frequency was surrounded by noise bands of various widths centered on it. His results show that masking increases as the noise band widens only until the band reaches a critical bandwidth; noise energy outside this band does not contribute to the masking. This can be explained by modeling the auditory system as having an auditory filter centered on the tone frequency: the part of the masker that falls inside this filter effectively masks the tone, while the part outside the filter has no effect.
This effect can be used in some compression algorithms to reduce the total amount of data by representing the masked parts of the signal with less precision (note that common algorithms such as MP3 do not discard masked sound completely; instead, they reduce the precision with which it is represented, to avoid unnatural sound).
Figure 4: Schematic diagram of the masking effect, showing the relationship between the critical bandwidth and the frequency of the signal tone
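Fletcher's finding can be sketched numerically. The example below assumes an idealized rectangular auditory filter and uses the Zwicker and Terhardt analytic approximation for the critical bandwidth; the helper `effective_masker_bandwidth_hz` is a hypothetical name introduced for this illustration.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker and Terhardt approximation of the critical bandwidth at f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def effective_masker_bandwidth_hz(tone_hz, noise_bandwidth_hz):
    """Width of a tone-centered noise band that falls inside the auditory filter.

    Assumes an ideal rectangular filter one critical band wide, so noise
    outside the critical band contributes nothing to the masking.
    """
    return min(noise_bandwidth_hz, critical_bandwidth_hz(tone_hz))

tone = 1000.0
for bw in (50.0, 100.0, 200.0, 400.0, 800.0):
    used = effective_masker_bandwidth_hz(tone, bw)
    print(f"noise band {bw:5.0f} Hz -> {used:6.1f} Hz contributes to masking")
```

For a flat noise band the masking power is proportional to the contributing width, so in this model widening the noise beyond the critical band (about 160 Hz around 1 kHz) no longer raises the masked threshold.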
Hearing threshold
Under specific conditions, the minimum sound intensity at which the subject can perceive more than half of the stimulus signals presented in a test is called the threshold of hearing [4]. The “hearing threshold” shown in the figure is the minimum sound intensity that humans can perceive at the corresponding frequency. Some compression encoding formats discard frequency components that fall below the threshold of hearing in order to compress the audio data.
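A minimal sketch of this idea, assuming Terhardt's widely used analytic approximation of the threshold in quiet and assuming the spectrum levels are already calibrated in dB SPL (real encoders must assume a playback level to make that mapping). The function names and the sample levels are illustrative.

```python
import numpy as np

def threshold_in_quiet_db_spl(f_hz):
    """Terhardt's approximation of the hearing threshold in dB SPL (f_hz > 0)."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

def audible_mask(freqs_hz, levels_db_spl):
    """True where a component is above the threshold in quiet and must be kept."""
    levels = np.asarray(levels_db_spl, dtype=float)
    return levels > threshold_in_quiet_db_spl(freqs_hz)

# Three components at the same level of 10 dB SPL (illustrative values).
freqs = np.array([100.0, 1000.0, 16000.0])
levels = np.array([10.0, 10.0, 10.0])
keep = audible_mask(freqs, levels)
print(keep)
```

The ear is most sensitive around 3 to 4 kHz and much less sensitive at very low and very high frequencies, so at the same level only the 1 kHz component lies above the threshold here; the other two could be dropped without an audible change.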



