
Audio Coding

Sampling rate and sample size
Sound is a wave that carries energy, so like any wave it is characterized by frequency and amplitude: frequency relates to the time axis, and amplitude to the level axis.

A sound wave is continuous and perfectly smooth, so the waveform can be thought of as made up of countless points. Because storage space is limited, digital encoding has to sample the waveform, that is, record its value at discrete, evenly spaced moments in time. The more samples taken per second, the higher the frequencies that can be captured. To reconstruct a wave, each cycle (vibration) must be sampled at least twice. The highest frequency the human ear can perceive is about 20 kHz, so meeting the ear's requirements takes at least 40,000 samples per second, written as 40 kHz; this figure is the sampling frequency, or sample rate. A common audio CD uses a sample rate of 44.1 kHz.

Timing alone is not enough: at each sample point we must also measure the amplitude (the energy value) and quantize it, so that it represents the strength of the signal. The number of quantization levels is an integer power of 2; the sample size of a common CD is 16 bits, which gives 2^16 = 65,536 levels. Sample size is harder to grasp than sample rate because it is more abstract. For example, suppose a wave is sampled 8 times and the amplitudes at the sample points, A1-A8, are all different. With a sample size of 2 bits there are only 4 levels, so the 8 values must be rounded onto 4 levels and some of the differences between them are lost. With a 3-bit sample size there are 8 levels, enough in principle to give each of the 8 values its own level. The higher the sample rate and the sample size, the closer the recorded waveform is to the original signal.
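Both ideas above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: amplitudes are normalized to [-1.0, 1.0], quantization simply rounds to the nearest of evenly spaced levels, and the eight values standing in for A1-A8 are hypothetical. The helper names `min_sample_rate` and `quantize` are illustrative, not taken from any library.

```python
def min_sample_rate(max_freq_hz: float) -> float:
    """At least two samples per cycle of the highest frequency to be kept."""
    return 2 * max_freq_hz


def quantize(x: float, bits: int) -> int:
    """Round an amplitude in [-1.0, 1.0] to the nearest of 2**bits levels (0 .. 2**bits - 1)."""
    levels = 2 ** bits
    # Map [-1, 1] onto [0, levels - 1], then round to the nearest integer level.
    code = round((x + 1.0) / 2.0 * (levels - 1))
    return max(0, min(levels - 1, code))  # clamp out-of-range input


# Hypothetical A1..A8: eight different amplitudes spread evenly over [-1, 1].
A = [2 * k / 7 - 1 for k in range(8)]

if __name__ == "__main__":
    print(min_sample_rate(20_000))  # 40000
    for bits in (2, 3):
        codes = [quantize(a, bits) for a in A]
        print(f"{bits} bits: {2 ** bits} levels, codes {codes}, {len(set(codes))} distinct")
```

With these values the 2-bit run maps the eight amplitudes onto only four codes (0, 0, 1, 1, 2, 2, 3, 3), while the 3-bit run gives each one its own code (0 through 7).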
Lossy and lossless
Because of the limits set by sample rate and sample size, digital audio encoding can at best approach the natural signal ever more closely; at least with current technology, that is all it can do. Measured against the natural signal, every digital audio encoding scheme is lossy, because none can restore the signal completely. In computer applications, PCM encoding achieves the highest fidelity and is widely used for archiving source material and for listening to music; it is the encoding used on CDs and DVDs and in common WAV files. PCM has therefore come to be called lossless by convention, because it represents the best fidelity available in digital audio. That does not mean PCM guarantees absolute fidelity to the signal; it can only approach the original as closely as possible. MP3 is usually classed as lossy audio encoding relative to PCM. The point of stressing that "lossy" and "lossless" are relative terms is that true losslessness is hard to achieve, much like writing pi in digits: however high the precision, the result only approaches pi and never equals it.
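The "infinitely close but never equal" point can be made concrete with a small experiment. The sketch below assumes a uniform quantizer over [-1, 1] and uses a hypothetical sine test tone; `max_quantization_error` is an illustrative name. It shows that the worst-case error shrinks as bit depth grows, roughly halving with each extra bit, yet stays above zero for a signal that does not fall exactly on the quantization levels.

```python
import math


def max_quantization_error(bits: int, samples: int = 10_000) -> float:
    """Largest |original - reconstructed| over a sine wave quantized to `bits` bits."""
    step = 2.0 / (2 ** bits - 1)  # spacing between adjacent levels over [-1, 1]
    worst = 0.0
    for n in range(samples):
        # Hypothetical test tone: 1.234 cycles spread over the sample window.
        x = math.sin(2 * math.pi * 1.234 * n / samples)
        level = round((x + 1.0) / step)  # nearest quantization level
        y = level * step - 1.0           # value a decoder would reconstruct
        worst = max(worst, abs(x - y))
    return worst


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(bits, "bits: worst error", max_quantization_error(bits))
```

The error is bounded by half a level spacing, 1 / (2**bits - 1), so each added bit buys more precision but never exact equality.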
Reasons to use audio compression technology
The bit rate of a PCM audio stream is easy to calculate: sample rate × sample size × number of channels. A WAV file encoded as two-channel PCM with a 44.1 kHz sample rate and a 16-bit sample size has a bit rate of 44.1 k × 16 × 2 = 1411.2 kbps. When we talk about a 128 kbps MP3, the comparable figure for WAV is this 1411.2 kbps. This value is also called the data bandwidth, the same concept as bandwidth in ADSL. Dividing the bit rate by 8 gives the data rate of this WAV in bytes: 176.4 KB/s. In other words, storing one second of two-channel PCM audio at 44.1 kHz and 16 bits takes 176.4 KB, and one minute takes 176.4 KB × 60 = 10,584 KB, about 10.6 MB, which is unacceptable. For most users, and especially for those who like to listen to music on a computer, there are only two ways to reduce disk usage: downsample or compress.
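The arithmetic above can be checked with a short Python sketch. It assumes "K" means 1,000 (so 1 kB = 1,000 bytes and 1 MB = 1,000,000 bytes), with MiB (1,048,576 bytes) printed alongside for comparison, and it takes "128 kbps" to mean 128,000 bits per second. The function name `pcm_bit_rate` is illustrative.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM: sample rate x sample size x channels."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    bps = pcm_bit_rate(44_100, 16, 2)
    bytes_per_second = bps // 8          # 8 bits per byte
    bytes_per_minute = bytes_per_second * 60
    print(bps / 1000, "kbps")                             # 1411.2 kbps
    print(bytes_per_second / 1000, "kB/s")                # 176.4 kB/s
    print(bytes_per_minute / 1_000_000, "MB per minute")  # 10.584 MB per minute
    print(bytes_per_minute / 1024 ** 2, "MiB per minute")
    print(round(bps / 128_000, 1), "times a 128 kbps MP3")  # 11.0 times a 128 kbps MP3
```

The last line shows why compression matters: at these settings, uncompressed PCM needs about eleven times the bit rate of a 128 kbps MP3.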



