Audio Coding Part 4



Audio Coding

Transmission encoding format

PCM encoding
PCM is short for Pulse Code Modulation. The general PCM workflow was described earlier in this series. We do not need to care which calculation method the final PCM encoding uses; we only need to know the advantages and disadvantages of a PCM-encoded audio stream. Its biggest advantage is good sound quality, and its biggest disadvantage is its large size. Our common audio CD uses PCM encoding, and one disc holds only about 74 minutes of music.
WAV format
This is an old audio file format developed by Microsoft. WAV complies with the RIFF (Resource Interchange File Format) specification. Every WAV file has a header that contains the encoding parameters of its audio stream. WAV places no strict rules on how the audio stream is encoded: besides PCM, almost any codec that supports the ACM (Audio Compression Manager) specification can encode a WAV audio stream. Many readers are not familiar with this idea, so let us take AVI as an example, because AVI and WAV are very similar in file structure, except that AVI also carries a video stream. We run into many kinds of AVI, which is why we often need to install a decoder before we can watch some of them. DivX, which we see a lot, is one type of video encoding. An AVI can compress its video stream with DivX, and of course it can use other codecs too. Similarly, a WAV can compress its audio stream with a variety of audio codecs. The WAV files we usually meet carry PCM audio, but that does not mean WAV can only use PCM; it can also use the MP3 codec, for example. Just as with AVI, as long as the corresponding decoder is installed, you can play these WAVs.
On the Windows platform, PCM-encoded WAV is the best-supported audio format, and virtually all audio programs handle it. Because it can meet high sound quality requirements, WAV is also the preferred format for music creation and editing, and it is well suited to storing musical material. For the same reason, PCM-encoded WAV serves as an intermediate format when converting between other encodings, such as MP3 to WMA.
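As a small illustration of the WAV header mentioned above, the following sketch uses Python's standard `wave` module to write one second of silent PCM audio into an in-memory WAV file and then read the encoding parameters back from its header. The helper names `make_pcm_wav` and `read_wav_header` and the parameter values (44.1 kHz, 16 bits, two channels) are assumptions chosen for the example, and the sketch only covers PCM-encoded WAV.

```python
import io
import wave

def make_pcm_wav(sample_rate=44100, sample_width=2, channels=2, seconds=1):
    """Return the bytes of a silent PCM WAV file (illustrative values)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample: 2 bytes = 16 bits
        wav.setframerate(sample_rate)
        # All-zero bytes are silence for 16-bit PCM.
        wav.writeframes(bytes(sample_rate * sample_width * channels * seconds))
    return buffer.getvalue()

def read_wav_header(data):
    """Read the encoding parameters stored in the WAV header."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }

wav_bytes = make_pcm_wav()
header = read_wav_header(wav_bytes)
print(wav_bytes[:4], wav_bytes[8:12])  # RIFF container id and WAVE form type
print(header)
```

The first twelve bytes show the RIFF container at work: the file starts with the `RIFF` identifier, and the form type `WAVE` follows the size field.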
MP3 encoding
As the most popular audio compression format, MP3 is widely accepted. MP3-related software keeps appearing, and more and more hardware supports MP3 as well: many VCD/DVD players on the market can play MP3, and there are plenty of portable MP3 players. Although several of the major music companies are extremely displeased with this open format, they cannot prevent this compressed audio format from surviving and spreading. MP3 has been in development for about 10 years. The name is short for MPEG Audio Layer-3 (MPEG: Moving Picture Experts Group), an encoding scheme that originated in MPEG-1. MP3 can reach a compression ratio of about 12:1 while keeping sound quality that is basically acceptable to listen to. In the days when hard drives were expensive, users quickly accepted MP3, and with the spread of the Internet it gained hundreds of millions of users. When MP3 encoding technology was first released, it was actually very imperfect. Because research on sound and human hearing was lacking, almost all early MP3 encoders encoded crudely, and sound quality was severely damaged. As new techniques were introduced, MP3 encoding was improved again and again, including two major technical improvements.
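To put the 12:1 figure in perspective, the sketch below compares it with CD-style PCM (44.1 kHz, 16 bits, two channels). The function name `compression_ratio` is hypothetical, and 128 kbps is used only because it is a commonly quoted MP3 bit rate; neither comes from an MP3 specification.

```python
def compression_ratio(source_kbps, compressed_kbps):
    """How many times smaller the compressed stream is than the source."""
    return source_kbps / compressed_kbps

CD_PCM_KBPS = 44.1 * 16 * 2          # 1411.2 kbps for CD-style PCM

# Bit rate implied by a 12:1 ratio on CD-style PCM.
implied_kbps = CD_PCM_KBPS / 12
print(round(implied_kbps, 1))        # → 117.6

# Ratio obtained at an assumed 128 kbps MP3.
ratio_128 = compression_ratio(CD_PCM_KBPS, 128)
print(ratio_128)                     # → 11.025
```

A ratio of about 12:1 therefore corresponds to a bit rate a little below 128 kbps.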


Audio Coding Part 3

Audio Coding

Streaming characteristics

With the development of the Internet, people want to listen to music online. This requires that an audio file can be played while it is being read, without reading the entire file first, so that listening does not require a complete download. Encoding and playback can also happen at the same time. It is this feature that makes live online broadcasting and running your own digital radio station a reality.
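The idea of playing while reading can be sketched as a loop that handles each chunk as soon as it arrives. In this minimal example an in-memory byte stream stands in for a network connection, and `play` is a hypothetical placeholder for a real decoder and player; the chunk size of 4096 bytes is also just an assumption.

```python
import io

def stream_chunks(source, chunk_size=4096):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk

def play(chunk):
    """Hypothetical stand-in for a decoder/player; it only reports the size."""
    return len(chunk)

# An in-memory stream stands in for a network connection.
fake_stream = io.BytesIO(bytes(10000))
played = [play(chunk) for chunk in stream_chunks(fake_stream)]
print(played)  # → [4096, 4096, 1808]
```

Because each chunk is handled before the next one is read, playback can begin long before the whole file is available.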
Classification of coding methods
According to the coding method, audio coding techniques fall into three types: waveform coding, parametric coding, and hybrid coding. Generally speaking, waveform coding gives high voice quality, but its coding rate is also high. Parametric coding has a very low coding rate, but the quality of the synthesized speech is not high. Hybrid coding combines parametric and waveform coding techniques, and its coding rate and sound quality lie between the two.
1. Waveform coding
Waveform coding transforms the time-domain signal directly into a digital code, without using any parameters of the model that generated the audio signal, so that the reconstructed speech waveform matches the waveform of the original speech signal as closely as possible. The basic principle is to sample the analog speech signal at a certain rate on the time axis, then quantize the amplitude of each sample into discrete levels and represent it with a code.
Waveform coding is simple, easy to implement, highly adaptable, and gives good voice quality. However, because the compression is simple, the compression ratio is relatively low, which leads to a higher coding rate. In general, audio quality is quite high when the coding rate is above 16 kbit/s, and it drops sharply when the coding rate falls below 16 kbit/s.
The simplest waveform coding method is PCM (Pulse Code Modulation), which only samples and quantizes the speech signal. Its advantages are that the coding method is simple, the delay is short, the sound quality is high, and the reconstructed speech signal is almost indistinguishable from the original. Its disadvantages are that the coding rate is relatively high (64 kbit/s) and that it is more sensitive to errors in the transmission channel.
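The sample-then-quantize principle can be sketched in a few lines. The text does not say where the 64 kbit/s figure comes from; the example assumes telephone-style parameters of 8000 samples per second at 8 bits per sample, which multiply out to that rate. The function `pcm_encode`, the 1 kHz test tone, and the uniform quantizer are illustrative choices (telephone PCM normally uses companding, which is not shown here).

```python
import math

def pcm_encode(signal, sample_rate, bits, num_samples):
    """Sample signal(t), with values in [-1, 1], and quantize uniformly."""
    levels = 2 ** bits
    codes = []
    for n in range(num_samples):
        value = signal(n / sample_rate)          # sampling on the time axis
        # Quantization: map [-1, 1] onto the integer codes 0 .. levels-1.
        codes.append(round((value + 1) / 2 * (levels - 1)))
    return codes

def tone(t):
    """1 kHz test tone (assumed input signal)."""
    return math.sin(2 * math.pi * 1000 * t)

SAMPLE_RATE = 8000   # assumed, samples per second
BITS = 8             # assumed, bits per sample

codes = pcm_encode(tone, SAMPLE_RATE, BITS, num_samples=80)
print(codes[:8])
print(SAMPLE_RATE * BITS)   # → 64000 (bit/s)
```

Every sample becomes one 8-bit code, so the coding rate depends only on the sample rate and the sample size, not on the content of the signal.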
2. Parametric coding
Parametric coding extracts the parameters of speech production from the speech waveform and uses these parameters to reconstruct the speech through a speech production model, so that the reconstructed signal preserves the meaning of the original speech as far as possible. In other words, parametric coding starts from a digital model of how the voice signal is generated, derives the model parameters from the signal, and then uses those parameters to drive the model and synthesize the speech.
The coding rate of parametric coding is low and can reach 2.4 kbit/s. Because the speech signal is regenerated from the digital model, the waveform of the reconstructed signal may be quite different from the waveform of the original, and the distortion is larger. In addition, because of the limitations of the speech production model, increasing the data rate does not improve the quality of the synthesized speech. However, although the sound quality of parametric coding is relatively low, its confidentiality is very good, and it has been used in the military. A typical parametric coding method is LPC (Linear Predictive Coding).
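The idea of sending model parameters instead of the waveform can be illustrated with a toy order-2 linear predictor. This is only a sketch of the principle behind linear prediction, not an LPC speech codec: the input is an assumed damped cosine rather than speech, the fit is a plain least-squares solve, and the names `fit_predictor` and `synthesize` are hypothetical.

```python
import math

def fit_predictor(samples):
    """Least-squares fit of x[n] = a1*x[n-1] + a2*x[n-2]."""
    r11 = r12 = r22 = p1 = p2 = 0.0
    for n in range(2, len(samples)):
        x0, x1, x2 = samples[n], samples[n - 1], samples[n - 2]
        r11 += x1 * x1
        r12 += x1 * x2
        r22 += x2 * x2
        p1 += x0 * x1
        p2 += x0 * x2
    # Solve the 2x2 normal equations with Cramer's rule.
    det = r11 * r22 - r12 * r12
    a1 = (p1 * r22 - p2 * r12) / det
    a2 = (p2 * r11 - p1 * r12) / det
    return a1, a2

def synthesize(a1, a2, first, second, count):
    """Regenerate the signal from the two coefficients and two start values."""
    out = [first, second]
    while len(out) < count:
        out.append(a1 * out[-1] + a2 * out[-2])
    return out

# Assumed test signal: a damped cosine, which an order-2 model fits exactly.
original = [0.95 ** n * math.cos(0.3 * n) for n in range(50)]
a1, a2 = fit_predictor(original)
rebuilt = synthesize(a1, a2, original[0], original[1], len(original))
max_error = max(abs(x - y) for x, y in zip(original, rebuilt))
print(a1, a2, max_error)
```

Here 50 samples are described by two coefficients and two starting values. Real speech does not fit such a model exactly, which is why the reconstructed waveform of a parametric coder can differ noticeably from the original.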
3. Hybrid coding
Hybrid coding refers to the simultaneous use of two or more coding methods. It overcomes the weaknesses of waveform coding and parametric coding by combining the high quality of waveform coding with the low coding rate of parametric coding, and it can achieve better results.

Audio Coding Part 2

Audio Coding

Reasons to use audio compression technology

It is very easy to calculate the bit rate of a PCM audio stream: sample rate × sample size × number of channels, in bits per second (bps). A WAV file with a sample rate of 44.1 kHz, a sample size of 16 bits, and two-channel PCM encoding has a bit rate of 44.1K × 16 × 2 = 1411.2 kbps. When we speak of a "128K" MP3, the 128 kbps is the same parameter that is 1411.2 kbps for this WAV. This parameter is also called data bandwidth, and it is the same concept as bandwidth in ADSL. Divide the bit rate by 8 to get the data rate of this WAV in bytes, which is 176.4 KB/s. In other words, storing one second of two-channel PCM audio at 44.1 kHz and 16 bits requires 176.4 KB of space, and one minute requires about 10.58 MB, which is unacceptable for most users, especially those who like to listen to music on the computer. To reduce disk usage there are only two options: lower the sampling parameters or compress. Lowering the sampling parameters is not advisable, so experts have developed various compression schemes. Because their uses and target markets differ, the various audio compression encodings achieve different sound quality and compression ratios, and we will cover them one by one in the following articles. One thing is certain: they all compress.
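The arithmetic above can be checked with a short script. The function name `pcm_bit_rate` is hypothetical. Note that the size per minute depends on the unit convention: with 1 MB = 1,000,000 bytes it is 10.584 MB, and with 1 MiB = 1,048,576 bytes it is about 10.09 MiB.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of an uncompressed PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

bit_rate = pcm_bit_rate(44_100, 16, 2)
bytes_per_second = bit_rate // 8
bytes_per_minute = bytes_per_second * 60

print(bit_rate / 1000)                      # → 1411.2 (kbps)
print(bytes_per_second / 1000)              # → 176.4 (KB/s, 1 KB = 1000 bytes)
print(bytes_per_minute / 1e6)               # → 10.584 (MB per minute)
print(round(bytes_per_minute / 2**20, 2))   # → 10.09 (MiB per minute)
```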
Frequency vs. Sampling Rate
The sample rate is the number of times the original signal is sampled per second. Most of the audio files we see have a sample rate of 44.1 kHz. What does this mean? Suppose we have two sine wave signals, 20 Hz and 20 kHz, each lasting one second, corresponding to the lowest and highest frequencies we can hear, and we sample both at 40 kHz. What do we get? The 20 Hz signal is sampled 40K/20 = 2000 times per cycle, while the 20 kHz signal is sampled only 2 times per cycle. Obviously, at the same sample rate, low-frequency information is recorded in much more detail than high-frequency information. This is why some audiophiles complain that the digital sound of CDs is not real enough: in their view, 44.1 kHz sampling cannot guarantee that high-frequency signals are recorded well. Recording high-frequency signals better would seem to call for a higher sample rate, so some people use a 48 kHz sample rate when ripping audio tracks from CDs. This is not advisable, and it does nothing good for sound quality. For ripping software, keeping the 44.1 kHz sample rate provided by the CD is one of the guarantees of the best sound quality; raising it does not improve anything. A higher sample rate is only useful when sampling an analog signal. If the signal being sampled is already digital, do not try to increase the sample rate.
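The samples-per-cycle comparison is a single division, sketched below with a hypothetical helper `samples_per_cycle`. The last line applies the same calculation to the 44.1 kHz CD rate.

```python
def samples_per_cycle(sample_rate_hz, signal_hz):
    """How many samples fall within one cycle of a sine wave."""
    return sample_rate_hz / signal_hz

low = samples_per_cycle(40_000, 20)
high = samples_per_cycle(40_000, 20_000)
cd_high = samples_per_cycle(44_100, 20_000)

print(low)      # → 2000.0 samples per cycle at 20 Hz
print(high)     # → 2.0 samples per cycle at 20 kHz
print(cd_high)  # → 2.205 samples per cycle at 20 kHz with CD sampling
```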

Audio Coding

Sampling rate and sample size
Sound is actually a kind of energy wave, so it has the characteristics of frequency and amplitude: frequency corresponds to the time axis and amplitude to the level axis.


A sound wave is infinitely smooth, and its curve can be regarded as made up of countless points. Because storage space is limited, the points on the curve must be sampled during digital encoding. Sampling means taking the value of the signal at particular instants. Obviously, the more points are taken in one second, the richer the frequency information that is captured. To restore a waveform, there must be at least two sampling points in each cycle. The highest frequency the human ear can perceive is 20 kHz, so to meet the needs of human hearing at least 40k samples per second are required, that is, a sampling frequency of 40 kHz. Our common CD has a sample rate of 44.1 kHz.
Frequency information alone is not enough. We must also obtain and quantize the energy value at each sampling point to represent the strength of the signal. The number of quantization levels is an integer power of 2, and the sample size of our common CD is 16 bits, that is, 2 to the power of 16 (65,536) levels. Sample size is harder to understand than sample rate because it seems more abstract. For example, suppose a wave is sampled 8 times and the energy values at the sample points are A1-A8, all different. With a sample size of only 2 bits there are just 4 levels, so we can keep only 4 distinct values out of A1-A8 and lose the other 4. With a sample size of 3 bits there are 8 levels, so the information of all 8 points is recorded. The higher the sample rate and sample size, the closer the recorded waveform is to the original signal.
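The effect of sample size can be reproduced with eight hypothetical energy values. The sketch below assumes the values are spread evenly over the range 0 to 1 and rounds each one to the nearest available level; the function name `quantize` and the chosen values are illustrative only.

```python
def quantize(value, bits):
    """Round a value in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1
    return round(value * steps) / steps

# Eight hypothetical energy values A1..A8, evenly spread over [0, 1].
samples = [i / 7 for i in range(8)]

distinct = {}
for bits in (2, 3):
    stored = [quantize(s, bits) for s in samples]
    distinct[bits] = len(set(stored))
    print(bits, distinct[bits])
# → 2 4
# → 3 8

print(2 ** 16)  # quantization levels at 16 bits → 65536
```

With 2 bits, pairs of neighbouring values collapse onto the same level, so only four distinct values survive. With 3 bits, each of the eight values keeps its own level.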
Lossy and lossless
From the discussion of sample rate and sample size it follows that, compared with the natural signal, audio encoding can at best come infinitely close; at least that is all current technology can do. Compared with the natural signal, every digital audio encoding scheme is lossy, because the signal cannot be fully restored. In computer applications, PCM encoding achieves the highest level of fidelity, and it is widely used for preserving material and for music appreciation: it is used on CDs, DVDs, and our common WAV files. PCM has therefore become lossless encoding by convention, because it represents the best level of fidelity in digital audio. This does not mean that PCM guarantees absolute fidelity of the signal; PCM can only come as close as possible. We usually put MP3 in the category of lossy audio encoding, and that is relative to PCM encoding. The point of stressing that lossy and lossless are relative is to make clear that true losslessness is hard to achieve. It is like writing pi with digits: however high the precision, the number only comes infinitely close to pi and never really equals it.
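The relative meaning of lossy and lossless can be shown on a handful of PCM samples. In this sketch `zlib` is only a stand-in for a lossless compressor (it is not an audio codec), and dropping the low 8 bits of each sample is a crude stand-in for a lossy one; both choices are assumptions made for illustration.

```python
import math
import struct
import zlib

# One hundred 16-bit PCM samples of a sine wave (illustrative data).
samples = [round(32767 * math.sin(2 * math.pi * n / 100)) for n in range(100)]
pcm = struct.pack("<100h", *samples)

# Lossless relative to PCM: the decompressed bytes equal the PCM bytes.
restored = zlib.decompress(zlib.compress(pcm))
print(restored == pcm)      # → True

# Lossy relative to PCM: keep only the top 8 bits of every sample.
coarse = [(s >> 8) << 8 for s in samples]
print(coarse == samples)    # → False
max_loss = max(abs(a - b) for a, b in zip(samples, coarse))
print(max_loss)
```

Even the first step, rounding the sine values to 16-bit integers, already discards information about the ideal curve, which is the sense in which PCM itself is lossy relative to the natural signal.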