Basics of digital sound theory Part 3


Sample Rate

Compression algorithms

Sample Rate

Let’s calculate how much disk space an average digitized piece of music in CD quality will occupy. To do this, we use the formula size = F ⋅ B ⋅ K ⋅ t, where F is the sampling frequency, B is the sample size in bytes, K is the number of channels, and t is the duration in seconds.

Assuming F = 44.1 kHz, B = 2 bytes, K = 2 channels, and t = 300 seconds, we find that the digitized song will occupy approximately 50 MB.

This means that only a dozen or so uncompressed songs can be burned to a CD. Since every second of digitized CD-quality sound takes up about 172 KB (176,400 bytes), such sound is very problematic to use in telephony, on the radio, or on the Internet. Even if you digitize the sound as a single channel with a sample rate of 11.025 kHz and a bit depth of 8 bits, each second will still occupy about 11 KB.
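The arithmetic above can be checked with a short sketch. The function name `pcm_size_bytes` is made up for this illustration; it simply multiplies the four quantities from the size formula.

```python
def pcm_size_bytes(sample_rate_hz, bytes_per_sample, channels, seconds):
    """Size of uncompressed audio: F * B * K * t, in bytes."""
    return sample_rate_hz * bytes_per_sample * channels * seconds

# A 5-minute stereo song in CD quality (44.1 kHz, 2 bytes per sample).
song = pcm_size_bytes(44_100, 2, 2, 300)
print(song, "bytes =", round(song / 2**20, 1), "MB")

# One second of CD-quality sound.
cd_second = pcm_size_bytes(44_100, 2, 2, 1)
print(cd_second, "bytes per second")

# One second of mono sound at 11.025 kHz with 8-bit (1-byte) samples.
low_second = pcm_size_bytes(11_025, 1, 1, 1)
print(low_second, "bytes per second")
```

Here a megabyte is taken as 2^20 bytes, which is why the song comes out at roughly 50 MB and not 53.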

For ordinary telephone networks, this is too much for sound to be transmitted in a continuous stream. A problem arises: somehow it is necessary to reduce the size of the sound files.

It is solved quite effectively by using various compression algorithms.
Flash Player supports the following types of compression.

• ADPCM (Adaptive Differential Pulse Code Modulation). This type of compression is based on two ideas. First, it was found that slowly changing low-frequency components prevail in the vast majority of sounds we perceive. It follows that the difference between adjacent samples is usually small (or rather, significantly smaller than the absolute values of the samples themselves).

This means that the digitized audio signal can be represented not by the samples themselves but by the differences between them, which are smaller in magnitude and therefore require fewer bits to describe. Second, the differences between adjacent samples are encoded taking into account the amplitude and frequency content of the signal, since the human ear has limits to its sensitivity (this is the so-called adaptation).
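The first idea can be illustrated with a minimal sketch of plain difference coding. It leaves out the adaptive part of ADPCM entirely, and the 200 Hz test tone, its amplitude, and the function names are assumptions chosen for this illustration only.

```python
import math

def delta_encode(samples):
    """Keep the first sample, then store each difference from the previous one."""
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    samples = [deltas[0]]
    for d in deltas[1:]:
        samples.append(samples[-1] + d)
    return samples

# A low-frequency tone sampled at 44.1 kHz with 16-bit-sized amplitudes.
rate, freq, amp = 44_100, 200, 30_000
samples = [round(amp * math.sin(2 * math.pi * freq * n / rate)) for n in range(1000)]

deltas = delta_encode(samples)
max_sample = max(abs(s) for s in samples)
max_delta = max(abs(d) for d in deltas[1:])
print("largest sample:", max_sample, "largest difference:", max_delta)
```

For this tone the largest difference is more than ten times smaller than the largest sample, so the differences fit into noticeably fewer bits. The decoding is exact here because nothing is quantized; a real ADPCM codec quantizes the differences and is therefore lossy.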

The ADPCM algorithm is actively used in IP telephony. It is poorly suited for streaming music because of the significant distortion it introduces into the sound (speech is distorted too, of course, but there the distortion is hardly noticeable). The compression ratio for ADPCM is typically low, ranging from about 3:1 to 8:1. The ADPCM codec in Flash Player allows 2, 3, 4, or 5 bits to be used to represent the difference between samples. In practice, acceptable sound quality can be achieved at a bit rate (that is, the “weight” of one second of sound) of 16 kbit/s.
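The quoted range of compression ratios follows from the bit counts, as this rough sketch shows. It assumes a 16-bit source and ignores any per-block overhead of a real ADPCM stream; the 8 kHz mono stream in the last line is a hypothetical example, not a documented Flash Player setting.

```python
def adpcm_ratio(source_bits, code_bits):
    """Ratio between the original sample width and the width of a coded difference."""
    return source_bits / code_bits

for bits in (2, 3, 4, 5):
    print(bits, "bits per difference ->", round(adpcm_ratio(16, bits), 1), ": 1")

def bit_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of a stream in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# Hypothetical stream: 8 kHz, mono, 2 bits per difference.
print(bit_rate_kbps(8_000, 2), "kbit/s")
```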

The ADPCM algorithm is significantly inferior to MP3, so in principle it is not worth using this kind of compression. MP3 compression provides an order of magnitude better quality at the same bit rate. The presence of the ADPCM codec is explained by backward compatibility: the MP3 codec was built into the player only in Flash 4. Before that, only the ADPCM codec was used, probably because this algorithm is freely distributed. The reason ADPCM is still used in IP telephony is that it does not require computations as extensive as those of MP3, so compression can be done on the fly.

• MP3. One of the first and most common compression algorithms based on the so-called psychoacoustic compression. It uses the following characteristics of the human ear:

o If a quiet sound follows a very loud one, we do not hear it. Therefore, it can be discarded;

o A sound component with a large amplitude masks components that are close to it in frequency but have smaller amplitudes. Therefore, they can be discarded without noticeable loss of quality;

o The ear’s sensitivity to frequency distortion is low; therefore, if components are close in frequency, they can be treated as the same;

o We perceive very low and very high frequencies poorly, so fewer bits can be allocated to encoding them than to sounds of medium frequency.
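The second property in the list, frequency masking, can be sketched as a toy filter over a list of components. This is not the psychoacoustic model of any real encoder: the function name and both thresholds (`max_distance_hz`, `min_ratio`) are arbitrary values assumed for illustration.

```python
def drop_masked(components, max_distance_hz=100.0, min_ratio=0.1):
    """Remove components that sit close to a much louder neighbour.

    components: list of (frequency_hz, amplitude) pairs.
    A component counts as masked if another component lies within
    max_distance_hz of it and is more than 1 / min_ratio times louder.
    """
    kept = []
    for freq, amp in components:
        masked = any(
            abs(freq - other_freq) <= max_distance_hz and amp < min_ratio * other_amp
            for other_freq, other_amp in components
        )
        if not masked:
            kept.append((freq, amp))
    return kept

components = [(1000.0, 1.0), (1050.0, 0.05), (1080.0, 0.5), (3000.0, 0.05)]
print(drop_masked(components))
# -> [(1000.0, 1.0), (1080.0, 0.5), (3000.0, 0.05)]
```

The quiet 1050 Hz component is dropped because it lies next to the loud 1000 Hz one, while the equally quiet 3000 Hz component survives because nothing nearby masks it.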

Technically, the MP3 algorithm is implemented as follows. The sound is divided into chunks of a fixed length called frames, and a forward Fourier-type transform is applied to the samples of each frame. The result is a decomposition of the sound wave into elementary sinusoids of different frequencies, called harmonics. The coefficient of each harmonic determines its contribution to the resulting wave. The harmonic coefficients are compared, and the least significant ones are discarded.
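The scheme described above (frames, a transform, and discarding the weakest harmonics) can be sketched as a toy in pure Python. This is not an MP3 encoder: real MP3 uses a filter bank with a modified discrete cosine transform and a psychoacoustic model, while this sketch uses a naive discrete Fourier transform and simply keeps the strongest coefficients. The frame length, the test signal, and all function names are assumptions for illustration.

```python
import cmath
import math

def dft(frame):
    """Naive forward discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(coeffs):
    """Naive inverse transform; returns the real part of each sample."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def compress_frame(frame, keep):
    """Zero all but the `keep` largest-magnitude coefficients, then rebuild the frame."""
    coeffs = dft(frame)
    strongest = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)[:keep]
    kept = set(strongest)
    pruned = [c if k in kept else 0 for k, c in enumerate(coeffs)]
    return idft(pruned)

def split_frames(samples, frame_len):
    """Cut the signal into consecutive frames of frame_len samples."""
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

# Test signal: a strong harmonic plus a weak one that is 100 times quieter.
frame_len = 64
signal = [math.sin(2 * math.pi * 4 * n / frame_len)
          + 0.01 * math.sin(2 * math.pi * 13 * n / frame_len)
          for n in range(2 * frame_len)]

restored = []
for frame in split_frames(signal, frame_len):
    # A real sinusoid occupies two mirrored coefficients, so keep=2 retains one harmonic.
    restored.extend(compress_frame(frame, keep=2))

max_error = max(abs(a - b) for a, b in zip(signal, restored))
print("largest reconstruction error:", round(max_error, 4))
```

Only the weak harmonic is lost, so the reconstruction error stays at about the amplitude of that harmonic, while each frame is described by 2 coefficients instead of 64 samples.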



Author: R. Arias

R. Arias is the author of this article and has more than 30 years of experience as a recording engineer and audio specialist, as well as more than 20 years of experience creating algorithms related to audio and video.