How is analog audio converted to digital?

Sound is a complex analog signal, and in practice the signals we deal with are very different from a simple sinusoid. To analyze them, a technique widely used in electronics is applied: the Fourier transform. It decomposes a complex signal into a harmonic series, a set of sinusoids with different frequencies and amplitudes.
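To make the idea concrete, here is a minimal sketch in Python (standard library only) of a naive discrete Fourier transform, the sampled counterpart of the Fourier transform. The function name `dft_magnitudes` and the two-sinusoid test signal are illustrative assumptions. Practical tools use a fast Fourier transform (FFT) rather than this O(N²) loop.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) discrete Fourier transform.

    Returns one amplitude per frequency bin k = 0..N/2, scaled so that a
    sinusoid of amplitude A completing exactly k cycles in the window
    shows up as A in bin k.
    """
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        # Bin 0 and the Nyquist bin (even n only) are not mirrored, so scale by 1/n.
        scale = 1 / n if k in (0, n / 2) else 2 / n
        amplitudes.append(abs(acc) * scale)
    return amplitudes

# Hypothetical test signal: 3 cycles at amplitude 1.0 plus 10 cycles at 0.5.
N = 64
signal = [1.0 * math.sin(2 * math.pi * 3 * t / N)
          + 0.5 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
spectrum = dft_magnitudes(signal)
peaks = [k for k, a in enumerate(spectrum) if a > 0.01]
print(peaks)                                          # [3, 10]
print(round(spectrum[3], 3), round(spectrum[10], 3))  # 1.0 0.5
```

The transform recovers exactly the two components that were mixed together, with their amplitudes.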

Musicians call the first harmonic in this spectrum the fundamental tone, and the higher-frequency harmonics are called overtones. The fundamental determines the pitch, while the overtones give the sound its particular color, creating the timbre of a voice or musical instrument.
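An idealized Python sketch of this distinction, assuming perfectly harmonic overtones at integer multiples of the fundamental (real instruments deviate from this). The helper `harmonic_tone` and the two amplitude profiles are hypothetical. Tones that share a fundamental repeat with the same period, so they have the same pitch, while different overtone amplitudes change the shape of the waveform, which the ear perceives as timbre.

```python
import math

def harmonic_tone(f0, amplitudes, t):
    """Value at time t (seconds) of a tone with fundamental f0 (Hz).

    amplitudes[0] weights the fundamental f0, amplitudes[1] the overtone
    at 2*f0, amplitudes[2] the overtone at 3*f0, and so on.
    """
    return sum(a * math.sin(2 * math.pi * (n + 1) * f0 * t)
               for n, a in enumerate(amplitudes))

F0 = 220.0                          # fundamental frequency: sets the pitch
MELLOW = [1.0, 0.2, 0.05]           # weak overtones (hypothetical profile)
BRIGHT = [1.0, 0.7, 0.5, 0.4, 0.3]  # strong overtones (hypothetical profile)

period = 1 / F0
instants = [i / 10_000 for i in range(50)]  # a few arbitrary sample times

# Both tones repeat every 1/f0 seconds, so they share the same pitch...
same_pitch = all(
    abs(harmonic_tone(F0, amps, t) - harmonic_tone(F0, amps, t + period)) < 1e-9
    for amps in (MELLOW, BRIGHT) for t in instants
)
# ...but their waveforms differ, which is heard as a different timbre.
max_difference = max(abs(harmonic_tone(F0, MELLOW, t) - harmonic_tone(F0, BRIGHT, t))
                     for t in instants)
print(same_pitch)  # True
```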
The spectra of audio signals are studied with complex and expensive instruments called spectrum analyzers.
Such devices show that some musical instruments, such as the violin, have a relatively uniform spectrum, while some wind instruments have spectra with pronounced maxima and minima, called formants.
There are no terms that directly describe the timbre of a human voice or musical instrument, so we have to resort to metaphors such as a “deep timbre”, a “hard timbre”, a “metallic” or even a “transistor” sound.
Attempts to apply digital information processing to sound recording were made many times, but the first serious results came only in the early 1980s, coinciding with the rapid development of computers and advances in the microminiaturization of electronic components. Digital sound processing has opened up exciting new possibilities.
To process sound on a computer, it must first be converted to a digital, encoded format. An analog signal is encoded by devices called analog-to-digital converters (ADCs). The main method of encoding an analog signal is pulse code modulation, which consists of three operations: sampling, quantizing, and encoding.
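The three PCM operations can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the input is a function returning values in [-1, 1], quantization maps them to signed integers, and encoding packs them as little-endian binary words with `struct`. The name `pcm_encode` is hypothetical, and real file formats differ in details (8-bit WAV, for example, stores unsigned samples).

```python
import math
import struct

def pcm_encode(signal, sample_rate, duration, bits=16):
    """Sketch of pulse code modulation for a signal with values in [-1, 1]."""
    n_samples = round(sample_rate * duration)
    # 1. Sampling: measure the signal every 1/sample_rate seconds.
    samples = [signal(i / sample_rate) for i in range(n_samples)]
    # 2. Quantization: round each measurement to a signed integer level.
    max_level = 2 ** (bits - 1) - 1
    levels = [max(-max_level - 1, min(max_level, round(s * max_level)))
              for s in samples]
    # 3. Encoding: pack the integers as little-endian binary words.
    fmt = {8: "b", 16: "h", 32: "i"}[bits]
    return levels, struct.pack("<%d%s" % (len(levels), fmt), *levels)

def tone(t):
    """A 1 kHz sine wave at full scale."""
    return math.sin(2 * math.pi * 1000 * t)

# Sampling at 8 kHz for 1 ms gives 8 samples, exactly one period of the tone.
levels, data = pcm_encode(tone, sample_rate=8000, duration=0.001)
print(levels)     # [0, 23170, 32767, 23170, 0, -23170, -32767, -23170]
print(len(data))  # 16 (8 samples x 2 bytes)
```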
We will not go into coding theory here: it is quite complex and requires advanced mathematics. What matters for us is that both the quality of the digitized sound and the size of the resulting file depend on the sample rate and the bit depth.
The sample rate is the number of times per second the amplitude of the audio signal is measured. Kotelnikov’s sampling theorem (known in English-language literature as the Nyquist-Shannon theorem) states that to obtain an undistorted digital signal, the sampling frequency must be at least twice the highest frequency in the encoded signal. Since human hearing extends up to about 20 kHz, an audio signal must be sampled at 40 kHz or more. Digital communication systems use a sampling frequency of 32 kHz, CD players and consumer digital tape recorders use 44.1 kHz, and digital studio equipment uses an even higher rate of 48 kHz.
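A small Python sketch of what the theorem implies, assuming an upper audible limit of about 20 kHz; the helper names are hypothetical. A tone above half the sampling rate is not simply lost: its samples become indistinguishable from those of a lower "alias" frequency, which is why the rate must be at least twice the highest frequency present.

```python
import math

def min_sample_rate(f_max_hz):
    """Lowest sampling rate the theorem allows for content up to f_max_hz."""
    return 2 * f_max_hz

def alias_frequency(f_hz, fs_hz):
    """Frequency at which a pure tone f_hz appears after sampling at fs_hz
    (the tone is folded into the range 0..fs_hz/2)."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

print(min_sample_rate(20_000))          # 40000
print(alias_frequency(15_000, 44_100))  # 15000: below fs/2, kept as is
print(alias_frequency(25_000, 32_000))  # 7000: above fs/2, aliased

# The samples of a 25 kHz cosine taken at 32 kHz coincide with those of 7 kHz.
fs = 32_000
high = [math.cos(2 * math.pi * 25_000 * n / fs) for n in range(16)]
low = [math.cos(2 * math.pi * 7_000 * n / fs) for n in range(16)]
print(all(abs(a - b) < 1e-9 for a, b in zip(high, low)))  # True
```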
The bit depth of recorded sound is the number of bits allocated to store each amplitude value of the audio signal at the moment it is measured. Modern sound cards use 8 or 16 bits per sample, and higher-quality 32-bit cards are also available. The greater the bit depth, the more amplitude levels can be represented (2^N for N bits), so the digitized sound reproduces the original more accurately.
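To make "higher quality" measurable, the Python sketch below quantizes a full-scale sine with N bits and compares the measured signal-to-quantization-noise ratio with the textbook estimate of about 6.02·N + 1.76 dB (roughly 49.9 dB for 8 bits and 98.1 dB for 16 bits). The mid-rise quantizer, the test frequency, and the helper names are illustrative assumptions, not a description of any particular sound card.

```python
import math

def quantize(x, bits):
    """Mid-rise uniform quantizer: map x in [-1, 1] to one of 2**bits levels."""
    step = 2 / 2 ** bits
    code = math.floor(x / step)
    code = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, code))  # clip to valid codes
    return (code + 0.5) * step  # reconstruct at the centre of the step

def measured_snr_db(bits, n=10_000):
    """Signal-to-quantization-noise ratio measured on a full-scale test sine."""
    signal = [math.sin(2 * math.pi * 0.0123456789 * i) for i in range(n)]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum((quantize(s, bits) - s) ** 2 for s in signal) / n
    return 10 * math.log10(p_signal / p_noise)

def ideal_snr_db(bits):
    """Textbook estimate for a full-scale sine: 10*log10(1.5 * 4**bits)."""
    return 10 * math.log10(1.5 * 4 ** bits)

for bits in (8, 16):
    print(bits, "bits:", 2 ** bits, "levels,",
          "ideal", round(ideal_snr_db(bits), 1), "dB,",
          "measured", round(measured_snr_db(bits), 1), "dB")
```

Each extra bit doubles the number of levels and adds about 6 dB of signal-to-noise ratio.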
As already mentioned, the size of an audio file depends on its sample rate and bit depth. For example, at a sample rate of 44.1 kHz and a bit depth of 16 bits, one minute of mono sound takes about 5.3 MB; at 11 kHz and 8 bits, it takes about 660 KB.
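The arithmetic behind these figures is sample rate × bytes per sample × channels × duration. A minimal Python sketch, assuming uncompressed PCM with no file header, decimal units (1 MB = 1,000,000 bytes), and the standard rates 44,100 Hz and 11,025 Hz; `pcm_size_bytes` is a hypothetical helper.

```python
def pcm_size_bytes(sample_rate, bits, seconds, channels=1):
    """Size of raw PCM audio: samples/s x bytes/sample x channels x seconds."""
    return sample_rate * (bits // 8) * channels * seconds

print(pcm_size_bytes(44_100, 16, 60) / 1e6)              # 5.292 (MB, mono)
print(pcm_size_bytes(11_025, 8, 60) / 1e3)               # 661.5 (KB, mono)
print(pcm_size_bytes(44_100, 16, 60, channels=2) / 1e6)  # 10.584 (MB, stereo)
```

Recording in stereo doubles every figure.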
Such heavy use of disk space proved unacceptable, so special algorithms and formats were developed to store audio files more compactly.
When comparing different compression formats, the parameter “sound quality at a certain bit rate” is often used.
Bit rate indicates how many bits are used to store one second of audio. For example, at a bit rate of 128 kbps (128,000 bits per second), a three-minute song occupies about 2.9 MB.
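The same arithmetic applies to a fixed bit rate: bits per second × duration ÷ 8 gives bytes. A minimal Python sketch, assuming the usual convention that 1 kbps = 1,000 bits per second; it also compares 128 kbps with the uncompressed stereo CD stream (44,100 Hz × 16 bits × 2 channels) to show roughly how much a 128 kbps encoder must shrink the data. The helper name is hypothetical.

```python
def size_at_bitrate_bytes(kbps, seconds):
    """Bytes needed to store `seconds` of audio at a constant bit rate."""
    return kbps * 1000 * seconds // 8

song = size_at_bitrate_bytes(128, 3 * 60)
print(song)                        # 2880000 (bytes, about 2.9 MB)

cd_bps = 44_100 * 16 * 2           # uncompressed stereo CD audio
print(cd_bps)                      # 1411200 (bits per second)
print(round(cd_bps / 128_000, 1))  # 11.0 (compression ratio needed)
```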
Broadly speaking, all audio encoding programs (encoders) use one of two kinds of algorithms: lossless compression or lossy compression.
Lossless compression algorithms are essentially the archivers familiar to PC users, specially adapted to work with an audio stream. During playback, the audio is decompressed from the archive on the fly.
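A hedged Python sketch of the "archiver adapted to audio" idea. As a simplified, hypothetical adaptation, it first replaces each 16-bit sample with its difference from the previous sample (a first-order prediction, which turns a smooth waveform into small numbers) and then hands the result to a general-purpose compressor, `zlib`. Real lossless codecs such as FLAC use more elaborate linear prediction and Rice coding; the function names here are illustrative only. The key property is that decompression restores every sample bit for bit.

```python
import math
import struct
import zlib

def to_int16(v):
    """Wrap an integer into the signed 16-bit range (two's complement)."""
    return (v + 32768) % 65536 - 32768

def compress_pcm(samples):
    """Store differences between neighbouring samples, then deflate them."""
    deltas = [to_int16(cur - prev) for prev, cur in zip([0] + samples, samples)]
    return zlib.compress(struct.pack("<%dh" % len(deltas), *deltas), 9)

def decompress_pcm(blob):
    """Inverse of compress_pcm: inflate, then accumulate the differences."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack("<%dh" % (len(raw) // 2), raw)
    samples, acc = [], 0
    for d in deltas:
        acc = to_int16(acc + d)
        samples.append(acc)
    return samples

# One second of a 440 Hz tone as 16-bit mono PCM at 44.1 kHz.
pcm = [round(10_000 * math.sin(2 * math.pi * 440 * i / 44_100)) for i in range(44_100)]
blob = compress_pcm(pcm)
restored = decompress_pcm(blob)
print(restored == pcm)        # True: nothing was lost
print(len(pcm) * 2, len(blob))  # raw size vs compressed size in bytes
```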