
Audio digitization: how it works

How do you translate sound into soulless 0s and 1s? Let's take a look at how familiar things work: computer sound, video, MP3s, streaming, various algorithms, and more.

A bit of physics
Sounds are vibrations in the air, much like waves in water. The changes in air pressure reach the ear, whose sensitive parts can detect even subtle vibrations, and people perceive these vibrations as sounds. There is no sound in outer space because there is no air to carry the vibrations.
Frequency. The faster the vibration, the higher the pitch of the sound we perceive. A person perceives vibrations in the range of 20 to 20,000 per second. The number of vibrations per second is called the oscillation frequency and is measured in hertz (Hz). That is, the range we hear is from 20 Hz to 20 kHz.
By comparison, dogs hear frequencies from about 40 Hz to 60 kHz. A dog whistle sounds only in the 23-54 kHz range, which is why dogs can hear it but humans cannot.
Amplitude. The stronger the vibration, the louder the sound, and vice versa. You can think of this as the height of the waves on the surface of a pond: there may be small ripples (soft sounds) or large, powerful waves (loud sounds).
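To make frequency and amplitude concrete, here is a minimal sketch that models a pure tone as a sine wave. The function names `air_pressure` and `period_seconds` are invented for illustration, and a sine wave is a simplifying assumption: real sounds are mixtures of many such waves.

```python
import math

def air_pressure(t, frequency_hz, amplitude):
    """Value of an idealized pure tone at time t (in seconds).

    frequency_hz sets how many vibrations happen per second (pitch);
    amplitude sets how far the wave swings from rest (loudness).
    """
    return amplitude * math.sin(2 * math.pi * frequency_hz * t)

def period_seconds(frequency_hz):
    """Duration of a single vibration at the given frequency."""
    return 1.0 / frequency_hz

# Edges of the human hearing range mentioned above.
slowest = period_seconds(20)       # one vibration lasts 0.05 s
fastest = period_seconds(20_000)   # one vibration lasts 0.00005 s

# A quarter of the way through one vibration the wave is at its peak,
# so the value equals the amplitude.
peak = air_pressure(0.25 * period_seconds(440), 440, 0.5)
```

Doubling `amplitude` doubles the height of the wave without changing its pitch; doubling `frequency_hz` squeezes twice as many vibrations into each second.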
Dividing the sound into segments
Now let's do this: we divide one second of sound into 4 parts and find the amplitude value for each part:
We measure the state of the wave four times in one second. This is called sampling.
We measured the amplitude at each of the four points and, in relative units, got four numbers: +30, -50, -50 and -60. In theory, if we applied these four values to a speaker as voltages, one after another, we could reproduce the same sound. But there are several problems:
• Since we only measure in four places, everything the wave does between those points is lost.
• We end up with a very distorted sound compared to the original.
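The loss described in the list above can be sketched in a few lines. This is an illustration under assumptions, not the measurement from the text: the helper `sample` and the 2 Hz test wave `tone_2hz` are made up for the example.

```python
import math

def sample(signal, sample_rate, duration=1.0):
    """Measure signal(t) at evenly spaced moments and return the values."""
    count = int(round(sample_rate * duration))
    return [signal(i / sample_rate) for i in range(count)]

def tone_2hz(t):
    """Hypothetical test wave: 2 vibrations per second, swinging between -1 and +1."""
    return math.sin(2 * math.pi * 2 * t)

# Four measurements in one second, taken at t = 0, 0.25, 0.5 and 0.75 s.
four_samples = sample(tone_2hz, 4)

# Every measurement lands where the wave crosses zero, so the recorded
# values are all (almost exactly) 0 even though the wave is oscillating.
looks_silent = all(abs(value) < 1e-9 for value in four_samples)
```

With this particular wave the four samples describe silence, which is an extreme case of the distortion that too few measurements cause.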
Four samples per second is far too few for sound. To get at least intelligible speech, one second must be divided into 8,000 segments, and for music 44,100 segments per second are usually sufficient.
Let's increase the sample rate, that is, cut the same second of sound into a larger number of smaller parts:
Measurements are now more accurate and the resulting sound is more natural.
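The effect of a higher sample rate can be checked numerically. The sketch below is only an illustration: the 5 Hz test wave, the two sample rates, and the helper names are assumptions, and joining the samples with straight lines is a deliberately simple stand-in for how real playback hardware rebuilds the wave.

```python
import math

def sample(signal, sample_rate, duration=1.0):
    """Measure signal(t) at evenly spaced moments and return the values."""
    count = int(round(sample_rate * duration))
    return [signal(i / sample_rate) for i in range(count)]

def reconstruct(samples, sample_rate, t):
    """Estimate the wave at time t by drawing a straight line between
    the two neighbouring samples (linear interpolation)."""
    position = t * sample_rate
    i = min(int(position), len(samples) - 2)
    fraction = position - i
    return samples[i] + (samples[i + 1] - samples[i]) * fraction

def max_error(signal, sample_rate, duration=1.0, checks=1000):
    """Largest gap between the true wave and the reconstruction,
    checked at many moments between the first and last sample."""
    samples = sample(signal, sample_rate, duration)
    last_time = (len(samples) - 1) / sample_rate
    worst = 0.0
    for k in range(checks + 1):
        t = last_time * k / checks
        worst = max(worst, abs(signal(t) - reconstruct(samples, sample_rate, t)))
    return worst

def wave(t):
    """Hypothetical 5 Hz test wave."""
    return math.sin(2 * math.pi * 5 * t)

coarse_error = max_error(wave, 20)    # 20 measurements per second
fine_error = max_error(wave, 200)     # 200 measurements per second
```

Here `fine_error` comes out much smaller than `coarse_error`: the more often the wave is measured, the less room it has to wander between two neighbouring measurements.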
Converting to numbers
After dividing the sound into small segments and measuring the amplitude value of each segment, we can record it in table form:
Time (s)     Amplitude
0.01         5
0.02         7
0.03         10
If we divide the whole sound into equal segments, the time does not need to be written down at all: we already know how it changes from one segment to the next, so it is enough to write the amplitude values in a row:
5 7 10 … −21
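Why dropping the time column loses nothing can be shown with a short sketch. The function `times_for` and its parameters are hypothetical; the sample rate of 100 per second is inferred from the 0.01-second spacing in the table, and only the first three amplitudes are used because the rest of the row is elided.

```python
def times_for(amplitudes, sample_rate, first_index=1):
    """Rebuild the time of every sample from its position in the list.

    Sample number n is taken at n / sample_rate seconds, so storing the
    times next to the amplitudes would be redundant.
    """
    return [(first_index + i) / sample_rate for i in range(len(amplitudes))]

amplitudes = [5, 7, 10]   # first three values from the table
sample_rate = 100         # assumed: one measurement every 0.01 s

# Pair every amplitude with its recovered time to get the table back.
table = list(zip(times_for(amplitudes, sample_rate), amplitudes))
```

The resulting `table` holds the same three time and amplitude pairs as the table in the text, recovered from nothing but the list of amplitudes and the sample rate.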



