
How sound is encoded

Sound is a wave that propagates through air, water, or another medium, with continuously changing intensity and frequency.
A person perceives sound waves (vibrations of the air) through hearing and can distinguish both volume and pitch.
The higher the intensity of the sound wave, the louder the sound; the higher the frequency of the wave, the higher the pitch.
Figure: dependence of the loudness and pitch of a sound on the intensity and frequency of the sound wave.
Hertz (Hz) is the unit of measurement for the frequency of periodic processes (e.g., oscillations).
1 Hz means one cycle of the process per second: 1 Hz = 1/s.
A frequency of 10 Hz therefore means ten cycles of the process per second.
The human ear can perceive sound at frequencies from 20 vibrations per second (20 Hz, a low sound) to 20,000 vibrations per second (20 kHz, a high sound).
In addition, a person can perceive sound over a wide range of intensities, in which the maximum intensity is $10^{14}$ times greater than the minimum (one hundred trillion times).
Sound volume is measured in a special unit, the decibel (dB).
A decrease or increase in sound volume by 10 dB corresponds to a decrease or increase in sound intensity by a factor of 10.

| Characteristic sound | Loudness, dB |
| --- | --- |
| Lower limit of human ear sensitivity | 0 |
| Rustling leaves | 10 |
| Conversation | 60 |
| Horn | 90 |
| Jet engine | 120 |
| Pain threshold | 140 |

Table: Sound volume in decibels
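The 10 dB rule can be turned into a small calculation: a difference of d decibels corresponds to an intensity ratio of 10 raised to the power d/10. The Python sketch below illustrates this; the function name `intensity_ratio` is chosen here for illustration and is not part of any library.

```python
def intensity_ratio(db_difference: float) -> float:
    """Return how many times the intensity changes for a given change in dB.

    Assumes the rule stated in the text: every 10 dB is a tenfold
    change in sound intensity.
    """
    return 10 ** (db_difference / 10)


# Conversation (60 dB) compared with the lower limit of hearing (0 dB)
print(intensity_ratio(60 - 0))

# Pain threshold (140 dB) compared with the lower limit of hearing (0 dB)
print(intensity_ratio(140 - 0))
```

The second call gives a ratio of about one hundred trillion, which is the range of intensities that the human ear is said to cover.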
Time Sampling of Audio
For computer systems to process sound, the continuous audio signal must be converted into discrete digital form by time sampling.
To do this, the continuous sound wave is divided into small, separate time intervals, and each interval is assigned a specific sound intensity value.
As a result, the continuous dependence of loudness on time, A(t), is replaced by a discrete sequence of loudness levels. On a graph, this looks like replacing a smooth curve with a sequence of “steps.”
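As a rough illustration of time sampling, the sketch below measures a continuous function A(t) at evenly spaced moments. The 5 Hz sine wave, the sampling rate of 20 measurements per second, and the names `sample_signal` and `tone` are all assumptions made for this example.

```python
import math


def sample_signal(signal, duration_s, sample_rate_hz):
    """Measure a continuous signal A(t) at evenly spaced moments in time."""
    sample_count = round(duration_s * sample_rate_hz)
    return [signal(n / sample_rate_hz) for n in range(sample_count)]


def tone(t):
    """A hypothetical 'continuous' signal: a 5 Hz sine wave."""
    return math.sin(2 * math.pi * 5 * t)


# One second of sound measured 20 times per second gives 20 discrete values
samples = sample_signal(tone, duration_s=1.0, sample_rate_hz=20)
print(len(samples))
print([round(s, 2) for s in samples[:5]])
```

Each value in `samples` plays the role of one “step”: the signal is known only at those moments, not in between.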
Figure: time sampling of audio.
A microphone connected to the sound card is used to record analog audio and convert it to digital format.
The more densely the discrete steps are packed on the graph, the more faithfully the original sound can ultimately be recreated.
The quality of the resulting digital sound depends on the number of volume level measurements per unit of time, that is, on the sampling frequency.
The audio sampling rate is the number of volume measurements taken in one second.
The more measurements are made in one second (the higher the sampling frequency), the more closely the “ladder” of the digital audio signal follows the curve of the analog signal.
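One way to see this effect is to compare a smooth curve with its “ladder” at several sampling rates. The sketch below is only an illustration under assumed values: a 5 Hz sine wave stands in for the analog signal, each sample is held until the next one, and the function name `staircase_error` is invented for this example.

```python
import math


def staircase_error(sample_rate_hz, tone_hz=5.0, duration_s=1.0, checks=10_000):
    """Largest gap between a sine wave and its sample-and-hold 'ladder'."""
    worst = 0.0
    for k in range(checks):
        t = duration_s * k / checks
        # Moment of the most recent measurement at or before t
        held_t = math.floor(t * sample_rate_hz) / sample_rate_hz
        original = math.sin(2 * math.pi * tone_hz * t)
        held = math.sin(2 * math.pi * tone_hz * held_t)
        worst = max(worst, abs(original - held))
    return worst


# The gap between the curve and the ladder shrinks as the rate grows
for rate in (50, 200, 1000):
    print(rate, round(staircase_error(rate), 3))
```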
Each “step” of the graph is assigned a specific volume level. The volume levels can be thought of as a set of N possible states (gradations), and encoding them requires a certain amount of information I, which is called the audio encoding depth.
Audio encoding depth is the amount of information required to encode the discrete volume levels of digital audio.
If the encoding depth is known, the number of volume levels of the digital audio can be calculated with the general formula $N = 2^I$.
For example, if the audio encoding depth is 16 bits, the number of volume levels is:
$N = 2^I = 2^{16} = 65\,536$.
During encoding, each volume level is assigned its own 16-bit binary code: the lowest level corresponds to the code 0000000000000000 and the highest to 1111111111111111.
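The relationship between encoding depth and binary codes can be sketched in a few lines of Python. The example assumes that volume is expressed as a number from 0.0 (lowest) to 1.0 (highest) and mapped linearly onto the available codes. That mapping and the names `level_count` and `encode_level` are simplifications for illustration, not a description of how a particular sound card or file format works.

```python
def level_count(depth_bits: int) -> int:
    """Number of distinct volume levels for an encoding depth I: N = 2**I."""
    return 2 ** depth_bits


def encode_level(volume: float, depth_bits: int = 16) -> str:
    """Map a volume in the range 0.0..1.0 to a binary code of depth_bits bits."""
    highest_code = level_count(depth_bits) - 1
    clamped = min(max(volume, 0.0), 1.0)  # keep the volume inside 0.0..1.0
    code = round(clamped * highest_code)
    return format(code, f"0{depth_bits}b")


print(level_count(16))    # 65536
print(encode_level(0.0))  # 0000000000000000
print(encode_level(1.0))  # 1111111111111111
```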
Digitized audio quality
Therefore, the higher the sampling rate and the encoding depth, the better the digitized audio sounds and the closer it comes to the original.
The lowest quality of digitized sound, corresponding to telephone quality, is obtained with a sampling rate of 8000 samples per second, an encoding depth of 8 bits, and a single recorded audio track (“mono” mode).
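These three parameters also determine how much data uncompressed digitized sound takes up: the number of measurements per second, times the bits per measurement, times the number of tracks, times the duration. A minimal sketch, with the function name `pcm_size_bytes` chosen here for illustration:

```python
def pcm_size_bytes(sample_rate_hz: int, depth_bits: int,
                   channels: int, seconds: float) -> float:
    """Size of uncompressed digitized audio in bytes.

    Computed as rate x depth x channels x duration, divided by 8 bits per byte.
    """
    total_bits = sample_rate_hz * depth_bits * channels * seconds
    return total_bits / 8


# Telephone quality: 8000 measurements per second, 8 bits each, mono, 1 second
print(pcm_size_bytes(8000, 8, 1, 1))   # 8000.0

# The same settings for one minute of sound
print(pcm_size_bytes(8000, 8, 1, 60))  # 480000.0
```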
It should be remembered, however, that telephony uses devices similar to speech synthesizers and speech coders to improve this sound. About speech coders, this article also















