sound coding strategies Archives

Free Download Mp4Gain

Sound coding

Sound is physical in nature. Any sound is vibrations in space (in this case, in the air), which are captured by our ears. The oscillations are continuous and can be described by mathematical models. We, of course, will not do this, but we will pose the question: how can vibrations of a continuous nature be written to a machine that operates only with zeros and ones?

Sound Coding

1.1. No compression, no loss

The WAV (WAVE) format preserves the audio track in its true quality, without any manipulation of the audio file itself.

To record sound, we need to convert it to a set of zeros and ones. In the case of the WAV format, this is done in the stupidest way: the incoming audio stream is divided into the smallest segments (how many, hence the terms “sample rate”, “sample rate” or “sample rate”). sampling “) and in each time interval the current value of the analog signal is written in binary. to form. WAV files can be recorded at sample rates of, for example, 8 kHz to 192 kHz, but the de facto standard is 44.1 kHz.

It should be noted that WAV, as a container, also supports other ways of storing audio information: for example, ADPCM which is capable, depending on the bandwidth, of encoding audio data with a variable sample rate.

The 44.1 kHz frequency was no accident. If we admit inaccuracies in the description, then this figure was produced as a statement of Kotelnikov’s theorem: to maintain the most correct waveform at frequencies up to 20 kHz (the theoretical limit of audibility of the human ear), a frequency of Sampling twice as high: 40 kHz. Actually, the frequency at 44.1 kHz is due to technical aspects, details of which can be read here.

In each segment of this type, the actual voltage of the analog signal is coded in binary form: the highest level can be represented as “1111”, the lowest as “0000”. And here the second parameter comes into play: the depth of the sound, which determines how precisely the wave value will be digitized over a period of time. WAV files are often written in 16 or 32 bit. Higher bit depth means more accurate recording.

By the way, about PCM. What is burning to an ordinary CD, so popular after audio cassettes? That is, a sequence of uncompressed zeros and ones in PCM format. Bit depth: 16 bits, Sampling frequency: 44.1 kHz. What then will be the bit rate of said recording?

A 16-bit number is written 44100 times per second. 44100 * 16 = 705600 bps for one channel;
for stereo recording, this value is multiplied by 2 – 1411200 bps or our ~ 1411 kbps;
for 32-bit recording, this value will be twice as high: ~ 2822 kbps.

Conclusion: hence the gluttony of these files for free space on the hard drive, but as a benefit: the total absence of losses when recording and listening to an audio file.

1.2. Lossless compression

I won’t write much about lossless compression. This term can be found here. In fact, this method generally consists of archiving audio recordings using algorithms built into the codec, but the data is not lost and the ability to restore the audio recording with bit-by-bit precision is maintained. By decoding these formats, we actually get the same WAVE format, only it takes up less disk space; compression is approximately twice and depends on the nature of the encoded composition. When listening to the recording, the codec “decompresses” the composition and sends a sequence of uncompressed zeros and ones for the sound card to process.

There are many codecs of this type: this is FLAC (Free Lossless Audio Codec), developed by Xiph organization (also developed Opus), ALAC (Apple Lossless) from the company of the same name, APE (Monkey’s audio), WV (WavPack) and other lesser known lossless audio compression formats.

1.3. Lossy compression: fooling our ears

Scientists began to think that, in principle, it often does not make sense to save complete information about an audio recording, since our hearing is imperfect. You may not hear soft sounds after loud sounds, you may not hear frequencies that are too high or too low, etc. These phenomena are called the masking effect.

As a result, we understood: after all, you can throw it here, cut it there, and the listener will practically not notice anything – an imperfect ear will simply give the listener the opportunity to fool himself. Therefore, it is possible to get rid of the psychoacoustic redundancy in the file.

Actually, psychoacoustics exists as a discipline and studies the psychological and physiological characteristics of human perception of sounds.

Free Download Mp4Gain

Mp4Gain Main Window

Mp4Gain Features

Free Download Mp4Gain