audio encoding formats explained Archives

Audio encoding.


Digital audio is an analog audio signal represented by discrete numerical values of its amplitude.


Sound digitization is a technology in which the amplitude of the signal is measured at a fixed time step and the values obtained are recorded in numerical form.

Another name for digitizing audio is analog to digital audio conversion.

Sound digitization involves two processes:

sampling the signal over time;
quantizing its amplitude.

Discretization of time.


Time sampling is the process of obtaining the values of the signal being converted at a fixed time step, the sampling step. The number of measurements of the signal magnitude carried out in one second is called the sampling frequency or sampling rate. The smaller the sampling step, the higher the sampling frequency and the more accurate the representation of the signal that we obtain.

This is confirmed by Kotelnikov’s theorem (in foreign literature it is found as Shannon’s theorem or the Nyquist-Shannon sampling theorem). According to it, an analog signal with a limited spectrum can be accurately described by a discrete sequence of its amplitude values if these values are taken with a frequency more than twice the highest frequency in the spectrum of the signal. That is, an analog signal whose highest spectral frequency is F_m can be accurately represented by a sequence of discrete amplitude values if the sampling frequency F_d satisfies F_d > 2F_m.

In practice, this means that for the digitized signal to contain information on the full audible frequency range of the original analog signal (0 – 20 kHz), the selected sample rate must be higher than 40 kHz. The number of amplitude measurements per second is called the sampling rate (if the sampling step is constant).
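The sampling condition can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the second function shows the standard folding (aliasing) behaviour of a pure tone sampled too slowly, which the text above implies but does not spell out.

```python
def min_sample_rate(f_max_hz: float) -> float:
    """Lower bound on the sampling rate for a signal band-limited to f_max_hz."""
    return 2.0 * f_max_hz


def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone f_hz after sampling at sample_rate_hz."""
    # Fold the tone into the band [0, sample_rate / 2].
    f_mod = f_hz % sample_rate_hz
    return min(f_mod, sample_rate_hz - f_mod)


print(min_sample_rate(20_000))          # 40000.0
print(alias_frequency(1_000, 44_100))   # 1000: below half the rate, preserved
print(alias_frequency(30_000, 44_100))  # 14100: above half the rate, aliased
```

A 30 kHz tone sampled at 44.1 kHz is indistinguishable from a 14.1 kHz tone, which is why the band has to be limited before sampling.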

The main difficulty of digitization is the inability to record the measured signal values with perfect precision.

Analog to digital converters (ADC).


The above process of digitizing sound is done using analog-to-digital converters (ADCs).

This transformation includes the following operations:

Bandwidth limiting: a low-pass filter suppresses spectral components above half the sample rate.
Discretization in time: the continuous analog signal is replaced by a sequence of its values at discrete moments in time (samples). This is done by a special circuit at the input of the ADC, a sample-and-hold device.
Level quantization: each sample value is replaced by the closest value from a set of fixed values (quantization levels).
Encoding: the value of each quantized sample is represented as a number corresponding to the ordinal number of its quantization level.

This is done as follows: the continuous analog signal is “cut” into sections at the sample rate, giving a discrete signal, which goes through quantization with a certain bit depth and is then encoded, that is, replaced by a sequence of code symbols. To record sound in a frequency band of 20-20,000 Hz, a sampling frequency of 44.1 kHz or higher is required (today there are ADCs and DACs with a sampling frequency of 192 and even 384 kHz). For a high-quality recording 16 bits are sufficient; however, to expand the dynamic range and improve the quality of sound recording, 24 (less often 32) bits are used.
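The sampling, quantization and encoding steps described above can be sketched in Python. This is an illustration under simplifying assumptions, not a model of a real converter: the input is a mathematical function rather than a voltage, the low-pass filter stage is omitted, and the name `digitize` and the 4-bit depth are chosen only to keep the output short.

```python
import math


def digitize(signal, num_samples, sample_rate_hz, bit_depth):
    """Sample signal(t), expected in [-1, 1], and return quantization-level indices."""
    levels = 2 ** bit_depth
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                  # discretization in time
        x = max(-1.0, min(1.0, signal(t)))      # clip to the converter input range
        # Quantization: nearest of the evenly spaced levels 0 .. levels - 1.
        codes.append(round((x + 1.0) / 2.0 * (levels - 1)))
    return codes


def tone(t):
    """A 1 kHz test tone with unit amplitude."""
    return math.sin(2 * math.pi * 1000 * t)


codes = digitize(tone, num_samples=8, sample_rate_hz=8000, bit_depth=4)
print(codes)                                # level indices, one per sample
print([format(c, "04b") for c in codes])    # the same indices as 4-bit code words
```

Each printed code word is the ordinal number of a quantization level written in binary, which is exactly what the encoding step stores.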


Encoding methods.

Frequency modulation.

Sound coding methods (here we mean, of course, the electrical signal coming from the microphone) are based on the fact that, in theory, any complex sound can be broken down into a set of the simplest harmonic signals of different frequencies, each of which is a sinusoid; this set is called the spectrum of the original signal. The task of encoding sound, like any other analog signal, is to represent it in the form of another analog or digital signal that is more convenient for transmission or storage in each specific case.


Audio encoding and processing.


Parameters that affect digital sound quality Minimum and maximum sound quality.


My grandfather was listening to a gramophone. My father’s youth turned to music coming from the speaker of a reel-to-reel tape recorder. The heyday and decline of cassette recorders fell upon my youth. My son is growing up in the age of digital audio. To keep up to date and give my son a good “sound”, I decided to find out what determines the quality of the digital audio signal reproduction.

I talked to my music-loving friends and searched the Internet for information. As a result, I came to the conclusion that high-quality sound can be achieved in the digital age by choosing the right 7 basic elements of a modern music system:

the format in which the music is recorded;
player;
digital to analog converter;
amplifier;
acoustics;
cables;
power supply.

Below I will share my observations and conclusions on achieving high quality sound recordings in digital formats.

A lyrical digression; experts can skip it.

In a nutshell, I will explain where digital sound comes from. During the recording process, the microphone converts mechanical vibrations (the sound itself) into an analog electrical signal. An analog signal is, in the most general case, similar to a sinusoid that has been familiar to all of us since high school. In the age of analog sound, it was this signal that was recorded on various media and then played back.

With the development of microprocessor technology, it became possible to record and store audio information in digital formats. These formats are obtained through an analog-to-digital conversion (ADC) process.

During analog-to-digital conversion, the analog signal (our high school sine wave) becomes a discrete one (in other words, it is cut into pieces). In the next stage, the discrete signal is quantized, that is, each resulting segment of the sinusoid is assigned a numerical value. In the third step, the quantized signal is encoded as a sequence of 0s and 1s. In digital sound recording, it is the information about the amplitude and frequency of the sound that is digitized.

To record and store digital audio information, digital audio formats are used. The audio format is understood as a set of requirements for the digital representation of audio data.

When it comes to sound quality, digital formats are divided into 3 categories:

Formats without additional compression (CDDA, DSD, WAV, AIFF, etc.);
Lossless compressed formats (FLAC, WavPack, ADX, etc.);
Lossy compression formats (MP3, AAC, RealAudio, etc.).

High-quality sound is obtained when playing music saved in formats of the first and second categories. In formats of the third category, part of the information is deliberately discarded to reduce the amount of data, for example information about inaudible frequencies.

Inaudible frequencies are those outside the range perceived by the average person, roughly 20 Hz – 22 kHz. For audiophiles, this range is wider due to individual psychophysiological characteristics.
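The difference between the second and third categories can be made concrete with a small Python sketch. Both techniques here are stand-ins chosen for brevity, not what the named formats actually do: generic DEFLATE from the standard `zlib` module plays the role of a lossless codec, and dropping the low 8 bits of each sample plays the role of a lossy one (real lossy codecs rely on psychoacoustic models instead). The tone frequency and sample rate are arbitrary.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit mono PCM at 8 kHz (illustrative values).
samples = [round(32767 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: the decoded bytes are identical to the original.
packed = zlib.compress(pcm)
restored = zlib.decompress(packed)
print(restored == pcm, len(pcm), len(packed))

# Lossy (crude stand-in): keep only the top 8 bits of every sample.
coarse = [(s >> 8) << 8 for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, coarse))
print(max_error)  # non-zero: the discarded detail cannot be recovered
```

The first category simply stores `pcm` as it is; the second stores something like `packed`; the third stores something closer to `coarse`, which is smaller to describe but no longer equal to the original.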

To build your home audio library, choose recordings saved in files with the following extensions:

*.wav, *.dff, *.dsf, *.aif, *.aiff are uncompressed sound files;
*.flac and *.ape are the most common lossless compressed audio files; *.mp4 and *.wma are lossless only when they hold a lossless codec (such as Apple Lossless or WMA Lossless) and more often contain lossy audio.

From history. They say that the first experiments on the preservation of sound were carried out by the ancient Greeks. They tried to keep the sound in amphorae. It looked something like this: words were spoken into the amphora and it was quickly sealed. Unfortunately, none of those records have survived to this day.

Digital Audio – Quality Issues


Relatively recently, the concept of “multimedia” entered our vocabulary, and the computer is increasingly used as an entertainment center. It now has to reproduce sound that exists inside it in the form of numbers.

Just as some connoisseurs of sound argue about the advantages of “tube” sound over “transistor” sound, there is an endless debate about which is better: digital or analog sound. Let’s try to figure it out.

For our ears, sound is air vibrations with a frequency of 20 Hz to 20 kHz, and the upper limit depends on age: in children it is 22-24 kHz, and in old age the highest perceived frequency drops to 8-12 kHz.

Frequencies below the indicated limits are perceived as vibrations; frequencies above them are not perceived by a person at all.

However, not all of the audible bandwidth is used with the same intensity: speech is clearly perceived in the range of 500 to 3500 Hz. But for listening to music, this is not enough. Ideally, the reproduced sound should not differ from the sound field at the microphone. That is, the recording and playback equipment must not introduce distortions within the limits of human perception.

Before we hear a sound from the speaker, it goes through a long chain. During recording, the sound is electromechanically converted to an electrical signal; then come amplification and processing of the analog electrical signal; analog-to-digital conversion; digital signal processing; frequency correction; and the recording procedure itself.

After that, the digitized sound is stored and transmitted. During playback, digital signal processing occurs first, followed by digital-to-analog conversion, analog signal processing and amplification, and electromechanical conversion to sound vibrations.

All of these procedures introduce their own distortions. The process of recording and sound processing takes place, as a rule, on studio equipment, which performs much better than home audio equipment. Therefore, although there are distortions, they are significantly less than the distortions introduced by home equipment at the playback stage. With amateur sound recording, errors appear in the recording stages.

The electromechanical conversion produced by the studio microphone produces a very weak signal that needs amplification.

Even in the ideal conditions of a professional recording studio, due to acoustic noise, the dynamic range of recorded music can be narrower than that provided by 16-bit audio.
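For reference, the dynamic range that a given bit depth can represent is usually estimated as 20·log10(2^b) decibels, about 6 dB per bit. The sketch below uses that common rule of thumb, which is not stated in the text above; the function name is illustrative.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Ratio between the largest and smallest representable amplitude step, in dB."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1))
# 8 48.2
# 16 96.3
# 24 144.5
```

So a studio whose acoustic noise floor leaves less than roughly 96 dB between the quietest and loudest recorded sounds does not use the full range of 16-bit audio.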

When recording from multiple microphones, the signal is necessarily processed: channel volume levels are selected, noise is filtered, etc. Furthermore, the dynamic range of the signal is reduced, which leads to a significant increase in noise. But without this procedure, the recording would sound unsatisfactory when played back on a home computer.

The sound path has its own distortions, which can be divided into three groups:

1. Linear distortions are caused by the amplitude-frequency characteristic of the sound path and are a change in the ratio of the amplitudes and phases of various frequency components. Frequencies that were originally missing from the signal do not appear.

2. Non-linear distortion: a change in the shape of the original signal, which leads to the appearance of frequencies that are absent in the incoming signal, but depend on it.

3. Interference: the appearance of extraneous frequencies in the sound path that are not associated with the useful signal. Interference is caused, for example, by electromagnetic pickup, penetration of the supply voltage frequency into the sound path, etc.

However, all these distortions occur only in analog circuits (hence speculation about the frequency response of a digital output makes specialists smile). But don’t forget about surface defects of CDs, DVDs, and other optical media that store sound, which lead to data loss.

The digitization of the signal is also associated with a lot of distortion, but first let’s look at the difference between analog and digital signals.

In an analog signal, the voltage changes smoothly over time; the signal is continuous. The digital signal is discrete: its value changes instantly. Furthermore, this discreteness is manifested in both time and amplitude. The signal value is sampled only at fixed moments, and each sampled value is rounded to the nearest whole number.

Audio encoding: secrets revealed


Audio settings for video capture and transmission.
As people directly related to the AV sphere, we constantly talk about audio coding and audio codecs, but what is it?


An audio codec is essentially a device or algorithm that can encode and decode a digital audio signal.

In practice, the audio waves that are transmitted through the air are continuous analog signals. The signals are converted to digital format by a device called an analog-to-digital converter (ADC), and the reverse conversion is done by a digital-to-analog converter (DAC). The codec sits between these two functions, and it is the codec that lets you adjust the parameters that matter for successful capture, recording and transmission of an audio signal: codec algorithm, sample rate, bit depth and data transfer rate.

The three most popular audio codecs are Pulse-Code Modulation (PCM), MP3, and Advanced Audio Coding (AAC). The choice of codec determines the compression ratio and the recording quality. PCM is a codec used by computers, CDs, digital phones, and sometimes SACD. The source signal is sampled at regular intervals, and each sample is the digital magnitude of the analog signal at that moment. PCM is the simplest option for digitizing an analog signal.

With the correct parameters, this digitized signal can be completely converted back to analog without any loss. Unfortunately, this codec, which provides almost complete identity with the original audio, is not economical: it produces large files that are poorly suited for streaming. We recommend using PCM to record master copies of your sources or when doing audio post-processing.
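How large PCM is can be estimated directly from its parameters. The sketch below assumes CD-style settings (44.1 kHz, 16 bits, stereo) and compares the result with a 128 kbps compressed stream; both the function name and the 128 kbps figure are illustrative choices, not recommendations.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second produced by uncompressed PCM."""
    return sample_rate_hz * bit_depth * channels


cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)                       # 1411200
print(round(cd_rate / 128_000, 3))   # 11.025 times the rate of a 128 kbps stream
```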

Fortunately, we always have the option of choosing a different codec that can compress digital data (compared to PCM) based on some helpful observations on the behavior of sound waves. But in this case, you have to make a compromise: all alternative algorithms are associated with “losses”, since it is impossible to completely restore the original signal, but nevertheless the result is so good that most users will not be able to notice the difference.

MP3 is an audio encoding format that uses a lossy data compression algorithm to store the audio signal in smaller files. MP3 is the codec most widely used for recording and storing music files. We recommend using MP3 to stream audio content, as it requires less network bandwidth.

AAC is a newer audio encoding algorithm that is the successor to MP3. AAC has become the standard for MPEG-2 and MPEG-4 formats. In fact, this is also a digital data compression codec, but with less quality loss than MP3, when encoded with the same bit rate. We recommend using this codec for online streaming.

Sampling frequency (kHz)
Sample rate (or sampling frequency): the frequency with which the signal is digitized, stored, processed, or converted from analog to digital. Time sampling means that the signal is represented by a series of its samples taken at regular intervals.

It is measured in hertz (Hz) or kilohertz (kHz); 1 kHz equals 1000 Hz. For example, 44100 samples per second can be written as 44100 Hz or 44.1 kHz. The selected sample rate determines the maximum frequency that can be reproduced: as follows from Kotelnikov’s theorem, to fully restore the original signal, the sample rate must be at least twice the highest frequency in the signal spectrum.

As you know, the human ear can pick up frequencies between 20 Hz and 20 kHz. Given these parameters and the values shown in the table below, you can understand why 44.1 kHz was chosen as the sampling frequency for CD and is still considered a very good frequency for recording.

Sound file resolution. Audio encoding and processing


Digital audio

Basic concepts


The sampling frequency (f) determines the number of samples stored in 1 second: 1 Hz (one hertz) is one sample per second, and 8 kHz is 8000 samples per second.

The encoding depth (b) is the number of bits required to encode the level of each sample.

Memory capacity for storing 1 channel (mono): to store information about a sound with a duration of t seconds, encoded with a sampling rate of f Hz and an encoding depth of b bits, I = f · b · t bits of memory are required.

For 2-channel (stereo) recording, the amount of memory required to store data for one channel is multiplied by 2:

I = f · b · t · 2

Units of measurement: I in bits, b in bits, f in hertz, t in seconds. Common sampling frequencies: 44.1 kHz, 22.05 kHz, 11.025 kHz.
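The formula translates directly into code. In this sketch the function name and the `channels` parameter are illustrative, and the example values (one minute of 44.1 kHz, 16-bit stereo) are chosen only to show the order of magnitude.

```python
def audio_size_bits(f_hz: int, b_bits: int, t_s: int, channels: int = 1) -> int:
    """I = f * b * t for mono, multiplied by the number of channels."""
    return f_hz * b_bits * t_s * channels


size = audio_size_bits(44_100, 16, 60, channels=2)
print(size)                              # 84672000 bits
print(size // 8)                         # 10584000 bytes
print(round(size / 8 / 1024 / 1024, 2))  # 10.09 (megabytes of 1024 * 1024 bytes)
```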

Audio encoding
Basic theoretical provisions

Sound time sampling. In order for a computer to process sound, a continuous audio signal must be converted to a discrete digital form using time sampling. A continuous sound wave is divided into separate small time sections, for each section a certain value of sound intensity is set.

Therefore, the continuous dependence of the loudness of the sound on time, A(t), is replaced by a discrete sequence of loudness levels. On a graph, this looks like replacing a smooth curve with a sequence of “steps.”

Sampling frequency. A microphone connected to the sound card is used to record analog audio and convert it to digital format. The quality of the digital sound obtained depends on the number of measurements of the sound volume level per unit of time, that is, on the sampling rate. The more measurements are made in 1 second (the higher the sampling frequency), the more accurately the “ladder” of the digital audio signal follows the curve of the analog signal.

Audio sample rate is the number of measurements of the volume of a sound per second, measured in hertz (Hz). Let us denote the sampling frequency with the letter f.

The audio sample rate can vary between 8000 and 48000 sound volume measurements per second. Three frequencies commonly chosen for encoding are 44.1 kHz, 22.05 kHz and 11.025 kHz.

Audio encoding depth. Each “step” is assigned a specific value for the sound volume level. Loudness levels can be seen as a set of N possible states, and encoding them requires a certain amount of information b, which is called the audio encoding depth.

Audio encoding depth is the amount of information required to encode the discrete volume levels of digital audio.

If the encoding depth is known, then the number of digital audio loudness levels can be calculated using the formula N = 2^b. Let the audio encoding depth be 16 bits; then the number of sound volume levels is:

N = 2^b = 2^16 = 65,536.

During the encoding process, each sound volume level is assigned its own 16-bit binary code, the lowest sound level will correspond to the code 0000000000000000 and the highest – 1111111111111111.
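The level count and the binary codes quoted above can be reproduced in a couple of lines; the function names are illustrative.

```python
def quantization_levels(b: int) -> int:
    """N = 2^b distinct volume levels for an encoding depth of b bits."""
    return 2 ** b


def level_code(level: int, b: int) -> str:
    """Binary code word of a level, zero-padded to b bits."""
    return format(level, f"0{b}b")


n = quantization_levels(16)
print(n)                      # 65536
print(level_code(0, 16))      # 0000000000000000
print(level_code(n - 1, 16))  # 1111111111111111
```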

The quality of digitized sound. The higher the sampling frequency and the encoding depth, the better the digitized sound. The lowest quality, corresponding to the quality of telephone communication, is obtained at a sampling rate of 8000 times per second, an encoding depth of 8 bits, and a single audio track (“mono” mode). Within the range considered here, the highest quality, comparable to that of an audio CD (which itself uses 44,100 samples per second), is achieved with a sampling rate of 48,000 times per second, an encoding depth of 16 bits and two audio tracks (stereo mode).

Video codecs and containers.


This article exists so that I can point to it those who are trying to “convert” something without understanding what they are doing and why.


To work as efficiently as possible with any object, you need to understand how it works. If the video file is for you a mysterious black box, inside which mysterious things happen, perhaps not without the help of black magic, then your effectiveness will be minimal.

So. All information on the computer is in the form of files. This, I hope, is not a surprise to anyone. Here we will start from this basic concept.

Any video file is a container. A container is a repository of content, and containers that can hold several kinds of content are container formats. A bento box is an example of a container: you can put sushi or tempura in it. What can you put in a video container? Well, at least one picture stream and one sound stream. This is the minimum set without which there is nothing to watch. What can you put in at most? The modern Matroska container (its name comes from the matryoshka nesting doll) lets you put in several video and audio tracks, text and graphic subtitles, the fonts to display them, images and I don’t know what else.

Going back to the bento box example, note that miso soup cannot be poured into it: it will leak out. Not all containers can accept all streams. There are compatibility restrictions that make life difficult.

Container examples: mpeg, avi, mkv, mp4, ogm, vob, mov, rm, divx, asf. You don’t have to look closely at the list to understand that these are standard file extensions. Of course. Because file = container.

Streams or tracks are stored inside the container. The format of a stream is defined by its codec. This difference must be understood with particular clarity: the container is the file format, and the codec is the format of a stream it contains. They are two independent things. Yes, there are some inextricably linked containers and codecs. For example, the Real Media container can only store RealVideo and RealAudio streams, and vice versa, these formats cannot be stored in any other container (almost, as I have already been corrected). But they are still different concepts that should not be confused.
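Since a file extension can be changed or simply wrong, software usually recognizes the container by the first bytes of the file rather than by its name. The sketch below checks a few well-known signatures (RIFF for AVI and WAV, the EBML header for Matroska, the `ftyp` box for the MP4 family, `OggS` for Ogg). The function name is made up for illustration, the list is far from complete, and it says nothing about which codecs the streams inside use.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the leading bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"\x1a\x45\xdf\xa3":   # EBML header (Matroska, also WebM)
        return "matroska"
    if header[4:8] == b"ftyp":               # ISO base media file (MP4 family)
        return "mp4"
    if header[:4] == b"OggS":
        return "ogg"
    return "unknown"


# Synthetic headers, used here instead of real files.
print(sniff_container(b"RIFF\x00\x00\x00\x00AVI LIST"))   # avi
print(sniff_container(b"\x00\x00\x00\x18ftypisom"))       # mp4
print(sniff_container(b"not a media file"))               # unknown
```

With a real file, the first twelve bytes read in binary mode are enough for these checks.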

The codec concept usually includes the following aspects:
1) The actual data storage format.
2) Software that allows you to encode information in this format and / or decode it from it.

Examples of video codecs: divx, xvid, avc, x264, vp6, vp7, mpeg-1, mpeg-2, huffyuv.
Examples of audio codecs: mp3, ogg, ac3, aac.

While containers are generally distinguished by file extensions, codecs are distinguished by the four-character FourCC code.

The codec concept is usually associated with a kind of compression. Raw (uncompressed) streams also have their own formats, but they do not require decoding, and therefore the concept of codec is generally not applied to them.

Now let’s take a look at the most popular containers, codecs, and related issues. As a general rule, the problems we have are of two types: related to reproduction and related to editing.

MPEG is one of the oldest containers. It can store only video in mpeg-1 format and audio in mp2 format, and, strictly speaking, with quite tight restrictions on the size of the image and the bitrate of the sound. Due to the age and primitiveness of the format, almost all players and editors understand it. But for the same reasons, it is now rarely encountered. Nobody needs it anymore.

AVI is also quite old, but it is still a very useful container. It is good because, again, all the players and all the editors understand it. Almost all mpeg-based formats fit into it, as well as many others. The following video formats do not fit into avi: avc (aka Nero AVC or Nero H.264), wmv below version 9, as well as oddities like RealVideo, which was originally designed to be incompatible with anything in the world. As for audio, supposedly anything fits except Ogg Vorbis.

OGM is where Ogg Vorbis goes, because the format was created on the basis of this very Ogg. At the moment, it has been practically ousted by Matroska, which can do the same, only better. It is also not compatible with any conventional software.

MKV (Matroska) is a nesting doll that can hold just about anything except flash video. But due to its complexity and versatility, about all you can do with it so far is mux it, watch it and demux it.

MP4 is essentially the modern MPEG. It only takes things that are compatible with the MPEG standard, but at the same time includes its latest updates.

Compressed audio encoding formats.


MP3 (more precisely, MPEG-1 Audio Layer III): needs no comment, it is compatible with everything everywhere. This “eternal” format has one drawback: only two channels, which limits its use in modern home cinema systems.
Multi-channel MP3 (5.1): MPEG-2 Audio Layer III.
WMA: Windows Media Audio, Microsoft’s formally better and more modern competitor to mp3. It is not used much, although it is widely supported by hardware.
OGG Vorbis is a better, modern mp3 competitor from the open source community. Free of license restrictions, it is used more and more frequently.
AAC: Advanced Audio Coding is Apple’s main audio format, built into all of its iPads, iPhones, iTunes, etc. Its main advantage is that it is technically more advanced than mp3, allowing sample rates of up to 96 kHz and theoretically a completely insane number of channels in one file, up to 48. It is also used in digital satellite radio. Like mp3, it is a lossy compressed format; the quality of 96 kbps AAC is comparable to that of 128 kbps mp3 (we are talking about two channels in both cases).
Dolby Digital (AC-3) is probably the most popular standard for digital audio in cinematography, because it appeared on the market as early as 1995. It exists in two versions: DD2.0 (for high-quality stereo sound) and DD5.1, with five full channels and one band-limited channel for the subwoofer. For obvious reasons, all players support it; the bitrate is up to 640 kbps.
Dolby Digital Plus, or E-AC-3, is an attempt to improve on the usual Dolby Digital, but decoders and receivers of the previous generation do not support tracks in this format because the changes are radical: the number of channels increased to 7.1 and the bit rate to 1.7 Mbps. This will not go through S/PDIF (when transmitting over such a cable, you will have to downmix to DD5.1 or DTS with a loss of quality), but HDMI copes with Dolby Digital Plus as of version 1.3. You can find such tracks on Blu-ray discs.
Dolby TrueHD offers 8 losslessly compressed channels at 96 kHz / 24 bits or 6 at 192 kHz / 24 bits. The total bit rate reaches 18 Mbit/s, which requires either decoding in the player and transmission to the receiver over the analog path, or HDMI 1.3 or higher. For Blu-ray, this audio coding system is optional.
DTS is a lossy digital audio coding system for cinemas, which later appeared on DVD. It is analogous to Dolby Digital 5.1 but somewhat more flexible: in addition to 2.0 and 5.1 it allows other layouts, such as 4.0 and 4.1, and there is a choice between two fixed bit rates of 1500 kbps and 750 kbps. In the first case, DTS clearly outperforms Dolby Digital in sound quality; in the second, the difference between the systems is debatable.
DTS-HD is a further evolution of DTS: the number of channels has been brought to 7.1 in 96 kHz / 24 bit mode, and the bit rate can be selected between 6 Mbps and 3 Mbps. It is an optional audio format for Blu-ray. The situation with sound transmission to the receiver is almost the same as with Dolby TrueHD.
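The bit rates quoted in the list translate into file or track sizes by simple arithmetic: size in bytes = bit rate × duration / 8. The sketch below assumes a 4-minute track (an arbitrary duration), treats 1 kbps as 1000 bits per second, and ignores container overhead; the function name is illustrative.

```python
def stream_size_bytes(bit_rate_kbps: int, duration_s: int) -> int:
    """Approximate size of a constant-bit-rate stream, ignoring container overhead."""
    return bit_rate_kbps * 1000 * duration_s // 8


duration = 4 * 60  # seconds
for name, kbps in (("AAC", 96), ("mp3", 128), ("Dolby Digital", 640), ("DTS", 1500)):
    print(name, stream_size_bytes(kbps, duration))
# AAC 2880000
# mp3 3840000
# Dolby Digital 19200000
# DTS 45000000
```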

Lossless compressed or uncompressed audio encoding formats.

LPCM is simply uncompressed audio. It is usually stereo. It should not be confused with a WAV file: WAV is a container, and it may hold something other than PCM.
APE is a specific lossless audio compression format. Loved by audiophiles.
FLAC is its competitor and analog; the differences between them are beyond the scope of this review.
Lossless audio
Apple Lossless

Subtitle formats.
SRT: a text format that can be supplied as a separate file with the same name as the video. Compared to the first versions of this format, the styling possibilities have been significantly increased. It can also exist within MKV.
SUB / IDX is a graphic subtitle format extracted from DVD. It can be placed in MKV or MP4.
s2k, ssa, ass: some more advanced text formats; ass can be placed inside MKV.
smi is a text format based on SGML, the direct ancestor of HTML.
PGS is a graphic subtitle format, the main one for Blu-ray, but it can also exist in ts and MKV containers.

Audio encoding and processing


Sound information. Sound is a wave that travels through air, water, or other medium with a continuously changing intensity and frequency.


A person perceives sound waves (air vibrations) through hearing as sound of different volume and pitch. The higher the intensity of the sound wave, the louder the sound; the higher the frequency of the wave, the higher the pitch of the sound.

The human ear perceives sound at frequencies from 20 vibrations per second (low sound) to 20,000 vibrations per second (high sound).

A person can perceive sound in a wide range of intensities, in which the maximum intensity is 10^14 times greater than the minimum (one hundred thousand billion times). To measure the volume of sound, a special unit, the decibel (dB), is used (Table 5.1). Decreasing or increasing the sound volume by 10 dB corresponds to a decrease or increase in sound intensity by 10 times.

Table 5.1. Sound volume

Sound                                    Volume in decibels
Lower limit of human ear sensitivity     0
Rustling leaves                          10
Conversation                             60
Horn                                     90
Jet engine                               120
Pain threshold                           140

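
The relation between decibels and intensity ratios can be written as dB = 10·log10(ratio), which matches the rule that 10 dB corresponds to a tenfold change in intensity. A short sketch, with illustrative function names:

```python
import math


def intensity_ratio_to_db(ratio: float) -> float:
    """Convert an intensity ratio to decibels."""
    return 10 * math.log10(ratio)


def db_to_intensity_ratio(db: float) -> float:
    """Convert decibels back to an intensity ratio."""
    return 10 ** (db / 10)


print(round(intensity_ratio_to_db(10), 6))     # 10.0
print(round(intensity_ratio_to_db(1e14), 6))   # 140.0, the pain threshold in the table
print(db_to_intensity_ratio(60))               # a conversation is a million times the threshold intensity
```

The full range of 10^14 in intensity thus fits into the 0 to 140 dB scale of the table.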
Sound time sampling. In order for a computer to process sound, a continuous audio signal must be converted to a discrete digital form using time sampling. A continuous sound wave is divided into separate small time sections, for each section a certain value of sound intensity is set.

Sampling frequency. A microphone connected to the sound card is used to record analog sound and convert it to digital format. The quality of the digital sound obtained depends on the number of measurements of the sound volume level per unit of time, that is, on the sampling frequency. The more measurements are made in 1 second (the higher the sampling frequency), the more accurately the “ladder” of the digital audio signal follows the curve of the analog signal.

Audio sample rate is the number of audio volume measurements in one second.

The audio sample rate can vary between 8000 and 48000 sound volume measurements per second.

Audio encoding depth. Each “step” is assigned a specific value for the sound volume level. Loudness levels can be viewed as a set of N possible states, and encoding them requires a certain amount of information I, which is called the audio encoding depth.

Audio encoding depth is the amount of information required to encode the discrete volume levels of digital audio.

If the encoding depth is known, the number of digital audio volume levels can be calculated using the formula N = 2^I. Let the audio encoding depth be 16 bits; then the number of sound volume levels is:

N = 2^I = 2^16 = 65,536.

During the encoding process, each sound volume level is assigned its own 16-bit binary code, the lowest sound level will correspond to the code 0000000000000000 and the highest – 1111111111111111.

It should be remembered that the higher the quality of the digital sound, the greater the volume of information in the audio file. It is possible to estimate the volume of information of a digital stereo sound file with a duration of 1 second with an average sound quality (16 bits, 24,000 measurements per second). To do this, the encoding depth must be multiplied by the number of measurements in 1 second and multiplied by 2 (stereo sound):

16 bits × 24,000 × 2 = 768,000 bits = 96,000 bytes = 93.75 KB.

Sound editors. Sound editors allow you not only to record and play sound, but also to edit it.

Audio encoding: secrets revealed


Audio settings for video capture and transmission.
As people directly connected to the AV sphere, we constantly talk about audio coding and audio codecs, but what is it? An audio codec is essentially a device or algorithm that can encode and decode a digital audio signal.

Audio Encoding

In practice, the audio waves that are transmitted over the air are continuous analog signals. Signals are converted to digital format by a device called an analog-to-digital converter (ADC), and the reverse conversion is done by a digital-to-analog converter (DAC). The codec sits between these two functions, and it lets you adjust some important parameters for the successful capture, recording and transmission of an audio signal: codec algorithm, sample rate, bit depth and data rate.

The three most popular audio codecs are Pulse-Code Modulation (PCM), MP3, and Advanced Audio Coding (AAC). The choice of codec determines the compression ratio and the recording quality. PCM is a codec used by computers, CDs, digital phones, and sometimes SACD. In PCM, the source signal is sampled at regular intervals, and each sample stores the amplitude of the analog signal as a number. PCM is the simplest option for digitizing an analog signal.
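To make the PCM description concrete, here is a minimal sketch that samples a test tone at regular intervals, stores each sample as a 16-bit signed integer, and wraps the result in a WAV container using Python's standard `wave` module. The function name `tone_to_wav_bytes`, the 440 Hz tone, and the mono layout are assumptions chosen for illustration.

```python
import io
import math
import struct
import wave

def tone_to_wav_bytes(freq_hz=440, sample_rate_hz=44100, duration_s=0.1):
    """Encode a sine tone as 16-bit mono PCM and return the WAV file contents."""
    n_samples = round(sample_rate_hz * duration_s)
    frames = bytearray()
    for i in range(n_samples):
        amplitude = math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
        # Each sample becomes one little-endian signed 16-bit number.
        frames += struct.pack("<h", round(amplitude * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        wav.writeframes(bytes(frames))
    return buf.getvalue()

data = tone_to_wav_bytes()
print(len(data), data[:4])
```

Nothing is compressed here: the file is essentially a short header followed by the raw sample values, which is why PCM files are large.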

Fortunately, we always have the option of choosing a codec other than PCM that compresses the digital data based on some helpful observations about the behavior of sound waves and human hearing. But in this case you have to make a compromise: the two alternatives mentioned above, MP3 and AAC, are “lossy”, meaning it is impossible to completely restore the original signal. Nevertheless, the result is so good that most users will not be able to notice the difference.

AAC is a newer audio encoding algorithm that is the successor to MP3. AAC is standardized as part of both MPEG-2 and MPEG-4. It is also a lossy compression codec, but with less quality loss than MP3 when encoded at the same bit rate. We recommend using this codec for online streaming.

Sampling frequency (kHz)
Sample rate (or sampling frequency): the frequency with which the signal is digitized, stored, processed or converted from analog to digital. Time sampling means that the signal is represented by a series of samples taken at regular intervals.

It is measured in hertz (Hz) or kilohertz (kHz); 1 kHz equals 1000 Hz. For example, 44,100 samples per second can be labeled 44,100 Hz or 44.1 kHz. The selected sample rate determines the maximum frequency that can be reproduced: as follows from Kotelnikov’s theorem, to fully restore the original signal, the sample rate must be at least twice the highest frequency in the signal spectrum.
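The limit set by the sampling theorem can be sketched in a few lines. The helper names `nyquist_hz` and `alias_hz` are assumptions for this example, and the folding formula applies to a single pure tone sampled without an anti-aliasing filter.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2

def alias_hz(freq_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling (spectral folding)."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_hz(44100))       # 22050.0
print(alias_hz(1000, 44100))   # 1000, below the limit and preserved
print(alias_hz(30000, 44100))  # 14100, above the limit and folded down
```

A tone above half the sample rate is not simply lost; it shows up as a different, lower frequency, which is why converters filter the input before sampling.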

As you know, the human ear is capable of picking up frequencies between 20 Hz and 20 kHz. Given these parameters and the values shown in the table below, you can understand why 44.1 kHz was chosen as the sampling frequency for CD and is still considered a very good frequency for recording: it is slightly more than twice the 20 kHz upper limit of hearing.

There are several reasons for choosing a higher sample rate, although it may seem like a waste of time and effort to reproduce sound outside the range of human hearing. At the same time, 44.1 to 48 kHz is enough to give the average listener high-quality results in most situations.

Bit depth
Along with the sample rate, there is the bit depth of the sound. Bit depth is the number of bits of digital information used to encode each sample. Simply put, bit depth determines the “accuracy” of the input signal measurement. The larger the bit depth, the smaller the error for each individual conversion from the magnitude of an electrical signal to a number and vice versa.
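The effect of bit depth on conversion error can be checked numerically. In this sketch the names `quantize_roundtrip` and `max_error` are hypothetical, the signal is assumed to be normalized to the range -1.0 to 1.0, and simple rounding stands in for a real converter.

```python
def quantize_roundtrip(x, bit_depth):
    """Convert a value in [-1.0, 1.0] to a signed integer code and back."""
    max_code = 2 ** (bit_depth - 1) - 1
    code = round(x * max_code)
    return code / max_code

def max_error(bit_depth, n_points=1000):
    """Largest round-trip error over evenly spaced test values."""
    xs = [-1 + 2 * i / (n_points - 1) for i in range(n_points)]
    return max(abs(x - quantize_roundtrip(x, bit_depth)) for x in xs)

for depth in (8, 16, 24):
    print(depth, max_error(depth))
```

Each additional bit halves the step between neighboring levels, so the worst-case error shrinks accordingly.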

Most popular audio formats

There is a huge number of audio formats. The most common are MP3 (MPEG-1/MPEG-2 Audio Layer III) and WAV. Usually, the type of format corresponds to the file extension (the letters of the file name after the period, for example .mp3, .wav, .ogg, .wma).

A codec is an algorithm for encoding and compressing data in an audio format. Some file types are assigned a specific codec. For example, the MP3 format always uses the MPEG Layer-3 codec, while the MP4 format can use a range of different codecs.

The notions of codec and format are often used interchangeably, especially when a format always uses a single codec. However, it is necessary to understand the difference between a format and a codec. In simple terms, a format can be compared to a container in which a sound or a video signal that uses a particular codec can be stored.

Some formats, such as MP4 or FLV, can store both audio and video sequences.

If you don’t know which program to use to open a particular audio format, we recommend that you use our audio converter. It is compatible with almost all existing formats.

Depending on the type of compression, two types of codecs can be distinguished:

Lossless Codecs

This group of formats records and compresses a sound in such a way that it allows the preservation of its exact original quality when decoded.

The most common lossless coding formats are:

FLAC (Free Lossless Audio Codec),
APE (Monkey’s Audio),
ALAC (Apple Lossless Audio Codec).
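The defining property of a lossless codec is that decoding returns exactly the data that was encoded. The sketch below demonstrates that property with `zlib` from the Python standard library. Note that `zlib` is a general-purpose compressor used here only as a stand-in; it is not an audio codec and does not represent how FLAC, APE or ALAC work internally.

```python
import math
import struct
import zlib

# 0.1 s of a 440 Hz tone as 16-bit PCM samples (illustrative test data).
samples = [round(32767 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(4410)]
pcm = struct.pack("<%dh" % len(samples), *samples)

compressed = zlib.compress(pcm)
restored = zlib.decompress(compressed)

print(len(pcm), len(compressed))
print(restored == pcm)  # True: every byte of the original is recovered
```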

Lossy Codecs

With lossy compression, the sound undergoes some modification. For example, compression removes sound frequencies that are inaudible to the human ear. When decoded, the file differs from the original in terms of the information stored in it, but it sounds almost the same.
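The idea that a lossy round trip returns something close to, but not identical to, the original can be shown with a deliberately crude example. The functions `lossy_encode` and `lossy_decode` are hypothetical and simply discard the lower 8 bits of each 16-bit sample; real lossy codecs such as MP3 or AAC rely on models of human hearing rather than this shortcut.

```python
def lossy_encode(samples16):
    """Keep only the upper 8 bits of each 16-bit sample (halves the data)."""
    return [s >> 8 for s in samples16]

def lossy_decode(codes8):
    """Scale the 8-bit codes back to the 16-bit range; the dropped bits are gone."""
    return [c << 8 for c in codes8]

original = [0, 1000, -1000, 32767, -32768]
decoded = lossy_decode(lossy_encode(original))

print(decoded)              # [0, 768, -1024, 32512, -32768]
print(decoded == original)  # False: some information is lost for good
```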

Some of the most common lossy formats are:

MP3
WMA
OGG
AAC

WAV is one of the first audio formats. It is mainly used to store uncompressed audio tracks (PCM) that are identical to audio CDs in terms of quality. On average, a minute of WAV format sound requires about 10 megabytes of storage. CDs are usually ripped to WAV format and can then be converted to MP3 with an audio converter.
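The figure of about 10 megabytes per minute can be checked with simple arithmetic, assuming CD-quality settings of 44,100 samples per second, 16 bits per sample, and two channels. The variable names are chosen for this sketch only.

```python
SAMPLE_RATE_HZ = 44100   # CD sampling frequency
BYTES_PER_SAMPLE = 2     # 16 bits
CHANNELS = 2             # stereo
SECONDS = 60

wav_minute_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * SECONDS
wav_minute_mb = wav_minute_bytes / (1024 * 1024)

print(wav_minute_bytes)         # 10584000
print(round(wav_minute_mb, 2))  # 10.09
```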

MP3 (MPEG Layer-3) is the most widespread sound format in the world. MP3, like many of the other lossy formats, reduces the file size by discarding sounds that are inaudible to the human ear. Currently, MP3 is not the best format in terms of file size for a given sound quality, but since it is the most widespread and compatible with most devices, many people prefer to save their files in this format.

WMA (Windows Media Audio) is a format owned by Microsoft Corporation. It was initially introduced as a replacement for the MP3 format, with claims of better compression. However, this claim has been disputed by some independent tests. In addition, the WMA format supports content protection through DRM.

OGG is an open format that supports audio coding by several codecs. The Vorbis codec is the most commonly used in OGG. The quality of compression can be compared to the MP3 format, but it is less widespread in terms of compatibility with various audio players and devices.

AAC is a patented audio format that has greater capabilities (number of channels, sampling frequencies) than the MP3 format. Usually, it achieves better sound quality with the same file size. AAC is currently one of the lossy coding algorithms that offers the highest quality. A file encoded with this format can have the following extensions: .aac, .mp4, .m4a, .m4b, .m4p, .m4r.

FLAC is a common lossless format. It does not modify the audio sequence, and the sound encoded with this format is identical to the original. It is frequently used to reproduce sound in high-end audio systems. Its playback compatibility on devices and players is limited, so it is often converted to another format before being played.