Digital audio encoding


Computer audio encoding rests on converting air vibrations into fluctuations of electrical current and then sampling the resulting analog electrical signal.

Audio is encoded and played back by dedicated software. The quality of the reproduced sound depends on the sampling frequency and on the resolution (the encoding depth, which sets the number of quantization levels).

Digital audio is an analog audio signal represented by discrete numerical values ​​of its amplitude.

Sound digitization measures the signal at fixed time steps and records the values obtained in numerical form. Another name for it is analog-to-digital conversion, which includes the following operations:

Bandwidth limiting with a low-pass filter, to suppress spectral components above half the sample rate.

Time sampling, that is, replacing the continuous analog signal with a sequence of its values at discrete moments in time: samples.

Level quantization, that is, replacing each sample value with the closest of a set of fixed values: the quantization levels.

Encoding, as a result of which each quantized sample is represented as a number equal to the ordinal number of its quantization level.

This is done as follows: the continuous analog signal is “cut” into sections at the sample rate, giving a discrete signal; this signal is quantized with a certain bit depth and then encoded, that is, replaced by a sequence of code symbols. To record sound in the 20-20,000 Hz band, a sampling frequency of 44.1 kHz or higher is required (today there are ADCs and DACs with sampling frequencies of 192 and even 384 kHz). For a high-quality recording 16 bits are sufficient; to extend the dynamic range and improve recording quality, 24 (less often 32) bits are used.
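The sampling, quantization, and encoding steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is an ideal function of time with values between -1 and 1, the low-pass filtering step is skipped, and the names `digitize` and `tone_1khz` are hypothetical, not taken from any real library.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bit_depth):
    """Sample a continuous signal and return one binary code word per sample.

    `signal` is a function of time (seconds) returning a value in [-1.0, 1.0].
    """
    levels = 2 ** bit_depth                      # number of quantization levels
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # time sampling
        x = max(-1.0, min(1.0, signal(t)))       # clip to the allowed range
        # level quantization: map [-1, 1] onto level numbers 0 .. levels-1
        level = min(levels - 1, int((x + 1.0) / 2.0 * levels))
        # encoding: write the level number as a fixed-width binary code
        codes.append(format(level, f"0{bit_depth}b"))
    return codes

def tone_1khz(t):
    """A 1 kHz sine wave, used here as a stand-in for a microphone signal."""
    return math.sin(2 * math.pi * 1000 * t)

# One millisecond of a 1 kHz tone, sampled at 8 kHz with 4 bits per sample.
codes = digitize(tone_1khz, 0.001, 8000, 4)
print(codes)
```

With only 4 bits there are 16 levels, so the staircase is coarse; raising `bit_depth` to 16 gives the 65,536 levels used on audio CDs.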

Sound coding methods (more precisely, methods for coding the electrical signal coming from a microphone) rest on the fact that, in theory, any complex sound can be decomposed into a set of simpler harmonic signals of different frequencies, each of which is a sinusoid. This set is called the spectrum of the original signal. The task of encoding sound, like any other analog signal, is to represent it as another analog or digital signal that is more convenient to transmit or store in the case at hand. Real sound sources have a limited spectrum width, so encoding uses transformations that turn the original signal into one whose spectrum better suits the chosen transmission channel.

Representing an analog signal as another analog signal is commonly called modulation, and representing it digitally is called encoding, although the division is rather arbitrary. An analog signal can be carried by a harmonic signal (a sinusoid) whose parameters change with the value of the original signal. If the amplitude of the sinusoid follows the original signal, this is amplitude modulation (AM). If the frequency or phase of the sinusoid follows it, this is frequency modulation (FM) or phase modulation (PM). Amplitude and frequency modulation are widely used, for example, to transmit sound by radio. These types of modulation do not, of course, decompose the original signal into harmonics.

The development of digital technology, and of computer processing and storage, has led to the widespread use of pulse encoding, or pulse modulation, methods. One example is pulse code modulation (PCM), in which the value of the original signal is recorded in coded form at regular intervals. The vast majority of “computer sound” is exactly this: the binary code of the signal value recorded at short, equal time intervals set by the sampling frequency. For storage and transmission over communication channels, this signal is usually compressed, reducing its size by discarding redundant or insignificant information. Besides pulse code modulation, other types of pulse modulation (pulse width, pulse frequency, and so on) are also used to encode sound.


Audio encoding.

Digital audio is an analog audio signal represented by discrete numerical values ​​of its amplitude.

Sound digitization measures the signal at fixed time steps and records the values obtained in numerical form.

Another name for digitizing audio is analog-to-digital conversion.

Sound digitization involves two processes:

sampling the signal in time;
quantizing its amplitude.
Discretization of time.

Time sampling is the process of taking values of the signal being converted at a fixed time step, the sampling step. The number of measurements of the signal made in one second is called the sampling frequency or sampling rate. The smaller the sampling step, the higher the sampling frequency and the more accurate the representation of the signal.

This is confirmed by Kotelnikov’s theorem (known outside Russia as Shannon’s sampling theorem). According to it, an analog signal with a limited spectrum can be described exactly by a discrete sequence of its amplitude values if those values are taken at a frequency more than twice the highest frequency in the signal’s spectrum. That is, an analog signal whose highest spectral frequency is $F_m$ can be represented exactly by a sequence of discrete amplitude values if the sampling frequency $F_d$ satisfies $F_d > 2F_m$.

In practice, this means that for the digitized signal to carry the full audible frequency range of the original analog signal (0 – 20 kHz), the chosen sample rate must be at least 40 kHz.
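To see why the sample rate matters, it helps to compute what happens to a tone that lies above half the sample rate: after sampling it becomes indistinguishable from a lower tone (aliasing). The sketch below assumes ideal sampling of a pure sine wave; the function names are illustrative only.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2

def apparent_frequency(f_signal_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after ideal sampling."""
    f = f_signal_hz % sample_rate_hz
    # Tones above half the sample rate fold back into the lower half.
    return min(f, sample_rate_hz - f)

# At 40 kHz, a 15 kHz tone is preserved,
# but a 25 kHz tone shows up as 15 kHz as well.
print(nyquist_limit(40000))
print(apparent_frequency(15000, 40000))
print(apparent_frequency(25000, 40000))
```

This folding is the reason the band is limited with a low-pass filter before sampling.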

The main difficulty of digitization is the inability to record the measured signal values ​​with perfect precision.

Analog to digital converters (ADC).

The above process of digitizing sound is done using analog-to-digital converters (ADCs).

This transformation includes the following operations:

Bandwidth limiting with a low-pass filter, to suppress spectral components above half the sample rate.
Time sampling, that is, replacing the continuous analog signal with a sequence of its values at discrete moments in time: samples. This is done by a special circuit at the ADC input, a sample-and-hold device.
Level quantization, that is, replacing each sample value with the closest of a set of fixed values: the quantization levels.
Encoding, as a result of which each quantized sample is represented as a number equal to the ordinal number of its quantization level.
This is done as follows: the continuous analog signal is “cut” into sections at the sample rate, giving a discrete signal; this signal is quantized with a certain bit depth and then encoded, that is, replaced by a sequence of code symbols. To record sound in the 20-20,000 Hz band, a sampling frequency of 44.1 kHz or higher is required (today there are ADCs and DACs with sampling frequencies of 192 and even 384 kHz). For a high-quality recording 16 bits are sufficient; to extend the dynamic range and improve recording quality, 24 (less often 32) bits are used.

Encoding methods.

Frequency modulation.

Sound coding methods (we mean, of course, coding of the electrical signal coming from the microphone) rest on the fact that, in theory, any complex sound can be broken down into a set of the simplest harmonic signals of different frequencies, each of which is a sinusoid; this set is called the spectrum of the original signal. The task of encoding sound, like any other analog signal, is to represent it as another analog or digital signal that is more convenient to transmit or store in the case at hand.

Methods used to compress digital audio.

Information compression methods when working with sound.

The larger the memory of a wavetable (WT) card, the more realistic the sound, because more samples can be stored and at a higher resolution. The General MIDI standard describes more than 200 instruments; storing their sound samples (tables) requires at least 8 MB of memory (at least 20 KB for each sample).

The WF (Wave Form) method of sound generation is also known. It is based on converting sounds into complex mathematical formulas and then using these formulas to drive a powerful processor that reproduces the sound. WF synthesis is expected to render musical instruments even more realistically than FM and WT technologies while keeping sound files small.

To reduce the data stream, encoding methods other than plain PCM are used. For example, a coding technique that relies on known characteristics of the analog signal can significantly reduce the amount of data stored: with so-called μ-law encoding, the analog signal is converted into a digital code determined by the logarithm of the signal magnitude rather than by a linear mapping. The disadvantage of this method is that it requires a priori information about the characteristics of the original signal.
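A logarithmic code of this kind can be illustrated with the μ-law curve, one well-known logarithmic companding scheme. The sketch below is an assumption-laden illustration rather than a telephony-grade codec: it uses the continuous μ-law formula with μ = 255 and a plain uniform 8-bit quantizer, and all function names are made up for this example.

```python
import math

MU = 255  # assumption: the value commonly used with 8-bit mu-law

def compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1] on a logarithmic scale."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def expand(y, mu=MU):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode(x, bits=8):
    """Compress, then quantize uniformly to an integer code."""
    levels = 2 ** bits
    return min(levels - 1, int((compress(x) + 1) / 2 * levels))

def decode(code, bits=8):
    """Take the middle of the quantization step, then expand."""
    levels = 2 ** bits
    return expand((code + 0.5) / levels * 2 - 1)

# A quiet sample survives 8-bit coding with a small error,
# because the logarithmic scale spends more levels on low amplitudes.
quiet = 0.01
print(encode(quiet), decode(encode(quiet)))
```

The design trade-off is that loud samples are coded more coarsely than they would be with a linear 8-bit scale.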

Conversion methods are also known that do not require a priori information about the original signal. With differential pulse code modulation (DPCM), only the difference between the current and the previous signal level is stored; the difference needs fewer bits than the full amplitude value. With delta modulation (DM), each sample consists of a single bit that gives the sign of the change in the original signal (increase or decrease); delta modulation requires a higher sample rate. Differential methods accumulate errors over time, so special measures are taken to recalibrate the ADC periodically.
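Both differential schemes are easy to sketch. The example below is a simplified illustration: the DPCM version stores exact integer differences (real DPCM also quantizes them, which is where the accumulated error comes from), and the delta modulator uses a fixed step. All names are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample with its difference from the previous one."""
    prev = 0
    diffs = []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by summing the differences."""
    total = 0
    out = []
    for d in diffs:
        total += d
        out.append(total)
    return out

def delta_encode(samples, step):
    """One bit per sample: 1 if the signal is above the running estimate."""
    estimate = 0
    bits = []
    for s in samples:
        bit = 1 if s > estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_decode(bits, step):
    """Follow the bits up or down by one step each."""
    estimate = 0
    out = []
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

samples = [100, 104, 109, 111, 108]
print(dpcm_encode(samples))              # small numbers after the first one
print(delta_encode([1, 2, 3, 2, 1], 1))  # a rising then falling signal
```

The delta decoder can only move one step per sample, which is why delta modulation needs a higher sample rate to follow fast changes.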

The most common method for recording audio is adaptive differential pulse code modulation (ADPCM), which uses 8- or 4-bit coding for the difference signals. The technology was first applied by Creative Labs and provides data compression of up to 4:1.

However, other (software) methods of compressing and decompressing audio are often used. The most popular of them lately is the MP3 format, developed by Fraunhofer IIS (Fraunhofer-Institut für Integrierte Schaltungen, www.iis.fhg.de) and THOMSON (the full specification of the MP3 format is published at www.mp3tech.org). The full name of the standard is MPEG Audio Layer 3, where MPEG stands for Moving Picture Experts Group; it should not be confused with MPEG-3, a standard designed for high-definition television.

MP3 encodes data in independent blocks called frames. During encoding, the original signal is divided into parts of equal length, the frames, which are encoded separately (Huffman coding is then applied to reduce the amount of data further). During decoding, the signal is rebuilt from the sequence of decoded frames. Encoding takes a significant amount of time; decoding (during playback) is done on the fly.
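The framing step alone can be sketched as follows. This shows only how a signal is cut into equal-length pieces and reassembled; it does not perform any MP3 compression, and the helper names are invented for the example. An MPEG-1 Layer III frame covers 1152 samples; the toy example uses a much smaller frame so the output is readable.

```python
def split_into_frames(samples, frame_len):
    """Cut a list of samples into consecutive frames of frame_len samples.

    The last frame may be shorter; a real encoder would pad it.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples), frame_len)]

def join_frames(frames):
    """Rebuild the signal from a sequence of frames."""
    out = []
    for frame in frames:
        out.extend(frame)
    return out

samples = list(range(10))
frames = split_into_frames(samples, 4)
print(frames)
print(join_frames(frames) == samples)
```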

The MP3 format achieves good sound quality at a small file size. It does so by exploiting the peculiarities of human hearing, including masking: a weak signal in one frequency range is hidden by a stronger signal in an adjacent range, or by a strong signal in the previous frame that temporarily reduces the ear’s sensitivity to the current frame. In other words, quiet sounds that the ear cannot hear because a louder sound is present at that moment or just before it are discarded. The format also takes into account that most people cannot distinguish signals below a certain power level, which differs between frequency ranges. This process is called adaptive coding, and it preserves above all the sound details that matter to human perception. The compression ratio (and hence the quality) is determined not by the MP3 format itself but by the bit rate chosen during encoding.

Audio encoding and processing. Audio encoding

There are three main types of audio formats:

uncompressed formats;
lossy formats, which use lossy compression;
lossless formats, which use lossless compression.
Lossy compression is a technology that makes the encoded file significantly smaller than the original by removing information that the human ear does not perceive.

The downside of this technology is that the compressed file will never be identical to the original.

Lossless compressed audio formats include:

FLAC (Free Lossless Audio Codec)
APE (Monkey’s Audio)
WV (WavPack)
These formats can store CD audio as files with no loss of quality. As an example, you can take a CD, convert it to WAV, then WAV to FLAC, then go back from FLAC to WAV, and then burn it to a blank CD and you have an absolutely identical copy of your source.

What format does the music sound with the best quality?
The most popular is the lossless FLAC format, and one of the most widely used CD to FLAC conversion programs is EAC (Exact Audio Copy).

Of all the parameters of digital audio, it is necessary to pay attention first of all to the following indicators:

sampling rate (how finely the analog signal is digitized in time),
bit rate (the amount of information the file holds per second of audio).

The sample rate is the number of times per second that the audio signal is measured. The most common sample rate for quality audio formats is 44.1 kHz.

It is generally accepted that a high bit rate guarantees the best quality; this is true, but only if the source file is of good quality. A high-quality MP3 should have a bit rate of 320 kbps, but a high-quality FLAC format generally has a bit rate of 900 kbps or more.
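For comparison, the bit rate of uncompressed audio follows directly from the sample rate, bit depth, and number of channels. The short calculation below is a sketch with a hypothetical helper name; it shows why a lossless file at around 900 kbps is already compressed relative to CD audio, and how much further a 320 kbps MP3 goes.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

cd_kbps = pcm_bit_rate_kbps(44100, 16, 2)   # CD audio: 44.1 kHz, 16 bit, stereo
print(cd_kbps)                              # uncompressed CD bit rate
print(cd_kbps / 320)                        # how many times smaller a 320 kbps MP3 is
```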

What is the best quality music format?
In addition to the audio formats themselves, for high-quality music sound, high-quality reproduction equipment is also needed: speakers, amplifiers, headphones. In other words, if you use cheap desktop speakers and headphones, you won’t be able to fully enjoy high-quality sound and unleash the full potential of lossless formats.

Without going into technical details, the following formats can be recommended:

For listening at home, I recommend the best FLAC format in my opinion. For an audio player, the MP3 format with a bit rate of at least 320 kbps is a good solution. Personally, I only use the FLAC format on all devices, since the volume of the microSD cards allows you to store a sufficient amount of data on the player.

As for the equipment for high-quality music playback, I advise you to pay attention to the following brands:

If inexpensive acoustics do not suit you and you are a fan of high-quality sound equipment (Hi-Fi or Hi-End), then everything is in your hands and you are limited only by your budget, I will not give recommendations.

Audio encoding and processing.

Parameters that affect digital sound quality Minimum and maximum sound quality.

My grandfather was listening to a gramophone. My father’s youth turned to music coming from the speaker of a reel-to-reel tape recorder. The heyday and decline of cassette recorders fell upon my youth. My son is growing up in the age of digital audio. To keep up to date and give my son a good “sound”, I decided to find out what determines the quality of the digital audio signal reproduction.

I talked to my music-loving friends and searched the Internet. As a result, I came to the conclusion that high-quality sound can be achieved in the digital age by choosing the right 7 basic elements of a modern music system:

the format in which the music is recorded;
player;
digital to analog converter;
amplifier;
acoustics;
cables;
power supply.

Below I will share my observations and conclusions on achieving high quality sound recordings in digital formats.

A lyrical digression; experts can skip it.

In a nutshell, I will explain where digital sound comes from. During the recording process, the microphone converts mechanical vibrations (the sound itself) into an analog electrical signal. An analog signal is, in the most general case, similar to a sinusoid that has been familiar to all of us since high school. In the age of analog sound, it was this signal that was recorded on various media and then played back.

With the development of microprocessor technology, it became possible to record and store audio information in digital formats. These formats are obtained through an analog-to-digital conversion (ADC) process.

During analog-to-digital conversion, the analog signal (our high school sine wave) is made discrete (in other words, it is cut into pieces). In the next stage, the discrete signal is quantized, that is, each resulting segment of the sinusoid is assigned a numerical value. In the third stage, the quantized signal is encoded as a sequence of 0s and 1s. In digital sound recording, it is the information about the amplitude and frequency of the sound that is digitized.

To record and store digital audio information, digital audio formats are used. The audio format is understood as a set of requirements for the digital representation of audio data.

When it comes to sound quality, digital formats are divided into 3 categories:

Formats without additional compression (CDDA, DSD, WAV, AIFF, etc.);
Lossless compressed formats (FLAC, WavPack, ADX, etc.);
Lossy compression formats (MP3, AAC, RealAudio, etc.).

High-quality sound is obtained when playing music saved in formats of the first and second categories. In formats of the third category, part of the information is deliberately discarded to reduce the amount of data, for example information about latent frequencies.

Latent frequencies are those that are outside the range of perception of the average person: 20 Hz – 22 kHz. For audiophiles, this range is wider due to individual psychophysiological characteristics.

To complete your home audio library, you must select records saved in files with the following extensions:

*.wav, *.dff, *.dsf, *.aif, *.aiff are uncompressed sound files;
*.flac, *.ape, and, when they hold a lossless codec such as ALAC or WMA Lossless, *.mp4 and *.wma are the most common lossless compressed audio files.
From history. They say that the first experiments on the preservation of sound were carried out by the ancient Greeks. They tried to keep the sound in amphorae. It looked something like this: words were spoken into the amphora and it was quickly sealed. Unfortunately, none of those records have survived to this day.

Digital Audio – Quality Issues

Relatively recently, the concept of “multimedia” was included in our discourse, and now the computer is increasingly used as an entertainment center. Now the computer is forced to reproduce the sound that exists in it in the form of numbers.

Just as some connoisseurs of sound argue about the advantages of “tube” sound over “transistor” sound, there is an endless debate about which is better: digital or analog sound. Let’s try to figure it out.

For our ears, sound is air vibration with a frequency of 20 Hz to 20 kHz. The upper limit depends on age: in children it is 22-24 kHz, and in old age it drops to 8-12 kHz.

Frequencies below this range are perceived as vibration; frequencies above it are not perceived by a person at all.

However, the audible band is not used with equal intensity throughout: speech, for example, is clearly perceived in the range of 500 to 3500 Hz. For listening to music this is not enough. Ideally, the reproduced sound should not differ from the sound field at the microphone. That is, the recording and playback equipment must not introduce distortions within the limits of human perception.

During recording, the sound passes through the following chain: electromechanical conversion into an electrical signal by the microphone; amplification and processing of the analog electrical signal; analog-to-digital conversion; digital signal processing; frequency correction; and the recording itself.

The digitized sound is then stored and transmitted. During playback, the steps are: digital signal processing; digital-to-analog conversion; analog signal processing and amplification; and electromechanical conversion into sound vibrations.

All of these procedures introduce their own distortions. The process of recording and sound processing takes place, as a rule, on studio equipment, which performs much better than home audio equipment. Therefore, although there are distortions, they are significantly less than the distortions introduced by home equipment at the playback stage. With amateur sound recording, errors appear in the recording stages.

The electromechanical conversion produced by the studio microphone produces a very weak signal that needs amplification.

Even in the ideal conditions of a professional recording studio, due to acoustic noise, the dynamic range of recorded music can be narrower than that provided by 16-bit audio.

When recording from multiple microphones, the signal is necessarily processed: channel volume levels are selected, noise is filtered, etc. Furthermore, the dynamic range of the signal is reduced, which leads to a significant increase in noise. But without this procedure, it would sound unsatisfactory when playing back the recording on a home computer.

The sound path has its own distortions, which can be divided into three groups:

1. Linear distortions are caused by the amplitude-frequency characteristic of the sound path and are a change in the ratio of the amplitudes and phases of various frequency components. Frequencies that were originally missing from the signal do not appear.

2. Non-linear distortion: a change in the shape of the original signal, which leads to the appearance of frequencies that are absent in the incoming signal, but depend on it.

3. Interference: the appearance in the sound path of extraneous frequencies unrelated to the useful signal. Interference is caused, for example, by electromagnetic pickup or by the mains frequency leaking into the sound path.

However, all these distortions occur only in analog circuits (hence speculation about the frequency response of a digital output makes specialists smile). But don’t forget about surface defects of CDs, DVDs, and other optical media that store sound, which lead to data loss.

The digitization of the signal is also associated with a lot of distortion, but first let’s look at the difference between analog and digital signals.

In an analog signal, the voltage changes smoothly over time; the signal is continuous. A digital signal is discrete; its value changes in jumps. Moreover, the discreteness shows up in both time and amplitude: the signal is measured only at set moments, and each measured value is rounded to the nearest whole number.

Audio encoding: secrets revealed

Audio settings for video capture and transmission.
As people directly related to the AV sphere, we constantly talk about audio coding and audio codecs, but what is it?

An audio codec is essentially a device or algorithm that can encode and decode a digital audio signal.

In practice, the sound waves that travel through the air are continuous analog signals. They are converted to digital form by a device called an analog-to-digital converter (ADC); the device that performs the reverse conversion is a digital-to-analog converter (DAC). The codec sits between these two functions, and it is the codec that lets you set the parameters important for capturing, recording, and transmitting an audio signal successfully: codec algorithm, sample rate, bit depth, and bit rate.

The three most popular audio codecs are Pulse-Code Modulation (PCM), MP3, and Advanced Audio Coding (AAC). The choice of codec determines the compression rate and the recording quality. PCM is a codec used by computers, CDs, digital phones, and sometimes SACD. The source of the PCM signal is sampled at regular intervals, and each sample is the digital magnitude of the analog signal. PCM is the simplest option for digitizing an analog signal.

With the right parameters, this digitized signal can be converted back to analog with practically no loss. Unfortunately, this codec, which gives almost complete identity with the original audio, is not very economical: it produces large files that are poorly suited to streaming. We recommend using PCM to record master copies of your sources or when doing audio post-processing.

Fortunately, we always have the option of choosing a different codec that can compress digital data (compared to PCM) based on some helpful observations on the behavior of sound waves. But in this case, you have to make a compromise: all alternative algorithms are associated with “losses”, since it is impossible to completely restore the original signal, but nevertheless the result is so good that most users will not be able to notice the difference.

MP3 is an audio encoding format that uses a digital data compression algorithm that allows you to save the audio signal in smaller files. The MP3 codec is the most used by users to record and store music files. We recommend using MP3 to stream audio content as it requires less network bandwidth.

AAC is a newer audio encoding algorithm that is the successor to MP3. AAC has become the standard for MPEG-2 and MPEG-4 formats. In fact, this is also a digital data compression codec, but with less quality loss than MP3, when encoded with the same bit rate. We recommend using this codec for online streaming.

Sampling frequency (kHz)
The sample rate (or sampling frequency) is the frequency with which the signal is digitized, stored, processed, or converted from analog to digital. Time sampling means that the signal is represented by a series of its samples taken at regular intervals.

It is measured in hertz (Hz) or kilohertz (kHz); 1 kHz equals 1000 Hz. For example, 44100 samples per second can be written as 44100 Hz or 44.1 kHz. The selected sample rate determines the maximum frequency that can be reproduced: as follows from Kotelnikov’s theorem, to fully restore the original signal, the sample rate must be at least twice the highest frequency in the signal spectrum.

As you know, the human ear can pick up frequencies between 20 Hz and 20 kHz. Given these parameters and the values ​​shown in the table below, you can understand why 44.1 kHz was chosen as the sampling frequency for CD and is still considered a very good frequency for recording.

Sound file resolution. Audio encoding and processing

Digital audio

Basic concepts

The sampling frequency (f) determines the number of samples stored in 1 second;

1 Hz (one hertz) is one sample per second,

and 8 kHz is 8000 samples per second.

The encoding depth (b) is the number of bits required to encode the level of the signal in each sample.

Memory required to store the data for 1 channel (mono):

$I = f \cdot b \cdot t$

(to store a sound with a duration of t seconds, encoded with a sampling rate of f Hz and an encoding depth of b bits, I bits of memory are required).
For 2-channel (stereo) recording, the amount of memory required to store the data for one channel is multiplied by 2:

$I = f \cdot b \cdot t \cdot 2$

Units of measurement: I in bits, b in bits, f in hertz, t in seconds. Common sampling frequencies: 44.1 kHz, 22.05 kHz, 11.025 kHz.
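The memory formula can be checked with a short calculation. The function name and the one-minute example are illustrative assumptions; the formula itself is the product of sampling frequency, encoding depth, duration, and number of channels described above.

```python
def audio_size_bits(f_hz, b_bits, t_seconds, channels=1):
    """Memory needed for uncompressed audio, in bits: I = f * b * t * channels."""
    return f_hz * b_bits * t_seconds * channels

# One minute of stereo sound at 44.1 kHz and 16 bits per sample.
size_bits = audio_size_bits(44100, 16, 60, channels=2)
size_bytes = size_bits // 8
print(size_bits, "bits")
print(size_bytes, "bytes")
print(round(size_bytes / (1024 * 1024), 2), "MiB")
```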

Audio encoding
Basic theoretical provisions

Sound time sampling. In order for a computer to process sound, a continuous audio signal must be converted to a discrete digital form using time sampling. A continuous sound wave is divided into separate small time sections, for each section a certain value of sound intensity is set.

As a result, the continuous dependence of loudness on time, A(t), is replaced by a discrete sequence of loudness levels. On a graph, this looks like replacing a smooth curve with a sequence of “steps.”

Sampling frequency. A microphone connected to the sound card is used to record analog audio and convert it to digital format. The quality of the digital sound obtained depends on the number of measurements of the sound volume level per unit time, that is, sampling rate. The more measurements are made in 1 second (the higher the sampling frequency), the more accurately the “ladder” of the digital audio signal repeats the curve of the analog signal.

Audio sample rate is the number of measurements of the volume of a sound per second, measured in Hertz (Hz). Let us denote the sampling frequency with the letter f.

The audio sample rate can vary between 8000 and 48000 sound volume measurements per second. One of three frequencies is usually selected for encoding: 44.1 kHz, 22.05 kHz, or 11.025 kHz.

Audio encoding depth. Each “step” is assigned a specific value for the sound volume level. Loudness levels can be seen as a set of N possible states; encoding them requires a certain amount of information b, which is called the audio encoding depth.

Audio encoding depth is the amount of information required to encode the discrete volume levels of digital audio.

If the encoding depth is known, then the number of digital audio loudness levels can be calculated using the formula N = 2^b. Let the audio encoding depth be 16 bits; then the number of sound volume levels is:

N = 2^b = 2^16 = 65,536.

During the encoding process, each sound volume level is assigned its own 16-bit binary code: the lowest sound level corresponds to the code 0000000000000000 and the highest to 1111111111111111.
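The link between encoding depth, number of levels and binary codes can be sketched as follows. The mapping of an amplitude in the range -1.0 to 1.0 onto level numbers is an assumption made for illustration; real converters may number their levels differently (for example, as signed values).

```python
def quantize(amplitude, bits):
    """Map an amplitude in [-1.0, 1.0] to the nearest of N = 2**bits levels."""
    levels = 2 ** bits
    # Shift to [0, 1], scale to the highest level number, round to nearest.
    index = round((amplitude + 1.0) / 2.0 * (levels - 1))
    return max(0, min(levels - 1, index))  # clamp out-of-range input

def level_code(index, bits):
    """Binary code of a level, padded with zeros to the encoding depth."""
    return format(index, "0{}b".format(bits))

print(2 ** 16)                             # number of levels at 16 bits
print(level_code(quantize(-1.0, 16), 16))  # code of the lowest level
print(level_code(quantize(1.0, 16), 16))   # code of the highest level
```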

The quality of digitized sound. The higher the sampling frequency and the encoding depth, the better the digitized sound. The lowest quality of digitized sound, corresponding to the quality of telephone communication, is obtained at a sampling rate of 8,000 times per second, an encoding depth of 8 bits, and the recording of one audio track (mono mode). High-quality digitized sound, corresponding to the quality of an audio CD, is achieved with a sampling rate of 44,100 times per second, an encoding depth of 16 bits, and the recording of two audio tracks (stereo mode).

Video codecs and containers.

This article exists so that there is somewhere to send people who are trying to “convert” something without understanding what they are doing and why.

To work with any object as efficiently as possible, you need to understand how it works. If a video file is for you a mysterious black box in which mysterious things happen, perhaps not without the help of black magic, then your effectiveness will be minimal.

So. All information on the computer is in the form of files. This, I hope, is not a surprise to anyone. Here we will start from this basic concept.

Any video file is a container. A container is a repository of content, and container formats are storages with room for several kinds of content. A bento box, for example, is a container: you can put sushi or tempura in it. What can you put in a video container? At the very least, picture and sound, one track of each. Without that set there is nothing to work with. And at the most? The modern Matroska container lets you put in multiple video and audio tracks, text and graphic subtitles, the fonts needed to display them, images and I don’t know what else.

Going back to the bento box example, note that miso soup cannot be poured into it: it will simply leak out. In the same way, not every container can accept every stream. There are compatibility restrictions that make life difficult.

Container examples: mpeg, avi, mkv, mp4, ogm, vob, mov, rm, divx, asf. You don’t have to look closely at the list to understand that these are standard file extensions. Of course. Because file = container.

Streams, or tracks, are stored inside the container. The format of such a stream is called a codec. This difference must be understood with particular clarity: the container is the file format, and the codec is the format of a stream it contains. They are two independent things. Yes, there are some inextricably linked containers and codecs. For example, the Real Media container can only store RealVideo and RealAudio streams. And vice versa, these formats cannot be stored in any other container (almost, as I have since been corrected). But they are still different concepts that should not be confused.

The codec concept usually includes the following aspects:
1) The actual data storage format.
2) Software that allows you to encode information in this format and / or decode it from it.

Examples of video codecs: divx, xvid, avc, x264, vp6, vp7, mpeg-1, mpeg-2, huffyuv.
Examples of audio codecs: mp3, ogg, ac3, aac.

While containers are generally distinguished by file extensions, codecs are distinguished by the four-character FourCC code.
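A FourCC is just four ASCII characters, which programs often handle as a single 32-bit number. A minimal sketch of that conversion, assuming little-endian byte order (as used in AVI files); the function names are hypothetical:

```python
import struct

def fourcc_to_int(code):
    """Pack a four-character code into a 32-bit little-endian integer."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FourCC must be exactly four ASCII characters")
    return struct.unpack("<I", raw)[0]

def int_to_fourcc(value):
    """Unpack a 32-bit little-endian integer back into its four characters."""
    return struct.pack("<I", value).decode("ascii")

xvid = fourcc_to_int("XVID")
print(hex(xvid), int_to_fourcc(xvid))
```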

The codec concept is usually associated with a kind of compression. Raw (uncompressed) streams also have their own formats, but they do not require decoding, and therefore the concept of codec is generally not applied to them.

Now let’s take a look at the most popular containers, codecs, and related issues. As a general rule, the problems we have are of two types: related to reproduction and related to editing.

MPEG is one of the oldest containers. It can store only video in mpeg-1 format and audio in mp2 format, and, if done properly, only with quite strict restrictions on the image size and the audio bitrate. Because the format is so old and primitive, almost all players and editors understand it. But for the same reasons, it has become almost impossible to come across it. Nobody needs such files anymore.

AVI is also quite old, but it is still a very useful container. It is good because, again, all players and all editors understand it. Almost all mpeg-based formats fit into it, as well as many related ones. The following video formats do not fit into avi: avc (aka Nero AVC or Nero H.264), wmv below version 9, and oddities like RealVideo, which was designed from the start to be incompatible with everything in the world. As for audio, apparently anything fits except Ogg Vorbis.

OGM is where Ogg Vorbis goes, because the format was created on the basis of that very ogg. By now it has been practically displaced by Matroska, which can do the same, only better. It is also not compatible with any conventional software.

MKV (Matroska, named after the matryoshka nesting doll) can hold just about anything except Flash video. But because of its complexity and versatility, about all you can do with it so far is assemble it, watch it and take it apart again.

MP4 is actually modern MPEG. It only takes things that are compatible with the MPEG standard, but at the same time includes its latest updates.

Compressed audio encoding formats.

MP3 (or rather, MPEG-1 Audio Layer III): needs no comment, it is compatible with everything and everyone. This “eternal” format has a single drawback: only two channels, which limits its use in modern home cinema systems.
Multi-channel MP3 (5.1): MPEG-2 Audio Layer III.

WMA: Windows Media Audio, Microsoft’s formally better and more modern competitor to mp3. It is not used much, although it is widely supported by hardware.
Ogg Vorbis is a better, more modern mp3 competitor from the open source community. Free of any licensing restrictions, it is used more and more frequently.
AAC: Advanced Audio Coding is Apple’s main audio format, built into all of its iPads, iPhones, iTunes, etc. Its main advantage is that it is technically more advanced than mp3, allowing sample rates of up to 96 kHz and, in theory, a completely insane number of channels in one file, up to 48. It is also used in digital satellite radio. Like mp3, it is a lossy compressed format; the quality of 96 kbps AAC is comparable to that of 128 kbps mp3 (two channels in both cases).
Dolby Digital (AC-3) is probably the most popular standard for digital audio in cinema, because it appeared on the market as early as 1995. It exists in two versions: DD2.0 (for high-quality stereo sound) and DD5.1 (five full-range channels and one limited-bandwidth channel for a subwoofer). For obvious reasons all players support it; the bitrate is up to 640 kbps.
Dolby Digital Plus, or E-AC-3, is an attempt to improve on ordinary Dolby Digital, but previous-generation decoders and receivers do not support Dolby Digital Plus tracks. The reason is the radical changes: the number of channels has increased to 7.1 and the bit rate to 1.7 Mbps. This will not go through S/PDIF (when transmitting over such a cable, you have to downmix to DD5.1 or DTS with a loss of quality), but HDMI handles Dolby Digital Plus properly starting from version 1.3. You can find such tracks on Blu-Ray discs …
Dolby TrueHD: in practice, 8 channels of losslessly compressed audio at 96 kHz / 24 bits, or 6 at 192 kHz / 24 bits. The total bit rate reaches 18 Mbit/s, which requires either decoding in the player and analog transmission to the receiver, or HDMI 1.3 or higher. For Blu-Ray, this audio coding system is optional.
DTS is a lossy digital audio coding system for cinemas, which later appeared on DVD. It is analogous to Dolby Digital 5.1 but somewhat more flexible: in addition to 2.0 and 5.1 it allows other layouts, such as 4.0 and 4.1. There is also a choice between two fixed bit rates, 1500 kbps and 750 kbps. In the first case, DTS clearly outperforms Dolby Digital in sound quality; in the second, the difference between the systems is debatable.
DTS-HD is a further evolution of DTS. The number of channels has been raised to 7.1 in 96 kHz / 24 bit mode, and the bit rate can be chosen between 6 Mbps and 3 Mbps. It is an optional audio format for Blu-Ray. The situation with sound transmission to the receiver is almost the same as with Dolby TrueHD.
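The bit rates quoted in this list translate into file sizes and raw data rates with simple arithmetic. A rough sketch, assuming constant bit rate and 1 kbps = 1000 bits per second; the function names and the 4-minute track are made up for illustration:

```python
def stream_size_bytes(bitrate_bps, duration_s):
    """Size of a constant-bitrate stream: bits per second * seconds / 8."""
    return bitrate_bps * duration_s // 8

def pcm_bitrate_bps(channels, sample_rate_hz, bits_per_sample):
    """Raw (uncompressed) PCM bit rate for a given channel layout."""
    return channels * sample_rate_hz * bits_per_sample

track_s = 4 * 60  # hypothetical 4-minute stereo track
aac = stream_size_bytes(96_000, track_s)   # 96 kbps AAC
mp3 = stream_size_bytes(128_000, track_s)  # 128 kbps mp3
print(aac, mp3, aac / mp3)

# Raw data rates of the two Dolby TrueHD layouts mentioned above.
print(pcm_bitrate_bps(8, 96_000, 24))
print(pcm_bitrate_bps(6, 192_000, 24))
```

So the AAC file comes out at three quarters of the mp3 size for comparable quality. The raw rate for 8 channels at 96 kHz / 24 bits is about 18.4 Mbit/s, close to the 18 Mbit/s figure quoted for TrueHD. The 6-channel 192 kHz layout is higher before compression, about 27.6 Mbit/s.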

Lossless or uncompressed audio encoding formats.

LPCM is simply uncompressed audio. It is usually stereo. It should not be confused with a WAV file: WAV is a container, and it may hold something other than PCM.
APE is a specific lossless audio compression format. Loved by audiophiles.
FLAC is its competitor and counterpart; the differences between them are beyond the scope of this review.
Lossless audio
Apple Lossless
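The container/stream distinction is easy to see with WAV and PCM. The sketch below uses Python's standard `wave` module to wrap raw 16-bit PCM samples in a WAV container in memory and then read the parameters back. The function name, the 440 Hz tone and the 8 kHz rate are arbitrary choices for illustration.

```python
import io
import math
import struct
import wave

def make_wav_bytes(freq_hz=440, sample_rate=8000, duration_s=1):
    """Wrap a mono 16-bit PCM sine tone in a WAV container, in memory."""
    n = sample_rate * duration_s
    # The stream itself: signed 16-bit little-endian PCM samples.
    frames = b"".join(
        struct.pack("<h", round(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)         # mono
        wav.setsampwidth(2)         # 2 bytes = 16-bit encoding depth
        wav.setframerate(sample_rate)
        wav.writeframes(frames)     # the container adds its header around this
    return buffer.getvalue()

data = make_wav_bytes()
print(data[:4], data[8:12], len(data))

with wave.open(io.BytesIO(data), "rb") as wav:
    print(wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())
```

The samples are the PCM stream; the `RIFF`/`WAVE` header that the module writes around them is the container.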

Subtitle formats.
SRT: a text format that can be supplied as a separate file with the .srt extension. Compared to the first versions of this format, the styling options have been significantly expanded. It can also exist inside MKV.
SUB / IDX is a graphic subtitle format extracted from DVD. It can be placed in MKV or MP4.
s2k, ssa, ass: some more advanced text formats; ass can be placed inside MKV.
smi is a textual format based on SGML, the direct ancestor of HTML.
PGS is a graphical subtitle format, the main one for Blu-Ray, but it can also exist in ts and MKV containers.