The Science Behind Digital Audio Compression



Digital audio compression is a complex and often misunderstood topic. It is a process that reduces the size of digital audio files while keeping the sound as close to the original as possible (lossless methods preserve it exactly). The goal of this article is to provide a comprehensive overview of the science behind digital audio compression, including its history, the different types of compression, and how compression affects sound quality.

The History of Digital Audio Compression

The history of digital audio compression is often traced to the early 1990s, when the first MP3 encoders appeared. MP3 stands for MPEG-1 Audio Layer III and is a method of compressing digital audio files. It quickly gained popularity because it could greatly reduce file size without a large perceptible loss of sound quality.

Since then, many different types of digital audio compression have been developed, each with its own set of advantages and disadvantages. However, they all work on the same principle of reducing the amount of data in the audio file while maintaining the overall quality of the sound.

The Different Types of Digital Audio Compression

There are two main types of digital audio compression: lossy and lossless. Lossy compression is the most common type of compression and is used in formats like MP3, AAC, and WMA. It works by removing parts of the audio file that are deemed less important to the overall quality of the sound.

Lossless compression, on the other hand, is used in formats like FLAC and ALAC. This method of compression works by compressing the file in a way that allows it to be decompressed back to its original form without losing any of the data. This means that the sound quality is preserved, but the file size is still reduced.
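To make the lossless round trip concrete, here is a minimal Python sketch. It is not how FLAC or ALAC actually work (FLAC, for instance, uses linear prediction plus Rice coding); it swaps in simple delta coding followed by the general-purpose zlib compressor, and the test signal is a hypothetical sine wave. The point is only that the decoded samples come back bit-for-bit identical while the stored data is smaller.

```python
import math
import struct
import zlib

def make_test_signal(n=8000, fs=8000):
    """Hypothetical test input: a 440 Hz sine as 16-bit integer samples."""
    return [int(8000 * math.sin(2 * math.pi * 440 * i / fs)) for i in range(n)]

def pack(samples):
    """Serialize samples as little-endian signed 16-bit integers."""
    return struct.pack(f"<{len(samples)}h", *samples)

def unpack(data):
    return list(struct.unpack(f"<{len(data) // 2}h", data))

def encode(samples):
    # Delta coding: store each sample as its difference from the previous one.
    # Neighbouring samples are similar, so deltas are small and repetitive.
    # Assumes every delta fits in 16 bits (true for this test signal).
    deltas = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    return zlib.compress(pack(deltas), 9)

def decode(blob):
    # Undo zlib, then undo delta coding with a running sum.
    samples, acc = [], 0
    for d in unpack(zlib.decompress(blob)):
        acc += d
        samples.append(acc)
    return samples

pcm = make_test_signal()
blob = encode(pcm)
assert decode(blob) == pcm   # lossless: identical samples after decoding
print(len(pack(pcm)), "bytes raw ->", len(blob), "bytes compressed")
```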

The Science Behind Digital Audio Compression

Digital audio compression works by reducing the amount of data in an audio file. The amount of data per unit of time is the bitrate, measured in bits per second (bps) or kilobits per second (kbps). For a given codec, a higher bitrate generally means better sound quality, but it also means a larger file.
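The bitrate of uncompressed PCM audio follows directly from the sample rate, the bit depth, and the number of channels. A small Python helper, using CD audio (44.1 kHz, 16-bit, stereo) as the example, shows why a 128 kbps MP3 is roughly eleven times smaller than its source:

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd = pcm_bitrate_bps(44_100, 16, 2)
print(cd)                      # 1411200 bits/s, i.e. about 1411 kbps
print(round(cd / 128_000, 1))  # compression ratio of a 128 kbps MP3: 11.0
```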

Compression algorithms work by analyzing the audio data and removing parts that are not critical to the overall sound quality. These parts can include frequencies that are outside the range of human hearing or parts that are masked by other sounds in the file.

Once the algorithm has identified the parts of the signal that can be discarded, it quantizes the remaining data (represents it with fewer bits) and then packs the result with an efficient coding scheme; MP3, for example, uses Huffman coding. These steps are designed to reduce the file size with as little audible effect as possible.

The Effects of Compression on Sound Quality

The goal of digital audio compression is to reduce file size without noticeably affecting sound quality. However, compression can affect sound quality, depending on the type of compression used and the target bitrate.

Lossy compression, for example, can remove high-frequency information and subtle detail, and at low bitrates it can introduce audible artifacts such as pre-echo. This can lead to a loss of detail in the sound and a less natural-sounding reproduction of the original recording.

Lossless compression, on the other hand, preserves the original sound quality of the recording, but the resulting file sizes can still be quite large. This makes it less practical for use in situations where file size is a concern.

The Future of Digital Audio Compression

The future of digital audio compression is closely tied to the ongoing development of digital audio technology. As technology continues to improve, the potential for more efficient compression algorithms and higher quality sound reproduction is becoming a reality.

One of the most exciting developments in digital audio compression is the emergence of artificial intelligence (AI) and machine learning. These technologies have the potential to create compression


The MP3 Compression Algorithm

In addition to the physical structure of the human ear, the brain also plays a very important role in how we hear.


The pitch of a sound is determined by its fundamental tone, while its timbre is determined by the harmonics, and the human brain automatically fills in the fundamental even when it is absent. For example, a telephone line passes only about 300 to 3400 Hz, yet when we listen to a man whose voice has a fundamental of about 120 Hz speaking on the phone, we still hear his correct pitch and do not mistake his voice for a woman's.

We still don’t know how the brain uses complex calculations to reconstruct this non-existent tone.
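The missing-fundamental effect itself can be reproduced numerically. The sketch below (Python with NumPy) builds a hypothetical signal from harmonics 3 to 8 of 120 Hz only, all inside the telephone band, so its spectrum has no energy at 120 Hz. A simple autocorrelation pitch estimator, which is only a rough engineering stand-in and makes no claim about how the brain does it, still finds a period corresponding to 120 Hz, because the waveform repeats every 1/120 s.

```python
import numpy as np

FS = 12_000   # sample rate in Hz: one period of 120 Hz is exactly 100 samples
F0 = 120      # the "missing" fundamental in Hz

def harmonics_only_signal(f0=F0, harmonics=range(3, 9), fs=FS, duration_s=1.0):
    """Sum of harmonics 3..8 of f0 (360-960 Hz); no component at f0 itself."""
    t = np.arange(int(fs * duration_s)) / fs
    return sum(np.cos(2 * np.pi * k * f0 * t) for k in harmonics)

def autocorr_pitch_hz(x, fs=FS, fmin=60, fmax=500):
    """Crude pitch estimate: the lag in [fs/fmax, fs/fmin] with the largest autocorrelation."""
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    r = np.array([np.dot(x[:-lag], x[lag:]) for lag in lags])
    return fs / lags[np.argmax(r)]

x = harmonics_only_signal()
spectrum = np.abs(np.fft.rfft(x))   # 1 Hz per bin, since len(x) == FS
print(f"magnitude at {F0} Hz: {spectrum[F0]:.2e}, at {3 * F0} Hz: {spectrum[3 * F0]:.0f}")
print(f"estimated pitch: {autocorr_pitch_hz(x):.1f} Hz")
```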



The MP3 compression algorithm is, in a sense, a trick played on the human sense of hearing in the digital age. The algorithm is not built around improving the mathematics; rather, it is optimized around how to fool the human auditory system.


I was very curious about this algorithm, and after a long search on Baidu I finally found some material and gained a basic understanding of how it works, so I am recording it here.

Basic principle
Human hearing exhibits a special phenomenon known as the masking effect.
The cochlea acts as a spectrum analyzer, converting sound waves into signals at different frequencies. The hair cells at each location along it are stimulated by a specific frequency, but when the basilar membrane vibrates, the surrounding hair cells are stimulated as well. As a result, if a loud frequency component is present and a relatively weak component lies close to it in frequency, the weaker sound is covered by the louder one, and the human ear cannot tell that the weaker sound is there at all.
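A toy version of this masking test is sketched below in Python. Frequencies are converted to the Bark (critical-band) scale with the Zwicker and Terhardt approximation, and the masking threshold is modeled as a triangle that falls off by fixed slopes on each side of the masker. The slopes and the offset are illustrative assumptions, not the values of the actual MP3 psychoacoustic model.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark (critical-band-rate) scale."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)

# Illustrative constants (assumptions, not MP3's real psychoacoustic model):
LOWER_SLOPE_DB_PER_BARK = 25.0   # masking falls off steeply toward lower frequencies
UPPER_SLOPE_DB_PER_BARK = 10.0   # and more gently toward higher frequencies
MASK_OFFSET_DB = 10.0            # threshold sits somewhat below the masker's level

def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Level below which a tone at probe_hz is hidden by the masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = UPPER_SLOPE_DB_PER_BARK if dz > 0 else LOWER_SLOPE_DB_PER_BARK
    return masker_db - MASK_OFFSET_DB - slope * abs(dz)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)

print(is_masked(1000, 80, 1100, 50))   # True: weak tone right next to a loud one
print(is_masked(1000, 80, 4000, 50))   # False: the same weak tone far away
```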

To the human ear, the perception of sound does not vary along a linear frequency scale (human hearing is not that precise); instead, it can be described by a limited series of frequency bands called critical bands. Simply put, the audible range is divided into several segments, and within each band the ear's perception is the same, that is, the psychoacoustic characteristics are the same.
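How wide a critical band is depends on its center frequency. One widely used approximation (Zwicker and Terhardt, 1980) is sketched below in Python; the numbers it gives are estimates, but they show the key point: bands are roughly 100 Hz wide at low frequencies and more than 2 kHz wide at high frequencies.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker & Terhardt (1980) approximation of the critical bandwidth around f_hz."""
    return 25 + 75 * (1 + 1.4 * (f_hz / 1000) ** 2) ** 0.69

for f in (100, 1000, 4000, 10_000):
    print(f"{f:>6} Hz: critical band about {critical_bandwidth_hz(f):.0f} Hz wide")
```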
Based on this principle, MP3 compression can be divided, in simplified form, into two steps:

Step 1: divide the original audio data into several sub-bands corresponding to the critical bands, according to certain rules;

Step 2: analyze the spectrum with the psychoacoustic model to find the masking curve. Then quantize each sub-band separately according to this curve, so that the quantization noise introduced by compression stays below the masking curve.
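Step 2 can be illustrated with a deliberately simplified Python sketch. It assumes the masking curve has already been turned into an allowed noise power per sub-band (the band names and values are hypothetical) and uses a plain uniform quantizer, whose average noise power is about the step size squared divided by 12. Real MP3 encoders use a non-uniform (power-law) quantizer, scale factors, and an iterative bit-allocation loop, so treat this only as an illustration of the idea that a looser threshold means fewer levels and fewer bits.

```python
import numpy as np

def quantize_band(coeffs, allowed_noise_power):
    """Uniformly quantize one sub-band so its noise stays near the allowed level.

    A uniform quantizer with step size `step` adds noise whose average power is
    about step**2 / 12, so step = sqrt(12 * allowed_noise_power).
    """
    step = np.sqrt(12.0 * allowed_noise_power)
    indices = np.round(coeffs / step).astype(int)   # small integers to be entropy-coded
    return indices, step

def dequantize_band(indices, step):
    return indices * step

rng = np.random.default_rng(0)
# Hypothetical sub-bands: (coefficients, allowed noise power read off the masking curve)
bands = {
    "strict band": (rng.normal(size=64), 1e-4),
    "loose band": (rng.normal(size=64), 1.0),
}
for name, (coeffs, allowed) in bands.items():
    q, step = quantize_band(coeffs, allowed)
    noise = np.mean((coeffs - dequantize_band(q, step)) ** 2)
    levels = int(q.max() - q.min() + 1)
    print(f"{name}: {levels} levels (~{np.log2(levels):.1f} bits per coefficient), "
          f"noise power {noise:.2g} vs allowed {allowed:g}")
```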

In this way, MP3 compression is achieved. Remarkably, MP3 really does discard data in the digital domain, yet to human perception the result is, ideally, indistinguishable from the original.

Compress MP3 with the best quality

Reducing the size of MP3 audio files means creating extra space on your device for more audio files.

File storage and management is a major concern for music lovers, DJs, podcasters, and musicians alike, which makes MP3 compression tools very important. If you want to compress MP3 files online, there are plenty of options, since many online tools are free and easy to use. Choose the MP3 compression tool that best suits your requirements. If you are looking for the best way to reduce MP3 file size, read the details below.

Part 1: Best Ways to Reduce MP3 File Size Without Compromising Quality

Although online MP3 compression tools are simple and convenient, they also have certain limitations. Because most of them are free, they typically limit the number and size of files you can process and offer no additional features.

Mp4Gain offers many additional functions: a normalizer, ReplayGain, an equalizer, and the ability to change the pitch without altering the speed, and vice versa.
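As an illustration of what a normalizer does, here is a minimal peak-normalization sketch in Python with NumPy. This is a generic technique, not Mp4Gain's implementation; ReplayGain in particular works differently, adjusting gain according to measured loudness rather than peak level. The -1 dBFS target is an assumed default.

```python
import numpy as np

def peak_normalize(samples, target_dbfs=-1.0):
    """Scale float samples (full scale = 1.0) so the loudest peak sits at target_dbfs."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        return samples.copy()                  # silence: nothing to scale
    target_linear = 10 ** (target_dbfs / 20)   # dBFS -> linear amplitude
    return samples * (target_linear / peak)

quiet = 0.25 * np.sin(np.linspace(0, 20 * np.pi, 1000))
louder = peak_normalize(quiet, target_dbfs=-1.0)
print(np.max(np.abs(louder)))   # about 0.891, i.e. 10 ** (-1 / 20)
```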

It is not just about converting between audio or video formats, but about obtaining a high-quality result, adjusting the settings (volume level, quality, bit rate, sample rate, etc.) until we get exactly what we were looking for.

One of the most common problems today is finally finding the song or video we wanted, only for it not to sound or look the way we need or want; for that, Mp4Gain is the software that offers the best options.

How is file compression done?


Audio and Video Compression

Many computer owners do not have enough space on their internal and removable drives to hold all their data, and the rapid growth of disk capacity does not solve the problem. If 10 years ago 20 megabytes on a hard drive was not enough, today 20 gigabytes is not enough either.


The size of the programs and data we use grows along with hard-drive capacity. We can already store a library of tens of thousands of books on a hard drive, but only several hundred hours of music and only a few tens of hours of video. Therefore, data archiving and compression is as pressing a problem as it was 10 and 20 years ago.

How does information compression occur?

Here is, as usual, a rough but understandable analogy. Data compression is similar to producing powdered milk or dried fruit: water is removed, and it can later be added back to restore the product to its original form.

And what kind of “water” can data contain? Informational redundancy: data contains many repetitions, and these can be exploited to compress it.

For example, text compression works roughly like this. A table of the words and phrases found in the text is compiled, and each entry is given a number. Then every word and phrase in the file is replaced with its number from the table. This method can reduce a text file 2-3 times, and sometimes up to 10 times if the text contains many repetitions.
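The word-table scheme just described can be sketched in a few lines of Python. This is a toy for illustration: it splits on single spaces, and it only saves space if the numeric codes are stored more compactly than the words they replace. Practical compressors use more general methods (LZ77, for example, refers back to earlier occurrences of repeated byte strings).

```python
def compress_words(text):
    """Toy dictionary coder: replace each word with its index in a word table."""
    words = text.split(" ")
    table = list(dict.fromkeys(words))          # unique words, in first-seen order
    index = {w: i for i, w in enumerate(table)}
    codes = [index[w] for w in words]
    return table, codes

def decompress_words(table, codes):
    """Rebuild the original text by looking each code up in the table."""
    return " ".join(table[c] for c in codes)

text = "to be or not to be that is the question to be"
table, codes = compress_words(text)
print(table)   # ['to', 'be', 'or', 'not', 'that', 'is', 'the', 'question']
print(codes)   # [0, 1, 2, 3, 0, 1, 4, 5, 6, 7, 0, 1]
assert decompress_words(table, codes) == text
```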

A program that converts a file into a compressed format is called a packer (or archiver), and the file produced by compression is called a packed or compressed file.

Compressed files are often called archives, which, strictly speaking, is a misnomer. Originally, archives were files created during backup: a single file containing multiple source files and folders, with no compression performed. The same situation still exists in Linux, where archiving and data compression are two independent processes. In MS-DOS, and later in MS Windows, early compression programs began to support both compression and archiving; that is, they created a compressed file containing not one but several source files and folders. … Since then, in these operating systems, the concept of …
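On Linux the two steps are typically done with tar (archiving) and gzip (compression), which is where the familiar .tar.gz extension comes from. The Python sketch below mirrors that split using the standard tarfile and gzip modules; the file names and contents are hypothetical.

```python
import gzip
import io
import tarfile

def make_archive(files):
    """Step 1 (archiving): bundle several files into one uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:   # mode "w" = no compression
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def extract_archive(archive_bytes):
    """Read every member of an uncompressed tar archive back into a dict."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}

files = {"notes.txt": b"hello " * 500, "readme.txt": b"archive me " * 300}  # hypothetical
tar_bytes = make_archive(files)          # archived, but not compressed
tgz_bytes = gzip.compress(tar_bytes)     # Step 2 (compression): a separate operation
print(len(tar_bytes), "->", len(tgz_bytes))
assert extract_archive(gzip.decompress(tgz_bytes)) == files
```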

Since an archive file is not stored in text format, text editors cannot work with it. Before a text file inside an archive can be opened in a text editor, the archive must be extracted. Extraction is done with the same program, the archiver. After extraction, the text file has exactly the same content and size as before.

Archivers can also compress program files, but programs compress much less than text.

The packers used for text and programs cannot compress audio, graphics, or video files efficiently, so other, more complex algorithms have been developed for them. With these, however, the unpacked files differ slightly from the originals (this is called lossy compression). The average ear does not pick up the difference in sound, and the average eye does not notice it on a monitor screen.
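The difference is easy to see by running the same general-purpose packer (zlib, chosen here as a convenient stand-in) over repetitive text and over noise-like 16-bit samples. The random samples are an assumption standing in for the hard-to-predict fine detail of real recordings; actual music usually packs somewhat better than pure noise, but still nowhere near as well as text.

```python
import random
import struct
import zlib

def ratio(data):
    """Original size divided by zlib-compressed size (higher = better compression)."""
    return len(data) / len(zlib.compress(data, 9))

text = ("Data compression removes repetition from data. " * 200).encode("ascii")

rng = random.Random(0)
# Stand-in for audio: one second of random 16-bit samples at 44.1 kHz.
samples = [rng.randint(-32768, 32767) for _ in range(44_100)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

print(f"text: {ratio(text):.1f}x, noise-like PCM: {ratio(pcm):.2f}x")
```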

A brief history of archivers
As far as I remember, the first popular archiver was “ARJ”, which created archive files with the same extension, “ARJ”. This was in the late 1980s and early 1990s. Such files still exist today; they generally use DOS character encoding.

Then the two archivers that became most popular in the CIS appeared: “RAR” and “ZIP”. Today they are represented by the “WinRAR” and “WinZIP” programs. “WinRAR” can create both “RAR” and “ZIP” archives and can extract files from about a dozen formats. In this sense, “WinRAR” is for us a universal and convenient (but not free) archiver.

Audio compression formats



Many audio compression formats were originally developed for computers and later migrated to home appliances. Some are outdated and practically unused; others appeared recently and have not yet found their niche. Here I will focus only on lossy compression formats, which achieve the highest degree of compression of audio data. What does “lossy compression” mean? If you encode a .wav file into a compressed format and then decode it back into a .wav file, the original file and the final file will differ, and not for the better.


A compressed audio format means practically no audible change in sound quality despite a several-fold reduction in file size. How is this achieved? The science of psychoacoustics has the answer. The human brain is built so that we do not notice, for example, the rustling of books in the background of a conversation, although on a computer, with close listening, we can pick out that sound. So the sound is there, yet for our perception it effectively is not.

Combining conventional data compression methods with knowledge of what information our brain perceives and what it ignores allows music to be compressed by up to 10 times with acceptable sound quality. Below is a brief overview of the most common and well-known music compression formats that could be used to build a home music collection.

MP3
MPEG-1 Layer III (less often MPEG-2 Layer III), sometimes incorrectly called MPEG-3 (no such format exists), was for many years the only thing many users associated with the phrase “computer music”. Developed in the late 1980s, the format allowed music to be compressed up to 10 times without a catastrophic loss of quality and quickly took root on home computers.

The optimal bitrate is approximately 192 kbit/s, although everyone’s ears are different: some people notice distortion more easily than others. A reasonable minimum is 128 kbit/s. A variable bitrate (VBR) can also be used: when the sound is simple (for example, a narrow range of frequencies), the bitrate decreases, and when many things sound at once, it increases. A constant bitrate of 320 kbit/s (the highest standard MP3 bitrate) is often excessive and wastes space. An MP3 file can also contain an ID3 tag, a dedicated area holding basic information about the file. There are two versions of this tag; the second is more extensive, but adds nothing revolutionary. The sound quality of an MP3 file can vary greatly depending on the encoder and player used.
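The size of a constant-bitrate MP3 is essentially bitrate multiplied by duration. A quick Python calculation per minute of audio, compared with uncompressed CD audio at 1411.2 kbps, puts the bitrates above in perspective (ID3 tags and other metadata are ignored):

```python
def mp3_size_bytes(bitrate_kbps, seconds):
    """Approximate size of a constant-bitrate audio stream (ignores ID3 tags)."""
    return bitrate_kbps * 1000 * seconds / 8

CD_PCM_KBPS = 44_100 * 16 * 2 / 1000   # 1411.2 kbps, uncompressed 16-bit stereo

for kbps in (128, 192, 320):
    mb_per_min = mp3_size_bytes(kbps, 60) / 1_000_000
    print(f"{kbps} kbps: {mb_per_min:.2f} MB per minute, "
          f"{CD_PCM_KBPS / kbps:.1f}x smaller than CD audio")
```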

MPEGplus / Musepack (MP + / MPC / MPP)
This encoder is similar in principle to MPEG-1 Layer II (MP2) but uses a more advanced algorithm. Unlike most modern codecs, Musepack’s creators did not aim for the highest possible quality at low bitrates. The format performs best at medium and high bitrates (typical files are in the 160-180 kbit/s range). It combines a superb psychoacoustic model with VBR encoding for excellent sound quality, and as a result it performs better than most of its competitors at similar bitrates. Files compressed with MPC are of significantly higher quality than comparable MP3 files. One serious shortcoming of the current version of Musepack is its restricted format support: 44.1 kHz, 16-bit, stereo, which makes it unsuitable, for example, for compressing DVD movie soundtracks. If MP3 compatibility is not too important to you and you want the highest possible quality from the final file, Musepack may be the ideal choice. For those already disappointed with what MP3 can do, it is a real alternative to lossless compression for encoding music from CDs.

Mp3: Audio Compression.

Audio Digitization.

Sound is a continuous wave that propagates through air or other media, formed by pressure differences, so it can be detected by measuring the pressure level at a point. Sound waves have the characteristic, measurable properties of waves in general, such as reflection, refraction, and diffraction. Because sound is a continuous wave, a digitization process is needed to represent it as a series of numbers. Today most operations on sound signals are digital, since storing, processing, and transmitting the signal in digital form offers very significant advantages over analog methods. Digital technology is more advanced and offers greater possibilities, lower sensitivity to transmission noise, and the ability to include error-protection codes as well as encryption. Moreover, with appropriate decoding mechanisms, signals of different types can be transmitted simultaneously on the same channel. The main disadvantage of the digital signal is that it requires much greater bandwidth than the analog signal; hence the extensive research into data compression, some of whose techniques are the focus of this study.
The digitization process consists of two phases: sampling and quantization. Sampling divides the time axis into discrete segments; the sampling frequency is the inverse of the time between one measurement and the next. At each of these instants quantization is performed, which in its simplest form consists of measuring the signal’s amplitude and storing it as one of a finite set of levels.

The Nyquist theorem states that a signal whose highest frequency component is f must be sampled at a rate greater than 2f. Since the upper limit of human hearing is around 20 kHz, a sampling rate of about 40 kHz is suitable for any audible sound. Specifically, high-quality audio uses 44.1 kHz (on CD, for example) and up to 48 kHz (on DAT). Other typical values are submultiples of the first, 22.05 and 11.025 kHz (often rounded to 22 and 11 kHz). Depending on the application, the appropriate frequencies can of course be much lower; speech, for instance, is usually sampled at between 6 and 20 kHz, or even less.

Regarding quantization, the more bits used to divide the amplitude axis, the finer the partition and therefore the smaller the error in assigning a specific amplitude to the sound at each moment. For example, 8 bits offer 256 quantization levels and 16 bits offer 65,536. The dynamic range of human hearing is about 100 dB. The amplitude axis can be divided at equal intervals or according to a certain density function, seeking more resolution in certain ranges if the signal has more components in a particular intensity zone, as we will see in the coding techniques.
The complete process is usually called PCM (Pulse Code Modulation), and that is how we will refer to it hereafter. It has been described in a very simplified way, mainly because it is widely discussed and well known, and it is not the focus of this work. However, we will go into detail whenever the presentation requires it.
1.2 Coding and Compression.
Before describing compression and coding systems, we must pause briefly to analyze human auditory perception, in order to understand why a significant amount of the information that PCM provides can be discarded. The heart of the matter, as far as we are concerned, is a phenomenon known as masking.
The human ear perceives frequencies between 20 Hz and 20 kHz. First, sensitivity is highest around 2-4 kHz, so sounds become harder to hear the closer they are to the ends of this range. Second, there is masking, whose properties the most interesting algorithms exploit exhaustively: when a component of a signal at a certain frequency has high energy, the ear cannot perceive lower-energy components at nearby frequencies, both lower and higher. At a certain distance from the masking frequency, the effect becomes negligible; the range of frequencies in which the phenomenon occurs is called the critical band. Components in the same critical band influence each other, but they neither affect nor are affected by components outside it.
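The first point, the ear’s uneven sensitivity, is often modeled with Terhardt’s approximation of the threshold in quiet, used in many psychoacoustic models. The Python/NumPy sketch below evaluates it; the formula is an empirical fit, so the exact values are estimates, but its minimum falls around 3.3 kHz, inside the 2-4 kHz region mentioned above.

```python
import numpy as np

def absolute_threshold_db_spl(f_hz):
    """Terhardt's approximation of the threshold in quiet (dB SPL) at frequency f_hz."""
    f = np.asarray(f_hz, dtype=float) / 1000.0   # the formula works in kHz
    return 3.64 * f**-0.8 - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f**4

freqs = np.geomspace(20, 20_000, 2000)
ath = absolute_threshold_db_spl(freqs)
print(f"most sensitive near {freqs[np.argmin(ath)]:.0f} Hz")
for f in (50, 1000, 3300, 15_000):
    print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):6.1f} dB SPL")
```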

Audio Data compression

Data compression or the technique that changed everything

Without going into too much detail on this critical concept, it is important to know that compression is a scheme that uses a “decision” algorithm, based on a series of “rules” (which in the case of audio are masking and the threshold of audibility), to reduce the amount of data needed to transmit a given message. In other words: if song “x” occupies 1 million bits in the format used to encode CD audio, data compression allows that song to be reproduced with maximum intelligibility using only 50,000 of those bits.

In this way, a complete CD could be downloaded from a website in a reasonable time. But the price to pay in quality was high, because such “castration” of the original message (which was itself not “continuous” or analog but digital, although “linear”, that is, uncompressed) meant removing many of the nuances of the music. This disaster did not bother many consumers, but it worried, and a lot, those of us committed to the high-fidelity sound reproduction we are so passionate about, who received an almost fatal wound.

In this sense, it is worth knowing that the “philosophical” keys to data compression come down to two terms: redundancy and irrelevance. The first means reordering the available data to eliminate whatever is repeated (for whatever reason: security, etc.), much like a “zip” computer file. It is a formal restructuring that does not affect the sound message at all but saves space for transmitting or storing data, which makes it very practical; in this case we speak of lossless compression. The second term has the greatest impact on sound quality, because irrelevance implies deleting data deemed irrelevant from a given message. And who decides what is relevant? An algorithm: a program that may be more or less sophisticated but still makes decisions that not everyone will agree with. It is easy to understand: what is irrelevant to one person and/or one system may not be to someone else. The fact is that musical information is deleted here, and fundamentally it can never be recovered. Algorithms that lose musical information in this way are known as “lossy” coding algorithms.
From what has been said, it is easy to deduce that the difference between “lossless” and “lossy” marks the border between high- and low-quality digital audio: between high resolution (with recording-studio-quality or “Studio Master” formats at the top) and the “practical” sound (originally intended for portable players and cars) of often unnatural formats like the once-ubiquitous MP3, which, we insist, almost undid the improvements brought by the CD.
ADSL, the key to accessing High End audio via the Internet
Basically, it was a purely technical advance that, logically, had to come. It removed the limitations that had prevented downloading a song recorded in PCM at 16-bit/44.1 kHz and, over time, files of much higher resolution, which have been standard in recording studios for a good decade and a half. So, thanks to ADSL, high-end audio via the Internet, and therefore “without physical media”, is available to everyone. At this point, it is worth briefly reviewing the small “alphabet soup” of acronyms we may encounter, partly a result of the coexistence of open and “closed” environments (Windows, Mac), in a world of CODECs (algorithms that compress and decompress data, in this case music) where compression is the norm.

AAC (Advanced Audio Coding): It was designed as the successor to MP3 and, although it is a lossy CODEC, its sound quality is superior to that of MP3 at the same bitrate. AAC has been adopted by a wide range of portable audio devices, such as the iPod and its derivatives.
AIFF (Audio Interchange File Format): Apple’s counterpart to WAV. It stores uncompressed (and therefore lossless) audio, keeping full resolution and full file size.

ALE (Apple Lossless Encoder), also known as ALAC (Apple Lossless Audio Codec): Uses lossless compression to save storage space. Once decoded for listening, the file is bit-for-bit identical to a full-size WAV or AIFF file. As in AIFF or FLAC, in ALE / A files

What is audio compression?

I have finally returned to the tutorials. We are going to cover audio compression from the most basic to the most advanced; it is a subject that many producers have found hard to learn and understand.

So what is audio compression, and what can it do for you?

Basically, compression reduces the dynamic range of your recording by lowering the level of the loudest parts, which means that the loud and quiet parts end up closer together in volume and natural volume variations are less obvious. The compressor can then raise the overall level of the compressed signal.

The end result is that the quieter parts sound as if their volume had been raised closer to the louder parts. Dynamic changes in the recording are now under more control, and a side effect is that the overall level of the compressed recording can be raised within the mix. The recording will also sit in the mix much more easily.

What are the compression controls?

The compression device itself has many different controls that can affect the sound it is processing. We will review the main controls that are commonly found.

Input Gain
This controls the level of the signal entering the audio compressor.
Threshold
Compression reduces the level of the loudest parts of your recording. But how does the compressor know which part of the signal is “loud” and should be compressed? That is what the threshold setting determines.
The threshold sets the level at which the compressor starts to change the dynamics of the recording. For example, if you set the threshold to -20 dB, everything below this level is not affected by the compressor, but everything above -20 dB is compressed.
Ratio
How much will the signal be compressed once it has exceeded the threshold? This is controlled by the ratio. The higher the ratio, the greater the compression.
The easiest way to show how the ratio works is with some numbers. With a ratio of 1:1, there is no compression at all. With a ratio of 2:1, for every 2 dB of signal above the threshold you get 1 dB of output above the threshold. So, if the signal exceeds the threshold by 10 dB, the compressor reduces it so that it is only 5 dB above the threshold.
If the ratio goes up to 8:1, for every 8 dB of sound above the threshold you get 1 dB of output above the threshold. So, if the signal exceeds the threshold by 16 dB, the compressor reduces it so that only 2 dB exceeds the threshold.
Attack
This is the time it takes for the compressor to act on the input, once the sound level has exceeded the threshold. It is usually measured in milliseconds (ms).
Release
This is the time it takes for the compressor to let the signal return to normal once it has fallen below the threshold. Again, usually measured in ms.
Makeup
Compressing the signal reduces its overall level. Increasing the makeup (output) gain raises the level coming out of the compressor, so it can more easily match the levels of the other tracks in the mix.
Knee
Soft-knee compression sounds gentler as the audio passes through the compressor: the transition from uncompressed to compressed sound is gradual. Hard-knee compression has a more immediate and obvious effect.
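The controls above can be put together in a minimal Python sketch of a compressor’s gain logic. The static curve uses the common quadratic soft-knee formulation, and attack and release are modeled as one-pole smoothing of the gain reduction; these are standard textbook choices assumed here, not the design of any particular hardware unit or plugin. Levels are in dB, and the default threshold, ratio, and times are arbitrary.

```python
import math

def gain_computer_db(level_db, threshold_db=-20.0, ratio=4.0, knee_db=0.0):
    """Static compressor curve: output level (dB) for a given input level (dB).

    Below the threshold the signal passes unchanged; above it, every `ratio` dB
    of input yields 1 dB of output. With knee_db > 0, a quadratic "soft knee"
    spans knee_db around the threshold.
    """
    over = level_db - threshold_db
    if knee_db > 0 and abs(over) <= knee_db / 2:
        return level_db + (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
    if over <= 0:
        return level_db
    return threshold_db + over / ratio

def smooth_gain_reduction(target_gr_db, sample_rate, attack_ms=10.0, release_ms=100.0):
    """One-pole smoothing of gain reduction (dB, >= 0): faster attack, slower release."""
    a_att = math.exp(-1.0 / (attack_ms * 1e-3 * sample_rate))
    a_rel = math.exp(-1.0 / (release_ms * 1e-3 * sample_rate))
    out, state = [], 0.0
    for gr in target_gr_db:
        coeff = a_att if gr > state else a_rel   # more reduction needed -> attack
        state = coeff * state + (1 - coeff) * gr
        out.append(state)
    return out

print(gain_computer_db(-10, threshold_db=-20, ratio=2))   # -15.0 (10 dB over -> 5 dB over)
print(gain_computer_db(-4, threshold_db=-20, ratio=8))    # -18.0 (16 dB over -> 2 dB over)
makeup_db = 6.0                                           # hypothetical makeup gain
print(gain_computer_db(-10, threshold_db=-20, ratio=2) + makeup_db)   # -9.0
```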
Compressors are a very effective tool for us engineers; in the next post I will talk about the different types of compressors.