Dissecting Audio Lossy Formats



Dissecting Audio Lossy Formats: Technical Mechanisms and Trade-offs


Understanding Audio Compression

As an audio enthusiast, I have always been fascinated by the technology behind audio compression. Audio compression is the process of reducing the size of an audio file by removing or reducing redundant or irrelevant information. This is done to make the file smaller and more manageable, especially for streaming and other bandwidth-limited applications.
There are two types of audio compression: lossless and lossy. Lossless compression preserves all of the original audio data, while lossy compression removes some of the data to achieve a smaller file size. Lossy compression is the most common type of audio compression used today, and it is used in a wide range of applications, from music streaming services to podcasting.

Audio Compression Techniques

There are many different techniques used in audio compression, each designed to preserve audio quality while reducing file size. One of the most important is perceptual coding, which analyzes how humans perceive sound and uses that information to remove or reduce perceptually irrelevant information.
At the heart of a perceptual coder is its psychoacoustic model, which identifies sounds that are not perceptible to the human ear so that they can be removed or coded with less precision. As the book “The Art of Digital Audio” explains, “Psychoacoustic modeling is a technique that takes advantage of the limitations of human hearing to remove sounds that are not perceptible to the listener.”
In my experience, understanding these techniques and how they work together is essential for optimizing audio quality and reducing file size. By using the right combination of techniques, you can achieve excellent audio quality while minimizing file size.

Audio Compression Trade-offs

One of the key trade-offs of audio compression is the balance between audio quality and file size. As the book “The Audio Programming Book” explains, “The more you compress an audio file, the smaller it becomes, but the more audio quality you lose.”
In my experience, this trade-off is particularly important for musicians and sound engineers. By understanding the trade-offs between audio quality and file size, you can make informed decisions about how to compress your audio files for different applications.
Overall, dissecting lossy audio formats is essential for anyone working with audio. By understanding the technical mechanisms and trade-offs of audio compression, you can choose the right balance between quality and file size for each application.
Final words:
In conclusion, lossy audio compression delivers good perceived quality at greatly reduced file sizes. By understanding the techniques behind it, you can balance quality against file size, which matters most for streaming and other bandwidth-limited applications.



Lossy audio encoding. What is what?


The Evolution of Audio Coding


It’s 2020, and many years have passed since the first MP3 encoder appeared. Yet the fact that most of us still contentedly listen to MP3s does not mean that progress has stood still. This applies not only to the MP3 encoding algorithm itself but also to lossy audio coding in general, in the form of newer, more advanced codecs that deliver better quality at smaller sizes. Formats such as Ogg Vorbis, AAC, WMA and Musepack have left behind the outdated MP3, with its many limitations and flaws.

In parallel, lossless encoding is gaining momentum. But because of the large amount of data involved, it is still unsuitable for large-scale use today: on portable devices with limited memory, for network streaming, or simply for quickly sharing music over the Internet (admittedly, a 100-megabit connection is not always at hand).

So MP3 is out of date and clearly ready to be replaced. But what about the non-expert user who wants the best possible sound in the least possible space? After all, there are quite a few alternative codecs (at least three of them truly deserve attention): Apple promotes AAC (Advanced Audio Coding, positioned as the successor to MP3) through its iTunes Store, Microsoft promotes its own proprietary WMA (Windows Media Audio), Ogg Vorbis is becoming more and more popular, and especially dedicated enthusiasts even use Musepack. Which of these codecs should you choose?

There is no definitive answer to this question, and that is why I am writing this article.

How to decide?

The choice of one or the other codec depends on the specific task. Namely:

1. The hardware and software that will play the sound, i.e., whether they support a given audio format and how good their reproduction quality is (the latter is a good guide when choosing a bitrate).

2. The amount of storage that can be allocated to the final material. Accordingly, a higher or lower target quality/bitrate is chosen.

And of course, in addition to the format and bitrate, you need to choose the best encoder and encoding parameters. Keep in mind that different formats and encoders perform differently in different bitrate ranges.

Therefore, the algorithm is approximately the following:

1) Find out what formats the target device supports.
2) Determine how much space you can allocate for the audio material, as well as determine the total length of the audio intended for encoding.
3) Calculate the required bitrate using the formula: bitrate = disk_space (in kilobits) / total_time (in seconds).
4) According to the bitrate, choose the optimal one of the supported formats (more on this later).
5) Choose the best encoder and parameters for it.
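Steps 2 to 4 are simple arithmetic. As a minimal sketch (the function names are hypothetical; decimal units are assumed, i.e. 1 MB = 8000 kilobits, and container/tag overhead is ignored), the target bitrate and the matching range from the sections below could be computed like this:

```python
def target_bitrate_kbps(space_mb, total_seconds):
    """Step 3: bitrate = disk_space (kilobits) / total_time (seconds)."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    space_kilobits = space_mb * 8000  # assumption: 1 MB = 1000 kB = 8000 kbit
    return space_kilobits / total_seconds

# Lower bounds of the bitrate ranges discussed in this article (approximate).
RANGES = [
    (25, "ultra-low (~25-40 kbps)"),
    (64, "low (~64 kbps)"),
    (80, "slightly higher (~80-100 kbps)"),
    (128, "de facto standard (128 kbps)"),
]

def pick_range(kbps):
    """Step 4 helper: the highest range whose lower bound fits the budget."""
    chosen = None
    for lower_bound, label in RANGES:
        if kbps >= lower_bound:
            chosen = label
    if chosen is None:
        raise ValueError(f"{kbps:.1f} kbps is below every range covered here")
    return chosen
```

For example, 1000 MB for 20 hours of audio (72,000 s) gives about 111 kbps, which lands in the ~80-100 kbps tier of this table. Since true VBR output only approximates its average, leaving a little headroom is sensible.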

More about our heroes

AAC

The development of psychoacoustics and data compression methods gradually made the MP3 standard too restrictive for implementing new ideas in audio coding. As a result, in 1997 Fraunhofer IIS, which created MP3 in the early 1990s, together with Dolby, AT&T, Sony and Nokia, developed a new audio compression method, Advanced Audio Coding (AAC), which became part of the MPEG-2 and MPEG-4 standards. The main differences from MP3 are:
support for more channels (up to 48) and a wider range of sample rates (8 kHz to 96 kHz);
a more efficient and simpler filter bank: MP3's hybrid filter bank was replaced by a plain MDCT (Modified Discrete Cosine Transform);
a wider range of time-frequency resolution in the filter bank (a factor of eight between long and short blocks, versus three in MP3), which improves the coding of both transients and stationary parts of the signal;
better coding of frequencies above 16 kHz;
a more flexible stereo mode that can switch to M/S (“joint stereo”) coding independently in different frequency bands;
additional tools that increase compression efficiency: temporal noise shaping (TNS), long-term prediction of MDCT coefficients over time, parametric stereo, perceptual noise substitution (PNS) and high-frequency reconstruction via spectral band replication (SBR).
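To make the MDCT point concrete, here is a minimal, unoptimized MDCT/IMDCT pair (direct O(N²) sums) with a sine window. It demonstrates time-domain aliasing cancellation: 50%-overlapping windowed blocks, added back together, reconstruct the input exactly. This is a textbook sketch, not AAC's actual filter bank (AAC uses 1024-coefficient long blocks and 128-coefficient short blocks, sine or KBD windows, and fast algorithms); the 2/N scale in `imdct` is chosen here so that reconstruction is exact with this particular window.

```python
import math

def sine_window(length):
    # Satisfies the Princen-Bradley condition w[n]^2 + w[n + N]^2 = 1, N = length / 2.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """2N windowed samples -> N coefficients (direct O(N^2) form)."""
    n_coeffs = len(block) // 2
    n0 = 0.5 + n_coeffs / 2
    return [
        sum(x * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_coeffs)
    ]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (2/N scale for the sine window)."""
    n_coeffs = len(coeffs)
    n0 = 0.5 + n_coeffs / 2
    return [
        (2.0 / n_coeffs) * sum(c * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
                               for k, c in enumerate(coeffs))
        for n in range(2 * n_coeffs)
    ]

def roundtrip(signal, n_coeffs):
    """Window, MDCT, IMDCT, window again, then overlap-add with a hop of N."""
    window = sine_window(2 * n_coeffs)
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * n_coeffs
    padded += [0.0] * (-len(padded) % n_coeffs)  # whole number of hops
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = [padded[start + i] * window[i] for i in range(2 * n_coeffs)]
        for i, y in enumerate(imdct(mdct(block))):
            out[start + i] += y * window[i]  # aliasing cancels between neighbours
    return out[n_coeffs:n_coeffs + len(signal)]
```

Each block alone comes back with time-domain aliasing; only the sum of two overlapping blocks is clean, which is why the signal is padded by N zeros on each side.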

Thanks to these features, AAC can code audio more flexibly and efficiently and therefore achieve better quality. Because MP3 is so widespread, however, AAC has not yet reached comparable popularity. Still, AAC is the main format of the popular iTunes Store, iPod, iTunes, iPhone, PlayStation 3 and Nintendo Wii, and of DAB+ and DRM digital radio broadcasts.
OGG Vorbis


Ogg Vorbis is a relatively new general-purpose audio compression format, officially released in the summer of 2002. Like MP3, AAC, VQF and WMA, it is a lossy compression format. The psychoacoustic model used in Ogg Vorbis is similar in principle to those of MP3 and its peers, but its mathematical processing and practical implementation are fundamentally different, which allows the authors to declare the format completely independent of all its predecessors.
The main undeniable advantage of Ogg Vorbis is that it is completely open and free. In addition, it uses a modern, high-quality psychoacoustic model, so it needs a noticeably lower bitrate than older formats for the same quality; in other words, files are smaller at comparable sound quality.
The format has many other advantages. For example, Ogg Vorbis does not restrict the user to two audio channels (stereo: left and right). It supports up to 255 individual channels at sample rates of up to 192 kHz and up to 32 bits (which no other lossy compression format does), making Ogg Vorbis well suited to encoding 6-channel DVD-Audio. Ogg Vorbis is also sample-accurate: the audio data before encoding and after decoding have no offset and no extra or missing samples relative to each other. This is easy to appreciate when encoding continuous music (where one track flows into the next): the sound remains seamless across track boundaries.
Few formats have built-in streaming support, but this one was designed for streaming from the ground up. This has a rather useful side effect: several songs, each with its own tags, can be stored in one file. When such a file is loaded into a player, the songs should appear as if they had been loaded from separate files.
The flexible tagging system also deserves a mention. The tag header can easily be extended to hold text of any length and complexity (e.g., song lyrics) along with images (e.g., an album cover photo). Text tags are stored in UTF-8, so text in all languages can be mixed freely and encoding problems are avoided. This is much more convenient than workarounds such as ID3 tags.
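The Vorbis comment header behind this tagging system has a simple layout defined in the Vorbis I specification: packet type 3 and the string "vorbis", a length-prefixed UTF-8 vendor string, a comment count, then length-prefixed UTF-8 entries of the form FIELD=value (all lengths are 32-bit little-endian integers), followed by a framing bit. The sketch below builds and parses such a packet in memory; it ignores Ogg page framing, and the function names are hypothetical. Fields such as TITLE and ARTIST come from the specification's suggested list, while LYRICS is only a widespread convention.

```python
import struct

def build_comment_packet(vendor, tags):
    """Serialize a Vorbis comment header packet (packet type 3)."""
    packet = bytearray(b"\x03vorbis")                  # packet type + magic string
    vendor_bytes = vendor.encode("utf-8")
    packet += struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    packet += struct.pack("<I", len(tags))             # number of comments
    for field, value in tags:
        entry = f"{field}={value}".encode("utf-8")     # FIELD=value, UTF-8 encoded
        packet += struct.pack("<I", len(entry)) + entry
    packet += b"\x01"                                  # framing bit set
    return bytes(packet)

def parse_comment_packet(data):
    """Inverse of build_comment_packet; returns (vendor, [(FIELD, value), ...])."""
    if data[:7] != b"\x03vorbis":
        raise ValueError("not a Vorbis comment header")
    pos = 7
    (vendor_len,) = struct.unpack_from("<I", data, pos)
    pos += 4
    vendor = data[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    (count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    tags = []
    for _ in range(count):
        (entry_len,) = struct.unpack_from("<I", data, pos)
        pos += 4
        field, _, value = data[pos:pos + entry_len].decode("utf-8").partition("=")
        tags.append((field.upper(), value))            # field names are case-insensitive
        pos += entry_len
    if pos >= len(data) or not data[pos] & 1:
        raise ValueError("framing bit missing")
    return vendor, tags
```

Tags are kept as a list of pairs rather than a dict because the format allows repeated fields (for example, several ARTIST entries).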
Ogg Vorbis uses variable bitrate by default, and the bitrate is not tied to fixed values: it can vary in steps as small as 1 kbps. Note that the format does not strictly cap the bitrate, and at the highest quality setting it can range from 400 kbps to 700 kbps. The sample rate is just as flexible: users can choose anything from 2000 Hz to 192000 Hz.
Ogg Vorbis was developed by the Xiphophorus community (now the Xiph.org Foundation) to replace all paid proprietary audio formats. Although it is the youngest of MP3's competitors, Ogg Vorbis is supported on all major platforms (Windows, PocketPC, Symbian, DOS, Linux, Mac OS, FreeBSD, BeOS, etc.) and in a large number of hardware players. … Its current popularity far exceeds that of all the alternatives.
It is worth noting that Ogg Vorbis is only one part of Xiph.org's Ogg multimedia project, which also includes the free codecs Speex (speech compression), FLAC (lossless audio compression) and Theora (video compression).
Musepack

MusePack (.mpc, .mp+, .mpp; formerly MPEG+) is a royalty-free file format for storing audio, distributed under the GNU General Public License.
At high bitrates (160 kbps and above), the quality of MPC encoding is noticeably, if not dramatically, higher than that of MP3.
Main advantages:
The format does not apply an additional DCT-based transform (it uses only a subband filter bank), so it practically does not suffer from the pre-echo artifacts typical of formats such as MP3, Vorbis, AAC and WMA.
More efficient variable bitrate algorithms. If you watch the bitrate during playback of an MPC track, you will notice that the encoder assigns a lower bitrate to simpler passages and a much higher one to complex passages, sometimes above 400 (!) kbps. Another interesting fact: in VBR mode an MP3 encoder spends 32 kbps on silence (at a sampling rate of 44100 Hz), AAC and Ogg Vorbis spend about 2 kbps, while Musepack encodes silence at minimal cost, under 1 kbps (one minute of silence takes about 514 bytes). All of this speaks to the extreme frugality of this encoder.
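A quick check of these silence figures for one minute (60 s), using decimal units (1 kbit = 1000 bits, 1 byte = 8 bits):

```latex
\begin{aligned}
\text{MP3 VBR floor:}\quad & 32\ \text{kbit/s} \times 60\ \text{s} = 1920\ \text{kbit} = 240\,000\ \text{bytes} \\
\text{AAC / Vorbis:}\quad & 2\ \text{kbit/s} \times 60\ \text{s} = 120\ \text{kbit} = 15\,000\ \text{bytes} \\
\text{Musepack:}\quad & \frac{514\ \text{bytes} \times 8}{60\ \text{s}} \approx 68.5\ \text{bit/s} \approx 0.07\ \text{kbit/s}
\end{aligned}
```

So the 514-byte figure is consistent with the "under 1 kbps" claim, and it is roughly 470 times smaller than the 240,000 bytes MP3 needs at its 32 kbps floor.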
Powerful and flexible psychoacoustic model. Here we can mention, for example, a frame-based dynamic low-pass filter (in other encoders, a fixed bandwidth is set for each quality preset).
More advanced entropy coding based on optimized Huffman tables (LAME, for example, wastes about 20% of the bitrate solely because of MP3's less efficient lossless coding stage).

WMA


Windows Media Audio is a licensed file format developed by Microsoft for storing and transmitting audio information.

WMA was initially marketed as an alternative to MP3, but Microsoft now positions it against AAC. Nominally, WMA offers good compression, allowing it to outperform MP3 and compete with Ogg Vorbis and AAC. But as independent tests and subjective evaluations have shown, the formats are not yet equivalent in quality, and even the advantage over MP3 is not as clear-cut as Microsoft claims.

Format, encoder and parameter selection

Now straight to the heart of the matter.

To make your choice easier, I would like to share my experience gained in the course of numerous comparisons, auditions, as well as based on the analysis of the results of open hearing tests.

So next I will describe the most suitable encoders for each case, as well as the right choice of parameters. For the conversion I recommend foobar2000 (the converter settings are described in detail here); the parameters below are given in its format. foobar2000 also has a host of DSPs that can be useful for pre-processing audio.

For those who are going to convert from the command line or with another program: replace the variable %s with the name of the source file (or the equivalent variable) and %d with the name of the output file.
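If you script the conversion outside foobar2000, the substitution can be done on the argument list rather than on a raw string, which avoids quoting problems with paths containing spaces. A minimal sketch with a hypothetical helper `build_command`: only tokens that are exactly %s or %d are replaced, and any other placeholder (such as %r below) is left untouched, since its meaning depends on the host program. A lone `-` in these templates means "read from standard input", which foobar2000 normally feeds; outside it you have to pipe the source file in yourself.

```python
import shlex

def build_command(encoder, template, source, dest):
    """Turn a foobar2000-style parameter string into an argv list."""
    args = []
    for token in shlex.split(template):
        if token == "%s":
            args.append(source)   # source file
        elif token == "%d":
            args.append(dest)     # output file
        else:
            args.append(token)    # flags, "-" (stdin) and unknown placeholders
    return [encoder, *args]

# Usage sketch (requires the encoder on PATH; "-" means the WAV arrives on stdin):
# import subprocess
# with open("book.wav", "rb") as src:
#     subprocess.run(build_command("neroAacEnc",
#                                  "-lc -q 0.35 -ignorelength -if - -of %d",
#                                  "book.wav", "book.m4a"),
#                    stdin=src, check=True)
```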

Note that for each bitrate range the possible formats are listed in order of priority, the first being the preferred one. If your player does not support the first option, look at the next one, and so on. As I already wrote, today only three codecs really deserve attention: AAC, Ogg Vorbis and Musepack. WMA, being a closed format, is not particularly good, but in most cases it is still better than MP3. Since for some players WMA is the only supported alternative to MP3, I will give recommendations for each of the four formats.

About bitrates: the optimal encoding mode is so-called true VBR, i.e., a target-quality mode rather than a target-bitrate mode. Ideally, the result is a track with variable bitrate but constant quality (don't equate the two: more complex passages need more bits to maintain quality). The output bitrate is therefore hard to predict, so the bitrate values below are only approximate, averaged where possible over a large number of tracks of varying complexity.

Descriptions (in Russian) of the main parameters of the encoders mentioned in this article and some others, together with recommendations, can be found here.

Ultra-low bit rates (~ 25-40 kbps)

This range is ideal for encoding audiobooks, and here there is really only one option: AAC, or more precisely Nero AAC. The parameters are as follows:

-lc -q 0.35 -ignorelength -if - -of %d

In this case, the material must first be converted to mono and resampled to 22050 Hz (preferably with the SoX resampler). The output is ordinary low-complexity AAC (AAC-LC) at about 25 kbps.
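If you would rather script the mono step than do it in foobar2000, a minimal sketch using only Python's standard library (`wave`, `array`) could average the two channels of a 16-bit stereo WAV. The function names are hypothetical and the input is assumed to be 16-bit PCM stereo; resampling to 22050 Hz is deliberately left to SoX, as the text recommends (for example `sox mono.wav -r 22050 mono22k.wav`).

```python
import array
import sys
import wave

def downmix_to_mono(samples):
    """Average interleaved 16-bit stereo samples (L, R, L, R, ...) into mono."""
    return array.array("h", ((samples[i] + samples[i + 1]) // 2
                             for i in range(0, len(samples) - 1, 2)))

def downmix_wav(src_path, dst_path):
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":   # WAV data is little-endian
        samples.byteswap()
    mono = downmix_to_mono(samples)
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)   # rate unchanged; resample afterwards with SoX
        dst.writeframes(mono.tobytes())
```

Averaging (rather than summing) the channels keeps the result within the 16-bit range, at the cost of a small level drop for content that is mostly in one channel.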

There are also options for music in this range:

1) Nero AAC. No conversions are needed here:

-q 0.15 -ignorelength -if - -of %d

The output is High-Efficiency AAC v2 (with parametric stereo and high-frequency synthesis), ~35 kbps. A great option for internet radio. Just remember that the player's decoder must support HE-AACv2; otherwise the high frequencies will be missing entirely and the sound will be mono.

2) Ogg Vorbis aoTuV. This modification of libvorbis improves the low-bitrate encoding algorithm, and even without SBR it is only slightly inferior to HE-AACv2. Command line:

-s %r -Q -q-2 - -o %d

The resulting files should be fully compatible with standard Ogg Vorbis decoders. The bitrate is similar, around 35 kbps.

3) WMA 10 Pro. Microsoft also has something like SBR (high-frequency synthesis) for such cases, and it doesn't sound as bad as one might expect. True, the bitrate is slightly outside this range: 48 kbps.

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 48_44_2_16 -input %s -output %d

Note that older decoders (especially hardware ones) do not support WMA 10. In that case you can use WMA 9.2 (the same encoder), although its quality at low bitrates is much worse.

-silent -a_codec WMA9STD -a_mode 3 -a_setting 48_44_2 -input %s -output %d

Low bit rate, ~ 64 kbps

At first I wanted to go straight to higher bitrates. But since hydrogenaudio.org recently ran an encoder comparison at this bitrate, it would be a shame not to use it.

1) QuickTime AAC won that test (apart from the newly created Opus/CELT). The settings for the qaac encoder are:

-s -v 64 --he -q 2 --ignorelength - -o %d

The output is HE-AAC (with SBR but without parametric stereo), which should be compatible with various iPods and similar devices.

2) Ogg Vorbis aoTuV, although it finished well behind QAAC:

-s %r -Q -q0 - -o %d

3) And just in case WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 64_44_2_16 -input %s -output %d

For older decoders, WMA 9 Standard:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 64_44_2 -input %s -output %d

Slightly higher, ~ 80-100 kbps

I consider this bitrate range to be Vorbis territory.

1) As tests have shown, the Ogg Vorbis aoTuV encoder suits it best:

-s %r -Q -q1 - -o %d

2) Nero AAC gives a very good result. Where the highs are less prominent it can sound even better than Vorbis (in the highs it loses because they are synthesized):

30 -ignorelength -if - -of %d

The profile used is HE-AAC.

De facto standard, 128 kbps

An interesting fact: many people argue that for MP3, 128 kbps is the threshold bitrate at which quality becomes indistinguishable from the original. Maybe so… on cheap plastic Chinese speakers playing blatnyak (Russian criminal-underworld songs). In reality this threshold is around 200 kbps, and newer formats deliver more consistent quality at this bitrate.

Modern encoders have managed to bring this figure down from 128 kbps to almost half (again, according to their developers). Nevertheless, with reasonably decent speakers (or headphones), you can hear the difference on complex passages even at 128 kbps.

Audio formats

Before going through the different audio formats to identify the best ones for you, it seems right to explain what digital audio is. In short, it is nothing more than a representation of real sounds as a sequence of zeros and ones. The more bits are used to describe the sound (a higher sampling rate and bit depth), the closer the digital sound will be to what it represents.


Better audio formats

It all started with Pulse-Code Modulation (PCM), invented in 1937 and characterized by two properties: the sampling rate, i.e., how many times per second the amplitude of the waveform is measured, and the bit depth, i.e., how many distinct digital values each measurement can take. It is essentially a faithful conversion of analog audio into a digital file, with no compression at all. The result is a very large audio file that takes up a lot of space.
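The two PCM properties map directly onto two steps: sampling (reading the waveform every 1/fs seconds) and quantization (rounding each reading to one of the integer levels a given bit depth allows). A minimal sketch with a hypothetical `pcm_encode` helper; real converters also apply anti-aliasing filtering and dither, which are omitted here.

```python
import math

def pcm_encode(signal, sample_rate, bit_depth, duration):
    """Sample `signal` (a function of time in seconds, range -1..1) and
    quantize each sample to a signed integer of `bit_depth` bits."""
    max_code = 2 ** (bit_depth - 1) - 1        # e.g. 32767 for 16-bit
    n_samples = round(sample_rate * duration)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                     # sampling: one reading every 1/fs s
        x = max(-1.0, min(1.0, signal(t)))      # clip to full scale
        codes.append(round(x * max_code))       # quantization to an integer level
    return codes

# A 1 kHz sine sampled at 8 kHz for 1 ms, stored as 16-bit PCM:
codes = pcm_encode(lambda t: math.sin(2 * math.pi * 1000 * t), 8000, 16, 0.001)
```

Doubling the sampling rate doubles the number of codes per second; each extra bit of depth doubles the number of available levels.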


To remedy this, more or less compressed audio formats were created. Depending on their characteristics, they fall into two types: lossless formats, in which the information in the final file is identical to that in the source file, so there is no loss of quality; and lossy formats, in which the final file contains less information than the source file, with a consequent loss of quality but a saving in storage space. Read on: below, the audio formats in each category are described one by one.

Lossless (WAV, AIFF, FLAC and ALAC)

As I told you a few lines above, lossless audio formats are those that are either uncompressed or compressed in such a way that the decoded audio is identical to the original. The main formats in this category are WAV, AIFF, FLAC, ALAC and APE. Let's look at their characteristics in detail.

WAV: short for Waveform Audio File Format, a standard developed by Microsoft and IBM in 1991. It is the most popular uncompressed audio file format and is essentially what you get when you rip audio from a music CD with your computer. It takes up a lot of space (1,411 kilobits of information per second of stereo music at 44,100 Hz / 16 bits), but it reproduces sounds faithfully. In terms of quality and amount of information, it is similar to the AIFF format, explained below.
AIFF: short for Audio Interchange File Format, it belongs mainly to the Mac world. It was developed by Apple based on Electronic Arts' Interchange File Format (IFF) and is particularly suitable for audiophiles and recording musicians. It has essentially the same characteristics as WAV: it is uncompressed, so it takes up a lot of space (1,411 kilobits of information per second of stereo music at 44,100 Hz / 16 bits), and it reproduces sounds with great fidelity.
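The 1,411 kbps figure follows directly from the PCM parameters: samples per second × bits per sample × channels. A small sketch (hypothetical helper names, decimal units, file header ignored):

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM data rate in kilobits per second (1 kbit = 1000 bits)."""
    return sample_rate_hz * bit_depth * channels / 1000

def pcm_size_mb(sample_rate_hz, bit_depth, channels, seconds):
    """Audio payload size in megabytes (1 MB = 1,000,000 bytes), header excluded."""
    return sample_rate_hz * bit_depth * channels * seconds / 8 / 1_000_000

cd_rate = pcm_bitrate_kbps(44_100, 16, 2)          # CD-quality stereo
cd_minute = pcm_size_mb(44_100, 16, 2, 60)         # one minute of it
```

For CD audio this gives 1411.2 kbps and about 10.6 MB per minute, which is why WAV and AIFF files are so large.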

FLAC: short for Free Lossless Audio Codec. It is an open-source codec often used to store music CDs on a computer without loss of quality, and it is compatible with most programs and devices. Unlike the formats described so far, it does compress the audio, but because the compression is lossless, the decoded audio is bit-for-bit identical to the WAV or AIFF original.
ALAC: short for Apple Lossless Audio Codec, it is essentially Apple's counterpart to the earlier FLAC format. Being lossless, its quality is the same as FLAC's, but it is usually slightly less efficient in terms of file size. Also keep in mind that not all players support it, so unless you use Apple devices exclusively, it may not be the best choice.

Other important but less common formats that also belong to the lossless category are Monkey's Audio (APE) and OptimFROG (OFR). Their characteristics are broadly similar to those of FLAC and ALAC.

Lossy (MP3, AAC, WMA, and Ogg Vorbis)

Now let's move on to the lossy audio formats, which always apply compression: they take up very little space but sacrifice a certain degree of audio quality. The main formats in this category are MP3, AAC, WMA and Ogg Vorbis; more details follow below.

MP3: in full, MPEG-1/MPEG-2 Audio Layer 3 (Moving Picture Experts Group), also known as MPEG-1 Audio Layer III or MPEG-2 Audio Layer III.

What are the digital audio formats?


PCM, Wav, Aiff. Compression. Mp3, Ogg, Wma.

Working with digital audio can feel like a job for puzzle solvers. Since audio is stored on the computer and all computer files have extensions, we have to interpret each acronym and abbreviation.

The extension is the part of the file name after the period. It tells us what type of file it is: text, video or audio. There are many extensions, and they are sure to sound familiar to you: WAV, RM, MP3, WMA, OGG… So let's play at solving puzzles and see what each of these acronyms means.

 

 UNCOMPRESSED DIGITAL AUDIO FILES

.PCM

It is not a file type or format but a technique for converting analog audio to digital without any compression. (1) That is why we do not normally see audio files with a .pcm extension. We work with PCM when digitizing, but we always save the files with one of these extensions:

.WAV (short for Waveform)

It is the most widely used uncompressed digital audio format. It belongs to Microsoft / IBM.

.AIFF: (Audio Interchange File Format)

It is similar to WAV but used on Apple Macintosh (Mac) computers.

.CDA: this is how Windows shows the audio tracks of a Compact Disc. The .cda entries are only small pointers to the tracks; the audio itself is stored on the disc as PCM.

All uncompressed files are large: approximately 10 megabytes for every minute of CD-quality stereo audio. These formats are used to store audio at a professional level because their quality is very good. But when we don't need that much quality and are short on space, it's time to compress.

 AUDIO COMPRESSION

Compressing means reducing, and with lossy compression, whenever we reduce we lose something. Recent advances allow compression with the least possible loss of quality, but there is always some loss. In exchange, file sizes have been reduced dramatically.

While a 4-minute audio file in WAV format takes approximately 40 megabytes, the same audio compressed to MP3 can shrink to 4 megabytes, 10 times less. And to most ears they sound the same. (2)
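These round numbers can be checked with the CD-audio data rate (44,100 Hz × 16 bits × 2 channels = 1411.2 kbit/s). The text does not state the MP3 bitrate, so 128 kbit/s is assumed here, with decimal megabytes:

```latex
\begin{aligned}
\text{WAV:}\quad & 1411.2\ \text{kbit/s} \times 240\ \text{s} = 338\,688\ \text{kbit} \approx 42.3\ \text{MB} \\
\text{MP3 at 128 kbit/s:}\quad & 128\ \text{kbit/s} \times 240\ \text{s} = 30\,720\ \text{kbit} = 3.84\ \text{MB} \\
\text{ratio:}\quad & 1411.2 / 128 \approx 11
\end{aligned}
```

This matches the "approximately 40 MB" and "10 times less" figures to within rounding.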

SAVE WITHOUT COMPRESSING

In production, audio is always recorded in WAV, without compression, and it is edited and mixed the same way. If the final result is an audio file to be uploaded to the Web or stored on a computer's hard drive, we can compress it to MP3, but at a bitrate of no less than 160 kbps.

If, on the other hand, the production is destined for a CD, never compress it: leave the audio in WAV and burn it to the CD that way.

1. How does compression work?

It is not about wrinkling or crushing the audio. Most audio compression systems take advantage of a “defect” in our ears to reduce file size. It is called masking.

Masking is a property of the human ear that prevents it from distinguishing two simultaneous sounds that are close together in frequency: the louder one masks the quieter one. For example, if a loud 12 kHz tone and a quieter 12.2 kHz tone sound at the same time in a song, we could remove the quieter one without the listener noticing.

In this way, the compressor “subtracts” the masked frequencies, which reduces the number of bytes. And fewer bytes in computing translates into smaller files, but not shorter. The song, when compressed, lasts as long as it is uncompressed.
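A toy version of that masking decision can be written in a few lines. Frequencies are first mapped to the Bark (critical-band) scale with the Zwicker and Terhardt approximation; a probe tone is then treated as inaudible if it lies below a threshold that starts some decibels under the masker and falls off with Bark distance. The offset and slope defaults are hypothetical illustration values, and the symmetric slope is a simplification: real psychoacoustic models use asymmetric, level-dependent spreading (masking reaches further upward in frequency) and also account for the absolute threshold of hearing.

```python
import math

def bark(freq_hz):
    """Zwicker and Terhardt approximation of the critical-band rate (Bark)."""
    return 13 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500) ** 2)

def is_masked(masker_hz, masker_db, probe_hz, probe_db,
              offset_db=10.0, slope_db_per_bark=25.0):
    """Toy test: the probe is inaudible if it falls below a threshold that starts
    `offset_db` under the masker and drops linearly with Bark distance.
    The default offset and slope are illustrative, not measured values."""
    distance = abs(bark(probe_hz) - bark(masker_hz))
    threshold_db = masker_db - offset_db - slope_db_per_bark * distance
    return probe_db < threshold_db
```

With these assumptions, 12 kHz and 12.2 kHz are only about 0.06 Bark apart, so an 80 dB tone at 12 kHz hides a 50 dB tone at 12.2 kHz, while an 80 dB tone at 1 kHz, about 15 Bark away, hides nothing at 12 kHz.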

2. Quality of compressed files

As we saw earlier, digital audio has two parameters: the sampling rate (44.1 kHz is the usual choice) and the resolution, or size of each sample (8 or 16 bits). Compression adds a third parameter: the bitrate, i.e., the number of kilobits per second (kbps), which determines the quality of the compression.

• A lower number of Kbps, more compression, smaller file size, but lower quality.

• A higher number of Kbps, less compression, larger file size and more quality.

A compressed audio at 128 Kbps has a higher compression level than a 256 Kbps one. That means that 128 is a smaller file and less quality than 256. Although you must have a cat’s ear to distinguish between both!

VARIABLE OR CONSTANT BITRATE

Some files have a constant bitrate (CBR, Constant Bit Rate) and others a variable one (VBR, Variable Bit Rate). A constant bitrate stays the same for the whole audio, for example 128 kilobits per second. In the variable method, the encoder uses more bits in passages that contain more frequencies, where it cannot mask all of them.
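The difference can be illustrated with a toy bit allocator. This is a hypothetical sketch, not how any real encoder works (real VBR encoders adjust quantizer step sizes per frame to meet a quality target): CBR gives every frame the same budget, while the toy VBR spends bits in proportion to the number of spectral components in each frame that cannot be masked, with a small floor for silent frames. The 25 ms frame length and the bits-per-component figure are assumed values.

```python
def cbr_frame_bits(frame_count, bitrate_kbps, frame_seconds):
    """CBR: every frame gets the same budget, whatever it contains."""
    return [bitrate_kbps * 1000 * frame_seconds] * frame_count

def vbr_frame_bits(unmasked_counts, bits_per_component=100, floor_bits=50):
    """Toy VBR: bits grow with the number of components that cannot be masked."""
    return [max(floor_bits, n * bits_per_component) for n in unmasked_counts]

def average_kbps(frame_bits, frame_seconds):
    """Average bitrate of a stream, given the size of each frame in bits."""
    return sum(frame_bits) / (len(frame_bits) * frame_seconds) / 1000

FRAME = 0.025                                 # assumed 25 ms frames
vbr = vbr_frame_bits([0, 10, 40])             # silence, simple, complex passage
cbr = cbr_frame_bits(3, 128, FRAME)
```

Here the toy VBR stream spends 50 bits on the silent frame where CBR spends 3200, and its average comes out at about 67 kbps against CBR's fixed 128 kbps.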

 

COMPRESSED FILE FORMATS

Mp3 (MPEG-1 Audio Layer 3)

It achieves high compressions without much loss, although it all depends on the quality of the compression we use. 128 Kbps and below is not recommended.

Although MP3 is the most widely used compression standard, especially for audio on Web pages, its great drawback is its patents: any player or editing software that wants to use it has to pay licensing fees.

.OGG (Vorbis)

Because of these patents, the Xiph.org Foundation developed a completely free audio codec in 2002 (5). Similar to MP3 in its characteristics, it is starting to be widely used on the Web and in some players, since manufacturers do not have to pay patent fees. For now, it is hard for it to replace MP3 completely, but it is gaining a lot of ground.

.AAC (Advanced Audio Coding)

The compression level is higher than mp3 (MPEG-1) without major loss of quality. AAC is one of the codecs used in the new MPEG-4 compression standard. This audio format is used in players like the iPod and in some of the new digital radio systems. AAC is shaping up to be the successor to the mp3.

.RAM (also RM or RA)

These are RealNetworks' audio files. The problem is that playback and editing are largely limited to the company's own software and a few other programs.

.WMA (Windows Media Audio)

It is Microsoft's bet on compressed formats. Compared with WAV, its files are much smaller but of lower quality. While MP3 and Ogg files are played by almost all players and editors, the same is not true of WMA files, so it is rarely used.

.AA3 (ATRAC: Adaptive Transform Acoustic Coding)

A format invented by Sony and used by MiniDisc recorders and players.