Lossy audio encoding. What is what?

The Evolution of Audio Coding

It’s 2020, and many years have passed since the first MP3 encoder appeared. But the fact that most of us still happily listen to MP3 music does not mean that progress has stood still all this time. This applies not only to the development of MP3 encoders, but also to the evolution of lossy audio coding in general, in the form of newer and more advanced codecs that deliver better quality at a smaller size. Formats like OGG Vorbis, AAC, WMA, and Musepack have left the outdated MP3, with its many limitations and flaws, behind.
In parallel, lossless encoding is gaining momentum. But because of the large amount of data involved, it is still not suitable for mass use today, especially on portable devices with limited memory, for streaming over a network, or simply for quickly sharing music on the Internet (admittedly, a 100 Mbit/s connection is not always at hand).
So MP3 is out of date and clearly ready to be replaced. But what should an uninitiated user do who wants the highest sound quality in the least amount of memory? After all, there are quite a few alternative codecs (at least three of them really deserve attention): Apple is promoting AAC (Advanced Audio Coding, positioned as the successor to MP3) through its iTunes Store, Microsoft is pushing its own licensed WMA (Windows Media Audio), OGG Vorbis is becoming better and better known, and particularly enlightened people even use a format like Musepack. Which of these codecs should you choose?
There is no definitive answer to this question, and that is why I am writing this article.
How to decide?
The choice of codec depends on the specific task, namely:
1. On the hardware and software that will play the audio, that is, on whether a given format is supported and on the playback quality (which should guide your choice of bitrate).
2. On the amount of memory that will be allocated to the final material. A higher or lower target quality / bitrate is selected accordingly.
And of course, in addition to the format and bitrate, you need to choose the best encoder and encoding parameters. Keep in mind that different formats and encoders perform differently in different bitrate ranges.
Therefore, the algorithm is approximately the following:
1) Find out what formats the target device supports.
2) Determine how much space you can allocate for the audio material, as well as determine the total length of the audio intended for encoding.
3) Calculate the required bitrate using the formula: bitrate = disk_space (in kilobits) / total_time (in seconds).
4) According to the bitrate, choose the optimal one of the supported formats (more on this later).
5) Choose the best encoder and parameters for it.
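As a rough illustration of step 3, the calculation can be sketched in Python. The function name and the example figures (4 GB of space, 60 hours of audio) are assumptions made up for this sketch; it also assumes 1 GB = 10^9 bytes and 1 kilobit = 1000 bits.

```python
def target_bitrate_kbps(disk_space_bytes: int, total_seconds: float) -> float:
    """Average bitrate (kbps) that fits total_seconds of audio into disk_space_bytes."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    disk_space_kilobits = disk_space_bytes * 8 / 1000  # bytes -> bits -> kilobits
    return disk_space_kilobits / total_seconds


# Hypothetical example: 4 GB of free space for 60 hours of music.
example_kbps = target_bitrate_kbps(4 * 10**9, 60 * 3600)
print(f"{example_kbps:.0f} kbps")
```

With these made-up numbers the result is about 148 kbps. In practice it makes sense to aim a little lower, since tags and container overhead also take up some space.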
More about our heroes
AAC
The development of psychoacoustics and data compression methods gradually made the MP3 standard too restrictive for implementing new ideas in audio coding. As a result, in 1997 Fraunhofer IIS, which had created MP3 in the early 1990s, together with Dolby, AT&T, Sony, and Nokia, developed a new audio compression method, Advanced Audio Coding (AAC), which became part of the MPEG-2 and MPEG-4 standards. The main differences from the MP3 standard are:
support for a wider range of audio formats (up to 48 channels) and sample rates (8 kHz to 96 kHz);
a simpler and more efficient filter bank: the hybrid MP3 filter bank has been replaced by a plain MDCT (modified discrete cosine transform);
a wider range of time-frequency resolution in the filter bank, a factor of eight (versus three in MP3), which improves the encoding of both transients and stationary sections of the audio signal;
better coding of frequencies above 16 kHz;
a more flexible stereo encoding mode that can switch to M/S (“joint stereo”) independently in different frequency bands;
additional tools in the standard that increase compression efficiency: temporal noise shaping (TNS), prediction of MDCT coefficients over time (long-term prediction), parametric stereo coding, perceptual noise substitution (PNS), and high-frequency synthesis (SBR).
Thanks to these features, AAC can achieve more flexible and efficient audio coding and therefore better quality. Because MP3 is so widespread, AAC has not yet gained comparable popularity. However, AAC is the main format of the iTunes Store, iPod, iTunes, iPhone, PlayStation 3, and Nintendo Wii, as well as of DAB+ / DRM digital broadcasting.
OGG Vorbis
Ogg Vorbis is a relatively new general-purpose audio compression format, officially released in the summer of 2002. It belongs to the same family as MP3, AAC, VQF, and WMA, that is, lossy compression formats. The psychoacoustic model used in Ogg Vorbis is similar in principle to that of MP3 and its peers, but the mathematical processing and practical implementation of the model are fundamentally different, which allows the authors to declare their format completely independent of all predecessors.
The main and undeniable advantage of Ogg Vorbis is that it is completely open and free. In addition, it uses a newer, higher-quality psychoacoustic model, so the bitrate needed for a given quality is noticeably lower than with other formats. As a result, the sound is better while the file is smaller.
The format has many other advantages. For example, Ogg Vorbis does not restrict the user to two audio channels (left and right stereo). It supports up to 255 separate channels at sample rates of up to 192 kHz and resolutions of up to 32 bits (which no other lossy compression format offers), so it is well suited to encoding 6-channel DVD-Audio. In addition, Ogg Vorbis is sample-accurate: the audio after decoding has no offset and no extra or missing samples relative to the audio before encoding. This is easy to appreciate when encoding gapless music (where one track flows smoothly into the next), because the continuity of the sound is preserved.
In most formats streaming is an afterthought, but this format was designed for it from the ground up. This has a rather useful side effect: several songs can be stored in one file, each with its own tags. When such a file is loaded into a player, the songs should appear as if they had been loaded from separate files.
The fairly flexible tagging system also deserves a mention. The tag header is easily extended and can hold text of any length and complexity (for example, song lyrics) interspersed with images (for example, a photo of the album cover). Text tags are stored in UTF-8, which allows any mix of languages and eliminates potential encoding problems. This is much more convenient than the various tricks used with ID3 tags.
Ogg Vorbis uses variable bitrate by default, and the bitrate is not tied to fixed steps: it can change in increments as small as 1 kbps. The format also does not strictly limit the maximum bitrate, which can range from 400 kbps to 700 kbps at the highest encoding setting. The sample rate is just as flexible: any value between 2000 Hz and 192000 Hz can be chosen.
Ogg Vorbis was developed by the Xiphophorus community to replace all paid proprietary audio formats. Although it is the youngest of MP3’s competitors, Ogg Vorbis is fully supported on all well-known platforms (Windows, PocketPC, Symbian, DOS, Linux, MacOS, FreeBSD, BeOS, etc.) and has a large number of hardware implementations. Its current popularity far exceeds that of all the alternatives.
It is worth noting that Ogg Vorbis is only a small part of the Ogg Squish multimedia project, which also includes other free codecs: Speex for voice compression, FLAC for lossless audio compression, and Theora for video compression.
Musepack
Musepack (mpp, mp+, mpc, MPEG+) is a file format for storing audio that requires no licensing fees and is distributed under the GNU General Public License.
The quality of MPC encoding at high bitrates (160 kbps and above) is noticeably (if not substantially) higher than the quality provided by MP3.
Main advantages:
Because the format does not apply a second (DCT) transform, it is practically free of pre-echo artifacts, unlike formats such as MP3, Vorbis, AAC, and WMA.
More efficient variable bitrate algorithms. If you watch how the bitrate changes while an MPC track plays, you will notice that the encoder assigns a lower bitrate to simple sections and a much higher one to complex sections, sometimes above 400 (!) kbps. One more interesting fact: for silence, an MP3 encoder in VBR mode assigns a bitrate of 32 kbps (at a sampling rate of 44100 Hz), and AAC and OGG Vorbis about 2 kbps, whereas Musepack encodes silence at minimal cost, less than 1 kbps (for example, one minute of silence takes about 514 bytes). All of this shows how extremely “frugal” this encoder is.
A powerful and flexible psychoacoustic model. One example is the dynamic low-pass filter that adapts frame by frame (other encoders set a fixed bandwidth for each quality preset).
More advanced compression based on optimized Huffman tables (LAME MP3, for comparison, wastes about 20% of the bitrate purely because of imperfect mathematical compression).
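The silence figure quoted in the list above can be checked with a little arithmetic. The sketch below (the function and variable names are illustrative only) converts “about 514 bytes per minute” into an average bitrate and compares it with the 32 kbps and 2 kbps figures from the text, assuming 1 kbps = 1000 bits per second.

```python
def average_bitrate_kbps(size_bytes: float, duration_seconds: float) -> float:
    """Average bitrate in kbps (1 kbps = 1000 bits per second)."""
    return size_bytes * 8 / duration_seconds / 1000


# Figure from the text: one minute of silence takes about 514 bytes in Musepack.
musepack_silence_kbps = average_bitrate_kbps(514, 60)

# Silence bitrates quoted in the text for the other formats, in kbps.
quoted_silence_kbps = {"MP3 (VBR, 44100 Hz)": 32, "AAC / OGG Vorbis": 2}

print(f"Musepack: {musepack_silence_kbps:.3f} kbps")
for name, kbps in quoted_silence_kbps.items():
    ratio = kbps / musepack_silence_kbps
    print(f"{name}: {kbps} kbps, about {ratio:.0f}x more")
```

The Musepack figure works out to roughly 0.07 kbps, which is consistent with the “less than 1 kbps” claim.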
WMA
Windows Media Audio is a licensed file format developed by Microsoft for storing and transmitting audio information.
WMA was initially marketed as an alternative to MP3, but Microsoft now positions it against AAC. Nominally, the WMA format offers good compression, allowing it to “overtake” MP3 and compete with Ogg Vorbis and AAC. But as independent tests and subjective evaluation have shown, the quality of these formats is by no means equivalent, and even the advantage over MP3 is not as clear-cut as Microsoft claims.
Format, encoder and parameter selection
Now straight to the heart of the matter.
To make your choice easier, I would like to share the experience I have gained from numerous comparisons and listening sessions, as well as from analyzing the results of public listening tests.
So, next I will describe the most suitable encoders for each case and the right choice of parameters. For conversion I recommend foobar2000 (the converter settings are described in detail here); the parameters below are given specifically for it. In addition, foobar2000 has a host of DSPs that can be useful for pre-processing audio.
For those who are going to convert through the console or another program: the variable %s must be replaced with the name of the source file (or an equivalent variable) and %d with the name of the output file.
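For console use, the placeholder substitution described above can be sketched as follows. This is only an illustration: `build_command` is a hypothetical helper, `"encoder"` is a stand-in for whichever executable you actually use, and the file names are made up.

```python
def build_command(encoder_path: str, params: str, source: str, destination: str) -> list:
    """Build an argument list from a parameter string with %s and %d placeholders."""
    args = [encoder_path]
    for token in params.split():
        # Substitute whole tokens so that file names containing spaces stay intact.
        if token == "%s":
            args.append(source)
        elif token == "%d":
            args.append(destination)
        else:
            args.append(token)
    return args


# Hypothetical example with a parameter string that uses both placeholders.
example = build_command(
    "encoder",  # stand-in executable name, not a real tool
    "-silent -a_codec WMA9STD -a_mode 3 -a_setting 64_44_2 -input %s -output %d",
    "my song.wav",
    "my song.wma",
)
print(example)
```

The resulting list can be passed to `subprocess.run(example, check=True)`. Building a list rather than a single string avoids quoting problems with spaces in file names.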
Note that for each bitrate range the possible formats are listed in order of priority, highest first. If your player does not support the first option, move on to the next one, and so on. As I already wrote, today only three codecs really deserve attention: AAC, OGG Vorbis, and Musepack. WMA, because of its closed nature, does not stand out in quality, but in most cases it is still better than MP3. Since some players support only WMA among the alternatives, I will give recommendations for each of the four formats.
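The fallback rule (take the first option your player supports) can be sketched as a small function. The priority list and the player’s capabilities below are hypothetical examples, not a recommendation.

```python
def pick_format(priority, supported):
    """Return the first format in priority order that the player supports, or None."""
    supported_lower = {name.lower() for name in supported}
    for fmt in priority:
        if fmt.lower() in supported_lower:
            return fmt
    return None


# Hypothetical priority list for one bitrate range and a hypothetical player.
priority = ["AAC", "OGG Vorbis", "WMA"]
player_supports = ["MP3", "WMA", "OGG Vorbis"]
chosen = pick_format(priority, player_supports)
print(chosen)
```

Here the player lacks AAC support, so the second-priority format is chosen.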
About bitrates: keep in mind that the optimal encoding mode is so-called true VBR, i.e. a mode that targets quality rather than bitrate. Ideally, the result is a track with variable bitrate but constant quality (do not confuse the two: more complex parts of a track need more bits to maintain the same quality). The output bitrate is therefore hard to predict, and the bitrate values below are only approximate, given where possible as an average over a large number of tracks of varying complexity.
The encoders mentioned in this article, as well as some others, can be found here, along with Russian descriptions of their main parameters and recommendations.
Ultra-low bit rates (~ 25-40 kbps)
This range is ideal for encoding audiobooks. And here there can only be one option: AAC, or rather, Nero AAC. The parameters are as follows:
-lc -q 0.35 -ignorelength -if - -of %d
In this case, the material must first be converted to mono and resampled to 22050 Hz (preferably with the SoX resampler). The output is ordinary Low Complexity AAC at a bitrate of about 25 kbps.
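The text refers to doing this preparation inside foobar2000. For those working from the console, the same step can be sketched with the SoX command-line tool, assuming `sox` is installed and on the PATH. The helper name and file names are hypothetical.

```python
def sox_mono_22k_command(source: str, destination: str) -> list:
    """Argument list for SoX: downmix to one channel and resample to 22050 Hz."""
    return [
        "sox", source,
        "-r", "22050",  # output sample rate in Hz
        "-c", "1",      # output channel count (mono)
        destination,
    ]


# Hypothetical file names; the list can be passed to subprocess.run(...).
print(sox_mono_22k_command("chapter01.wav", "chapter01_mono22k.wav"))
```

In SoX, format options placed before the output file name apply to the output, which is why `-r` and `-c` come after the source file here.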
There are also options for music in this range:
1) Nero AAC. No conversions are needed here:
-q 0.15 -ignorelength -if - -of %d
The output is High Efficiency AAC v2 (with parametric stereo and high-frequency synthesis) at ~35 kbps. A great option for internet radio. Just remember that the player’s decoder must support HE-AACv2; otherwise you will get mono sound with no high frequencies at all.
2) OGG Vorbis AoTuV. This modification of libvorbis includes improvements to low-bitrate encoding, and even without SBR it is not far behind HE-AACv2. Command line:
-s %r -Q -q-2 - -o %d
The resulting files should be fully compatible with standard OGG Vorbis decoders. The bitrate is similar, around 35 kbps.
3) WMA 10 Pro. Microsoft also has something like SBR (high-frequency synthesis) for such cases, and it sounds better than one might expect. The bitrate, however, falls slightly outside this range: 48 kbps.
-silent -a_codec WMA9PRO -a_mode 3 -a_setting 48_44_2_16 -input %s -output %d
Note that older decoders (especially hardware ones) do not support WMA 10. In that case you can use WMA 9.2 (with the same encoder), although its quality at low bitrates is much worse.
-silent -a_codec WMA9STD -a_mode 3 -a_setting 48_44_2 -input %s -output %d
Low bit rate, ~ 64 kbps
Initially, I thought about going straight to higher bitrates. But since hydrogenaudio.org recently ran an encoder comparison at this bitrate, it would be a shame to skip it.
1) QuickTime AAC, the winner of that test (not counting the newly created Opus / CELT). The QAAC encoder settings are:
-s -v 64 --he -q 2 --ignorelength - -o %d
The output is HE-AAC (with SBR, but not parametric stereo), which should be compatible with various iPods and the like.
2) OGG Vorbis AoTuV. It finished well behind QAAC, but still:
-s %r -Q -q0 - -o %d
3) And just in case WMA 10 Pro:
-silent -a_codec WMA9PRO -a_mode 3 -a_setting 64_44_2_16 -input %s -output %d
For older decoders – WMA 9 standard:
-silent -a_codec WMA9STD -a_mode 3 -a_setting 64_44_2 -input %s -output %d
Slightly higher, ~ 80-100 kbps
I include this bitrate range because of Vorbis.
1) As tests have shown, the OGG Vorbis AoTuV encoder is the best fit here:
-s %r -Q -q1 - -o %d
2) Nero AAC also gives a very good result. Where the highs are not prominent it can sound even better than Vorbis (on the highs themselves it loses, because they are synthesized).
30 -ignorelength -if - -of %d
The profile used is HE-AAC.
De facto standard, 128 kbps
Interesting fact: many people claim that for MP3, 128 kbps is the “threshold bitrate” at which quality becomes indistinguishable from the original. Maybe so ... on cheap plastic speakers playing blatnyak (Russian prison chanson). In reality this threshold is around 200 kbps, and newer formats provide more consistent quality at that bitrate.
Modern encoders have managed to lower this threshold to almost half of 128 kbps (again, according to their developers). Nevertheless, if you have reasonably decent speakers (or headphones), the difference can still be heard in complex passages even at 128 kbps.