About Lossy

We all love good music. Until recently, good digital music meant the audio CD: 44,100 Hz, stereo, 16 bits (linear) per channel, not compressed in any way, which works out, according to Wikipedia, to 1411.2 kbps.
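The 1411.2 kbps figure follows directly from the CD parameters. A minimal sketch of the arithmetic (the function name is only illustrative, and 1 kbit is taken as 1000 bits):

```python
def pcm_bitrate_kbps(sample_rate_hz, channels, bits_per_sample):
    """Bit rate of uncompressed linear PCM, in kilobits per second."""
    return sample_rate_hz * channels * bits_per_sample / 1000


# Audio CD: 44,100 samples per second, 2 channels, 16 bits per sample.
cd_kbps = pcm_bitrate_kbps(44_100, 2, 16)
print(cd_kbps)  # 1411.2

# How many times smaller a 128 kbps stream is than raw CD audio.
print(round(cd_kbps / 128, 1))  # 11.0
```

In other words, a 128 kbps MP3 has to fit the music into roughly one eleventh of the original data.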

But at the end of the 20th century, at the dawn of multimedia, when music began to be played not only on players but also on computers, it turned out that audio CD data (that is, raw PCM) really ought to be compressed. There was, for example, Microsoft ADPCM, which compressed it somewhat inside WAV files with little audible loss of quality. But generally speaking, the original 44 kHz stereo still required a lot of space that way, so quality was dropped to 22 kHz mono. One of the first multimedia albums of that time, “Immersion” by the group Nautilus Pompilius, is still around.

So MP3 won as the way to store and distribute compressed music, at 128 kbps of so-called “CD quality”.

MP3 has a strange origin. Technically, it is MPEG-1 Audio Layer 3: the audio compression layer of what was then a modern, progressive standard for storing video on Video CDs, just packaged in its own .mp3 file format. Nobody is interested in Video CD anymore. The next standard, MPEG-2, is used in DVDs and in (non-HD) digital television broadcasts. And the one after that, MPEG-4, is now used for HD video and continues to evolve.

MP3 was revolutionary. It was (almost) the first lossy compression format: instead of trying to preserve everything that was in the original signal, we use a psychoacoustic model to cut out what a person will not hear anyway, and compress the rest. Like JPEG.

Then I tried digitizing my accumulated audio collection. Compact cassettes (just “cassettes” in everyday speech, but more correctly “compact cassettes”) turned out to be complete shit. Their frequency range is such that it makes no sense to sample at more than 22 kHz. There were no reel-to-reel recorders in the house. But vinyl records astonished me with their sound quality. With good equipment, you can extract better quality than from a CD. You just need to get rid of the clicks.

And then I realized that MP3 is shit too. At that same 128 kbps, the sound quality suffers greatly. And the scariest thing is that nasty metallic overtones appear where they shouldn’t be. My ears need at least 192 kbps, and the more the better.

Let’s take a track by a once-famous punk rock band, in FLAC. FLAC is a modern lossless compression standard that has successfully replaced WAV, because it is free.

The original is CD quality, so frequencies up to 22 kHz are present as expected.

Original flac

We are going to encode with FFmpeg, or rather with LAME.

At 320 kbps and 256 kbps, the spectrogram looks almost like the original.

At 192 kbps, there are signs of a 16 kHz cutoff. The spectrogram “darkens”: apparently, the psychoacoustic model has cut something out. By ear, the high-frequency “bursts” really have disappeared.

MP3 192 kbps

At the notorious 128 kbps, everything is plainly cut off at 16 kHz. Background sounds become “fuzzy” and begin to bubble. In terms of enjoying the musical details, it has nothing in common with the original.

MP3 128 kbps

You can also go down to 64 kbps in MP3. The stereo is gone. Everything gurgles terribly and grates with completely alien sounds.

In what format and with what quality is music heard on the radio?

Audio file formats most used on the radio

In fact, we can say that there are currently two main classes of audio formats: lossy and lossless (the latter including uncompressed formats). Each class includes many types.

Lossy formats take up less disk space but degrade the quality of the audio track. When audio is compressed using the MPEG standards (hence the names mp3, and mp4 for files containing video), the overtones and transitional tones that are barely noticeable to the ear are cut off. This makes the file lighter, but it also degrades it. No less important is the bit rate of the file: the degree of compression of each second of the audio track. The lower the bit rate, the less space the file occupies and the worse the quality. Thus, a three-minute track in mp3 at a bit rate of 320 kilobits per second occupies about 7 megabytes on disk; the same track at 96 kilobits per second occupies about 2 megabytes.
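A quick way to sanity-check file sizes like these is to multiply the bit rate by the duration. The sketch below assumes a constant bit rate and ignores tags and container overhead; the function name is only illustrative.

```python
def file_size_bytes(bitrate_kbps, duration_s):
    """Approximate size of a constant-bit-rate audio file, in bytes."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8


three_minutes = 3 * 60
print(file_size_bytes(320, three_minutes) / 1_000_000)  # 7.2 (megabytes)
print(file_size_bytes(96, three_minutes) / 1_000_000)   # 2.16 (megabytes)
```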

Lossless is as close to the original analog sound as possible, which is why sound engineers love it. Lossless formats take up much more disk space, even compared to mp3 at 320 kbps. Among these formats, the most common are WAV (the standard), FLAC (the economical one) and AIFF (Apple). The first is used most often.

Professional sound recording is done only in uncompressed formats. Sound engineers work with nothing else.

On the radio, the situation is somewhat more complicated. This is due to the way the media work, namely the need for speed and commercial profitability. High-capacity servers are expensive, so most radio stations encode audio tracks in mp3 at a bit rate of 256 kilobits per second. However, this is typical mainly of domestic stations. Equipment purchased from abroad comes with standard configurations that assume WAV encoding.

Why do software developers focus on WAV? Because a radio signal cannot propagate without interference, so the listener always receives a slightly, and sometimes significantly, distorted signal. Broadcasters therefore face a reasonable question: which will the listener perceive better, a distorted ideal or a distorted distortion? For this reason, Europe and the United States have adopted the WAV standard (AIFF, if the station runs on Apple equipment), while Russia uses mp3 at a bit rate of 256 kilobits per second.

Analog data transmission is based on the physical properties of sound. The record and playback mechanism follows the principles of human auditory perception: the sound wave vibrates a membrane (by analogy with the eardrum) and is cut into the medium by a needle in the form in which it was received. It is therefore also reproduced without the deviations and changes associated with digital conversion.

The Audio Files category includes compressed and uncompressed audio formats that contain a data signal and can be played by audio programs. This category also includes MIDI files, music scores, and audio project files, which generally do not contain audio data.

The most common extensions are .WAV, .AIF, .MP3, and .MID.

Lossy audio compression

MP3: Lossy compression

I’ll start with the well-known and widely used (though not always loved) MP3 format.

This audio format is used everywhere, whether it is needed or not. But that does not mean it does not deserve the place it occupies in its niche. It very much does. Although it has been “sitting” in that niche for about two decades, no one has “kicked” it out yet. And many have wanted to. The main contender is WMA (Windows Media Audio), which Microsoft conceived as an alternative to MP3. An alternative is all it has remained, despite the best efforts of the developers. The next contender is OGG. Despite offering broader possibilities than MP3, it never gained widespread acceptance, although it is compatible with many operating systems. It is also worth mentioning the AAC audio format, which was supposed to take over the baton from MP3: encoding quality was improved and compression losses reduced. But alas.

The main advantage of these formats is their small size. The downside is the loss of quality.

Different formats

In today’s world, you can find a large number of different audio file extensions. Let’s recall a few offhand:

MP3 (where would we be without it?)
WMA
OGG
AAC
And many others

Of course, each of these formats is good, especially MP3, which is probably the most popular one. But today we are not talking about popularity. MP3 and other similar formats, no matter how good they sound, are compressed versions of the original. And even if you set the maximum quality, a bit rate of 320 kbps, it still won’t be the highest quality. The audio was compressed and reduced, so there will be certain losses.

Lossy audio encoding. What is what?

The Evolution of Audio Coding

It’s 2020, and many years have passed since the first MP3 encoder appeared. But the fact that most of us still calmly listen to music in MP3 does not mean that progress has stood still all this time. This applies not only to the development of the MP3 encoding algorithm, but also to the evolution of lossy audio encoding in general, in the form of newer and more advanced codecs that really do deliver better quality at a smaller size. Formats like OGG Vorbis, AAC, WMA and Musepack have left the outdated MP3, with its many limitations and flaws, behind.

In parallel, lossless encoding is gaining momentum. But because of the large amount of data, it is still not suitable for large-scale use today, especially on portable devices with limited memory, for streaming over the network, or simply for quickly sharing music on the Internet (admittedly, not everyone has 100-megabit internet access always at hand).

So MP3 is out of date and clearly ready to be replaced. But what should an uninitiated user do who wants the highest sound quality with the least amount of memory? After all, there are quite a few alternative codecs (at least three of them really deserve attention): Apple promotes AAC (Advanced Audio Coding, positioned as the successor to MP3) through its iTunes Store, Microsoft licenses its own WMA (Windows Media Audio), OGG Vorbis is becoming more and more widely known, and particularly enlightened people even use a format like Musepack. Which of these codecs should you choose?

There is no definitive answer to this question, and that is why I am writing this article.

How to decide?

The choice of one codec or another depends on the specific task, namely:

1. On the equipment and software with which the sound will be played back, i.e. on whether a given audio format is supported, as well as on the playback quality (it is advisable to take this into account when choosing a bit rate).

2. On the amount of memory that will be allocated to the final material. A higher or lower target quality / bit rate is selected accordingly.

And of course, in addition to the format and bit rate, you need to choose the optimal encoder and encoding parameters. Keep in mind that different formats / encoders perform differently in different bit rate ranges.

Therefore, the algorithm is approximately the following:

1) Find out what formats the target device supports.
2) Determine how much space you can allocate for the audio material, as well as determine the total length of the audio intended for encoding.
3) Calculate the required bitrate using the formula: bitrate = disk_space (in kilobits) / total_time (in seconds).
4) According to the bitrate, choose the optimal one of the supported formats (more on this later).
5) Choose the best encoder and parameters for it.
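Step 3 can be sketched in a few lines. This is only an illustration of the formula above: the function name and the example figures (a player with 4 GB of free space and 50 hours of music) are assumptions, and 1 kilobit is taken as 1000 bits.

```python
def required_bitrate_kbps(disk_space_bytes, total_time_s):
    """bitrate = disk_space (in kilobits) / total_time (in seconds)."""
    disk_space_kbit = disk_space_bytes * 8 / 1000
    return disk_space_kbit / total_time_s


# Hypothetical example: 4 GB of free space, 50 hours of audio.
space_bytes = 4_000_000_000
duration_s = 50 * 3600
print(round(required_bitrate_kbps(space_bytes, duration_s)))  # 178
```

With a target of about 178 kbps, you would then pick the format and encoder settings whose average bit rate lands at or slightly below that value.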

More about our heroes

AAC

The development of psychoacoustics and data compression methods gradually made the MP3 standard too restrictive for implementing new ideas in audio coding. As a result, in 1997 Fraunhofer IIS, which had created MP3 in the early 1990s, together with Dolby, AT&T, Sony, and Nokia, developed a new audio compression method, Advanced Audio Coding (AAC), which became part of the MPEG-2 and MPEG-4 standards. The main differences from the MP3 standard are:
support for a wider range of audio formats (up to 48 channels) and sample rates (8 kHz to 96 kHz);
a more efficient and simpler filter bank: the hybrid MP3 filter bank has been replaced by a plain MDCT (Modified Discrete Cosine Transform);
a wider range of time-frequency resolution switching in the filter bank, eightfold (threefold in MP3), which improved the encoding of both transients and stationary sections of the audio signal;
better coding of frequencies above 16 kHz;
a more flexible stereo encoding mode, which allows switching to M/S (“joint stereo”) mode independently in different frequency bands;
additional features of the standard that increase compression efficiency: temporal noise shaping (TNS), prediction of MDCT coefficients over time (long-term prediction), parametric stereo coding, perceptual noise substitution (PNS), and high-frequency synthesis (SBR).

Thanks to these features, the AAC standard can achieve more flexible and efficient audio coding and therefore better quality. Because MP3 is so widespread, AAC has not yet gained comparable popularity. However, AAC is the main format of the popular iTunes Store, of iPods, iTunes, the iPhone, PlayStation 3 and Nintendo Wii, and of DAB+ / DRM digital broadcasts.
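To illustrate the M/S (“joint stereo”) idea mentioned in the list above, here is a minimal sketch of the mid/side transform. It is a simplification: it works on raw time-domain samples, whereas a real AAC encoder makes the L/R versus M/S decision per frequency band on transform coefficients. The function names are illustrative only.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Inverse transform: recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = to_mid_side(left, right)
print(side)  # [0.0, 0.0, -0.125]
```

When the two channels are similar, the side signal is close to zero and costs very few bits, which is why the mode pays off for most stereo material.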

OGG Vorbis

Ogg Vorbis is a relatively new universal audio compression format that was officially released in the summer of 2002. It belongs to the same type of format as MP3, AAC, VQF and WMA, that is, lossy compression formats. The psychoacoustic model used in Ogg Vorbis is similar in principle to that of MP3 and the like, but the mathematical processing and practical implementation of this model are fundamentally different, which allows the authors to declare the format completely independent of all its predecessors.
The main undeniable advantage of the Ogg Vorbis format is its total openness and freedom. In addition, it uses the newest and highest-quality psychoacoustic model, so the bit rate needed for a given quality is significantly lower than in other formats. As a result, the sound quality is better while the file size is smaller.
The format has many advantages. For example, Ogg Vorbis does not restrict the user to only two channels of audio (stereo: left and right). It supports up to 255 individual channels at a sample rate of up to 192 kHz and up to 32 bits (which no other lossy compression format does), making Ogg Vorbis ideal for encoding 6-channel DVD-Audio. Additionally, the Ogg Vorbis format is sample-accurate. This ensures that the audio data before encoding and after decoding have no offsets and no extra or missing samples relative to each other. This is easy to appreciate when you are encoding gapless music (where one track flows into the next): in the end, the integrity of the sound is preserved.
Not every format can be streamed, but this one was designed for streaming from the ground up. This gives the format a rather useful side effect: multiple songs can be stored in one file, each with its own tags. When such a file is loaded into a player, all the songs should be displayed as if they had been loaded from several different files.
We should also mention the fairly flexible tagging system. The tag header can easily be extended to include text of any length and complexity (e.g. song lyrics) interspersed with images (e.g. an album cover photo). Text tags are stored in UTF-8, which allows you to write in all languages at the same time and eliminates potential problems with encodings. This is much more convenient than various tricks like ID3 tags.
Ogg Vorbis uses a variable bit rate by default, and the bit rate is not limited to fixed values: it can vary by as little as 1 kbps. It should be noted that the format does not strictly limit the maximum bit rate, and at the maximum encoding setting it can range from 400 kbps to 700 kbps. The sample rate has the same flexibility: users can choose any rate between 2000 Hz and 192000 Hz.
Ogg Vorbis was developed by the Xiphophorus community to replace all paid proprietary audio formats. Even though it is the youngest of all the MP3 competitors, Ogg Vorbis has full support on all known platforms (Windows, PocketPC, Symbian, DOS, Linux, MacOS, FreeBSD, BeOS, etc.), as well as a large number of hardware implementations. Its current popularity far exceeds that of all alternative solutions.
It is worth noting that Ogg Vorbis is only a small part of the Ogg Squish multimedia project, which also includes other free codecs: Speex for voice compression, FLAC for lossless audio compression, and Theora for video compression.

Musepack

MusePack (mpp, mp+, mpc, MPEG+) is an open file format for storing audio information that requires no licensing fees and is distributed under the GNU General Public License.
The quality of MPC encoding at high bit rates (160 kbps and above) is noticeably (if not significantly) higher than the quality provided by MP3.
Main advantages:
Because the format does not perform a second (DCT) transform, it is practically free of pre-echo artifacts, unlike formats like MP3, Vorbis, AAC, and WMA.
More efficient variable bit rate algorithms. If you track how the bit rate changes during playback of an MPC track, you will notice that for simpler sections the encoder assigns a lower bit rate, and for complex ones a much higher one, sometimes above 400 (!) kbps. An interesting fact is also worth mentioning: for silence, an MP3 encoder in VBR mode assigns a bit rate of 32 kbps (at a sampling rate of 44100 Hz), AAC and OGG Vorbis assign 2 kbps, while Musepack encodes silence at minimal cost, less than 1 kbps (for example, one minute of silence occupies about 514 bytes). All of this speaks to the extreme “frugality” of this encoder.
A powerful and flexible psychoacoustic model. Here we can mention, for example, a frame-by-frame dynamic low-pass filter (in other encoders, a fixed bandwidth is set for each quality preset).
More advanced compression based on optimized Huffman tables (LAME MP3, by comparison, wastes about 20% of the bit rate purely because of imperfect mathematical compression).

WMA

Windows Media Audio is a proprietary, licensed file format developed by Microsoft for storing and transmitting audio information.

WMA was initially marketed as an alternative to MP3, but Microsoft now positions it against AAC. Nominally, the WMA format is characterized by good compression, allowing it to “overtake” the MP3 format and compete on its parameters with the Ogg Vorbis and AAC formats. But as independent tests and subjective evaluation have shown, the quality of these formats is far from equivalent, and WMA’s advantage even over MP3 is not as unequivocal as Microsoft claims.

Format, encoder and parameter selection

Now straight to the heart of the matter.

To make your choice easier, I would like to share the experience I have gained from numerous comparisons and listening sessions, as well as from analyzing the results of public listening tests.

So, next I will talk about the most suitable encoders for each case, as well as the correct choice of parameters. For the conversion, I recommend using foobar2000 (the converter settings are described in detail here); the parameters below are given specifically for it. Additionally, foobar2000 has a host of DSPs that can be useful for audio pre-processing.

For those who are going to convert through the console or another program: the variable %s must be replaced with the name of the source file (or a similar variable) and %d with the name of the output file.
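For console use, that substitution can be automated. The helper below is a hypothetical sketch, not part of foobar2000 or of any encoder: it assumes the placeholders are written exactly as %s and %d, splits the parameter string into arguments, and fills in the file names.

```python
import re
import shlex


def build_encoder_args(template, source, destination):
    """Split a parameter string and replace %s / %d with real file names."""
    values = {"%s": source, "%d": destination}
    return [
        # Substitute after splitting, so paths with spaces stay one argument.
        re.sub(r"%[sd]", lambda m: values[m.group(0)], token)
        for token in shlex.split(template)
    ]


args = build_encoder_args("-silent -input %s -output %d", "my track.wav", "out.wma")
print(args)  # ['-silent', '-input', 'my track.wav', '-output', 'out.wma']
```

The resulting list can be appended to the encoder executable's name and passed to `subprocess.run`.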

Note that for each bit rate range the possible formats are listed in order of priority, highest first. If your player doesn’t support the first option, move on to the next one, and so on. As I already wrote, today only three codecs really deserve attention: AAC, OGG Vorbis and Musepack. WMA, because of its closed nature, does not stand out in quality, but in most cases it is still better than MP3. Since for some players WMA is the only supported alternative, I will give recommendations for all four formats.

About bit rates: keep in mind that the optimal encoding mode is so-called true VBR, i.e. a mode that targets quality rather than bit rate. Ideally, the result is a track with variable bit rate but constant quality (don’t equate the two: more complex parts of a track need more bits to maintain quality). The output bit rate is therefore hard to predict, and the bit rate values below are only approximate, averaged where possible over a large number of tracks of varying complexity.

The encoders mentioned in this article, along with some others, can be found here, with Russian descriptions of the main parameters and recommendations.

Ultra-low bit rates (~ 25-40 kbps)

This range is ideal for encoding audiobooks. And here there can be only one option: AAC, or rather Nero AAC. The parameters are as follows:

-lc -q 0.35 -ignorelength -if - -of %d

In this case, the material must first be converted to mono and resampled to 22050 Hz (preferably using the SoX resampler). The output is ordinary low-complexity AAC with a bit rate of about 25 kbps.
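Outside foobar2000, the mono / 22050 Hz pre-conversion can be done with the SoX command-line tool. The sketch below only builds the argument list; it assumes `sox` is installed and on the PATH, and the function and file names are illustrative.

```python
def sox_mono_22k_args(source, destination):
    """Arguments for SoX: downmix to one channel and resample to 22050 Hz."""
    return [
        "sox", source,
        "-r", "22050",   # output sample rate in Hz
        "-c", "1",       # output channel count (mono)
        destination,
    ]


print(sox_mono_22k_args("book.wav", "book_mono_22k.wav"))
# To actually run it (requires SoX to be installed):
# import subprocess; subprocess.run(sox_mono_22k_args("book.wav", "book_mono_22k.wav"), check=True)
```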

There are also options for music in this range:

1) Nero AAC. No pre-conversion is needed here:

-q 0.15 -ignorelength -if - -of %d

The output is High Efficiency AAC v2 (with parametric stereo and high-frequency synthesis), ~35 kbps. A great option for internet radio. Just don’t forget that the decoder in the player must be compatible with HE-AACv2, otherwise you will get mono sound with no high frequencies at all.

2) OGG Vorbis AoTuV. This modification of libvorbis includes improvements to the low-bit-rate encoding algorithm, and even without SBR technology it is not much inferior to HE-AACv2. Command line:

-s %r -Q -q-2 - -o %d

The resulting files should be fully compatible with standard OGG Vorbis decoders. The bit rate is similar, around 35 kbps.

3) WMA 10 Pro. For such cases Microsoft also has something like SBR (high-frequency synthesis), and it doesn’t sound as bad as it could. True, the bit rate is slightly outside the range: 48 kbps.

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 48_44_2_16 -input %s -output %d

Note that older decoders (especially hardware ones) do not support WMA 10. In this case, you can use WMA 9.2 (with the same encoder); however, its quality at low bit rates is much worse.

-silent -a_codec WMA9STD -a_mode 3 -a_setting 48_44_2 -input %s -output %d

Low bit rate, ~ 64 kbps

Initially, I thought about going straight to higher bit rates. But since hydrogenaudio.org recently ran an encoder comparison at this bit rate, it would be a sin to skip it.

1) QuickTime AAC is the winner of that test (apart from the newly created Opus / CELT). The settings for the QAAC encoder are as follows:

-s -v 64 --he -q 2 --ignorelength - -o %d

The output is HE-AAC (with SBR, but without parametric stereo), which should be compatible with various iPods and the like.

2) OGG Vorbis AoTuV. It ended up quite far behind QAAC, but still:

-s %r -Q -q0 - -o %d

3) And, just in case, WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 64_44_2_16 -input %s -output %d

For older decoders, WMA 9 Standard:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 64_44_2 -input %s -output %d

Slightly higher, ~ 80-100 kbps

I include this range because of Vorbis.

1) As tests have shown, the OGG Vorbis AoTuV encoder suits it best:

-s %r -Q -q1 - -o %d

2) Nero AAC: a very good result. Where the highs are not very pronounced, it can sound even better than Vorbis (in the highs it loses because they are synthesized).

…30 -ignorelength -if - -of %d

The profile used is HE-AAC.

De facto standard, 128 kbps

Interesting fact: many people argue that 128 kbps is the “threshold bit rate” for MP3, the point from which the quality becomes indistinguishable from the original. Maybe this is so … for plastic Chinese speakers playing blatnyak. In reality, this threshold is around 200 kbps, and newer formats provide more stable quality at that bit rate.

Modern encoders have managed to lower this threshold from 128 kbps to almost half of that (again, according to the developers). Nevertheless, if you have more or less decent speakers (or headphones), the difference can be heard in complex passages even at 128 kbps.