Compression and compression methods of audio signals


Compression and compression methods of audio signals (types, differences, use)

Basics of the analog-to-digital conversion principle, sound conversion and compression method, existing sound storage formats. Programs to convert and process sound and audio files. Application of these programs in linguistic research.

Bit rate is the amount of information per unit of time. In general, the bit rate is the number of bits that we spend encoding a sound with a duration of 1 second.

Analog-to-digital converter (ADC): a device that converts an input analog signal into a binary code (digital signal). The reverse conversion is done by a digital-to-analog converter (DAC). Typically, an ADC is an electronic device that converts a voltage into a binary digital code. However, some non-electronic devices with digital output should also be classified as ADCs, such as some types of angle-to-code converters. The simplest one-bit binary ADC is a comparator.

The circuit to convert an audio signal from analog to digital:

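Sampling is the transformation of a continuous signal (sound or image) into a sequence of values measured at discrete moments in time.

Quantization is the process of rounding each sampled value to the nearest level of a finite set, so that it can be written as a binary code with a fixed number of bits.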
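
As a rough illustration of these two steps in the analog-to-digital sense, the sketch below samples a sine wave at a fixed rate and rounds each sample to a signed integer code. The function names, the 1 kHz tone and the 8 kHz / 8-bit settings are assumptions chosen for the example, not values from the text.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sampling: read the continuous signal sin(2*pi*f*t) at t = n / sample_rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize(samples, bits):
    """Quantization: map each amplitude in [-1, 1] to the nearest signed integer code."""
    max_code = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    return [round(s * max_code) for s in samples]

samples = sample_sine(freq_hz=1000, sample_rate_hz=8000, n_samples=8)
codes = quantize(samples, bits=8)
```

The difference between `codes[n] / 127` and `samples[n]` is the quantization error; it never exceeds half a quantization step.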

Compression of audio data is the process of lowering the bit rate by reducing the statistical and psychoacoustic redundancy of a digital audio signal.

The underlying idea behind all lossy audio compression techniques is to neglect the subtle details of the original sound that are beyond the reach of the human ear.

Codec (CoDec) is an abbreviation for compressor and decompressor (or coder and decoder). Basically, a codec is a collection of files, drivers, and libraries required to pack a video or audio file into a compressed format and to play the compressed file back.

Formats:

AAC (Advanced Audio Coding) is an audio format that loses less quality in encoding than MP3 at the same file size. It was originally created as a successor to MP3 with improved encoding quality. The AAC format, officially known as ISO/IEC 13818-7, was released in 1997 as the new seventh part of the MPEG-2 family. AAC is also defined within MPEG-4 (MPEG-4 Part 3, ISO/IEC 14496-3). AAC itself is a lossy format; lossless compression in Apple's ecosystem is provided by a separate codec, ALAC (Apple Lossless Audio Codec), which is not an AAC profile.

Apple AIFF: This file type is standard for Apple Macintosh systems and for sound processing systems built on them. AIFF stands for Audio Interchange File Format, and it is somewhat similar to WAV. Its peculiarity is that it allows additional information to be stored along with the waveform, in particular WaveTable samples (examples of an instrument's sound together with synthesizer parameters), which improves the quality of the final result. Today, however, Apple computers can play files in almost any format, including MP3.

FLAC (Free Lossless Audio Codec) is a popular free codec for audio compression. Unlike lossy Ogg Vorbis, MP3 and AAC codecs, it does not remove any information from the audio stream and is suitable for both daily listening and archiving of audio collection. Today, the FLAC format is compatible with many audio applications.


Digital audio compression methods

Lossless compression

Generally speaking, the meaning of lossless compression is as follows: some pattern is found in the original data, and taking this pattern into account, a second stream is generated, uniquely describing the original. For example, to encode binary sequences in which there are many zeros and few ones, we can use the following replacement:

00 -> 0
01 -> 10
10 -> 110
11 -> 111

In this case, sixteen bits:
00 01 00 00 11 10 00 00

will be converted to thirteen bits:
0 10 0 0 111 110 0 0

If we write the compressed string without spaces, the spaces can still be reinserted unambiguously, because no code word is the beginning of another code word. This means the original sequence can be restored.
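
The replacement table can be tried out with a few lines of Python. This is a minimal sketch; the names `ENCODE`, `DECODE`, `encode` and `decode` are made up for the illustration.

```python
# Substitution table from the text: pairs of input bits -> variable-length code word.
ENCODE = {"00": "0", "01": "10", "10": "110", "11": "111"}
DECODE = {code: pair for pair, code in ENCODE.items()}

def encode(bits):
    """Replace each 2-bit pair with its code word (input length must be even)."""
    if len(bits) % 2:
        raise ValueError("input must contain an even number of bits")
    return "".join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(stream):
    """Collect bits until they match a code word; no code word is a prefix of another."""
    out, current = [], ""
    for bit in stream:
        current += bit
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    if current:
        raise ValueError("stream ends in the middle of a code word")
    return "".join(out)

original = "0001000011100000"      # the sixteen bits from the example
packed = encode(original)          # thirteen bits, written without spaces
restored = decode(packed)
```

Here `packed` has 13 characters and `restored` equals `original`. The scheme only pays off when the pair 00 dominates; an input made mostly of ones would grow instead of shrinking.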

FLAC (Free Lossless Audio Codec)
Coding principle: the algorithm tries to describe the signal with a function (a model) chosen so that the result of subtracting it from the original (called the difference, residual, or error) can be encoded with the minimum number of bits.

When the model is fitted, the algorithm subtracts the approximation from the original to obtain a residual signal (error), which is then losslessly encoded.
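
The idea of predicting the signal and storing only the residual can be sketched as follows. The example assumes a fixed second-order predictor (each sample is extrapolated linearly from the previous two); a real FLAC encoder chooses among several predictors and then entropy-codes the residual, which is not shown here. The sample values are made up.

```python
def residual(samples):
    """Residual of a fixed second-order predictor: e[n] = x[n] - (2*x[n-1] - x[n-2]).
    The first two samples are stored unchanged as warm-up values."""
    res = list(samples[:2])
    for n in range(2, len(samples)):
        prediction = 2 * samples[n - 1] - samples[n - 2]
        res.append(samples[n] - prediction)
    return res

def restore(res):
    """Invert residual(): rebuild every sample from the warm-up values and the errors."""
    samples = list(res[:2])
    for n in range(2, len(res)):
        prediction = 2 * samples[n - 1] - samples[n - 2]
        samples.append(res[n] + prediction)
    return samples

signal = [1000, 1010, 1021, 1033, 1044, 1054, 1063, 1071]   # made-up smooth samples
errors = residual(signal)
```

After the two warm-up values, every entry of `errors` is +1 or -1, so it needs far fewer bits than the original four-digit samples, and `restore(errors)` returns the signal exactly.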

Lossy compression (MP3, AAC, WMA, OGG)
With a lossy compression algorithm, an MP3 file with an average bit rate of 128 kbps is approximately 1/11 the size of the original Audio CD file (uncompressed audio in CD-Audio format has a bit rate of 1411.2 kbps). MP3 files can be created at high or low bit rates, which affects the quality of the result.
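
The figures above follow from simple arithmetic. The sketch below assumes the standard CD-Audio parameters (44.1 kHz, 16 bits per sample, two channels); the function name is made up for the illustration.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd_bps = pcm_bit_rate(44_100, 16, 2)     # CD-Audio: 44.1 kHz, 16 bit, stereo
ratio = cd_bps / 128_000                 # compared with a 128 kbps MP3
print(cd_bps / 1000, "kbps, ratio", round(ratio, 2))
```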

The principle of compression is to reduce the precision of some parts of the audio stream in a way that is almost imperceptible to most listeners. The audio signal is divided into segments of equal length, each of which, after processing, is packed into its own frame. Spectral decomposition requires continuity of the input signal; therefore, the previous and next frames are also used in the calculations. The audio signal contains harmonics of low amplitude and harmonics that lie close to stronger ones; such harmonics are cut off, because the average human ear cannot always detect their presence or absence. This characteristic of hearing is called the masking effect. It is also possible to replace two or more nearby peaks with a single averaged one (which, as a rule, distorts the sound). The cutoff criterion is determined by the requirement on the output stream (its bit rate). Since the entire spectrum is relevant, the high-frequency harmonics are not cut off wholesale but are removed selectively, thinning out the spectrum to reduce the data rate. After this spectral pruning, mathematical compression methods are applied and the result is packed into frames.

Masking effect
In certain cases, a sound can be hidden by another sound. For example, talking near the railroad tracks can be completely impossible if a train passes. This type of effect is called masking. A weak sound is said to be masked if it becomes indistinguishable in the presence of a louder sound.

Simultaneous masking
Any two sounds heard simultaneously affect the perception of their relative loudness. A louder sound reduces the perceived loudness of a weaker one, up to the point where the weaker sound becomes inaudible. The closer the frequency of the masked sound is to the frequency of the masker, the more strongly it is hidden. The masking effect is not symmetric: it differs depending on whether the masked sound lies below or above the masker in frequency. A low-frequency sound masks a high-frequency sound effectively, whereas a high-frequency sound is much less effective at masking a low-frequency one.
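
The asymmetry can be pictured with a deliberately crude toy model. Everything numeric below (the 10 dB offset and the two slopes in dB per octave) is an illustrative assumption, not measured psychoacoustic data, and real codecs use far more detailed models.

```python
import math

def is_masked(masker_hz, masker_db, probe_hz, probe_db,
              slope_up_db_per_octave=10.0,     # assumed: gentle fall-off above the masker
              slope_down_db_per_octave=30.0,   # assumed: steep fall-off below the masker
              offset_db=10.0):                 # assumed: threshold sits below masker level
    """Toy check of whether a probe tone lies under the threshold of a louder masker."""
    octaves = math.log2(probe_hz / masker_hz)
    if octaves >= 0:      # probe is higher in frequency than the masker
        threshold = masker_db - offset_db - slope_up_db_per_octave * octaves
    else:                 # probe is lower in frequency than the masker
        threshold = masker_db - offset_db - slope_down_db_per_octave * (-octaves)
    return probe_db < threshold

above = is_masked(1000, 80, 2000, 45)   # probe one octave above the masker
below = is_masked(1000, 80, 500, 45)    # same level, one octave below the masker
```

With these assumed numbers the 45 dB probe is hidden when it sits an octave above the 80 dB masker (`above` is `True`) but remains audible an octave below it (`below` is `False`).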

Time masking
This phenomenon is similar to frequency masking, but here the masking occurs in time. When the masking sound stops, the masked sound remains inaudible for some time. The masking time depends on the frequency and amplitude of the signal and can reach 100 ms, although under normal conditions the effect is considerably shorter.
When the masked tone occurs after the masker, the effect is called post-masking. When the masked tone occurs before the masker (this is also possible), the effect is called pre-masking.

Post-stimulus fatigue
Often after exposure to loud, high-intensity sounds, a person’s hearing sensitivity drops dramatically. Recovery to normal thresholds can take up to 16 hours. This process is called “temporary change in hearing sensitivity threshold” or “post-stimulus fatigue.”

Digital audio compression

The concept of loudness is close and understandable not only for a musician, but also for people who are not associated with music. The relationship between the volume of the parts of a piece and the volume of the instruments that are playing simultaneously is called the dynamic range. One of the main tools producers and musicians use to influence dynamic range is the compressor.

Although the compressor works with a familiar phenomenon, loudness, in most cases it is used intuitively, at random, without an understanding of what is actually happening. You can know the general principle of the compressor and the purpose of each knob, but that does not prevent confusion the first time you use one.

Why do you need a compressor?

The main purpose of the compressor is to automatically change the signal level. It works roughly the same as if you kept your hand constantly on the volume fader, turning it up and down. The difference is that a compressor can react very quickly to changes, much faster and more accurately than a human.

Up to this point, the word “compressor” has stood for a whole class of dynamics processors. Built on the same basic principles as a conventional compressor, various devices serve different purposes: limiters, expanders, gates, etc. What unites them is that they work with the volume of individual sounds or of the mix as a whole.

The classic compressor is contradictory in its very name. Everyone knows that it makes the sound louder. But the name comes from “compress”, and if you ask any sound engineer what a compressor does, you will hear the answer: “it squashes the signal.” The compressor reduces the amplitude of dynamic peaks, making them quieter. So what is the main purpose of the compressor: to make the sound quieter or louder? The answer is both at the same time.

Let’s take voice recording as an example. Very often, syllables or sounds of different volume are heard in the course of singing. If the singer does not control the dynamics of the performance very well, such differences create problems for the sound engineer and harm the final result. Quiet syllables disappear in the mix and the lyrics become hard to make out, while if you set the volume for a quiet passage, the voice begins to “stick out” in other places.

This is where the compressor comes in. It lets you suppress strong peaks and bring them level with the quiet fragments. Now you can turn up the volume of the track without fear of some syllables sticking out. So the compressor makes the sound quieter and louder at the same time. Three images show the stages of working with sound: a source with large peaks (a), a compressed signal (b) and an increase in the volume level of the entire file (c).
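
The three stages can be imitated with a very simple static compressor. This is only a sketch: the threshold, ratio and make-up gain values are assumptions picked for the example, and a real compressor also smooths its gain over time (attack and release), which is omitted here.

```python
import math

def compress(samples, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static compressor for samples in [-1, 1]: levels above the threshold
    are scaled down by `ratio`, then an overall make-up gain is applied."""
    out = []
    for s in samples:
        if s == 0.0:
            out.append(0.0)
            continue
        level_db = 20 * math.log10(abs(s))
        if level_db > threshold_db:
            # Only the part of the level above the threshold is reduced.
            level_db = threshold_db + (level_db - threshold_db) / ratio
        out_db = level_db + makeup_db
        out.append(math.copysign(10 ** (out_db / 20), s))
    return out

source = [0.05, 1.0, -0.1, 0.8]                 # (a) quiet material with large peaks
squashed = compress(source)                     # (b) peaks reduced, quiet parts untouched
louder = compress(source, makeup_db=15.0)       # (c) whole signal raised afterwards
```

In stage (b) the full-scale peak drops from 0 dB to -15 dB while the quiet 0.05 sample is unchanged; in stage (c) the peak is back at full scale and the quiet sample is more than five times louder than in the source.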

It is especially important to apply compression when recording in a digital environment, where we are forced to stay below a maximum level of 0 dB, because exceeding this threshold leads to clipping and distortion. When clipping appears, we lower the preamp level, which means we lower the volume not only of the peaks but of the quiet passages as well, and this degrades the signal through quantization noise.

The compressor, positioned between the preamp and the digital recording system, operates only on the loudest bursts, reducing their volume and ensuring a smooth soundtrack. Thanks to this, we have the opportunity not to reduce the overall volume of the recorded signal and to maintain the sound quality.

Unfortunately, many modern musicians, without going into the technical characteristics of the compressor, use it everywhere, believing that it can “pull up” any sound in the mix. In fact, a compressor should be inserted into the signal path only in extreme cases; experienced sound engineers use one only when there is a real need.

The compressor helps avoid recording problems. The most common causes of problems can be the following:

Lack of professionalism on the part of the performer (uneven dynamics).
Mismatched signal path (bad, mismatched, or inadequate microphones and preamps).
Limitations of the digital environment (the 0 dB ceiling).
Uncomfortable conditions for the singer (small and stuffy room, poor monitoring).
Low qualification of the recording engineer.

If a performer has a good voice and knows how to sing into a microphone, and the recording engineer knows the job well and knows how to position microphones and set up equipment properly, a compressor may not be required at all. But this is the ideal situation.

Digital audio compression

Audio data compression is a pressing problem today. There are two reasons for compressing audio data: saving memory when storing audio information, and the low bandwidth of remote digital transmission channels. Compression effectively addresses both problems. Data compression is an algorithmic transformation of data performed to reduce its volume.

It is used to make more rational use of data storage and transmission devices. Compression is based on eliminating the redundancy contained in the original data. To guarantee the parameters necessary for transmitting voice signals (or music) over modern low-speed digital communication channels, and to guarantee the specified noise immunity, highly efficient data compression algorithms must be used. The transmission channel is characterized by the capacity of the channel, and the signal by the signal volume: …

Both of the above characteristics include the dynamic range D, the bandwidth of the channel (or the spectrum width of the signal), and the transmission time T. Digital audio compressors are used to reduce the dynamic range. To improve spectral efficiency, digital filters are used to limit the spectrum of the encoder output signal (according to the Nyquist criterion). In addition, encoders based on the principle of eliminating redundancy (Huffman codes) are used to guarantee a certain information transmission rate. Their essence is as follows: more probable amplitude values are assigned shorter code words than less probable ones.
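
A Huffman code of this kind can be built in a few lines. This is a minimal sketch using only the standard library; the list of quantized amplitude values is made up for the illustration.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code: frequent symbols receive shorter code words."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code built so far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)      # the two least probable subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

amplitudes = [0] * 8 + [1] * 4 + [-1] * 2 + [2] + [-2]   # made-up quantized values
code = huffman_code(amplitudes)
total_bits = sum(len(code[a]) for a in amplitudes)
```

For this input the most frequent value 0 gets a one-bit code word and the rare values ±2 get four bits each, so the 16 values take 30 bits instead of the 48 bits a fixed 3-bit code would need.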

Let’s consider how the types of redundancy described above are eliminated.
Structure of a lossy audio compression encoder
The original digital audio signal is divided into frequency subbands and segmented in time in the time-frequency segmentation block. The length of the encoded block depends on the shape of the audio signal's time function. In the absence of sharp amplitude peaks, a long block is used, which provides high frequency resolution. When the signal amplitude changes abruptly, the length of the encoded block decreases sharply, giving higher time resolution. The decision to change the block length is made by the psychoacoustic analysis unit, which calculates the psychoacoustic entropy of the signal.
After segmentation, the frequency subband signals are normalized, quantized, and encoded. In the most efficient compression algorithms, it is not the samples of the audio signal that are encoded but the corresponding MDCT coefficients (the differences between the coefficients are smaller). The patterns of auditory perception are taken into account in the psychoacoustic analysis unit. Here, according to a special procedure, the maximum allowable level of quantization distortion (noise) is calculated for each frequency subband, that is, the level at which the distortion is still masked by the useful signal of that subband.

The dynamic bit allocation block follows the requirements of the psychoacoustic model: for each coding subband it selects the smallest possible number of bits at which the distortion caused by quantization does not exceed the audibility threshold calculated by the psychoacoustic model.
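
The allocation rule can be sketched with the common rule of thumb that each extra bit lowers quantization noise by about 6 dB. The subband levels and masking thresholds below are hypothetical numbers, and real encoders iterate against a bit budget rather than applying a single formula.

```python
import math

DB_PER_BIT = 6.02   # rule of thumb: one more bit lowers quantization noise by ~6 dB

def allocate_bits(signal_db, mask_db):
    """Per subband: smallest bit count that keeps quantization noise under the mask."""
    bits = []
    for sig, mask in zip(signal_db, mask_db):
        needed_snr_db = sig - mask               # signal-to-mask ratio
        bits.append(max(0, math.ceil(needed_snr_db / DB_PER_BIT)))
    return bits

signal_db = [60.0, 45.0, 30.0, 20.0]   # hypothetical subband signal levels
mask_db = [36.0, 33.0, 31.0, 8.0]      # hypothetical masking thresholds
bits = allocate_bits(signal_db, mask_db)
```

The third subband lies entirely below its masking threshold, so it receives no bits at all; the first needs four bits to keep the noise 24 dB below the signal.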

This article will consider the functional diagrams of audio data compression algorithms based on the µ-law and the A-law. The functional diagram of the compression algorithm based on the A-law is shown in Fig. 2.

Figure 2. Functional diagram of the compression algorithm based on the A-law

A signal (a discrete sine wave) is applied to the input of the compressor. After compression, the signal passes to an adder, whose second input is fed with noise, thus simulating the additive noise of the transmission channel.

Then the noisy signal enters the input of the expander, and at its output we get the reconstructed signal. The reconstructed and original signals are then fed to an adder, after which the noise power spectrum is observed.

Simulation results (A = 87.6)
The following graphs are presented: 1-original signal, 2-signal passed through the compressor, 3-recovered signal, 4-noise power at the output of the noise generator, 5-noise power after the expander.
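
For reference, the compressor and expander stages can be sketched with the standard continuous A-law formulas, using the same A = 87.6. The function names are made up, and the channel noise and the spectral analysis from the simulation are not reproduced here.

```python
import math

A = 87.6                       # parameter value used in the simulation
K = 1.0 + math.log(A)          # common denominator 1 + ln(A)

def a_law_compress(x):
    """A-law compressor for x in [-1, 1]: linear near zero, logarithmic above 1/A."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / K
    else:
        y = (1.0 + math.log(A * ax)) / K
    return math.copysign(y, x)

def a_law_expand(y):
    """Expander: the exact inverse of a_law_compress."""
    ay = abs(y)
    if ay < 1.0 / K:
        x = ay * K / A
    else:
        x = math.exp(ay * K - 1.0) / A
    return math.copysign(x, y)

quiet = a_law_compress(0.01)               # small amplitudes are strongly boosted
round_trip = a_law_expand(a_law_compress(0.2))
```

Without channel noise the expander restores the input exactly (up to floating-point error). Noise added between the two stages is attenuated by the expander for quiet signals, which is the point of companding.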

Understand what audio compression is

A container format is a data format that “encapsulates” other encoded data. It often contains “meta information” about the encoded data, or has a way of storing several separate streams of encoded data, or something like that.

The encoding produced by the codec is the real essence of the data stream.

The most common example I can think of is the Ogg / Vorbis format. Ogg is the container format and Vorbis is the encoding. So you have an Ogg file and inside there are these little segments that contain encoded data. Each block contains either a stream of Vorbis-encoded data or something else. For example, a block might have the name of an artist and the title of a song stamped on it.

So, back to technology:

If you already have lossy music like mp3 or ogg / vorbis, converting it to a lossless format will only take up (a lot of) disk space and will NOT improve the audio quality at all. You can’t create fidelity when it’s already lost. Unless you’re writing a Visual Basic GUI on some popular TV show called CSI, but that’s fantasy, not reality.

If you have music in other lossless formats and want to convert it to FLAC, you can.

Be careful when using the term “WAV”. WAV doesn’t have to be lossless; in fact, WAV is just a container for various possible formats. In this sense, it is similar to AVI. You can have lossless WAV if it is just raw PCM data, but you can also embed MPEG-1 Layer III (lossy) data in a WAV file.
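
To see the container idea in practice, here is a small sketch with Python's standard `wave` module, which handles only the uncompressed PCM case. The 8 kHz mono settings and the silent payload are arbitrary choices for the example.

```python
import io
import wave

# Write one second of 16-bit mono silence at 8 kHz into an in-memory WAV container.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(1)                   # mono
    wav_out.setsampwidth(2)                   # 2 bytes per sample = 16-bit PCM
    wav_out.setframerate(8000)                # samples per second
    wav_out.writeframes(b"\x00\x00" * 8000)   # raw PCM payload

# Read the header back: the container describes the payload it wraps.
buffer.seek(0)
with wave.open(buffer, "rb") as wav_in:
    params = wav_in.getparams()
```

`params` reports the channel count, sample width, frame rate and frame count, plus a compression type of `NONE`, which is how a reader knows the payload is raw PCM rather than some other encoding.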

It is possible to lose data when converting from one lossless format to another if you reduce the precision of the data. For example, if you convert an unsigned 16-bit PCM data stream at 48000 Hz to 8-bit PCM data at 44100 Hz, you lose precision in two ways: the samples are merged from 48000 down to just 44100 per second (leading to data loss), and the data has to be squeezed to fit into just 8 bits instead of 16 per sample, which will drastically degrade the quality.

Every digital audio stream, even one encoded with a compressing codec (lossy or lossless), has the following sample format properties, which are important elements that describe the stream:

Sample bit width and depth, i.e. 8-bit, 16-bit, etc. Bit width and bit depth are slightly different things, and there is also big-endian versus little-endian byte order (which does not affect quality) and signed versus unsigned samples (which does not affect quality either, but does affect how the encoder or decoder handles the data). The key point to remember is “the more bits, the better”. So 32-bit is better than 16-bit, etc.

Frequency, also known as the sample rate. The more the better, because more “samples” of sound are played per second. Imagine sliding your finger across a deck of cards and seeing the cards blur; this is essentially how digital sound works. Each sample is a card, and if you have more cards flying by per second, the sound is smoother. For example, you would really notice if you were flipping only 5 cards per second, but everything would blur together if you were flipping thousands of cards per second. So more is better, because it’s more natural and closer to reality, which is analog and infinitely divisible (well, down to Planck units, but this is debatable and off-topic).

Lossless simply means that if you use the same or better sample format in the output that you used in the input, you won’t lose any data.

Therefore, if you change from 16-bit to 32-bit sample format, you will not lose data. But if you go from 32 bit to 16 bit, you will lose data.
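
A short sketch makes the point concrete. It assumes signed integer samples and the simplest possible conversion (dropping or appending low-order bits); the function names are made up, and real converters usually add dither when reducing bit depth.

```python
def narrow_16_to_8(sample):
    """Keep only the top 8 bits of a signed 16-bit sample; the low 8 bits are discarded."""
    return sample >> 8

def widen_8_to_16(sample):
    """Scale an 8-bit sample back up to the 16-bit range."""
    return sample << 8

def widen_16_to_32(sample):
    """Scale a 16-bit sample up to the 32-bit range; no information is dropped."""
    return sample << 16

def narrow_32_to_16(sample):
    """Undo widen_16_to_32 by removing the 16 appended low-order bits."""
    return sample >> 16

original = [12345, -20000, 257, -1]
via_8_bit = [widen_8_to_16(narrow_16_to_8(s)) for s in original]
via_32_bit = [narrow_32_to_16(widen_16_to_32(s)) for s in original]
```

Going through 32 bits returns the original samples unchanged, while going through 8 bits does not: the low-order detail is gone for good.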

So the answer to your question about whether it makes sense to use FLAC depends on the original data: if you have 64-bit WAV files that were originally recorded in that sample format at 192,000 Hz (or 192 kHz), and you convert them to “standard” 16-bit 44.1 kHz FLAC, you’ll lose a ton of data. But if your WAV file is 8-bit with 22100 samples per second and you convert it to 16-bit FLAC with 44100 samples per second, you won’t lose data, and the file size may even increase, depending on whether you gain more from lossless compression than you lose to the larger sample format.

The sample format will affect the amount of space the file takes up, so more bits per sample and a higher sample rate will take up more space.

When it comes to practical considerations and human hearing, you won’t notice a difference if you convert very high-quality originals to 16-bit FLAC at 44.1 kHz. But you won’t notice any improvement if you convert MP3 to FLAC either. As such, you need to evaluate what format your source data is in before deciding what to do.

Improved efficiency of digital audio data compression algorithms.

The relevance of the work. Methods for encoding high-quality audio signals have become very widespread over the last decade in broadcasting, digital sound recording, and home audio and video equipment. There is even a fast-growing new class of consumer electronics: portable MP3 players.

Digital television and radio transmission networks are being developed, providing consumers with high-quality images and sound over a wide coverage area. The popularity of radio and television broadcasts over the Internet and mobile phone networks is increasing. All these technological innovations have become economically viable, and in some cases even technically possible, thanks to the use of highly efficient digital video and audio data compression algorithms, such as MPEG-1 ISO/IEC 11172, MPEG-2 ISO/IEC 13818, MPEG-4 ISO/IEC FCD 14496, and ATSC Dolby AC-3. At the same time, the economic advantages of these algorithms, which make it possible to reduce the bandwidth requirements of transmission channels or the capacity of storage media by an order of magnitude, have to be paid for with a certain decrease in sound quality. During the era dominated by digital audio CDs, consumers came to expect high sound quality from any sound reproduction equipment. The efforts of the developers of audio coding algorithms have always been aimed at ensuring that the quality of decoded audio material is no worse than that of a CD. Sound quality is often the determining factor in the economic success of digital broadcasting services or digital sound distribution services (like iTunes).

It is obvious that the problem of improving the quality of audio coding is today one of the key problems for the sound recording industry, the audio broadcasting industry and the manufacturers of various multimedia systems.

The basic principle of operation of highly efficient audio coding systems is to exploit the properties of the human auditory system, mainly the phenomenon of masking. Psychoacoustic masking arises from the biophysical and neural processing of sound signals by the human auditory system [173]. As a result, part of the sound information does not affect the perception of the signal because components of greater intensity are present in it. The strongest components of the audio signal thus form the so-called masking thresholds. Sound information whose energy level lies below the masking threshold is not perceived by the auditory system. In the traditional digital representation of audio signals using pulse code modulation (PCM), time samples of the original signal are represented with a specific number of bits per code word. The finite precision with which the instantaneous values of the continuous analog signal are represented introduces an error into the signal, the so-called quantization noise. The idea of encoding audio signals with the elimination of psychoacoustic redundancy is to combine psychoacoustic analysis with the quantization mechanism [112]. In this case, the signal to be encoded is converted into a time-frequency representation that is as close as possible to the time-frequency resolution of the human auditory system. Psychoacoustic analysis determines the masking thresholds at each point of the time-frequency representation, and the quantizer re-quantizes the signal with the minimum possible number of bits per sample at which the increased quantization noise still stays below the masking thresholds. Thus, a compact representation of audio signals can be achieved without subjective degradation of sound quality. The efficiency and quality of such systems clearly depend mainly on the precision of the psychoacoustic analysis.

LEARN HOW AUDIO DATA COMPRESSION WORKS

MP3s Around Us
Many, many years ago, the Internet was supposed to be the force that would democratize the music industry: physical distribution was supposed to become obsolete, and anyone could publish music on the Internet and be heard by an audience of millions.

In fact, enthusiasts and companies have created websites where fans can listen to new tunes, the MP3 format has made it easy to send songs to critics, and demo tracks now help to sell a physical CD or LP. It is not difficult to put your music on the Internet, but unless you are a star of the first magnitude, you will have to accept publishing it in a compressed format to save space on the server and to save download time for those who download your masterpiece. While MP3 has many critics, there are ways around some of the limitations of this format.

The MP3 format is based on data compression algorithms that reduce the amount of data required to play music. The compression algorithms in MP3 are lossy; they do not work like the Zip or Rar algorithms, which restore the original file without data loss. MP3 algorithms discard “unnecessary” data. For example, if a track contains a lot of high-level sound, the algorithm may assume that you cannot hear the low-level material and decide that only 24 dB of dynamic range is sufficient for that part of the audio. That requires only 4 bits of data, a quarter of the data needed for 16-bit resolution. Unfortunately, it is difficult, though possible, to preserve the sound quality of music when it is compressed. One way is to use lossless algorithms such as FLAC, or the lossless algorithms offered by Microsoft and Apple for their audio formats. However, these algorithms do not reduce file size nearly as much; with complex music, the reduction can be as little as 10-20%.
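
The link between 24 dB and 4 bits comes from the fact that each bit of linear PCM adds about 6 dB of dynamic range. A small sketch of the arithmetic, with made-up function names:

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of linear PCM with the given word length."""
    return 20 * math.log10(2 ** bits)

def bits_for_range(range_db):
    """Smallest whole number of bits that covers the requested dynamic range."""
    return math.ceil(range_db / dynamic_range_db(1))

range_16_bit = dynamic_range_db(16)    # a little over 96 dB
bits_needed = bits_for_range(24)       # bits needed for 24 dB
fraction = bits_needed / 16            # share of a 16-bit word
```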

Although there are many algorithms for compressing audio data, only a few are the most common:

MP3. This format supports many encoding levels, so you can create audio files of almost any size; smaller files come with a greater loss of precision. There are many free and shareware MP3 players (such as iTunes and Windows Media Player). To encode MP3, you can use iTunes or most digital audio editors.

AAC. As the native iPod format, AAC is quite popular and, according to most users, sounds better than MP3 at the same file size. iTunes can convert files to AAC.

Windows Media Audio. The format is promoted by Microsoft, but is used less frequently than MP3 or AAC. WMA sound quality is generally better than MP3. While Microsoft does not offer users WMA playback software for the Mac platform, the Flip4Mac utility (free version available) can play Windows Media formats on Mac.

Ogg Vorbis. A great but rarely used format that sounds better than MP3 at the same bit rate; unlike MP3, its encoding tools are free for developers. Ogg Vorbis files are not widely used yet, but they are popular with technically advanced users.

FLAC. This popular lossless format is not supported by many portable music players, but musicians often use FLAC to exchange files when working on collaborative projects. High sound quality is maintained.

Although MP3 does not offer the best quality, it is the format most often used when placing audio files on the network, because all players can play MP3. It is important to choose the correct MP3 settings. When encoding files to MP3, it is always best to start from a high-quality, uncompressed source file. Then select the compression settings. When saving in MP3 format, you can generally choose from a range of bit rates (bits per second), from 320 kbps stereo (great quality, but also a fairly large file) to 8 kbps mono (good enough for dictation). In addition to the fixed settings, there is variable bit rate (VBR) encoding, which adapts the bit stream to the material being encoded. VBR encoding is not supported by all players.
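To get a feel for what these bit rates mean for file size, here is a small sketch. It assumes a constant bit rate, counts 1 kbps as 1,000 bits per second, and ignores tags and container overhead; the helper name `mp3_size_megabytes` and the four-minute track are chosen only for illustration:

```python
def mp3_size_megabytes(bitrate_kbps: float, seconds: float) -> float:
    """Approximate size of a constant-bit-rate file in megabytes (10**6 bytes)."""
    total_bits = bitrate_kbps * 1000 * seconds
    return total_bits / 8 / 1_000_000

# A four-minute (240 s) track at the bit rates mentioned above.
for kbps in (320, 128, 8):
    print(kbps, "kbps ->", round(mp3_size_megabytes(kbps, 240), 2), "MB")
# 320 kbps -> 9.6 MB
# 128 kbps -> 3.84 MB
# 8 kbps -> 0.24 MB
```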

Audio compression: facts, myths, and a blind test

Compression, for example with MP3, involves a loss. But can you hear it? Where does good hearing end and where does esotericism begin? We verify the theory with a blind test, which you can do yourself.
Audio compression is a constant part of everyday life: almost always, when you listen to music, it has been compressed. However, audio signal processing is difficult to understand for people who do not work in this field and lack adequate basic training. Consequently, my impression is that most people either do not care at all or demonize MP3 and everything that has to do with compression.

The question is: are we depriving ourselves of real enjoyment if we only listen to music on Spotify or YouTube? Or is there no noticeable difference from the best possible quality?

Numbers and what they say

Various measured parameters say something about sound quality, but what exactly? The following overview of these factors is as brief and clear as possible.

1. Bit rate

Bit rate tells you how many bits are processed per second. It is also called data transfer speed or bandwidth.

It makes intuitive sense: the more data that flows, the higher the sound quality. Bit rate is the most important measured variable in everyday life. However, the bitrate alone doesn’t say much about sound quality.

There are variable and constant bit rates. Today, variable bit rates (abbreviated VBR) are mainly used. In passages where little happens, the data can be compressed more heavily without audible loss, whereas complex passages are stored with a relatively large amount of data. The result is higher sound quality at the same file size. For variable bit rates, the average is given as the value, sometimes along with the maximum allowed.
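As a toy illustration of how a variable bit rate file ends up with one average figure, the sketch below assumes frames of equal duration. The per-frame bit rates and the function name `summarize_vbr` are invented for the example:

```python
def summarize_vbr(frame_bitrates_kbps):
    """Return (average, maximum) bit rate for a list of equally long frames."""
    average = sum(frame_bitrates_kbps) / len(frame_bitrates_kbps)
    return average, max(frame_bitrates_kbps)

# Hypothetical track: 30 quiet frames at 96 kbps, 10 complex frames at 256 kbps.
frames = [96] * 30 + [256] * 10
print(summarize_vbr(frames))  # (136.0, 256)
```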

2. Compression method

AAC compresses more efficiently than MP3, so it delivers better quality than MP3 at the same bit rate. The same goes for Ogg Vorbis, which is used on Spotify.

The compression software, the encoder, also has an impact on quality. In the early days of MP3, 128 kbit/s songs often sounded terrible. Now they sound much better because bad encoders are no longer used.

3. Bit depth

Bit depth tells you how many bits a sample has. Therefore, it is also called the sampling depth. The more bits per sample, the more different volume levels can be stored.

This may remind you of photos and videos – there are bit depths too and they mean something similar.

A CD has 16 bits per sample in each stereo channel. MP3 and other compressed audio files have no fixed bit depth. Bit depth hardly plays a role in everyday listening, only in studio recordings. There, 24-bit is sometimes used to get more out of the sound processing. In the end, however, the music is reduced to 16-bit because, according to acoustics experts, nobody can hear the difference.
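The link between bit depth and the number of storable volume levels is a power of two. A minimal sketch, with the helper name `volume_levels` chosen just for this example:

```python
def volume_levels(bit_depth: int) -> int:
    """Number of distinct values a sample with `bit_depth` bits can take."""
    return 2 ** bit_depth

for bits in (8, 16, 24):
    print(bits, "bits ->", volume_levels(bits), "levels")
# 8 bits -> 256 levels
# 16 bits -> 65536 levels
# 24 bits -> 16777216 levels
```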

4. Sampling frequency

The sampling frequency (also called the sample rate) is likewise irrelevant for normal music listeners. But it is important for understanding how digital sound storage works in the first place. A CD has a sampling frequency of 44,100 Hz, or 44.1 kHz. Hertz is a unit of measurement that means “times per second”. In audio sampling, it means that the sound level is measured 44,100 times per second. The same applies here as with bit depth: higher values make sense when recording in the studio, but not in the final format.
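Combining the CD figures for sampling frequency, bit depth and channel count gives the data rate of uncompressed CD audio. A short sketch of that multiplication, where the function name `pcm_bitrate_bps` is an assumption of this example:

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

cd = pcm_bitrate_bps(44_100, 16, 2)
print(cd)                       # 1411200 bits per second, about 1411 kbit/s
print(cd * 60 / 8 / 1_000_000)  # 10.584 megabytes per minute
```

Compared with the 128 to 320 kbit/s typical of MP3, this shows roughly how much data a lossy encoder has to remove.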

Nyquist’s theorem: Many people believe that digital music is fundamentally lossy compared to a “real” (analog) sound wave. These discussions began when the CD was introduced and was immediately dismissed by audio snobs as inferior to the vinyl record. But that view can be refuted. The Nyquist theorem states that a waveform can be reconstructed completely from individual sample points, without any loss, if the sample rate is high enough. It also says how high the rate must be: more than twice the bandwidth. Since human hearing reaches a maximum of about 20,000 Hz, that is roughly the bandwidth chosen. Hence the sample rate of just over 40,000 Hz.
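The rule can be tried out with a little arithmetic. The sketch below computes the sample rate the theorem asks for, and the frequency a pure tone appears to have after sampling when the rate is too low (aliasing). Both function names are made up for this illustration, and the tones are idealized sine waves:

```python
def minimum_sample_rate(bandwidth_hz: int) -> int:
    """Twice the bandwidth: the rate the sampling must exceed."""
    return 2 * bandwidth_hz

def alias_frequency(tone_hz: int, sample_rate_hz: int) -> int:
    """Apparent frequency of a pure tone after sampling at `sample_rate_hz`."""
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)

print(minimum_sample_rate(20_000))      # 40000
print(alias_frequency(20_000, 44_100))  # 20000, below half the rate, so preserved
print(alias_frequency(30_000, 44_100))  # 14100, above half the rate, so it folds down
```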

5. Other factors

With all the technical parameters, it should not be forgotten that the best values are useless if the sound was badly recorded in the first place. For example, if the sound engineer set the recording level too low, dynamic range is lost, and the recording becomes noisy when it is made louder afterwards. If the level is too high, the result is even worse: the recording clips, rattles and crackles. Or a dynamic range compressor distorts the result. Bad recordings are ubiquitous on YouTube and are also sold on CDs, for example very old studio recordings or live concert recordings.

The quality of your headphones or speakers also has an influence. With poor earphones, you will barely hear a difference between a 128 kbit/s MP3 and uncompressed music. With good speakers, you most likely will.

How is music encoded?

First of all, let’s understand why music should be compressed.

Uncompressed files like AIFF and WAV take up a lot of space. This makes it inconvenient to transfer them to phones or players, or even to store them on a computer's hard drive.

Even sending them online is difficult because of their large size.

This has led to the creation of various audio file formats that take up less space. Of course, the important thing is that they sound practically the same as the original despite being smaller.

This is where compression enters the picture.

General-purpose compression such as ZIP or RAR can be applied, but on its own it is not enough. So other techniques are used as well, namely:

– An uncompressed file contains a lot of information about sounds (even silence) that are inaudible to the human ear, and that information is discarded. This alone saves a lot of space, since there is little point in storing information about sounds that our hearing cannot perceive.

– On the other hand, there is a well-known phenomenon of human hearing: if two sounds occur more or less simultaneously at similar or nearby frequencies and one of them is louder, the ear will NOT hear the quieter sound.

This is other information that can also be discarded, since it is generally not audible or the brain does not process it.
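The masking idea can be sketched as a toy filter over a list of simultaneous tones. This is not how a real encoder models hearing: the function name `drop_masked`, the 100 Hz neighbourhood and the 20 dB level gap are arbitrary assumptions picked only to make the principle visible.

```python
def drop_masked(components, max_gap_hz=100.0, min_level_gap_db=20.0):
    """Remove tones that a much louder tone at a nearby frequency would hide.

    `components` is a list of (frequency_hz, level_db) pairs that sound at
    the same moment. The thresholds are illustrative, not psychoacoustic data.
    """
    kept = []
    for freq, level in components:
        masked = any(
            other_level - level >= min_level_gap_db
            and abs(other_freq - freq) <= max_gap_hz
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept

tones = [(1000.0, 80.0), (1050.0, 40.0), (5000.0, 40.0)]
print(drop_masked(tones))  # [(1000.0, 80.0), (5000.0, 40.0)]
```

The quiet 1050 Hz tone is dropped because a much louder tone sits right next to it, while the equally quiet 5000 Hz tone survives because nothing nearby covers it.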

Once both types of information have been discarded, the file is much smaller and takes up far less space.

What remains, in practice, is to apply a compression algorithm, something similar to ZIP. The result is a compressed file, for example an MP3.

This is called the lossy method.

There is another method, lossless, in which the audio is only compressed with a method similar to ZIP, without discarding any information.
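The contrast between the two methods can be sketched with Python's standard `zlib` module standing in for "a method similar to ZIP". The lossy branch below simply throws away the low 8 bits of every 16-bit sample. That is a crude stand-in for the discarding step, chosen for brevity, and not what MP3 actually does; the test tone and helper names are likewise assumptions of this example.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44_100

def sine_samples(freq_hz: float, seconds: float) -> list:
    """16-bit integer samples of a sine tone (a stand-in for real audio)."""
    count = int(SAMPLE_RATE * seconds)
    return [
        int(16000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(count)
    ]

def pack(samples: list) -> bytes:
    """Serialize samples as little-endian signed 16-bit PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)

original = sine_samples(440, 0.5)
raw = pack(original)

# Lossless: compress, decompress, and get back exactly the same bytes.
lossless = zlib.compress(raw)
assert zlib.decompress(lossless) == raw

# Lossy stand-in: discard the low 8 bits of each sample, then compress.
coarse = [(s >> 8) << 8 for s in original]
lossy = zlib.compress(pack(coarse))
assert zlib.decompress(lossy) != raw  # the discarded detail cannot be restored

print(len(raw), len(lossless), len(lossy))
```

The printed sizes depend on the input, but the two assertions capture the essential difference: only the lossless path reproduces the original data bit for bit.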

Is there really an audible difference between the two? Practically none: the human ear can hardly distinguish between them.

A lossy file with a good sample rate (at least 44,100 Hz) and a good bit rate is almost impossible to distinguish from the original and therefore from the lossless file.

Many experiments have let people listen to both types of files (lossy and lossless), and more than 90% could not distinguish between them, as long as the lossy file had a good sample rate and a good bit rate.