how mp3 compression works Archives - Page 3 of 4

MP3 audio format. What is the bit rate?

2. MP3 (CBR, VBR, ABR)

MP3 is currently the most widely used lossy compressed digital audio format. It has been explained above and will not be repeated here.

CBR (constant bit rate)

CBR is the oldest and simplest MP3 encoding (compression) method. With CBR, the bit rate is the same throughout the entire file; in other words, every second of the MP3 file uses the same number of bits. Although a piece of music has sections of varying complexity, the encoder always keeps the bit rate constant, so unless you choose the highest quality setting, the sound quality of different sections of the MP3 file will vary: the more complex the passage, the worse the sound quality. Its biggest advantage is that the file size is predictable, which is convenient for calculating storage space.
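Because a CBR file spends the same number of bits on every second, its size can be estimated before encoding. The following is a minimal sketch of that arithmetic; the function name is made up for illustration, and the estimate ignores tags and other metadata.

```python
def cbr_file_size_bytes(bitrate_kbps, duration_seconds):
    """Estimate the audio payload of a CBR file (metadata is ignored)."""
    total_bits = bitrate_kbps * 1000 * duration_seconds  # kbps means 1000 bits per second
    return total_bits / 8  # 8 bits per byte


# A 4-minute song at 128 kbps comes to roughly 3.84 million bytes.
print(cbr_file_size_bytes(128, 4 * 60))
```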

VBR (Variable Bit Rate)

VBR is an MP3 compression method with a variable encoding rate. Its principle is to encode the complex parts of a song at a high bit rate and the simple parts at a low bit rate. This dynamic adjustment of the encoding rate gives a good balance between sound quality and file size. Its main advantage is that the entire song can approximately meet our sound quality requirements; the disadvantage is that the size of the compressed file cannot be estimated in advance.

Most MP3 players released today support VBR, although some machines that can play VBR files can’t display the playing time correctly. Nowadays, a lot of high-quality MP3 music is encoded in VBR.
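The VBR idea can be illustrated with a toy sketch. It assumes each frame has already been given a complexity score between 0 and 1 (a hypothetical input; real encoders derive their decisions from a psychoacoustic model) and maps that score to the nearest bit rate allowed for MPEG-1 Layer III frames. The function names and the 64 to 256 kbps working range are assumptions chosen for illustration.

```python
# Bit rates (kbps) allowed for MPEG-1 Layer III frames.
MP3_BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def pick_bitrate(complexity, low=64, high=256):
    """Map a hypothetical complexity score in [0, 1] to an allowed bit rate."""
    target = low + complexity * (high - low)
    return min(MP3_BITRATES_KBPS, key=lambda rate: abs(rate - target))


def average_bitrate(complexities):
    """Average bit rate of a sequence of frames encoded with pick_bitrate."""
    rates = [pick_bitrate(c) for c in complexities]
    return sum(rates) / len(rates)


# Simple, medium and complex frames get 64, 160 and 256 kbps respectively.
print([pick_bitrate(c) for c in (0.0, 0.5, 1.0)])
```

The average depends entirely on the content, which is why the final file size is not known until encoding is finished.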

ABR (Average Bit Rate)

ABR is a variant of VBR: an encoding method developed on the basis of VBR. This mode was created to address both the large file size of CBR and the unpredictable file size of VBR. ABR stays within a specified file size and works in segments of 50 frames (30 frames being roughly 1 second): low-frequency content and frequencies to which the ear is insensitive get a relatively low bit rate, while high-frequency content and passages with large dynamics get a high bit rate. It can be seen as a compromise between VBR and CBR.

What is bit rate? Knowledge of the MP3 audio format.

Digital audio formats are audio signals that are recorded, processed, and reproduced in digital form.

Digital audio formats emerged to meet the needs of high-fidelity playback, storage and transmission. Simply put, early analog audio formats suffered from playback distortion and glitches caused by media wear. With the advent of the CD, digital audio became popular, but another problem arose: limited storage capacity, and CDs still wear out too. Saving to a hard drive (which lasts relatively longer) was not a good solution while storage media, mainly hard drives, were still expensive. The rise of the Internet then created a need for long-distance file transmission, and under bandwidth restrictions the demand for smaller files became more intense. These external factors all led to the creation of lossy compressed digital audio formats.

As for internal factors, improvements in computing power and encoding capabilities, together with progress in psychoacoustic models, have promoted the emergence of various lossy compressed digital audio formats. Some of the audio formats most commonly used in MP3 players are briefly introduced below: MP3 (CBR, VBR, ABR), WMA, WAV, ADPCM, and the emerging audio formats AAC, ASF, and OGG.

Before introducing various digital audio formats, let’s clarify one concept: bitrate.

In computing, all information is digitized. The bit is the smallest unit of data in a computer: a single binary digit, either “0” or “1”. For example, a 2-bit number is a two-digit binary number with 4 possible combinations, “00”, “01”, “10” and “11”, which represent the decimal numbers 0, 1, 2 and 3 respectively.
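The 2-bit example can be checked with a few lines of code. This is only an illustrative sketch; the helper name is made up here.

```python
from itertools import product


def bit_patterns(n_bits):
    """List every n-bit binary string in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


patterns = bit_patterns(2)
print(patterns)                       # the four 2-bit combinations
print([int(p, 2) for p in patterns])  # their decimal values
```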

Bit rate is a benchmark indicator of the efficiency of digital music compression. It is the number of bits transmitted per unit of time (1 second), measured in bps (bits per second). We usually use kbps (simply put, 1000 bits per second) as the unit. The bit rate of digital music on CD is 1411.2 kbps (i.e. recording 1 second of CD music requires 1411.2 × 1000 bits of data). The higher the bit rate of a music file, the more data (bits) must be processed per unit of time (1 second), and the better the sound quality. However, a high bit rate also increases the file size, which occupies a large amount of storage. MP3 bit rates range from 8 to 320 kbps.
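The CD figure follows directly from the CD audio parameters: 44,100 samples per second, 16 bits per sample, 2 channels. A minimal sketch of the calculation, with a function name chosen only for illustration:

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_bps = pcm_bitrate_bps(44100, 16, 2)
print(cd_bps)         # bits per second
print(cd_bps / 1000)  # the same value in kbps
```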

1. WMA (Windows Media Audio)

WMA is Microsoft's media compression method, the audio-only part of Windows Media Technologies. Its sound quality is similar to MP3, and it is said to achieve this at about half the file size of MP3. It includes copy protection through Windows Media Rights Manager and can be played in WMP (Windows Media Player). Thanks to the strong influence of Microsoft and Windows, and largely for copyright reasons, the major record companies EMI and BMG have officially confirmed that they use the WMA method developed by Microsoft. It is believed that this method will become even more popular in the future.

What is MP3?

“MP3” is widely used in audio players. Its official name is “MPEG-1 Audio Layer III”, the audio format for MPEG-1. The MP3 format was standardized in parallel with MPEG as a video format, and in 1992 it was standardized as “ISO / IEC IS 11172-3 (MPEG-1 Audio)”.

After that, MP3s were distributed “as is” among enthusiasts, but there was no major advance until the introduction of the portable “mpman” audio player, launched by SAEHAN International in South Korea in 1998. The combination of this player, which could play music data downloaded over the Internet, with Napster, which appeared in 1999, completely changed the portable audio player scene, which until then had relied on cassettes, CDs, MDs and the like.

MP3 can reduce the original data to less than one tenth of its size. For example, it became possible to compress a one-hour music CD to about 40MB, and together with Napster and similar services this created a new demand for music sharing between users. Since then, despite various legal actions by the RIAA (Recording Industry Association of America) and the emergence of successor formats from many manufacturers, MP3 remains a widely used audio format.

■ MPEG

To understand the working principle of MP3, let’s first look at “MPEG Audio” itself. A feature of MPEG Audio is that it exploits psychoacoustics: the minimum audible limit of hearing and the masking effect.

Let’s start with the minimum audible limit. In general, humans are considered able to hear sounds in the range of 20 Hz to 20 kHz. Of course, this is an average value: some people can hear a wider range, while others can only hear a narrower one, but we will set that aside here.

That does not mean, however, that every sound in the 20 Hz to 20 kHz range can be heard. The minimum audible limit curve is shown in Fig. 1: even a fairly quiet sound can be heard around 2 kHz, but at frequencies above or below that, a sound cannot be heard unless it is considerably loud.

You may have heard the term “loudness curve”, which is the curve shown in Fig. 1. Therefore, even if a sound source covers a wide range from bass to treble (Fig. 2), the human ear hears it with both ends drooping (Fig. 3). By taking advantage of this and omitting all inaudible frequency data, a great deal of compression becomes possible.
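The shape of the minimum audible limit curve can be approximated numerically. The sketch below uses Terhardt's widely quoted approximation of the threshold in quiet; that formula is not taken from this article and is brought in here as an assumption, and the function names are illustrative. It reproduces the qualitative picture: the threshold is lowest in the low-kilohertz region and rises steeply toward both ends of the audible range.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt's formula, an assumption here)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible(freq_hz, level_db):
    """A tone is treated as audible only if it lies above the threshold in quiet."""
    return level_db > threshold_in_quiet_db(freq_hz)


for f in (100, 1000, 3300, 15000):
    print(f, round(threshold_in_quiet_db(f), 1))
```

A 10 dB tone, for example, passes this check at 3.3 kHz but not at 100 Hz or 15 kHz, which is the kind of content an encoder can leave out.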

Masking effect

The masking effect is another phenomenon. When a very loud sound is generated at a certain frequency, a specific region called the “critical band” forms around that frequency, and other sounds that fall inside this critical band cannot be heard.

When sound A is generated, the sloping region that extends to the frequencies on either side of it is the critical band. The part of sound B that sticks out of the critical band can be heard without any problem, but sound C, which fits completely inside the critical band, cannot be heard.

In MPEG Audio, compression efficiency is further improved by omitting, as before, the sound data that cannot be heard because of this critical band. The masking effect also works along the time axis, not only along the frequency axis. In other words, a quiet sound cannot be heard immediately after a loud sound, and also just before it. This is called the temporal masking effect; in Figure 5, sound B and sound C become inaudible. This, too, is useful for data compression.
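Frequency masking can be sketched with a deliberately simplified model. The code below assumes a triangular masking curve around a single loud tone, measured on the Bark scale (Zwicker's approximation). The slopes and the 10 dB offset are illustrative assumptions, not values from the MP3 standard, and all function names are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker's approximation of the Bark (critical band) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def masking_threshold_db(masker_hz, masker_db, probe_hz,
                         offset_db=10.0, lower_slope=27.0, upper_slope=12.0):
    """Level below which a probe tone is hidden by the masker (toy triangular model)."""
    distance = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    # The curve falls off faster below the masker than above it (assumed slopes, dB per Bark).
    drop = lower_slope * -distance if distance < 0 else upper_slope * distance
    return masker_db - offset_db - drop


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


# An 80 dB tone at 1 kHz hides a 40 dB tone at 1.1 kHz, but not one at 4 kHz.
print(is_masked(1000, 80, 1100, 40), is_masked(1000, 80, 4000, 40))
```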

Mp3, the star format, the reasons

Another interesting property of hearing is that the lower the volume level, the lower its resolution, and the fewer sounds are perceived. When the volume is lowered, high frequencies are perceived better; when the volume is raised, low frequencies are perceived better. They do not complement each other, but rather replace each other.

A person does not perceive some sounds while focusing on others. Notice that usually only one instrument, or one voice, is heard clearly and consciously. Everything else becomes background or blends into a single tune. And no matter what we focus on in a composition, we cannot increase the number of basic sounds perceived.

How to create the mp3

All the data obtained from these experimental studies are gathered and presented in the form of an idealized model of human hearing. The MP3 standard is built around it.

Everything that a person clearly cannot hear is cut off immediately. Further processing then degrades the sound according to this model.

Thanks to the great amount of work done, modern psychoacoustic models evaluate human hearing accurately, and they keep improving.

In fact, despite the assurances of music lovers, musicians and audiophiles, for the average untrained ear the highest-quality MP3 already has parameters close to the limit of perception.

There are exceptions, of course. But they are not always easy to notice in blind listening, and they stem not from the mechanisms of hearing but from the way the brain processes sound information.

And here only personal factors play a role. All of this explains why we like different headphone models and why the numerical characteristics of audio cannot unequivocally determine the sound quality.

MP3 fits everything: analog quality

Audiophiles’ insistence on FLAC deserves another serious look. Most analog recordings do not contain enough information to justify lossless formats.

All CDs are recorded at a 44.1 kHz sample rate with 16-bit quantization. So where do the 192 kHz and 24/32 bits used when encoding to FLAC come from? From nowhere: it is a dummy!

You will object that these parameters are higher for analog sound … But for an audio cassette or a magnetic tape (unless, of course, it is a Japanese master tape), the characteristics of an audio CD are OUT OF REACH. For conventional studio equipment, the ability to record analog sound that matches Audio CD quality is relatively new.

Therefore, it makes no sense to digitize recordings from the pre-digital era at extreme quality, especially those made on magnetic media. They do not contain the spectra and the amount of information that lossless containers can store.

Everything fits in MP3: digital

Strictly speaking, the picture is the same for most digital recordings. In the 90s and later, cheap plastic boomboxes appeared. Sound engineers had to ensure a uniform sound on all devices, so the dynamic range of recordings was reduced to 10-12 bits.
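To put "10-12 bits" in perspective, the theoretical dynamic range of linear PCM grows by about 6 dB per bit. The sketch below uses the plain ratio between the largest and smallest quantization step, which is an idealization, and a function name chosen for illustration.

```python
import math


def dynamic_range_db(bits):
    """Ratio of the largest to the smallest quantization level, in decibels."""
    return 20 * math.log10(2 ** bits)


for bits in (10, 12, 16):
    print(bits, round(dynamic_range_db(bits), 1))
```

By this measure a recording mastered to 10-12 bits of dynamics uses roughly 60 to 72 dB of the roughly 96 dB that 16-bit CD audio offers.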

One more point. Until recently, hardly anyone recorded at very high quality in the studio, because it is difficult to work with several dozen high-quality audio tracks at once, and sometimes there are simply not enough human and technical resources.

Why mp3 is enough for you, but Lossless is not necessary

Didn’t graduate from a conservatory? Then you don’t need lossless: listen to high-quality MP3.

Very often you meet people who despise compressed formats on principle. You should not be guided by their opinion. These are the same people who, with 90% probability, would not hear the difference between compressed and uncompressed audio even in a studio.

What is mp3

MP3 isn’t just about cutting quality. It was developed by the Fraunhofer Society, an association of applied research institutes in Germany. Later they came up with AAC, which could have become the main compressed audio format … But it didn’t work out.

Did you know that MP3 comes with variable (VBR) and constant (CBR) bit rates? With a constant bit rate, the algorithm encodes every segment with the same number of bits as the first one. It can therefore produce uneven quality, which means that not all sounds will be recorded in high quality.

Since MP3 has been around for a long time, it has many limitations. Bit width is 16-24 bits. The supported sample rates are 8, 11.025, 12, 16, 22.05, 24, 32, 44.1 and 48 kHz. The maximum bit rate does not exceed 320 kbps. The maximum number of channels is 2. But since we are talking about music, multi-channel recordings are hard to find anyway.
Now let’s see how MP3 is encoded. The illustration shows the time-frequency distribution of the same recording as an Audio CD, an OGG file and a well-encoded MP3. The pictures on the right and left almost completely coincide, which means that the MP3 file sounds almost the same as the original CD recording.

Human hearing and its limits – psychoacoustics

The fact is that the main task for the Fraunhofer Society was developing psychoacoustic models of human sound perception, and there are many subtleties. First of all, we are not dolphins.

Second, there are limits on the number of sounds perceived simultaneously. A person cannot hear more than 250 sounds across 24 ranges at the same time (and the number of simultaneous sounds within one range is also quite small).

Third, the audible range is 16 Hz to 20 kHz, and by the age of 60 it is reduced by almost half. And that is the ideal case, with training (yes, hearing has to be trained!).

Frequencies below 100 Hz are perceived not by the hearing cells, but … by the skin. The low waves are then reflected into the ear canal and perceived as infrabass. (This belongs to the area of bone conduction.)
Also, the number of cells that register acoustic waves differs from person to person. What is more, for each individual the number differs between the right and left ear.

By the way, each ear perceives sound differently. Swap the channels of your favorite song and you get a new sound.

If you dig deeper, it turns out that each sound frequency is perceived only above a certain volume. When that volume is reached, silence is replaced by a sharp and quite distinct sound. After that, a person can hear a quieter sound of the same frequency.

What you need to know about MP3

What is MP3?

MP3 is short for MPEG Layer 3. It is one of the streaming formats for storing and transmitting audio in digital form, developed by Fraunhofer IIS and THOMSON and later approved as part of the MPEG1 and MPEG2 compressed video and audio standards. It is the most complex scheme in the MPEG Layer 1/2/3 family: it requires the most machine time to encode of the three and provides the highest encoding quality. It is mainly used for encoding audio CDs.

MP3 is far more compact than formats such as PCM (i.e. a normal WAV file) while maintaining similar sound quality (taking 16-bit stereo at 44.1 kHz as the reference). This is achieved through additional quantization according to a scheme that minimizes the loss of quality. The scheme takes into account the peculiarities of human hearing, including the masking of a weak signal in one frequency range by a stronger signal in an adjacent range, and the masking caused by a strong signal in the previous frame, which temporarily reduces the ear’s sensitivity to the signal in the current frame. Simply put, background sounds that the human ear cannot hear because another, louder sound is present at the same or the previous moment are eliminated. It also takes into account the inability of most people to distinguish signals that are below a certain power level.

This is called adaptive coding, and it allows you to save on the less perceptually significant sound details. The compression ratio (and therefore the quality) is not determined by the format, but by the bit rate chosen when encoding to MP3. The bit rate when encoding a signal equivalent to an audio CD (44.1 kHz, 16 bit, stereo) ranges from the largest, 320 kbps (320 kilobits per second, also written kbs or kb/s), down to 96 kbps and less.

Why MP3?

MP3 has two huge advantages over the other formats available today. It is true that Microsoft is trying to squeeze out MP3 with its new WMA format, and there are also the alternative VQF and AAC formats, but they have not yet become widespread and their quality is often a little worse. Moreover, WMA is still, in fact, closed to free use, so there are problems with the various encoding / listening / maintenance programs (although who doubts Microsoft’s mobilization capabilities :-).

The first advantage of MP3 is that none of the existing similar formats can yet be said to fully guarantee stable preservation of sound quality at sufficiently high bit rates, except MP3, which has stood the test of time with dignity.
The second, no less important advantage: for the next few years, and perhaps the entire decade, MP3 has become the de facto standard, because the parties that use it (e.g. me 😉) have invested a lot in it, including digital radio stations. There are also many easy-to-use programs written for MP3. Production of hardware MP3 players, both pocket and car models, has now been launched. Thus, MP3 became the first massively recognized audio storage format after Audio CD (although its use is often illegal).

The most famous encoders

Today there are 3 main sources of programs for encoding MP3 music: Fraunhofer-IIS, Xing Technologies, and ISO itself, whose encoder follows the ISO MPEG standard it developed.
Most of the encoders created to date use modified code from one of these organizations. Fraunhofer-IIS based encoders are not very fast, but they are very high quality, and they are optimized for low bit rates.

128 kbps (11:1)
The most popular bit rate today. The 11:1 compression ratio is of course an argument, especially for the Internet, where every kilobyte counts. However, high frequencies are not preserved very well and there is some distortion in the sound. At the same time, I can safely say that on an ordinary computer, for example with an ordinary sound card and computer speakers, even good ones, or with output through a simple recorder to your speakers (using the input for an external CD, as I do), the difference will not be noticeable unless you are a sound expert.
However, on proper speakers (large and expensive ones, at least), the lack of high frequencies is quite noticeable.
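The 11:1 figure can be reproduced by dividing the bit rate of CD audio (44.1 kHz, 16 bit, stereo) by the MP3 bit rate. A minimal sketch, with names chosen only for illustration:

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels, in kbps.
CD_BITRATE_KBPS = 44100 * 16 * 2 / 1000


def compression_ratio(mp3_bitrate_kbps):
    """How many times smaller the MP3 stream is than the CD stream."""
    return CD_BITRATE_KBPS / mp3_bitrate_kbps


for rate in (96, 128, 320):
    print(rate, round(compression_ratio(rate), 1))
```

At 128 kbps the ratio comes out at about 11, matching the heading above; at 320 kbps it drops to roughly 4.4.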

Interview with the inventor of the mp3: “We weren’t the only ones, we were just better”

A handful of German inventors in white coats at the Fraunhofer Institute invent, against all odds, a revolutionary process that compresses music files to one-twelfth of their original CD size with virtually no loss of quality. When was the moment you felt: are we doing something bigger here?

There were several such moments. When I was still a student at the University of Erlangen in 1988 and doing basic research, someone visited our laboratory. My PhD supervisor, Dieter Seitzer, proudly demonstrated to this guest what we were working on at the time: compressing digital music files. And when he asked what could become of our work, I replied: “Either our work will be forgotten and gather dust in the library, or the technology will become a standard used by millions of people.” But I did not dare to dream that it would really happen.

In 1977, your PhD supervisor, Seitzer, from Erlangen, had the idea of transmitting music over telephone wire. And everyone said, “It can’t be done.” And then you came along. What application did you originally have in mind? Was it music in your pocket?

Back then, all textbooks said that you could compress images, videos, and voice, but definitely not music. It is too sensitive and complex. That was the starting point.

We asked ourselves: How can we compress music in that way, that is, reduce the amount of data per piece of music, so that people don’t hear the difference?

The question is to understand how the human ear works so that very similar things happen in our encoder, which compresses the music, as in the inner ear. Even in the inner ear, not all data is transmitted to the brain through nerve fibers. The brain always compares pitches with an internal reference, basically checking what it knows. In addition, there are so-called masking effects: if the sensory hairs tremble in the ear, the other sensory hairs are also automatically stimulated. This leads to the fact that the tones overlap and cannot be perceived at all. This is due to the mechanics of the inner ear. We use this as a guide when we come to the question: For what data can we reduce the level of detail, without being heard? Where would a coarser data structure be acceptable? We did not invent this trick in Erlangen. We weren’t the only ones working on it. We have only brought this knowledge to concrete results faster and optimized it better.

Is it true that you bought records for 1,000 marks in a music store in Erlangen to have compression material?

It is true. We had requested the project and absolutely needed better speakers, a small sound booth, and most of all, lots of audio samples. So I went to buy records: simple pieces, complex pieces, music of all genres, in all areas. We didn’t know what would work and, more importantly, what wouldn’t.

You mean the famous example of Suzanne Vega’s song “Tom’s Diner”, whose a cappella intro with “Da da da da …” was used to fine-tune the psychoacoustic MP3 model. What exactly was it about?

That was a special challenge: dense tones that the ear can still filter very well. My dissertation was almost done at the time and I really believed: I’m done, my process works for all kinds of music. But then I read in a hi-fi magazine that Suzanne Vega’s voice had been used to test speakers. A colleague bought the CD because we wanted to know: what happens if we compress this music? The result was a disaster.

And how did you solve the problem?

There were two solutions. The first was to realize that what we had read in the specialized literature about how masking works for signals so rich in spectral content was not really true. We realized that psychoacoustics in these cases works differently from what the publications of the time suggested. We then tested what happens when we transmit the lower frequencies very precisely and become less precise at the higher frequencies in favor of less storage space. That worked.

Mp3 Compression, step by step

The MP3 Encoder is that program that analyzes the uncompressed digital file (for example, a Wav file) and transforms it into an MP3 file.

The audio signal is filtered and divided into 32 subbands by a polyphase filter bank, and each subband is then split further using the MDCT (Modified Discrete Cosine Transform), giving 576 frequency lines per block. This makes it possible to eliminate all unnecessary frequencies. The human ear, as already stated, perceives sounds only above a certain threshold, so audio below that threshold is not encoded.

At this point, the signal is passed through the psychoacoustic model, which identifies the masking thresholds discussed earlier. This is done using the discrete Fourier transform (DFT).
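To give an idea of what the DFT provides to the psychoacoustic model, the sketch below computes a magnitude spectrum directly from the DFT definition and picks out the strongest frequency. It is a slow, textbook-style illustration with hypothetical function names; real encoders use a fast Fourier transform on windowed blocks.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to the Nyquist bin (direct O(n^2) definition)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def dominant_frequency_hz(samples, sample_rate_hz):
    """Frequency of the strongest bin in the spectrum."""
    magnitudes = dft_magnitudes(samples)
    strongest_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return strongest_bin * sample_rate_hz / len(samples)


# A 1 kHz sine sampled at 8 kHz for 64 samples lands exactly on bin 8.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
print(dominant_frequency_hz(tone, 8000))
```

Once the loud components of a block are located this way, a masking model can estimate which quieter neighbours are hidden by them.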

During masking of the 576 frequency components, the encoder determines which of them are masked and can therefore be removed.

After masking, the Joint Stereo process is applied. In some frequency ranges the ear cannot perceive the spatial position of sounds, so these can be recorded on a single channel (i.e. in mono) with significant space savings.
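One common way to exploit the similarity between the two channels is mid/side coding. Note that this is a different joint stereo variant from the mono-merging described above, used here only because it is easy to show. The sketch assumes plain lists of sample values and uses a simple divide-by-two convention; the function names are illustrative.

```python
def to_mid_side(left, right):
    """Convert left/right samples into mid (sum) and side (difference) channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Rebuild the left/right samples from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [4.0, 6.0, 1.0]
right = [2.0, 6.0, 1.0]
mid, side = to_mid_side(left, right)
print(mid, side)  # the side channel is zero wherever both channels agree
```

When the channels are nearly identical, the side channel is close to zero and costs very few bits, which is where the space saving comes from.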

Once this is done, the data is further analyzed and compressed using Huffman coding, which allows a data reduction (without loss of information) of approximately 20%.
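The principle behind Huffman coding is that frequent values get short codes and rare values get long ones. The sketch below builds a code from the data itself purely for illustration; MP3 encoders choose from predefined code tables rather than building one per file, and the function names here are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count1, _, codes1 = heapq.heappop(heap)  # the two least frequent groups
        count2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (count1 + count2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)


data = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2  # small values dominate, as after quantization
codes = huffman_code(data)
print(codes, len(encode(data, codes)), "bits instead of", 2 * len(data))
```

The saving depends on how uneven the value distribution is; nothing is lost, because the codes can be decoded unambiguously.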

At this point, after all the data has been collected, the encoder proceeds to create the bit stream that will form the final MP3 file.

Compression criteria

To perform such compression, the MP3 format is based on a simple concept: filter a digital musical piece and eliminate all unnecessary information, thus reducing space.

The human ear is an almost perfect instrument, but it also has its limits. Its pass band extends from 20 Hz to 20,000 Hz, but it is much more sensitive to frequencies in the mid-range, 700 to 6,000 Hz, where most of the information is concentrated.
The study of auditory perception belongs to psychoacoustics, which mainly analyzes 2 factors that are later used in MP3 encoding:

Auditory perception

Only some of all possible sounds can be heard by the human ear. The following figure shows these areas for the different sound frequencies. Only those in the white area are audible to our ear.

Masking

Masking is nothing more than the covering of weak sounds by loud sounds. The sounds of different instruments almost always overlap each other. When the loudest sound completely covers the quietest, so-called masking occurs. In MP3 files, masking makes it possible to remove the information for the weakest sounds, which are virtually irrelevant because the ear does not perceive them.

MP3: features and alternatives

The peculiarities of the MP3 format and some clues about other solutions of equal or even higher quality.

There is no denying it: the MP3 format is the most common and most convenient way to listen to music on the go or, as has been the case for some years, via streaming. We use it everywhere and any device can play it today.

MP3 belongs to the family of audio files called “lossy”, that is, formats that reduce the amount of data needed to represent a sound while trying to maintain at least acceptable quality.

The parameters that determine the quality level of an MP3 file are the sampling rate, the bit rate, the encoder and, of course, the source. Let’s take them in order.
Everything starts with the source, that is, the medium or site from which the MP3 file is obtained. The higher the quality of the source, the better the end result: purchasing MP3s from particularly reliable sites or extracting them from compact discs in good condition is the basis for a successful MP3. Also crucial is the encoder (the most famous free one is LAME), that is, the software that creates the file once its parameters have been properly configured.

The sampling rate is measured in hertz and expresses the number of times per second that the analog signal is measured and digitized; for MP3 it should match that of a CD as closely as possible, i.e. 44,100 Hz (44.1 kHz).

Bit rate is the number of bits that flow per second. Its value is not fixed: as it increases, so does the similarity to the original file. The higher the bit rate, the higher the quality and the larger the file size. The bit rate ranges from 32 kbps to 320 kbps, the maximum that an MP3 file can have.

The parameters we’ve just listed are an important part of what makes a good-quality MP3; however, be aware that a lossy file is by no means faithful to the original source in all respects. The most famous lossy alternatives are: AAC (the format Apple uses to sell music in the iTunes Store and, since July, to stream audio from the Apple Music service); WMA; MPC; OGG (an excellent quality open source format).

If you are looking for maximum fidelity in digital audio, give up MP3 and its lossy alternatives and switch to “lossless” audio formats, i.e. formats with no loss of quality. This file type compresses the original sound while keeping all of its information intact. Needless to say, quality comes at a cost in terms of space: a lossless file takes up about half the size of the original audio file, but “weighs” nearly three times as much as a 320 kbps MP3. The most famous and widely used are: FLAC; ALAC (Apple Lossless); APE; WavPack. The distinction between “lossy” and “lossless” files applies equally to images and videos, not just audio files.

It has often been said how extremely difficult it is to distinguish a 320 kbps MP3, produced under the best conditions, from its original version on CD or in a lossless file; it is only possible to notice the difference with equipment of a certain level and a good ear. That said, the MP3 format is excellent for listening on the move, as highlighted above; on the other hand, to better preserve our music or to listen to it on systems of a certain level, it is better to resort to lossless formats such as FLAC or ALAC.

MP3 – The Name

The name MP3 comes from the MPEG standard, which means Moving Picture Experts Group. This group was created specifically for the development of systems and standards used in video compression. DVD movies and satellite broadcasts (DBS) use the MPEG standard to efficiently compress video information.

MPEG compression includes a subsystem for sound compression with three different compression levels (layers) depending on the quality of the information. Layer-3 is the one used for the MP3 standard, which stands for MPEG Layer-3.
