MP3: a description of the audio compression technique



Digitization

Sound is a continuous wave that propagates through air or other media. It consists of pressure differences, so it can be detected by measuring the pressure level at a point. Sound waves exhibit the characteristic, well-studied properties of waves in general, such as reflection, refraction and diffraction.

Because sound is a continuous wave, a digitization process is required to represent it as a series of numbers. Today most operations performed on sound signals are digital, since storing, processing and transmitting the signal in digital form offers very significant advantages over analog methods. Digital technology is more advanced and offers greater possibilities: lower sensitivity to transmission noise and the ability to include error-protection codes as well as encryption. Moreover, with the appropriate decoding mechanisms, signals of different types transmitted over the same channel can be processed simultaneously. The main disadvantage of the digital signal is that it requires much greater bandwidth than the analog signal, which is why data compression has been studied so thoroughly; some of its techniques are the focus of this study.

Digitization of audio

The digitization process consists of two phases: sampling and quantization. Sampling divides the time axis into discrete segments: the sampling frequency is the inverse of the time between one measurement and the next. At each of these instants quantization takes place, which in its simplest form consists of measuring the amplitude of the signal and storing it.
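
The two phases can be illustrated with a minimal Python sketch. The function name, the assumption that amplitudes lie between -1 and 1, and the test tone are hypothetical choices made for illustration; they are not part of any standard.

```python
import math

def sample_and_quantize(signal, sample_rate_hz, duration_s, bits):
    """Sample a continuous signal and quantize each sample uniformly.

    `signal` is assumed to be a function of time (seconds) that returns
    an amplitude between -1 and 1.
    """
    levels = 2 ** bits                 # number of quantization levels
    period = 1.0 / sample_rate_hz      # time between two measurements
    count = int(round(duration_s * sample_rate_hz))
    codes = []
    for n in range(count):
        amplitude = signal(n * period)             # sampling step
        # Quantization step: map [-1, 1] onto integer codes 0 .. levels-1.
        code = int(round((amplitude + 1.0) / 2.0 * (levels - 1)))
        codes.append(min(max(code, 0), levels - 1))
    return codes

def tone(t):
    """Illustrative 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)

# One millisecond of the tone, sampled at 8 kHz with 8 bits per sample.
print(sample_and_quantize(tone, 8000, 0.001, 8))
```

Each printed integer is one stored measurement; raising the sampling rate yields more integers per second, and raising the number of bits makes each integer more precise.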

Nyquist’s theorem

Nyquist’s theorem states that the sampling frequency required for a signal whose highest components lie at a given frequency f is at least 2f. Therefore, since the upper limit of human hearing is around 20 kHz, the frequency that guarantees adequate sampling of any audible sound is around 40 kHz.
Specifically, to obtain high-quality sound, frequencies of 44.1 kHz are used, in the case of CD, for example, and up to 48 kHz in the case of DAT. Other typical values are submultiples of the first: 22.05 and 11.025 kHz.
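
As a rough illustration of the theorem, the following Python sketch computes the minimum sampling rate for a given highest frequency, and the frequency that a pure tone appears to have once it is sampled. The function names are hypothetical; the folding rule is the standard aliasing behaviour of ideal sampling.

```python
def min_sample_rate_hz(max_freq_hz):
    """Lowest sampling rate that satisfies Nyquist's theorem."""
    return 2 * max_freq_hz

def apparent_frequency_hz(freq_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling.

    Tones above half the sampling rate fold back into the range
    0 .. sample_rate_hz / 2 (aliasing).
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(min_sample_rate_hz(20000))            # upper limit of hearing
print(apparent_frequency_hz(1000, 44100))   # below fs/2: unchanged
print(apparent_frequency_hz(30000, 44100))  # above fs/2: folds back
```

A tone below half the sampling rate keeps its frequency, while one above it is reproduced at a different, lower frequency, which is why the signal must contain nothing above half the sampling rate.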

Depending on the nature of the application, of course, the appropriate frequencies can be much lower; speech, for example, is usually processed at a sampling frequency between 6 and 20 kHz, or even less. Regarding quantization, it is evident that the more bits used to divide the amplitude axis, the “finer” the partition will be, and therefore the smaller the error when a specific amplitude is assigned to the sound at each instant.

For example, 8 bits offer 256 quantization levels and 16 bits offer 65,536. The dynamic range of human hearing is about 100 dB. The amplitude axis can be divided into equal intervals or according to a specific density function, giving more resolution to certain regions if the signal in question has more components in a particular range of intensity, as we will see in the coding techniques.
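
The relationship between bits, levels and dynamic range can be checked with a short Python sketch. It assumes uniform quantization and uses the usual rule that the ratio between the largest and smallest representable amplitude is 20·log10(2^bits), roughly 6 dB per bit; the function names are illustrative.

```python
import math

def quantization_levels(bits):
    """Number of distinct amplitude values available with `bits` bits."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Approximate dynamic range of uniform quantization, in decibels."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))
```

Under these assumptions, 16 bits come close to the roughly 100 dB range of human hearing, while 8 bits cover only about half of it.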

The complete process is usually called PCM (Pulse Code Modulation), and we will refer to it by that name from here on. It has been described in a very simplified way, mainly because it is widely covered elsewhere and well known, and because the subject of this work lies elsewhere. However, we will go into detail whenever the discussion requires it.

Coding and Compression.

Before describing coding and compression systems, we must pause for a brief analysis of human auditory perception, to understand why a significant amount of the information provided by PCM can be discarded.

The heart of the matter, as far as we are concerned, is based on a phenomenon known as masking.

The human ear perceives a frequency range between 20 Hz and 20 kHz.

First, sensitivity is greatest in the region around 2-4 kHz, so a sound becomes harder to hear the closer it lies to either end of the scale.

Second, there is masking, whose properties are used extensively by the most interesting algorithms: when the component of a signal at a certain frequency has high energy, the ear cannot perceive lower-energy components at nearby frequencies, both lower and higher.

At a certain distance from the masking frequency, the effect is reduced so much that it is negligible; the range of frequencies in which the phenomenon occurs is called the critical band.

The components that belong to the same critical band influence each other, and neither affect nor are affected by those that lie outside it. The width of the critical band depends on the frequency around which it is located, and measured data show that it increases with frequency.
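
The text does not give the data themselves. As an illustration only, the sketch below uses a commonly cited analytic approximation of the critical bandwidth, attributed to Zwicker and Terhardt; treating this formula as representative of "the data" is an assumption made here, and the function name is hypothetical.

```python
def critical_bandwidth_hz(center_freq_hz):
    """Approximate critical bandwidth around a centre frequency.

    Assumed formula (Zwicker and Terhardt approximation):
    25 + 75 * (1 + 1.4 * f_kHz ** 2) ** 0.69, in hertz.
    """
    f_khz = center_freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

for freq in (100, 1000, 4000, 10000):
    print(freq, round(critical_bandwidth_hz(freq)))
```

With this approximation the band is about 100 Hz wide at low frequencies and widens to more than 2 kHz near 10 kHz, consistent with the statement that the width grows with frequency.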

It should be noted that these data are obtained through psychoacoustic experiments carried out with listeners trained in sound perception, whose impressions give rise to the psychoacoustic models.

What we have just described is so-called simultaneous, or frequency, masking.

There is also so-called non-simultaneous, or temporal, masking, as well as other hearing phenomena that are not relevant at this point. For now, let’s focus on the idea that certain frequency components of the signal tolerate more noise than we would generally consider acceptable, and therefore require fewer bits to be encoded, provided the encoder is equipped with the right algorithms to resolve the masking.

Digitizing the signal using PCM is the simplest form of signal encoding, and is used by both CD and DAT systems. Like any digitization, it adds noise to the signal, which is generally undesirable. As we have seen, the fewer bits used in sampling and quantization, the greater the error in accepting discrete values for the continuous signal, that is, the higher the noise.

To keep the noise from reaching an excessive level, a large number of bits must be used: at 44.1 kHz and with 16 bits to quantize the signal, each of the two channels on a CD produces more than 700 kilobits per second (kbps). As we will see, much of this information is unnecessary and takes up bandwidth that could be freed, at the cost of increasing the complexity of the decoder system and incurring some loss of quality.
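
The figure of more than 700 kbps per channel follows directly from the sampling parameters, as this small Python sketch shows. The function name is illustrative.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

one_channel = pcm_bitrate_bps(44100, 16, 1)
stereo = pcm_bitrate_bps(44100, 16, 2)
print(one_channel / 1000, "kbps per channel")
print(stereo / 1000, "kbps for both channels")
```

Multiplying 44,100 samples per second by 16 bits gives 705,600 bits per second for one channel, and twice that for the stereo pair.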

The compromise between bandwidth, complexity and quality is what gives rise to the different market standards, and it will form the essential part of our study.


Mp3: What is it really?

MP3 is a data format that takes its name from an encoding algorithm called MPEG-1 Layer 3, which, in turn, is an audio compression system that allows sound to be stored with a quality similar to that of a CD and with a very high compression ratio, on the order of 11:1.

In practice, this means that the contents of about 11 audio CDs can be recorded on one CD-ROM, that is, approximately 150 songs.
The encoding system that MP3 uses is a lossy algorithm. That is, the original sound and the one obtained after decoding are not identical.
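
A quick way to see where a ratio of about 11:1 can come from is to compare the raw CD bit rate with an MP3 bit rate. The sketch below assumes an MP3 encoded at 128 kbps, a value that is not stated in the text and is used here only as an illustration.

```python
CD_BITRATE_BPS = 44100 * 16 * 2   # stereo CD audio, 16 bits per sample
MP3_BITRATE_BPS = 128_000         # assumed MP3 bit rate (illustrative)

def compression_ratio(original_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return original_bps / compressed_bps

print(round(compression_ratio(CD_BITRATE_BPS, MP3_BITRATE_BPS), 2))
```

Under that assumption the compressed stream is roughly eleven times smaller than the CD stream; higher MP3 bit rates give proportionally lower ratios.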

This is because MP3 takes advantage of the limitations of the human ear and eliminates the information that we are not able to perceive. Many studies of acoustic perception have been carried out, and they have identified a series of effects that can help sound coding reduce, as far as possible, the amount of useless or redundant information. The most important are:

The limits of hearing.

Our ear only works with frequencies between approximately 20 Hz and 20 kHz, so the remaining frequencies can be discarded.

Masking effect.

This effect occurs when two signals of similar frequency overlap. We can then perceive only the louder one, so the quieter one can be removed.
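
The idea can be sketched as a toy filter in Python. This is not the psychoacoustic model used by MP3: the rule, the relative bandwidth (`band_ratio`) and the loudness margin (`margin_db`) are hypothetical values chosen only to make the principle visible.

```python
def audible_components(components, band_ratio=0.15, margin_db=20.0):
    """Drop components that a much louder neighbour would mask.

    `components` is a list of (frequency_hz, level_db) pairs. A component
    counts as masked when another component is at least `margin_db` louder
    and lies within `band_ratio` of its frequency (toy assumption).
    """
    kept = []
    for freq, level in components:
        masked = any(
            other_level - level >= margin_db
            and abs(other_freq - freq) <= band_ratio * freq
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept

tones = [(1000, 80), (1050, 50), (5000, 50)]
print(audible_components(tones))
```

In this example the quiet tone at 1050 Hz sits next to a much louder one at 1000 Hz and is dropped, whereas the equally quiet tone at 5000 Hz is far from any masker and is kept.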

Stereo redundancy.

There are redundancies between the tonal and non-tonal components of the sound on the two stereo channels. Furthermore, below a certain frequency the human ear is not capable of perceiving the directionality of sound, so below that frequency it is even possible to encode a single channel together with complementary information that restores the spatial impression for the other channel.
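
One simple way to exploit redundancy between channels is mid/side coding, sketched below in Python. Note that this is a related but different technique from the single-channel-plus-spatial-information scheme described above; it is shown only because it is short and exactly reversible. The function names are illustrative.

```python
def encode_mid_side(left, right):
    """Convert left/right samples into mid (average) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def decode_mid_side(mid, side):
    """Recover the left/right samples from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = encode_mid_side(left, right)
print(mid, side)
print(decode_mid_side(mid, side))
```

When the two channels are nearly identical, the side signal is close to zero and needs very few bits, which is where the saving comes from.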

To carry out this “loss of information”, a technique called subband coding is used, in which the signal is broken down into subbands by a filter bank.

These subbands are then compared to the original using a psychoacoustic model that is responsible for determining which bands can be removed and which cannot.

Depending on the quality we want to obtain, more or fewer bands will be eliminated. To end the process, the resulting subbands are quantized and encoded, and the final result is compressed using a standard algorithm (Huffman coding, in the case of MP3), thus obtaining the resulting MP3 file. The encoding process is much more complicated than the decoding process, so it takes much longer to encode an MP3 file than to play it.
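
The decision about which subbands to keep, and how finely to quantize them, can be sketched as a toy bit allocation in Python. This is a simplified illustration, not the allocation procedure of the standard: it assumes that each bit of resolution lowers the quantization noise by about 6 dB, and the signal and masking levels in the example are invented.

```python
import math

DB_PER_BIT = 6.02  # assumed noise reduction per extra bit of resolution

def allocate_bits(signal_db, mask_db, max_bits=15):
    """Give each subband just enough bits to push noise below its mask.

    A subband whose signal level is at or below the masking threshold
    gets zero bits, that is, it is removed.
    """
    allocation = []
    for signal, mask in zip(signal_db, mask_db):
        signal_to_mask = signal - mask
        if signal_to_mask <= 0:
            allocation.append(0)
        else:
            bits = math.ceil(signal_to_mask / DB_PER_BIT)
            allocation.append(min(max_bits, bits))
    return allocation

# Three hypothetical subbands: levels and masking thresholds in dB.
print(allocate_bits([60, 30, 45], [40, 35, 44]))
```

The second subband lies below its masking threshold and receives no bits at all, while the third, barely above its threshold, receives a single bit.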

This perceptual coding algorithm was developed by the MPEG (Moving Picture Experts Group) committee in conjunction with the Fraunhofer Institute, and has been standardized as an ISO standard.

How much does MP3 compress?

MP3 compression was an engineering response to the problem of digital storage and its large memory requirements. A conventional digital signal in PCM (Pulse Code Modulation) format could easily require up to 10 megabytes of memory per minute. This would represent about 30 MB for a three-minute song.
That storage requirement could be handled by any computer if only a few files were involved, but when we talk about three thousand songs the numbers become worrying. As if this were not enough, there is the problem of the Internet and its current transmission speeds. Telephone lines in particular have limited transmission bandwidth, so very large files are a problem for conventional network traffic.
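
These figures can be reproduced with a short Python sketch, assuming CD-quality PCM (44.1 kHz, 16 bits, two channels) and decimal megabytes. The function name is illustrative.

```python
def pcm_size_bytes(duration_s, sample_rate_hz=44100, bits=16, channels=2):
    """Storage needed for uncompressed PCM audio, in bytes."""
    return duration_s * sample_rate_hz * (bits // 8) * channels

per_minute = pcm_size_bytes(60)
per_song = pcm_size_bytes(3 * 60)
library = 3000 * per_song

print(per_minute / 1e6, "MB per minute")
print(per_song / 1e6, "MB per three-minute song")
print(library / 1e9, "GB for three thousand songs")
```

One minute comes to a little over 10 MB and a three-minute song to a little over 30 MB, so a library of three thousand such songs approaches 100 GB of uncompressed audio.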

MP3 compression (sometimes loosely called MPEG3) is the audio part of the original MPEG-1 format, which was intended for video. The acronym MPEG, Moving Picture Experts Group, comes from the committee that was created by ISO (International Organization for Standardization) and the IEC (International Electrotechnical Commission) to develop this format. Its principle is based on the psychoacoustic model.

The human ear is known to discriminate sound according to its limitations. According to subject matter expert Paul Sellars, “If you hear a single handclap in a room, it will surely sound loud, but if it is preceded by the sound of a gunshot, it will sound fainter. The same thing happens when you record a rock band in a room: at a certain moment the guitar is the loudest sound in the mix, until the drummer hits a certain cymbal, at which point the guitar will seem to be attenuated.” This phenomenon is used by the MP3 algorithm to perform its compression. I explained it previously in the article on the ATRAC compression used by the MiniDisc.

The MP3 format divides the sound into 32 subbands, which allows it, according to the psychoacoustic model on which it is based, to give priority to one element over another. At a given moment in the material we may have a predominant low-frequency sound from the kick drum, a high-frequency sound from the cymbal, and the vocalist, all at the same time. The algorithm does not eliminate two of them; it dedicates less storage space to them.

The mathematics behind MP3 compression starts with the Nyquist-Shannon theorem, which states that for a wave to be reproduced properly in PCM digital format, its sampling frequency must be at least twice the highest frequency that is to be reproduced. In this case, if we want to reproduce frequencies up to 22.05 kHz (the auditory range lies between 20 Hz and 20 kHz), our sampling frequency should be 44.1 kHz.

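The Fast Fourier Transform (FFT) is also used, which, as we know, can decompose a complex wave (PCM material) into its sinusoidal frequency components, a fundamental with its harmonics in the case of a periodic wave, starting from its amplitude samples. The discrete cosine transform is also used (in MP3, a modified variant known as the MDCT); it is closely related to the Fourier transform but uses only real numbers.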
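
To make the idea of a real-valued transform concrete, here is a minimal Python sketch of the plain discrete cosine transform (type II), computed directly from its definition. It is not the modified transform used inside MP3 and it is not a fast algorithm; the function name and the test signal are illustrative.

```python
import math

def dct_ii(samples):
    """Unnormalized DCT-II: X[k] = sum of x[n] * cos(pi/N * (n + 0.5) * k)."""
    count = len(samples)
    return [
        sum(x * math.cos(math.pi / count * (n + 0.5) * k)
            for n, x in enumerate(samples))
        for k in range(count)
    ]

# A signal made of a single cosine component (index 2 of an 8-point DCT).
signal = [math.cos(math.pi / 8 * (n + 0.5) * 2) for n in range(8)]
print([round(value, 3) for value in dct_ii(signal)])
```

All the energy of this test signal ends up in a single coefficient, which is what makes such transforms useful for coding: a signal with few frequency components needs few non-zero coefficients.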

TO WHAT EXTENT IT IS RECOMMENDED

These formats will continue to emerge and to be perfected, but it should be understood that, however widespread they become, some details of the original sound may be lost and will no longer be heard. In other words, this format should not be used for serious audio work.

Some improvement can be obtained by using encoders that work at higher bit rates, such as 224, 256 and 320 kbps. You can also consider VBR (Variable Bit Rate) encoding, in which musical passages of greater dynamic complexity are given a higher bit rate, and therefore more storage, than the simplest ones. However, this brings other complications, because not all players can handle it.
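
The effect of the bit rate on file size, and the way VBR averages out, can be estimated with a short Python sketch. The segment durations and rates in the VBR example are invented for illustration, and container overhead such as tags is ignored.

```python
def encoded_size_bytes(segments):
    """Size of an encoded stream given (duration_s, bitrate_kbps) segments."""
    return sum(duration * kbps * 1000 / 8 for duration, kbps in segments)

def average_bitrate_kbps(segments):
    """Average bit rate over all segments, in kbps."""
    total_seconds = sum(duration for duration, _ in segments)
    return encoded_size_bytes(segments) * 8 / 1000 / total_seconds

constant = [(180, 320)]              # three minutes at a constant 320 kbps
variable = [(120, 128), (60, 320)]   # simple passages low, complex ones high

print(encoded_size_bytes(constant) / 1e6, "MB at constant 320 kbps")
print(encoded_size_bytes(variable) / 1e6, "MB with the variable rate")
print(average_bitrate_kbps(variable), "kbps on average")
```

In this invented example the variable-rate file spends 320 kbps only where it is needed and ends up noticeably smaller than the constant 320 kbps file.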