MP3 – the most popular digital audio format


Initial release 1986

MPEG-1 Audio Layer 3, better known as MP3, is a lossy compressed digital audio format developed by the Moving Picture Experts Group (MPEG) as part of version 1 (and later extended in version 2) of the MPEG video standard. A typical MP3 uses a 44.1 kHz sampling rate and a bitrate of 128 kbps, a common compromise between quality and file size. The name is an abbreviation of MPEG-1 Audio Layer 3, and the term should not be confused with MP3 player, the device that plays such files.

MP3 – History

This format was developed mainly by Karlheinz Brandenburg, director of electronic media technologies at the Fraunhofer IIS institute, which belongs to the Fraunhofer-Gesellschaft (a network of German research centers) and, together with Thomson Multimedia, controls the bulk of MP3-related patents. The first of these patents was registered in 1986 and several more followed in 1991. It was not until July 1995, however, that Brandenburg first used the .mp3 extension for the MP3-related files he kept on his computer. A year later, his institute earned 1.2 million euros in patent revenue. Ten years on, that figure had reached 26.1 million.

The MP3 format became the standard for streaming audio and for high-quality audio compression (with losses that can be noticed on hi-fi equipment), thanks to the ability to adjust the compression quality, which is proportional to the amount of data per second (the bitrate) and therefore to the final file size. The result could be 12 or even 15 times smaller than the original uncompressed file.

It was the first audio compression format to become popular thanks to the Internet, since it made exchanging music files practical. The legal proceedings against companies such as Napster and AudioGalaxy stem from the ease with which this type of file can be shared.

With the development of standalone and portable players, and of players built into stereo systems, the MP3 format reached beyond the world of computing.

In early 2002, other compressed audio formats such as Windows Media Audio and Ogg Vorbis began to be widely included in programs, operating systems and standalone players, which led many to predict that MP3 would gradually fall into disuse in favor of these formats, which offered much better quality. One factor cited in the decline of MP3 is that it is patented. Technically, a patent says nothing about whether its quality is inferior or superior, but it prevents the community from continuing to improve the format and can mean having to pay to use a codec, as happens with MP3 players. Even so, as of late 2009 MP3 remained the most widely used and most successful format.

MP3 – Technical details

Layer 3 differs in several ways from Layers 1 and 2 of the MPEG-1 and MPEG-2 standards, most notably in its so-called hybrid filter bank, which makes its design more complex. The resulting improvement in frequency resolution worsens temporal resolution and introduces pre-echo problems, which are predicted and corrected. It also makes usable audio quality possible at rates as low as 64 kbps.

MP3 filter bank

The filter bank used in this layer is the so-called hybrid polyphase/MDCT filter bank. It maps the signal from the time domain to the frequency domain in the encoder, and the decoder uses the matching reconstruction filters. The filter bank output samples are quantized and provide variable frequency resolution, 6×32 or 18×32 frequency lines, which fits the critical bands at different frequencies much better. With 18 points, the maximum number of frequency components is 32 × 18 = 576, which gives a frequency resolution of 24000/576 = 41.67 Hz (for fs = 48 kHz). With 6 frequency lines per subband, the frequency resolution is lower but the temporal resolution is higher, so this mode is applied where pre-echo effects are expected (abrupt transitions from silence to high energy levels).
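The arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch of the calculation quoted in the text; the function name `frequency_resolution` is made up for illustration, and the 125 Hz figure for the 6-point case is simply what the same formula gives.

```python
# Frequency resolution of the hybrid filter bank for 18-point and 6-point blocks.
SUBBANDS = 32  # number of filter bank subbands, as stated in the text


def frequency_resolution(sample_rate_hz, mdct_points):
    """Return (number of frequency lines, width of each line in Hz)."""
    lines = SUBBANDS * mdct_points
    nyquist_hz = sample_rate_hz / 2  # highest frequency the signal can contain
    return lines, nyquist_hz / lines


for points in (18, 6):
    lines, width_hz = frequency_resolution(48_000, points)
    print(f"{points:2d} points: {lines} lines, {width_hz:.2f} Hz per line")
```

At fs = 48 kHz this gives 576 lines of about 41.67 Hz each for 18 points, and 192 lines of 125 Hz each for 6 points, which is the trade between frequency and temporal resolution described above.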

The psychoacoustic model

Compression is based on reducing irrelevant dynamic range, that is, on the inability of the auditory system to detect quantization errors under masking conditions. The standard divides the signal into frequency bands that approximate the critical bands and then quantizes each subband according to the noise detection threshold within that band. The psychoacoustic model is a modification of the one used in Layer II and uses a method called polynomial prediction. It analyzes the audio signal and calculates how much noise can be introduced at each frequency; in other words, it calculates the “amount of masking”, or masking threshold, as a function of frequency.

The encoder uses this information to decide how best to spend the available bits. The standard provides two psychoacoustic models of different complexity: model I is less complex than model II and greatly simplifies the calculations. Studies show that, under normal conditions, the distortion is imperceptible to an experienced ear in an optimal environment from 256 kbps upward. For an inexperienced or average listener, 128 kbps or even 96 kbps is enough to sound “good” (unless high-quality audio equipment is used, in which case the lack of bass becomes very noticeable and a “frying” sound stands out in the treble). For people who listen to a lot of music or have trained ears, 192 or 256 kbps is enough to sound good. Most of the music that circulates on the Internet is encoded at between 128 and 192 kbps.

Coding and quantization

The solution this standard proposes for distributing bits or noise is an iteration process made up of an internal cycle and an external cycle. It examines both the filter bank output samples and the signal-to-mask ratio (SMR) provided by the psychoacoustic model, and adjusts the bit or noise allocation, depending on the scheme used, so that the bit rate and masking requirements are satisfied simultaneously. The two cycles work as follows:

Internal cycle

The internal cycle performs non-uniform quantization according to the floating point system (each MDCT spectral value is raised to the 3/4 power). The cycle chooses a quantization step size, and Huffman coding is then applied to the quantized data in the next block. If the result needs more bits than allowed, the step size is increased and the process repeats. The cycle ends when the Huffman-coded quantized values use no more bits than the maximum number allowed.
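The internal cycle can be sketched roughly as follows. This is a simplified illustration and not the reference encoder: `estimated_bits` is a hypothetical stand-in for real Huffman bit counting, and the starting step size and growth factor are assumptions chosen for readability.

```python
def quantize(spectrum, step):
    """Power-law quantization: divide by the step, then raise to the 3/4 power."""
    quantized = []
    for value in spectrum:
        magnitude = round((abs(value) / step) ** 0.75)
        quantized.append(magnitude if value >= 0 else -magnitude)
    return quantized


def estimated_bits(quantized):
    """Hypothetical cost model standing in for Huffman bit counting."""
    # One bit per value plus more bits for larger magnitudes.
    return sum(1 + 2 * abs(q).bit_length() for q in quantized)


def inner_loop(spectrum, max_bits, step=1.0, growth=2 ** 0.25):
    """Coarsen the quantization step until the block fits in max_bits."""
    if max_bits < len(spectrum):
        raise ValueError("budget is below the minimum cost of this toy model")
    while True:
        quantized = quantize(spectrum, step)
        bits = estimated_bits(quantized)
        if bits <= max_bits:
            return quantized, step, bits
        step *= growth  # a larger step gives smaller values and fewer bits


# Example MDCT-like values (made up for illustration).
spectrum = [120.0, -45.5, 8.0, 0.7, -0.2, 0.0]
quantized, step, bits = inner_loop(spectrum, max_bits=30)
print(quantized, round(step, 3), bits)
```

The loop always terminates in this toy model because a large enough step quantizes every value to zero, which costs one bit per value.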

External cycle

The external cycle then checks whether any scale factor band has more distortion (noise in the encoded signal) than allowed, comparing each scale factor band with the values calculated earlier in the psychoacoustic analysis. The external cycle ends when one of the following conditions is met:

No scale factor band has more noise than allowed.
The next iteration would amplify one of the bands more than is allowed.
All bands have been amplified at least once.

Bitstream packaging or formatter

This block takes the quantized samples from the filter bank, together with the bit or noise allocation data, and stores the encoded audio and some additional data in frames. Each frame carries the information for 1152 audio samples and consists of a header, the audio data, a CRC error check and auxiliary data (the last two are optional). The header describes which layer, bit rate and sample rate are used for the encoded audio. Every frame starts with the same synchronization and identification header, and frame lengths may vary. This block also performs variable-length Huffman coding, an entropy coding method that removes redundancy without losing information. It acts at the end of compression to encode the information. Variable-length methods generally assign short code words to the most frequent events and leave long code words for the least frequent ones.
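The principle behind variable-length coding can be illustrated with a small Huffman sketch. Note the assumption: MP3 encoders select from predefined code tables in the standard, whereas this example builds a code from the observed symbol counts purely to show how frequent values end up with short code words. The function name and sample data are made up.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total count, tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count_a, _, depths_a = heapq.heappop(heap)
        count_b, _, depths_b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**depths_a, **depths_b}.items()}
        heapq.heappush(heap, (count_a + count_b, tie, merged))
        tie += 1
    return heap[0][2]


# Quantized values are mostly small, so small values should get short codes.
values = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [4]
lengths = huffman_code_lengths(values)
total_bits = sum(lengths[v] for v in values)
print(lengths, total_bits)
```

For this sample the most frequent value gets a 1-bit code and the rarest values get 4-bit codes, for 30 bits in total, compared with 48 bits if each of the 16 values used a fixed 3-bit code.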

Structure of an MP3 file

An MP3 file is made up of MP3 frames, each of which consists of an MP3 header and MP3 data. This sequence of frames is called an “elementary stream”. The frames are independent of one another, so a person can cut frames out of an MP3 file and still play them on any MP3 player on the market. The header begins with a sync word that marks the start of a valid frame. It is followed by a series of bits that indicate that the file is a standard MPEG file and whether or not it uses Layer 3.
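As a rough illustration of how a player might read these header bits, here is a minimal sketch. The text does not spell out the field positions or lookup tables, so they are assumptions based on the commonly documented 4-byte MPEG-1 audio header and should be checked against the standard; the function name `parse_header` is made up, and only MPEG-1 Layer 3 is handled.

```python
# Lookup tables for MPEG-1 Layer 3 (assumed from common documentation).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_header(header_bytes):
    """Decode the first 4 bytes of a frame, assuming MPEG-1 Layer 3."""
    if len(header_bytes) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(header_bytes[:4], "big")
    if word >> 21 != 0x7FF:  # sync word: the first 11 bits are all ones
        raise ValueError("sync word not found")
    version_bits = (word >> 19) & 0b11  # 0b11 means MPEG-1
    layer_bits = (word >> 17) & 0b11    # 0b01 means Layer 3
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("not an MPEG-1 Layer 3 frame")
    bitrate = BITRATES_KBPS[(word >> 12) & 0b1111]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("unsupported bitrate or sample rate index")
    padding = (word >> 9) & 1
    return {
        "has_crc": ((word >> 16) & 1) == 0,  # protection bit 0 means CRC present
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        # Frame length in bytes for 1152 samples per frame.
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }


header = parse_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
print(header)
```

With the example bytes, the sketch reports a 128 kbps, 44.1 kHz frame without CRC that is 417 bytes long. Because every frame repeats this header, a tool can find frame boundaries by scanning for the sync word.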


Audio Formats: Everything musicians should know to choose the right file

What is the best audio format? It is a very common question. You have surely asked it yourself.

The answer is simple: it all depends on your needs. Whether you’re sending demos, setting up your digital music distribution, or archiving your songs, the file format matters a lot.

So, to help you choose the best file format for your music, we have collected all the essential information about audio formats.

And, even more importantly, which one works best in each situation.

Compression: the first impression

Audio formats depend on compression.

I don’t mean the compression you apply to a song in your DAW software. I am talking about file compression.

Compression makes a file smaller, to save space when streaming, downloading or storing.

But what happens when you compress?

There are 3 types of file compression:

Uncompressed (I know that “uncompressed” is not a type of compression, but I include it to make everything clearer), lossless and lossy.

Uncompressed and lossless files retain the original data intact. Lossy files, however, delete certain data from the original file to reduce the file size.

So the more a lossy file is compressed, the more information is lost.

Lossy compression does not mean that all your drums are going to be deleted. It simply removes audio that the human ear cannot hear. Maybe only dogs notice the difference …?

In any case, if you really want to hear what disappears when you compress a file, watch this MP3 conversion experiment.

About compression types
Here is a simple way to understand each type of compression:

An uncompressed file is an exact copy of the original. No information is lost. Uncompressed files are like an original photo.
Lossless files are slightly smaller, but they keep the original information intact. A lossless file is like an original photo that stays folded in half until you look at it.
Lossy files are the most compressed. Some of the original information is lost during compression. Lossy files are like smaller versions of the original: the photo is still there, but some details have disappeared.


Now that you know what compression is, you may be wondering how each type of file is compressed.

Do not worry. Here we go.

How each type of file is compressed

Uncompressed Formats

Uncompressed formats are not compressed (obviously). The most common uncompressed formats are WAV and AIFF.

These are the formats that you usually export from your DAW. If you export a song to WAV, you get an exact, uncompressed copy of the original.

Lossless Formats
Lossless files are compressed. But even though they are compressed, they retain all the original information, just like a WAV. They simply unfold when they are opened.

The most common lossless format is FLAC. Apple also has its own lossless format, called ALAC, used in iTunes.

The FLAC format makes files lighter than WAVs while retaining all the original information. Even so, these files are usually still very large.

Lossy Formats

Lossy files are the most common audio format. The most widely used is MP3, but there are other types, such as OGG, WMA and AAC.

The drawback of lossy compression is that it deletes some data from the original file.

But the benefit is that the files are smaller, open faster and take up less space.

Lossy files can be high or low quality, depending on the amount of compression. The higher the quality, the less information is lost.

The truth about bitrates

The quality of an audio file is determined by its bitrate (bit rate).

The bitrate is the amount of data processed per second. That is what the 320 or 192 in MP3 files refers to.

Thus, an MP3 with a bitrate of 320 carries 320 kilobits per second (kbps).

WAV and AIFF files usually run at 1411 kbps.
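To see where these numbers come from, here is a small sketch of the arithmetic. It assumes CD-quality audio (44.1 kHz, 16 bits, stereo) for the uncompressed case and a 4-minute song; both are illustrative assumptions, and the function names are made up.

```python
def pcm_bitrate_kbps(sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Bitrate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def size_megabytes(bitrate_kbps, seconds):
    """Approximate audio data size in megabytes (1 MB = 1,000,000 bytes)."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


wav_kbps = pcm_bitrate_kbps()
song_seconds = 4 * 60  # a 4-minute song

for name, kbps in (("WAV", wav_kbps), ("MP3 320", 320), ("MP3 192", 192)):
    print(f"{name:8s} {kbps:7.1f} kbps  {size_megabytes(kbps, song_seconds):6.2f} MB")
```

Under these assumptions the WAV comes out at 1411.2 kbps and roughly 42 MB, while the same song takes about 9.6 MB at 320 kbps and about 5.8 MB at 192 kbps.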

A higher bitrate means more information per second. And more information per second means better sound. Simple, right? Now you understand the basic points of compression, file types and bitrate, right?

Perfect. Let’s continue.

Now comes the million-dollar question…

In what situation do I use each format?
If I covered every audio format, we would be here for days. You surely have other responsibilities, and a lot of music to produce. So I will be brief. These are the best uses for each of these formats: WAV, MP3 320 and MP3 192.

WAV
WAV is at the top of the podium. It is the Ferrari of audio formats. WAV offers a cleaner and sharper sound than compressed formats. If you share demos with a record label, present your work for a possible audiovisual project or send your music to a blog, you need a mastered WAV.

The WAV is a guarantee that your best sound represents you.

When you master your music, always use the WAV as the delivery format.

WAVs can also be converted to other formats later, which makes WAV the right starting point for any conversion.

The only drawback of WAVs is the large size of the files. They take up a lot of hard disk space. So your computer, your phone, your iPod or your Dropbox will fill up very quickly if you only use WAV.

But when it comes to your own music, it is important to always have a WAV copy of each of your tracks.

Most platforms require WAV to upload your music for distribution. For example: iTunes and Amazon ask for high-quality WAV to upload music to their services.

The 320 MP3
The 320 kbps MP3 is the most common type of file, for one simple reason: it offers the best of both worlds.

These files are compressed, so their size is easy to manage. But they also offer a pleasant, rich sound.

If you stream music, it is very likely at 320. For example, everything you hear in high quality on Spotify is at 320 kbps.

The 320 kbps MP3 is a good way to share your best sound while saving hard drive space and avoiding long waits during upload and download.

MP3 192
The 192 kbps MP3 is the workhorse. These are quick and dirty MP3s, for when you have to share something easily and quickly. They are useful when you need to transfer a handful of files at once, check your entire catalog or share reference tracks quickly.

A lower bitrate causes more degradation than a 320 kbps MP3, but sometimes it is hard to hear the difference. Take the test and judge for yourself.

192 kbps MP3s are the perfect tool for musicians who need a fast, efficient way to share or stream their music.

Useful tip: if you use your own streaming player on your web page, a 192 kbps MP3 will make your page load faster.

Don’t forget about format along the way
Each format has its uses. Choosing the right format depends on each context.

So think about what sound you share and where you do it. Are you using the right format?

Mastering to WAV is the best bet for sharing your music. Once you have the mastered WAV, you can convert it to any other format in a jiffy.

Formats are important in the era of streaming. So make a smart choice and use the right format.