Mp3: Frequency band allocation in MP3 encoding

Frequency Band Allocation in MP3 Encoding

Let’s talk about frequency band allocation in MP3 encoding

When I first learned about frequency band allocation in MP3 encoding, it reminded me of organizing items in a suitcase. The suitcase is the MP3 file, and the items are the audio frequencies. Each item—or frequency—needs just the right space to ensure everything fits while keeping what’s essential. This is the magic behind MP3 encoding. It breaks audio into smaller chunks or frequency bands, prioritizing what the human ear can hear best and discarding the rest. This ensures the file size stays manageable while preserving quality.

The MP3 format utilizes psychoacoustic models to understand which frequencies are most important. High-priority bands hold rich, detailed sounds, while less critical bands—those our ears are less sensitive to—might be reduced or eliminated. It’s like deciding to pack a sweater over a scarf when you’re short on space. This concept fundamentally transforms how we store and share music.

Understanding frequency bands in audio compression

Frequency bands in audio compression are like compartments in a toolbox. Each one serves a specific purpose, organizing the sound spectrum into manageable chunks. Low frequencies, like bass, occupy one area, while mid and high frequencies, like vocals and cymbals, take other sections.

This segmentation allows MP3 encoders to apply different levels of compression to each band. For instance, low frequencies need more data for clarity because they carry much of the song’s energy. High frequencies, on the other hand, are often less noticeable to our ears and can handle more compression. The brilliance lies in tailoring the process for each band, maintaining a balance between quality and file size.

The psychoacoustic principle and its role

The psychoacoustic principle is the science behind why MP3s sound good despite compression. When I explain it, I think about sunglasses. Sunglasses filter out harsh light while letting in the parts that help you see clearly. Similarly, MP3 encoding filters out inaudible sounds while preserving those we notice most.

This principle is based on auditory masking, where louder sounds mask softer ones in similar frequencies. For example, a drumbeat can overpower a faint whisper in a recording. MP3 encoding uses this natural phenomenon to reduce file size by discarding sounds you wouldn’t hear anyway. It’s an elegant way of mimicking how our ears work.

How MP3 divides and processes frequency bands

MP3 encoding first divides audio into 32 equal-width sub-bands using a filter bank, much like slicing a pizza into smaller pieces. Each slice, or sub-band, represents a portion of the audio spectrum. The encoder assigns bits to these slices based on their importance and complexity.

Perceptually important bands, such as those carrying vocals or melody, receive more bits to preserve quality. Meanwhile, less significant bands, like those holding subtle background noise, are given fewer bits. This division allows MP3s to shrink file sizes dramatically without losing the essence of the audio.
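
To make the 32-slice picture concrete, here is a minimal sketch that lists the frequency range each sub-band would cover. It assumes the sub-bands are equal in width and span 0 Hz up to half the sample rate; the function name `subband_edges` is made up for this illustration.

```python
def subband_edges(sample_rate_hz, num_subbands=32):
    """Return (low_hz, high_hz) for each equal-width sub-band up to Nyquist."""
    nyquist_hz = sample_rate_hz / 2
    width_hz = nyquist_hz / num_subbands
    return [(i * width_hz, (i + 1) * width_hz) for i in range(num_subbands)]


edges = subband_edges(44100)
print("number of sub-bands:", len(edges))
print("first sub-band (Hz):", edges[0])
print("last sub-band (Hz):", edges[-1])
```

At a 44.1 kHz sample rate each slice comes out roughly 689 Hz wide, so bass and low mids share the first slice or two while the upper slices cover frequencies near the edge of hearing.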

The importance of bit allocation per band

Bit allocation per band in MP3 encoding is like budgeting money. You spend more on essentials, like rent, and less on luxuries, like a fancy coffee. In MP3s, bits are currency, and they’re distributed across frequency bands based on priority.

When a band carries complex or prominent sounds, like a lead guitar riff, the encoder assigns more bits to capture its detail. Simpler or quieter bands get fewer bits, preserving overall quality while minimizing file size. This selective allocation ensures an efficient use of storage space.
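
The budgeting idea can be sketched as a small greedy loop. This is a simplified illustration, not the procedure a real MP3 encoder runs: it assumes each band has a signal-to-mask ratio in dB (how far the sound sits above what the ear would ignore) and that every extra bit lowers quantization noise by about 6 dB. The name `allocate_bits` and the example numbers are hypothetical.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.0):
    """Greedy sketch: hand out bits one at a time to the band whose
    quantization noise is currently the most audible."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Audible noise left in each band after the bits it already has.
        remaining_db = [s - db_per_bit * b for s, b in zip(smr_db, bits)]
        worst = max(range(len(remaining_db)), key=remaining_db.__getitem__)
        if remaining_db[worst] <= 0:
            break  # every band's noise is already at or below its mask
        bits[worst] += 1
    return bits


# Four bands: a prominent lead, a quieter part, a faint part, a masked part.
allocation = allocate_bits([24.0, 12.0, 3.0, -5.0], total_bits=8)
print(allocation)
```

The masked band receives nothing, and the loop stops early once no band has audible noise left, which is the "spend on essentials" rule in code form.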

Challenges with frequency band allocation

Frequency band allocation isn’t without its hurdles. One challenge is balancing compression and quality. Over-compression can make audio sound “tinny” or lose its depth. I’ve heard poorly encoded files where vocals sounded muffled, ruining the listening experience.

Another issue is compatibility. Not all playback devices handle every MP3 equally well. Older hardware might struggle with some encoder settings, such as variable bitrate files. This makes finding the right encoding balance vital for universal usability.

Advanced techniques to improve frequency band allocation

Advancements in MP3 encoding have introduced smarter ways to handle frequency bands. Dynamic bit allocation, for example, adjusts bit distribution in real-time based on audio complexity. It’s like turning up the AC in a car when driving through a hot desert—adaptive and efficient.

Another technique is joint stereo, which optimizes how stereo channels share data. Instead of encoding each channel separately, joint stereo focuses on shared information, saving bits without sacrificing quality. These innovations keep MP3s relevant even as audio technology evolves.

Frequency band allocation in modern MP3 encoding

Modern MP3 encoders refine frequency band allocation mainly through better-tuned psychoacoustic models rather than through any change to the format itself. These models analyze the audio content more accurately, predicting how listeners will perceive changes. I’ve noticed newer MP3s sounding much richer despite smaller file sizes, thanks to these advancements.

Additionally, encoders now focus more on preserving spatial cues. For example, they ensure that a listener can still distinguish instruments in a symphony, maintaining an immersive experience. This shift toward perceptual accuracy shows how far MP3 technology has come.

Final words on frequency band allocation in MP3 encoding

Frequency band allocation in MP3 encoding is an intricate dance of science and art. By prioritizing the most critical sounds and optimizing bit distribution, MP3s achieve a balance between quality and file size. This process, rooted in psychoacoustics, has made MP3s a cornerstone of digital audio.

If you’re looking for a way to enhance your MP3 files, Mp4Gain offers tools to improve their sound quality. It’s an excellent choice for users who want more control over their audio files.

FAQ About frequency band allocation

What is frequency band allocation?

Frequency band allocation is the process of dividing an audio signal into distinct frequency ranges, optimizing how they’re encoded to preserve quality.

Why is frequency band allocation important in MP3 encoding?

It helps reduce file size by prioritizing important sounds and discarding inaudible ones, maintaining a balance between quality and compression.

How do psychoacoustics influence MP3 encoding?

Psychoacoustics is the study of how humans perceive sound. It guides MP3 encoding to spend bits on audible frequencies and to discard sounds that are masked by louder ones.

What are critical bands in MP3 encoding?

Critical bands are frequency ranges that our ears process similarly, helping encoders decide where to allocate bits most efficiently.

How does dynamic bit allocation work?

Dynamic bit allocation adjusts the number of bits assigned to frequency bands in real-time, depending on audio complexity.

What is joint stereo in MP3 encoding?

Joint stereo encodes shared audio data between channels, reducing file size while preserving stereo effects.

Can MP3 encoding handle spatial audio?

Modern MP3 encoders incorporate techniques to preserve spatial cues, ensuring an immersive listening experience.

How do modern MP3 encoders differ?

They use better-tuned psychoacoustic models for frequency band allocation, improving quality without increasing file size.

What are the challenges of frequency band allocation?

Challenges include balancing compression and quality, ensuring compatibility with devices, and preserving auditory depth.

How does frequency band allocation improve MP3s?

It ensures the most important sounds are preserved, creating high-quality files that are compact and efficient.

Comments:

This was super helpful! I always wondered how MP3s manage to keep their quality while being so small.

Wow, learned so much. Could you go deeper into the role of AI in MP3 encoding? That part fascinated me!

I don’t know about anyone else, but my old MP3 files sound nothing like this description. Is there a way to fix them?

This makes it so much easier to understand. The comparison to packing a suitcase nailed it. Thanks a ton!

Great article. I still feel like some points about joint stereo could be clearer. Maybe add an example?

This article really explained things in a simple way. It’s exactly what I needed for my music project.

Joint Stereo Encoding in MP3

Let’s talk about Joint Stereo Encoding in MP3

When we talk about MP3 encoding, joint stereo is one of the most fascinating and efficient techniques used to compress audio files. As someone who’s been working with audio compression for years, I can confidently say that joint stereo plays a pivotal role in optimizing sound quality while reducing file size. This is crucial, especially when you’re dealing with a large collection of music or audio files on your device. For example, think about the way your smartphone stores your favorite playlists. Without joint stereo encoding, those files would take up more space without offering any noticeable improvement in quality.

In essence, joint stereo is a method where the stereo channels (left and right) in a song are not treated as entirely separate entities but are encoded together, so that what the two channels share is stored once alongside the differences between them. This is like packing the same amount of information into a smaller suitcase without losing any of the essential items. Joint stereo encoding does this by reducing redundancy between the left and right channels, resulting in smaller files with nearly identical sound quality.

It’s important to note that joint stereo encoding is not the same as regular stereo. While regular stereo encoding treats each channel independently, joint stereo takes advantage of the similarities between the two channels to save space. The result is a more efficient encoding process that doesn’t compromise the listener’s experience.

The Mechanics of Joint Stereo Encoding

When we dive deeper into how joint stereo encoding works, it helps to visualize how stereo sound is created. Typically, stereo sound involves two channels: one for the left ear and one for the right ear. However, in many audio tracks, the left and right channels are not radically different from each other. They may have similar instruments, vocals, or background sounds.

What joint stereo encoding does is compare these two channels and only store the parts that differ between them. For the common parts, the encoder only needs to store the data once. This is similar to how two almost identical pictures could be compressed by saving just one of them and recording only the differences for the second one. The result? A significant reduction in file size without a noticeable drop in audio quality.

The Process of Joint Stereo Encoding

The encoder analyzes both channels to find similarities and differences.
Similar parts of the channels are encoded as a single signal.
The differences between the channels are encoded separately, reducing the file size.
When decoding, the differences are applied to the common signal, restoring the stereo effect.
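
The steps above can be sketched with a sum-and-difference transform. This is a minimal illustration that assumes the shared signal is the average of the two channels and the difference signal is half their difference; real encoders may scale these differently, and the function names are made up for the example.

```python
def ms_encode(left, right):
    """Split two channels into a shared (mid) and a difference (side) signal."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Apply the differences to the shared signal to restore left and right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75]
right = [0.5, 0.125, -0.5]
mid, side = ms_encode(left, right)
print("mid :", mid)
print("side:", side)
print("decoded:", ms_decode(mid, side))
```

Where the two channels are identical the side value is zero, and where they are close it is small. Small values need fewer bits, which is where the saving comes from.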

By compressing the audio this way, joint stereo encoding ensures that the stereo effect is preserved while minimizing the data needed for storage. This is a significant advantage when you’re trying to fit hundreds or even thousands of songs on a portable device with limited storage capacity.

Types of Joint Stereo Encoding: Mid/Side and Intensity Stereo

There are different types of joint stereo encoding methods that are used depending on the audio track and desired compression level. The two primary types you’ll encounter are Mid/Side (M/S) stereo and Intensity stereo. Both methods offer unique advantages, and understanding these differences is key to choosing the right encoding approach.

Mid/Side Stereo

In Mid/Side stereo encoding, the audio is split into two components: the “mid” (center) and the “side” (difference between left and right).
The “mid” signal contains information that is common between the left and right channels, while the “side” signal holds the differences.
This technique is effective for music that has a strong center sound, like vocals or bass, while allowing the side information to be compressed efficiently.

In my experience, Mid/Side stereo is particularly useful for music with a lot of central elements, like pop or rock tracks where vocals are mixed at the center. By compressing the side channels, the file size shrinks while maintaining clarity in the center of the mix.

Intensity Stereo

Intensity stereo encoding merges the left and right channels into a single signal in the higher frequency bands, where the ear is less sensitive to exact phase differences between channels.
Alongside that combined signal, it stores how loud each band should be on the left versus the right, so the decoder can pan the sound back into position.
This method can save more space than Mid/Side stereo, but it discards part of the stereo image, so it is mainly used at low bitrates.

For instance, at a low bitrate the shimmer of a cymbal might be stored once and simply panned toward the left or right, while the bass and vocals below it keep their full stereo detail. On tracks whose stereo image depends on subtle differences between the channels, such as some classical or ambient recordings, this can make the stereo spread sound narrower.
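
As a rough sketch only: intensity stereo is commonly described as keeping one combined signal for a high-frequency band plus a left/right balance value. The code below assumes that description and works on a single band. The real MP3 bitstream quantizes the balance into a small set of positions per band, which this sketch does not do, and the function names are hypothetical.

```python
import math


def intensity_encode(left_band, right_band):
    """Keep one summed signal for the band plus the left channel's share of it."""
    mono = [l + r for l, r in zip(left_band, right_band)]
    left_level = math.sqrt(sum(l * l for l in left_band))
    right_level = math.sqrt(sum(r * r for r in right_band))
    total = left_level + right_level
    left_share = left_level / total if total else 0.5
    return mono, left_share


def intensity_decode(mono, left_share):
    """Pan the summed signal back into two channels using the stored balance."""
    left = [m * left_share for m in mono]
    right = [m * (1.0 - left_share) for m in mono]
    return left, right


# The right channel is a quieter copy of the left, so panning recovers both.
left_band = [2.0, -4.0, 6.0]
right_band = [1.0, -2.0, 3.0]
mono, left_share = intensity_encode(left_band, right_band)
decoded_left, decoded_right = intensity_decode(mono, left_share)
print("left share:", left_share)
print("decoded left :", decoded_left)
print("decoded right:", decoded_right)
```

In this example the channels differ only in level, so the result matches the input. When the channels differ in timing or phase, the decoder cannot recover that difference, which is the stereo detail this mode gives up.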

The Advantages of Joint Stereo Encoding

When it comes to audio compression, joint stereo encoding provides several key benefits. I’ve seen firsthand how it allows for more efficient storage without sacrificing the quality that listeners expect from high-quality MP3 files.

Efficient Use of Storage

Joint stereo encoding reduces file size significantly by exploiting redundancies between the two channels.
This is especially beneficial for users with limited storage space, such as on smartphones or portable music players.
Even when file size is reduced, the audio quality remains almost identical to that of traditional stereo encoding.

For example, when I compress a collection of high-quality MP3s for a long road trip, I rely heavily on joint stereo encoding to maximize my storage space. With joint stereo, I’m able to fit hundreds of tracks on my device without having to worry about sound quality degradation.

Sound Quality Preservation

Joint stereo encoding preserves the overall sound quality by focusing on the differences between the stereo channels.
In contrast to mono encoding, joint stereo ensures that listeners still experience a rich, dynamic soundstage.
Most importantly, the compression doesn’t affect the stereo effect that’s essential to enjoying a full, immersive listening experience.

As someone who frequently listens to music on headphones, the stereo effect is crucial to me. I find that even with joint stereo encoding, the balance between left and right channels remains intact, providing an enjoyable experience. It’s remarkable how the technology allows for compression without affecting the auditory experience.

Considerations for Using Joint Stereo Encoding

While joint stereo encoding offers clear benefits, it’s not always the best option for every type of audio. In some situations, particularly with high-fidelity audio or tracks that require precise stereo separation, other encoding methods might be preferable.

High-Fidelity Audio

For audiophiles or those with high-end audio equipment, joint stereo encoding may not always be sufficient.
The reduced separation between left and right channels can result in a less distinct stereo image.
In such cases, lossless encoding or regular stereo encoding might be more suitable to maintain optimal sound quality.

For example, when I listen to classical music or jazz with a wide stereo image, I often opt for uncompressed or higher bit-rate stereo encoding to preserve the detailed spatial arrangement of instruments. Joint stereo, while efficient, may compromise some of the subtle nuances in these genres.

Low-Bitrate Audio

At lower bitrates, joint stereo encoding can still provide excellent results in terms of file size reduction without a major loss in quality.
However, the compression artifacts may become more noticeable at bitrates lower than 128 kbps.
In these situations, a higher bitrate or alternative encoding techniques may be needed to preserve audio fidelity.

If you’re encoding audio for streaming or casual listening, lower bitrates with joint stereo encoding might be a good balance. But when I’m encoding for professional use or high-quality playback, I prefer to use higher bitrates to ensure that the audio remains as close to the original as possible.

Final Words on Joint Stereo Encoding in MP3

Joint stereo encoding has transformed the way we experience and store audio, offering a balance between quality and compression. Whether you’re a casual listener, a music enthusiast, or a professional audio engineer, understanding the benefits and limitations of joint stereo encoding is crucial for making informed decisions about how you encode and manage your audio files.

With its ability to optimize space and preserve sound quality, joint stereo encoding is one of the most valuable tools in audio compression. As I’ve demonstrated in this article, it’s an essential technique for anyone looking to maximize storage and maintain an excellent listening experience, especially for music that doesn’t rely heavily on complex stereo separation.

While it’s not a one-size-fits-all solution, joint stereo encoding offers significant advantages in most scenarios, particularly for everyday music listening. However, for those with more specialized needs, other encoding methods may be worth exploring. In all cases, it’s important to consider your specific requirements and select the encoding technique that best meets them.

When it comes to MP3 encoding, joint stereo is one of the most effective ways to achieve high-quality audio at a smaller file size, and it remains a staple of audio compression today.

Frequently Asked Questions about Joint Stereo Encoding in MP3

What is Joint Stereo Encoding in MP3?

Joint stereo encoding in MP3 is a compression technique that reduces file size while preserving sound quality. It works by encoding the similarities between the left and right audio channels as a single signal, while only storing the differences separately. This method allows for more efficient use of space without sacrificing the stereo effect, making it ideal for music and audio tracks with similar left and right channels.

How does Joint Stereo Encoding work?

Joint stereo encoding works by analyzing both the left and right channels of audio to identify the parts that are similar. The encoder then stores the common information only once, and the differences between the two channels are encoded separately. When decoding, the differences are applied to the common signal, restoring the full stereo effect for the listener.

What are the different types of Joint Stereo Encoding?

There are two main types of joint stereo encoding: Mid/Side stereo and Intensity stereo. In Mid/Side encoding, the audio is split into a central “mid” signal and a “side” signal that carries the differences between the left and right channels. Intensity stereo merges the two channels into one signal in the higher frequency bands and stores only the left/right level balance for each band, which saves more space at some cost to the stereo image.

What are the advantages of using Joint Stereo Encoding?

Joint stereo encoding offers several benefits, including reduced file sizes while maintaining high audio quality. It is especially useful for portable devices with limited storage, as it maximizes space without sacrificing the stereo effect. Joint stereo ensures that audio files retain their immersive listening experience, even at lower bitrates.

Can Joint Stereo Encoding affect audio quality?

At most bitrates, joint stereo encoding does not significantly affect audio quality. However, at lower bitrates, compression artifacts may become noticeable, especially in tracks with complex stereo separation. For high-fidelity audio or genres requiring precise stereo positioning, lossless encoding or standard stereo encoding might be a better option.

Is Joint Stereo Encoding suitable for all types of music?

Joint stereo encoding is highly effective for most types of music, especially tracks where the left and right channels share significant similarities, such as pop, rock, and electronic music. However, for genres like classical or ambient music, where a wide stereo image is essential, other encoding methods or higher bitrates might be preferable to preserve the full stereo effect.

What is the best bitrate for Joint Stereo Encoding?

For most listeners, a bitrate of 128 kbps to 192 kbps is sufficient when using joint stereo encoding. At these bitrates, the file sizes are reduced significantly, while the sound quality remains good. For higher-quality audio, especially in genres where detailed stereo separation is important, higher bitrates such as 256 kbps or 320 kbps are recommended.
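
To see what these bitrates mean for storage, here is a minimal estimate of file size from bitrate and duration. It assumes a constant bitrate and ignores tags and other metadata; the function name is made up for the example.

```python
def mp3_size_megabytes(bitrate_kbps, duration_seconds):
    """Approximate size of a constant-bitrate file, in millions of bytes."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000


four_minutes = 4 * 60
for rate_kbps in (128, 192, 256, 320):
    size = mp3_size_megabytes(rate_kbps, four_minutes)
    print(rate_kbps, "kbps:", round(size, 2), "MB")
```

A four-minute song grows from a little under 4 MB at 128 kbps to a little under 10 MB at 320 kbps, which is the trade-off to weigh against the quality gain.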

How does Joint Stereo Encoding compare to Mono or Stereo Encoding?

Mono encoding combines the left and right channels into a single channel, drastically reducing file size but at the cost of losing the stereo effect. Regular stereo encoding treats both channels independently, resulting in larger file sizes compared to joint stereo. Joint stereo encoding strikes a balance, maintaining a full stereo experience while reducing file size by exploiting the similarities between the two channels.

Comments:

This article really opened my eyes to how joint stereo encoding works. I’ve been using MP3s for years, but I never really understood the technical side of it. Thanks for explaining everything so clearly! – Mike R.

I had no idea about Mid/Side stereo until I read this! It sounds like a great way to compress audio without losing quality. I might try it next time I’m encoding music. – Sarah J.

It’s amazing how joint stereo can save so much space without compromising sound quality. I’ve always used stereo encoding, but now I’m going to give joint stereo a try. – Tom H.

I’ve always wondered why MP3 files are smaller but still sound good. This article explained it perfectly. – Dave L.

I’ve used joint stereo for a while now, but I didn’t realize how much it can impact sound quality at lower bitrates. This article definitely helped me understand it better. – Emily G.

I’ve been encoding a lot of audio for a podcast, and the tips on joint stereo were super helpful. I’m going to implement this on my next set of files. – John K.

Interesting read! I didn’t know that joint stereo could be problematic for audiophiles. I’m going to keep that in mind when working with high-quality audio. – Chris M.

This is one of the most detailed explanations of joint stereo I’ve read. Very helpful! – Jenna T.

Thanks for the insights! I’ve always been curious about how compression works, and now I understand joint stereo much better. – Mark F.

I never realized that the differences between the left and right channels could be compressed so efficiently. I’ll have to try joint stereo next time I encode something. – Alex B.

I appreciate the real-life examples you used. They made the technical details so much easier to understand. – Rick D.

I’ve been having issues with audio quality at low bitrates. This article really helped explain why that happens and how joint stereo can help. – Steve A.

I was always confused about the difference between stereo and joint stereo. This article cleared things up! – Olivia P.

Great breakdown of the different joint stereo types! I’m definitely going to experiment with Mid/Side encoding next time. – Greg W.

mp3 audio format, the most popular

With the rapid development of file compression technology, MP3 has become the most popular music format today.

MP3 File Format Analysis

MP3 file data is made up of multiple frames, and the frame is the smallest unit of the MP3 file. Each frame consists of a frame header, additional information, and sound data. The playback time of each frame is about 0.026 seconds (at 44.1 kHz), while its length in bytes varies with the bit rate. Some MP3 files have extra bytes at the end that store descriptive, non-audio information. The structure of the MP3 file is shown in Figure 2.

3.1 Frame header format

The frame header is 4 bytes long. For fixed bitrate MP3 files, the frame header format of all frames is the same. The data structure is as follows:

    typedef struct FrameHeader {
        unsigned int sync:11;           // sync information
        unsigned int version:2;         // version
        unsigned int layer:2;           // layer
        unsigned int protection:1;      // CRC check
        unsigned int bitrate:4;         // bit rate
        unsigned int frequency:2;       // sampling frequency
        unsigned int padding:1;         // frame length adjustment
        unsigned int private:1;         // reserved word
        unsigned int mode:2;            // channel mode
        unsigned int mode_extension:2;  // extension mode
        unsigned int copyright:1;       // copyright
        unsigned int original:1;        // original flag
        unsigned int emphasis:2;        // emphasis mode
    } HEADER, *LPHEADER;

See Table 1 for a description of the 4-byte frame header.

Table 1. Use of the MP3 frame header bits

Name | Length (bits) | Description
Synchronization information | 11 | All bits are 1. They span the 1st byte and the start of the 2nd byte, so the 1st byte is always FF.
Version | 2 | 00: MPEG 2.5; 01: undefined; 10: MPEG 2; 11: MPEG 1
Layer | 2 | 00: undefined; 01: Layer 3; 10: Layer 2; 11: Layer 1
CRC check | 1 | 0: check; 1: no check
Bit rate | 4 | Located in the third byte. Bit rate in kbps; for example, for MPEG-1 Layer 3 at 64 kbps the value is 0101.
Frequency | 2 | Sampling frequency; for MPEG-1: 00: 44.1 kHz; 01: 48 kHz; 10: 32 kHz; 11: undefined
Padding | 1 | Used to adjust the frame length. 0: no padding; 1: padding. The calculation method is given below.
Reserved word | 1 | Not used.
Channel mode | 2 | Located in the fourth byte. 00: stereo; 01: joint stereo; 10: dual channel; 11: mono
Extension mode | 2 | Only used when the channel mode is 01.
Copyright | 1 | Whether the file is copyrighted. 0: not copyrighted; 1: copyrighted
Original | 1 | Whether the file is an original. 0: not original; 1: original
Emphasis | 2 | Indicates the de-emphasis to apply on playback to audio that was emphasized before compression; rarely used. 00: none; 01: 50/15 µs; 10: reserved; 11: CCITT J.17

The MP3 frame length depends on the bit rate and the sampling frequency. The calculation formula is:

frame length = 144 × bit rate / frequency + padding

The result is in bytes, and the division is truncated to a whole number. For example, when the bit rate is 64 kbps, the frequency is 44.1 kHz, and padding is 1, the frame length is 209 bytes (144 × 64000 / 44100 ≈ 208.98, truncated to 208, plus 1).

After the frame header there is additional information whose length depends on the format. For standard stereo MPEG-1 Layer 3 files its length is 32 bytes. It is followed by the compressed audio data, which the decoder decodes when it reads to this point.

For Constant Bit Rate (CBR) MP3 files, not all frames are the same length; a frame with the padding bit set is one byte longer. There is also Variable Bitrate (VBR) MP3, which aims to minimize the length of the MP3 file while preserving sound quality. Compared with a CBR file, every frame except the first has the same structure. The first frame of a VBR file does not contain audio data. Its length is 156 bytes, and it stores information such as the standard audio frame header (4 bytes), the VBR file identifier, the number of frames, and the number of bytes in the file. See Table 2 for a description of its structure.

Table 2. Structure of the first frame of a VBR file

Bytes | Description
1-4 | The same standard audio frame header as in a CBR file.
5-40 | Stores the VBR file identifier “Xing” (58 69 6E 67). The specific position of this identifier depends on the MPEG version and the channel mode.
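
The header layout and the frame length formula can be tried out with a short sketch. It assumes an MPEG-1 Layer 3 header, uses the commonly published MPEG-1 Layer 3 bit rate table, and assumes the division in the length formula is truncated to a whole number of bytes. The function names and the example header bytes `FF FB 52 44` are made up for this illustration.

```python
# MPEG-1 Layer 3 bit rates in kbps, indexed by the 4-bit bit rate field.
# Index 0 (free format) and index 15 (invalid) are left as None here.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_header(header_bytes):
    """Unpack the 4-byte frame header into its named fields."""
    if len(header_bytes) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header_bytes, "big")
    if word >> 21 != 0x7FF:
        raise ValueError("the 11 sync bits are not all set")
    return {
        "version": (word >> 19) & 0x3,
        "layer": (word >> 17) & 0x3,
        "protection": (word >> 16) & 0x1,
        "bitrate_kbps": BITRATES_KBPS[(word >> 12) & 0xF],
        "sample_rate_hz": SAMPLE_RATES_HZ[(word >> 10) & 0x3],
        "padding": (word >> 9) & 0x1,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
        "mode_extension": (word >> 4) & 0x3,
        "copyright": (word >> 3) & 0x1,
        "original": (word >> 2) & 0x1,
        "emphasis": word & 0x3,
    }


def frame_length_bytes(bitrate_kbps, sample_rate_hz, padding):
    """144 x bit rate / frequency + padding, with the division truncated."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + padding


header = parse_header(bytes([0xFF, 0xFB, 0x52, 0x44]))
length = frame_length_bytes(header["bitrate_kbps"],
                            header["sample_rate_hz"],
                            header["padding"])
print(header)
print("frame length in bytes:", length)
```

The example header describes a 64 kbps, 44.1 kHz, joint stereo frame with padding set. With truncating division the formula gives 209 bytes for it.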

The encoder transforms the original sound into the frequency domain through a hybrid filter bank. A psychoacoustic model estimates the noise level that is just perceptible; the spectrum is then quantized accordingly and Huffman coded to form the MP3 bitstream. The decoder is much simpler: its task is to reconstruct the sound signal from the encoded spectral line components through inverse quantization and inverse transformation.
2.4 Modified Discrete Cosine Transform

The Modified Discrete Cosine Transform (MDCT) converts a block of time-domain data into frequency-domain data. MDCT is derived from the DCT algorithm. The best-known fast transform is the Fast Fourier Transform (FFT), but the FFT operates on complex numbers, whereas the MDCT uses only real numbers, which is convenient for programming. When compressing audio data, the original audio data is first divided into fixed-size blocks, and a forward MDCT (FMDCT) converts each block into MDCT coefficients, for example 512 coefficients. When decompressing, the inverse MDCT (IMDCT) restores the sound data from the 512 coefficients. The restored data does not match the original exactly, because redundant and irrelevant data are removed during the compression process.

The FMDCT transformation formula is:

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\cos\left[\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, 1, \ldots, N/2 - 1$$

where N is the length of the transformation window, that is, the number of sample points per block, N = 8, 16, ..., 1024, 2048; n_0 = (N/2 + 1)/2; x(n) is the value in the time domain; and X(k) is the value in the frequency domain. If N is 1024 points, the result is 512 frequency-domain values.

The IMDCT transformation formula, in one common normalization, is:

$$y(n) = \frac{2}{N}\sum_{k=0}^{N/2-1} X(k)\,\cos\left[\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right], \qquad n = 0, 1, \ldots, N - 1$$

The output y(n) of a single block still contains time-domain aliasing; adding the overlapping halves of adjacent blocks, each shifted by N/2 samples, cancels it.
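
A direct, unoptimized sketch of the transform pair may help here. It assumes the standard cosine form of the MDCT with n0 = (N/2 + 1)/2, a 2/N scale factor on the inverse, no window function, and blocks that overlap by half their length. Production codecs use windowed, fast versions, so this only shows the principle; the test signal is arbitrary.

```python
import math


def mdct(block):
    """Forward MDCT: N time-domain samples in, N/2 coefficients out."""
    n_len = len(block)
    n0 = (n_len / 2 + 1) / 2
    return [
        sum(block[n] * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
            for n in range(n_len))
        for k in range(n_len // 2)
    ]


def imdct(coeffs):
    """Inverse MDCT: N/2 coefficients in, N time-aliased samples out."""
    n_len = 2 * len(coeffs)
    n0 = (n_len / 2 + 1) / 2
    return [
        (2 / n_len) * sum(
            coeffs[k] * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
            for k in range(len(coeffs)))
        for n in range(n_len)
    ]


# Two blocks of N = 8 samples that overlap by N/2 = 4 samples.
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(12)]
half = 4
first = imdct(mdct(signal[0:8]))
second = imdct(mdct(signal[4:12]))

# Adding the overlapping halves cancels the aliasing in the shared region.
middle = [first[half + i] + second[i] for i in range(half)]
print("original :", [round(v, 6) for v in signal[4:8]])
print("recovered:", [round(v, 6) for v in middle])
```

Each block of 8 samples yields only 4 coefficients, yet the overlapped region comes back intact. The loss in a real codec comes from quantizing the coefficients, not from the transform itself.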

High-quality music quickly spreads to all parts of the world with the arrangement of 0 and 1, shaking people’s hearts. What is MP3? The full name of MP3 is MPEG Audio Layer 3. It is an efficient computer audio coding scheme. It converts audio files into smaller files with .MP3 extension with a higher compression ratio and basically maintains the sound quality of the file. original. MP3 is part of the ISO/MPEG standard. The ISO/MPEG standard describes audio compression using a high-performance perceptual coding scheme. This standard has been continuously updated to meet the pursuit of “high quality and small quantity”, and now has formed MPEG Layer 1, Layer 2. Layer 3 three audio encoding and decoding schemes. The compression rate of MPEG Layer 3 can reach from 1:10 to 1:12. A 1M MP3 file can be played for 1 minute, while a 1 minute CD-quality WAV file (44100Hz, 16bit, 2ch, 60sec) occupies 10M of space, so Calculated, the time The playback time of a 650M MP3 disc should be more than 10 hours, while the playback time of a CD with the same capacity is about 70 minutes. The advantages of MP3 are unmatched by CD. 2 Analysis of the principle of MP3 2.1 MPEG audio standard MPEG (Moving Picture Experts Group) is a moving picture expert group under ISO, and the MPEG standard formulated by it is widely used in various multimedia. MPEG standards include video and audio standards, among which MPEG-1, MPEG-2, MPEG-2 AAC, and MPEG-4 audio standards have been developed. The MPEG-1 and MPEG-2 standards use the same family of audio codecs: Layer 1, 2 and 3. A new feature of MPEG-2 is the use of low sample rate expansion kits to reduce data traffic , and another feature is the multi-channel expansion kit, which increases the number of main channels to five. Fraunhofer IIS and AT&T released the MPEG-2 AAC (MPEG-2 Advanced Audio Coding) standard in 1997 to significantly reduce data traffic. 
MPEG-2 AAC adopts the MDCT (Modified Discrete Cosine Transform) algorithm; its sampling frequency can be between 8 kHz and 96 kHz, and the number of channels between 1 and 48. MPEG Audio Layer 1, 2 and 3 use the same filter bank, bitstream structure and header information, and the sample rate is 32 kHz, 44.1 kHz or 48 kHz. Layer 1 was designed for DCC (Digital Compact Cassette) at a data rate of 384 kbps. Layer 2 is a compromise between complexity and performance, and reduces the data rate to 256–192 kbps. Layer 3 was designed for low data rates from the beginning, at 128–112 kbps. Layer 3 adds an MDCT, which makes its frequency resolution 18 times higher than that of Layer 2. Layer 3 also uses entropy coding, similar to that used in MPEG video, to reduce redundant information. The vast majority of MP3 files use the MPEG-1 standard.

2.2 The purpose of audio compression

The MP3 format has its origins in the mid-1980s, when the Fraunhofer Institute in Erlangen, Germany, set out to develop high-quality, low-data-rate audio coding. Consider an example: you want to store a favourite song of about 4 minutes on disk in CD-quality WAV format. The sample rate is 44.1 kHz (44100 samples per second), in stereo, with 16 bits (2 bytes) per sample, so the song occupies:

44100 × 2 channels × 2 bytes × 60 seconds × 4 minutes = 42,336,000 bytes ≈ 40.4 MB

If you download this song over a 56 kbps connection, the download time is:

(40.4 × 10^6 × 8) / (56 × 10^3 × 60) ≈ 96 minutes

Even a 1 Mbps broadband connection takes more than 5 minutes. Audio compression is therefore essential for reducing both the storage space and the transmission time of audio data.

2.3 MP3 encoding and decoding

MP3 audio compression has two parts: encoding and decoding. Encoding turns the data in a WAV file into a highly compressed bitstream, and decoding takes the bitstream and reconstructs a WAV file from it. MP3 uses a lossy algorithm called perceptual audio coding. The human ear perceives sound from about 20 Hz to 20 kHz, and MP3 removes a large amount of redundant and irrelevant signal.
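The storage and download arithmetic from section 2.2 can be reproduced with a short script. This is a minimal sketch; the function names are illustrative and not from any library. The 96-minute figure in the text comes from treating 40.4 MB as 40.4 × 10^6 bytes, while the exact byte count gives a slightly longer time.

```python
def pcm_size_bytes(sample_rate_hz, channels, bytes_per_sample, seconds):
    """Size of uncompressed PCM audio, ignoring the small WAV header."""
    return sample_rate_hz * channels * bytes_per_sample * seconds


def download_minutes(size_bytes, link_bits_per_second):
    """Transfer time in minutes over an ideal link with no protocol overhead."""
    return size_bytes * 8 / link_bits_per_second / 60


song_bytes = pcm_size_bytes(44100, 2, 2, 4 * 60)
song_mib = song_bytes / 2**20                           # binary megabytes
text_minutes = download_minutes(40.4e6, 56_000)         # the text's rounding
exact_minutes = download_minutes(song_bytes, 56_000)    # from the exact byte count
broadband_minutes = download_minutes(40.4e6, 1_000_000)

print(song_bytes, round(song_mib, 1))
print(round(text_minutes), round(exact_minutes, 1), round(broadband_minutes, 1))
```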

MP3 encoder

1. MP3 Encoder FAQ

– What is an MP3 encoder?
An MP3 encoder is a piece of software that uses the MP3 codec (compression/decompression) algorithm to create MP3 files. Most encoders only convert
a WAV file to an MP3 file, although many can convert other formats such as WMA, Real Audio, Ogg, etc.

There are only a few standalone encoders, and much software relies on just four main encoding engines, largely because of
Fraunhofer-Gesellschaft patents and the various companies that contributed to the ISO sources. Although no single company owns the license,
developers must pay expensive license fees no matter which proprietary MP3 encoder they use. The major MP3 encoding engines are LAME
(non-ISO source), BladeEnc, Fraunhofer, and RealNetworks' Xing encoder.

– How does the MP3 encoder work?
The MP3 encoder contains the core technology of MPEG Layer 3. The encoding process uses a series of algorithms and rules to compress audio.
The encoder also detects sounds that occur at the same time
and tries to discard any that are "masked", that is, made inaudible by other sounds.

– What is a good MP3 encoder?
Xing is the fastest encoder, but the worst in quality. For smaller file sizes, Fraunhofer FastEnc
offers the best quality. LAME is a very good encoder, and each version is faster than the previous one. BladeEnc
offers the best quality for large files, but it is very slow.

2. Dissection of MP3 files
Beyond knowing how to use the basic features of an MP3 encoder, ordinary users do not need to know how an MP3 file is structured internally, just as they
do not with JPEG or DOC files. For the morbidly curious, here is an X-ray view of an MP3 file:

– Frame header
As mentioned above, MP3 files are made up of thousands of frames, each containing a fraction of a second of audio data
for the decoder to reconstruct. At the start of each frame is the frame header, which consists of 32 bits of metadata describing the
data that follows. The header begins with an 11-bit sync block, which lets the player find and lock onto the first
valid frame available. This is useful in MP3 streaming, and it lets the player skip over ID3 blocks or jump to another position in the source and resume at a valid frame.
However, detecting the sync block alone is in theory not enough, so the rest of the header must be checked as well.
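The sync check and header split described above might look like the following sketch. The text gives only the 32-bit size and the 11-bit sync block; the remaining field widths follow the usual MPEG audio header layout, and the names `parse_frame_header` and `looks_valid` are hypothetical. The sample bytes FF FB 90 64 are a commonly seen header value used here purely for illustration.

```python
def parse_frame_header(header_bytes):
    """Split a 4-byte MPEG audio frame header into its bit fields.

    Returns None when the 11-bit sync block is missing.
    """
    if len(header_bytes) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header_bytes, "big")
    if word >> 21 != 0x7FF:  # the top 11 bits must all be 1
        return None
    return {
        "version_id": (word >> 19) & 0b11,
        "layer_id": (word >> 17) & 0b11,
        "protection_bit": (word >> 16) & 0b1,
        "bitrate_index": (word >> 12) & 0b1111,
        "sample_rate_index": (word >> 10) & 0b11,
        "padding": (word >> 9) & 0b1,
        "private": (word >> 8) & 0b1,
        "channel_mode": (word >> 6) & 0b11,
        "mode_extension": (word >> 4) & 0b11,
        "copyright": (word >> 3) & 0b1,
        "original": (word >> 2) & 0b1,
        "emphasis": word & 0b11,
    }


def looks_valid(fields):
    """Reject reserved values, since a sync pattern can occur by chance in audio data."""
    return (
        fields is not None
        and fields["version_id"] != 0b01        # reserved version
        and fields["layer_id"] != 0b00          # reserved layer
        and fields["bitrate_index"] != 0b1111   # invalid bit rate index
        and fields["sample_rate_index"] != 0b11 # reserved sample rate
    )


fields = parse_frame_header(bytes.fromhex("fffb9064"))
print(fields["bitrate_index"], fields["padding"], looks_valid(fields))
print(parse_frame_header(b"ID3\x04"))  # the start of an ID3v2 tag has no sync block
```

A real player would go further and confirm that another valid header follows at the computed frame length before trusting the lock.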

– Transmission lock
MP3 was originally designed for broadcast, so it was important that an MP3 receiver could synchronize with the signal at any point in the broadcast.
The frame header is therefore placed at the start of every frame, so that when a receiver "tunes in" to a data stream it picks up the
signal and can play it immediately. Interestingly, this makes it possible to cut MPEG audio files into small segments, each of which can be played independently.
Unfortunately, this does not work as cleanly for Layer 3 (MP3) files, where frames often depend on other frames, so you cannot simply
cut and edit them.

– Frames per second
Just as the film industry standardizes the number of frames per second so that a film plays properly on any projector,
the MP3 standard fixes the frame duration. Regardless of the file's bit rate, a frame in an MPEG-1 file at 44.1 kHz lasts about 26 ms, or approximately 38 frames per second. If the bit rate
is higher, the frame size is correspondingly larger, and vice versa. The number of samples in an MP3 frame is constant: 1152 samples per frame.

The total size of any given frame, in bytes, can be calculated with the following formula:

FrameSize = 144 * BitRate / SampleRate + Padding

where BitRate is in bits per second, SampleRate is in Hz, and Padding is 0 or 1 byte.
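As a quick check of the frame figures, here is a small sketch that assumes the usual MPEG-1 Layer 3 form of the formula, FrameSize = 144 * BitRate / SampleRate + Padding. The division is truncated to whole bytes and Padding is 0 or 1. The constant 144 is the 1152 samples per frame divided by 8 bits per byte.

```python
SAMPLES_PER_FRAME = 1152  # constant for MPEG-1 Layer 3, as stated in the text


def frame_duration_ms(sample_rate_hz):
    """Playing time of one frame in milliseconds; independent of bit rate."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_size_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Frame length in bytes, header included; padding is 0 or 1."""
    return (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate_hz + padding


duration = frame_duration_ms(44100)
frames_per_second = 1000 / duration

print(round(duration, 2), round(frames_per_second, 2))
for bitrate in (128_000, 192_000, 320_000):
    print(bitrate, frame_size_bytes(bitrate, 44100), frame_size_bytes(bitrate, 44100, 1))
```

At 44.1 kHz the sample rate does not divide evenly, so encoders pad some frames by one byte to keep the average bit rate on target.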

Mp3 (an audio encoding method) Part 3

MP3 ENCODING

To generate bit-compliant MPEG Audio files (Layer 1, Layer 2, Layer 3), members of the ISO MPEG Audio committee developed reference simulation software in C, known as ISO 11172-5. Developed as non-real-time software on a number of operating systems, it was able to demonstrate the first real-time, DSP-based hardware decoding of compressed audio. Various other real-time MPEG audio implementations were developed for digital broadcasting (DAB radio and DVB television), aimed at consumer receivers and set-top boxes.
Later, on July 7, 1994, Fraunhofer-Gesellschaft released the first MP3 encoder, called l3enc.
The Fraunhofer development team chose the .mp3 extension on July 14, 1995 (previously the extension was .bit). With Winplay3 (released September 9, 1995), the first real-time software MP3 player, many people were able to encode and play MP3 files on their own personal computers. Since hard drives at the time were relatively small (around 500 MB), this technology was essential for storing music on computers.
MP2, MP3 and Internet
In October 1993, MP2 (MPEG-1 Audio Layer 2) files appeared on the Internet and were often played with the Xing MPEG Audio Player, and later with MAPlay, developed by Tobias Bading for Unix. MAPlay was first released on February 22, 1994, and was later ported to Microsoft Windows.
At first the only MP2 encoder available was the Xing Encoder, used together with CDDA2WAV, a CD ripper that converts audio tracks from CDs to WAV format.
Often considered the father of the online music revolution, the Internet Underground Music Archive (IUMA) was the first hi-fi music site on the Internet, hosting thousands of licensed MP2 recordings before MP3 and the web became popular.
From the first half of 1995 to the end of the 1990s, MP3 flourished on the Internet. Its popularity is largely due to the success of companies and software packages such as Winamp, released by Nullsoft in 1997, and Napster, released in 1999, which reinforced each other. These programs made it easy for ordinary users to play, create, share and collect MP3 files.
The debate about peer-to-peer sharing of MP3 files spread rapidly, mainly because compression is what makes file sharing practical: uncompressed files are too large to share. Because MP3 files spread so widely over the Internet, Napster was sued by several major record labels seeking to protect their copyrights.
Commercial online music distribution services, such as the iTunes Music Store, often choose other proprietary or DRM-enabled file formats to control and limit the use of digital music. DRM-enabled formats are meant to protect copyrighted material from infringement, but most protection mechanisms can be broken in some way, and computer experts can use these methods to generate unlocked files that can be copied freely. One notable exception is Microsoft's Windows Media Audio 10 format, which has yet to be cracked. If a compressed audio file is wanted, the recorded audio stream must be compressed again, and the sound quality is degraded.
Audio quality
Because MP3 is a lossy compression format, it offers a choice of "bit rates", that is, the number of encoded bits used to represent each second of audio. Typical bit rates are between 128 kbps and 320 kbps (kbit/s). In contrast, the bit rate of uncompressed audio on a CD is 1411.2 kbps (16 bits/sample × 44100 samples/sec × 2 channels).
MP3 files encoded at lower bit rates generally play at lower quality. If the bit rate is too low, "compression artifacts" (sounds not present in the original recording) appear during playback. A good example is the sound of compressed cheering: because of its randomness and sharp changes, encoder errors are more pronounced and sound like echoes.
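The bit-rate figures above lend themselves to a quick calculation. This sketch uses hypothetical helper names to reproduce the 1411.2 kbps CD rate and show what the typical MP3 bit rates mean for a 4-minute track, both as a compression ratio against CD audio and as an approximate file size.

```python
def pcm_bitrate_kbps(bits_per_sample, sample_rate_hz, channels):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return bits_per_sample * sample_rate_hz * channels / 1000


def file_size_mb(bitrate_kbps, seconds):
    """Approximate size in megabytes (10^6 bytes), ignoring tags and headers."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


cd_kbps = pcm_bitrate_kbps(16, 44100, 2)
print("CD:", cd_kbps, "kbps,", round(file_size_mb(cd_kbps, 240), 2), "MB for 4 minutes")

for mp3_kbps in (128, 192, 320):
    ratio = cd_kbps / mp3_kbps
    size = file_size_mb(mp3_kbps, 240)
    print(mp3_kbps, "kbps: about 1:%.1f," % ratio, round(size, 2), "MB")
```

The 128 kbps case works out to roughly 1:11, which sits inside the 1:10 to 1:12 range quoted elsewhere in this article.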

Mp3 (an audio encoding method) Part 2

MP3 encoding

MPEG-1 Audio Layer 2 encoding began as the Digital Audio Broadcasting (DAB) project managed by Egon Meier-Engelen at the Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt (later known as the Deutsches Zentrum für Luft- und Raumfahrt, the German Aerospace Center). The project was funded by the European Union as a EUREKA research project, commonly known as EU-147. EU-147 ran from 1987 to 1994.

By 1991, two proposals had emerged: Musicam (which became Layer 2) and ASPEC (Adaptive Spectral Perceptual Entropy Coding). The Musicam method, proposed by Philips of the Netherlands, CCETT of France, and the Institut für Rundfunktechnik of Germany, was chosen for its simplicity, its robustness to errors, and its lower computational cost for high-quality compression. The Musicam format, based on subband coding, was a key factor in determining the MPEG audio compression format (sample rates, frame structure, header, samples per frame). This technology and its design philosophy were fully integrated into the definition of ISO MPEG Audio Layer I, Layer II, and later Layer III (MP3). Under the chairmanship of Prof. Musmann (University of Hannover), the standard was edited by Leon van de Kerkhof (Layer I) and Gerhard Stoll (Layer II).

A working group consisting of Leon van de Kerkhof from the Netherlands, Gerhard Stoll from Germany, Yves-François Dehery from France and Karlheinz Brandenburg from Germany took design ideas from Musicam and ASPEC, added their own, and developed MP3. MP3 at 128 kbit/s can achieve the sound quality of MP2 at 192 kbit/s.

All of these algorithms became part of the first group of MPEG standards, MPEG-1, in 1992, resulting in the international standard ISO/IEC 11172-3, published in 1993. Further work on MPEG audio became part of the second group of MPEG standards, MPEG-2, completed in 1994 and officially known as ISO/IEC 13818-3, first published in 1995.

The compression efficiency of an encoder is generally expressed as a bit rate, because the compression ratio depends on the bit depth and sampling rate of the input signal. Compression ratios are nevertheless often quoted, usually with CD parameters (44.1 kHz, two channels, 16 bits per channel, or 2×16 bits) as the reference. Because the ratio changes with the chosen reference, it is an imprecise measure for lossy encoders.

Karlheinz Brandenburg used a CD recording of Suzanne Vega's song Tom's Diner to test MP3 compression algorithms. The song was chosen because its soft, simple melody makes flaws in the compressed format easier to hear during playback. Some jokingly refer to Suzanne Vega as "the mother of MP3". More demanding and critical audio excerpts (glockenspiel, triangle, accordion…) from the EBU V3/SQAM reference CD are used by professional audio engineers to assess the subjective quality of MPEG audio formats.

Mp3 (an audio encoding method)

MP3 encoding

MP3 is an audio compression technology; its full name is Moving Picture Experts Group Audio Layer III, abbreviated MP3. It is designed to drastically reduce the amount of audio data. Using MPEG Audio Layer 3 technology, music is compressed into a much smaller file at a compression ratio of 1:10 or even 1:12, and for most users there is no significant decrease in playback quality compared with the original uncompressed audio. It was invented and standardized in 1991 by a group of engineers at the Fraunhofer-Gesellschaft research organization in Erlangen, Germany. Music stored in MP3 form is called MP3 music, and a device that can play it is called an MP3 player.

Name: Moving Picture Experts Group Audio Layer III
Research organization: Fraunhofer-Gesellschaft
Type: audio coding
Advantage: drastically reduces the amount of audio data
Drawback: loss of sound quality
transmission characteristics
MP3 converts the time-domain waveform into a frequency-domain signal and splits it into multiple frequency bands. It then takes advantage of the human ear's insensitivity to high-frequency sound by using different compression ratios for different bands: a high compression ratio for high frequencies (even discarding the signal entirely) and a low compression ratio for low frequencies, so that the signal is not distorted. This is equivalent to discarding the high-frequency sound that is basically inaudible to the human ear [1] and keeping only the audible low-frequency part, which compresses the sound at a ratio of 1:10 or even 1:12. Because the full name of this compression method is MPEG Audio Layer 3, people call it MP3 for short.
According to the MPEG specification, AAC (Advanced Audio Coding) in MPEG-4 is intended as the next generation of the MP3 format.
Compared with CD audio and lossless formats such as FLAC and APE, MP3 at its highest bit rate (320 kbps) does not sound much different.
MP3 players are dying
When they first came out, MP3 players were at the forefront of the digital revolution. However, sales of iPods and other MP3 players in the UK fell sharply in 2012 as consumers turned to other digital products such as smartphones.
In 2012, sales of MP3 players in the UK market were £110m ($178m), just 29% of the £381m recorded in 2011, according to market research firm Mintel. Mintel expects total MP3 player sales in the UK to halve by 2017. In the worst case, total MP3 player sales in the UK will be just 25 million dollars five years later. [23]
1. MP3 is a data compression format;
2. It discards pulse code modulation (PCM) audio data that is not important to the human ear (much as JPEG is a lossy image compression), resulting in a much smaller file size;
3. MP3 audio can be compressed at different bit rates, providing a range of trade-offs between data size and sound quality. The MP3 format uses a hybrid transform to convert time-domain signals into frequency-domain signals;
4. 32-band polyphase quadrature filter (PQF);
5. 36- or 12-tap modified discrete cosine transform (MDCT); the size can be selected independently for subbands 0…1 and 2…31;
6. MP3 has not only extensive client software support but also a great deal of hardware support, such as portable media players (that is, MP3 players) and DVD and CD players.
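The filter-bank numbers in the list above fit together in a simple way, sketched below. The 32 subbands and the 36-tap MDCT come from the list. Two further details are assumptions not stated in the text: an MDCT yields half as many outputs as taps, and an MPEG-1 Layer 3 frame holds two such blocks (granules).

```python
SUBBANDS = 32           # polyphase filter bank bands, from the list above
LONG_MDCT_TAPS = 36     # long-block MDCT size, from the list above
GRANULES_PER_FRAME = 2  # assumption: MPEG-1 Layer 3 packs two granules per frame


def mdct_outputs(taps):
    """An MDCT over `taps` inputs produces taps / 2 coefficients."""
    return taps // 2


def band_width_hz(sample_rate_hz, bands):
    """Width of each band when 0..sample_rate/2 is split into equal bands."""
    return sample_rate_hz / 2 / bands


lines_per_granule = SUBBANDS * mdct_outputs(LONG_MDCT_TAPS)
samples_per_frame = GRANULES_PER_FRAME * lines_per_granule

subband_hz = band_width_hz(44100, SUBBANDS)
line_hz = band_width_hz(44100, lines_per_granule)

print(lines_per_granule, samples_per_frame)
print(round(subband_hz, 1), round(line_hz, 1), round(subband_hz / line_hz))
```

The factor of 18 between the two band widths is the "18 times higher" frequency resolution that Layer 3 gains over Layer 2, and the 1152 total matches the constant samples-per-frame figure.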

Encoding an mp3


What is masking

The lossy MP3 audio compression algorithm exploits a limitation of human hearing called auditory masking. In 1894, the American physicist Alfred M. Mayer reported that a tone could be made inaudible by another tone of lower frequency. In 1959, Richard Ehmer described a complete set of auditory curves for this phenomenon. Between 1967 and 1974, Eberhard Zwicker worked on the tuning and masking of critical frequency bands, building in turn on the fundamental research of Harvey Fletcher and his collaborators at Bell Labs.

Perceptual coding was first used for speech compression with Linear Predictive Coding (LPC), which has its origins in the work of Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966. In 1978, Bishnu S. Atal and Manfred R. Schroeder of Bell Labs proposed an LPC speech codec called adaptive predictive coding, which used a psychoacoustic coding algorithm based on the masking properties of the human ear. Further optimization by Schroeder and Atal with J.L. Hall was described in a 1979 article. In the same year, M.A. Krasner proposed a psychoacoustic masking codec, published it, and built hardware for speech (not usable for compressing music). Because the results appeared in a relatively obscure Lincoln Laboratory technical report, they did not immediately influence the mainstream of psychoacoustic codec development.

The Discrete Cosine Transform (DCT), a type of transform coding for lossy compression, was proposed by Nasir Ahmed in 1972 and developed by Ahmed with T. Natarajan and K.R. Rao in 1973; they published their results in 1974. This led to the Modified Discrete Cosine Transform (MDCT), proposed by J.P. Princen, A.W. Johnson and A.B. Bradley in 1987 after earlier work by Princen and Bradley in 1986. The MDCT later became a core part of the MP3 algorithm.

Ernst Terhardt et al. built an algorithm that describes auditory masking with high precision in 1982. This work added to a body of reports by authors dating back to Fletcher, and to the work that originally defined critical ratios and critical bandwidths. In 1985, Atal and Schroeder introduced Code Excited Linear Prediction (CELP), an LPC-based perceptual speech coding algorithm with auditory masking that achieved a significant degree of data compression for its time.

The peer-reviewed IEEE Journal on Selected Areas in Communications reported on a wide variety of (mainly perceptual) audio compression algorithms in 1988. Its February 1988 issue, "Voice Coding for Communications", reported on a wide range of established, working audio bit-compression technologies, some of which use auditory masking as part of their core design and some of which show real-time hardware implementations. – https://ru.qaz.wiki/wiki/MP3