Sub-band coding in MP3 audio


Let’s talk about Sub-band coding in MP3 audio

Sub-band coding, a cornerstone of MP3 audio compression, is vital for shrinking large audio files to a manageable size. I’ve spent years working with audio codecs, and I can tell you that without it our digital music libraries would be enormous. The process divides the audio signal into frequency bands so that each one can be treated separately, which saves space. In my experience, this significantly reduces file size while preserving a surprisingly good listening experience, and that balance is the key.

The Essence of Frequency Division

The core of sub-band coding involves splitting the audio spectrum into multiple frequency ranges. Think of it like separating the different instruments in an orchestra. We don’t need the same amount of information to describe the high-pitched violin notes as the low, thumping bass notes, so splitting those frequencies up allows the encoder to treat them individually, applying a different compression level to each sub-band based on how sensitive our hearing is to it. The analogy is loose: a sub-band is a frequency range rather than an instrument, and a single instrument usually spreads across several bands. This process ensures that the most crucial sounds are preserved while the less noticeable ones are compressed more aggressively. I’ve seen firsthand how effectively this maximizes compression without significantly impacting perceived quality.

How Sub-band Analysis Works

The analysis stage is where the magic truly happens. Filters divide the audio signal into sub-bands, and they are not just any filters: they are carefully designed to minimize distortion and maintain quality after reconstruction. I’ve worked with many filter types, and MP3 uses a polyphase filter bank that splits the signal into 32 equal-width sub-bands. Adjacent bands overlap slightly, so some aliasing is unavoidable at this stage; the filters are designed so that most of it cancels when the bands are recombined in the decoder. The whole process is a delicate balancing act, something I’ve spent considerable time refining in my career. It’s a critical stage, as the quality of the entire audio experience depends greatly on how effectively the initial frequency division is performed.
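To make the idea of frequency division concrete, here is a minimal sketch. It is not the MP3 polyphase filter bank (that relies on a specific window defined in the standard); it simply takes a naive DFT of one frame and groups the bins into 32 equal-width bands, the same band count MP3 uses. The frame length of 512, the 1 kHz test tone, and the function name `band_energies` are assumptions chosen for illustration.

```python
import cmath
import math


def band_energies(frame, num_bands=32):
    """Group the positive-frequency DFT bins of a frame into equal-width bands.

    Illustration only: a real MP3 encoder uses a polyphase filter bank,
    not a plain DFT with bin grouping.
    """
    n = len(frame)
    half = n // 2
    # Naive DFT, positive frequencies only (slow but dependency-free).
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(half)
    ]
    bins_per_band = half // num_bands
    return [
        sum(abs(c) ** 2 for c in spectrum[b * bins_per_band:(b + 1) * bins_per_band])
        for b in range(num_bands)
    ]


sample_rate = 44100
# Hypothetical test signal: a 1 kHz tone, 512 samples long.
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
energies = band_energies(frame)
loudest_band = max(range(len(energies)), key=energies.__getitem__)
print("band with most energy:", loudest_band)
```

At 44.1 kHz each of the 32 bands is about 689 Hz wide, so the 1 kHz tone lands in the second band (index 1), and an encoder could spend most of its bits there.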

Quantization and Coding in Each Sub-band

Once the audio is divided, each band undergoes quantization. This process converts the continuous amplitude of the audio signal into discrete levels so that it can be represented digitally. The clever bit, I find, is that the number of quantization levels used for each sub-band is tailored to its importance. Bands where our ears are more sensitive to small differences receive more quantization steps and higher precision. Bands that carry less audible information, and matter less to the audio quality, get fewer quantization steps. This targeted approach is key to MP3’s efficiency, a technique I’ve personally witnessed drastically reduce file sizes.
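The effect of giving a band more or fewer quantization steps can be sketched with a plain uniform quantizer. This is a simplification: MP3's own quantizer is non-uniform and works together with scale factors. The bit counts (3 and 8), the test signal, and the function names are assumptions for illustration.

```python
import math


def quantize(samples, bits):
    """Uniformly quantize samples in [-1, 1) to 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    lowest, highest = -(levels // 2), levels // 2 - 1
    # Round to the nearest level and clamp to the representable range.
    return [max(lowest, min(highest, round(s / step))) * step for s in samples]


def max_error(samples, bits):
    """Largest absolute difference between the samples and their quantized form."""
    return max(abs(s - q) for s, q in zip(samples, quantize(samples, bits)))


# Hypothetical sub-band signal: a slow sine wave with amplitude 0.7.
band = [0.7 * math.sin(2 * math.pi * t / 50) for t in range(200)]
coarse_error = max_error(band, 3)  # a "less important" band: 8 levels
fine_error = max_error(band, 8)    # a "more important" band: 256 levels
print(coarse_error, fine_error)
```

Each extra bit halves the step size, so the worst-case error drops by roughly half per bit as long as the signal is not clipped.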

Bit Allocation and the Psychoacoustic Model

Bit allocation is key to MP3’s efficiency, and I think it is something non-experts rarely know about even though it is really important. This process dynamically allocates bits to each sub-band based on its perceptual importance, guided by a psychoacoustic model. Psychoacoustic models predict which parts of the audio we are most likely to hear and, conversely, which parts we are not. Using these models, the encoder prioritizes the sub-bands that need more bits, ensuring that the most audible information is encoded with higher fidelity, a process that I personally find fascinating. The allocation is not fixed; it changes with the current audio content. I’ve seen how effectively this keeps the audible quality high while minimizing the bits spent on what is inaudible or less important.
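One common way to picture this is a greedy loop: keep handing the next bit to whichever band currently has the most audible quantization noise. The sketch below assumes each extra bit lowers the noise by about 6.02 dB and that the psychoacoustic model has already produced a signal-to-mask ratio (SMR) per band. The SMR values, the bit pool of 8, and the function name are made up for illustration; real encoders use more elaborate loops.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15):
    """Greedily assign bits to the band whose noise is furthest above its mask.

    smr_db: signal-to-mask ratio per band, in dB (assumed to come from a
            psychoacoustic model).
    bit_pool: total number of bits available for this frame.
    """
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # Noise-to-mask ratio: positive means the noise is still audible.
        nmr = [smr - 6.02 * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all remaining noise is already below the masking threshold
        bits[worst] += 1
    return bits


# Hypothetical SMR values for four bands; band 2 is fully masked.
smr = [30.0, 12.0, -5.0, 20.0]
allocation = allocate_bits(smr, bit_pool=8)
print(allocation)  # [4, 1, 0, 3]
```

The fully masked band receives no bits at all, which is where much of the saving comes from.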

Sub-band Synthesis: Putting it Back Together

Reconstructing the audio is achieved through sub-band synthesis. Here, the quantized sub-band signals are passed through filters that combine the frequency bands back into a complete audio signal. The goal is a reconstruction that is as close as possible to the original audio after compression. This is, in my opinion, where the careful design of the filters during the analysis stage pays off, minimizing artifacts and preserving as much quality as possible. I’ve spent many years perfecting this step, making sure that little audio quality is lost, and believe me, it’s a challenge to do this well.
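The relationship between analysis and synthesis is easiest to see with the smallest possible filter bank. The sketch below uses a two-band Haar split, which is far simpler than MP3's 32-band filter bank and is used here only as an assumption-level stand-in: analysis produces a low band and a high band, and synthesis recombines them. With no quantization in between, the original samples come back exactly (up to floating-point rounding).

```python
import math

SQRT2 = math.sqrt(2)


def analyze(samples):
    """Split an even-length signal into a low band and a high band (Haar)."""
    low, high = [], []
    for i in range(0, len(samples) - 1, 2):
        a, b = samples[i], samples[i + 1]
        low.append((a + b) / SQRT2)   # local average: low frequencies
        high.append((a - b) / SQRT2)  # local difference: high frequencies
    return low, high


def synthesize(low, high):
    """Recombine the two bands into the full-rate signal."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / SQRT2)
        out.append((l - h) / SQRT2)
    return out


original = [0.1, 0.4, -0.3, 0.8, 0.0, -0.6]
low, high = analyze(original)
rebuilt = synthesize(low, high)
worst_difference = max(abs(a - b) for a, b in zip(original, rebuilt))
print(worst_difference)
```

In a real codec the bands are quantized between the two steps, so the difference is no longer zero; the job of the encoder is to keep that difference below what the listener can hear.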

Advantages of Sub-band Coding

Using sub-band coding in MP3 brings some great advantages. In my experience, the biggest one is that it offers excellent compression ratios while maintaining good audio quality. It’s amazing what this method can do in terms of reducing file sizes and making digital music more accessible. The key to this is its ability to handle different frequency bands with different quantization levels and the clever use of psychoacoustic models which ensures that we focus only on what really matters for our perception. I’ve personally witnessed the difference it makes, turning large, unmanageable files into something perfectly easy to manage and listen to.

Limitations and Challenges

Despite the many benefits, sub-band coding in MP3 is not without its challenges. One of the biggest limitations is the potential for pre-echo artifacts, which, in my experience, can be really noticeable and unpleasant, especially on percussive sounds. These occur because quantization error is spread across the whole block of samples being coded, so noise from a sudden attack can appear in the quiet moment just before it. Also, the complexity of the filters means that encoding and decoding can be computationally intensive, especially on low-powered devices. I’ve seen how these limitations can affect the overall experience, but I believe that the benefits far outweigh the drawbacks.
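Pre-echo can be reproduced in a few lines. The sketch below is a toy, not the MP3 transform: it takes a 64-sample block that is silent until a sudden burst, rounds its DFT coefficients coarsely, and transforms back. The block size, the burst, and the quantization step of 1.0 are all assumptions for illustration. Because each frequency coefficient describes the whole block, the rounding error shows up in the silent part too.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def coarse(c, step):
    """Round the real and imaginary parts to a coarse grid."""
    return complex(round(c.real / step) * step, round(c.imag / step) * step)


block_size, attack = 64, 48
# Silence, then a sudden burst starting at sample 48.
frame = [0.0] * attack + [math.sin(2 * math.pi * 0.23 * t)
                          for t in range(block_size - attack)]
decoded = idft([coarse(c, 1.0) for c in dft(frame)])

# The original is exactly silent before the attack; the decoded block is not.
pre_echo_peak = max(abs(v) for v in decoded[:attack])
print("noise before the attack:", pre_echo_peak)
```

Shorter blocks confine the error to a shorter stretch of time, which is why encoders switch to short blocks around transients.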

Real-World Examples

Let’s use a real-world example to make this concrete: think of a car. The sound a car makes is a combination of different sounds: the engine, the tires, the wind, and maybe even the music. MP3’s sub-band coding is a bit like separating those sounds and encoding each at a different level of detail. The engine sound is very important to the experience, so it is encoded with high quality. Some road sounds matter less, so they are encoded with lower quality. This is similar to how MP3 compresses audio while still providing a high-quality listening experience. Another good example is an orchestra, with the low sounds of the bass, the high notes of the violins, and the sound of the drums. These sounds occupy different frequency ranges with different levels of perceptual importance, and sub-band coding compresses each range differently, maximizing quality and minimizing space.

Advanced Techniques

Over the years, I’ve also witnessed the evolution of advanced techniques that enhance sub-band coding. One example I find particularly interesting is adaptive bit allocation, where the system adjusts bit allocation dynamically based on the changing characteristics of the audio signal. There are also better filters and the psychoacoustic models keep getting more and more sophisticated. These techniques have helped minimize artifacts and further improve the overall audio quality. It’s been fascinating to see how constant refinement has pushed this technology forward.

The Future of Sub-band Coding

Sub-band coding continues to play a vital role in audio compression. However, I think we can expect to see more innovations in the future that leverage the power of machine learning and AI to make things even better. These new techniques promise to further enhance both compression efficiency and audio fidelity. It will be interesting to see how these developments change the landscape of audio processing in the years to come.

Final words on Sub-band coding in MP3 audio

In summary, sub-band coding in MP3 audio is a really clever system that divides audio into frequency bands, each coded differently according to its importance for our perception. I’ve spent years studying this technology and I’ve seen how much of a difference it can make to the listening experience. It allows the MP3 format to achieve high levels of compression while maintaining high audio quality, which is a very difficult thing to do. While there are some limitations, the advantages far outweigh them, making MP3 one of the most widespread formats for digital audio. If you need to adjust the loudness of your MP3 files, Mp4Gain is an appropriate solution, as it works directly on the MP3 files without re-encoding and preserves the quality of the originals.

What is the purpose of sub-band coding in MP3 audio compression?

Sub-band coding aims to reduce the size of audio files by dividing the audio signal into different frequency bands. Each band is treated individually, with a different level of compression, which, in my experience, makes the audio files much more manageable. This way, we can compress the audio efficiently and keep good audio quality.

How does the sub-band analysis split the audio signal?

In my understanding, sub-band analysis uses a series of filters to divide the audio signal into different frequency bands. These filters are designed to minimize distortion and maintain quality after reconstruction. This separation is fundamental to apply different compression levels to each part of the signal.

What is quantization in the sub-band coding?

Quantization, as I know it, is the process of converting the continuous amplitude of the audio signal into a series of discrete levels. The number of levels depends on how important each sub-band is to the perceived quality. Bands with more audible and important frequencies get more quantization steps to preserve quality. Bands with less important frequencies receive fewer quantization steps to reduce size.

How does the psychoacoustic model help in sub-band coding?

I think that the psychoacoustic model is vital because it predicts which parts of the audio signal we are likely to perceive. It guides the bit allocation process by giving priority to the most audible frequencies and spending fewer bits on the less audible ones. This strategy maximizes audio quality at the lowest possible bit rate.

What is sub-band synthesis and how does it work in MP3 decoding?

Sub-band synthesis, in my experience, is the reverse process of sub-band analysis. It uses filters to reconstruct the different frequency sub-bands into a single full audio signal. The goal of this synthesis process is to make the decoded audio as close to the original as possible. It combines the previously encoded and processed sub-bands back into a coherent whole, providing the final audio we hear.

What are the main advantages of sub-band coding in MP3 audio?

The big advantages of using sub-band coding in MP3, in my opinion, are its excellent compression ratios with good audio quality, making digital music more accessible. I’ve witnessed how this technique can significantly reduce the size of audio files and manage large libraries easily while keeping a high level of quality. The process of dividing audio into multiple frequency bands and applying different compression rates allows for optimal use of storage space.

What limitations and challenges does sub-band coding face?

Some of the limitations of sub-band coding include the potential for pre-echo artifacts, which are unpleasant to listen to. Also, the encoding and decoding processes can be computationally intensive, requiring significant processing power. However, with constant refinement of the technology, those problems keep shrinking. I’ve worked on many audio projects and it was a real challenge to deal with these problems, but it was also a good way to learn.

Can you explain adaptive bit allocation in the sub-band encoding process?

Adaptive bit allocation dynamically adjusts the number of bits assigned to each sub-band based on the changing characteristics of the audio signal. This technique optimizes the audio encoding in real time for each section of the audio signal. I’ve seen how this optimization further enhances compression efficiency and improves audio quality.

How is sub-band coding related to perceptual audio coding?

Sub-band coding is a vital part of perceptual audio coding because it gives the encoder separate frequency bands to work on. That lets the encoder focus on the information that is most audible to us. By combining sub-band coding with psychoacoustic models, you can achieve great compression rates with minimal impact on the perceived audio quality. In my experience, these are two pillars of modern audio encoding.

How does Sub-band coding work in MP3 audio?

Sub-band coding in MP3 works by splitting the audio signal into multiple frequency ranges, or bands. Each band is then encoded with a different level of precision, depending on how important its frequencies are to the final listening experience. This process, combined with techniques like psychoacoustic modeling, makes it possible to compress the audio efficiently while preserving good audio quality. It is a key element that makes MP3 such a widely used format.

Comments:

This article is awesome, I learned so much about how MP3s are made! I had no idea it was this complicated with splitting sounds up like that. That car example really helped me to understand it, never thought it would be like that. Thanks for the info!

Wow, this is deep stuff! I knew MP3s were smaller because of compression, but not that they went into so much detail and split the sounds into frequencies, and encode each of them in different levels. Very interesting stuff. I always wondered what’s behind this. Thank you.

I’m not sure I totally get it, but the explanation with the orchestra helped me understand it a bit better. So each instrument is a different band? Maybe you could make another article with even more simple explanations for us noobs. But still, this is awesome!

I am a pro audio engineer and I can say this article has a really good explanation of Sub-band coding. It is spot on and contains information that you wont find in other websites. This is good stuff!

Pre-echo? never heard of that. Is that why some mp3 sound a bit weird sometimes. I always thought that was my headphones. Very very interesting stuff! Could you talk more about this?

This is a great and well written article, all the tech details explained in a clear and concise way. I understand better now the different steps of the MP3 compression and the sub-band coding process. A good job with this!

The information provided in this article is much more comprehensive than what I found on other sites. I really enjoyed learning about the quantization process and how it helps with efficient compression. Great job!



Psychoacoustic Threshold Estimation in MP3

Let’s talk about Psychoacoustic Threshold Estimation in MP3

Psychoacoustic threshold estimation in MP3 encoding is a crucial element for efficient compression. In my experience, this process plays a significant role in how audio is perceived by listeners after compression. It’s based on the principles of psychoacoustics, which examine how humans perceive sound. Essentially, psychoacoustic models allow MP3 encoding to remove parts of the audio that are inaudible to the human ear, making the file size smaller without compromising perceived quality. To understand it better, think of how you might ignore background noise when focusing on a conversation in a crowded room. Similarly, MP3 compression removes sounds that would not be heard by a listener under normal conditions.

In MP3 encoding, threshold estimation is done by analyzing the signal’s frequency spectrum. The human ear is more sensitive to certain frequencies and less sensitive to others. By determining which parts of the audio are inaudible based on these sensitivities, MP3 compression algorithms can selectively remove these frequencies. The result is a compressed file that maintains the most important parts of the sound while discarding unnecessary details.

The Role of Psychoacoustics in MP3 Compression

When discussing MP3 compression, psychoacoustics comes into play to ensure the best balance between sound quality and file size. It’s as though I’m packing a suitcase for a trip—choosing the essentials and leaving behind the non-essentials. In MP3 encoding, psychoacoustic models aim to identify which audio frequencies are masked by others, allowing them to be discarded without a noticeable loss in quality.

These psychoacoustic models use data about human hearing perception. For instance, our ears are more sensitive to mid-range frequencies than to low or high frequencies. When encoding an MP3, the algorithm uses this knowledge to reduce the representation of low and high frequencies, especially if they are masked by louder sounds in the mid-range. This approach reduces the file size, making it more efficient while maintaining an acceptable sound quality.
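A rough way to picture frequency masking is a triangular "spreading" curve around a loud masker: the further a quiet sound is from the masker in frequency, the less it is masked, and masking reaches further upward in frequency than downward. The sketch below measures frequency distance in Bark (critical-band units). The 10 dB offset and the slopes of 25 dB and 10 dB per Bark are illustrative assumptions, not values taken from the MP3 standard or from any particular encoder.

```python
def masking_threshold_db(masker_level_db, masker_bark, probe_bark,
                         offset_db=10.0, lower_slope=25.0, upper_slope=10.0):
    """Level below which a probe tone is assumed inaudible next to a masker.

    All default numbers are hypothetical, chosen only to show the shape:
    a steep fall-off below the masker and a gentler one above it.
    """
    distance = probe_bark - masker_bark
    if distance < 0:
        drop = lower_slope * -distance  # probe is lower in frequency
    else:
        drop = upper_slope * distance   # probe is higher in frequency
    return masker_level_db - offset_db - drop


def is_masked(probe_level_db, masker_level_db, masker_bark, probe_bark):
    return probe_level_db < masking_threshold_db(
        masker_level_db, masker_bark, probe_bark)


# An 80 dB masker at 8.5 Bark (roughly 1 kHz) and a 50 dB probe tone.
print(is_masked(50, 80, 8.5, 9.5))   # one Bark above the masker
print(is_masked(50, 80, 8.5, 7.5))   # one Bark below the masker
print(is_masked(50, 80, 8.5, 14.5))  # far above the masker
```

With these assumed numbers the probe is masked only when it sits just above the masker; an encoder would spend few or no bits on it in that case.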

Psychoacoustic Models: Key Techniques for Estimation

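Psychoacoustic models are essential for estimating thresholds in MP3 encoding. The MPEG-1 audio standard, which defines MP3 as its Layer III, describes two of them: Psychoacoustic Model 1 and the more complex Psychoacoustic Model 2. Both are given as examples rather than as mandatory parts of the format, so encoders are free to use their own variants. These models implement specific techniques to determine which parts of the audio signal can be discarded without affecting the perceived quality.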

  • Critical Bands: The human ear perceives sounds in frequency groups called critical bands. Each critical band includes frequencies that are close enough together that they affect each other’s perception. When encoding, psychoacoustic models estimate a masking threshold for each band and drop or coarsen the components that fall below it.
  • Masking Effect: This is a phenomenon where a louder sound makes it difficult to hear a quieter sound. The MP3 encoder uses this principle to discard sounds masked by others, reducing the file size.
  • Threshold of Hearing: The threshold of hearing refers to the quietest sound that the average human ear can detect. Sounds below this threshold are effectively inaudible and can be removed during encoding.
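Two of the ideas in this list have widely used closed-form approximations, sketched below. The threshold-in-quiet formula is the one usually attributed to Terhardt, and the Hz-to-Bark conversion is the one usually attributed to Zwicker; neither comes from this article, and a given encoder may use tables or different fits instead. The function names are my own.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt-style fit)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def hz_to_bark(freq_hz):
    """Approximate critical-band number (Bark) for a frequency (Zwicker-style fit)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


for f in (100, 1000, 3300, 16000):
    print(f, "Hz:",
          round(threshold_in_quiet_db(f), 1), "dB SPL,",
          round(hz_to_bark(f), 1), "Bark")
```

The curve dips to its lowest point around 3 to 4 kHz, where hearing is most sensitive, and rises steeply at both ends of the spectrum. A component that falls below this curve can be dropped even when nothing else is masking it.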

Practical Example: How Psychoacoustic Threshold Estimation Works

Imagine you’re listening to your favorite song on your smartphone. The song is compressed into an MP3 file, but somehow it still sounds amazing. What’s happening behind the scenes is the psychoacoustic threshold estimation. For example, if you’re listening to a powerful guitar solo, the MP3 algorithm may eliminate some of the higher frequencies from the background sounds like drums or cymbals that are masked by the louder guitar notes.

From my experience, it’s much like watching a movie with a powerful soundtrack. When the action is intense, the quieter background sounds fade into the background. The MP3 encoder mimics this behavior, focusing on what’s essential to the listener’s perception of the music and discarding less important details. It’s a brilliant way to optimize audio files while preserving the listening experience.

The Benefits of Psychoacoustic Threshold Estimation in MP3

The main benefit of psychoacoustic threshold estimation is the reduction in file size. The more efficient the compression, the smaller the file size, which makes it easier to store and stream audio. This is particularly crucial in a world where bandwidth is often limited, and storage space can be at a premium.

Another benefit is the preservation of sound quality. As an audio professional, I’ve found that effective psychoacoustic modeling ensures that what’s important to the listener remains intact. The algorithm removes what isn’t necessary, but it does so without compromising the overall experience. For example, it’s as if you’re cleaning up a painting by removing minor smudges that no one would notice anyway. The final image (or audio) still looks great but is lighter.

Final Words on Psychoacoustic Threshold Estimation in MP3

Psychoacoustic threshold estimation is an essential process for MP3 compression. It ensures that audio files are as small as possible while maintaining the best possible quality. In my experience, understanding psychoacoustics is key to understanding how modern audio compression works. These methods allow high-quality sound to be stored efficiently without using too much bandwidth or space.

At the end of the day, MP3 encoding wouldn’t be nearly as efficient or effective without psychoacoustic threshold estimation. It’s a fascinating blend of human perception and technology that allows us to enjoy high-quality audio in a convenient format. In cases where precise audio management is critical, using specialized software can further enhance the quality of the compressed file, and Mp4Gain offers a reliable option in this area.

What is psychoacoustic threshold estimation in MP3 encoding?

Psychoacoustic threshold estimation in MP3 encoding is the process of determining which parts of an audio signal are inaudible to the human ear and can be discarded to reduce file size without affecting perceived sound quality.

How does psychoacoustic modeling affect MP3 compression?

Psychoacoustic modeling reduces MP3 file sizes by removing audio frequencies that are masked by louder sounds, ensuring only the most essential elements of the sound are preserved for optimal listening quality.

What is the masking effect in psychoacoustics?

The masking effect is when louder sounds make it difficult to hear quieter ones. MP3 encoders exploit this effect to remove inaudible sounds, making the file more efficient without sacrificing quality.

Why are some frequencies removed in MP3 compression?

Some frequencies are removed in MP3 compression because they are outside the human ear’s sensitivity range or are masked by louder sounds, making them unnecessary for a high-quality listening experience.

How do critical bands influence MP3 encoding?

Critical bands are frequency ranges that the human ear perceives as a group. MP3 encoders use this information to determine which sounds in a frequency band are crucial and which can be discarded without affecting quality.

What are the benefits of psychoacoustic threshold estimation for MP3 files?

The main benefit of psychoacoustic threshold estimation is reduced file size while maintaining sound quality. This is particularly important for efficient storage and streaming of audio files.

How does psychoacoustic modeling enhance listening experience?

Psychoacoustic modeling enhances the listening experience by focusing on the most important frequencies and discarding unnecessary ones, resulting in a clear, high-quality sound that doesn’t take up much storage space.

What is the threshold of hearing in psychoacoustics?

The threshold of hearing refers to the faintest sound that can be perceived by the average human ear. Sounds below this threshold are removed during MP3 encoding because they are inaudible.

How does psychoacoustic threshold estimation improve MP3 file size efficiency?

Psychoacoustic threshold estimation improves MP3 file size efficiency by removing audio frequencies that would go unnoticed by the listener, making the file smaller without sacrificing quality.

Comments:

I’ve always been amazed by how much smaller MP3 files are compared to other formats. This article really breaks down why that is so clearly! The psychoacoustic principles are fascinating.

– AudioFan99

Really interesting read! I never realized that so much of the sound is actually removed when encoding an MP3. This helps explain why high-quality audio formats like FLAC sound so much better.

– MusicLover123

I had no idea that psychoacoustic models played such a big role in MP3 quality. I wonder how much it varies across different types of audio, like classical versus rock music.

– CuriousJoe

Great explanation! Would love to know more about how these models evolve over time and how they’ve impacted newer audio formats.

– SoundGeek2024

I’ve been looking for a deeper dive into how MP3 compression works, and this article really filled in the gaps. So cool to see the science behind it!

– TechieGuy

 

Temporal Masking in MP3

Let’s talk about Temporal Masking in MP3

Temporal masking in MP3 is a game-changer for audio compression. Imagine a loud drum hit at a concert: a whisper that falls just before or just after it is lost, even though the two sounds do not overlap exactly. MP3 encoding uses this principle to create smaller, more efficient files without compromising audio quality. I’ve seen firsthand how understanding temporal masking can enhance audio processing, especially for people trying to maximize storage or bandwidth without losing sound clarity. Let’s dive deep into how temporal masking works, why it’s so effective, and how it contributes to the MP3 format’s popularity.

Understanding the Concept of Temporal Masking

Temporal masking relies on a natural limitation in human hearing. When a loud sound occurs, it “masks” any softer sounds that happen shortly before or after it. This concept allows MP3 encoders to eliminate certain sounds that we wouldn’t notice anyway. When I first worked with audio files, I found that removing imperceptible sounds significantly reduced file size, and temporal masking does this efficiently by focusing on sounds that we truly register.
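The "shortly before or after" window can be sketched as a simple function of time. Masking after a loud sound (post-masking) is commonly described as lasting on the order of 100 to 200 ms, while masking before it (pre-masking) is much shorter, on the order of a few milliseconds up to about 20 ms. The sketch below assumes windows of 20 ms and 200 ms and a linear fade in dB; these numbers and the linear shape are illustrative assumptions, not measured curves.

```python
def temporal_mask_db(masker_level_db, offset_ms,
                     pre_window_ms=20.0, post_window_ms=200.0):
    """Level below which a sound near a loud masker is assumed inaudible.

    offset_ms < 0: the sound occurs before the masker (pre-masking).
    offset_ms >= 0: the sound occurs after the masker (post-masking).
    Returns None when the sound is outside both windows.
    The window lengths and the linear fade are hypothetical.
    """
    if -pre_window_ms <= offset_ms < 0:
        remaining = 1.0 + offset_ms / pre_window_ms
    elif 0 <= offset_ms <= post_window_ms:
        remaining = 1.0 - offset_ms / post_window_ms
    else:
        return None
    return masker_level_db * remaining


def is_masked(probe_level_db, masker_level_db, offset_ms):
    threshold = temporal_mask_db(masker_level_db, offset_ms)
    return threshold is not None and probe_level_db < threshold


# A 40 dB sound near an 80 dB masker, at different time offsets.
for offset in (-30, -5, 10, 150, 300):
    print(offset, "ms:", is_masked(40, 80, offset))
```

Under these assumptions the quiet sound is hidden 5 ms before and 10 ms after the masker, but not 30 ms before or 150 ms after it.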

Why Temporal Masking is Essential for MP3 Compression

Compression is crucial for reducing file sizes in today’s digital world. Temporal masking plays a central role in MP3 compression by cutting out unnecessary data. For example, in a complex piece of music, many faint details would go unnoticed because they are hidden by louder parts. Removing these masked sounds through temporal masking lets MP3s keep essential audio data, which saves space while retaining quality. This technique is foundational to making MP3 one of the most popular audio formats.

How Temporal Masking Differs from Frequency Masking

While temporal masking is about timing, frequency masking is about pitch. Frequency masking occurs when a loud sound within a particular frequency range makes it hard to hear quieter sounds within that same range. I’ve noticed in audio engineering that using both masking techniques together results in smaller files that still sound true to the original recording. Temporal and frequency masking are like two sides of a coin, working together to maximize compression without sacrificing audio integrity.

Temporal Masking’s Impact on Different Music Genres

Not all music is affected by temporal masking in the same way. For example, classical music, with its vast dynamic range, may not be ideal for aggressive masking techniques. In contrast, pop or electronic music, which often has a steady volume level, may compress more efficiently. From my experience, temporal masking tends to work well with most genres, but the subtleties of softer genres require a careful approach to prevent audible degradation.

Potential Drawbacks of Temporal Masking in Low-Bitrate MP3 Files

While temporal masking is effective, low-bitrate MP3s can reveal its limitations. The lower the bitrate, the more audio data is discarded, and eventually the encoder has to remove or coarsen sounds that are not fully masked, so the loss becomes audible. This can result in a “washed-out” or less detailed sound. Higher bitrates, on the other hand, preserve more of the original sound while still using masking techniques to keep file sizes manageable. When I’ve used low-bitrate files for streaming, I’ve often found these effects more pronounced, especially in genres with delicate nuances like jazz or folk.
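A quick calculation shows how tight the budget gets at low bitrates. For MPEG-1 Layer III, each frame carries 1152 samples and its size in bytes is 144 × bitrate / sample rate, rounded down, plus an optional padding byte. The sketch below applies that formula and compares the result with uncompressed CD audio (44.1 kHz, 16-bit, stereo); the function names are my own.

```python
def frame_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Size in bytes of one MPEG-1 Layer III frame (1152 samples)."""
    return 144 * bitrate_bps // sample_rate_hz + padding


def compression_ratio(bitrate_bps, sample_rate_hz=44100, bit_depth=16, channels=2):
    """How many times smaller the stream is than uncompressed PCM."""
    return sample_rate_hz * bit_depth * channels / bitrate_bps


for kbps in (64, 128, 320):
    bps = kbps * 1000
    print(kbps, "kbps:",
          frame_bytes(bps, 44100), "bytes per frame,",
          round(compression_ratio(bps), 1), "x smaller than CD audio")
```

At 128 kbps a frame has 417 bytes to describe 1152 samples per channel; at 64 kbps it has about half that, so the encoder must discard far more, including sounds that masking alone would not have hidden.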

Temporal Masking in Other Audio Formats

Temporal masking isn’t exclusive to MP3; it’s used in AAC, Ogg Vorbis, and many other formats. This technique is nearly universal in lossy audio compression because it’s so effective. Each format, however, has its own approach to applying masking, depending on its design goals and target users. When working with these various formats, I’ve noticed that temporal masking works particularly well in AAC, which is known for maintaining quality at lower bitrates. This adaptability makes temporal masking an invaluable tool in digital audio compression.

Advanced Insights: Beyond Basic Temporal Masking

Beyond simple masking, advanced algorithms can dynamically adjust the intensity of temporal masking based on the audio’s complexity. In my experience, these adaptive methods allow for higher quality at lower bitrates. Some audio codecs even fine-tune masking based on the listener’s hearing profile, a fascinating application that takes masking to a personalized level. By diving deeper into these nuanced adjustments, we can see how temporal masking continues to evolve, making modern audio compression even more efficient.

Final Words on Temporal Masking in MP3

Temporal masking remains a key factor in MP3’s widespread use, enabling smaller files while maintaining good sound quality. With today’s advancements, it’s more sophisticated than ever, allowing us to enjoy high-quality audio even in compressed formats. If you’re looking to get the most out of your MP3 files, Mp4Gain offers a solution to enhance audio clarity by ensuring optimal encoding.

Frequently Asked Questions about Temporal Masking in MP3

What is temporal masking in MP3?

Temporal masking in MP3 is an audio compression technique where sounds occurring within a short time frame of a louder sound are masked, or made inaudible to the human ear. This allows MP3 encoders to remove parts of the audio without affecting perceived quality, making file sizes smaller.

How does temporal masking improve MP3 quality?

Temporal masking helps improve MP3 quality by removing sounds that are not easily detected by human hearing, focusing only on the most important audio data. This enhances audio clarity while reducing file size, providing a high-quality listening experience even in compressed formats.

What is the difference between temporal masking and frequency masking?

While temporal masking hides sounds based on timing, frequency masking works by concealing sounds that fall within the same frequency range as louder sounds. Both techniques are used in MP3 compression to optimize audio quality and reduce file size.

Why is temporal masking used in audio compression?

Temporal masking is used in audio compression to eliminate sounds that listeners likely won’t hear, allowing for smaller file sizes without compromising sound quality. This efficiency is crucial for formats like MP3, where maintaining quality with reduced data is essential.

Does temporal masking affect all types of music equally?

Temporal masking can have different effects on various music genres. For instance, fast-paced genres like electronic or rock may experience more audible compression effects compared to slower genres, where subtle nuances are less likely to be masked.

Can temporal masking reduce sound quality in MP3s?

While temporal masking is designed to maintain sound quality, excessive compression can sometimes lead to noticeable losses in detail. However, with standard MP3 compression settings, temporal masking typically preserves sound quality effectively.

Is temporal masking used in other audio formats besides MP3?

Yes, temporal masking is commonly used in many compressed audio formats, including AAC and Ogg Vorbis. This technique is essential across various formats to reduce file sizes while keeping the audio quality as high as possible.

How does temporal masking affect low-bitrate MP3 files?

In low-bitrate MP3 files, temporal masking effects can become more apparent as more data is removed, potentially leading to a less natural sound. Higher bitrates typically allow for better masking and preservation of audio quality.

Comments:

I didn’t realize how much temporal masking impacts the audio quality of MP3 files. This article explains so much! Thanks for sharing.

Been looking for this info. Always wondered why some sounds just blend in, and now I get it’s the temporal masking effect!

Great article. I learned a lot about MP3 audio compression and how temporal masking is used. Never saw it explained so clearly before.

Good read, but I’d love to see more on how temporal masking affects specific genres like metal or jazz. Very curious about that.

This is very informative. The way temporal masking works in MP3 files really changed how I look at compressed audio formats.

Can anyone explain how this works with low bit rate MP3s? Are the temporal masking effects more noticeable?

Glad to finally understand what makes MP3s different from other audio formats. Temporal masking is such a cool feature!

So helpful! I’m studying audio engineering and this really helped me understand compression on a deeper level.

Well-explained! It would be great if you could add some diagrams to show how temporal masking works over time.

I never thought MP3s had such detailed processing behind them. Amazing article, thank you!

Wow, this article goes deep. Definitely learned something new about temporal masking and why it’s so effective in MP3s.

Couldn’t have explained it better! Temporal masking is such an important concept, and you did it justice.

As a DJ, understanding MP3 compression is huge. This article gave me a lot more respect for the tech behind MP3s.

Really useful breakdown of a complex topic. Temporal masking makes so much more sense now!

Just what I needed! Been curious about temporal masking, and this article answered all my questions.

Encoding an mp3

What is masking

The lossy MP3 audio compression algorithm exploits a limitation of human hearing called auditory masking. In 1894, the American physicist Alfred M. Mayer reported that a tone could be rendered inaudible by another tone of lower frequency. In 1959, Richard Ehmer described a complete set of auditory curves for this phenomenon. Between 1967 and 1974, Eberhard Zwicker worked on the tuning and masking of critical frequency bands, which in turn built on the fundamental research in this area by Harvey Fletcher and his collaborators at Bell Labs.

Perceptual coding was first used for speech compression with linear predictive coding (LPC), which has its origins in the work of Fumitada Itakura (Nagoya University) and Shuji Saito (Nippon Telegraph and Telephone) in 1966. In 1978, Bishnu S. Atal and Manfred R. Schroeder of Bell Labs proposed an LPC speech codec called adaptive predictive coding, which used a psychoacoustic coding algorithm that exploited the masking properties of the human ear. Further optimization by Schroeder and Atal with J. L. Hall was described in a 1979 paper. In the same year, M. A. Krasner proposed a psychoacoustic masking codec; he published his results and produced hardware for speech (not usable for compressing music), but because the results appeared in a relatively obscure Lincoln Laboratory technical report, they did not immediately influence the mainstream of psychoacoustic codec development.

The discrete cosine transform (DCT), a type of transform coding for lossy compression, was proposed by Nasir Ahmed in 1972 and developed by Ahmed with T. Natarajan and K. R. Rao in 1973; they published their results in 1974. This led to the modified discrete cosine transform (MDCT), proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987 following earlier work by Princen and Bradley in 1986. The MDCT later became a core part of the MP3 algorithm. Ernst Terhardt et al. built an algorithm that describes auditory masking with high precision in 1982. This work added to many reports by authors dating back to Fletcher, as well as to the work that originally defined critical ratios and critical bandwidths.

In 1985, Atal and Schroeder introduced code-excited linear prediction (CELP), an LPC-based perceptual speech coding algorithm with auditory masking that achieved a significant degree of data compression for its time. The peer-reviewed IEEE Journal on Selected Areas in Communications reported on a wide variety of (mainly perceptual) audio compression algorithms in 1988. Its February 1988 “Voice Coding for Communications” issue reported on a wide range of established, working audio bit-compression technologies, some of which used auditory masking as part of their core design and several of which showed real-time hardware implementations. – https://ru.qaz.wiki/wiki/MP3

ENCODING PRINCIPLES OF THE MP3 FORMAT.

MP3, in full MPEG-1, MPEG-2 and MPEG-2.5 Audio Layer 3, is one of the most popular and widespread standards for storing audio data.

In this article, we will not delve into the history of creation and further development, but will consider the basic principles of the standard and examples of its implementation.

The MP3 standard does not prescribe a specific compression algorithm for encoding the source data; it describes the permitted methods in general terms.

The quality of the result therefore depends on the variant of the algorithm built into the encoding program (the “codec”) and on the quality of the original audio data.

There are three common variants of the MP3 format, which differ in the bit rates and sample rates they support.

Name (modification of the standard) | Data rate per second (bit rate) | Possible sample rates
MPEG-1 Layer 3 | 32 – 320 kbps | 32000 Hz, 44100 Hz, 48000 Hz
MPEG-2 Layer 3 | 16 – 160 kbps | 16000 Hz, 22050 Hz, 24000 Hz
MPEG-2.5 Layer 3 | 8 – 160 kbps | 8000 Hz, 11025 Hz

Processing begins by dividing the original audio signal into frames of equal duration (an MPEG-1 Layer 3 frame holds 1152 samples, which is about 0.026 seconds at 44100 Hz). Each frame is then analyzed and compressed according to common or individual parameters, taking the data of the previous and next frames into account.

Most of the compression algorithms used are based on the perceptual characteristics of the human ear. Let’s consider the main options, which, as a rule, are applied in a complex way.

It is worth starting with the fact that the average person can hear a frequency range of approximately 20 Hz to 20,000 Hz. With age, changes occur in the auditory system and, for most people, sensitivity to the higher frequencies decreases. For this reason, some MP3 encoders cut off all frequencies above 16000 Hz during compression, which can significantly reduce the amount of information.
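The cutoff described above can be illustrated with a small sketch. It assumes a hard cutoff applied by zeroing DFT bins, which is only a simplification for illustration; real encoders work with filter banks, and the names `lowpass_dft` and `tone` are made up for this example.

```python
import cmath
import math

def lowpass_dft(samples, sample_rate, cutoff_hz):
    """Zero every DFT bin above cutoff_hz and return the filtered samples."""
    n = len(samples)
    # Naive O(n^2) forward DFT; acceptable for a short illustrative block.
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k describe the same frequency (positive and negative).
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    # Inverse DFT; for real input the imaginary part is only rounding error.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def tone(freq_hz, sample_rate, n):
    """n samples of a sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

RATE, N = 44100, 441  # 441 samples give a bin spacing of exactly 100 Hz
low = tone(1000, RATE, N)
high = tone(18000, RATE, N)
mixed = [a + b for a, b in zip(low, high)]
filtered = lowpass_dft(mixed, RATE, 16000)  # the 18 kHz tone is removed
```

After filtering, only the 1 kHz tone should remain, since the 18 kHz component lies above the 16000 Hz cutoff.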

Audio recordings can be encoded in stereo (separate channels for the left and right speakers, which creates a spatial effect) or in mono (a single channel). In joint-stereo mode, MP3 does not store a separate track for each speaker; it stores what the channels have in common together with information about the differences between the left and right channels.

In acoustics there is the concept of “harmonics”: the frequencies that sound together with the main, most prominent tone. For example, when a drum is hit, the loudest sound is the tone, and the weaker accompanying sounds are the harmonics.

After such a loud sound comes the so-called “period of deafness”, a span of time during which a person’s hearing barely responds to changes.

If all frequencies are removed during the “period of deafness”, the limits of perception make their absence practically impossible to notice. For the same reason, the weakest harmonics located close to the strongest sounds (the tones) are cut off during compression.

Another method replaces closely spaced peak values of the signal (in terms of volume) with an average value.

Bit rate is a value that describes the number of bits of information transmitted per unit of time, usually one second.
The higher the bit rate, the better the audio detail will be, provided the original, uncompressed audio data is of high quality.

As you might guess, digital formats consist of code sequences, in other words sequences of 0s and 1s.
To save space, sequences that occur frequently within a file are assigned short unique identifiers that replace the long sequences.

Thanks to this combination of techniques, the original audio signal can be compressed into one of the most popular lossy formats: MP3.

Many experiments have been carried out to find out how significant the differences are before and after MP3 compression. As the tests have shown, it was not always possible to tell the two apart quickly, even when they were played back on high-fidelity equipment.

For those who have never had the opportunity to compare an original and a compressed recording directly, it will in most cases take some time to find even obvious differences.

MP3 ENCODING

The first step for the user when encoding is to specify a bit rate. This determines the quality and, at the same time, the storage requirement of an MP3 file.

COMPRESSION RATES

With most recording programs, the quality of an MP3 file can be freely selected before recording begins. According to the Fraunhofer Institute, an MP3 file reaches CD quality at a bit rate of 112 to 128 kbit/s; other measurements put CD quality at up to 160 kbit/s. However, the most widely used rate, and one that is sufficient for most listeners, is 128 kbit/s.

In comparison, the corresponding CD quality requires 384 kbit/s for Layer 1 and 256 kbit/s for Layer 2. A wave file works with a bit rate of about 1.4 Mbit/s and therefore has roughly the same space requirements as a CD audio track (CDA).

A CD holds 74 or 80 minutes of music (depending on the size of the medium); in MP3 format at a bit rate of 128 kbit/s, 11.5 or 12.4 hours would be possible.
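As a rough check of these figures, the arithmetic can be sketched as follows. The data capacities of 650 and 700 MiB for the two disc sizes are assumptions made for this example, and the function names are illustrative; the exact number of hours depends on the capacity assumed, so the result only lands in the same range as the values quoted above.

```python
def pcm_bit_rate(sample_rate=44100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate * bits_per_sample * channels

def hours_of_audio(capacity_bytes, bit_rate):
    """Playing time in hours that fits into capacity_bytes at bit_rate (bit/s)."""
    return capacity_bytes * 8 / bit_rate / 3600

MIB = 1024 * 1024
for capacity_mib in (650, 700):  # assumed data capacities of 74- and 80-minute discs
    hours = hours_of_audio(capacity_mib * MIB, 128_000)
    print(capacity_mib, "MiB at 128 kbit/s:", round(hours, 1), "hours")

# CD audio: 44100 samples/s, 16 bits, 2 channels, about 1.4 Mbit/s.
print("PCM bit rate:", pcm_bit_rate(), "bit/s")
```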

PSYCHOACOUSTICS

MP3 audio compression relies on filtering out unnecessary information. Psychoacoustics is a science that deals with the perception of sound by the human ear.

For example: you are in a disco. Loud music blasts from huge speakers while you try to talk to someone. This is almost impossible unless you yell. In acoustics, this is called masking. To overcome the masking, the sound level of the speech has to be raised until the interfering signal (in this case the music) no longer covers it.

Processes like this belong to the fundamental areas of psychoacoustics.

Tones below the masking threshold are not heard and are therefore skipped during MP3 encoding.

Masking works as follows: take, for example (Fig. 2), a tone at 1 kHz (1) and another tone at 1.1 kHz that is approximately 18 dB quieter (2). The second tone is completely masked by the first. The same applies to other weaker tones (see Fig. 2). Another tone at 2 kHz that is also 18 dB quieter than the first would not be masked, because it lies just outside the masking threshold of the first tone.

Noise offers another opportunity for compression in MP3 encoding. Because a sound cannot be digitized with infinite precision, quantization noise is generated; at a sufficient resolution it is imperceptible to the human ear. The MPEG audio layers build this into their model and allow more noise around a tone. Loud, short tones in particular mask a certain range before and after themselves in the frequency range, in which weaker signals would not be audible. With MP3 encoding, the noise level is allowed to rise in this region, as if the signal had been digitized at a lower resolution.

There is also masking in the time domain: after loud sounds, hearing needs a so-called “recovery time” until it is fully sensitive to quiet sounds again. This is especially noticeable with strong, short, rapidly rising tones. After a delay of about 5 ms the hearing threshold begins to drop again, and after about 200 ms it reaches its normal level, the so-called resting hearing threshold. This effect is called post-masking. The effect of pre-masking is less important but even more impressive: it is based on the fact that the brain processes loud sounds more quickly than soft ones. To some extent, the strong impulse overtakes the quiet one on the way to the brain. This results in a pre-masking time of up to 20 ms.
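The timing side of this effect can be sketched as a simple window check. This is a minimal sketch that uses only the 20 ms and 200 ms figures from the text; it ignores sound levels entirely, which a real psychoacoustic model would also take into account, and the function name is made up for illustration.

```python
PRE_MASKING_MS = 20    # upper bound for pre-masking quoted in the text
POST_MASKING_MS = 200  # time until the resting hearing threshold is reached again

def in_masking_window(masker_time_ms, sound_time_ms):
    """Return True if a quiet sound falls into the temporal window of a loud one."""
    offset = sound_time_ms - masker_time_ms
    if offset < 0:
        return -offset <= PRE_MASKING_MS  # quiet sound precedes the loud one
    return offset <= POST_MASKING_MS      # quiet sound follows the loud one

# A loud impulse at t = 1000 ms and four quiet sounds around it.
for t in (970, 990, 1150, 1300):
    print(t, in_masking_window(1000, t))
```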

The psychoacoustic model described above is applied in the following steps:
– Audio information is divided into subbands
– Subbands are reduced
– 16-bit samples are generated
– Samples are compressed
– Compressed samples are combined into blocks
– Coding according to the Huffman procedure
– Summary in frames

DIVIDED INTO SUBBANDS

The acoustic information is divided by frequency into 32 subbands.

The division is done with the help of a polyphase filter bank. This means that the samples are decimated and filtered simultaneously.

In Layers 1 and 2, the bands are all the same size; with a 20 kHz range split into 32 bands, each has a bandwidth of 625 Hz. Layer 3 starts from the same equal-width split but subdivides the bands further and groups the result into bands of different sizes, adapted to the human ear according to a psychoacoustic model. The purpose of this division is to give the algorithm smaller, better-defined targets to work on.
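For the equal-width split, the band that a given frequency falls into is simple arithmetic. The sketch below assumes that the 32 bands evenly cover the range from 0 Hz to half the sample rate; the function names are illustrative only.

```python
NUM_SUBBANDS = 32

def subband_width(sample_rate):
    """Width in Hz of each equal-width subband (half the sample rate / 32)."""
    return (sample_rate / 2) / NUM_SUBBANDS

def subband_index(freq_hz, sample_rate):
    """Zero-based index of the subband that contains freq_hz."""
    if not 0 <= freq_hz < sample_rate / 2:
        raise ValueError("frequency outside the representable range")
    return int(freq_hz // subband_width(sample_rate))

print(subband_width(40000))       # a 20 kHz range split into 32 bands
print(subband_width(44100))       # CD sample rate
print(subband_index(1000, 44100))
```

With a 20 kHz range the width comes out at the 625 Hz mentioned above; at the CD sample rate of 44100 Hz the bands are slightly wider.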

SUBBAND REDUCTION

The MP3 encoder now examines each subband for expendable frequencies according to the psychoacoustic model. The masking threshold is determined, and the subbands whose level lies below this threshold are removed. Another reason for dropping an entire subband may be that its pitch makes it inaudible, as with a dog whistle.
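The decision itself can be sketched in a few lines. The levels and thresholds below are hypothetical numbers chosen for illustration, and `reduce_subbands` is not a function of any real encoder; in practice the thresholds come from the psychoacoustic model.

```python
def reduce_subbands(levels_db, thresholds_db):
    """Keep a subband's level only if it exceeds its masking threshold.

    Returns a list with None for every subband that is dropped.
    """
    if len(levels_db) != len(thresholds_db):
        raise ValueError("need one threshold per subband")
    return [
        level if level > threshold else None
        for level, threshold in zip(levels_db, thresholds_db)
    ]

# Hypothetical levels and masking thresholds for four subbands, in dB.
kept = reduce_subbands([60, 20, 45, 5], [30, 36, 30, 12])
print(kept)
```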

CONVERSION INTO 16-BIT SAMPLES

The frequency bands are sampled and converted to 16-bit samples. Tones are broken down into digital signals and processed further as numerical values. The sample rate determines the length of the sampling intervals. However, neither the measurement of the amplitude nor the size of the sampling intervals can be infinitely precise. For this reason, analog-to-digital conversion rounds each value to the nearest available level. This produces rounding errors, which are perceived as so-called quantization noise. The noise can be kept inaudible by using the highest possible resolution: 8 bits can represent a maximum of 256 levels, 12 bits 4096 and 16 bits 65536 individual steps, so that the noise is not heard.

However, some samples are also digitized at a lower resolution. Suppose, for example, that the eighth subband contains a 1 kHz tone at 60 dB. The MPEG audio encoder calculates the masking threshold and finds that it lies at 36 dB. The acceptable signal-to-noise ratio is therefore 60 − 36 = 24 dB, which corresponds to a resolution of 4 bits, since the two values are directly related: leaving out one bit of resolution raises the noise level by 6 dB. Since an audio CD is generally digitized with 16 bits, considerable data reduction can be achieved here.
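The relationship between resolution and noise can be worked through in code. This sketch reads the example as a 60 dB tone with a masking threshold at 36 dB (an interpretation of the text) and uses the common rule of thumb of about 6.02 dB per bit; the function names are made up for illustration.

```python
import math

DB_PER_BIT = 6.02  # each bit of resolution changes the noise level by about 6 dB

def quantization_levels(bits):
    """Number of distinct amplitude steps available at a given resolution."""
    return 2 ** bits

def bits_needed(signal_db, masking_threshold_db):
    """Smallest resolution whose noise stays at or below the masking threshold."""
    required_snr_db = signal_db - masking_threshold_db
    if required_snr_db <= 0:
        return 0  # the signal itself lies below the threshold
    return math.ceil(required_snr_db / DB_PER_BIT)

print([quantization_levels(b) for b in (8, 12, 16)])
print(bits_needed(60, 36))  # 24 dB signal-to-noise ratio
```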

SAMPLE COMPRESSION

The next step is to compress the samples further. However, this process no longer has anything to do with the original tones. From here on, compression is purely data-driven.

Each sample consists of 16 bits, but not all of them are needed to represent a level. For example, leading zeros can be omitted. If the value 0000011101010101 is obtained for a sample, the algorithm truncates it to 11101010101. To reconstruct the original 16 bits, the decoder needs two pieces of information: the scale factor and the bit allocation. The scale factor indicates where the remaining bits of the sample were located in their original state. The bit allocation records how many bits are left in the sample, since a fixed length of 16 bits can no longer be assumed. However, if you were to store these values individually for each sample, you wouldn’t gain much,
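The truncation example can be reproduced with a small sketch. It treats the count of leading zeros as a stand-in for the position information and the length of the remaining bit string as the bit count; this per-sample toy is a simplification for illustration, and `pack_sample` and `unpack_sample` are hypothetical names.

```python
WORD_BITS = 16

def pack_sample(value):
    """Drop leading zeros; return (leading_zero_count, significant_bits)."""
    if not 0 <= value < 2 ** WORD_BITS:
        raise ValueError("expected an unsigned 16-bit value")
    significant = format(value, "b") if value else ""
    return WORD_BITS - len(significant), significant

def unpack_sample(leading_zeros, significant):
    """Rebuild the original 16-bit string from the two stored pieces."""
    return "0" * leading_zeros + significant

zeros, bits = pack_sample(0b0000011101010101)
print(zeros, bits, len(bits))
print(unpack_sample(zeros, bits))
```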

GROUPING THE SAMPLES

The samples that were just created are now combined into blocks. There are two block lengths for this purpose: short blocks with twelve samples and long blocks with 36 samples.

Long blocks are used for low frequencies. At higher frequencies, however, long blocks would not allow sufficient resolution, so short blocks are used there. In the so-called mixed block mode, long blocks are used for the two frequency bands with the lowest frequencies and short blocks for the remaining 30 frequency bands. This mode allows better frequency resolution at the low frequencies without sacrificing resolution at the high frequencies.
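The layout of the mixed block mode can be written down directly. This is only a sketch of the assignment described in the text (two long bands, 30 short bands); the function name and the string labels are illustrative.

```python
LONG_BLOCK_SAMPLES = 36
SHORT_BLOCK_SAMPLES = 12

def mixed_block_layout(num_subbands=32, long_bands=2):
    """Block type per subband in mixed block mode, lowest band first."""
    return ["long" if band < long_bands else "short" for band in range(num_subbands)]

layout = mixed_block_layout()
print(layout.count("long"), layout.count("short"))

# Three short blocks cover the same number of samples as one long block.
print(LONG_BLOCK_SAMPLES // SHORT_BLOCK_SAMPLES)
```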

HUFFMAN CODING

The last step in MP3 compression is Huffman coding. This algorithm is also used, for example, in archiving programs such as WinZip. What matters here is how often particular values occur. The subbands are ordered in advance: subbands with lower frequencies tend to contain significantly more values than those with high frequencies. The subbands are divided into three regions according to their frequency, and each region has its own Huffman tree (Fig. 3) in order to achieve the best compression factor.

As a first step, the encoder sets aside the high frequencies; they do not need to be encoded, because the size of this region can be derived from the sizes of the other two. The mid-frequency region is treated as it is, and the low frequencies are divided again into three regions, each of which is assigned its own Huffman tree. Which Huffman tree applies to each region is recorded in the MP3 file.

A Huffman code works as follows: frequently occurring values are given a short bit sequence, while rare values are given a long one. The algorithm therefore first determines the distribution of values within the data to be compressed.

To build the Huffman tree, you start with the two rarest values. One is assigned a “0” and the other a “1”. The two values are then combined into a single node that is represented by the sum of their frequencies. The same is done with the next two rarest entries, and the process ends when only one node remains. The result is a tree structure on which the encoding is based: each left branch receives a “0” and each right branch a “1”. In our little example, the least common value, 4, would be represented by the bit sequence 010, while the most common value, 6, is assigned a simple 1.
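The tree-building procedure can be sketched in Python with the standard `heapq` module. The sample data is hypothetical and does not reproduce the figure from the text, so the exact bit patterns may differ from the example above; the sketch only illustrates the general principle that the rarest value ends up with the longest code and the most common one with the shortest.

```python
import heapq
from collections import Counter

def huffman_codes(values):
    """Build a prefix code: frequent values get short codes, rare ones long codes."""
    counts = Counter(values)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, tie_breaker, {value: code_so_far}).
    heap = [(freq, i, {value: ""}) for i, (value, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Take the two rarest entries and merge them under a new node.
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in left.items()}       # left branch gets 0
        merged.update({v: "1" + c for v, c in right.items()})  # right branch gets 1
        heapq.heappush(heap, (f0 + f1, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical data in which 6 is the most common value and 4 the rarest.
data = [6] * 8 + [5] * 4 + [7] * 2 + [4] * 1
codes = huffman_codes(data)
encoded = "".join(codes[v] for v in data)
print(codes, len(encoded), "bits")
```

Because no code is a prefix of another, the encoded bit string can be decoded unambiguously without separators between the codes.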

SUMMARY IN FRAMES

The result of the compression described above is packed into so-called frames. Each frame contains 1152 samples (32 subbands × 36 samples). A frame consists of a header, a checksum, the actual audio data and, in certain circumstances, a so-called bit reservoir. Such a reservoir arises when the samples within a frame can be compressed so well that the full theoretical number of bits in the frame is not needed. The encoder can fall back on this reservoir if the available bits are insufficient for a subsequent frame. A distinction must be made between two terms: frame size and frame length.

The frame size is determined by the number of samples and is constant within a layer. In Layer 1 it is always 384 samples per frame; in Layers 2 and 3 it is 1152 samples per frame. The frame length, however, may vary in Layer 3 because of changes in bit rate or because of the bit reservoir. The frame also contains the scale factor and bit allocation information mentioned earlier, so that all the samples can be reconstructed.
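The difference between the two terms can be made concrete with a short calculation. The sketch below uses the commonly cited frame length formula for MPEG-1 Layer 3, 144 × bit rate / sample rate plus an optional padding byte (144 being 1152 samples divided by 8 bits per byte); the function names are illustrative.

```python
SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}  # frame size for MPEG-1 Layers 1 to 3

def frame_duration_ms(layer, sample_rate):
    """Playing time of one frame in milliseconds (depends only on the frame size)."""
    return 1000 * SAMPLES_PER_FRAME[layer] / sample_rate

def layer3_frame_bytes(bit_rate, sample_rate, padding=0):
    """Frame length in bytes of one MPEG-1 Layer 3 frame (padding is 0 or 1)."""
    return 144 * bit_rate // sample_rate + padding

print(frame_duration_ms(3, 44100))        # same for every bit rate
print(layer3_frame_bytes(128_000, 44100))
print(layer3_frame_bytes(320_000, 44100))
```

The frame size, and with it the duration, stays constant, while the frame length in bytes grows with the bit rate.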

A file header, as it is known from other file formats, does not exist in an MP3 file. In the case of an image file, a header would contain information about the entire image (e.g. size, color depth, resolution

MP3 COMPRESSION

To achieve such a dramatic reduction in the number of bits required to transmit an audio signal, MP3 uses several techniques. These include techniques based on perceptual coding and others such as the byte reservoir, joint stereo and Huffman codes. Perceptual coding consists of removing all the information in the audio signal that the human ear is not capable of detecting. We will now describe them:

PERCEPTUAL CODING

Minimum hearing threshold: The ear’s minimum hearing threshold is the power below which a tone at a given frequency cannot be detected by the ear. This threshold is non-linear. As we see in the figure, which shows the Fletcher–Munson curves, the frequencies we hear best are those between 2 and 5 kHz. Frequencies outside that band are therefore less essential, since they are hardly perceived, and content of the audio signal at those frequencies can be removed when it lies below the threshold.

As we can see in the drawing, the range in which the least power is needed for a tone to be heard is between 2 and 4 kHz.
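The shape of this curve can be approximated numerically. The sketch below uses Terhardt’s widely used approximation of the threshold in quiet, which is an assumption brought in for illustration and not the curve from the figure; the function name is made up.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt's approximation)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

# Scan the audible range in 100 Hz steps and find the most sensitive frequency.
candidates = range(100, 16000, 100)
most_sensitive = min(candidates, key=threshold_in_quiet_db)
print(most_sensitive, round(threshold_in_quiet_db(most_sensitive), 1))
```

In this approximation the minimum falls inside the 2 to 5 kHz band described above, and the threshold rises steeply toward both very low and very high frequencies.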

The masking effect: When an audio signal contains a tone at a given frequency, that tone masks the frequencies close to it. If the signal at these nearby frequencies does not exceed a certain power threshold, it cannot be heard, and it is therefore not necessary to encode it. The shape that this power threshold takes, depending on the position of the masking tone or tones, is what is called the psychoacoustic model. As the name indicates, it is a perception model that tries to emulate the perception of the human ear.

In this graph we can see that if we place a tone of 60 dB at 1 kHz (the masking tone) and then add another tone at, for example, 1.1 kHz and vary its power, it is not possible to detect the presence of this second tone until its power exceeds the threshold shown in the figure.

In this case we see various masking tones and the resulting new hearing thresholds. In MP3, the spectrum to be transmitted is divided into frequency subbands; the power of each subband is evaluated and a masking threshold is created in the nearby subbands. Nearby subbands that exceed that power threshold are coded, and those that do not exceed it are not coded.
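How a strong subband raises the threshold in its neighbours can be illustrated with a toy model. The offset of 10 dB and the slope of 15 dB per band are invented parameters, not values from the MP3 standard, and the function names are hypothetical; a real psychoacoustic model is considerably more elaborate.

```python
def masking_thresholds(levels_db, offset_db=10.0, slope_db_per_band=15.0):
    """Toy threshold per subband: the strongest masking received from any other band."""
    thresholds = []
    for i in range(len(levels_db)):
        contributions = [
            level - offset_db - slope_db_per_band * abs(i - j)
            for j, level in enumerate(levels_db) if j != i
        ]
        thresholds.append(max(contributions, default=float("-inf")))
    return thresholds

def bands_to_code(levels_db):
    """True for each subband whose level exceeds the threshold set by its neighbours."""
    thresholds = masking_thresholds(levels_db)
    return [level > t for level, t in zip(levels_db, thresholds)]

levels = [10, 60, 30, 40, 5]  # hypothetical subband levels in dB
print(masking_thresholds(levels))
print(bands_to_code(levels))
```

In this example the loud second band masks both of its direct neighbours, while the fourth band is far enough away and strong enough to be coded.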

Furthermore, masking occurs not only in frequency but also in time, as we can see in the figure.

The byte reservoir: Often, some passages of a musical piece cannot be encoded at the given rate without affecting the quality of the music. MP3 then uses a small byte reservoir that acts as a buffer, drawing on the spare capacity of passages in the stream that can be encoded at a lower rate.
Joint stereo: In the case of a stereo signal, the MP3 format can use a few more tools to compress the data further.
Intensity stereo (IS): The human ear is not able to locate the spatial origin of sounds with complete certainty at very high or very low frequencies. This technique takes advantage of that by recording some frequencies as a monophonic signal, so that only a minimum of spatial content is lost from the sound.
Mid/Side (M/S) stereo: When the left and right channels are similar, a middle channel (L + R) and a side channel (L − R) are created and encoded instead of the left and right channels themselves. In this way the transmitted data can be reduced by using fewer bits for the side channel. During playback, the MP3 decoder reconstructs the left and right channels.
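The mid/side transformation and its inverse can be sketched as follows. The sketch halves the sum and the difference so that the round trip is exact; this scaling is an assumption chosen for simplicity, and the function names are illustrative.

```python
def ms_encode(left, right):
    """Convert left/right samples into mid (L + R) / 2 and side (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Reconstruct the left and right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Two similar channels: the side channel stays close to zero.
left = [100, 102, 98, 101]
right = [100, 100, 100, 100]
mid, side = ms_encode(left, right)
print(mid, side)
print(ms_decode(mid, side))
```

When the channels are similar, the side values are small, which is why the side channel can be stored with fewer bits.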

Huffman coding: This coding technique is used at the end of the whole process. It works by creating variable-length codes, so that the symbols that appear most often in the bitstream have the shortest codes. The translation between symbols and codes is done using a table. No code is the prefix of another, so the codes can be decoded correctly despite their variable length. On average, this type of coding reduces the amount of data to be transmitted by 20%. It is an ideal complement to perceptual coding: during densely polyphonic passages, perceptual coding is very efficient because many sounds are masked, but little of the information is identical and Huffman’s algorithm becomes inefficient. During pure sounds there are few masking effects, but Huffman coding is very efficient because the digitized sound contains many repeating bytes.