how audio compression is carried out in mp3 Archives - Page 2 of 4

PCM conversion flow


Let’s summarize how analog music signals are digitized with PCM and recorded on a CD. PCM is an abbreviation for pulse code modulation.

The music signal is originally a continuous analog signal. A continuous waveform that undulates like a wave cannot be stored in the pits of a CD as is, so it is sampled first. Which points of the undulating wave should be taken as samples? They must of course be taken at regular intervals, and for CD the sampling rate is fixed at 44.1 kHz. The kHz is a unit of frequency: 1 kHz is 1,000 repetitions per second. In other words, the signal is sampled at the tremendous rate of 44,100 times per second. Sampling only reads the value of the wave at those instants; it does not mean the wave itself is chopped into pieces.

After sampling along the time axis in this way, the next question is how precisely to read each discrete data point. This step is quantization (the verb, less often used, is “to quantize”). Since the vertical axis of the graph is the signal level, that is, the magnitude, the precision is the number of steps into which the range up to the highest point of the wave is divided. It is expressed as a number of bits.

Bits are binary digits. Binary numbers grow by doubling, so as the number of bits increases, the number of values that can be expressed (number of steps = quantization precision) grows at an accelerating rate. The calculation is “2 raised to the power of the number of bits”. For example, 3 bits give 2 x 2 x 2 = 8 steps, while 5 bits give 2 x 2 x 2 x 2 x 2 = 32 steps. Continue like this and the numbers become enormous: 16 bits is 2 to the power of 16, that is, 2 multiplied by itself 16 times, which gives 65,536 steps. Remember it as “about 65,000 steps”.
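
As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `quantization_steps` is made up for this illustration.

```python
def quantization_steps(bits: int) -> int:
    """Return how many distinct levels a sample of the given bit depth can take."""
    return 2 ** bits


# The three bit depths mentioned in the text.
for bits in (3, 5, 16):
    print(bits, "bits ->", quantization_steps(bits), "steps")
```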

Even so, the result is not analog itself; yet when you play it on a CD player, the original continuous analog wave is reproduced, which is what makes digital so impressive. In practice, after quantization an encoding step is performed, and a 16-bit PCM digital signal such as “010011 … 10” is obtained.

Digital is strict, and there are in fact some rules. It is often said that “CD has a frequency range of 20 kHz and a dynamic range of 96 dB”. Both are determined solely by the format: the 20 kHz upper frequency limit comes from the sample rate, while the 16-bit quantization sets the dynamic range at 96 dB.

The reasoning is as follows. According to Shannon’s sampling theorem (named after a great scholar), frequencies up to almost half the sampling frequency (fs) can be recorded; for CD that is half of 44.1 kHz, or about 22 kHz. For quantization, there is a rule of thumb of 6 decibels per bit, which gives 6 x 16 = 96 decibels.
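
The two rules of thumb can be checked with a few lines of Python. This is only a sketch: the function names are invented here, and the dynamic range is computed as 20·log10(2^bits), which is where the “6 dB per bit” guideline comes from (20·log10(2) is about 6.02 dB).

```python
import math


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency the sampling theorem allows: half the sample rate."""
    return sample_rate_hz / 2


def dynamic_range_db(bits: int) -> float:
    """Ratio between the largest and smallest step, in decibels."""
    return 20 * math.log10(2 ** bits)


print(nyquist_limit_hz(44_100))        # theoretical limit for CD, in Hz
print(round(dynamic_range_db(16), 2))  # slightly above the rounded 96 dB
```

The quoted 20 kHz is a little below the theoretical limit of half the sample rate, in line with the “almost half” wording above.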

What are the sample rate, the number of quantization bits, and the clock?

There is some format jargon that you really need to know about CDs: the “sample rate” and the “number of quantization bits”.

Related to these, learning about the CD “clock” as well will deepen your understanding. It will also make the upcoming “How to Read Specifications / Optical Discs” easier to follow.

■ What is the sampling frequency and the number of bits?

Digital audio recorded on a CD has a 44.1 kHz sample rate and 16-bit quantization, right? Yes, that is correct. These figures have appeared several times so far, but this is the first time we explain them in detail from the basics.

First, let’s start with a mental picture. Sampling and quantizing sound esoteric, so think of them as slicing the signal vertically and then horizontally, like cutting a radish. First, cut it vertically with a kitchen knife. You can make many cuts, but the pieces were originally continuous. The solid curve is the analog audio, and the first step in digitizing it is this “vertical slicing”, which is sampling.

Next comes quantization. This time, turn the knife on its side and slice across the cuts you have already made. The radish is now divided into small squares. You can imagine that the finer the squares, the closer the result is to the original analog signal.

The CD format is the rule for how finely the radish (the analog signal) is cut. “The sampling frequency is 44.1 kHz and the number of quantization bits is 16” means that the signal is first sampled at a rate of 44,100 times per second, and then each level is read with an accuracy of 16 bits (2 to the 16th power steps). Sampling always comes first; without it, quantization cannot be done.
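
The “vertical slice, then horizontal slice” picture can be sketched in Python as follows. This is an illustration under assumptions, not how a real A/D converter is built: the input is a pure sine tone standing in for music, and the mapping to symmetric signed integer codes is just one simple convention.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, bits, n_samples):
    """Sample a unit-amplitude sine tone and round each sample to an integer code."""
    max_code = 2 ** bits // 2 - 1              # e.g. 32767 for 16 bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # vertical slice: one instant per sample
        level = math.sin(2 * math.pi * freq_hz * t)
        codes.append(round(level * max_code))  # horizontal slice: snap to a step
    return codes


fine = sample_and_quantize(1_000, 44_100, 16, 441)   # 10 ms at CD settings
coarse = sample_and_quantize(1_000, 44_100, 3, 441)  # same tone, only 3 bits
print(fine[:5])
print(sorted(set(coarse)))  # very few distinct levels remain at 3 bits
```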

What is the so-called bit rate?

A value indicating how many bits of information are processed or sent / received per unit of time.

Also called the transfer rate. The amount of information in one second of audio or video data is expressed in “bits per second” (bps). Because the numbers get large, it is usually combined with “k” (kilo, one thousand) or “M” (mega, one million) and written as “kbps” or “Mbps” (1 kbps is 1,000 bps; 1 Mbps is 1 million bps). The term is common in the audiovisual (AV) field: for audio and image data, the higher the value, the more detailed the information and the better the sound and picture quality. The standard bit rate for MP3, one of the audio compression formats, is 128 kilobits per second (kbps), which compresses an uncompressed WAV file of CD quality (approximately 1,400 kbps) to approximately one-tenth of the amount of information. Video bit rates are higher because there is more information: high-definition terrestrial digital broadcasting runs at about 18 megabits per second (Mbps), and high-definition BS digital broadcasting at about 24 Mbps. There is also a unit of transfer speed called “bytes per second” (Bps or B/s), which expresses the number of bytes per second. Since 1 byte is 8 bits, Bps can be calculated by dividing bps by 8.
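
The unit conversions in this entry amount to the following small Python sketch (the function name is invented for illustration).

```python
def kbps_to_bytes_per_second(kbps: float) -> float:
    """Convert kilobits per second to bytes per second."""
    bits_per_second = kbps * 1000  # 1 kbps = 1,000 bps
    return bits_per_second / 8     # 1 byte = 8 bits


print(kbps_to_bytes_per_second(128))     # standard MP3 bit rate
print(kbps_to_bytes_per_second(18_000))  # 18 Mbps terrestrial HD broadcast
print(128 / 1400)                        # MP3 relative to CD-quality WAV
```

The last line gives roughly 0.09, which is the “approximately one-tenth” quoted in the entry.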

Bit rate

It is the data communication speed, that is, the amount of data that can be sent and received in a certain period of time. The unit is “bps”, short for “bits per second”. The term is also used for the amount of data used to represent one second of video or audio when compressing it. The greater the amount of data (= the lower the compression rate), the more faithful the result is to the original, but a high-speed communication line is required.
On the other hand, as the amount of data is reduced (= the compression rate is raised), image and sound quality deteriorate, but transmission becomes possible even where the communication speed is slow.
⇨  bps, transmission.

The number of bits processed or transferred per unit of time. It is generally expressed per second, with bps as the unit. In a computer network it expresses the communication speed, and in data transfer with a peripheral circuit or device within a computer it expresses the transfer speed. It is also used to express the amount of information per second when compressing audio and video data; for the same compression method, the higher the value, the higher the sound and picture quality. ◇ Also called “bit rate” or “bit efficiency”.

What is the best way to use compressed sound sources like MP3, AAC and WMA correctly? Part 2

A practical balance of sound quality and file size is 128 kbps to 160 kbps
The key setting is the compression rate (= bit rate), expressed in “kbps”. Difficult theory aside, it is enough to think of the bit rate as a numerical indicator of sound quality.

A compressed format reduces the amount of data by discarding sounds that the human ear is unlikely to notice. The lower the bit rate, the smaller the file, but the more the higher frequencies are cut off. So if you lower the bit rate too much during encoding, the sound becomes somewhat hazy and muffled.

・～ 96 kbps …… The sound lacks extension, so it is best suited to talk-centered radio programs and the like.
・ 128 kbps …… Few listeners will find anything objectionable. Suitable for pop and rock on PC speakers and car audio.
・ 160 kbps …… Satisfying sound quality even on ordinary audio equipment. Suitable for jazz played at high volume.
・ 192 kbps …… Few flaws are noticeable even on headphones. Even classical music with a wide range is fine.
・ 256 kbps / 320 kbps …… High sound quality close to that of a CD (which is equivalent to 1,411 kbps).

Individual differences exist, but the above is a reasonable basis. For most people, differences in sound quality are audible only up to about 160 kbps. Beyond 192 kbps, you will not notice any difference unless you have very keen ears.

Also, as your collection grows to 100 or 200 songs, the difference in total size becomes large, so choose a bit rate that is practical for you. Converting a 4 to 5 minute song, typical of pop music, to MP3 gives roughly the following file sizes.

・ 128 kbps: Approximately 4 MB
・ 160 kbps: Approximately 5-6 MB
・ 192 kbps: Approximately 7 MB
・ 320 kbps: Approximately 10 MB
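
These rough sizes follow from bit rate x duration. The sketch below assumes a constant bit rate, 1 MB = 1,000,000 bytes, and no tags or headers; the function name is made up for the example.

```python
def encoded_size_mb(bit_rate_kbps: float, minutes: float) -> float:
    """Approximate size of a constant-bit-rate file in megabytes."""
    bits = bit_rate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000


# Size range for a 4 to 5 minute song at each bit rate in the list.
for kbps in (128, 160, 192, 320):
    low, high = encoded_size_mb(kbps, 4), encoded_size_mb(kbps, 5)
    print(f"{kbps} kbps: {low:.1f} to {high:.1f} MB")
```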

AAC and WMA compress more efficiently than MP3, so they can reach comparable quality with less data. They also hold up better at low bit rates, so AAC and WMA can sound better at 128 kbps or less.

Conversely, above 160 kbps, MP3 has the better sound quality in theory. Keep in mind that the higher the bit rate, the better MP3 fares in terms of sound quality, whether or not you can actually hear the difference.

What is the best way to use compressed sound sources like MP3, AAC and WMA correctly?

Digitally compressed formats such as MP3, AAC, and WMA, which we use when listening to music on a smartphone or iPod, are something many of us think we know but do not really understand. Let’s think again about which format and what bit rate are best.

◆ World standard MP3, Apple standard AAC, Windows standard WMA
You probably all know that digital audio comes in various formats.

The best known is WAV, which stores the same uncompressed PCM audio that is used on CDs. Since it is uncompressed, there is no deterioration in sound quality and it is very versatile, but the files are not small: just over 50 MB for 5 minutes.

Therefore, when used with a portable music player such as a smartphone, iPod, or Walkman, it is common to convert (= encode) from WAV to compressed sound sources such as MP3, AAC (M4A / M4P), and WMA.

By the way, download stores such as iTunes distribute compressed audio from the start. AAC is mainstream on iTunes, MP3 on Amazon, and WMA on major domestic distribution sites.

・ MP3 …… The oldest of these compression formats; it was standardized in the early 1990s, and the .mp3 file extension dates from 1995. Many products support it, and it is the de facto standard that can be used almost anywhere. “MP4” is a different standard (a container format mainly used for video), so don’t confuse the two.

・ AAC (M4A / M4P) …… A standard established after MP3, and the standard format for Apple products such as iPod and iPhone. M4P is a copy-protected file. AAC is also used for the audio of digital terrestrial and digital BS television broadcasts.

・ WMA …… A format promoted by Microsoft. It works well with Windows, and it is also used in many voice recorders.

Sample rate and bit rate Part 2

Listen and compare

Why not actually listen and compare? As far as I remember from checking in the past, AAC at 128 kbps was difficult to distinguish from the original sound (PCM) under the conditions in the table above. I think this varies from person to person. Although I work with audio and sound, I know my ears are nothing special, so anything at or slightly above that rate sounds the same as the original to me; I’m sure there are people who can tell the difference. At a low rate such as 32 kbps, you can clearly hear the difference in sound quality. In music, it shows up as a metallic sound in the drum hi-hat.
Personally, I think that 44.1 kHz 16-bit (stereo) music from CD can be saved at 128 kbps (compressed to 1/10 or less) without an audible loss of sound quality. About 128 kbps is enough for my ears for both MP3 and AAC.

The bit rate is the compression rate
What happens if you set the encoding bit rate to 256 kbps for 16 kHz audio (monaural, 16 quantization bits)? The compression rate is then 100%, so the data is the same size as the original sound. The sound quality should also match the original, but some freely available encoders may behave strangely (a configuration error may occur).

Sampling frequency  Quantization bits  Channels  Original sound bit rate (PCM)  Remarks
32 kHz              16                 1         512 kbps                       Super wideband
24 kHz              16                 1         384 kbps
16 kHz              16                 1         256 kbps                       Wideband
8 kHz               16                 1         128 kbps                       Narrowband
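
The PCM bit rates in the table are simply sample rate x quantization bits x channels. A minimal Python sketch (the function name is an assumption of this example):

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits * channels / 1000


# The four monaural 16-bit rows of the table.
for rate in (32_000, 24_000, 16_000, 8_000):
    print(rate, "Hz ->", pcm_bit_rate_kbps(rate, 16, 1), "kbps")

# For comparison: a stereo music CD.
print(pcm_bit_rate_kbps(44_100, 16, 2), "kbps")
```
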
Lossy codecs such as AAC and MP3 are the result of research into encoding at low rates, so I personally think that setting a bit rate of 50% or more of the original is not worthwhile. For compression ratios around 50%, lossless compression (MPEG-4 ALS, etc.) is recommended. If storage is the only concern, even losslessly compressing the PCM as is seems to roughly halve the size for audio with quiet sections. For lossy compression with AAC, MP3, etc., about 15-20% is appropriate if sound quality is important, and about 10% gives sufficient sound quality if high compression is important.
For voice purposes, 10% or even 5% or less is fine, but for voice it is better to lower the sample rate (to 8 kHz or 16 kHz) than to squeeze the bit rate while staying at 48 kHz or 44.1 kHz.

M/S stereo (mid/side)
The left and right signals are converted into sum and difference signals. Encoding the sum signal (L + R) and the difference signal (L - R) of the two channels improves coding efficiency when the correlation between the channels is high, as it often is in stereo. For example, it improves the coding efficiency of vocals in music (L and R in phase with the same amplitude).
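
A minimal Python sketch of the sum/difference idea. It works on plain lists of sample values and halves the sum and difference so the values stay in range; that scaling is a choice made for this illustration, and real encoders apply the transform to frequency-domain data.

```python
def ms_encode(left, right):
    """Convert left/right samples into mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid and side signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A centered vocal: identical in both channels, so the side signal is all zeros.
vocal = [0.5, -0.25, 0.125]
mid, side = ms_encode(vocal, vocal)
print(mid, side)
```

When the two channels are highly correlated, the side signal is close to zero and costs very few bits, which is the efficiency gain described above.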

Intensity stereo
At high frequencies, the ear is more sensitive to differences in loudness than to the time difference between L and R. Intensity stereo exploits this property by merging the high-frequency information (quantized coefficients) of both channels into one, which reduces the bit rate.

In closing
Although the bit rate may look like a measure of sound quality, in digital audio there is no point in specifying an encoding bit rate that exceeds the bit rate of the original sound. In short, I think it is important to use an appropriate bit rate for each encoder.

Sample rate and bit rate

The compression ratio of audio encoding is determined by the bit rate at the time of encoding.

Last time I mainly wrote about the original sound bit rate (PCM), but this time I would like to write about the bit rate and compression rate of the encoding.

Specifically, setting a lower bitrate will increase the compression ratio and reduce the size of the file when it is saved. As I wrote last time, the bit rate of the sound source (PCM) before compression is as follows.

PCM bit rate = sample rate (Hz) x number of quantization bits x number of channels
For example, a music CD has the following 44.1 kHz stereo bit rate.

Music CD bit rate: 44100Hz x 16bit x 2ch (stereo) = 1411.2kbps
If it is encoded with MP3, AAC, etc. at, for example, 256 kbps, the compression rate (taking the original sound as 100%) is approximately 18% and the file size is 1/5 or less.

Encode Music CDs at 256 kbps: 256 kbps / 1,411.2 kbps = approximately 18%
If it’s 4 minutes of music, the file size is as follows.

Original sound: 1,411.2 kbps x 240 seconds = approximately 40.4 MB
Encode at 256 kbps: 256 kbps x 240 seconds = approximately 7.3 MB (+ header)
If a song is about 4 minutes long, 16 songs fit on a 650 MB CD as original sound, but 89 songs fit if they are encoded at 256 kbps as MP3 or AAC.

Original sound: 650 MB / 40.4 MB = about 16 songs
Encoded at 256 kbps: 650 MB / 7.3 MB = approximately 89 songs
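
The file sizes and song counts above can be reproduced with a short Python sketch. Two assumptions are made here: 1 MB is taken as 1,024 x 1,024 bytes (which is what yields 40.4 MB and 7.3 MB), and the song count is taken from the sizes rounded to one decimal place, as in the text. All names are invented for the example.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def size_mb(bit_rate_kbps: float, seconds: float) -> float:
    """Size of an audio stream in MB, ignoring headers."""
    return bit_rate_kbps * 1000 * seconds / 8 / BYTES_PER_MB


def songs_per_disc(capacity_mb: float, song_mb: float) -> int:
    """Whole number of songs that fit in the given capacity."""
    return int(capacity_mb // song_mb)


original = round(size_mb(1411.2, 240), 1)  # 4 minutes of CD audio
encoded = round(size_mb(256, 240), 1)      # same song at 256 kbps
print(original, encoded)
print(songs_per_disc(650, original), songs_per_disc(650, encoded))
```

With the unrounded 256 kbps size (about 7.32 MB), one song fewer fits, so the count for encoded files is best read as approximate.
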
If you search the web, you can find comparisons of sound quality at different bit rates. Presumably all conditions other than the bit rate are kept the same, but sound quality already differs depending on the sample rate and the number of quantization bits of the original sound source (PCM), which change the bit rate of the original. The conditions at the time of analog-to-digital conversion (ADC) set the ceiling on sound quality: no matter how high the bit rate at which a poor source is encoded, the sound quality remains poor. Even at the same bit rate, the compression rate also changes with the number of channels (stereo or monaural). Strictly speaking, therefore, sound quality cannot be judged from the difference in bit rate alone.
For example, when 16-bit PCM at 48 kHz and 44.1 kHz is encoded at 32 kbps to 320 kbps, the compression ratios are as follows.

16-bit PCM compression ratio (when original sound is 100%)
Encoded bit rate  48 kHz stereo (1,536 kbps)  48 kHz monaural (768 kbps)  44.1 kHz stereo (1,411.2 kbps)  44.1 kHz monaural (705.6 kbps)
320 kbps          320/1536 = about 21%        About 42%                   320 / 1,411.2 = about 23%       About 45%
256 kbps          256/1536 = about 17%        About 33%                   256 / 1,411.2 = about 18%       About 36%
192 kbps          192/1536 = about 13%        About 25%                   192 / 1,411.2 = about 14%       About 27%
160 kbps          160/1536 = about 10%        About 21%                   160 / 1,411.2 = about 11%       About 23%
128 kbps          128/1536 = about 8%         About 17%                   128 / 1,411.2 = about 9%        About 18%
64 kbps           64/1536 = about 4%          About 8%                    64 / 1,411.2 = about 5%         About 9%
32 kbps           32/1536 = about 2%          About 4%                    32 / 1,411.2 = about 2%         About 5%
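
The percentages in the table can be recomputed with the sketch below. It assumes the values were rounded half up to whole percent (Python’s built-in `round` would turn 12.5 into 12, so the rounding is written out explicitly); the names are invented for the example.

```python
import math


def compression_percent(encoded_kbps: float, original_kbps: float) -> int:
    """Encoded bit rate as a whole percentage of the original, rounded half up."""
    return math.floor(100 * encoded_kbps / original_kbps + 0.5)


ORIGINALS_KBPS = {
    "48 kHz stereo": 1536,
    "48 kHz monaural": 768,
    "44.1 kHz stereo": 1411.2,
    "44.1 kHz monaural": 705.6,
}

for encoded in (320, 256, 192, 160, 128, 64, 32):
    row = [compression_percent(encoded, kbps) for kbps in ORIGINALS_KBPS.values()]
    print(encoded, "kbps:", row)
```
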
Comparison with the original sound
It’s a slightly contrarian question, but under the above conditions, which is closer to the original sound: stereo or monaural?
Judged by compression ratio alone, it is the latter. Of course, stereo is more expressive than monaural, for example in conveying the depth of the sound, so sound quality cannot be judged on compression ratio alone. In addition, encoders have algorithms that compress stereo efficiently (M/S stereo and intensity stereo), so stereo at a given bit rate is not simply half the quality of monaural.

What is the best way to use compressed sound sources like MP3, AAC and WMA correctly?


Based on these characteristics, let’s consider the compression format depending on the device used.

Methods of compression and compression of audio signals Part 3

The most popular compression format today is MP3.

The MP3 (MPEG-1 Audio Layer III) format was developed, after several intermediate formats, by the Fraunhofer Institute in Germany. The MP3 format relies on fooling the human ear. Research showed that human hearing adapts to the appearance of new sounds, which shows up as a rise in the hearing threshold. As a result, some sounds can mask others, that is, make them subjectively inaudible. So in this format, sounds that the theory predicts to be inaudible are simply removed from the overall sound. The resulting “semi-finished product” is then encoded using Huffman coding. Note that in the MP3 format the programs that compress the original sound are not standardized, so any competent programmer can implement their own compression scheme. Only decoders must follow the standard, which means that the quality of MP3 playback does not depend only on the player that plays the file but also on the encoder that produced it. Because the implementers of different encoders have different abilities and preferences, some encoders handle symphonic music better, others rock and metal, others rap and rave, and so on.
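
To illustrate the final entropy-coding step, here is a generic Huffman sketch in Python. It is not the MP3 procedure itself: as far as the standard goes, MP3 selects from predefined Huffman tables rather than building a tree per file, and the input below is made-up data standing in for quantized values. The sketch only shows why frequent values end up with short codes.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from the input."""
    counts = Counter(symbols)
    if len(counts) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)  # the two lightest subtrees...
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))  # ...become one node
        next_id += 1
    return heap[0][2]


# Hypothetical quantized values: 0 is very common, larger values are rare.
values = [0] * 5 + [1] * 2 + [2] + [3]
lengths = huffman_code_lengths(values)
total_bits = sum(lengths[v] for v in values)
print(lengths, total_bits)
```

For this input the nine values take 15 bits in total, compared with 18 bits for a fixed 2-bit code.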

Joint stereo, one of the features of MP3, means that instead of encoding stereo as two independent channels, the encoder encodes a so-called center channel and the difference from the original stereo channels. Many audio components are the same in both stereo channels, and encoding them once in the common channel frees up bandwidth for more detailed encoding of the difference, which improves quality.

Variable bit rate (VBR) also deserves mention. It means that the encoder changes the compression ratio on the fly, depending on the nature of the sound. This approach reduces the final file size or, if quality requirements are higher, gives better sound at the same file size.

MP3 Pro – Introduced in 2001, the MP3 Pro codec was developed by Coding Technologies in association with Thomson Multimedia. It is based on MP3 and is therefore fully backward compatible with MP3 but only partially forward compatible. It uses SBR (Spectral Band Replication) technology, so the codec provides good quality at low bit rates. However, its encoding quality at medium to high bit rates is inferior to that of almost all other codecs. As a result, MP3 Pro is used mostly for Internet streaming and for previews of new music.

The MPEG-4 audio standard does not prescribe a single highly efficient compression scheme, or even a small set of them, but rather a complex toolkit covering a wide range of tasks, from low-quality speech coding to high-quality music and audio synthesis.

The MPEG-4 family of audio coding algorithms ranges from low-quality speech (from 2 kbps) to high-quality audio (64 kbps per channel and higher).

RAW – Yes, it is not just the image format in which some digital cameras save photographs. RAW is so-called “pure digitization”: it has no header and contains only a sequence of samples of the sound wave. Typically, the samples are stored in 16-bit format.

Shorten is one of the first lossless codecs to appear. For a long time the project lay dormant. However, in 2007, development resumed.

TTA (True Audio) – Finally, the most interesting one. TTA is being developed by a team of our compatriots, and the result of their work is impressive. Let’s go through it in order.

The codec is still quite young, but despite this it has all the necessary features. We won’t list them again; we’ll just note that the format lacks only support for streaming audio over the network.

The format is open, as is the source code of the encoder. There are compiled versions for Mac and Linux. There should be no compatibility issues during playback either, because plugins already exist for all popular players, as well as DirectShow filters for Windows Media Player. There is a plugin for Adobe Audition, which is important for musicians. In the past four years, hardware players with TTA support have even appeared!

WAV – This is the primary audio format for many, many digital audio playback systems and is used as a standard audio file format on personal computers.

Compression and compression methods for audio signals Part 2

FLAC is a member of the Xiph.Org codec family, which also includes the well-known Ogg Vorbis, one of the best lossy music compression algorithms. As containers for the audio data, Ogg (files with the extension .ogg) and another open container, Matroska (files with the extension .mka), can be used.

It should be noted right away that both the FLAC format and its algorithm are fully open. They are not patented, so they can be used completely free of charge in any program. This is the reason for the wide support for FLAC in players: any serious player has a plugin for FLAC. In addition, there are hardware MP3 players that support the FLAC codec.

The FLAC encoder is compiled for most platforms in use, so there should be no compatibility issues on operating systems other than Windows.

FLAC supports tags in its own “FlacTags” format. It can encode multi-channel audio, a great advantage over Monkey’s Audio. The format supports any sample rate in the range of 1 Hz (!) to 65,535 Hz, and audio bit depths from 4 (!) to 32 bits.

FLAC is believed to be the most efficient use of system resources when decoding (playing) audio compared to other lossless codecs. Unfortunately, this is achieved at the expense of a significant increase in encoding (compression) time.

The FLAC website is regularly updated and new versions of the codec are released. Overall, FLAC is without a doubt the leader in terms of development activity. This may make it the main format in the future. Well, let’s see …

FLAC is the best option for storing high quality music.

MIDI (Musical Instrument Digital Interface) is a standard for hardware and software that allows you to play (and record) music by executing / recording special commands, as well as the format of the files that contain those commands. The playback device or program is called a MIDI synthesizer (sequencer) and is actually an automatic musical instrument.

Unlike other formats, it does not store the digitized sound, but sets of commands (played notes, links to played instruments, variable sound parameter values) that can be played differently depending on the playback device. The convenience of the MIDI format as a data representation format enables devices that produce automatic arrangements according to given chords, as well as 3D sound visualization applications. Additionally, these files tend to be orders of magnitude smaller than digitized audio of comparable quality.

Monkey’s Audio is a popular lossless digital audio encoding format. It is distributed free of charge with its source code, together with a suite of encoding and playback software and plugins for popular players. Monkey’s Audio files use the following extensions: .ape to store audio and .apl to store metadata. Despite having open source code, Monkey’s Audio is not free software, as its license imposes significant restrictions on its use.

Because the file extension is .ape, the monkey theme appears not only in the logo and the name.

The average bit rate of a Monkey’s Audio file is 600 to 700 kbps; compare that with 128 kbps for MP3. Average compression is 40-50%, depending on the genre of music: classical and jazz pieces compress best, while thrash metal and similar “electronic noise” show the worst results. For lossy codecs, compression is approximately 80%.

There are four levels of compression. Maximum compression may seem like the only correct choice, although compression then takes quite long. However, you must also take into account the resources needed by the system that plays the file; for the most highly compressed files they are relatively high.

The .APE format provides tag support for searching for songs in your music collection. Another advantage is that file integrity is verified during decoding. Restoring the original WAV file from a compressed .APE file is supported.

Monkey’s Audio has a graphical interface for Windows, in other words, a convenient window program to manage the encoding process. The rest of the codecs require the use of the command line or third-party interfaces.