
What is MP3?
“MP3” is an audio format widely used in audio players. Its official name is “MPEG-1 Audio Layer III”, the audio format defined for MPEG-1. The MP3 format was standardized in parallel with MPEG as the video format, and in 1992 it was standardized as “ISO/IEC IS 11172-3 (MPEG-1 Audio)”.
After that, MP3 files circulated quietly among enthusiasts, but the format did not really take off until the portable audio player “mpman” was launched by SAEHAN International in South Korea in 1998. The combination of this player, which could play music data downloaded over the Internet, with Napster, which appeared in 1999, completely changed a portable audio scene that until then had meant carrying around cassettes, CDs, MDs, and the like.
MP3 can also reduce the original data to less than one tenth of its size. For example, it became possible to compress a one-hour music CD to about 40 MB, and, together with Napster and similar services, this created a new demand for sharing music between users. Since then, despite various legal actions by the RIAA (Recording Industry Association of America) and the emergence of successor formats devised by many manufacturers, MP3 is still widely used as an audio format.
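The size reduction can be checked with some rough arithmetic. The sketch below assumes standard CD audio parameters (44.1 kHz, 16-bit, stereo) and a constant 128 kbps MP3 bitrate; the bitrate is an assumption chosen for illustration, not a figure from this article, and the function names are hypothetical.

```python
# Rough size arithmetic for one hour of audio.
# Assumption: CD audio is 44,100 samples/s, 2 bytes per sample, 2 channels.
CD_SAMPLE_RATE_HZ = 44_100
CD_BYTES_PER_SAMPLE = 2
CD_CHANNELS = 2


def cd_size_bytes(seconds):
    """Uncompressed CD audio size in bytes."""
    return CD_SAMPLE_RATE_HZ * CD_BYTES_PER_SAMPLE * CD_CHANNELS * seconds


def mp3_size_bytes(seconds, bitrate_kbps):
    """Size in bytes of a constant-bitrate stream (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 / 8 * seconds


ONE_HOUR_S = 3600
cd = cd_size_bytes(ONE_HOUR_S)
mp3 = mp3_size_bytes(ONE_HOUR_S, 128)  # 128 kbps is an illustrative choice
print(f"CD:  {cd / 1e6:.1f} MB")
print(f"MP3: {mp3 / 1e6:.1f} MB (ratio about 1/{cd / mp3:.0f})")
```

Under these assumptions one hour of CD audio is about 635 MB and the 128 kbps stream is about 58 MB, roughly one eleventh of the original. A file of about 40 MB for the same hour would correspond to a bitrate of roughly 89 kbps.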
■ MPEG
To understand how MP3 works, let’s first look at “MPEG Audio” itself. A defining feature of MPEG Audio is that it exploits psychoacoustics, specifically the minimum audible limit of hearing and the masking effect.
Let’s start with the minimum audible limit. In general, humans are considered able to hear sounds in the range of 20 Hz to 20 kHz. Of course, this is an average value, and some people can hear a wider range while others can only hear a narrower one, but we will set that aside here.
That does not mean, however, that every sound in the 20 Hz to 20 kHz range can be heard equally well. The minimum audible limit curve is shown in Fig. 1: around 2 kHz even a fairly quiet sound can be heard, but at frequencies above or below that, a sound cannot be heard unless it is considerably loud.
You may have heard the term “loudness curve”; it refers to the curve shown in Figure 1. Because of it, even when a sound source produces sound across a wide range from bass to treble (Fig. 2), the human ear perceives it with both ends drooping (Fig. 3). By taking advantage of this and omitting all data at inaudible frequencies, a great deal of compression becomes possible.
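To make the idea concrete, here is a minimal sketch of discarding components that fall below the minimum audible limit. It uses a commonly cited closed-form approximation of the threshold in quiet (usually attributed to Terhardt), which is an assumption brought in for illustration and not the curve from Fig. 1 or the procedure of any particular MP3 encoder. Levels are treated as dB SPL, and the function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate minimum audible level (dB SPL) at freq_hz (must be > 0).

    Assumption: a widely used closed-form approximation of the threshold
    in quiet, not the exact curve shown in the article's figure.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def audible_components(components):
    """Keep only (freq_hz, level_db) pairs that lie above the threshold."""
    return [(f, level) for f, level in components
            if level > threshold_in_quiet_db(f)]


# Three tones at the same level: bass, midrange, and high treble.
tones = [(100, 10.0), (2000, 10.0), (16000, 10.0)]
print(audible_components(tones))  # [(2000, 10.0)]
```

With this approximation the threshold is near 0 dB around 2 kHz but well above 10 dB at 100 Hz and 16 kHz, so only the midrange tone survives. A real encoder works on frequency bands of a transformed signal rather than on isolated tones.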
Masking effect
The masking effect is another such phenomenon. For example, when a very loud sound occurs at a certain frequency, a specific region called the “Critical Band” forms at the frequencies just below and above it, and other, quieter sounds that fall within this critical band cannot be heard.
When sound A occurs, the sloping region that extends to the frequencies below and above it is the Critical Band. The part of sound B that sticks out of the Critical Band can be heard without any problem, but sound C, which fits completely inside the Critical Band, cannot be heard.
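The relationship between sounds A, B, and C can be sketched with a simple triangular masking curve. This is an illustrative model only: the Bark-scale conversion is the well-known Zwicker and Terhardt approximation, while the offset and slope values are assumed round numbers, not values from this article or from the MPEG standard.

```python
import math


def hz_to_bark(freq_hz):
    """Convert frequency to the Bark (critical-band) scale (approximation)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def masking_threshold_db(masker_hz, masker_db, probe_hz,
                         offset_db=10.0, lower_slope=27.0, upper_slope=12.0):
    """Level below which a probe tone is hidden by the masker.

    Assumption: the curve peaks offset_db below the masker level and falls
    off linearly in dB per Bark, more steeply toward lower frequencies.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)


def is_masked(masker, probe):
    """True if probe (freq_hz, level_db) lies entirely under the curve."""
    masker_hz, masker_db = masker
    probe_hz, probe_db = probe
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


sound_a = (1000, 80.0)  # loud masker
sound_b = (1100, 75.0)  # sticks out above the masking curve
sound_c = (1100, 50.0)  # fits completely under the masking curve
print("B masked:", is_masked(sound_a, sound_b))
print("C masked:", is_masked(sound_a, sound_c))
```

Under these assumed parameters sound B remains audible while sound C is masked, and a 50 dB tone far away at 4 kHz is not affected by the masker at all.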
In MPEG Audio, compression efficiency is improved further by omitting, as before, the sound data that cannot be heard because of this critical band. Incidentally, the masking effect works not only along the frequency axis but also along the time axis. In other words, a quiet sound cannot be heard not only immediately after a loud sound occurs but also, curiously, just before it. This is called the temporal masking effect; in Figure 5, sound B and sound C become inaudible. This too is useful for data compression.
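Masking along the time axis can be sketched in the same spirit. The model below is a deliberately crude assumption: masking reaches back a few milliseconds before the loud sound and forward about 100 ms after it, decaying linearly. The window lengths, the offset, and the linear decay are illustrative choices, not figures from this article.

```python
def temporally_masked(masker_time_ms, masker_db, probe_time_ms, probe_db,
                      pre_ms=5.0, post_ms=100.0, offset_db=10.0):
    """True if a quiet probe sound is hidden by a loud masker nearby in time.

    Assumption: the masking threshold starts offset_db below the masker
    level and decays linearly to 0 dB at the edge of the window, with a
    short window before the masker and a longer one after it.
    """
    dt = probe_time_ms - masker_time_ms
    window = post_ms if dt >= 0 else pre_ms
    if abs(dt) > window:
        return False  # too far from the masker in time to be affected
    threshold_db = (masker_db - offset_db) * (1.0 - abs(dt) / window)
    return probe_db < threshold_db


# Loud sound at t = 100 ms; quiet 30 dB sounds at various times around it.
for t_ms in (80, 98, 150, 300):
    print(t_ms, temporally_masked(100, 80.0, t_ms, 30.0))
```

With these assumed values the quiet sounds 2 ms before and 50 ms after the loud sound are masked, while those 20 ms before and 200 ms after are not.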







