sample rate and sample size Archives

What does the quality of an mp3 depend on?

Free Download Mp4Gain

Factors influencing hearing quality

digital sound

We must consider that we are talking about digital sound.
The audio as we hear it in everyday life is analog. This means that it is a continuum: there are no partitions, cuts, or chunks.

Digital audio, on the other hand, is not a continuous curve. It is a series of thousands of discrete points that trace out the curve.

Of course, the more points that curve has, the smoother the curve is and the more similar it is to the initial analog audio.

When the CD was developed, its designers settled on 44,100 samples per second, so that the curve would be smooth enough to contain all the sounds in the range that the human ear can perceive.

Some sounds are too high-pitched for us to hear, and others are so low-pitched that we cannot perceive them either.

It is even known that as the years go by, a person can perceive very high-pitched sounds less, unlike adolescents who perceive such high-pitched sounds better.

So the first factor to take into account will be to have 44,100 or 48,000 samples per second, in order to have a smooth curve, with high quality.

The sample rate is the number of samples taken per second to trace the sound curve. Recordings with a lower sample rate than that are not of high quality.

So take an audio file and check that it has a sample rate of at least 44,100 or 48,000 samples per second to know that it is CD quality.

There are higher sample rates, for example 96,000, but we will talk about them later.

Mp4Gain is a software that manages these parameters perfectly. If you really want high quality sound, Mp4Gain is the right tool for you.


What is the difference between 128k and 320k music? Part 3

The sampling frequency is approximately as follows, depending on the type of use (k stands for one thousand, so 1 kHz = 1000 Hz):

8 kHz: used for telephones etc.; enough to record human voices.

22.05 kHz: used for radio broadcasting.

44.1 kHz: Audio CD.

48 kHz: used in DVD and digital TV.

96 kHz to 192 kHz: used for DVD-Audio, Blu-ray HD, etc.

The common range of sample precision is 8 bits to 32 bits, with 16 bits generally used on CD.

Having said that, my friends are starting to get confused. It’s not the bitrate that determines the sound quality, so why is everyone saying that 320kb sound quality is better than 128kb?

【Audio Compression】

Well, in fact, the bit rate belongs to another dimension: it describes how the audio file is compressed.

Nowadays, most of the audio formats we use regularly are derived from the original “WAV” audio of the CD (44.1 kHz sample rate, 16-bit sample precision, 2 channels). The originally recorded sound data is stored as an array of samples in PCM format, while WAV is a file format developed by Microsoft whose function is to store and reproduce that PCM data.

Since the data in a WAV file essentially restores the PCM data completely, MP3, AAC and other lossy encoding formats are basically produced by compressing WAV files. Therefore, we can simply think of WAV as the original audio format and the other audio formats as compressed formats.

When it comes to compression, storage and transmission are inseparable. The purpose of compression is to improve storage and transmission. Therefore, before we talk about compression, we need to understand the basic units of computers.

We all know that the computer uses the binary number system, and the files it stores are made up of two digits, 0 and 1. Computer transmission is therefore based on single digits, each of which is called 1 “bit”. For example, the basic data of a piece of audio might be “0,1,1,1,0,1,1,0”, and when transmitting, these digits are sent one by one. The sampling precision mentioned above is measured in this unit.

The storage unit of the computer is the “byte”. One byte consists of 8 bits, that is, 8 b (bit) = 1 B (byte). In computer parlance, data transmission is usually expressed with decimal prefixes and data storage with binary prefixes, so 1 KB = 1024 B = 1024 × 8 b. This is also part of the reason why the hard drive capacity we see does not match the advertised capacity.

Going back to audio compression: the bit rate of the audio effectively reflects the compression ratio. The bit rate really just defines the size of the file, but because under normal conditions a larger file has lost less data, its sound quality is relatively higher. However, the bit rate by itself does not guarantee the quality of the file. For example, if we take a 128 kbps file as the source, converting it to a 320 kbps file will not make it sound any better than the 128 kbps original.
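To make the link between bit rate and file size concrete, here is a minimal Python sketch. It assumes a constant bit rate and ignores container overhead and tags; the function names and the 4-minute duration are illustrative assumptions, not part of any tool mentioned here.

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_PCM_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps


def compressed_size_bytes(bit_rate_kbps: float, duration_s: float) -> float:
    """Size of a constant-bit-rate stream (headers and tags ignored)."""
    return bit_rate_kbps * 1000 * duration_s / 8


def compression_ratio(bit_rate_kbps: float) -> float:
    """How many times smaller the stream is than CD-quality PCM."""
    return CD_PCM_KBPS / bit_rate_kbps


duration_s = 240  # a hypothetical 4-minute song
for kbps in (128, 320):
    size_mb = compressed_size_bytes(kbps, duration_s) / 1_000_000
    print(f"{kbps} kbps: {size_mb:.2f} MB, "
          f"about {compression_ratio(kbps):.1f}:1 versus CD PCM")
```

The 320 kbps file is simply larger; whether it also sounds better depends on what it was encoded from.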

What is the difference between 128k and 320k music? Part 2

Bit rate, sample rate, lossless, MP3, FLAC, APE, 320kb, 192kb, 128kb, 44.1khz, CBR, VBR. Do all these names seem both familiar and strange to you?

The higher the bitrate, the better the sound quality. Lossless music is the highest sound quality, right? So, let’s start with the sound collection.

【Audio composition】

Nowadays, when we talk about audio, everything is digital audio. Digital audio consists of three parts: sample rate, sample precision, and number of sound channels.

Sample rate: the number of samples taken per second when recording the sound, expressed in hertz (Hz).

Sampling Precision: Refers to the dynamic range of the recorded sound, measured in bits (Bit).

Sound channel: the number of channels (1-8).

In simple terms, we can think of a sound wave as a curve. We know that a curve is made up of points, and the sampling frequency is the number of points per second along its length (the horizontal axis). Sampling precision is the number of points in the dynamic range (the vertical axis). The finer the resolution in these two dimensions, the more faithful the sound restoration and the better the sound quality. Of course, the audio file will also be larger. The customer mentioned by my colleague was referring to the latest Hi-Res Audio format released by SONY, a 6-channel audio file recorded at 192 kHz/24-bit. In a lossless format, its size will of course be more than 200 megabytes.
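The three parameters multiply directly into the uncompressed data size. The sketch below is only an illustration of that arithmetic; the function name and the one-minute duration are assumptions, and a real lossless file would be somewhat smaller after compression.

```python
def pcm_size_bytes(sample_rate_hz: int, bit_depth: int,
                   channels: int, duration_s: int) -> int:
    """Raw PCM size: samples/s x bits/sample x channels x seconds, in bytes."""
    return sample_rate_hz * bit_depth * channels * duration_s // 8


one_minute = 60
cd = pcm_size_bytes(44_100, 16, 2, one_minute)
hi_res = pcm_size_bytes(192_000, 24, 6, one_minute)

print(f"CD stereo, 1 minute:        {cd / 1_000_000:.2f} MB")
print(f"6ch 192 kHz/24-bit, 1 min:  {hi_res / 1_000_000:.2f} MB")
```

Under these assumptions a single minute of 6-channel 192 kHz/24-bit audio is already above 200 MB before any lossless compression is applied.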

What is the difference between 128k and 320k music?

I can’t fully understand music in words.

【Preface】

Some time ago, a colleague ran into a very troublesome client. The trouble reportedly started when the client asked him to provide song files of 100 MB to 200 MB or more in size. My colleagues don’t know much about audio formats, so they fumbled endlessly with FLAC, WAV and file sizes. In the end, the colleague never managed to explain clearly to the customer what was going on.

After that, some other things happened that made me feel that too many practitioners around me in the music industry have an extremely poor understanding of music and even lack basic music-related knowledge. They don’t even want to understand, which makes me very sad. It seems that music is nothing but merchandise, and that we practitioners only need to organize the shelves, label the merchandise, and use the big data of users’ purchase records to recommend merchandise to them, without caring why users like it or what features these products have, using cold data to provide users with various services.

Therefore, I think it is necessary to write something. I don’t expect practitioners to become people who really love music. I just hope that even if you still think of “her” as a commodity, you can first figure out what you are selling and what it is.

PS: The content of the first lesson is about media files. Since it involves a lot of technical issues, it may seem a bit boring, but if you read it carefully you will find that it is actually very easy to understand, and this basic knowledge can be very helpful in improving your skills. Also expect more interesting content about records, musical styles, etc., which I will post soon.

Related Audio Attribute Part 3

How samples are combined

This is mainly for two-channel or multi-channel audio. For a two-channel audio, it can be combined in the following two ways:

Interleaved (packed). Taking stereo as an example, a stereo audio sample is obtained by interleaving the storage of two mono samples.
Planar. The samples of each channel are stored separately.

The data after FFmpeg audio decoding is stored in the AVFrame structure.

In packed format, frame.data[0] or frame.extended_data[0] contains all the audio data.
In planar format, frame.data[i] or frame.extended_data[i] holds the data of the i-th channel (counting channel 0 as the first). The size of the AVFrame.data array is fixed at 8; if the number of channels exceeds 8, you should get the channel data from frame.extended_data.
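The two layouts hold the same samples in a different order. The following Python sketch shows the conversion on plain lists; it does not use FFmpeg, and the helper names are hypothetical.

```python
def to_planar(interleaved: list, channels: int) -> list:
    """Split L0 R0 L1 R1 ... into one list per channel."""
    return [interleaved[ch::channels] for ch in range(channels)]


def to_interleaved(planes: list) -> list:
    """Merge per-channel lists back into L0 R0 L1 R1 ... order."""
    return [sample for frame in zip(*planes) for sample in frame]


packed = [1, -1, 2, -2, 3, -3]   # stereo: L0 R0 L1 R1 L2 R2
planes = to_planar(packed, 2)    # planes[0] is left, planes[1] is right
print(planes)
print(to_interleaved(planes) == packed)
```

In this picture, `planes[i]` plays the role that frame.data[i] plays for a planar AVFrame, while `packed` corresponds to the single buffer in frame.data[0] for a packed one.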

sample format
The sample formats in FFmpeg are mainly:

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,   ///< 8 bits unsigned
    AV_SAMPLE_FMT_S16,  ///< 16 bits signed
    AV_SAMPLE_FMT_S32,  ///< 32 bits signed
    AV_SAMPLE_FMT_FLT,  ///< float
    AV_SAMPLE_FMT_DBL,  ///< double

    AV_SAMPLE_FMT_U8P,  ///< 8 bits unsigned, planar
    AV_SAMPLE_FMT_S16P, ///< 16 bits signed, planar
    AV_SAMPLE_FMT_S32P, ///< 32 bits signed, planar
    AV_SAMPLE_FMT_FLTP, ///< float, planar
    AV_SAMPLE_FMT_DBLP, ///< double, planar
    AV_SAMPLE_FMT_S64,  ///< 64 bits signed
    AV_SAMPLE_FMT_S64P, ///< 64 bits signed, planar

    AV_SAMPLE_FMT_NB    ///< Number of sample formats. DO NOT USE if dynamically linked
};
Notes:

1. U8 (8-bit unsigned integer), S16 (16-bit signed integer), S32 (32-bit signed integer), FLT (single-precision floating point), DBL (double-precision floating point), S64 (64-bit signed integer). Formats not ending with P are interleaved (packed) structures, and those ending with P are planar structures.
2. Planar mode is FFmpeg’s internal storage mode, while the audio files we use are in packed mode.
3. The sample format that FFmpeg outputs differs depending on the audio format being decoded. Testing showed that AAC decoding outputs floating-point AV_SAMPLE_FMT_FLTP data, and MP3 decoding outputs AV_SAMPLE_FMT_S16P data (the MP3 file used had a 16-bit depth). To find the specific sample format, check the format member of the decoded AVFrame or the sample_fmt member of the decoder’s AVCodecContext.

Bit rate
The bit rate (also called bitrate) is the amount of data transferred per second, for example 705.6 kbps or 705,600 bps, where b is bit and ps is per second, meaning 705,600 bits per second. Compressed audio files are often described by their bit rate; for example, a “CD-quality” MP3 is 128 kbps/44,100 Hz. Note that the unit here is the bit, not the byte. One byte equals 8 bits. The bit is the smallest unit and is generally used to describe network speed and other communication speeds, while the byte is used to measure the size of hard drives and memory.

Mbps is megabits per second (millions of bits per second);
Kbps is kilobits per second (thousands of bits per second);
bps is bits per second. The corresponding conversion ratios are:

1 megabit = 1,000 kilobits = 1,000,000 bits; 1 Mbps = 1,000,000 bps. Again, this is a unit of speed, referring to the number of bits transmitted per second. The K in units of data transmission speed has the decimal meaning (1,000), but the K in data storage has the binary meaning (1,024). For example:

The “1M bandwidth” generally described is 1 Mbps = 1,000,000 bps, and 1,000,000 / 8 / 1,000 = 125, so the download speed of 1M bandwidth generally does not exceed 125 KB/s.
Likewise, 100 Mbps = 100,000,000 bps, and 100,000,000 / 8 / 1,000 / 1,000 = 12.5, so the maximum download rate of 100M bandwidth can reach 12.5 MB/s.
Of course, the above is only the theoretical rate. In practice the maximum download rate may not reach that much because of various losses; on 100 Mbps broadband, a download rate of 10 MB/s is not bad.
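The same conversion can be written as a small Python helper. This is a sketch of the theoretical ceiling only, using decimal prefixes throughout; the function name is an assumption made for this example.

```python
def max_download_kb_per_s(bandwidth_mbps: float) -> float:
    """Theoretical ceiling in KB/s (decimal prefixes) for a link in Mbps."""
    bits_per_second = bandwidth_mbps * 1_000_000
    bytes_per_second = bits_per_second / 8   # 8 bits per byte
    return bytes_per_second / 1_000


for mbps in (1, 100):
    kb = max_download_kb_per_s(mbps)
    print(f"{mbps} Mbps -> {kb:g} KB/s = {kb / 1000:g} MB/s")
```

Real downloads come in below these figures because of protocol overhead and other losses.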

Related Audio Attribute Part 2

The higher the sampling rate, the more realistic and natural the sound will be.

The frequency range people can hear is 20 Hz to 20,000 Hz. By the sampling theorem, a sample rate slightly above twice the highest audible frequency, that is, a little more than 40,000 samples per second, is enough to satisfy the human ear during playback. 22,050 Hz is a commonly used lower rate, 44,100 Hz is already CD quality, and sampling above 48,000 Hz no longer makes a meaningful difference to the human ear. This is similar to the 24 frames per second of a movie.

Sampling bits
After sampling the audio for a sample, two steps must be performed for the sample:

1. Quantify. The quantization bits commonly used for audio quantization are:

8 bits (i.e. 1 byte) can only record 256 numbers, so the amplitude can only be divided into 256 levels;

16 bits (i.e. 2 bytes) can record 65,536 numbers, which is already the CD standard;

32 bits (i.e. 4 bytes) can subdivide the amplitude into 4,294,967,296 levels, which is really unnecessary.

The number of quantization bits is also called the number of sampling bits, bit depth, or resolution. It refers to how many levels the continuous intensity of the sound is divided into once it is represented digitally. N bits means that the intensity of the sound is divided equally into 2^N levels; with 16 bits, that is 65,536 levels. This is a very large number, and people may not be able to tell apart a difference in sound intensity of 1/65,536. You can also think of it as the resolution of the sound card: the higher the value, the higher the resolution and the greater the ability to reproduce sound. The number of sampling bits describes the intensity (amplitude) characteristics of the signal, whereas the sampling rate describes its time (frequency) characteristics; they are two different concepts.
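As an illustration of what the bit depth buys, the Python sketch below quantizes one amplitude at 8 and 16 bits and measures the rounding error. It assumes amplitudes normalized to the range -1.0 to 1.0 and a simple symmetric rounding scheme (which uses one code fewer than the full 2^N); real converters differ in detail, and all names here are hypothetical.

```python
def quantization_levels(bits: int) -> int:
    """Number of distinct values an N-bit sample can take."""
    return 2 ** bits


def quantize(amplitude: float, bits: int) -> int:
    """Round an amplitude in [-1.0, 1.0] to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1          # 127 for 8 bits, 32767 for 16
    clipped = max(-1.0, min(1.0, amplitude))
    return round(clipped * max_code)


def dequantize(code: int, bits: int) -> float:
    """Map an integer code back to an amplitude in [-1.0, 1.0]."""
    return code / (2 ** (bits - 1) - 1)


for bits in (8, 16):
    code = quantize(0.3, bits)
    error = abs(dequantize(code, bits) - 0.3)
    print(f"{bits}-bit: {quantization_levels(bits)} levels, "
          f"code {code}, error {error:.7f}")
```

The error at 16 bits comes out far smaller than at 8 bits, which is the practical meaning of a finer amplitude resolution.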

2. Binary encoding. The result of the quantization, i.e. the single-channel sample, is stored as a binary codeword. There are two storage methods:

Store the result of the quantization directly as an integer, that is, in two’s complement code;

Store the result of the quantization as a floating-point value, i.e. in floating-point encoding.

Most PCM sample data formats store samples as integers, while some applications that require high precision use floating point to represent PCM sample data.
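The two storage methods can be compared with Python's standard struct module. This sketch assumes little-endian byte order (as used by WAV files) and an arbitrary example sample value; it only illustrates the byte layouts.

```python
import struct

sample = -12345  # an arbitrary 16-bit sample value chosen for illustration

# Integer storage: 16-bit signed two's complement, little-endian.
int_bytes = struct.pack("<h", sample)

# Floating-point storage: the same sample scaled to [-1.0, 1.0) as float32.
float_bytes = struct.pack("<f", sample / 32768)

print("integer:", int_bytes.hex(), f"({len(int_bytes)} bytes)")
print("float:  ", float_bytes.hex(), f"({len(float_bytes)} bytes)")

# Both encodings decode back to the original value.
print(struct.unpack("<h", int_bytes)[0])
print(round(struct.unpack("<f", float_bytes)[0] * 32768))
```

The integer form takes 2 bytes per sample and the float form 4, which is one reason integer PCM remains the common choice for storage.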

frame
After the audio has been quantized to binary codewords, the encoder transforms it. The transform (MDCT) is done in block units, and a block is made up of multiple (120 or 128) samples. A frame contains one or more blocks. Common frame sizes are 960, 1024, 2048, 4096, etc. A frame records one unit of sound: its duration is the number of samples per channel divided by the sample rate, and its uncompressed size is that sample count multiplied by the sample size and the number of channels. The nb_samples member of the AVFrame structure in FFmpeg gives the number of single-channel audio samples in a frame.
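The playing time of a frame follows from its sample count and the sample rate. The Python sketch below works that out for two of the frame sizes listed above; the function name and the pairing of frame sizes with sample rates are assumptions for illustration.

```python
def frame_duration_ms(nb_samples: int, sample_rate_hz: int) -> float:
    """Playing time of one frame: samples per channel / samples per second."""
    return nb_samples / sample_rate_hz * 1000


for nb_samples, rate in ((1024, 44_100), (960, 48_000)):
    print(f"{nb_samples} samples at {rate} Hz: "
          f"{frame_duration_ms(nb_samples, rate):.2f} ms")
```

Note that the channel count does not enter the duration: a stereo frame with 1024 samples per channel lasts exactly as long as a mono one.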

Related Audio Attribute

channel, sample rate, sample bits, sample format, bit rate

Sample Rate

The PCM obtained from audio sampling contains three elements: channel, sample rate, and sample bits.

channel
When people hear a sound, they can locate its source. By placing sound sources at different positions, a better listening experience can be created, and if the position of the audio follows the image, a better audio-visual effect is obtained. Common channel layouts are:

Mono, a single channel
Two channels (stereo), the most common type, consisting of left and right channels
2.1 channels, adding a bass channel to the two channels
5.1 channels, consisting of one front center channel, one front left channel, one front right channel, one surround left channel, one surround right channel, and one bass channel, first used in early theaters
7.1 channels, in which the surround left and right channels of 5.1 are split into surround left and right channels and rear left and right channels, mainly used in BD and modern theaters
Next is a two-channel audio system.

Sampling rate
Audio sampling is the conversion of sound from an analog signal to a digital signal. The sample rate is the number of times the sound is collected per second and is also the number of samples per second of the resulting digital signal. When sampling sound, common sample rates are:

8,000 Hz – telephone sampling rate, sufficient for human speech
11,025 Hz – sample rate for AM radio
22,050 Hz and 24,000 Hz – sample rate for FM radio
32,000 Hz – sampling for miniDV digital camcorder, DAT (LP mode)
44,100 Hz – Audio CD, also commonly used in MPEG-1 audio (VCD, SVCD, MP3)
47,250 Hz – sample rate used by commercial PCM recorders
48,000 Hz – sample rate used for miniDV, digital TV, DVD, DAT, movies, and professional audio
50,000 Hz – sample rate used by commercial digital sound recorders
96,000 or 192,000 Hz – sample rate used by DVD-Audio, some LPCM DVD soundtracks, BD-ROM (Blu-ray Disc) soundtracks, and HD-DVD (High Definition DVD) soundtracks
2.8224 MHz: The sample rate used by Direct Stream Digital’s 1-bit sigma-delta modulation process.

Definition of sampling bits, sampling rate and bit rate in audio (transfer) Part 3

1. Why do many professional standards reach 24bit/192KHz?

It is now common to use the 48kHz or 96kHz recording rate in engineering, and only convert to the 44.1kHz CD format during the final mastering process, which reduces distortion caused by multiple sample rate conversions.

In the field of computing, the AC97 specification, which is an audio hardware codec standard, only specifies 48 kHz. This causes nearly all input and output signals to be resampled (the professional term is called sample rate conversion, or SRC). SRC generally causes loss of sound quality, and the simpler (ie poorer) SRC algorithms can cause significant deterioration of sound quality. But this is already a fait accompli.

2. Since 44K is enough, why use 192KHZ to record?

First of all, 20 kHz is just the hearing threshold for most people, i.e. the human ear is very insensitive to sounds above 20 kHz. Insensitivity does not mean a total inability to perceive. The tones of most musical instruments (especially pianos and strings) are rich in higher harmonics, known in musical terms as overtones. CD audio, with its cutoff frequency of 22.05 kHz, can feel unnatural to people who are used to listening to real instruments, especially in the high frequencies, because the cutoff at the Nyquist frequency distorts the higher-frequency harmonics of the signal.

Second, digital recordings often require post-processing. Audio processing can introduce more distortion into the signal, including signal distortion, spectral aliasing, and more. If the original signal is only sampled at 44.1 kHz during recording, it must be upsampled before post-processing to expand the sample rate. Since this expansion is “fake”, there is really no more useful original signal, and the quality of the upsampling algorithm will also affect the distortion of the original recording signal, so this approach is undesirable. Therefore, it is common practice to sample at a higher frequency.

In today’s fully professional digital recording studios, recording, mixing and mastering no longer follow the CD standard; the HD audio standard is preferred instead. That is:

Three specifications are used for recording: 24-bit/48 kHz, 24-bit/96 kHz, and 24-bit/192 kHz. 24-bit/48 kHz is used by some small recording studios because their processing resources are limited, while the big recording studios use 24-bit/96 kHz and 24-bit/192 kHz.

So what are the benefits of such a recording specification?

1. It complies with the HD audio standard, which is also expected to be the main standard in the future. The finished product can be used directly for HDCD, DVD-Audio, Blu-ray Disc, digital music downloads, and digital media players.

2. It fully covers digital film and video work, since multi-channel film and video adopt the HD audio specification, including on portable digital video equipment.

3. It fully covers consumer audio playback, such as the Intel HD Audio standard, AC97 audio codecs, and the highest-quality playback on portable MP3/MP4 players, phones, and game consoles.

Currently, the highest quality standard in the professional recording industry is a bit depth of 24 bits at a 192,000 Hz sampling rate, referred to as “24-bit/192 kHz”. Of course, this standard will continue to improve in the future, and a move towards 32-bit/384 kHz is also possible.

In fact, the (genuine) products sold in the current CD market are usually at least HDCD discs; when you buy discs, you find that most of them carry the HDCD logo, meaning that a CD contains two audio tracks: a normal CD track and an HDCD track. The CD track records a 16-bit signal at 44.1 kHz (the compatible content of the disc, for early CD players), and the HDCD track records a 24-bit signal at 96 kHz (the main content of the disc). Ordinary CD players can only play the CD track, while the HDCD track requires an HDCD player (in fact, most DVD players today can play HDCDs, and modern computers work even better).

Definition of sampling bits, sampling rate and bit rate in audio (transfer) Part 2

Bitrate values compared to real audio:

16 Kbps = phone quality
24 Kbps = enhanced phone quality, shortwave broadcasting, longwave broadcasting, European-standard medium wave broadcasting
40 Kbps = American-standard medium wave broadcasting
56 Kbps = voice
64 Kbps = enhanced voice (best bit rate setting for cell phone ringtones and for cell phone mono MP3 players)
112 Kbps = FM stereo broadcasting
128 Kbps = tape (best setting for a mobile phone stereo MP3 player or a low-end MP3 player)
160 Kbps = hi-fi (best setting for mid- to high-end MP3 players)
192 Kbps = CD (best setting for high-end MP3 players)
256 Kbps = studio (for music enthusiasts)
In fact, with the advancement of technology, bit rates are getting higher and higher. The maximum bit rate of MP3 is 320 Kbps, but some formats can reach higher bit rates and superior sound quality.
For example, the APE audio format can provide true audiophile lossless sound quality with a smaller file size than WAV, and its bit rate is usually 550 kbps to 950 kbps.
Common encoding modes:

VBR (Variable Bit Rate), or dynamic bit rate, means there is no fixed bit rate. The compression software decides on the fly which bit rate to use based on the audio data being compressed. This method balances quality against file size and is the recommended encoding mode;
ABR (Average Bit Rate) is a variant of VBR. LAME created this encoding mode in response to the poor quality-to-size ratio of CBR and the unpredictable size of files generated by VBR. Within the specified file size, ABR treats every 50 frames as a segment (30 frames is about 1 second), spending relatively few bits on low frequencies and insensitive frequencies and more bits on high frequencies and passages with high dynamics. It can be seen as a compromise between VBR and CBR.
CBR (Constant Bit Rate) means that the file has the same bit rate from start to finish. Compared to VBR and ABR, the compressed file is very large, and the sound quality is not significantly better.
In simple terms:

Sample rate and bit depth are like the horizontal and vertical coordinates of a graph.

The sampling rate on the abscissa represents the number of samples per second.

The bit depth on the ordinate represents the precision with which analog quantities are quantized into digital quantities.

The sample rate is similar to the frame rate of moving images. For example, the sampling rate of movies is 24 Hz, the sampling rate of the PAL format is 25 Hz, and the sampling rate of the NTSC format is 30 Hz. When we play back the still images at the same rate at which they were sampled, we see a continuous image. In the same way, when a CD recorded at a sampling rate of 44.1 kHz is played back at the same rate, a continuous sound is heard. Obviously, the higher the sample rate, the more coherent the sound that is heard or the picture that is seen. Of course, the sampling rate that human auditory and visual organs can distinguish is limited: for sound sampled at rates above 44.1 kHz, most people cannot notice the difference.

The number of bits in the sound is equivalent to the number of colors on the screen, indicating the amount of data per sample. Of course, the larger the amount of data, the more accurate the playback, so that the sound of a teapot is not confused with a train whistle. In the same way, a higher color depth makes an image clearer and more precise, so that blood is not confused with ketchup. However, limited by the capabilities of human organs, 16-bit sound and 24-bit images are basically the limit for ordinary people, and greater bit depths can only be distinguished by instruments. For example, the telephone has 8-bit sound sampled at 8 kHz and the CD has 16-bit sound sampled at 44.1 kHz, so the CD is clearer than the phone.

Definition of sampling bits, sampling rate and bit rate in audio (transfer)

Number of sampling bits (sample size):

The number of sampling bits can be understood as the resolution of the sound processed by the capture card. The higher the value, the higher the resolution and the more realistic the sound recorded and played back. The first thing we need to know is that sound files on the computer are represented by the digits 0 and 1. So the essence of recording on the computer is to convert the analog sound signal into a digital signal. Conversely, during playback, the digital signal is restored to an analog sound signal for output. The bits of a capture card are the number of binary digits in the digital sound signal that the card uses when capturing and playing sound files. They objectively reflect how accurately the digital sound signal describes the input sound signal. 8 bits represent 2 to the 8th power, i.e. 256, and 16 bits represent 2 to the 16th power, i.e. 65,536 (64K). For comparison, for the same musical data, a 16-bit sound card can divide it into about 64,000 precision units for processing, while an 8-bit sound card can only use 256 precision units, resulting in a large loss of signal; the sampling quality is naturally not comparable.

On the market, sound cards are usually described as 16-bit, 24-bit, or 32-bit. The higher the value, the better the sound.

Sampling rate:

The sample rate (also called sampling rate or sampling frequency) defines the number of samples per second taken from a continuous signal to form a discrete signal, and is expressed in hertz (Hz). The inverse of the sample rate is called the sample period or sample time, which is the time interval between samples. The sampling theorem states that the sampling frequency must be greater than twice the bandwidth of the sampled signal. Another equivalent statement is that the Nyquist frequency must be greater than the bandwidth of the sampled signal.

If the signal bandwidth is 100 Hz, the sample rate must be greater than 200 Hz to avoid aliasing.

In other words, the sampling frequency must be at least twice the frequency of the largest frequency component of the signal; otherwise the original signal cannot be recovered from the signal samples. Oversampling refers to the sampling rate that exceeds twice the bandwidth of the signal, so that the poorly performing analog anti-aliasing filter can be replaced with a digital filter.
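What happens when the condition is violated can be shown numerically. The Python sketch below computes the frequency at which a pure tone appears after sampling, using the standard folding rule; the function name and the example tones are assumptions chosen for illustration.

```python
def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at sample_rate_hz."""
    folded = signal_hz % sample_rate_hz
    # Anything above the Nyquist frequency folds back below it.
    return min(folded, sample_rate_hz - folded)


# A 90 Hz tone sampled at 200 Hz is below Nyquist (100 Hz) and is preserved.
print(alias_frequency(90, 200))
# A 150 Hz tone sampled at 200 Hz is above Nyquist and folds down.
print(alias_frequency(150, 200))
# A 25 kHz tone sampled at the CD rate of 44.1 kHz also folds down.
print(alias_frequency(25_000, 44_100))
```

This folding is why an anti-aliasing filter has to remove content above half the sample rate before the signal is sampled.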

Bit rate:

Bit rate is sometimes loosely described as the rate at which digital sound is converted from analog to digital format; the higher it is, the better the quality of the restored sound. More precisely, as a benchmark for the efficiency of digital music compression, the bit rate indicates the number of bits transmitted per unit of time (1 second), in bps (bits per second). Kbps (1,000 bits per second) is usually used as the unit. The bit rate of digital music on CD is 1411.2 kbps (i.e., recording 1 second of CD music requires 1411.2 × 1000 bits of data). A large amount of data (bits) per unit of time (1 second) means the sound quality of the music file is good. However, when the bit rate is high, the file size increases and occupies a large amount of storage. The most common bit rates for music files are 32 to 256 Kbps. Of course, the higher the bit rate, the better, but 320 Kbps is the highest level for MP3 at the moment.