What Is Audio Sampling Rate: A Comprehensive Explanation

Introduction

Audio sampling rate is a fundamental concept in digital audio that refers to the number of samples per second used to represent an analog audio signal in digital form. In this article, we’ll explore the technical details of audio sampling rate, its importance in digital audio, and its impact on audio quality and file size.

Sampling Rate Fundamentals

The concept of audio sampling rate is based on the Nyquist-Shannon sampling theorem, which states that to represent an analog signal accurately in digital form, the sampling rate must be more than twice the highest frequency present in the signal. A signal whose highest frequency is 20 kHz (the upper limit of human hearing) must therefore be sampled at a rate above 40 kHz. Any component above half the sampling rate is not simply lost: it folds back into the audible band as a false lower frequency, an effect known as aliasing.
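As a rough illustration of the theorem, the sketch below computes the minimum sampling rate for a given highest frequency and the frequency at which an ideal sampler would register a pure tone. The function names `nyquist_rate` and `alias_frequency` are made up for this example, and the folding formula assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def nyquist_rate(max_frequency_hz: float) -> float:
    """Minimum sampling rate needed for a signal limited to max_frequency_hz."""
    return 2.0 * max_frequency_hz


def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after ideal sampling.

    Tones at or below half the sample rate come back unchanged;
    higher tones fold back into the 0..sample_rate/2 band.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_rate(20_000))              # minimum rate for a 20 kHz signal
print(alias_frequency(15_000, 40_000))   # below fs/2: unchanged
print(alias_frequency(25_000, 40_000))   # above fs/2: folds back
```

With a 40 kHz sampling rate, a 25 kHz tone cannot be told apart from a 15 kHz tone once it has been sampled, which is why content above half the sampling rate has to be filtered out beforehand.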

Sampling rate is measured in Hertz (Hz), which refers to the number of samples per second. Common sampling rates in digital audio range from 44.1kHz (used in CDs) to 192kHz (used in some high-resolution audio formats).

Sample Rate Conversion

In some cases, it may be necessary to convert audio from one sampling rate to another. Sample rate conversion involves resampling the audio data to a different rate, which can be done using digital signal processing techniques. However, sample rate conversion can introduce artifacts and reduce audio quality, especially when downsampling from a higher rate to a lower rate.
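To make the idea of resampling concrete, here is a deliberately naive sketch that converts between rates by linear interpolation. It is only an illustration: the name `resample_linear` is hypothetical, and real converters use band-limited filters precisely because an approach like this one introduces the artifacts described above (when downsampling, nothing here removes content above the new Nyquist limit).

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive sample rate conversion by linear interpolation (no filtering)."""
    if len(samples) < 2 or src_rate == dst_rate:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate      # position in the source signal
        i = int(pos)                       # source sample just before pos
        frac = pos - i                     # distance past that sample
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


ramp = [0.0, 1.0, 2.0, 3.0]
print(resample_linear(ramp, 2, 4))   # upsampling doubles the sample count
print(resample_linear(ramp, 4, 2))   # downsampling halves it
```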

There are various reasons why sample rate conversion may be necessary, such as when mixing audio tracks with different sampling rates, or when preparing audio for distribution on different platforms with varying requirements.

Audio Quality and Sampling Rate

The sampling rate has a significant impact on audio quality: a higher sampling rate can capture higher frequencies and so represent the original signal more accurately. However, the benefit is bounded by the range of human hearing and by the practical constraints of digital audio technology.

While there is debate about the benefits of “high-resolution audio” formats with sampling rates above 44.1kHz, it is generally accepted that sampling rates above 96kHz provide little additional benefit in terms of audio quality.

Bit Depth and Sampling Rate

The bit depth of an audio sample refers to the number of bits used to represent the amplitude of the signal at each sample point. Higher bit depths allow for more precise representation of the signal, but also result in larger file sizes. The bit depth and sampling rate are related, as increasing the bit depth requires more data to be stored for each sample.

There is a trade-off between sampling rate and bit depth, as higher sampling rates require more data to be stored per second, which can limit the maximum bit depth that can be used without exceeding practical file size limits. However, this trade-off can be mitigated by using efficient audio compression techniques.

Sample Rate in Practice

Common sampling rates in digital audio include 44.1kHz (used in CDs), 48kHz (used in digital video), 88.2kHz, 96kHz, 176.4kHz, and 192kHz. Streaming services such as Spotify and Apple Music typically use lower sampling rates for their audio streams, with 44.1kHz being a common choice.

The Nyquist theorem, named after the Swedish-American physicist Harry Nyquist, states that the sampling rate should be at least twice the highest frequency component in the signal being sampled. This is why the standard CD-quality sampling rate is 44.1 kHz, which is just above twice the 20 kHz upper limit of human hearing.

However, it is important to note that there are higher sampling rates available, such as 48 kHz, 96 kHz, and even 192 kHz. These higher sampling rates can provide more detail and accuracy in the digital representation of the analog signal. However, they also require more storage space and processing power.

Another important factor to consider is the bit depth, which is the number of bits used to represent each sample. The more bits used, the more accurate and detailed the representation of the analog signal. CD quality uses a bit depth of 16 bits, but higher bit depths such as 24 bits are also available.

It is worth noting that some argue that higher sampling rates and bit depths may not necessarily result in audible improvements in sound quality, especially when considering the limitations of human hearing. Additionally, some argue that the increased storage and processing requirements may not be worth the potential improvements.

In conclusion, the sampling rate is a crucial component in the digital representation of analog audio signals. A higher sampling rate can provide more detail and accuracy in the digital representation, but also requires more storage and processing power. The Nyquist Theorem provides a guideline for choosing the appropriate sampling rate based on the highest frequency component in the signal. Additionally, the bit depth is another factor to consider in the accuracy and detail of the digital representation. While higher sampling rates and bit depths are available, the potential improvements in sound quality must be balanced against the increased storage and processing requirements.

What does the quality of an MP3 depend on? High-resolution MP3

Factors influencing hearing quality

High quality

Lately, very high-quality audio formats have been heavily promoted. Are they really worth it?

Judged strictly on technical specifications, they can be considered higher quality.

For example, they use sample rates more than double the highest ones in common use.

The same goes for the bit rate: they use values that were simply not used until now.

But first we must ask whether the equipment we use to play them (a computer, a cell phone, an MP4 player) can handle this quality, and whether our speakers or headphones are built to reproduce it.

Otherwise we will end up paying a lot for this premium audio and, in effect, hear the same thing.

It is also worth asking whether our ears could tell one from the other.

To what extent does the ear perceive the difference between sample rates of 48000 and 96000?

What we must avoid is falling victim to the “numbers”, which suggest that in theory the file will sound better but ignore reality (for example, the human ear or the quality of our speakers), so that the theory ends up being misleading.

What does the quality of an mp3 depend on?

Digital sound

We must keep in mind that we are talking about digital sound.
The sound we hear in daily life is analog. This means it is a continuum: there are no partitions, cuts, or chunks.

Digital audio, on the other hand, is made up of thousands of points that trace a curve; the curve is not continuous but a series of discrete points.

Of course, the more points the curve has, the smoother it is and the more closely it resembles the original analog audio.

When the CD was developed, the designers settled on 44100 samples per second, so that the curve would be smooth enough to contain the sounds in the range the human ear can perceive.

Some sounds are too high-pitched for us to hear, and others are too low-pitched for us to perceive.

It is even known that as the years go by, people hear very high-pitched sounds less well, whereas adolescents perceive them better.

So the first factor to take into account is having 44,100 or 48,000 samples per second, in order to get a smooth, high-quality curve.

The sample rate is the number of samples taken per second to trace the sound curve. Recordings with a lower sample rate than that are not high quality.

So take an audio file and check that it has a sample rate of at least 44100 or 48000 samples per second to know whether it can be CD quality.

There are higher sample rates, for example 96000, but we will talk about those later.

Bit Depth and Sample Rate, Part 2

Dither processing

We now know that digital signal processing inevitably introduces rounding errors, so the accumulated result carries error as well. These errors not only make the original audio impossible to recover exactly, but also introduce an unnatural sound.

To remove these artifacts, we add computed low-amplitude noise to the signal, a technique called dithering. The amplitude of the dither noise is very low, and although some of it can still be heard, the result is better than leaving it out.

Note that dither noise accumulates. When you add noise to a signal, the signal-to-noise ratio decreases. If the operation is repeated, the ratio keeps decreasing, adding uncertainty to the signal. This is why dithering is usually applied only once, as the last step in mastering.
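The effect of dither can be seen on a signal smaller than one quantization step. The sketch below is a minimal illustration under stated assumptions: amplitudes are expressed in units of one LSB, the dither is triangular (TPDF, the sum of two uniform random values), which is one common choice rather than something the text prescribes, and the function names are invented for the example.

```python
import math
import random


def quantize(x: float) -> int:
    """Round to the nearest quantization step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)


def quantize_with_dither(x: float, rng: random.Random) -> int:
    """Add triangular (TPDF) dither spanning -1..+1 LSB, then quantize."""
    dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return quantize(x + dither)


rng = random.Random(0)   # fixed seed so the run is repeatable
level = 0.3              # a constant signal sitting at 0.3 LSB
n = 100_000

plain_mean = sum(quantize(level) for _ in range(n)) / n
dithered_mean = sum(quantize_with_dither(level, rng) for _ in range(n)) / n

print("without dither:", plain_mean)
print("with dither:   ", round(dithered_mean, 2))
```

Without dither the 0.3 LSB signal is rounded to zero every time and disappears. With dither the individual samples are noisy, but their average tracks the original level, so the information survives as signal plus a little noise.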

Dithering has quite an interesting history:

The first use of dither came during World War II. Bombers used mechanical computers for navigation and ballistic calculations. Interestingly, these computers were more accurate in the air. Engineers realized that vibrations from the plane reduced the errors caused by moving parts: their movement became more continuous instead of jerky. Small vibrating motors were then built into the computers, and their vibration was called dither, from the Middle English word “didderen,” meaning “to shake.” Modern dictionaries define dither as a state of high tension, confusion, or anxiety. In a way, dithering brings digital systems closer to analog systems.

– Ken Pohlmann, Principles of Digital Audio

Sampling rate
In theory, a sampling rate of 44.1 kHz is sufficient to cover the hearing range of the human ear. You may already have come across Nyquist’s theorem, which explains how to avoid aliasing (a type of distortion) and reconstruct every frequency from the samples: the sampling rate must be at least twice the highest frequency in the signal. (The theorem also applies to non-audio media, but we won’t go into that here.)

The human ear has a hearing range of up to 20 kHz (most studies show that the figure is actually around 17 kHz), so a sample rate of 40 kHz is enough to capture every audible frequency. 44.1 kHz is the industry standard, set by Sony, which dominated the market at the time, for a few reasons.

In a nutshell, the sample rate has to be somewhat higher than twice the highest audio frequency because, in practice, the signal is low-pass filtered during conversion to prevent aliasing, and no real filter cuts off instantly. The gentler the slope of the low-pass filter, the lower the manufacturing cost, so a typical low-pass filter is given a transition band about 2 kHz wide. To keep the full spectrum below 20 kHz intact, the sample rate should therefore be 44 kHz: (20 kHz [highest frequency] + 2 kHz [low-pass filter slope]) × 2 [Nyquist theorem] = 44 kHz.
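The arithmetic in the parentheses can be written out as a one-line function. This is just the calculation from the paragraph above; the name `min_sample_rate` and its parameter names are chosen for the example.

```python
def min_sample_rate(highest_frequency_hz: float, transition_band_hz: float) -> float:
    """Sample rate that leaves room for the low-pass filter's transition band."""
    return 2.0 * (highest_frequency_hz + transition_band_hz)


print(min_sample_rate(20_000, 2_000))   # 20 kHz audio plus a 2 kHz filter slope
```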

Ultimately, the 44.1 kHz standard was settled in a contest between Sony and Philips (both had similar end goals). The figure also follows from the arithmetic of fitting audio samples into the structure of a video signal: in this way, digital audio could be stored on ordinary videotape, which was more cost-effective. However, 48 kHz is the standard for audio that accompanies video, while CD audio remains at 44.1 kHz.
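The videotape arithmetic is not spelled out in the text. The sketch below uses the figures commonly cited for early PCM adaptors that recorded audio on video recorders (3 samples per usable video line); treat those numbers as an assumption brought in for illustration, and the function name as hypothetical.

```python
def pcm_adaptor_rate(samples_per_line: int, usable_lines_per_field: int,
                     fields_per_second: int) -> int:
    """Audio samples per second that fit into a video signal."""
    return samples_per_line * usable_lines_per_field * fields_per_second


# Commonly cited figures (assumed here, not taken from the article):
ntsc = pcm_adaptor_rate(3, 245, 60)   # 60-field systems
pal = pcm_adaptor_rate(3, 294, 50)    # 50-field systems
print(ntsc, pal)
```

Both video systems arrive at the same rate, which is the usual explanation for why 44,100 rather than a rounder number was chosen.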

Bit depth and sample rate

The first thing to understand is that bit depth and sample rate only exist in digital audio.

In digital audio, bit depth describes amplitude (the vertical axis) and sample rate describes time (the horizontal axis). So increasing the number of bits increases the resolution of the sound’s amplitude, and increasing the number of samples per second increases the resolution in time, which raises the highest frequency that can be captured.

In an analog system (the natural world), audio is continuous and smooth. In digital systems, smooth analog waveforms can only be sampled approximately and are limited to a certain amplitude range. When a sound is sampled, the audio is divided into small segments (samples), each fixed at one amplitude level. The process of fixing a signal to a discrete amplitude level is called quantization, and the process of creating the sample segments is called sampling.

In the graph below, a natural sine wave is displayed over 1 s, from 0 to 1 s. The blue bars represent the digital quantization of the sine wave; each bar is a sample, snapped to the nearest available amplitude level. (Of course, the graph is far coarser than reality.)

Depending on the choice made during recording, 1 s of audio can contain 44,100 or 48,000 samples, and at 24 bits the amplitude ranges from -144 dB to 0 dB (-96 dB to 0 dB at 16 bits). The dynamic range resolution (the number of amplitude levels available to a sample, i.e. the number of distinct bar heights) is 65,536 at 16 bits and 16,777,216 at 24 bits.

Therefore, increasing the bit depth greatly improves amplitude resolution and dynamic range. So where does the added dynamic range appear? Since the amplitude cannot exceed 0 dB, the added range goes to the quietest sounds. One can therefore hear more low-level detail (such as a reverb tail fading out at -130 dB) that would be cut off at -96 dB with 16 bits.
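The level counts and decibel figures quoted above follow directly from the bit depth. As a small sketch (function names are illustrative), the number of levels is 2 to the power of the bit depth, and the dynamic range is 20·log10 of that number, which works out to roughly 6 dB per bit:

```python
import math


def amplitude_levels(bits: int) -> int:
    """Number of distinct amplitude values an integer sample can take."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Ratio between the largest and smallest step, in decibels."""
    return 20.0 * math.log10(2 ** bits)


for bits in (16, 24):
    print(bits, "bits:", amplitude_levels(bits), "levels,",
          round(dynamic_range_db(bits), 1), "dB")
```

The results are slightly above the rounded -96 dB and -144 dB figures usually quoted for 16 and 24 bits.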

Rounding and truncation

In digital audio, each sample is analyzed, processed, converted to an analog signal, and then played through speakers. When samples are processed in your DAW (gain, distortion, etc.), they go through basic multiplication and division operations that change their digital representation. Put simply, without some form of rounding (a 1 dB gain means multiplying by 1.122018454), even a sample with only 8 or 4 bits of precision produces a result too long to fit in 24 bits.

Since we only have 24 bits, these long numbers need to be fitted into that space. To do this, the DSP either rounds or truncates at the least significant bit (LSB, the last binary digit of the sample; for example, the 16th bit in a 16-bit sample). Rounding is fairly straightforward and uses the rule you are already familiar with. Truncation simply discards everything after the least significant bit without examining it.

Both processes introduce errors, and these errors accumulate along the signal chain and eventually show up in the output. On the plus side, the LSB is the bit with the smallest amplitude, so the error sits at about -96 dB for 16-bit samples and -144 dB for 24-bit samples. Different digital signal processor designs and methods will also lead to different results.
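A short sketch of the two strategies, applied to the 1 dB gain mentioned above. The sample value 1005 is arbitrary and chosen only because the two methods disagree on it; `db_to_linear` and `apply_gain` are names invented for this example, not part of any DAW or DSP library.

```python
import math


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(sample: int, gain_db: float, mode: str = "round") -> int:
    """Scale an integer sample and fit the result back into an integer."""
    scaled = sample * db_to_linear(gain_db)
    if mode == "round":
        return math.floor(scaled + 0.5)   # nearest integer
    return math.trunc(scaled)             # drop the fractional part


print(db_to_linear(1.0))                  # the 1 dB multiplier
print(1005 * db_to_linear(1.0))           # exact result, not an integer
print(apply_gain(1005, 1.0, "round"))
print(apply_gain(1005, 1.0, "truncate"))
```

The exact product has a fractional part that cannot be stored, and the two methods land on neighbouring integers. Either way the stored value is off by less than one LSB, which is the error the text says accumulates along the signal chain.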

What is the difference between 128k and 320k music? Part 3

Typical sampling frequencies, by use, are roughly as follows (k means thousand; 1 kHz = 1000 Hz):

8 kHz: used for telephones etc.; enough to record the human voice.

22.05 kHz: used for broadcast transmission.

44.1 kHz: audio CD.

48 kHz: used in DVD and digital TV.

96 kHz to 192 kHz: used for DVD-Audio, Blu-ray, etc.

The common range of sample precision is 8 bits to 32 bits, with 16 bits generally used on CD.

Having said that, some readers may be getting confused. If it is not the bitrate that determines the sound quality, why does everyone say that 320 kbps sounds better than 128 kbps?

【Audio Compression】

Well, the bit rate is really another dimension: it describes how an audio file has been compressed.

Nowadays, most of the audio formats we use regularly are based on the original “WAV” data of the audio CD (44.1 kHz sample rate, 16-bit sample precision, 2 channels). The originally recorded sound data is stored as a sequence of samples in PCM format, while WAV is a container format developed by Microsoft whose function is to store and reproduce that PCM data.

Since the data in a WAV file essentially restores the PCM data completely, MP3, AAC and other compressed formats are essentially re-encoded from WAV files. Therefore, we can simply think of WAV as the original audio format and the other audio formats as compressed formats.

Compression is inseparable from storage and transmission: its purpose is to make both more efficient. So before we talk about compression, we need to understand the basic units used by computers.

We all know that computers use the binary number system, and that the files a computer stores are made up of two digits, 0 and 1. Computer transmission is therefore based on individual digits, and each digit is called 1 “bit”. For example, the basic data of an audio clip might be “0,1,1,1,0,1,1,0”, and during transmission these digits are sent one by one. The sampling precision mentioned above is measured in this unit.

The storage unit of the computer is the byte. One byte consists of 8 bits, that is, 8 b (bit) = 1 B (byte). In computer parlance, data transmission is expressed in decimal units and data storage in binary units, so 1 KB = 1024 B = 1024 × 8 b. This is also part of the reason why the advertised capacity of a hard drive does not match the capacity the computer reports.

Returning to audio compression: the bitrate of an audio file effectively expresses its compression ratio. The bitrate really just defines the size of the file, but because, under normal conditions, a larger file has lost less data, its sound quality is relatively higher. The bit rate itself, however, does not directly determine the quality of the file. For example, if we take a 128 kbps file as the source and convert it to a 320 kbps file, the sound quality will not be any better than at 128 kbps.
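To see how the bitrate fixes the file size, the sketch below compares uncompressed CD audio with two common MP3 bitrates. The 4-minute duration is an arbitrary example, the function names are illustrative, and decimal megabytes (1 MB = 1,000,000 bytes) are used for simplicity.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def file_size_bytes(bitrate_bps: int, duration_s: int) -> float:
    """Approximate size of the audio data (headers and tags ignored)."""
    return bitrate_bps * duration_s / 8


cd = pcm_bitrate_bps(44_100, 16, 2)   # CD audio: 44.1 kHz, 16 bit, stereo
duration = 240                        # an assumed 4-minute song

for label, bitrate in [("CD PCM", cd), ("320 kbps", 320_000), ("128 kbps", 128_000)]:
    size_mb = file_size_bytes(bitrate, duration) / 1_000_000
    print(f"{label}: {size_mb:.2f} MB, about {cd / bitrate:.1f}:1 relative to CD")
```

The size depends only on bitrate and duration, which is why re-encoding a 128 kbps file at 320 kbps produces a larger file without restoring anything that was discarded.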

What is the difference between 128k and 320k music? Part 2

Bit rate, sample rate, lossless, MP3, FLAC, APE, 320 kbps, 192 kbps, 128 kbps, 44.1 kHz, CBR, VBR. Does this pile of terms feel both familiar and mysterious?

The higher the bitrate, the better the sound quality, and lossless music has the highest sound quality, right? To find out, let’s start with how sound is captured.

【Audio composition】

Nowadays, when we talk about audio, everything is digital audio. Digital audio consists of three parts: sample rate, sample precision, and number of sound channels.

Sample rate: also called the sampling frequency; the number of samples taken per second when recording the sound, expressed in hertz (Hz).

Sampling precision: the dynamic range of the recorded sound, measured in bits.

Sound channels: the number of channels (1-8).

In simple terms, we can think of a sound wave as a curve. A curve is made up of points, and the sampling frequency is the number of points along each second of its length (the horizontal axis). Sampling precision is the number of points available across the dynamic range (the vertical axis). The finer the positioning in these two dimensions, the more faithfully the sound is restored and the better the sound quality. Of course, the audio file also becomes larger. The format that the client in my colleague’s story was asking about is Sony’s recently released Hi-Res Audio: 6-channel, 192 kHz/24-bit recordings. In a lossless format, such a file will of course be larger than 200 megabytes.
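The three parameters multiply together to give the raw data rate, which shows why a 6-channel 192 kHz/24-bit file grows so quickly. This is a sketch of the uncompressed case only (lossless compression would shrink the file by an amount that depends on the music); the function name is illustrative and 1 MB is taken as 1,000,000 bytes.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bytes of uncompressed PCM audio produced every second."""
    return sample_rate_hz * bit_depth * channels // 8


hi_res = pcm_bytes_per_second(192_000, 24, 6)
seconds_to_200_mb = 200_000_000 / hi_res

print(hi_res, "bytes per second")
print(round(seconds_to_200_mb, 1), "seconds of audio fill 200 MB")
```

At this rate the uncompressed data passes 200 MB in under a minute.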

What is the difference between 128k and 320k music?

【Preface】

Some time ago, a colleague ran into a very troublesome client. The trouble reportedly started when the client asked him to provide song files of 100 MB to 200 MB or more. My colleague did not know much about audio formats, so he went back and forth endlessly over FLAC, WAV and file sizes. In the end, he was unable to explain clearly to the client what was going on.

After that, a few other things happened that made me feel that too many practitioners around me in the music industry have an extremely poor understanding of music and even lack basic music-related knowledge. They don’t even have the desire to understand it, which makes me very sad. It is as if music were nothing but merchandise, and we practitioners only needed to tidy the shelves, label the goods, and use the big data of purchase records to recommend products to users, offering them all kinds of services based on cold data without caring why users like these products or what qualities the products have.

Therefore, I think it is necessary to write something. I don’t expect practitioners to become people who truly love music. I just hope that even if you still think of “her” as a commodity, you can first figure out what it is that you are selling.

PS: The first lesson is about media files. Because it involves a lot of technical issues, it may seem a bit dry, but if you read it carefully you will find that it is actually easy to understand, and this basic knowledge can do a lot to improve your professional skills. More interesting content about records, musical styles, and so on will follow soon.

Related Audio Attribute Part 3

How samples are combined

This mainly concerns two-channel or multi-channel audio. Two-channel audio can be laid out in either of the following two ways:

Interleaved (packed). Taking stereo as an example, the stereo stream is obtained by alternating the samples of the two mono channels in storage.
Planar. The samples of each channel are stored separately.

The data produced by FFmpeg audio decoding is stored in the AVFrame structure.

In packed format, frame.data[0] or frame.extended_data[0] contains all the audio data.
In planar format, frame.data[i] or frame.extended_data[i] holds the data of the i-th channel (with channel 0 being the first). The AVFrame.data array has a fixed size of 8, so if the number of channels exceeds 8, you should get the channel data from frame.extended_data.
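The two layouts can be illustrated with plain Python lists. This sketch does not use FFmpeg at all; `interleave` and `deinterleave` are hypothetical helper names, and the sample values are made up so that the two channels are easy to tell apart.

```python
def interleave(channels):
    """Planar -> packed: alternate one sample from each channel."""
    return [sample for frame in zip(*channels) for sample in frame]


def deinterleave(packed, num_channels):
    """Packed -> planar: collect every num_channels-th sample per channel."""
    return [packed[c::num_channels] for c in range(num_channels)]


left = [1, 2, 3]
right = [10, 20, 30]

packed = interleave([left, right])
print(packed)                    # L R L R L R
print(deinterleave(packed, 2))   # back to one list per channel
```

In the packed list every pair of neighbouring values belongs to the same instant, whereas in the planar form each channel can be processed on its own, which is convenient for codecs and filters.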

Sample format
The sample formats in FFmpeg are mainly:

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,   ///< 8 bits unsigned
    AV_SAMPLE_FMT_S16,  ///< 16 bits signed
    AV_SAMPLE_FMT_S32,  ///< 32 bits signed
    AV_SAMPLE_FMT_FLT,  ///< float
    AV_SAMPLE_FMT_DBL,  ///< double

    AV_SAMPLE_FMT_U8P,  ///< 8 bits unsigned, planar
    AV_SAMPLE_FMT_S16P, ///< 16 bits signed, planar
    AV_SAMPLE_FMT_S32P, ///< 32 bits signed, planar
    AV_SAMPLE_FMT_FLTP, ///< float, planar
    AV_SAMPLE_FMT_DBLP, ///< double, planar
    AV_SAMPLE_FMT_S64,  ///< 64 bits signed
    AV_SAMPLE_FMT_S64P, ///< 64 bits signed, planar

    AV_SAMPLE_FMT_NB    ///< Number of sample formats. DO NOT USE if dynamically linked
};

Notes:

1. U8 (8-bit unsigned integer), S16 (16-bit integer), S32 (32-bit integer), FLT (single-precision floating point), DBL (double-precision floating point), S64 (64-bit integer). Formats whose names do not end in P are interleaved, and those ending in P are planar.
2. Planar mode is FFmpeg’s internal storage mode, while the audio files we normally use are in packed mode.
3. FFmpeg does not produce the same sample format for every codec it decodes. In testing, AAC decoding produced floating-point AV_SAMPLE_FMT_FLTP data, while MP3 decoding produced AV_SAMPLE_FMT_S16P data (the MP3 file used was 16 bits deep). To find the actual sample format, check the format member of the decoded AVFrame or the sample_fmt member of the decoder’s AVCodecContext.

Bit rate
The bit rate (also written bitrate) is the amount of data transferred per second, for example 705.6 kbps or 705600 bps, where b stands for bit and ps for per second, meaning 705600 bits per second. Compressed audio files are often described by their bit rate; for example, a CD-quality MP3 is described as 128 kbps/44100 Hz. Note that the unit here is the bit, not the byte. One byte equals 8 bits. The bit is the smallest unit and is generally used to describe network speed and other communication speeds, while the byte is used to measure the size of hard drives and memory.

Mbps means megabits per second (millions of bits per second);
Kbps means kilobits per second (thousands of bits per second);
bps means bits per second. The conversion ratios are:

1 megabit = 1,000 kilobits = 1,000,000 bits, so 1 Mbps = 1,000,000 bps. Again, this is a unit of speed: the number of bits transmitted per second. For data transmission speed, K has its decimal meaning (1,000), whereas for data storage K has its binary meaning (1,024). For example:

“1M” bandwidth generally means 1 Mbps = 1,000,000 bps, and 1,000,000 / 8 / 1,000 = 125, so the download speed on 1M bandwidth generally does not exceed 125 KB/s.
Likewise, 100M bandwidth means 100 Mbps = 100,000,000 bps, and 100,000,000 / 8 / 1,000 / 1,000 = 12.5, so the maximum download rate on 100M bandwidth is 12.5 MB/s.
Of course, these are only theoretical rates. In practice the maximum download rate may not be reached because of various losses; on 100M broadband, a download rate of 10 MB/s is already good.
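The bandwidth conversions above are the same division by 8 followed by a decimal scale factor. A minimal sketch, with function names chosen for the example and decimal kilobytes and megabytes as in the calculation above:

```python
def mbps_to_kilobytes_per_second(mbps: float) -> float:
    """Theoretical download rate in KB/s (1 KB taken as 1,000 bytes)."""
    return mbps * 1_000_000 / 8 / 1_000


def mbps_to_megabytes_per_second(mbps: float) -> float:
    """Theoretical download rate in MB/s (1 MB taken as 1,000,000 bytes)."""
    return mbps * 1_000_000 / 8 / 1_000_000


print(mbps_to_kilobytes_per_second(1))     # "1M" bandwidth
print(mbps_to_megabytes_per_second(100))   # "100M" bandwidth
```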

Related Audio Attribute Part 2

The higher the sampling rate, the more realistic and natural the sound will be.

Human hearing covers frequencies from 20 Hz to 20,000 Hz. By the Nyquist theorem, capturing that range takes at least 40,000 samples per second, twice the highest frequency. A sample rate of 22050 is commonly used, 44100 is already CD quality, and sampling above 48000 is no longer meaningful to the human ear. This is similar to the 24 frames per second of a movie image.

Sampling bits
After the audio has been sampled, two steps must be performed on each sample:

1. Quantization. The bit depths commonly used for audio quantization are:

8 bits (that is, 1 byte) can only represent 256 values, so the amplitude can only be divided into 256 levels;

16 bits (i.e. 2 bytes) can represent 65536 values, which is the CD standard;

32 bits (i.e. 4 bytes) can subdivide the amplitude into 4294967296 levels, which is really unnecessary.

The number of quantization bits is also called the number of sampling bits, the bit depth, or the resolution. It refers to how many levels the continuous intensity of the sound is divided into once it is represented digitally. N bits means that the intensity of the sound is divided evenly into 2^N levels; at 16 bits, that is 65536 levels. This is a very large number, and people may not be able to hear a difference in sound intensity of 1/65536. You can also think of it as the resolution of the sound card: the higher the value, the higher the resolution and the greater the ability to reproduce sound. The bit depth addresses the strength (amplitude) characteristics of the signal, while the sampling rate addresses its time (frequency) characteristics; these are two different concepts.

2. Binary encoding. The result of the quantization, i.e. the sample of a single channel, is stored as a binary codeword. There are two storage methods:

Store the result of the quantization directly as an integer, that is, in two’s complement code;

Store the result of the quantization as a floating-point number, i.e. in floating-point encoding.

Most PCM sample data formats store samples as integers; applications that require high precision use floating point to represent PCM sample data.
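The two steps above, quantization and binary encoding, can be sketched with Python's standard `struct` module. The assumptions here are an input amplitude normalised to the range -1.0 to 1.0, little-endian byte order, and a helper name `quantize_pcm` invented for this example; actual file formats define their own conventions.

```python
import struct


def quantize_pcm(sample: float, bits: int) -> int:
    """Map an amplitude in -1.0..1.0 to a signed integer code of the given depth."""
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    code = round(sample * 2 ** (bits - 1))
    return max(min_code, min(max_code, code))   # clamp to the valid range


code = quantize_pcm(0.5, 16)

# Step 2: store the same sample either as an integer or as floating point.
as_int16 = struct.pack("<h", code)     # 2 bytes, two's complement
as_float32 = struct.pack("<f", 0.5)    # 4 bytes, IEEE 754 single precision

print(code, as_int16.hex(), as_float32.hex())
```

The integer form needs a separate agreement on bit depth to be interpreted, while the floating-point form carries the amplitude directly at the cost of more bytes per sample.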

Frame
After the audio is quantized to binary codewords, it must be transformed, and the transform (MDCT) works in units of blocks, each made up of multiple (120 or 128) samples. A frame contains one or more blocks. Common frame sizes are 960, 1024, 2048, 4096, etc. A frame records one unit of sound: its data length is the product of the number of samples per channel, the size of each sample, and the number of channels, and its duration is the number of samples per channel divided by the sample rate. The nb_samples member of the AVFrame structure in FFmpeg gives the number of single-channel audio samples in a frame.
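As a quick check of what a frame size means in practice, the sketch below converts a per-channel sample count (the role played by nb_samples) into a duration and a byte count. It is plain arithmetic rather than FFmpeg code, and the function names are illustrative.

```python
def frame_duration_ms(nb_samples: int, sample_rate_hz: int) -> float:
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * nb_samples / sample_rate_hz


def frame_size_bytes(nb_samples: int, bytes_per_sample: int, channels: int) -> int:
    """Size of one frame of decoded PCM data."""
    return nb_samples * bytes_per_sample * channels


print(round(frame_duration_ms(1024, 44_100), 2))   # 1024-sample frame at 44.1 kHz
print(frame_duration_ms(960, 48_000))              # 960-sample frame at 48 kHz
print(frame_size_bytes(1024, 2, 2))                # 16-bit stereo
```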