High-resolution physical and psychoacoustic analysis of digital sound

Sample Rate

“The crux of the question is: why constantly increase the sample rate in modern audio communication systems (spending huge amounts of money) if the thresholds of the auditory system are limited in frequency to the range from 20 Hz to 20 kHz?”

An analysis of the accumulated knowledge on this subject allows us to say that a bandwidth matched to the audible range is not enough. Given the complexity of the audio signal and the properties of the auditory system, it can be argued that only an increase in the resolution of transmission systems in every domain (temporal, spectral, spatial and dynamic) can help solve this problem. At the very least, it now seems clear that high resolution in the time domain matters most for sound transparency.

As you know, converting an analog (continuous) signal into a digital (discrete) signal requires three operations: sampling, quantization, and encoding (Figure 1). In all digital devices (computers, recorders, players, etc.) this is done by an analog-to-digital converter (ADC), whose block diagram is shown in Figure 2. According to Kotelnikov’s theorem (also known as the Nyquist theorem or the “sampling theorem”), to convert an analog signal whose highest frequency is fb (Hz) into digital form without loss of information, the sampling frequency fd, that is, the number of samples per second, must be not less than 2 x fb (Hz). The digital word, whose number of binary digits equals the chosen number M (bits), represents the instantaneous value of the input signal.
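As a rough illustration of the quantization step, the sketch below maps a sample to an M-bit digital word. The signed integer coding, the input range of -1.0 to 1.0 and the function name quantize are assumptions made for the example; the text does not say which coding a given ADC uses.

```python
def quantize(sample: float, bits: int) -> int:
    """Map a sample in [-1.0, 1.0) to a signed integer code of `bits` bits.

    Assumption: signed coding with 2**bits levels; real converters may differ.
    """
    full_scale = 2 ** (bits - 1)          # e.g. 32768 for M = 16
    code = round(sample * full_scale)     # nearest quantization level
    # Clip to the representable range of an M-bit signed word.
    return max(-full_scale, min(full_scale - 1, code))


print(2 ** 16)              # number of levels for M = 16: 65536
print(quantize(0.5, 16))    # 16384
print(quantize(1.0, 16))    # clipped to 32767
```

Each extra bit doubles the number of available levels, which is why the word length M sets the dynamic resolution of the system.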

Therefore, the sampling theorem requires that the sampling frequency be chosen high enough, fd > 2 fb, and that the signal remain almost constant at the moment of sampling. The theorem itself does not prescribe a low-pass filter, but one is installed in every ADC: to keep out-of-band frequencies from appearing in the spectrum, all digital devices include an anti-aliasing filter that cuts off the signal at the frequency fd / 2.
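To see why the anti-aliasing filter is needed, here is a minimal sketch of what happens to a pure tone that reaches the sampler unfiltered. The function name alias_frequency is hypothetical, and the model assumes an ideal sampler and a single sinusoid.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency (Hz) of a tone at f_hz after ideal sampling at fs_hz."""
    f_folded = f_hz % fs_hz                   # sampling cannot tell f from f + k*fs
    return min(f_folded, fs_hz - f_folded)    # fold into the range 0 .. fs/2


print(alias_frequency(10_000, 44_100))   # 10000: below fd / 2, unchanged
print(alias_frequency(30_000, 44_100))   # 14100: folded back into the audible band
```

A 30 kHz component sampled at 44.1 kHz is indistinguishable from a 14.1 kHz tone, which is why everything above fd / 2 has to be removed before sampling.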

The recording of signals in any system begins with a microphone (Figure 3), which acts as a band-pass filter and already introduces certain phase and transient distortions, leading to dispersion and blurring of the signal in the time domain. Data on these distortions are rarely given in microphone catalogs; however, a large body of studies carried out in recent years has established a significant difference in these parameters between dynamic and condenser microphones. For condenser microphones, attack values of several microseconds were obtained, while the decay of transient processes reaches several hundred microseconds. The importance of the phase linearity of the microphones not only inside,

Next, the analog signal being converted to digital passes through a low-pass filter at the ADC input (the anti-aliasing filter). This filter also disperses the impulse characteristics of the input signal, because of its uneven frequency and phase response in the pass band, the slope of the roll-off in the transition band, and its phase non-linearity.

Such distortions spread the input signal in time, which means that each instantaneous sample at the output contains information from previous samples (how many depends on the characteristics of the filter). Since a musical signal is a rapidly changing stream with short, sharp pulses, this scattering and blurring has a definite effect on auditory perception, especially for an experienced and attentive listener with a good musical ear.

Acoustic musical signals have a non-stationary, ultra-fast dynamic and temporal structure. This is due to several causes, in particular the rapid attack of real musical instruments, the large number of ultrasonic components in the spectrum of many instruments, the short reverberation delays that arise in a room, etc.

Recording an actual reverb process without losing data is also extremely difficult. When a sound source emits a complex non-stationary musical signal, each microphone, installed at different points in the room, “picks up” the complex echo. Furthermore, additional incoming signals, altered in amplitude and phase due to reflections from various surfaces, lead to an exponential increase in the total energy level entering the microphone. When the signal is turned off, there is a drop in the overall level, which is usually characterized by the reverberation time.

Some details of the sample rate

For many years it was thought that the sample rate, or sampling frequency, did not decisively influence the final quality of digital audio; even today there are engineers who record at 44.1K or 48K without really knowing why they do it. With the advent of new and better computers, interfaces, ports and protocols, 88.2K, 96K and even 192K joined the discussion on the best sample rate to use. It has always been a subject of debate between engineers and audiophiles: some argued that they did hear the difference between sample rates and others that they did not, and the topic has been subjected to countless A / B tests with very high quality equipment, giving rise to all kinds of conflicting and uncompromising opinions, fights and the breaking of friendships of many years.

Although this is a basic issue in digital audio, it is always surrounded by a halo of mystery, mysticism and magic (like every topic in sound), which is well worth clearing up.

What is the sample rate?

This topic, although it is covered in the first or second class of any digital audio course, is not always understood correctly. In textbook terms, the sample rate is defined as the number of audio samples taken and transported per second. Since it counts events that occur cyclically within one second, the hertz (1 / second) is used as its unit. Obviously we cannot talk about this subject without referring to the Nyquist sampling theorem, which Shannon proved almost twenty years after its publication and which states that for a signal of limited bandwidth B (for example, a vibraphone reaches 14,917 Hz), the sampling frequency must be at least twice its bandwidth (2 * B). Taking the previous example: 2 * B → 2 * 14,917 Hz → the sampling frequency for 14,917 Hz should be 29,834 Hz. That is equivalent to 29,834 samples per second (one sample every 1/29,834 of a second) in order to regenerate the signal of a vibraphone without error. In the same way, the highest frequency that human beings hear is taken to be 20 kHz, so applying Nyquist gives 40 kHz; in practice 44.1 kHz is used, to satisfy demanding ears and for a matter of multiples.
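The arithmetic of the vibraphone example can be written out as a small sketch. The function name min_sample_rate is made up for the illustration, and the calculation assumes an ideally band-limited signal.

```python
def min_sample_rate(bandwidth_hz: float) -> float:
    """Lowest sampling frequency (Hz) allowed by the 2 * B rule."""
    return 2 * bandwidth_hz


vibraphone_hz = 14_917
fs = min_sample_rate(vibraphone_hz)
print(fs)                        # 29834 samples per second
print(1_000_000 / fs)            # interval between samples, about 33.5 microseconds
print(min_sample_rate(20_000))   # 40000 for the nominal limit of human hearing
```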

44.1K or 48K to 88.2K or 96K, the correct division

At the dawn of the digital audio era, Nyquist was invoked to justify the 44.1K sample rate used by the audio CD format of the time, which played at 16 bit / 44.1 kHz. With the advent of DVD and Blu-ray as video and audio formats, resolutions such as 24 bit / 48 kHz or 24 bit / 96 kHz began to be used. Although for many years recordings were made at 24 bit / 88.2 kHz or 24 bit / 96 kHz, at some point during mastering, before being sent to the disc duplicator, the audio suffered a mutilation that reduced it to 16 bit / 44.1 kHz, as required by the CD format. This process had to be carried out with equipment specially designed for the purpose, and in stages, so that the audio did not suffer a very noticeable cut that exposed a bad conversion. Good old dither has been applied ever since to compensate for this process (it is something like “grain” in cinema: watch a film without “grain” and it will look like HD even though it was shot on film in 1980, and you will notice even the actors’ makeup and the seams of the special effects, which is rather disagreeable).

Generally, to keep the audio from being mutilated or put through several conversions that degrade it, the recording resolution was decided before pressing the REC button (we will not mention those who go straight from 24 bit / 96 kHz to 16 bit / 44.1 kHz in one step in their DAW when exporting the audio … there is a place reserved especially for them in hell). If the audio was going to end up on CD, an 88.2 kHz sample rate was generally used, since at mastering time a symmetric re-sampling to “half” gave exactly 44.1 kHz.
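One way to see why 88.2 kHz was the convenient choice for CD work is to reduce the conversion ratio to lowest terms. This is only a sketch of the arithmetic, with a hypothetical helper name, and not a resampler.

```python
from fractions import Fraction


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Output samples per input sample, reduced to lowest terms."""
    return Fraction(dst_hz, src_hz)


print(resample_ratio(88_200, 44_100))   # 1/2: keep one sample out of every two
print(resample_ratio(96_000, 44_100))   # 147/320: no simple whole-number step
```

Going from 88.2 kHz to 44.1 kHz is a clean division by two, whereas going from 96 kHz to 44.1 kHz requires a 147/320 ratio, so the converter has to interpolate between original samples.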

Sounds better?

The subjective side of this is that we expect recordings to “sound” better at a higher sample rate. The reality is that if we record at high sample rates, with very good conversion, our sound will not “sound better” but will be more detailed. Obviously, if our sound source is bad, and our microphones, preamps and so on are bad too, the result will not be the best no matter how much we record at 192K. If, on the other hand, we use a good sound source, a good audio chain and a good converter, the result will obviously be good. But make no mistake: we are talking about detail here, not about whether it will sound more “warm,” “fat,” or “full-bodied.” Detail translates into a more homogeneous capture of the entire frequency spectrum, both audible and non-audible.

CPU, disk and plug-ins

Obviously, a higher sample rate means that our processor must do more calculations, since it has to process more audio samples. Depending on the number of plug-ins we use on a high-resolution multitrack session, the load on both DSP and native processors (the computer itself) will increase significantly, to the point of making work very difficult or impossible. There are several ways to overcome this problem, from buying more processing power or DSP, to using fewer processes or external equipment (hybrid mixing), to borrowing a machine. The only option that should never cross our minds is to lower the resolution of the audio, process it and raise it again. The serious problem here is that the audio is cut irreversibly: what has been limited and trimmed stays that way.

Another aspect to consider is that the storage speed must match the audio resolution we use. Suppose we want to record at 24 bit / 96 kHz; the transfer rate would be 2304 kbit / second for each mono track. Then, multiplying by the number of tracks, we should use a disk that is genuinely fast enough to sustain that transfer rate (a topic to be developed in another article).
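The figure above follows from multiplying the bit depth by the sample rate, and a session total follows from multiplying again by the track count. In the sketch below the helper name and the 32-track session are hypothetical values chosen only for illustration; the result is raw uncompressed PCM with no file-format overhead.

```python
def pcm_bit_rate(bit_depth: int, sample_rate_hz: int, channels: int = 1) -> int:
    """Raw PCM data rate in bits per second (no container overhead)."""
    return bit_depth * sample_rate_hz * channels


one_track = pcm_bit_rate(24, 96_000)
print(one_track // 1000)        # 2304 kbit / second for one mono track

session = pcm_bit_rate(24, 96_000, channels=32)   # hypothetical 32-track session
print(session // 8)             # 9216000 bytes per second the disk must sustain
```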

These days, storage size is not a problem, but speed is. Three-terabyte disk drives are generally 5400 rpm platter disks; if solid-state drives are not used, the minimum should be 7200 rpm platter drives. Obviously, with 5400 rpm disks we would have a reduction of a third in the final transfer speed and in the read and write capacity known as “IOPS” (input/output operations per second), a figure that depends on the disk, its capacity and its arrangement (RAID). Depending on how much we demand in audio resolution, number of channels, processing (plug-ins) and expected latency (if we record with real-time monitoring), we will surely face problems such as “clicks” and / or “pops” in our audio.

Clock

Using a good clock and keeping every element of our audio chain in sync is vital. We covered this topic in detail a few articles ago, but it is worth reinforcing here. The ADC and DAC converters of several inexpensive interfaces do not perform sampling and quantization in the correct or expected manner; external clocks or protocols such as Dante help keep several devices correctly synchronized and improve the audio quality. Much of the final quality of our audio work lies in this part of the process, and if we take our work and passion seriously, it is important to start paying attention to these kinds of details, which are generally overlooked.