Bits, Hertz, Shaped Dithering … Part 3



What is behind these concepts?

BITS

To transmit sound faithfully, it would be nice to keep the entire perceived range from 10 Hz to 20 kHz. In theory, low frequencies pose absolutely no problem for digital recording; the problems arise in passing these frequencies through electrical circuits and reproducing them through small stereo speakers or headphones. For example, the output of a sound card usually has a power amplifier that feeds the signal to the stereo speakers. This inexpensive on-board amplifier, together with its feedback circuit and parasitic capacitances, acts as a high-pass filter that “dumps the bass.”

With high frequencies, things are a bit worse, or at least definitely more complicated. Most of the refinements and complications in DAC and ADC design are aimed precisely at more reliable transmission of high frequencies. “High” here means frequencies comparable to the sampling frequency: at 44.1 kHz, that is roughly 7 to 10 kHz and above.

Imagine a 14 kHz sinusoidal signal digitized at a 44.1 kHz sample rate. There are only about three samples per period of the input sinusoid, and restoring the original frequency as a sinusoid takes some imagination. The waveform is restored from the samples in the DAC, by the reconstruction filter. While relatively low frequencies arrive as almost ready-made sinusoids, the shape, and consequently the quality, of the reconstructed high frequencies depends entirely on the conscience of the DAC’s reconstruction system. In short, the closer the signal frequency is to one-half the sampling frequency, the more difficult it is to reconstruct the shape of the signal.
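The “about three points per period” figure is simple arithmetic: the sample rate divided by the tone frequency. A minimal sketch follows; the function name and the list of test frequencies are illustrative choices, not something taken from the article.

```python
def samples_per_period(signal_hz: float, sample_rate_hz: float) -> float:
    """Number of samples that fall within one period of a tone."""
    return sample_rate_hz / signal_hz


SAMPLE_RATE_HZ = 44_100.0  # CD sample rate

# Illustrative frequencies: low, mid, the 14 kHz example, and half the sample rate.
for tone_hz in (100.0, 1_000.0, 14_000.0, 22_050.0):
    count = samples_per_period(tone_hz, SAMPLE_RATE_HZ)
    print(f"{tone_hz:8.0f} Hz: {count:7.2f} samples per period")
```

A 100 Hz tone gets hundreds of samples per period, while the 14 kHz tone gets just over three, which is why the reconstruction filter has so much more work to do near the top of the band.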

This is the main problem when it comes to reproducing high frequencies. However, the problem is not as serious as it might seem. All modern DACs use oversampling (multi-rate) technology: the signal is first restored digitally at a sample rate several times higher, and only then converted to analog at that increased rate. Thus, the burden of restoring high frequencies shifts to digital filters, which can be of very high quality. So high that in expensive devices the problem is completely eliminated: frequencies up to 19-20 kHz are reproduced without distortion. Oversampling is also used in inexpensive devices, so this problem can be considered solved in principle. Devices in the $30-$60 range (sound cards), or stereos up to $600 whose DACs are generally similar to those of such sound cards, reproduce frequencies up to 10 kHz perfectly, up to 14-15 kHz tolerably, and the rest after a fashion. This is sufficient for most real music applications, and anyone who needs more quality will find it in professional-grade devices, which are not much more expensive; they are simply designed with more care.
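The idea of restoring the signal digitally at a higher rate can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a Hamming-windowed sinc interpolation filter with 32 taps per side and a factor of 4, all of which are choices made here for clarity. Real DACs use their own filter designs, which the article does not describe.

```python
import math


def hamming(length):
    """Symmetric Hamming window (length >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def oversample(samples, factor, taps_per_side=32):
    """Interpolate `samples` to `factor` times the rate (windowed-sinc filter)."""
    half = taps_per_side * factor
    window = hamming(2 * half + 1)
    kernel = [sinc(k / factor) * window[k + half] for k in range(-half, half + 1)]
    out = []
    for m in range(len(samples) * factor):
        # Input sample j sits at output position j * factor; only inputs
        # within `half` output positions of m contribute to output m.
        first = max(0, -(-(m - half) // factor))            # ceiling division
        last = min(len(samples) - 1, (m + half) // factor)
        acc = 0.0
        for j in range(first, last + 1):
            acc += samples[j] * kernel[m - j * factor + half]
        out.append(acc)
    return out


FS = 44_100.0
FACTOR = 4
tone = [math.sin(2.0 * math.pi * 14_000.0 * n / FS) for n in range(512)]
dense = oversample(tone, FACTOR)

# Compare against the same 14 kHz sine evaluated directly at the higher rate,
# skipping the edges where the filter runs out of input samples.
ideal = [math.sin(2.0 * math.pi * 14_000.0 * m / (FACTOR * FS))
         for m in range(len(dense))]
worst = max(abs(a - b) for a, b in zip(dense[200:-200], ideal[200:-200]))
print(f"worst deviation from the ideal sine: {worst:.4f}")
```

With roughly three input samples per period, the interpolated output still follows the ideal sine closely, which illustrates why moving the hard part of reconstruction into a digital filter works well.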



Bits, Hertz, Shaped Dithering … Part 2


What is behind these concepts?

Bits

In theory, the noise level is the only criterion for choosing the sampling resolution: apart from this noise, digitization introduces absolutely no distortions or inaccuracies. Practice, oddly enough, follows the theory almost completely. This is what guided the people who chose 16-bit resolution for audio CDs. Noise at minus 93 decibels is a pretty good figure, and it corresponds almost exactly to the conditions of our perception. The difference between the pain threshold (140 decibels) and the usual background noise in a city (30-50 decibels) is about a hundred decibels; and since nobody listens to music at a painful volume, which narrows the range further, the actual noise of the room or even of the equipment turns out to be much louder than the quantization noise. If we could hear levels below minus 90 decibels in a digital recording, we would hear and perceive the quantization noise; otherwise we will simply never be able to tell whether the audio is live or digitized. In terms of dynamic range, there is simply no other difference. In principle, though, a person can meaningfully hear over a range of 120 decibels, and it would be nice to keep this full range, which 16 bits apparently cannot support.

But this is only at first glance: using a special technique called shaped dithering, it is possible to change the frequency spectrum of the quantization noise, moving it almost entirely into the region above 7-15 kHz. In effect, we trade frequency resolution (we give up reproducing quiet high frequencies) for additional dynamic range in the remaining part of the spectrum. Combined with a peculiarity of our hearing, namely that our sensitivity in the high-frequency region where the noise is pushed is tens of dB lower than in the main region (2-4 kHz), this makes it possible to transmit useful signals another 10-20 dB quieter than -93 dB with relatively little audible noise. The dynamic range of 16-bit audio for a human listener is therefore approximately 110 decibels. Besides, a person simply cannot hear sounds 110 decibels quieter than a loud sound they have just heard. The ear, like the eye, adapts to the loudness of its surroundings, so the simultaneous range of our hearing is relatively small, around 80 decibels. We will say more about dithering after discussing the frequency aspects.
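The trade described above, more noise at high frequencies in exchange for less at low frequencies, can be demonstrated with the simplest possible shaper. The sketch below is an assumption-laden illustration: it uses first-order error feedback with triangular (TPDF) dither, whereas the shaped dithering used in mastering relies on higher-order filters tuned to the ear’s sensitivity. The function names and the crude moving-average “low-frequency” measure are choices made here, not part of the article.

```python
import math
import random


def quantize(samples, step, rng, shape):
    """Quantize to multiples of `step` with TPDF dither; optionally noise-shape."""
    out, prev_error = [], 0.0
    for s in samples:
        # With shaping on, subtract the error made on the previous sample,
        # so the net error becomes e[n] - e[n-1] (a high-pass response).
        target = s - prev_error if shape else s
        dither = (rng.random() - rng.random()) * step  # triangular, within +/-1 step
        q = step * round((target + dither) / step)
        prev_error = q - target
        out.append(q)
    return out


def smoothed_power(error, window=32):
    """Mean square of a moving average: a crude measure of low-frequency noise."""
    means = [sum(error[i:i + window]) / window
             for i in range(len(error) - window + 1)]
    return sum(v * v for v in means) / len(means)


STEP = 1.0 / 2 ** 15  # one 16-bit step for a signal in the range -1..1
signal = [0.5 * math.sin(2.0 * math.pi * 1_000.0 * n / 44_100.0)
          for n in range(8192)]

results = {}
for shape in (False, True):
    quantized = quantize(signal, STEP, random.Random(1), shape)
    error = [q - s for q, s in zip(quantized, signal)]
    total = sum(e * e for e in error) / len(error)
    results[shape] = (total, smoothed_power(error))
    print(f"shape={shape}: total noise {total:.3e}, "
          f"low-frequency noise {results[shape][1]:.3e}")
```

The shaped version carries more noise in total, but much less of it survives the low-pass averaging: the noise has been moved toward high frequencies, where the ear is less sensitive.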

For CD, the sampling frequency is 44100 Hz. There is an opinion (based on a misunderstanding of the Kotelnikov-Nyquist theorem) that all frequencies up to 22.05 kHz are reproduced, but this is not entirely true. All we can say is that the digitized signal contains no frequencies above 22.05 kHz. The actual picture of how digitized sound is reproduced always depends on the specific hardware and is not always as ideal as we would like, or as the theory promises. It all depends on the specific DAC (the digital-to-analog converter, which is responsible for producing an audio signal from the digital stream).
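One way to see why the samples cannot carry anything above half the sampling rate: a tone above that limit produces exactly the same sample values as a tone below it. The sketch below checks this for a 30 kHz tone, a frequency picked here purely for illustration, and its 14.1 kHz counterpart.

```python
import math


def sample_tone(freq_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine at the given rate."""
    return [math.cos(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]


FS = 44_100.0
above_limit = sample_tone(30_000.0, FS, 64)      # above FS / 2 = 22.05 kHz
alias = sample_tone(FS - 30_000.0, FS, 64)       # 14.1 kHz
largest_gap = max(abs(a - b) for a, b in zip(above_limit, alias))
print(f"largest difference between the two sample sequences: {largest_gap:.2e}")
```

The two sequences agree to within floating-point rounding, so after sampling the 30 kHz tone is indistinguishable from a 14.1 kHz one. This is why content above 22.05 kHz has to be filtered out before digitization.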

Let’s first find out what we would like to achieve. A middle-aged (or fairly young) person can sense sounds from 10 Hz to 20 kHz, and meaningfully hear them from 30 Hz to 16 kHz. Sounds higher and lower than that are perceived, but not as acoustic sensations. Sounds above 16 kHz are felt as an annoying, unpleasant factor: pressure on the head or pain. Especially loud ones cause such acute discomfort that one wants to leave the room. The unpleasant sensations are so strong that some security devices rely on this effect: a few minutes of very loud high-frequency sound will drive anyone crazy, and stealing anything in such an environment becomes absolutely impossible. Sounds below 30-40 Hz of sufficient amplitude are perceived as vibration coming from objects (the speakers); it would be more correct to call it simply vibration.

Bits, hertz, shaped dithering …


What is behind these concepts?

bits

When the CD Audio standard was developed, the parameters 44.1 kHz, 16 bits, and 2 channels (i.e. stereo) were adopted. Why exactly these values? What is the reason for this choice, and why are attempts being made to increase these values to, say, 96 kHz and 24 or even 32 bits?

Let’s start with the sampling resolution, that is, the bit depth. In practice one has to choose between the numbers 16, 24 and 32. Intermediate values would of course be more convenient in terms of sound, but too awkward for use in digital technology (a very debatable statement, since many ADCs have an 11- or 12-bit digital output; compiler’s note).

What is this parameter responsible for? Simply put, for dynamic range: the range of volumes that can be reproduced simultaneously, from the maximum amplitude (0 decibels) down to the lowest that the resolution can convey, for example approximately minus 93 decibels for 16-bit audio. Interestingly, this is closely tied to the noise level of the recording. In principle, 16-bit audio can quite well carry signals at a level of -120 dB, but such signals are hard to use in practice because of a fundamental concept known as sampling noise. The point is that whenever we take digital values we make an error, rounding the true analog value to the closest available digital value. The smallest possible error is zero, and the largest is half of the least significant bit (hereinafter LSB). This error gives us the so-called sampling noise, a random discrepancy between the digitized signal and the original. The noise is always present and has a maximum amplitude of half the least significant bit. It can be thought of as random values mixed into the digital signal. It is sometimes called rounding noise or quantization noise (the latter is the more accurate name, since encoding the amplitude is called quantization, whereas sampling is the process of converting a continuous signal into a discrete sequence of pulses; compiler’s note).
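The half-bit bound on the rounding error is easy to check numerically. The following is a minimal sketch: the scaling by 32767 and the test signal are arbitrary choices made for this illustration, not something specified in the article.

```python
import math


def quantize_16bit(value):
    """Round a value in -1.0..1.0 to the nearest signed 16-bit integer code."""
    code = round(value * 32767)
    return max(-32768, min(32767, code))


LSB = 1.0 / 32767  # size of one quantization step on the -1.0..1.0 scale

worst = 0.0
for n in range(10_000):
    x = math.sin(0.001 * n * n)  # arbitrary test signal
    error = abs(quantize_16bit(x) * LSB - x)
    worst = max(worst, error)

print(f"worst rounding error: {worst / LSB:.3f} LSB")
```

However the input wanders, the discrepancy between the original value and its digital code never exceeds half a step, and over many samples it comes close to that limit.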

Let’s dwell in more detail on what is meant by signal level measured in bits. The strongest signal in digital audio processing is generally taken as 0 dB; this corresponds to all bits set to 1. If the most significant bit (hereinafter MSB) is set to zero, the resulting digital value is halved, which corresponds to a level drop of 6 decibels (20 * log10(2) ≈ 6). So each time we zero one more bit, working from the most significant toward the least significant, we lower the signal level by another six decibels. It follows that the minimum signal level (a one in the least significant bit and zeros in all other digits) is -(N-1) * 6 dB, where N is the number of bits in the sample. For 16 bits, the weakest signal level is minus 90 decibels.
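The six-decibels-per-bit rule follows from the decibel formula for amplitude ratios, 20 * log10(ratio). A small sketch of the arithmetic follows; the helper name and the list of bit depths are illustrative.

```python
import math


def level_db(amplitude_ratio):
    """Level in decibels of an amplitude ratio."""
    return 20.0 * math.log10(amplitude_ratio)


DB_PER_BIT = level_db(2.0)  # halving or doubling the amplitude, about 6.02 dB

for bits in (8, 16, 24):
    # Level with only the least significant bit set, following the
    # (N - 1) steps counted in the text.
    lowest = -(bits - 1) * DB_PER_BIT
    print(f"{bits:2d} bits: lowest level about {lowest:.1f} dB")
```

For 16 bits this gives roughly minus 90.3 dB, which the text rounds to minus 90 decibels.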

When we say “half the least significant bit”, we do not mean -90/2 dB, but half of one bit’s 6 dB step, that is, another 3 decibels lower: minus 93 decibels.

We return to the choice of sampling resolution. As already mentioned, digitization introduces noise at the level of half the least significant bit, which means that a 16-bit recording constantly carries noise at minus 93 decibels. It can convey quieter signals too, but the noise still stays at -93 dB. This is what defines the dynamic range of digital sound: its lower edge lies where the signal-to-noise ratio turns into a noise-to-signal ratio (there is more noise than useful signal). Therefore, the main criterion for digitizing is the amount of noise we can afford in the reconstructed signal. The answer depends in part on how much noise was on the original recording. An important conclusion: if we digitize something whose own noise level is around minus 80 decibels, there is absolutely no reason to digitize it with more than 16 bits. On the one hand, the -93 dB digitization noise adds very little to the already huge (comparatively) -80 dB noise; on the other hand, anything quieter than -80 dB on the recording itself is already below its noise, and there is simply no need to digitize and transmit such a signal.
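The claim that -93 dB noise adds very little to existing -80 dB noise can be checked by summing the two as powers. The sketch below assumes that the two noise sources are uncorrelated, which is what makes power summation valid; the function name is illustrative.

```python
import math


def combine_noise_db(*levels_db):
    """Total level of uncorrelated noise sources, summed as powers."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


source_noise_db = -80.0        # noise already present on the recording
quantization_noise_db = -93.0  # noise added by 16-bit digitization
total = combine_noise_db(source_noise_db, quantization_noise_db)
print(f"combined noise floor: {total:.2f} dB")
print(f"increase over the source noise: {total - source_noise_db:.2f} dB")
```

The combined floor comes out at about -79.8 dB, so 16-bit digitization raises the noise of such a recording by only about 0.2 dB.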