How digital compression works. Part 3



In most cases, there is another pin, Master Clock (MCLK or MCK), which is used to synchronize the transmitter and receiver from the same clock to reduce the transmission error rate.

For external MCLK synchronization, two clock generators are used, with frequencies of 22,579.2 kHz and 24,576 kHz. The first, 22,579.2 kHz, serves sample rates that are multiples of 44.1 kHz (88.2, 176.4, 352.8 kHz), and the second, 24,576 kHz, serves sample rates that are multiples of 48 kHz (96, 192, 384 kHz). There may also be generators at 45,158.4 kHz and 49,152 kHz; you’ve probably already noticed that the digital audio world likes to multiply everything by two.
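The link between the two clock families and the sample rates can be checked with a short sketch. The helper name `master_clock_khz` is hypothetical, and the sketch assumes the 22,579.2 kHz / 24,576 kHz oscillator pair; real devices may use other multiples.

```python
from fractions import Fraction

# Oscillator frequencies in kHz (512 x 44.1 and 512 x 48).
MCLK_44K1_FAMILY = Fraction("22579.2")
MCLK_48K_FAMILY = Fraction("24576")

def master_clock_khz(sample_rate_khz):
    """Return (mclk_khz, ratio) for a sample rate given in kHz.

    Hypothetical helper: picks the oscillator that is an integer
    multiple of the sample rate.
    """
    rate = Fraction(str(sample_rate_khz))
    for mclk in (MCLK_44K1_FAMILY, MCLK_48K_FAMILY):
        ratio = mclk / rate
        if ratio.denominator == 1:
            return float(mclk), int(ratio)
    raise ValueError(f"{sample_rate_khz} kHz fits neither clock family")

for rate in (44.1, 88.2, 176.4, 352.8, 48, 96, 192, 384):
    print(rate, master_clock_khz(rate))
```

Every ratio in the loop is a power of two, which is the "multiply by two" habit in action.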

Frame or I2S frame
In I2S, three pins are required: SCK, WS and SD; the remaining pins are optional.

The SCK line carries the bit clock pulses to which the frames are synchronized.

The WS line marks the boundaries of each “word” and selects the channel by its logic level: when WS is a logical one, right channel data is transmitted; when it is zero, left channel data is transmitted.

The data bits are transmitted via SD: these are the quantized amplitude values of the audio signal, the same 16, 24 or 32 bits. The I2S bus provides no checksums or service channels. If data is lost in transit, there is no way to get it back.

Expensive DACs often have external I2S connectors. Using such connectors and cables can degrade the sound, up to audible “artifacts” and stuttering; everything depends on the quality and length of the cable. I2S is, after all, an on-board interconnect, and the wires from the transmitter to the receiver should be as short as possible.

Let’s see how the PCM data stream is transmitted through the I2S bus. For example, when transmitting PCM 44.1 kHz at 16 bits, the length of the word on the SD channel will be these sixteen bits and the length of the frame will be 32 bits (right + left). But most of the time, the transmitters use a 24-bit word length.

When playing PCM 44.1×16, the unused bits of the 24-bit word are simply ignored, as they are filled with zeros, or, in the case of older multi-bit DACs, they can spill into the next frame. The length of the “word” (WS) may also depend on the player through which the music is played, as well as on the driver for the playback device.
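The frame layout described above can be modeled in a few lines. This is a simplified sketch, not a driver: the function name `i2s_frame_bits` is hypothetical, samples are assumed to be left-justified and sent most significant bit first, and the one-clock delay between the WS edge and the first data bit defined by the I2S specification is ignored.

```python
def i2s_frame_bits(left, right, sample_bits=16, slot_bits=24):
    """Return (ws, sd) bit lists for one stereo frame.

    ws is 0 during the left slot and 1 during the right slot;
    sd carries each sample MSB first, zero-padded to the slot length.
    """
    def slot(sample):
        # Two's complement value, left-justified inside the slot.
        word = (sample & ((1 << sample_bits) - 1)) << (slot_bits - sample_bits)
        return [(word >> i) & 1 for i in range(slot_bits - 1, -1, -1)]

    sd = slot(left) + slot(right)
    ws = [0] * slot_bits + [1] * slot_bits
    return ws, sd

ws, sd = i2s_frame_bits(left=1, right=-1)
print(len(sd), sd[:24], sd[24:])
```

With a 16-bit sample in a 24-bit slot, the last 8 bits of each slot are the zero padding.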

An alternative to PCM and I2S is to record the audio signal in DSD. This format was developed in parallel with PCM, although Kotelnikov’s theorem played a role here as well. To improve sound quality compared to CDDA, the emphasis was placed not on increasing the bit depth, as in the DVD-Audio format, but on increasing the sample rate.

DSD
DSD stands for Direct Stream Digital. It originated in the Sony and Philips labs, just like the other formats discussed in this article.

SACD
DSD first saw the light of day on Super Audio CDs in 2002.

At the time, SACD seemed like a masterpiece of engineering: it applied a completely new way of recording and playback, very close to analog devices. The implementation was simple and elegant at the same time.

The media was even equipped with copy protection, although pirates posed little threat even without it. Sony and Philips began producing “closed” devices under their own brands that could only play discs, with no possibility of copying them. The manufacturers sold recording equipment to studios, but kept control over SACD production.

Who knows, perhaps the SACD format could have gained popularity comparable to Audio CD, had it not been for the cost of the playback devices. By unreasonably inflating player prices, the management of Sony and Philips themselves hampered the popularity of their format. The next mistake put an end to sales of specialized devices altogether. To promote the PlayStation, Sony engineers added the ability to play SACDs on it. Hackers immediately cracked the console and began copying SACD discs into ISO images that could be burned to a regular DVD and played on any competing player; others simply ripped the tracks to play on a computer.



How digital compression works. Part 2

The next format to appear after CDDA was DAT (Digital Audio Tape), in 1987.

The sample rate was 48 kHz; the bit depth did not change. And although the format failed, the 48 kHz sample rate took hold in recording studios, reportedly because it was convenient for digital processing.

In 1999, the DVD-Audio format was released, which made it possible to record on a disc six channels with a sampling frequency of 96 kHz and a 24-bit depth, or two channels with a frequency of 192 kHz at 24 bits.

In the same year, the SACD (Super Audio CD) format was introduced, but the discs began to be produced only three years later. I will tell you more about this format in the DSD section.

These are the main formats that are considered the standard for digital audio recordings on media. Now let’s see how data is transmitted on a digital audio path.

The structure of the digital audio path.
When playing music, something like the following happens: the player, using a codec implemented as a device or a program, decodes a file in a particular format (FLAC, MP3 and others) or reads data from a CD, DVD-Audio or SACD disc, producing a standard PCM data stream. This stream is then transferred via USB, LAN, S/PDIF, PCI, etc., to the I2S converter. In turn, the converter turns the received data into so-called I2S interface frames (not to be confused with I2C!).

I2S
I2S is a serial bus for digital audio transmission. Today I2S is a standard for connecting a signal source (computer, player) to a digital-to-analog converter. The vast majority of DACs are connected through it, directly or indirectly. There are other digital audio transmission standards, but they are much less common.

I2S output (input) on PCB
The I2S bus can consist of three, four, or even five pins:

continuous serial clock (SCK): the bit clock (may be called BCK or BCLK);
word select (WS): the frame sync clock (may be called LRCK or FSYNC);
serial data (SD): the transmitted data signal (may be called DATA, SDOUT, or SDATA). As a rule, data is transmitted from a transmitter to a receiver, but there are devices that can act as a receiver and a transmitter at the same time. In that case, one more pin may be present;
serial data in (SDIN): on this pin, data moves in the receive direction, not the transmit direction.
SD or SDOUT is used to connect a D/A converter, and SDIN is used to connect an A/D converter to the I2S bus.

How digital compression works.

Have you ever wondered how sound is reproduced on digital devices?

How is a sound signal formed from a combination of ones and zeros? I’m sure you have, since you started reading! But often even professionals have only a general idea of the modern audio path. In this article, you will learn how the different formats appeared, what a digital-to-analog converter is, what types of DACs exist, and what determines the quality of sound reproduction.

PCM
As you know, in digital audio almost any format, with rare exceptions, is recorded as a PCM (pulse code modulation) stream. FLAC, MP3, WAV, Audio CD, DVD-Audio and other formats are just ways to package, or “preserve”, a PCM stream.

How it all began
The theoretical foundations of digital sound transmission were developed at the dawn of the 20th century, when scientists tried to transmit an audio signal over a long distance, not by telephone, but in a way that was rather unusual for the time.

By dividing the sound wave into small parts, it could be sent to the receiver in some kind of mathematical representation. The recipient, in turn, could restore the original waveform and listen to the recording. In addition, scientists were faced with the task of increasing the bandwidth of the “ether”.

In 1933, V. A. Kotelnikov’s theorem was published. In Western sources, it is called the Nyquist-Shannon theorem. Yes, Harry Nyquist was the first to raise this issue: in 1927 he calculated the minimum sampling frequency needed to transmit a waveform, which was later named the “Nyquist frequency” after him, but Kotelnikov’s theorem was published 16 years before Shannon’s.

The essence of the theorem is simple: a continuous signal can be represented as an interpolation series of discrete samples, from which the signal can be reconstructed. To restore the original signal, the sampling frequency must be at least twice the upper cutoff frequency of that signal.
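What goes wrong when the sampling frequency is too low can be shown numerically. The sketch below is only an illustration: the 8 kHz rate and the 3 kHz and 5 kHz tones are arbitrary assumptions, chosen so that one tone lies below half the sampling frequency and the other above it.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

SAMPLE_RATE = 8000                            # half of it is 4000 Hz
below = sample_tone(3000, SAMPLE_RATE, 16)    # below the limit
above = sample_tone(5000, SAMPLE_RATE, 16)    # above the limit

# The 5 kHz tone yields the same samples as the 3 kHz tone, so the two
# cannot be told apart once they have been sampled.
aliased = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(below, above))
print(aliased)
```

The tone above the limit is indistinguishable from a lower one, which is why the sampling frequency has to be at least twice the highest frequency of interest.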

For many years the theorem was not in demand, until the advent of the digital age, when it finally found a use. In particular, the theorem proved useful in developing the CDDA (Compact Disc Digital Audio) format, commonly known as Audio CD or Red Book. The format was released by engineers at Philips and Sony in 1980 and became the standard for audio CDs.

Format characteristics:

sampling frequency – 44.1 kHz;
bit depth – 16 bits.

INFO
The sampling rate is the number of signal samples taken per second. Measured in hertz.
Bit depth: the number of binary digits that express the amplitude of the signal. Measured in bits.
The 44.1 kHz sampling frequency was derived from Kotelnikov’s theorem. It is believed that the hearing of the average person cannot pick up sound above 19-22 kHz. Most likely, 22 kHz was chosen as the upper limit.

22,000 × 2 + 100 = 44,100 Hz

Where do the extra 100 hertz come from? According to one version, this is a small margin in case of errors or resampling. In fact, Sony chose this frequency for its compatibility with the PAL television standard.

The bit depth of the CDDA format is 16 bits, or 65,536 quantization levels, which corresponds to a dynamic range of approximately 96 dB. Such a large number of levels was not chosen by chance: first, because of the strong influence of quantization noise, and second, to provide a formal dynamic range superior to that of the main competitors at the time, cassette tapes and vinyl records. I’ll cover this in more detail in the section on digital-to-analog converters.
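The 65,536 levels and the roughly 96 dB figure follow directly from the bit depth. A minimal sketch, assuming the usual definition of dynamic range as the ratio of full scale to one quantization step; the function names are illustrative.

```python
import math

def quantization_levels(bits):
    """Number of distinct amplitude values for a given bit depth."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(quantization_levels(bits))

for bits in (16, 24, 32):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))
```

Each extra bit adds about 6 dB, so 24 and 32 bits extend the formal range well beyond that of CDDA.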

The development of PCM continued on the principle of multiplying by two. Other sample rates appeared: first, the 48 kHz sample rate was added, and then the frequencies based on it were 96, 192, and 384 kHz. The 44.1 kHz frequency was also doubled to 88.2, 176.4, and 352.8 kHz. Bit depth increased from 16 to 24 and then to 32 bits.

Audio encoding: secrets revealed

Audio settings for video capture and transmission.

As people directly related to the AV sphere, we constantly talk about audio coding and audio codecs, but what is it? An audio codec is essentially a device or algorithm that can encode and decode a digital audio signal.

In practice, the sound waves that travel through the air are continuous analog signals. They are converted to digital form by a device called an analog-to-digital converter (ADC), and the reverse conversion is performed by a digital-to-analog converter (DAC). The codec sits between these two functions, and it is the codec that lets you adjust some important parameters for successful capture, recording and transmission of an audio signal: the codec algorithm, the sampling frequency, the bit depth and the bit rate.

The three most popular audio codecs are Pulse-Code Modulation (PCM), MP3, and Advanced Audio Coding (AAC). The choice of codec determines the compression rate and the recording quality. PCM is a codec used by computers, CDs, digital phones, and sometimes SACD. The PCM signal source is sampled at regular intervals, and each sample is the digital amplitude of the analog signal. PCM is the simplest option for digitizing an analog signal.

With the correct parameters, this digitized signal can be converted back to analog without any loss. But this codec, which provides almost complete identity with the original audio, is unfortunately not very economical: it produces large files that are not well suited for streaming. We recommend using PCM to record master copies of your sources or when doing audio post-processing.

Fortunately, we always have the option of choosing a different codec that compresses the digital data (compared to PCM) based on some helpful observations about the behavior of sound waves. But in this case you have to compromise: all the alternative algorithms discussed here are “lossy”, meaning the original signal cannot be restored completely, yet the result is still so good that most users will not be able to hear the difference.

MP3 is an audio encoding format that uses a digital data compression algorithm that allows you to save the audio signal in smaller files. The MP3 codec is the most used by users to record and store music files. We recommend using MP3 to stream audio content as it requires less network bandwidth.

AAC is a newer audio encoding algorithm that is the successor to MP3. AAC has become the standard for MPEG-2 and MPEG-4 formats. In fact, this is also a digital data compression codec, but with less quality loss than MP3 when encoded with the same bit rate. We recommend using this codec for online streaming.

Sampling frequency (kHz)
Sample rate (or sampling frequency): the frequency with which the signal is digitized, stored, processed or converted from analog to digital. Sampling in time means that the signal is represented by a series of its samples taken at regular intervals.

It is measured in hertz (Hz) or kilohertz (kHz); 1 kHz equals 1000 Hz. For example, 44,100 samples per second can be written as 44,100 Hz or 44.1 kHz. The selected sample rate determines the maximum frequency that can be reproduced and, as follows from Kotelnikov’s theorem, to fully restore the original signal the sample rate must be at least twice the highest frequency in the signal spectrum.

As you know, the human ear is capable of picking up frequencies between 20 Hz and 20 kHz. Given these parameters and the values shown in the following table, you can understand why 44.1 kHz was chosen as the sampling frequency for CD and is still considered a very good frequency for recording.

What are the problems with digital audio?

As with many areas of technology, there is no single standard for digital audio.

It can be delivered in various standards: AES/EBU 110 Ohm, AES-3id 75 Ohm, S/PDIF 75 Ohm, optical Toslink, among others. The sampling frequency can range from 32 kHz to 192 kHz, with different bit depths. To work with this whole variety of standards, a serious studio needs an interface unit, preferably a digital audio converter or a sample rate converter.

What are the problems with digital video?
Digital video (SDI) is similar in some respects to analog video. Here, too, the quality of the cables and connectors matters for normal operation, and the loss of high frequencies in them also degrades the signal. Because of the many factors that affect the underlying analog signal, fluctuations can appear in digital systems, and beyond a certain level the image is lost completely (the cliff effect). A bit lost in digital video can have far more serious consequences than a pixel lost in analog. When working with digital video, the signal quality often has to be restored (equalization of the frequency spectrum and clock recovery). The format (“language”) of a digital signal is very important for its correct transmission, since the transmission protocols are very specific.
Level incompatibility is a rare problem in analog technology. Digital signals, however, can have different and incompatible levels: TTL, ECL or others. Another problem with digital signals is matching the load capacity of the digital inputs and outputs, which must also be addressed.

What is the easiest way to input a digital video signal into a computer?
The easiest and cheapest way is to use a DV video source and a Firewire® card in your computer (or the built-in interface on many modern computers). The capture procedure is simple and fast. For analog video, you can use an analog video capture card or an external analog-video-to-DV converter connected to the Firewire® card.

Why do I sometimes have difficulties with the DV format?
The digital video format that uses a DV or mini-DV cassette and Firewire® technology has a very high bit rate, which limits the length of the connecting cable. Attempting to use long cables will cause many bit stream problems, such as the cliff effect, when the image is lost completely. Another problem stems from the two-way communication between devices connected via Firewire® and shows up when multiple DV devices are connected arbitrarily.

What is a device for embedding (extracting) digital audio into an SDI signal?
A serial digital video stream can carry multiple channels of digital audio. An SDI embedder is used to insert digital audio into an SDI signal, and an SDI de-embedder is used to extract digital audio from the combined stream.

What is bitrate?

Bit rate: the number of bits used to store or transmit one second of content: video and/or audio recordings, including compressed ones.

Bit rate is expressed in bits per second (bit/s, bps) and in derived units with the prefixes kilo (kbps), mega (Mbps), etc.
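For uncompressed PCM the bit rate follows directly from the stream parameters. A minimal sketch, assuming decimal prefixes (1 kbps = 1000 bit/s); the function name `pcm_bit_rate_bps` is illustrative.

```python
def pcm_bit_rate_bps(sample_rate_hz, bit_depth, channels):
    """Bits needed for one second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

# Audio CD parameters: 44.1 kHz, 16 bits, stereo.
cd = pcm_bit_rate_bps(44_100, 16, 2)
print(cd, "bit/s =", cd / 1000, "kbps =", cd / 1_000_000, "Mbps")
```

Compressed formats are usually judged against this uncompressed figure.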

For streaming video and audio formats (such as MPEG and MP3) that use lossy compression, the bit rate expresses the degree of compression of the stream. Most of the time, video bit rate is measured in megabits per second and audio bit rate in kilobits per second.

Increasing the bitrate provides a significant increase in video recording quality, which is especially noticeable when shooting dynamic scenes and small details.

Encoding modes
There are three modes for compressing streaming data:

CBR (Constant Bit Rate);
VBR (Variable Bit Rate);
ABR (Average Bit Rate).

Constant bit rate
Constant Bit Rate, CBR – A variant of streaming data encoding, in which the required bit rate is initially set, which does not change throughout the file.

Its main advantage is the ability to predict the size of the final file fairly accurately.
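Because the bit rate never changes, the size estimate is simple arithmetic. A rough sketch, assuming decimal kilobits (1 kbps = 1000 bit/s) and ignoring container headers and tags; the function name is hypothetical.

```python
def cbr_file_size_bytes(bit_rate_kbps, duration_s):
    """Approximate size of a constant-bit-rate stream in bytes."""
    return bit_rate_kbps * 1000 * duration_s // 8

for kbps in (128, 192, 320):
    size = cbr_file_size_bytes(kbps, 60)
    print(f"{kbps} kbps, 1 minute: {size / 1_000_000:.2f} MB")
```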

However, the constant bitrate option is not very suitable for video or audio content, the dynamics of which change over time, as it does not provide an optimal size / quality ratio.

Variable bit rate
With a variable bit rate (VBR), the codec selects the bit rate based on the encoding parameters (the desired quality level), and the bit rate may change over the course of the encoded segment.

This method provides the best quality-to-size ratio for the output file, but its exact size is very hard to predict. Depending on the nature of the sound (or image, in the case of video encoding), the size of the resulting file may differ by a factor of several.

Average bit rate
Average bit rate (ABR) is a hybrid of constant and variable bit rates: the user sets the value in Mbps and the program varies it within certain limits. However, unlike VBR, the codec uses the maximum and minimum possible values cautiously, so as not to stray from the average specified by the user. This method allows the most flexible setting of the rate and predicts the output file size with much higher precision than VBR.

What is digital audio?

Digital sound is nothing more than a combination of numbers.

With a certain algorithm, sound (that is, variations in air pressure) is converted into data streams and encoded for further processing and playback. Depending on the algorithm used, the music file has one format or another and one extension or another.

Remember that alongside digital sound there is analog sound, which is represented by a continuous electrical signal that follows the changes in the sound wave. Analog-to-digital conversion consists of recording the numerical value of the amplitude at given moments in time, with a given density of values. Consequently, the more values are recorded, the more faithfully and accurately the digitized sound fragment is recreated. Such digitization produces very large data arrays that, depending on the format used, differ in the ratio of sound quality to size of the final file.

Perhaps the main advantage of digital audio over analog is the ability to store and copy data indefinitely without losing the original quality (whereas when copying from one analog medium to another, a decrease in recording quality is quite noticeable).

The most widespread and popular digital audio format today is MP3 (MPEG Layer 3). It was developed by the Fraunhofer Institute in Germany after a series of intermediate formats and research that began in 1987.

The developers of the format were faced with the task of simplifying and reducing the cost of transmitting long pieces of music. As you know, one minute of a stereo signal from a CD (16 bit, 44.1 kHz sample rate) takes up about ten megabytes. At the same time, unlike text or graphic files, an audio signal compresses poorly without loss of quality. Thus, transmitting an uncompressed 3-minute track from an audio CD over a modem at a data rate of, say, 24 kbps would take several hours. Scientists at the Fraunhofer Institute managed to shrink the file size many times over: on average, one minute of audio compressed to MP3 takes about 1 megabyte. The compression principle is based on removing “unnecessary” sounds from the music, those that the human ear cannot perceive or that duplicate each other.
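The figures in this paragraph can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes decimal megabytes and an ideal 24 kbps link with no protocol overhead; the helper names are illustrative.

```python
CD_BYTES_PER_SECOND = 44_100 * 2 * 2   # sample rate x 2 bytes x 2 channels

def pcm_size_bytes(seconds):
    """Size of uncompressed CD audio of the given duration."""
    return CD_BYTES_PER_SECOND * seconds

def transfer_time_hours(size_bytes, link_bps):
    """Time to send size_bytes over a link of link_bps bits per second."""
    return size_bytes * 8 / link_bps / 3600

one_minute = pcm_size_bytes(60)
three_minutes = pcm_size_bytes(180)
print(one_minute / 1_000_000, "MB per minute")
print(round(transfer_time_hours(three_minutes, 24_000), 2), "hours at 24 kbps")
```

One minute comes to roughly ten megabytes and the three-minute transfer to nearly three hours, in line with the estimates above.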

The main factor that determines the relationship between file size and sound quality within a given format is the bit rate. Bit rate indicates how much information is used to encode one second of sound. The higher it is, the less distortion there is and the closer the encoded track is to the original. The most common on the Internet are tracks with bit rates of 128 and 192 kbps. The maximum bit rate supported by programs and devices that work with MP3 is 320 kbps. In practice, only an expert or a professional who works with sound can tell a 320 kbps MP3 file from the original.

To optimize the size of MP3 music files while maintaining decent quality, a variable bit rate (abbreviated VBR) is used. In this case, the encoding program divides the file into fragments of different spectral density and encodes each with a suitable bit rate. Most modern MP3 players support variable bit rate playback. A significant advantage of MP3 files is that they can contain the name of the artist, the titles of the track and the album, the year of release, etc. This set of data is called ID3 tags. Most modern players can read them and display them on the screen.

In 2001, the Swedish company Coding Technologies and Thomson Multimedia developed the MP3 Pro codec. It is based on MP3 and, as a result, is fully backward compatible and only partially forward compatible with MP3. It uses SBR (Spectral Band Replication) technology, so the codec provides good quality at low bit rates. However, its encoding quality at medium to high bit rates is inferior to that of most other codecs. For this reason, the format is mainly used for Internet broadcasts and for previews of new music.

Another variant of MP3 is MP3 Surround, recently introduced by the creators of MP3, the Fraunhofer Institute. This format reproduces all the characteristics of multi-channel sound while remaining compatible with standard stereo MP3: information describing the spatial characteristics of the sound is recorded on an additional track. By playing files of this format on special equipment capable of reading this track, you can obtain surround sound that conforms to the Surround 5.1 standard.

The beginning of the digital age

Although digital audio is the standard of music these days …

It has not always been this way.

Music originally existed only in the form of sound waves.

Then, with the development of technology, ways were discovered to convert it to other formats, such as:

Musical notation
electrical signals in cables
radio waves in the atmosphere
grooves on a vinyl record
But more recently, in the age of computers, digital audio has become the main recording format, making it easy to copy and transfer songs.

The device that made this possible is called … a digital converter.

Next, on to how it works …

2. Digital converters
In recording studios, digital converters exist in 2 versions:

as a standalone device in top studios or …
as part of an audio interface in home studios.
To make binary code out of sound, they take tens of thousands of snapshots (samples) per second to build a rough picture of the analog wave.

This image is not entirely accurate, because in the moments between samples, the converter has to guess what is happening.

digital wave

As seen in the graphic above:

the red line shows an analog signal and …
black line shows conversion …
The results are not ideal, but sufficient to produce excellent sound quality.
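What a converter does can be sketched in a few lines: measure the wave at regular instants and round each measurement to an integer code. This is a toy model, not how real converter hardware works internally; the 1 kHz tone, the 48 kHz rate and the function name `digitize` are assumptions for illustration.

```python
import math

def digitize(freq_hz, sample_rate_hz, bit_depth, count):
    """Sample a sine wave and round each sample to a signed integer code."""
    full_scale = 2 ** (bit_depth - 1) - 1
    return [round(full_scale * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
            for n in range(count)]

# One full cycle of a 1 kHz tone at 48 kHz is 48 samples.
codes = digitize(1_000, 48_000, 16, 48)
print(codes[:4], max(codes), min(codes))
```

Everything between two neighboring codes is what the converter has to "guess" during playback.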

And the difference depends mainly on …

3. Sampling rate
Take a look at this image:

sampling rate circuit

As can be seen …

By capturing more samples per second, higher sampling rates:

Collect more real information,
Use less guesswork,
Create a cleaner representation of the analog signal
And in the end, you get better sound quality.

Now let’s talk about specific numbers:

Standard sample rates in professional audio:

44.1 kHz (CD)
48 kHz
88.2 kHz
96 kHz
192 kHz
44.1 kHz is the minimum sample rate due to a mathematical principle known as …

Kotelnikov’s theorem (Nyquist-Shannon)
To accurately record digital audio, converters must capture the full spectrum of human hearing between 20 Hz and 20 kHz.

According to Kotelnikov’s theorem …

Capturing a specific frequency requires at least 2 samples per cycle … to measure both the high and low points of a wave.

This means that a sample rate of 40 kHz or more is required to record frequencies up to 20 kHz. Therefore, the sampling frequency of CDs is slightly higher, 44.1 kHz.

Kotelnikov’s theorem

Cons of a high sample rate
Although a higher sample rate means higher sound quality … it does not come for free.

The cons are:

Requires a lot of computing power
Fewer tracks
Large audio files
So this is a constant search for a compromise. Professional studios find it easier to deal with high sample rates because they have the best equipment.

However, for most home studios, the standard 48 kHz sample rate is appropriate.

How does encoding work in digital audio? Part 5

DSD offers significant advantages over PCM:

a more accurate representation of the waveform;
increased immunity to noise;
a simpler way to modify and transmit a digital stream;
in theory, it is possible to reduce cost by simplifying DAC circuits, but because of backward compatibility, manufacturers are unlikely to do so.
Originally, SACDs used the DSD x64 format with a sample rate of 2822.4 kHz. The 44.1 kHz audio CD sample rate was taken as the basis and multiplied by 64, hence the name x64. The following DSD rates are currently in use:

x64 = 2822.4 kHz;
x128 = 5644.8 kHz;
x256 = 11,289.6 kHz;
x512 = 22,579.2 kHz;
DSD x1024 has been announced.
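All of these rates are the CD sample rate times a power of two, which a short sketch confirms. The function name `dsd_rate_khz` is illustrative.

```python
CD_RATE_KHZ = 44.1

def dsd_rate_khz(multiplier):
    """DSD sample rate in kHz for a given multiple of the CD rate."""
    return round(CD_RATE_KHZ * multiplier, 1)

for m in (64, 128, 256, 512, 1024):
    print(f"x{m} = {dsd_rate_khz(m)} kHz")
```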

DXD
There is also an intermediate format between PCM and DSD called DXD (Digital eXtreme Definition). It is, in fact, high-resolution PCM: 352.8 kHz or 384 kHz with 24- or 32-bit quantization. It is used in studios for processing and subsequent mixing of material.

But this approach is flawed: first, it does not let you use all the advantages of DSD, and second, the file size is larger than in DSD. At the moment, flagship DACs accept a PCM data stream on the I2S input with a sample rate of up to 768 kHz and a bit depth of up to 32 bits. It’s scary to even think how much hard drive space an album would take up at this resolution.

DSD has practically separated from SACD. Today the DSD format is often found packaged in files with the DSF and DFF extensions. Many devices have been released that can record to DSF and DFF, and lovers of good sound are increasingly digitizing vinyl records in the DSD format. But in recording studios nobody wants to invest in unpopular formats, so they keep churning out audio at the bare minimum: 44.1 kHz × 16 bits.

DSD switching and data transmission
To transmit a DSD stream, a three-pin connection scheme is used:

DSD clock pin (DCLK): sync;
DSD Lch data input pin (DSDL): left channel data;
DSD Rch data input pin (DSDR): right channel data.

Unlike I2S, data transmission in DSD is extremely simple. DCLK sets the bit clock, and the left and right channel data are transmitted serially through the DSDL and DSDR pins, respectively. There is no framing here; recording and playback in DSD proceed bit by bit. This approach provides the closest approximation to the analog signal, and thanks to the high frequency the quantization noise is reduced and the reproduction precision is increased by an order of magnitude.

DoP
DoP is often used to carry DSD data streams, so it’s worth mentioning. DoP is an open standard for transferring DSD data inside PCM frames (DSD over PCM). The standard was created to pass a stream through controllers and devices that do not support direct DSD streaming (non-native DSD).

The principle of operation is as follows: in a 24-bit PCM frame, the upper 8 bits carry a marker indicating that DSD data is being transmitted. The remaining 16 bits are filled sequentially with DSD data bits.

For DSD x64 transmission with a one-bit sample rate of 2822.4 kHz, a PCM sample rate of 176.4 kHz is required (176.4 × 16 = 2822.4 kHz). For DSD x128 transmission at 5644.8 kHz, a PCM sampling rate of 352.8 kHz is required.
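The packing scheme can be sketched as follows. The alternating marker bytes 0x05 and 0xFA come from the published DoP specification rather than from this article, so treat them as an assumption to verify; the function names are hypothetical and only one channel is handled.

```python
# Alternating marker bytes (assumed from the DoP specification).
DOP_MARKERS = (0x05, 0xFA)

def dop_pack(dsd_bits):
    """Pack a list of DSD bits (one channel) into 24-bit DoP words."""
    if len(dsd_bits) % 16:
        raise ValueError("need a multiple of 16 DSD bits")
    words = []
    for i in range(0, len(dsd_bits), 16):
        payload = 0
        for bit in dsd_bits[i:i + 16]:
            payload = (payload << 1) | bit   # oldest bit ends up most significant
        marker = DOP_MARKERS[(i // 16) % 2]
        words.append((marker << 16) | payload)
    return words

def dop_pcm_rate_khz(dsd_rate_khz):
    """PCM frame rate needed to carry a DSD stream, 16 bits per frame."""
    return dsd_rate_khz / 16

print([hex(w) for w in dop_pack([1] * 16 + [0] * 16)])
print(dop_pcm_rate_khz(2822.4), dop_pcm_rate_khz(5644.8))
```

A receiver that sees the expected marker pattern treats the lower 16 bits as DSD data instead of PCM samples.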

How does encoding work in digital audio? Part 4

When playing PCM 44.1×16, the unused bits of the 24-bit word are simply ignored, as they are filled with zeros, or, in the case of older multi-bit DACs, they can spill into the next frame. The length of the “word” (WS) may also depend on the player through which the music is played, as well as on the driver for the playback device.

An alternative to PCM and I2S is to record the audio signal in DSD. This format was developed in parallel with PCM, although Kotelnikov’s theorem had some influence here as well. To improve sound quality compared to CDDA, the emphasis was placed not on increasing the bit depth, as in the DVD-Audio format, but on increasing the sample rate.

DSD
DSD stands for Direct Stream Digital. It originated in the Sony and Philips labs, just like the other formats discussed in this article.

SACD
DSD first saw the light of day on Super Audio CDs in 2002.

At the time, SACD looked like a masterpiece of engineering, applying a completely new way of recording and playback, very close to analog devices. The implementation was simple and elegant.

The media was even equipped with copy protection, although pirates posed little threat even without it. Sony and Philips began producing “closed” devices under their own brands that could only play discs, with no possibility of copying them. The manufacturers sold recording equipment to studios, but kept control over SACD production.

Who knows, perhaps the SACD format could have gained popularity comparable to Audio CD, had it not been for the cost of the playback devices. By unreasonably inflating player prices, the management of Sony and Philips themselves stymied the popularity of their format. The next mistake put an end to sales of specialized devices. To promote the PlayStation game console, Sony engineers added the ability to play SACDs on it. Hackers immediately cracked the console and began copying SACD discs into ISO images that could be burned to a regular DVD and played on any competing player; others simply ripped the tracks to play on a computer.

The record labels share the blame too: contrary to what music lovers expected, they did not take full advantage of the new high-definition format. The studios did not record music from the master tape in DSD; instead they took a digital recording in PCM, remixed it and ran it through everything in a row: limiters, compressors, noise-shaping dithering, and various digital filters. The result was a sound so sterile and dry that even CD Audio could have sounded much better. Thus, listeners’ trust in SACD was undermined, and with it their trust in new formats in general.

INFO
Unfortunately, with vinyl records this vicious practice continues to this day: studios press vinyl from a digital recording even when they have the recording on master tape. So the source behind a modern vinyl record can easily be 44.1 kHz × 16 bits.

DSD
What is DSD? It is a one-bit stream with a very high sample rate compared to PCM. DSD also uses a different type of modulation, PDM (pulse density modulation). Sound is recorded in this format by a one-bit analog-to-digital converter; such ADCs based on sigma-delta modulation are now used everywhere. The recording process looks like this: while the amplitude of the wave rises, the ADC outputs a logical one; when the amplitude falls, it outputs a logical zero; there is no intermediate value. Each value is compared with the previous value of the wave amplitude.
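A one-bit converter of this kind can be modeled in software. The sketch below is a textbook first-order sigma-delta loop, not the circuit of any particular ADC: it assumes input samples in the range -1.0 to 1.0, and the function name is illustrative. In this model the density of ones in the output follows the input level, which is what pulse density modulation means.

```python
def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator; input samples in [-1.0, 1.0]."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        # Accumulate the error between the input and the previous output.
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

# A constant input of 0.5 should give ones about 75% of the time,
# because 0.75 * (+1) + 0.25 * (-1) = 0.5.
bits = sigma_delta_1bit([0.5] * 1000)
print(bits[:8], sum(bits) / len(bits))
```

Averaging (low-pass filtering) the bit stream recovers the input level, which is why a DSD DAC can in principle be very simple.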