Audio recording

Thomas Alva Edison's phonograph

The era of mechanical sound recording began in 1877, when Thomas Alva Edison invented the phonograph.

Gramophone

In fact, gramophones, portable gramophones and even modern vinyl turntables are improved phonographs: the principle of recording sound as a groove running in a spiral across the medium has remained unchanged.

In 1900, at the World’s Fair in Paris, the Danish engineer Valdemar Poulsen demonstrated a working model of a magnetic recording apparatus, created as an alternative to Edison’s invention. For the first time in history, a human voice was heard from a magnetic recording: astonished Parisians heard the voice of the Austro-Hungarian Emperor Franz Joseph breaking through the hiss. This moment is arguably where the true history of sound recording began, although its theory was developed only in the 1930s.

Sound is a complex analog signal, and in practice it is very different from a simple sinusoid. To analyze such signals, a technique widely used in radio electronics is applied: the Fourier transform, which converts a complex signal into a harmonic series, that is, a sum of sinusoids with different frequencies and amplitudes.

Musicians call the first (lowest-frequency) harmonic in this spectrum the fundamental tone, and the higher-frequency harmonics overtones. The fundamental determines the pitch, while the overtones give the sound its particular coloring, creating the timbre of a voice or musical instrument.
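A minimal Python sketch can make the idea concrete. It builds two test tones with the same 200 Hz fundamental but different overtone amplitudes, then recovers those amplitudes with a hand-written discrete Fourier transform (DFT), the sampled counterpart of the Fourier transform. The sample rate, the frequencies, the amplitudes and the helper names `tone` and `amplitude_at` are hypothetical values chosen for illustration; real spectrum analyzers use fast Fourier transform (FFT) algorithms rather than this direct sum.

```python
import cmath
import math

FS = 8000  # sample rate in Hz (illustrative value)
N = FS     # one second of samples, so DFT bins are exactly 1 Hz apart


def tone(fundamental_hz, overtone_amps):
    """Fundamental with amplitude 1.0 plus overtones at 2x, 3x, ... its frequency."""
    amps = [1.0] + list(overtone_amps)
    return [
        sum(a * math.sin(2 * math.pi * fundamental_hz * (h + 1) * n / FS)
            for h, a in enumerate(amps))
        for n in range(N)
    ]


def amplitude_at(signal, freq_hz):
    """Amplitude of the sinusoid at freq_hz, computed from a single DFT bin."""
    k = freq_hz  # bin index equals frequency because bins are 1 Hz apart
    x_k = sum(x * cmath.exp(-2j * math.pi * k * n / N) for n, x in enumerate(signal))
    return 2 * abs(x_k) / N


# Same fundamental (same pitch), different overtone mix (different timbre).
bright = tone(200, [0.8, 0.6])
mellow = tone(200, [0.2, 0.05])
for name, sig in (("bright", bright), ("mellow", mellow)):
    print(name, [round(amplitude_at(sig, f), 3) for f in (200, 400, 600)])
# bright [1.0, 0.8, 0.6]
# mellow [1.0, 0.2, 0.05]
```

Both tones show their strongest component at 200 Hz, so they have the same pitch; only the overtones at 400 and 600 Hz differ, which is exactly what the ear perceives as a difference in timbre.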

To study the spectra of audio signals, complex and expensive instruments called spectrum analyzers are used.

Such devices show that some musical instruments, such as the violin, have a relatively uniform spectrum, while some wind instruments have spectra with pronounced peaks and dips, called formants.

There are no terms that directly describe the coloration of a human voice or a musical instrument, so people resort to metaphors such as a “deep” or “hard” timbre, a “metallic” sound, or even a “transistor” sound.

Attempts to apply digital information processing to sound recording were made many times, but the first serious results came only in the early 1980s, coinciding with the rapid development of computers and the successful microminiaturization of radio components. Digital sound processing opened up exciting new possibilities.

To process sound on a computer, it must first be converted into digital, encoded form. This is done by devices called analog-to-digital converters (ADCs). The main method of encoding an analog signal is pulse code modulation (PCM), which consists of three operations: sampling, quantizing, and encoding.
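The three PCM operations can be sketched in a few lines of Python. A real ADC performs them in hardware; here the analog input is simulated by an ordinary function. The helper names (`sample`, `quantize`, `encode16`, `analog`) are hypothetical, and two conventions are assumptions made for the sketch: values in the range -1.0 to 1.0 are scaled by 2**(bits-1) - 1, and the 16-bit result is packed as little-endian signed integers, the byte order used in WAV files.

```python
import math
import struct


def sample(signal, rate_hz, duration_s):
    """Sampling: read the continuous signal every 1/rate_hz seconds."""
    count = round(rate_hz * duration_s)
    return [signal(n / rate_hz) for n in range(count)]


def quantize(samples, bits):
    """Quantizing: map each value in [-1.0, 1.0] to the nearest integer level."""
    top = 2 ** (bits - 1) - 1      # e.g. 32767 for 16 bits
    bottom = -(2 ** (bits - 1))    # e.g. -32768 for 16 bits
    return [max(bottom, min(top, round(s * top))) for s in samples]


def encode16(levels):
    """Encoding: pack 16-bit levels as little-endian signed integers."""
    return struct.pack(f"<{len(levels)}h", *levels)


def analog(t):
    """Stand-in for the analog input: a 1 kHz tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


levels = quantize(sample(analog, 8000, 0.01), 16)
pcm = encode16(levels)
print(len(levels), len(pcm))  # 80 samples, 160 bytes
```

Ten milliseconds at 8000 samples per second gives 80 samples, and at 2 bytes per 16-bit sample the encoded result occupies 160 bytes.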

We won’t go into coding theory here, especially since it is quite complicated and requires higher mathematics. What matters is that the quality of the digitized sound and the size of the resulting file depend on the sample rate and the bit depth.

The sample rate is the frequency at which the audio signal is measured. Kotelnikov’s sampling theorem (known in the West as the Nyquist–Shannon theorem) states that to obtain an undistorted digital signal, the sample rate must be at least twice the highest frequency of the encoded signal. Since human hearing extends up to about 20 kHz, the sample rate for audio must therefore be at least 40 kHz. In digital communication systems the sample rate is 32 kHz; in CD players and consumer digital tape recorders it is 44.1 kHz. Digital studio equipment uses an even higher rate: 48 kHz.
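What goes wrong when the sampling theorem is violated can be shown with a short Python sketch. A tone above half the sample rate is not simply lost: after sampling, it is indistinguishable from a lower "alias" frequency. The function name `alias_frequency` and the test frequencies are illustrative choices; the folding rule itself follows directly from the periodicity of sampled sinusoids.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Frequency at which a pure tone of f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz            # sampling cannot tell f from f + k * fs
    return min(f, fs_hz - f)    # fold the result into the range 0 .. fs/2


FS = 44_100
print(alias_frequency(15_000, FS))  # below fs/2: unchanged, 15000
print(alias_frequency(30_000, FS))  # above fs/2: appears at 14100

# The samples of a 30 kHz cosine coincide with those of a 14.1 kHz cosine.
high = [math.cos(2 * math.pi * 30_000 * n / FS) for n in range(100)]
low = [math.cos(2 * math.pi * 14_100 * n / FS) for n in range(100)]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))  # True
```

This is why ADCs place a low-pass filter before the sampler: any content above half the sample rate would otherwise reappear as a false, audible tone.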

The bit depth of recorded sound is the number of bits allocated to store each amplitude value of the sound signal at the moment it is measured. Sound cards commonly use 8 or 16 bits per sample, and higher-quality 32-bit cards are also available. The higher the bit depth, the more precisely each amplitude is represented and the higher the quality of the digitized sound.
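The effect of bit depth can be quantified. Each extra bit doubles the number of amplitude levels, and a widely used rule of thumb says the signal-to-noise ratio of a full-scale sine wave after quantization is about 6.02 N + 1.76 dB for N bits (assuming the rounding error is spread uniformly). The Python sketch below measures this directly; the 997 Hz test tone, the 48 kHz rate and the function name `quantization_snr_db` are illustrative assumptions.

```python
import math


def quantization_snr_db(bits, fs=48_000, freq=997, seconds=1.0):
    """Measure the SNR of a near-full-scale sine quantized to `bits` bits."""
    scale = 2 ** (bits - 1)
    amp = (scale - 1) / scale           # largest amplitude that does not clip
    signal_power = noise_power = 0.0
    for n in range(round(fs * seconds)):
        x = amp * math.sin(2 * math.pi * freq * n / fs)
        xq = round(x * scale) / scale   # snap to the nearest level, then scale back
        signal_power += x * x
        noise_power += (xq - x) ** 2    # the rounding error is the quantization noise
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 16):
    snr = quantization_snr_db(bits)
    theory = 6.02 * bits + 1.76
    print(f"{bits} bits: {2 ** bits} levels, measured {snr:.1f} dB, rule of thumb {theory:.1f} dB")
```

Going from 8 to 16 bits raises the number of levels from 256 to 65,536 and lowers the quantization noise by roughly 48 dB.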

As already mentioned, the size of an audio file depends on the sample rate and bit depth. For example, one minute of mono sound sampled at 44 kHz with a bit depth of 16 bits takes about 5.3 MB, while at 11 kHz and 8 bits it takes 660 KB.
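The arithmetic behind these figures is simple: uncompressed PCM size equals sample rate times bytes per sample times number of channels times duration. The figures above correspond to a single (mono) channel, as the Python sketch below confirms. The function name `pcm_size_bytes` is hypothetical, and dividing the bit depth by 8 assumes it is a whole number of bytes.

```python
def pcm_size_bytes(sample_rate_hz, bits, channels, seconds):
    """Size of uncompressed PCM audio (bit depth assumed to be a multiple of 8)."""
    return sample_rate_hz * (bits // 8) * channels * seconds


print(pcm_size_bytes(44_000, 16, 1, 60))  # 5280000 bytes, about 5.3 MB
print(pcm_size_bytes(11_000, 8, 1, 60))   # 660000 bytes, about 660 KB
```

A stereo recording doubles each figure, since every channel stores its own samples.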

Such consumption of disk space proved unacceptable, so special algorithms and formats were created to store audio files more economically.

What is digital audio?

Digital sound is nothing more than a sequence of numbers. A particular algorithm converts sound, that is, variations in air pressure, into data streams that are encoded for further processing and playback. The algorithm used determines the file’s format and its extension.

Analog Vs. Digital Sound

Alongside digital sound there is analog sound, represented by a continuous electrical signal that follows the changes in the sound wave. Analog-to-digital conversion records the numerical value of the amplitude at given moments in time, at a given density of samples. Consequently, the more values are recorded, the more faithfully the digitized sound fragment is reproduced. Such digitization produces very large arrays of data which, depending on the format used, differ in the ratio of sound quality to file size.

Perhaps the main advantage of digital audio over analog is that data can be stored and copied indefinitely without losing the original quality, whereas each copy from one analog medium to another noticeably degrades the recording.

The most widespread and popular digital audio format today is MP3 (MPEG-1 Audio Layer III). It was developed at the Fraunhofer Institute in Germany, in work that began in 1987 and went through a series of intermediate formats and studies.

The developers of the format set out to make it simpler and cheaper to transmit long pieces of music. One minute of stereo CD audio (16 bits, 44.1 kHz sample rate) takes about ten megabytes. Unlike text, audio cannot be compressed much by conventional lossless methods, so large reductions in size require some loss of quality. As a result, sending an uncompressed 3-minute track from an audio CD over a modem at, say, 24 kbps would take about three hours. Researchers at the Fraunhofer Institute achieved roughly tenfold compression: on average, one minute of audio in MP3 format takes about 1 megabyte. The compression works by discarding “unnecessary” sounds that the human ear cannot perceive or that duplicate each other.
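The figures in this paragraph can be checked with a few lines of arithmetic. The Python sketch below derives the CD data rate from its sample rate, bit depth and channel count, the time to send a 3-minute track over a 24 kbps link, and the size of one minute of MP3. The 128 kbps MP3 bit rate is an assumption, chosen because it is a common setting; the variable names are illustrative.

```python
CD_RATE_HZ = 44_100
CD_BITS = 16
CD_CHANNELS = 2

cd_bitrate = CD_RATE_HZ * CD_BITS * CD_CHANNELS   # bits per second of CD audio
minute_bytes = cd_bitrate * 60 // 8               # one minute of uncompressed audio
transfer_s = cd_bitrate * 3 * 60 / 24_000         # 3-minute track over a 24 kbps link
mp3_minute_bytes = 128_000 * 60 // 8              # one minute at an assumed 128 kbps

print(cd_bitrate)                   # 1411200
print(minute_bytes)                 # 10584000, about 10.6 MB
print(round(transfer_s / 3600, 2))  # 2.94 hours
print(mp3_minute_bytes)             # 960000, about 1 MB
```

The ratio between the two per-minute sizes, about 11 to 1, is the compression MP3 achieves at this bit rate.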

The main factor that determines the balance between file size and sound quality within a given format is the bit rate: the amount of data used to encode one second of sound. The higher it is, the less distortion and the closer the encoded track is to the original. The most common bit rates on the Internet are 128 and 192 kbps. The maximum bit rate supported by MP3 programs and devices is 320 kbps. In practice, only an expert or an audio professional can tell a 320 kbps MP3 from the original recording.
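For a constant bit rate, file size follows directly from the bit rate and duration. This Python sketch tabulates the size of one minute of audio at the bit rates mentioned above and compares each with uncompressed CD audio (1411.2 kbps). The function name `mp3_size_bytes` is hypothetical, and the estimate ignores frame headers and ID3 tags, so real files are slightly larger.

```python
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps for uncompressed CD audio


def mp3_size_bytes(bitrate_kbps, seconds):
    """Approximate size of a constant-bit-rate stream (headers and tags ignored)."""
    return bitrate_kbps * 1000 * seconds // 8


for kbps in (128, 192, 320):
    ratio = CD_BITRATE_KBPS / kbps
    print(f"{kbps} kbps: {mp3_size_bytes(kbps, 60):,} bytes per minute, "
          f"about {ratio:.1f} times smaller than CD audio")
```

Even at the maximum of 320 kbps, an MP3 is still more than four times smaller than the CD original.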

To optimize the size of MP3 files while maintaining decent quality, variable bit rate (VBR) encoding is used. The encoder divides the file into fragments of different spectral complexity and encodes each at a suitable bit rate. Most modern MP3 players support VBR playback. A significant advantage of MP3 files is that they can store the artist’s name, the track title, the album, the year of release, and so on. This set of data is called ID3 tags, and most modern players can read and display them on screen.
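To show what an ID3 tag looks like inside a file, here is a minimal Python sketch of the oldest and simplest variant, ID3v1: a fixed 128-byte block at the very end of the file that begins with the bytes "TAG", followed by fixed-width fields for title, artist, album, year, comment and a one-byte genre number. In the ID3v1.1 revision, a zero byte near the end of the comment field signals that the next byte is a track number. Most current files also or instead carry the more flexible ID3v2 tag at the start of the file, which this sketch does not handle. The function names `parse_id3v1` and `build_id3v1` are hypothetical, and decoding the text as Latin-1 is an assumption.

```python
from typing import Optional

ID3V1_SIZE = 128  # an ID3v1 tag occupies the last 128 bytes of the file


def _text(field: bytes) -> str:
    """Decode a fixed-width text field padded with NUL bytes or spaces."""
    return field.split(b"\x00", 1)[0].decode("latin-1").rstrip(" ")


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Return the ID3v1 fields from the end of an MP3 file, or None if absent."""
    tag = data[-ID3V1_SIZE:]
    if len(tag) < ID3V1_SIZE or tag[:3] != b"TAG":
        return None
    info = {
        "title": _text(tag[3:33]),
        "artist": _text(tag[33:63]),
        "album": _text(tag[63:93]),
        "year": _text(tag[93:97]),
        "comment": _text(tag[97:127]),
        "genre": tag[127],  # index into the standard genre list
    }
    # ID3v1.1: a zero byte at offset 125 followed by a non-zero track number.
    if tag[125] == 0 and tag[126] != 0:
        info["comment"] = _text(tag[97:125])
        info["track"] = tag[126]
    return info


def build_id3v1(title, artist, album, year, comment, genre, track=None) -> bytes:
    """Build a 128-byte ID3v1(.1) tag, e.g. for testing the parser."""
    def pad(s: str, n: int) -> bytes:
        return s.encode("latin-1")[:n].ljust(n, b"\x00")

    if track is None:
        comment_field = pad(comment, 30)
    else:
        comment_field = pad(comment, 28) + b"\x00" + bytes([track])
    return (b"TAG" + pad(title, 30) + pad(artist, 30) + pad(album, 30)
            + pad(year, 4) + comment_field + bytes([genre]))


# A fake "MP3": a few bytes of audio data followed by a tag.
fake_mp3 = b"\xff\xfb" + bytes(1000) + build_id3v1("Song", "Artist", "Album", "2001", "hi", 17, track=3)
print(parse_id3v1(fake_mp3))
# {'title': 'Song', 'artist': 'Artist', 'album': 'Album', 'year': '2001', 'comment': 'hi', 'genre': 17, 'track': 3}
```

The fixed layout is what makes ID3v1 so easy to read, and also what limits it: a 30-character title field cannot hold a long name.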

In 2001, the Swedish company Coding Technologies and Thomson Multimedia developed the MP3 Pro codec. It is based on MP3 and is therefore backward compatible: ordinary MP3 players can play MP3 Pro files, but only partially, since they ignore the extra SBR data and reproduce the sound at reduced quality. Thanks to SBR (Spectral Band Replication) technology, the codec provides good quality at low bit rates. However, at medium and high bit rates its quality is inferior to that of most other codecs, so the format is mainly used for Internet broadcasting and for previews of new music.

Another MP3 variant is MP3 Surround, introduced more recently by the creators of MP3 at the Fraunhofer Institute. The format reproduces multichannel sound while remaining compatible with standard stereo MP3: information describing the spatial characteristics of the sound is stored in an additional data stream. When such files are played on equipment that can read this data, they deliver surround sound conforming to the 5.1 standard.