
Audio Digitization.
Sound is a continuous wave that propagates through air or other media. It is formed by
pressure differences, so it can be detected by measuring the pressure level at a given
point. Sound waves have the measurable characteristics of waves in general, such as
reflection, refraction and diffraction. Because sound is a continuous wave, a
digitization process is needed to represent it as a series of numbers. Currently, most
operations carried out on sound signals are digital, since storing, processing and
transmitting the signal in digital form offer very significant advantages over
analog methods. Digital technology is more advanced and offers greater possibilities: lower
sensitivity to transmission noise and the ability to include error protection codes
as well as encryption. Moreover, with appropriate decoding mechanisms, signals of
different types transmitted on the same channel can be handled simultaneously. The main
disadvantage of the digital signal is that it requires a much greater bandwidth than the
analog signal, hence the extensive research on data compression, some of whose
techniques will be the focus of our study.
The digitization process consists of two phases: sampling and quantization. Sampling
divides the time axis into discrete segments: the sampling frequency is the inverse of the
time that elapses between one measurement and the next. At each of these instants
quantization is performed, which, in its simplest form, consists of measuring the
amplitude of the signal and storing it.
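As an illustration of the sampling phase, the following minimal sketch measures a signal at
instants separated by the inverse of the sampling frequency. The function names, the 1 kHz
test tone and the 8 kHz sampling frequency are assumptions chosen for the example, not values
taken from any particular system.

```python
import math

def sample(signal, fs, duration):
    """Measure signal(t) at instants spaced 1/fs seconds apart."""
    period = 1.0 / fs                      # time between consecutive measurements
    n_samples = int(round(duration * fs))  # number of measurements in the interval
    return [signal(n * period) for n in range(n_samples)]

def tone(t):
    """Hypothetical test signal: a 1 kHz sine wave of unit amplitude."""
    return math.sin(2 * math.pi * 1000.0 * t)

# One millisecond of the tone sampled at 8 kHz yields 8 measurements.
samples = sample(tone, fs=8000, duration=0.001)
print(len(samples))
```

Each stored value is still a real number here; assigning it to one of a finite set of levels
is the task of quantization.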
Nyquist’s theorem guarantees that the frequency needed to sample a signal whose highest
components lie at a given frequency f is at least 2f. Therefore, since the upper limit of
human hearing is around 20 kHz, the frequency that guarantees adequate sampling of
any audible sound is about 40 kHz. Specifically, high-quality sound uses
frequencies of 44.1 kHz, in the case of CD, for example, and up to 48 kHz
in the case of DAT. Other typical values are submultiples of the first, approximately 22 and
11 kHz. Depending on the nature of the application, the appropriate frequencies can of
course be much lower: speech processing is usually carried out at a frequency of between
6 and 20 kHz, or even less.
Regarding quantization, the more bits are used to divide the amplitude axis, the “finer”
the partition and therefore the smaller the error in assigning a specific amplitude to the
sound at each instant. For example, 8 bits offer 256 quantization levels and 16 bits
offer 65,536. The dynamic range of human hearing is about 100 dB. The amplitude
axis can be divided into equal intervals or according to a certain density function,
seeking more resolution in certain regions if the signal in question has more components
in a particular intensity range, as we will see in the coding techniques.
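The relation between the number of bits and the quantization error can be checked with a
minimal sketch of division into equal intervals. The normalized amplitude range [-1, 1], the
choice of the interval midpoint as the reconstructed value and the function names are
assumptions made for the example. The last printed column uses the well-known approximation
of about 6 dB of dynamic range per bit, 20 log10(2^bits).

```python
import math

def quantize(x, bits):
    """Uniformly quantize x in [-1.0, 1.0] to one of 2**bits levels.

    Returns (level index, reconstructed amplitude)."""
    levels = 2 ** bits
    step = 2.0 / levels                             # width of each interval
    index = min(int((x + 1.0) / step), levels - 1)  # keep x = 1.0 in the top level
    return index, -1.0 + (index + 0.5) * step       # midpoint of the interval

def max_error(bits, n_points=10001):
    """Largest reconstruction error over a grid of amplitudes in [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    return max(abs(x - quantize(x, bits)[1]) for x in xs)

for bits in (8, 16):
    # bits, number of levels, worst-case error, approximate dynamic range in dB
    print(bits, 2 ** bits, max_error(bits), 20 * math.log10(2 ** bits))
```

Under these assumptions the worst-case error is half an interval, and 16 bits give roughly
96 dB, which is close to the 100 dB dynamic range of hearing quoted above.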
The complete process is usually called PCM (Pulse Code Modulation), and this is how we
will refer to it hereinafter. It has been described here in a very simplified way, mainly
because it is widely discussed and well known within the field of study of
this work. However, we will go into detail whenever it is necessary for the
development of the exposition.
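To give an idea of the volume of data that PCM produces, the following sketch chains both
phases and computes the resulting bit rate. It assumes signed 16-bit codes, little-endian
packing, a 440 Hz test tone and two channels for the CD case; the function names are
hypothetical.

```python
import math
import struct

def pcm_encode(signal, fs, duration, bits=16):
    """Sample signal(t) at fs Hz and quantize each sample to a signed integer code."""
    full_scale = 2 ** (bits - 1) - 1               # 32767 for 16 bits
    n_samples = int(round(duration * fs))
    codes = []
    for n in range(n_samples):
        x = max(-1.0, min(1.0, signal(n / fs)))    # clip to the quantizer range
        codes.append(int(round(x * full_scale)))
    return codes

def pcm_bit_rate(fs, bits, channels):
    """Bits per second produced by uncompressed PCM."""
    return fs * bits * channels

# 10 ms of a 440 Hz tone at the CD sampling frequency.
codes = pcm_encode(lambda t: math.sin(2 * math.pi * 440.0 * t), fs=44100, duration=0.01)
payload = struct.pack("<%dh" % len(codes), *codes)  # little-endian 16-bit words
print(len(codes), len(payload))
print(pcm_bit_rate(44100, 16, 2))                   # stereo, 16 bits, 44.1 kHz
```

With these figures the stream amounts to 1,411,200 bits per second, which illustrates why
compression is of interest.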
1.2 Coding and Compression.
Before describing compression and coding systems, we must pause briefly to
analyze human auditory perception, in order to understand why a significant amount
of the information that PCM provides can be discarded. The heart of the matter,
as far as we are concerned, lies in a phenomenon known as masking.
The human ear perceives a frequency range between 20 Hz and 20 kHz. First of all,
sensitivity is highest in the region around 2-4 kHz, so that a sound becomes
harder to hear the closer it lies to the ends of the range. Second, there is
masking, whose properties the most interesting algorithms exploit thoroughly:
when the component of a signal at a certain frequency has high energy, the ear cannot
perceive lower-energy components at nearby frequencies, both lower and higher. At
a certain distance from the masking frequency, the effect is reduced so much that it is
negligible; the range of frequencies in which the phenomenon occurs is called the critical
band. Components belonging to the same critical band influence each other, and
they neither affect nor are affected by those that appear outside it.
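The idea that components only interact within their own critical band can be sketched with
a deliberately simplified toy model. The band boundaries, the 20 dB margin and the
all-or-nothing masking rule are hypothetical assumptions made for the example; real masking
is gradual and depends on frequency and level.

```python
def audible_components(components, band_edges, margin_db):
    """Return the components that are not masked under a toy rule.

    components: list of (frequency_hz, level_db) pairs.
    band_edges: ascending band boundaries in Hz (hypothetical values).
    A component is treated as masked when another component in the same
    band exceeds its level by more than margin_db (a simplifying assumption).
    """
    def band_of(freq):
        for i in range(len(band_edges) - 1):
            if band_edges[i] <= freq < band_edges[i + 1]:
                return i
        return None  # outside every band: never masked in this model

    # Strongest level found in each band.
    strongest = {}
    for freq, level in components:
        b = band_of(freq)
        if b is not None:
            strongest[b] = max(strongest.get(b, level), level)

    # Keep components close enough in level to the strongest one in their band.
    kept = []
    for freq, level in components:
        b = band_of(freq)
        if b is None or strongest[b] - level <= margin_db:
            kept.append((freq, level))
    return kept

edges = [0, 1000, 2000, 4000]                      # hypothetical band boundaries
comps = [(1100, 80), (1200, 40), (2500, 45)]       # (Hz, dB) test components
print(audible_components(comps, edges, margin_db=20))
```

In this example the 40 dB component at 1200 Hz shares a band with the 80 dB component at
1100 Hz and is discarded, while the 45 dB component at 2500 Hz lies in another band and is
unaffected.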








