
DETECTION AND ANALYSIS OF COMPRESSION TRACES IN AUDIO SIGNALS ENCODED WITH MP3, AAC, WMA AND VORBIS CODECS

DETECTION AND ANALYSIS OF MP3, WMA, OGG AND VORBIS CODECS IN AUDIO SIGNALS
The article describes a method for detecting traces of MP3, WMA, OGG and Vorbis codecs in an audio signal. The method reveals traces of digital audio editing, sample rate changes and multiple encoding.
Keywords: digital audio and video forensics, codec trace detection, psychoacoustic codecs, MP3, AAC, WMA, Vorbis.

Introduction
Today, digital audio or video recordings whose audio signal has been compressed quite often become the subject of forensic examination of video and sound recordings.
The purpose of compression, as a rule, is to reduce voice traffic on communication channels, or to reduce the amount of data stored. Devices and programs that implement algorithms to compress audio and video signals are called codecs.
So-called psychoacoustic codecs have gained wide acceptance in the digital recording and storage of audio signals. They compress the signal by removing spectral components that are inaudible to humans because of frequency and temporal masking. Using such codecs significantly reduces the amount of memory required to store the signal while keeping the sound quality at a level acceptable for everyday use, which is why psychoacoustic codecs are widely used in the media industry.
The most famous and widespread representative of the psychoacoustic codec family is MPEG-1/2/2.5 Layer 3, better known as the MP3 codec. Developed more than 20 years ago, the MP3 codec is now implemented, in software or in hardware, in almost any device that records or plays back audio or video recordings.
In the last decade, psychoacoustic codecs that use more advanced psychoacoustic models have become increasingly common: Advanced Audio Coding (AAC), Windows Media Audio (WMA) and Ogg Vorbis (OGG).

Theoretical foundations
When analyzing the dynamic spectrogram of a signal that has been encoded with a psychoacoustic codec, it is often easy to notice rectangular dropouts (Fig. 1), which are one of the signs that such a codec has been used.
Figure 1. Dynamic spectrogram with MP3 codec coding traces.
These dropouts are the result of encoding the signal with a psychoacoustic codec, the operation of which is described below using the MP3 codec as an example.
In the first stage of MP3 encoding, the spectrum of the signal is calculated using the Modified Discrete Cosine Transform (MDCT). Then, based on a psychoacoustic model of frequency and temporal masking, the inaudible components of the MDCT spectrum are set to zero. The spectrum is then quantized and encoded using the Huffman method. To simplify the description, the coding step that precedes the calculation of the MDCT spectra, namely band-pass filtering and downsampling of the signal “bands”, is omitted, as it is irrelevant in the context under consideration. Because of this simplification, the sizes of the analysis windows in this work are given for the original signal, and not for the signal “bands” as in the specifications.
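The three steps just described (MDCT, zeroing of components, quantization) can be illustrated with a minimal Python sketch. It is not the MP3 encoder: the function names are hypothetical, no analysis window is applied, a fixed amplitude threshold stands in for the psychoacoustic masking model, uniform rounding stands in for the MP3 quantizer, and Huffman coding is left out.

```python
import math


def mdct(frame):
    """Direct O(N^2) MDCT: 2N input samples -> N spectral coefficients."""
    if len(frame) % 2:
        raise ValueError("frame length must be even")
    n = len(frame) // 2
    shift = 0.5 + n / 2  # phase offset of the MDCT basis functions
    return [
        sum(x * math.cos(math.pi / n * (i + shift) * (k + 0.5))
            for i, x in enumerate(frame))
        for k in range(n)
    ]


def zero_weak_components(coeffs, threshold):
    """Assumption: a fixed threshold replaces the real masking model."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


def quantize(coeffs, step):
    """Assumption: uniform rounding, not the quantizer of the MP3 standard."""
    return [round(c / step) for c in coeffs]


# A 32-sample frame made of a strong MDCT basis tone (bin 5)
# and a tone 100 times weaker (bin 12).
N = 16
frame = [
    math.cos(math.pi / N * (i + 0.5 + N / 2) * (5 + 0.5))
    + 0.01 * math.cos(math.pi / N * (i + 0.5 + N / 2) * (12 + 0.5))
    for i in range(2 * N)
]
spectrum = mdct(frame)
masked = zero_weak_components(spectrum, threshold=1.0)  # only bin 5 survives
levels = quantize(masked, step=0.5)
```

In this toy example the weak tone at bin 12 is removed entirely, leaving a gap in the spectrum of the kind that shows up as a dropout on a spectrogram.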
For convenience, MDCT spectra calculated in the same way as in MP3 encoding are referred to below as MP3 spectra. MP3 spectra can be calculated using four types of analysis windows: a standard window of 1152 samples (indicated in blue), a small window of 384 samples (indicated in red) and two types of transition windows (indicated in green).
The window sizes do not depend on the sample rate of the original signal. During encoding, the original signal is divided into overlapping fragments with a step of 576 samples (the step of the MP3 encoding window). Depending on the window type, the fragment size is 1152 samples for the standard window, 960 for a transition window and 768 for the small window (three small windows with a 50% overlap), but the step between the “centers” of the fragments is 576 samples in all cases.
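The fragment arithmetic above can be checked with a short Python sketch. The names are hypothetical, and the fragment “center” is assumed here to be the middle of the 1152-sample slot that starts on the 576-sample grid; the exact placement of the transition and small windows inside that slot is not modelled.

```python
HOP = 576               # step between fragment centers, in samples
STANDARD_WINDOW = 1152  # standard analysis window, in samples
SMALL_WINDOW = 384      # small analysis window, in samples
FRAGMENT_SPAN = {"standard": 1152, "transition": 960, "small": 768}


def small_window_span(count=3, size=SMALL_WINDOW, overlap=0.5):
    """Samples covered by `count` small windows with the given overlap."""
    step = round(size * (1 - overlap))
    return size + (count - 1) * step


def fragment_plan(num_samples, window_types):
    """Return (center, span) for each fragment whose slot fits the signal."""
    plan = []
    for index, kind in enumerate(window_types):
        slot_start = index * HOP
        if slot_start + STANDARD_WINDOW > num_samples:
            break
        plan.append((slot_start + HOP, FRAGMENT_SPAN[kind]))
    return plan


def hop_duration_ms(sample_rate):
    """Duration of one 576-sample step at a given sample rate."""
    return 1000.0 * HOP / sample_rate


plan = fragment_plan(2304, ["standard", "small", "transition"])
```

Because the window sizes are fixed in samples, only their duration changes with the sample rate: `hop_duration_ms(48000)` gives 12 ms, while at 44100 Hz the same step lasts about 13.06 ms.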



