Video encoding, how it works (part 2)



So far, we’ve only talked about image compression, but a full video also has an audio component. CD-quality sound is generally considered to require digitization at 44.1 kHz with 16 bits per sample, which is equivalent to about 706 Kbps per channel (1.4 Mbps for stereo). DAT quality uses a sampling rate of 48 kHz (frequency band 4-24,000 Hz), which raises the stream to 768 Kbps per channel.
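The bit rates above come from multiplying the sampling rate by the sample size and the number of channels. A minimal Python sketch of that arithmetic (the function name `pcm_bitrate` is a hypothetical helper chosen for illustration; 1 Kbps is taken as 1000 bps):

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd_channel = pcm_bitrate(44_100, 16)      # one CD-quality channel
cd_stereo = pcm_bitrate(44_100, 16, 2)    # CD-quality stereo
dat_channel = pcm_bitrate(48_000, 16)     # one DAT-quality channel

print(f"CD:  {cd_channel / 1000:.1f} Kbps per channel, {cd_stereo / 1e6:.2f} Mbps stereo")
print(f"DAT: {dat_channel / 1000:.1f} Kbps per channel")
# A sampled signal can represent frequencies only up to half the sampling rate
print(f"DAT upper frequency limit: {48_000 // 2} Hz")
```

This prints 705.6 Kbps per channel and 1.41 Mbps stereo for CD, 768.0 Kbps per channel for DAT, and a 24000 Hz upper limit, which is where the top of the DAT frequency band comes from.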

The compression approach is the same: discard the information that matters least to human hearing. The MPEG standard defines three layers of audio compression. Layer 1 uses the simplest algorithm with the least compression and assumes 192 Kbps per channel. The Layer 2 algorithm is more complex but compresses more, down to 128 Kbps per channel. Layer 3, a powerful algorithm that compresses CD-quality digital audio about 11 times with losses the human ear cannot distinguish, provides the best possible sound quality under severe transmission constraints: no more than 64 Kbps per channel. It is intended primarily for the Internet.

Its importance is so great that it received its own abbreviation, MP3, which stands for MPEG Layer 3. Many Internet sites host hundreds of thousands of MP3 files of popular music. With special playback programs (Real Audio), MP3 music can be listened to in real time over the Internet, copied indefinitely (a typical song is 2-8 MB), and distributed illegally. Portable MP3 players priced around $200 (such as the Diamond Rio) are already available. The music industry, suffering tangible losses, has begun an active fight against MP3 sites (the Recording Industry Association of America found and shut down most of them). But the genie is out of the bottle; you can’t shut everyone down. Adaptec predicts that billions of songs will be downloaded from the Internet in the coming years and has announced MP3 support in the next version of EasyCD Creator. In digital editing tasks, however, the audio signal is not compressed, so up to 1.5 Mbps must be allocated to the audio component when calculating the allowable stream.
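The per-channel rates quoted for the three layers can be compared with the roughly 706 Kbps per channel of uncompressed CD audio; for Layer 3 the ratio comes out at about 11:1, matching the figure above. A small Python sketch, assuming 1 Kbps = 1000 bps and 1 MB = 10^6 bytes; the helper names and the 4-minute song length are illustrative assumptions, not values from the text:

```python
PCM_BPS_PER_CHANNEL = 44_100 * 16   # CD-quality PCM: 705,600 bps per channel

# Per-channel bit rates quoted in the text for each MPEG audio layer, in Kbps
LAYER_KBPS = {"Layer 1": 192, "Layer 2": 128, "Layer 3 (MP3)": 64}

def compression_ratio(kbps_per_channel: int) -> float:
    """How many times smaller the compressed stream is than CD-quality PCM."""
    return PCM_BPS_PER_CHANNEL / (kbps_per_channel * 1000)

def file_size_mb(kbps_per_channel: int, seconds: float, channels: int = 2) -> float:
    """Size of a compressed recording in megabytes (10**6 bytes)."""
    return kbps_per_channel * 1000 * channels * seconds / 8 / 1e6

for layer, kbps in LAYER_KBPS.items():
    print(f"{layer}: {kbps} Kbps per channel, about {compression_ratio(kbps):.1f}:1")

# Hypothetical 4-minute stereo song encoded with Layer 3
print(f"4-minute MP3: {file_size_mb(LAYER_KBPS['Layer 3 (MP3)'], 4 * 60):.2f} MB")
```

This prints ratios of about 3.7:1, 5.5:1 and 11.0:1 for the three layers, and about 3.84 MB for the song, inside the 2-8 MB range mentioned above.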

MPEG2 for non-linear editing tasks

The term non-linear editing does not capture the essence of the process; it reflects only one of its characteristics. What we are actually talking about is video editing performed digitally on computers. The source video fragments must first be digitized and recorded on the hard disk as files. Unlike with tape, accessing any of these files does not require tedious (and linear) rewinding: every video frame is available in any order. This important property gave digital editing the name non-linear, although the possibilities of digital processing are obviously much broader and richer.

Recall that, according to the ITU-R BT.601 recommendation, a television frame is a 720×576 matrix. At the television frame rate of 25 Hz, one second of digital video in the 4:2:2 representation (2 bytes per pixel on average) requires 25×2×720×576 = 20,736,000 bytes, so the data stream is about 21 MBps. Recording such streams is technically feasible but difficult, expensive, and inefficient for post-processing, so in practice the data rate must be reduced substantially. Many lossless compression algorithms are known, but even the most effective ones achieve no more than 2:1 compression on typical images.
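The data rate calculation can be written out as a short Python sketch. It assumes 8-bit samples, so that 4:2:2 averages 2 bytes per pixel (a Y sample for every pixel, with Cb and Cr each sampled at every other pixel), and 1 MB = 10^6 bytes; the names are illustrative:

```python
WIDTH, HEIGHT = 720, 576    # ITU-R BT.601 frame for 25 Hz television
FRAME_RATE = 25             # frames per second
BYTES_PER_PIXEL = 2         # 4:2:2, 8-bit: Y per pixel, Cb and Cr every other pixel

def raw_video_rate(width: int, height: int, fps: int, bytes_per_pixel: int) -> int:
    """Uncompressed video data rate in bytes per second."""
    return width * height * fps * bytes_per_pixel

rate = raw_video_rate(WIDTH, HEIGHT, FRAME_RATE, BYTES_PER_PIXEL)
print(f"raw stream: {rate:,} bytes/s, about {rate / 1e6:.1f} MB/s")
print(f"after 2:1 lossless compression: about {rate / 2 / 1e6:.1f} MB/s")
print(f"one hour of raw video: about {rate * 3600 / 1e9:.1f} GB")
```

Even after the best-case 2:1 lossless compression, the stream stays above 10 MB/s, which is why lossy methods are needed.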

Until recently, M-JPEG reigned supreme in non-linear video editing systems. Different solutions differed in their degree of compression, which corresponded to different quality levels of the resulting video. Roughly, four levels can be distinguished: Standard Video (VHS, C-VHS, Video8), Super-Video (SVHS, C-SVHS, Hi8), Digital Video (Betacam SP, DV/DVCAM/DVCPRO, mini-DV, Digital8), and Studio Video (Digital-S, DVCPRO50). For simplicity, we will refer to them below as Video, S-Video, DV, and Studio-TV. Quantitatively, they are usually characterized by horizontal resolution, i.e. the number of distinguishable elements in a line, measured in television lines. Video is considered to provide a resolution of up to 280 lines and corresponds to an MJPEG stream of approximately 2 MBps.
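Comparing the uncompressed BT.601 rate with the roughly 2 MBps quoted for the Video level shows how much compression MJPEG has to deliver, well beyond the 2:1 that lossless methods reach. A Python sketch, assuming 1 MB = 10^6 bytes:

```python
RAW_RATE = 720 * 576 * 2 * 25     # BT.601, 4:2:2, 25 fps: 20,736,000 bytes/s
VIDEO_LEVEL_RATE = 2_000_000      # approximate MJPEG stream for the Video level
LOSSLESS_LIMIT = 2.0              # best-case lossless ratio on typical images

ratio = RAW_RATE / VIDEO_LEVEL_RATE
print(f"required compression: about {ratio:.1f}:1")
print(f"beyond lossless reach: {ratio > LOSSLESS_LIMIT}")
```

The result, about 10:1, can only be reached by discarding information.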


Video encoding, how it works (part 1)

Effective compression of video is based on two main ideas: suppressing fine spatial detail within individual frames that is insignificant to visual perception, and eliminating temporal redundancy across the sequence of frames. Accordingly, we speak of spatial and temporal compression.

The first idea relies on the experimentally established low sensitivity of human vision to distortions of small image details. The eye notices an uneven background sooner than a slight curvature of a thin edge or a change in the brightness and color of a small area. Mathematics offers two equivalent representations of an image: the familiar spatial distribution of brightness and color, and the so-called frequency distribution obtained with the two-dimensional Discrete Cosine Transform (DCT). In theory the two are equivalent and mutually reversible, but they store information about the image structure in completely different ways: smooth background changes are carried by the low-frequency coefficients (in the upper-left corner of the DCT coefficient matrix), while the high-frequency coefficients are mostly responsible for fine spatial details.

This makes the following compression algorithm possible. The frame is divided into 16×16 blocks (a 720×576 frame contains 45×36 of them), and each block is transformed into the frequency domain by the DCT (JPEG and MPEG actually apply the transform to 8×8 sub-blocks). The frequency coefficients are then quantized, i.e. rounded to multiples of a given interval. While the DCT itself causes no data loss, quantizing the coefficients inevitably coarsens the image. The quantization interval varies: low-frequency information is transmitted more precisely, while many high-frequency coefficients become zero. This compresses the data stream considerably, but it reduces the effective resolution and can introduce minor spurious details (especially at block boundaries).
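The transform-and-quantize step can be sketched in plain Python. This is an illustration under stated assumptions, not the JPEG or MPEG codec: it uses an orthonormal 8×8 DCT-II written directly from its definition, and a hypothetical quantization interval that grows linearly with frequency (`base + slope * (u + v)`) in place of the standard quantization matrices. It demonstrates the two properties described above: the DCT alone is reversible, and quantization zeroes most high-frequency coefficients of a smooth block.

```python
import math

N = 8  # JPEG and MPEG apply the DCT to 8×8 blocks

def scale(k: int) -> float:
    """Orthonormal DCT-II normalization factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def basis(k: int, n: int) -> float:
    """Cosine basis value for frequency k at sample position n."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Forward 2-D DCT-II: pixels block[x][y] -> coefficients coeffs[u][v]."""
    return [[scale(u) * scale(v) * sum(block[x][y] * basis(u, x) * basis(v, y)
                                       for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D transform (DCT-III): coefficients -> pixels."""
    return [[sum(scale(u) * scale(v) * coeffs[u][v] * basis(u, x) * basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def step(u: int, v: int, base: float = 4.0, slope: float = 4.0) -> float:
    """Hypothetical quantization interval that grows with frequency."""
    return base + slope * (u + v)

def quantize(coeffs):
    """Round each coefficient to a multiple of its interval (the lossy step)."""
    return [[round(coeffs[u][v] / step(u, v)) for v in range(N)] for u in range(N)]

def dequantize(levels):
    """Map quantization levels back to approximate coefficient values."""
    return [[levels[u][v] * step(u, v) for v in range(N)] for u in range(N)]

def max_diff(a, b):
    """Largest absolute difference between two N×N blocks."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# A smooth 8×8 block: a gentle brightness ramp, like a background area
block = [[100 + 4 * x + 2 * y for y in range(N)] for x in range(N)]
coeffs = dct2(block)
levels = quantize(coeffs)

lossless_err = max_diff(idct2(coeffs), block)
lossy_err = max_diff(idct2(dequantize(levels)), block)
zeros = sum(level == 0 for row in levels for level in row)
print(f"DCT + inverse DCT, max error: {lossless_err:.1e}")
print(f"zero coefficients after quantization: {zeros} of {N * N}")
print(f"max pixel error after quantization: {lossy_err:.2f}")
```

The first line shows a round-off-sized error (the transform is reversible); only a handful of low-frequency levels survive quantization, and those few numbers are what an encoder would then store.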

For attentive readers, we repeat that this algorithm came from digital photography, where it was developed under the name JPEG to compress individual still images efficiently (JPEG stands for Joint Photographic Experts Group, the committee that endorsed it). It was then successfully applied to video sequences, with each frame processed completely independently, and renamed MJPEG (Motion-JPEG). Note also that DV encoding in the DV/DVCAM/DVCPRO digital formats is essentially based on the same algorithm, but uses a more flexible scheme with adaptive selection of quantization tables. Unlike in MJPEG, the compression ratio varies from block to block with the image content: relative to the frame's average level, it increases for blocks carrying little information (for example, at the edges of the frame) and decreases for blocks with many small details. As a result, at the same quality the data volume is about 15% smaller (or, conversely, at the same data rate the output quality is higher).
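The idea of varying compression per block can be illustrated with a hypothetical Python sketch. It does not reproduce the actual DV classification or quantization tables; pixel variance as the activity measure, the three step sizes and both thresholds are all assumptions made for illustration. It simply picks a coarser quantization step for low-detail blocks and a finer one for detailed blocks, as described above, whereas a fixed-table MJPEG-style coder would use one step everywhere.

```python
import statistics

def block_activity(block):
    """Pixel variance: a crude (assumed) measure of how much detail a block has."""
    return statistics.pvariance([p for row in block for p in row])

def choose_step(block, fine=4.0, medium=8.0, coarse=16.0, low=25.0, high=400.0):
    """Pick a per-block quantization step; all values here are hypothetical.

    Low-detail blocks get a coarser step (more compression),
    highly detailed blocks a finer one (less compression).
    """
    activity = block_activity(block)
    if activity < low:
        return coarse
    if activity > high:
        return fine
    return medium

flat = [[128] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for y in range(8)] for x in range(8)]
print(choose_step(flat), choose_step(checker))
```

The uniform block gets the coarse step (16.0) and the high-contrast checkerboard the fine one (4.0).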

Temporal MPEG compression exploits the high redundancy between images separated by short intervals. Between adjacent frames, usually only a small part of the scene changes: for example, a small object moves smoothly against a fixed background. In this case, complete information about the scene needs to be stored only for selected reference frames. For the others, it is enough to transmit difference information: the position of the object, the direction and magnitude of its displacement, and the new background elements revealed behind the object as it moves. These differences can be computed not only relative to earlier frames but also relative to later ones, because it is in the later frames that the background previously hidden behind the moving object is revealed. Mathematically, the most difficult step is finding 16×16 blocks that have moved but changed little in structure, and determining their displacement (motion) vectors. This step is also the most important, since it can greatly reduce the amount of information required. The efficiency with which this “smart” step is executed in real time is what distinguishes different MPEG encoders.
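The block search described above can be sketched as exhaustive (full-search) block matching, using the sum of absolute differences (SAD) as the similarity measure. This is one common textbook approach, assumed here for illustration: the MPEG standards do not prescribe a search method, and practical encoders often use faster strategies. The function names, the ±7-pixel search range and the frame layout (a list of rows, indexed `frame[y][x]`) are hypothetical. The returned `(dx, dy)` is the offset from the block in the current frame to its best match in the reference frame.

```python
import random

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))

def find_motion_vector(cur, ref, bx, by, size=16, search=7):
    """Full search: try every displacement within ±search pixels and
    return the (dx, dy) with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the reference frame
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

# Usage: a random-texture reference frame, and a current frame whose content
# is the reference moved 2 pixels right and 3 pixels up
rng = random.Random(0)
W = H = 48
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[y + 3][x - 2] if y + 3 < H and x - 2 >= 0 else 0
        for x in range(W)] for y in range(H)]
print(find_motion_vector(cur, ref, bx=16, by=16))
```

For the 16×16 block at (16, 16), the search finds an exact match (SAD of zero) at `(-2, 3)`, so instead of 256 pixel values the encoder needs to send just this vector plus a (here empty) residual. The full search evaluates 225 candidate positions per block, which illustrates why this step dominates encoding cost.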