VVC Intra-Coding Scheme for Machines
-
Abstract
The proliferation of computer vision applications in areas such as intelligent surveillance, autonomous driving, and robotics has caused a surge in the volume of video data. These videos are increasingly processed and analyzed by intelligent algorithms rather than consumed solely by humans, so efficient storage and transmission of video data for machine vision tasks have become new challenges. The latest video coding standard, Versatile Video Coding (VVC, or H.266), represents the state of the art in video compression for human viewers: it aims to provide better quality at lower bitrates by optimizing for the characteristics of the human visual system. However, VVC does not account for the specific requirements of machine vision tasks, so information critical to those tasks can be lost during compression, and the performance of machine vision algorithms may degrade significantly on compressed video. This gap indicates the need for a video coding approach that considers the requirements of machine vision. To address this problem, this paper proposes a novel VVC intra-coding scheme that optimizes VVC specifically for machine vision tasks, taking multiple object tracking, a common machine vision task, as a representative example to demonstrate its effectiveness. First, the proposed scheme analyzes the video content using Grad-CAM++, a neural network interpretability method that extends Gradient-weighted Class Activation Mapping (Grad-CAM). This method is typically used to highlight the areas of an image that are most relevant to a neural network's decision. By applying Grad-CAM++ to the video frames, we generate saliency maps that reveal the regions of interest for machine vision.
Subsequently, to highlight the critical edge and contour information in each frame, this paper introduces edge detection and fuses its output with the saliency analysis results to obtain the final machine vision saliency map. Finally, the fused machine vision saliency map is used to guide VVC mode selection, optimizing the mode decisions for block partitioning and intra-frame prediction. In addition, the rate-distortion optimization (RDO) process, which typically balances bitrate against human-perceived distortion, is adjusted: by replacing the conventional signal distortion with a machine vision distortion, the encoder shifts its focus toward preserving the information most relevant to machine vision. Experimental results show that the proposed method improves on standard VVC, reducing the bitrate by 12.7% while maintaining better accuracy. This result shows that the proposed VVC intra-coding scheme can better satisfy the increasing demand for efficient video compression customized for automatic video analysis systems.
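The fusion and RDO steps summarized above can be illustrated with a minimal sketch. It is not the scheme's actual implementation: the abstract does not state the edge detector, the fusion rule, or the exact form of the machine vision distortion. The sketch therefore assumes a Sobel edge detector, a linear fusion with a hypothetical weight `alpha`, per-block weights in a hypothetical range `[w_min, w_max]`, and a saliency-weighted sum of squared errors as a stand-in for the machine vision distortion. The saliency map is assumed to be already computed (for example by Grad-CAM++) and normalized to [0, 1].

```python
import numpy as np


def sobel_edges(gray):
    """Sobel gradient magnitude of a 2-D image, normalized to [0, 1].
    Assumption: the abstract does not name the edge detector."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 filter response one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def fuse_maps(saliency, edges, alpha=0.7):
    """Linear fusion of saliency and edge maps (hypothetical rule and alpha)."""
    return np.clip(alpha * saliency + (1.0 - alpha) * edges, 0.0, 1.0)


def block_weights(fused, block=16, w_min=0.5, w_max=2.0):
    """Map the mean fused saliency of each block to a distortion weight.
    Block size and weight range are illustrative assumptions."""
    h, w = fused.shape
    bh, bw = h // block, w // block
    cropped = fused[:bh * block, :bw * block]
    means = cropped.reshape(bh, block, bw, block).mean(axis=(1, 3))
    return w_min + (w_max - w_min) * means


def weighted_rd_cost(sse, bits, lam, weight):
    """RD cost J = w * D + lambda * R, with D weighted by block saliency.
    A stand-in for the machine vision distortion, not the paper's formula."""
    return weight * sse + lam * bits


# Toy frame: a bright square in the top-left block, which is also salient.
frame = np.zeros((32, 32))
frame[4:12, 4:12] = 255.0
saliency = np.zeros((32, 32))
saliency[:16, :16] = 1.0

weights = block_weights(fuse_maps(saliency, sobel_edges(frame)))
```

Under these assumptions, a block with high fused saliency receives a larger weight, so a candidate mode that distorts it is penalized more heavily during mode decision, while background blocks tolerate coarser coding.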
-
-