Dual-Branch No-Reference Quality Assessment Method for Screen Content Videos Based on Multi-Dimensional Spatiotemporal Features
Abstract
The widespread adoption of smart devices has led to the extensive use of screen content videos in fields such as remote education and live streaming, making the quality assessment of these videos crucial for ensuring a satisfactory visual experience. Unlike natural scene videos, screen content contains many synthetic elements, such as text and graphics, resulting in more complex distortion types. There is therefore a need for a no-reference quality assessment model that aligns with human visual characteristics. However, existing methods struggle to handle high dynamic range and composite distortions effectively, and the high redundancy and strong temporal dependencies in video data limit both feature extraction efficiency and the accuracy of quality perception. To address these challenges, this study proposes a dual-branch architecture for no-reference screen content video quality assessment. For complex distortion patterns, we construct a spatial perception branch that extracts spatial structural information and noise distribution from key frames. To reduce video redundancy and suppress shallow dependencies, we introduce a tube-based masked spatiotemporal encoding mechanism that captures deeper motion features. To address the difficulties of temporal modeling, we design a temporal perception enhancement module that integrates multi-dimensional features to generate the final quality score. Experiments show that our method improves the weighted Spearman Rank-Order Correlation Coefficient (SROCC) by 2.3% over the second-best model on two mainstream datasets, significantly enhancing both perceptual consistency and generalization capability in screen content video quality assessment.
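To make the tube-based masking idea concrete, the sketch below shows the generic form of tube masking as popularized by VideoMAE-style encoders: the same randomly chosen spatial patches are hidden in every frame, forming "tubes" along the temporal axis so the encoder cannot trivially copy co-located content from neighboring frames. This is an illustrative sketch of the general technique, not the paper's exact masking scheme; the function name `tube_mask` and all parameter values are assumptions for illustration.

```python
import random

def tube_mask(num_frames, num_patches, mask_ratio=0.75, seed=0):
    """Generic tube masking sketch (not the paper's exact scheme).

    Samples one set of spatial patch indices and masks those same
    positions in every frame, so masked regions form temporal tubes.
    Returns a list of per-frame boolean masks (True = masked).
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    # Identical spatial pattern in every frame -> temporal "tubes".
    return [[p in masked for p in range(num_patches)] for _ in range(num_frames)]

mask = tube_mask(num_frames=8, num_patches=196, mask_ratio=0.75)
assert all(row == mask[0] for row in mask)  # same patches hidden in each frame
assert sum(mask[0]) == 147                  # 75% of 196 patches are masked
```

Because entire tubes are removed, the surviving tokens carry complementary motion cues across frames, which is why this style of masking both cuts redundancy and encourages deeper temporal feature learning.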
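The SROCC reported above measures how well the model's predicted quality scores preserve the rank order of subjective mean opinion scores (MOS). A minimal sketch of the standard no-ties formula, with hypothetical score values chosen purely for illustration:

```python
def srocc(x, y):
    """Spearman rank-order correlation, assuming no tied values:
    rank both sequences, then apply 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

mos  = [62.1, 48.3, 75.0, 55.4, 81.2]   # hypothetical subjective MOS values
pred = [60.5, 50.1, 73.8, 49.0, 79.9]   # hypothetical model predictions
print(srocc(mos, pred))  # → 0.9
```

A value near 1 means the model orders videos by quality almost exactly as human viewers do, which is why SROCC is the standard consistency metric in video quality assessment.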