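Dual-Branch No-Reference Quality Assessment Method for Screen Content Videos Based on Multi-Dimensional Spatiotemporal Features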

赖伊琳; 陈瑜萍; 刘智鸿; 朱显丞; 曾焕强

doi:10.12466/xhcl.2026.02.003


Abstract

With the widespread adoption of smart devices, screen content videos are now widely used in fields such as remote education and live streaming, and assessing their quality is crucial for ensuring a satisfactory visual experience. Unlike natural scene videos, screen content contains many synthetic elements such as text and graphics, and its distortion types are more complex, so a no-reference quality assessment model that aligns with the characteristics of the human visual system is needed. However, existing methods struggle to handle high dynamic range and composite distortions, and the high redundancy and strong temporal dependencies of video data limit both the efficiency of feature extraction and the accuracy of quality perception. To address these challenges, this paper proposes a dual-branch no-reference quality assessment method for screen content videos. To handle complex distortions, a spatial perception branch is constructed to extract spatial structural information and noise distribution from key frames. To reduce video redundancy and suppress shallow dependencies, a spatiotemporal encoding mechanism based on a tube masking strategy is introduced to mine deep motion features. To address the difficulty of temporal modeling, a temporal perception enhancement module is designed to fuse the multi-dimensional features and output an overall quality score. Experimental results on two mainstream datasets show that, measured by the weighted Spearman Rank-Order Correlation Coefficient (SROCC), the proposed method improves on the second-best model by 2.3%, significantly improving the perceptual consistency and generalization capability of screen content video quality assessment.
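The abstract reports a weighted SROCC over two datasets but does not state how the weights are defined. The following is a minimal sketch of one common convention, in which each dataset's SROCC is weighted by its number of videos; this weighting, the function names `average_ranks`, `srocc`, and `weighted_srocc`, and any scores passed in are illustrative assumptions, not the authors' evaluation code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srocc(predicted, subjective):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(predicted), average_ranks(subjective)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


def weighted_srocc(per_dataset):
    """Average per-dataset SROCC, weighted by dataset size (an assumption).

    per_dataset: list of (predicted_scores, subjective_scores) pairs.
    """
    total = sum(len(pred) for pred, _ in per_dataset)
    return sum(len(pred) * srocc(pred, subj) for pred, subj in per_dataset) / total
```

Under this convention, a larger dataset contributes proportionally more to the aggregate figure; an unweighted mean or a different weighting would give a different number for the same per-dataset results.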
