Micro-Expression Recognition Algorithm Based on a Visual Transformer with Motion Feature Selection and Fusion
Abstract: Micro-expression recognition (MER) aims to reveal the hidden, genuine emotions of a target subject and has important applications in human-computer interaction, psychological diagnosis, and intention prediction. However, micro-expressions are weak in intensity, brief in duration, and involve long-range dependencies among facial motion units, which makes it difficult for conventional convolutional neural networks to represent their dynamic features effectively. Moreover, micro-expression features are strongly coupled with subject identity and facial appearance, hindering the separation and extraction of micro-expression semantic information. To address these problems, this paper proposes a micro-expression recognition algorithm based on a visual Transformer with motion feature selection. The algorithm first computes horizontal and vertical optical-flow motion maps with the TV-L1 algorithm to characterize facial motion, and then uses a visual Transformer to encode the motion dependencies among facial motion units during a micro-expression. To further strengthen feature representation, a Feature Selection Fusion Module (FSFM) is designed to capture the key local information of micro-expressions, and a Spatial Consistency Attention Module (SCAM) is introduced to keep different motion features consistent in their spatial distribution. In addition, a Cross Attention Fusion Module (CAFM) enhances the representation of micro-expression semantic information. Extensive experiments on three benchmark micro-expression datasets, MMEW, CASME II, and SAMM, yield recognition accuracies of 67.8%, 73.3%, and 68.7%, respectively. Compared with existing methods, the proposed algorithm achieves a significant improvement in accuracy on micro-expression recognition tasks, further validating its effectiveness and superiority.
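For illustration, the pipeline summarized above can be sketched roughly as follows. This is a minimal, hypothetical outline rather than the authors' implementation: it assumes 224×224 grayscale onset and apex frames, uses OpenCV's TV-L1 optical flow, and substitutes a plain Transformer encoder for the FSFM, SCAM, and CAFM modules, whose internals are not specified in the abstract. The names `tvl1_flow` and `FlowViT` are invented for this sketch.

```python
# Hypothetical sketch (not the authors' code): onset/apex TV-L1 optical flow
# followed by a small ViT-style encoder over flow patches.
# Requires opencv-contrib-python (for cv2.optflow) and torch.
import cv2
import numpy as np
import torch
import torch.nn as nn

def tvl1_flow(onset_gray: np.ndarray, apex_gray: np.ndarray) -> np.ndarray:
    """Dense TV-L1 optical flow between onset and apex frames.
    Returns an (H, W, 2) float32 array: horizontal (u) and vertical (v) components."""
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()
    return tvl1.calc(onset_gray, apex_gray, None)

class FlowViT(nn.Module):
    """Minimal ViT-style classifier over a 2-channel optical-flow input.
    The paper's FSFM/SCAM/CAFM modules are not reproduced here; a plain
    Transformer encoder plus a linear head stands in for them."""
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8, classes=3):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.embed = nn.Conv2d(2, dim, kernel_size=patch, stride=patch)  # patchify u/v flow
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, classes)

    def forward(self, flow):                                  # flow: (B, 2, H, W)
        x = self.embed(flow).flatten(2).transpose(1, 2)       # (B, N, dim)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1) + self.pos
        x = self.encoder(x)                                   # long-range dependencies via self-attention
        return self.head(x[:, 0])                             # classify from the [CLS] token

# Usage (frames already cropped/resized to 224x224 grayscale):
#   flow = tvl1_flow(onset, apex)                             # (224, 224, 2)
#   x = torch.from_numpy(flow.transpose(2, 0, 1)[None]).float()
#   logits = FlowViT()(x)
```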