Video Action Recognition Model with Multi-Scale Temporal Fusion and an Attention Mechanism
Abstract: Video action recognition algorithms often fail to focus on the salient regions of video frames during feature extraction, which degrades classification performance. To improve the network's ability to attend selectively to informative features, we propose a multi-scale temporal fusion action recognition model that incorporates attention mechanisms. A channel-spatial attention module and a channel attention module are integrated into the long-term and short-term temporal networks, respectively. The attention mechanisms let the network reassign feature weights during training and capture points of interest in both video content and location, which improves its representational capacity. Experiments on the Something-Something V1 and Jester datasets show that adding the lightweight attention modules effectively improves the performance of the multi-scale temporal fusion network, which also shows certain advantages over other action recognition networks.
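To make the reweighting idea concrete, the following is a minimal sketch of a channel attention step and a channel-spatial attention step on a single feature map. The abstract does not specify the module design, so everything here is an assumption for illustration: the channel branch follows a generic squeeze-and-excitation pattern, the spatial branch is a simplified CBAM-like mask that mixes channel-wise mean and max with two scalar weights instead of a learned convolution, and all function and parameter names (`channel_attention`, `spatial_attention`, `w1`, `w2`, `w_avg`, `w_max`) are hypothetical. It uses plain Python lists rather than a deep learning framework.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of feat ([C][H][W] nested lists).

    w1 ([R][C]) and w2 ([C][R]) are hypothetical bottleneck weights.
    Returns (reweighted feature map, per-channel gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: ReLU bottleneck, then a sigmoid gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Reweight: scale every value in a channel by that channel's gate.
    out = [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]
    return out, gates


def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Reweight spatial locations of feat ([C][H][W] nested lists).

    Assumption: the mask is a sigmoid over a weighted sum of the channel-wise
    mean and max at each location (a stand-in for a learned convolution).
    Returns (reweighted feature map, [H][W] mask).
    """
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    mask = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            column = [feat[c][i][j] for c in range(channels)]
            mask[i][j] = sigmoid(w_avg * sum(column) / channels + w_max * max(column))
    out = [
        [[feat[c][i][j] * mask[i][j] for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]
    return out, mask


def channel_spatial_attention(feat, w1, w2):
    """Apply channel attention first, then spatial attention."""
    out, gates = channel_attention(feat, w1, w2)
    out, mask = spatial_attention(out)
    return out, gates, mask
```

In a trained network the weights would be learned and the modules would sit inside the long-term and short-term temporal branches. Here they are passed in by hand only to show how the gates rescale channels and locations.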