Skeleton-Based Action Recognition on Spatio-Temporal Attention Mechanism and Adaptive Graph Convolutional Network

Abstract: Skeleton-based action recognition methods often extract spatio-temporal features insufficiently and struggle to capture global context information. To address these problems, this paper studies a human skeleton action recognition scheme that combines a spatio-temporal attention mechanism with an adaptive graph convolutional network. First, a spatio-temporal attention module based on the non-local operation is constructed to help the model focus on the most discriminative frames and regions of the skeleton sequence. Second, an adaptive graph convolutional network is built on the feature-learning ability of a Gaussian embedding function and a lightweight convolutional neural network, taking into account the influence of human-body prior knowledge at different stages. Finally, with the adaptive graph convolutional network as the backbone and the spatio-temporal attention module embedded in it, a two-stream fusion model is constructed from joint information, bone information, and their respective motion information. The algorithm achieves 90.2% and 96.2% accuracy under the two evaluation benchmarks of the NTU RGB+D dataset, and experiments on the large-scale Kinetics dataset demonstrate the model's generality. These results verify the advantage of the algorithm in extracting spatio-temporal features and capturing global context information.
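
The abstract does not specify the internal layout of the spatio-temporal attention module. As a rough illustration only, the following NumPy sketch implements a generic embedded-Gaussian non-local operation applied jointly over all frames and joints. It assumes features shaped (C, T, V) and uses plain matrices (the hypothetical `w_theta`, `w_phi`, `w_g`, `w_z`) in place of 1x1 convolutions; it is not the authors' implementation.

```python
import numpy as np


def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def nonlocal_st_attention(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local operation over all T*V skeleton positions.

    x: features of shape (C, T, V) for channels, frames and joints.
    w_theta, w_phi, w_g: (C_inner, C) matrices standing in for 1x1 convolutions.
    w_z: (C, C_inner) matrix projecting back to C channels.
    Returns the output features (C, T, V) and the (T*V, T*V) attention map.
    """
    c, t, v = x.shape
    flat = x.reshape(c, t * v)              # (C, N) with N = T*V positions
    theta = w_theta @ flat                  # (C_inner, N) queries
    phi = w_phi @ flat                      # (C_inner, N) keys
    g = w_g @ flat                          # (C_inner, N) values
    # f(i, j) = exp(theta_i . phi_j), normalised over j: every (frame, joint)
    # position attends to every other position, which gives global context
    attn = softmax(theta.T @ phi, axis=-1)  # (N, N), rows sum to 1
    y = g @ attn.T                          # y_i = sum_j attn[i, j] * g_j
    z = w_z @ y + flat                      # residual connection
    return z.reshape(c, t, v), attn


# Usage with random (untrained) weights: 8 channels, 4 frames, 25 joints
rng = np.random.default_rng(0)
C, T, V, C_INNER = 8, 4, 25, 4
x = rng.standard_normal((C, T, V))
weights = [rng.standard_normal((C_INNER, C)) * 0.1 for _ in range(3)]
w_z = rng.standard_normal((C, C_INNER)) * 0.1
z, attn = nonlocal_st_attention(x, *weights, w_z)
# Average attention received by each frame (sums to 1 over the T frames)
frame_weight = attn.mean(axis=0).reshape(T, V).sum(axis=1)
print(z.shape, attn.shape)  # (8, 4, 25) (100, 100)
```

Because every (frame, joint) position attends to every other position, the averaged attention map indicates which frames receive the most weight; a trained module would learn the projection matrices rather than draw them at random.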

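The abstract names the ingredients of the adaptive graph convolution (a Gaussian embedding function, a lightweight convolutional network, and a human-body prior) but gives no formula. The sketch below assumes a common adaptive-graph formulation in which the aggregation graph is the sum of a normalised physical-skeleton adjacency, a freely learned matrix, and a data-dependent matrix computed through a Gaussian embedding. The projections stand in for lightweight 1x1 convolutions. The scalar `alpha` on the prior graph is a hypothetical stand-in for the stage-dependent influence of prior knowledge, and the 5-joint chain skeleton is a toy example.

```python
import numpy as np


def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def normalize_adjacency(a):
    # Symmetric normalisation D^-1/2 (A + I) D^-1/2 of a physical skeleton graph
    a = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def adaptive_graph_conv(x, a_prior, b_learned, w_theta, w_phi, w_out, alpha=1.0):
    """One adaptive graph-convolution layer (single subset, no bias).

    x: (C_in, T, V); a_prior, b_learned: (V, V);
    w_theta, w_phi: (C_e, C_in) Gaussian-embedding projections;
    w_out: (C_out, C_in) feature transform standing in for a 1x1 convolution;
    alpha: hypothetical weight on the human-body prior graph.
    """
    v = x.shape[2]
    # Data-dependent graph from a Gaussian embedding: joints are compared
    # using their embedded features across all frames
    theta = np.einsum("ec,ctv->etv", w_theta, x).reshape(-1, v)  # (C_e*T, V)
    phi = np.einsum("ec,ctv->etv", w_phi, x).reshape(-1, v)      # (C_e*T, V)
    c_graph = softmax(theta.T @ phi, axis=-1)                    # (V, V)
    adj = alpha * a_prior + b_learned + c_graph
    # Per frame, joint i aggregates sum_j adj[i, j] * x[:, t, j]
    agg = np.einsum("ctj,ij->cti", x, adj)
    return np.einsum("oc,ctv->otv", w_out, agg), c_graph


# Usage on a hypothetical 5-joint chain skeleton
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a = np.zeros((5, 5))
for i, j in edges:
    a[i, j] = a[j, i] = 1.0
a_prior = normalize_adjacency(a)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6, 5))  # 3 coordinate channels, 6 frames, 5 joints
b_learned = np.zeros((5, 5))        # learnable residual graph, initialised to zero
out, c_graph = adaptive_graph_conv(
    x, a_prior, b_learned,
    w_theta=rng.standard_normal((4, 3)), w_phi=rng.standard_normal((4, 3)),
    w_out=rng.standard_normal((16, 3)), alpha=0.5,
)
print(out.shape)  # (16, 6, 5)
```

Summing the three graphs lets the fixed skeleton supply structure while the learned and data-dependent terms add connections between joints that are not physically linked.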
     
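The four inputs mentioned in the abstract (joints, bones, and their motion) can be derived from raw joint coordinates. In this sketch, bone vectors are taken as each joint minus its parent joint, motion as the difference between consecutive frames, and stream outputs are combined by late fusion of softmax scores. The parent list `PARENTS`, the zero-padding of the last frame, the equal fusion weights, and the placeholder logits are all assumptions for illustration; the abstract does not state how the paper pairs these inputs into its two streams.

```python
import numpy as np

# Hypothetical 5-joint chain: PARENTS[i] is the parent of joint i; the root (0) is its own parent
PARENTS = [0, 0, 1, 2, 3]


def bone_features(joints, parents):
    """joints: (C, T, V) -> bone vectors (C, T, V), each joint minus its parent; root bone is zero."""
    return joints - joints[:, :, parents]


def motion_features(x):
    """Frame difference x[t+1] - x[t]; the last frame is zero-padded so the length stays T."""
    motion = np.zeros_like(x)
    motion[:, :-1, :] = x[:, 1:, :] - x[:, :-1, :]
    return motion


def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()


def fuse_scores(stream_logits, weights=None):
    """Late fusion: weighted sum of each stream's softmax class scores."""
    if weights is None:
        weights = [1.0] * len(stream_logits)
    fused = sum(w * softmax(np.asarray(s, dtype=float)) for w, s in zip(weights, stream_logits))
    return int(np.argmax(fused)), fused


rng = np.random.default_rng(0)
joints = rng.standard_normal((3, 10, 5))  # (C, T, V) joint coordinates
bones = bone_features(joints, PARENTS)
inputs = {
    "joint": joints,
    "joint_motion": motion_features(joints),
    "bone": bones,
    "bone_motion": motion_features(bones),
}
# Placeholder logits for a 3-class problem from two hypothetical streams
pred, fused = fuse_scores([[2.0, 0.5, 0.1], [0.3, 1.5, 0.2]])
print(pred)  # 0
```

Summing probabilities rather than raw logits keeps streams with differently scaled outputs comparable; unequal `weights` would let one stream dominate the fused prediction.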
