Abstract:
Skeleton-based behavior recognition methods often extract spatio-temporal features insufficiently and have difficulty capturing global context information. To address these problems, a human skeleton behavior recognition scheme based on a spatio-temporal attention mechanism and an adaptive graph convolutional network is studied. First, a spatio-temporal attention module based on the non-local operation is constructed to help the model focus on the most discriminative frames and regions in the skeleton sequence. Second, an adaptive graph convolutional network is constructed using the feature learning ability of the Gaussian embedding function and a lightweight convolutional neural network, taking into account the effect of human prior knowledge in different time periods. Finally, with the adaptive graph convolutional network as the basic framework, the spatio-temporal attention module is embedded to construct a two-stream fusion model that uses joint information, bone information, and their respective motion information. The algorithm achieves accuracies of 90.2% and 96.2%, respectively, under the two evaluation standards of the NTU RGB+D dataset. The generality of the model is further demonstrated on the large-scale Kinetics dataset, which verifies the advantage of the algorithm in extracting spatio-temporal features and capturing global context information.
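The inputs named above (joint information, bone information, and their motion information) can be illustrated with a minimal sketch. It assumes a common convention that the abstract does not spell out: a bone is the vector from a parent joint to its child, and motion is the difference between consecutive frames. The toy skeleton, the `PARENTS` list, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy skeleton: 3 frames, 3 joints, (x, y, z) per joint.
# PARENTS[i] is the assumed parent of joint i; the root is its own parent.
PARENTS = [0, 0, 1]


def sub(a, b):
    """Element-wise difference of two coordinate tuples."""
    return tuple(p - q for p, q in zip(a, b))


def bone_stream(seq, parents):
    """Bone vector per joint: joint position minus its parent's position."""
    return [[sub(frame[j], frame[parents[j]]) for j in range(len(frame))]
            for frame in seq]


def motion_stream(seq):
    """Difference between consecutive frames; the last frame is zero-padded."""
    motion = [[sub(nxt[j], cur[j]) for j in range(len(cur))]
              for cur, nxt in zip(seq, seq[1:])]
    zeros = [tuple(0.0 for _ in joint) for joint in seq[-1]]
    return motion + [zeros]


joints = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.5, 1.0, 0.0), (1.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0)],
]

bones = bone_stream(joints, PARENTS)      # bone information
joint_motion = motion_stream(joints)      # motion of joints
bone_motion = motion_stream(bones)        # motion of bones
```

Each of the four resulting sequences has the same shape as the input, so each could be fed to the same network architecture before the per-stream scores are fused. How the streams are grouped and fused in the studied model is not stated in the abstract.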