Kang Shuning, Zhang Liang. Human Action Recognition Based on Semantic Feature Cuboid Slicing[J]. JOURNAL OF SIGNAL PROCESSING, 2020, 36(11): 1897-1905. DOI: 10.16798/j.issn.1003-0530.2020.11.012

Human Action Recognition Based on Semantic Feature Cuboid Slicing

Human action recognition based on deep learning has achieved great success in recent years. In particular, 2D convolutional neural networks learn the spatial features of human actions well, but they still struggle to capture long-term motion information. To address this problem, we propose a human action recognition model based on semantic feature cuboid slicing that jointly learns the appearance and motion features of actions. Built on temporal segment networks (TSN), the model adopts InceptionV4 as the backbone network to extract the appearance features of human actions, and it divides the 3D feature cuboids into 2D spatial slices and 2D temporal slices. We also propose a spatiotemporal feature fusion module that learns a weight distribution over the slices of the different dimensions to obtain the spatiotemporal features of human actions, so that the whole model can be trained end to end. Compared with TSN, our model achieves higher accuracy on UCF101 and HMDB51. Experimental results show that the model captures more motion information and improves human action recognition without significantly increasing the number of network parameters.
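The abstract does not specify how the cuboid is sliced or how the fusion weights are computed, so the Python below is only a minimal sketch under stated assumptions. It treats a single-channel feature cuboid indexed as `cuboid[t][h][w]` (with `t` assumed to index the TSN segments), takes the T frames of size H×W as spatial slices and the H slices of size T×W plus the W slices of size T×H as temporal slices, and fuses the three slice types with softmax-normalized weights. All function names are hypothetical. A fixed finite difference along the first axis of each slice stands in for the learned 2D convolutions, and the `logits` argument stands in for the learned weights of the fusion module; this illustrates the slicing idea, not the model from the paper.

```python
import math
from typing import List

# Hypothetical layout: a single-channel feature cuboid indexed cuboid[t][h][w],
# where t is assumed to index the TSN segments stacked along time.
Cuboid = List[List[List[float]]]
Slice2D = List[List[float]]


def spatial_slices(cuboid: Cuboid) -> List[Slice2D]:
    """One H x W slice per time step t (assumed meaning of a spatial slice)."""
    return [[row[:] for row in frame] for frame in cuboid]


def temporal_slices_along_h(cuboid: Cuboid) -> List[Slice2D]:
    """One T x W slice per row index h (assumed meaning of a temporal slice)."""
    height = len(cuboid[0])
    return [[frame[h][:] for frame in cuboid] for h in range(height)]


def temporal_slices_along_w(cuboid: Cuboid) -> List[Slice2D]:
    """One T x H slice per column index w (assumed meaning of a temporal slice)."""
    height, width = len(cuboid[0]), len(cuboid[0][0])
    return [[[frame[h][w] for h in range(height)] for frame in cuboid]
            for w in range(width)]


def first_axis_diff_energy(s: Slice2D) -> float:
    """Mean |s[i+1][j] - s[i][j]|, a fixed stand-in for a learned 2D convolution.

    On T x W and T x H slices the first axis is time, so this responds to
    change over time; on H x W slices it responds to vertical spatial edges.
    """
    rows, cols = len(s), len(s[0])
    if rows < 2:
        return 0.0
    total = sum(abs(s[i + 1][j] - s[i][j])
                for i in range(rows - 1) for j in range(cols))
    return total / ((rows - 1) * cols)


def slice_features(cuboid: Cuboid) -> List[float]:
    """Average the per-slice response within each slice type."""
    groups = [spatial_slices(cuboid),
              temporal_slices_along_h(cuboid),
              temporal_slices_along_w(cuboid)]
    return [sum(first_axis_diff_energy(s) for s in g) / len(g) for g in groups]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def fuse(cuboid: Cuboid, logits: List[float]) -> float:
    """Weighted sum of slice-type features; `logits` stand in for learned weights."""
    weights = softmax(logits)
    return sum(w * f for w, f in zip(weights, slice_features(cuboid)))


if __name__ == "__main__":
    # T=2, H=2, W=3: a uniform map at t=0 that becomes uniformly brighter at t=1.
    demo = [[[0.0] * 3 for _ in range(2)], [[1.0] * 3 for _ in range(2)]]
    print(slice_features(demo))          # [0.0, 1.0, 1.0]
    print(fuse(demo, [0.0, 0.0, 0.0]))   # about 0.667
```

In the demo, the spatial slices see no structure because each frame is uniform, while both kinds of temporal slices register the change between t=0 and t=1. Applying the same 2D operator to slices cut along different axes is one way to see how slicing lets 2D operations pick up motion; in the actual model the operator and the fusion weights would be learned.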