Abstract:
To improve the classification accuracy and temporal localization accuracy of temporal action detection networks, this paper proposes an improved Region Convolutional 3D Network (R-C3D). In the temporal proposal subnet, layer-by-layer spatial convolution reduces the height and width of the feature map from (H/16, W/16) to (1, 1), which improves classification accuracy. Following the deconvolution idea of Convolutional-De-Convolutional (CDC) networks, a temporal deconvolution network increases the length of the feature map, which improves the temporal localization accuracy of actions. Experimental results on the THUMOS14 dataset show that the proposed method achieves higher detection accuracy than R-C3D on long, untrimmed videos.
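
The two shape changes described above can be illustrated with standard output-size arithmetic for convolution and transposed convolution. The following is a minimal sketch, not the network from this paper: the 112 x 112 input resolution, the 3 x 3 unpadded kernels, the temporal length of 96, and the deconvolution settings (kernel 4, stride 2, padding 1) are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel


def spatial_collapse(size, kernel=3):
    """Apply unpadded, stride-1 convolutions until the size reaches 1.

    Returns the sequence of sizes, starting with the input size.
    """
    sizes = [size]
    while sizes[-1] > 1:
        # Shrink the kernel if the remaining map is smaller than it.
        k = min(kernel, sizes[-1])
        sizes.append(conv_out(sizes[-1], k))
    return sizes


# Assumed input: 112 x 112 frames, so H/16 = W/16 = 7.
spatial_sizes = spatial_collapse(112 // 16)

# Assumed temporal feature-map length of 96; these assumed
# deconvolution settings double the length.
upsampled_length = deconv_out(96, kernel=4, stride=2, padding=1)

print(spatial_sizes)     # [7, 5, 3, 1]
print(upsampled_length)  # 192
```

Under these assumptions, three spatial convolutions take each spatial dimension from 7 to 1, and one temporal deconvolution doubles the feature-map length, which gives finer temporal resolution for boundary prediction.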