Improved R-C3D temporal action detection network

Abstract: To improve both the classification accuracy and the temporal localization accuracy of temporal action detection networks, this paper proposes an improved Region Convolutional 3D Network (R-C3D). In the temporal proposal subnet, layer-by-layer spatial convolutions reduce the height and width of the feature map from (H/16, W/16) to (1, 1), which improves action classification accuracy. Borrowing the deconvolution idea from Convolutional-De-Convolutional Networks (CDC), a temporal deconvolution network increases the length of the feature map, which improves the temporal localization accuracy of actions. Experimental results on the THUMOS14 dataset show that, compared with R-C3D, the proposed method achieves higher detection accuracy on long, unsegmented videos.
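The two shape changes described in the abstract can be traced with plain output-size arithmetic. The sketch below is illustrative only and is not the authors' configuration: the abstract gives no kernel sizes, strides, or layer counts, so the unpadded 3x3 spatial convolutions, the single temporal deconvolution (kernel 4, stride 2, padding 1), the 768-frame 112x112 input, and the L/8 temporal length of the backbone output are all assumptions. The helper names `conv_out`, `deconv_out`, and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a standard convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_shapes(frames=768, height=112, width=112):
    """Return the (T, H, W) feature-map shape after each assumed layer."""
    # Assumption: the backbone output is (L/8, H/16, W/16).
    t, h, w = frames // 8, height // 16, width // 16
    shapes = [(t, h, w)]

    # Assumption: unpadded spatial convolutions with kernels of at most 3x3,
    # applied layer by layer until the spatial size reaches (1, 1).
    while h > 1 or w > 1:
        h = conv_out(h, kernel=min(3, h))
        w = conv_out(w, kernel=min(3, w))
        shapes.append((t, h, w))

    # Assumption: one temporal deconvolution that doubles the length.
    t = deconv_out(t, kernel=4, stride=2, padding=1)
    shapes.append((t, h, w))
    return shapes


if __name__ == "__main__":
    for shape in trace_shapes():
        print(shape)
```

Under these assumptions the spatial size shrinks 7 -> 5 -> 3 -> 1 while the temporal length stays at 96, and the deconvolution then raises the temporal length to 192. Other kernel and stride choices would reach (1, 1) in a different number of layers.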