Abstract:
Compared with 2D convolutional neural networks, 3D convolutional neural networks extract spatio-temporal features more effectively, but at a significantly higher computational cost. Reducing that cost usually lowers precision, so compressing the model parameters efficiently is the key to avoiding this loss. To this end, an end-to-end channel-separable convolutional neural network is proposed. The 3D convolution is decomposed by separating channel interaction from spatio-temporal interaction: a 1×1×1 conventional convolution handles channel interaction, and a 3×3×3 depthwise convolution handles spatio-temporal interaction. Compared with a traditional 3D convolutional neural network, the channel-separable network adds a regularizing effect to the model: training accuracy decreases while testing accuracy improves, which indicates reduced overfitting. Experiments on the UCF-101 and HMDB-51 datasets achieve accuracies of 92.7% and 64.5%, respectively. The results show that the channel-separable convolutional neural network improves accuracy while reducing computational complexity.
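
To make the claimed reduction in computational complexity concrete, the following minimal sketch compares the weight count of a standard 3×3×3 convolution with that of a 1×1×1 convolution followed by a 3×3×3 depthwise convolution. The channel widths, the feature-map size, the ordering of the two layers, and the function names are illustrative assumptions, not values taken from the paper; biases are ignored, and stride 1 with same-size output is assumed.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights in a standard k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def channel_separated_params(c_in, c_out, k=3):
    """Weights in a 1x1x1 convolution followed by a k x k x k depthwise convolution.

    Assumption: the depthwise layer operates on the c_out channels
    produced by the 1x1x1 layer.
    """
    pointwise = c_in * c_out    # 1x1x1: mixes channels, no spatio-temporal extent
    depthwise = c_out * k ** 3  # one k x k x k filter per channel, no channel mixing
    return pointwise + depthwise


def macs(params, t, h, w):
    """Multiply-accumulates over a t x h x w output volume (stride 1, same padding)."""
    return params * t * h * w


if __name__ == "__main__":
    c_in = c_out = 64          # hypothetical channel width
    t, h, w = 8, 56, 56        # hypothetical feature-map size
    standard = conv3d_params(c_in, c_out)
    separated = channel_separated_params(c_in, c_out)
    print("standard 3D conv weights:   ", standard)
    print("channel-separated weights:  ", separated)
    print("reduction factor:           ", round(standard / separated, 2))
    print("standard MACs:              ", macs(standard, t, h, w))
    print("channel-separated MACs:     ", macs(separated, t, h, w))
```

Under these assumptions the saving grows with the channel width, because the standard layer scales with the product of channel count and kernel volume, while the separated pair scales with their sum.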