Abstract:
To exploit the complementarity between C3D and a 2D optical-flow network while reducing the computational cost of optical flow, this paper proposes a video classification algorithm based on an end-to-end fused spatiotemporal two-stream convolutional network, which combines the advantages of C3D and a self-learning, end-to-end optical-flow convolutional network. For the spatial stream, a C3D-based ResNeXt-101 network is used for video classification. The other branch is an end-to-end temporal-stream network, in which optical flow is learned in real time by the TVNet network and the stacked optical-flow data are then classified by a BN-Inception network. Finally, the classification results of the temporal and spatial streams are weighted and combined to form the final decision. Experiments on the UCF-101 and HMDB-51 datasets achieve accuracies of 94.6% and 70.4%, respectively. The results show that the proposed two-stream method not only solves the problem of optical-flow self-learning and improves the operating efficiency of the network, but also effectively improves video classification performance.
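The final decision step described above is a weighted late fusion of the per-class scores from the two streams. A minimal sketch of that fusion, with made-up scores and assumed weights (the paper's actual fusion weights are not stated here):

```python
import numpy as np

# Hypothetical per-class softmax scores for one video clip.
# The values and the 0.6/0.4 weights are illustrative assumptions,
# not numbers taken from the paper.
spatial_scores = np.array([0.10, 0.70, 0.20])   # C3D-based ResNeXt-101 stream
temporal_scores = np.array([0.30, 0.40, 0.30])  # TVNet + BN-Inception stream

def fuse(spatial, temporal, w_spatial=0.6, w_temporal=0.4):
    """Weighted combination of per-class scores from the two streams."""
    return w_spatial * spatial + w_temporal * temporal

fused = fuse(spatial_scores, temporal_scores)
predicted_class = int(np.argmax(fused))  # index of the winning class
```

Because both score vectors sum to 1 and the weights sum to 1, the fused vector is still a valid probability distribution.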