TVBN-ResNeXt: End-to-End Fusion of Space-Time Two-Stream Convolution Network for Video Classification

  • Abstract: To exploit the complementarity between the spatial-domain C3D network and the 2D optical-flow network, and to reduce the cost of computing and storing optical flow, this paper proposes a video classification algorithm based on end-to-end fusion of a spatiotemporal two-stream convolutional network (TVNet + BN-Inception and ResNeXt-101, abbreviated TVBN-ResNeXt), which combines the advantages of C3D and a self-learning, end-to-end optical-flow convolutional network. For the spatial stream, a C3D-based ResNeXt-101 residual network first performs spatial-domain video classification. The other branch is an end-to-end temporal-stream network: the TVNet network learns optical flow in real time, and a BN-Inception network then classifies the video from the stacked optical-flow features. Finally, the classification results of the two streams are weighted and combined to form the final decision. Experiments on the UCF-101 and HMDB-51 datasets achieve accuracies of 94.6% and 70.4%, respectively. The results show that the proposed TVBN-ResNeXt complementary two-stream fusion method not only solves the optical-flow self-learning problem and improves the network's runtime efficiency, but also effectively improves video classification performance.
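The two steps the abstract describes most concretely are (a) feeding the 2D BN-Inception stream with stacked TVNet flow fields and (b) the weighted late fusion of the two streams' scores. The NumPy sketch below illustrates both under stated assumptions; the class count (101, as in UCF-101), the flow-stack length L, the fusion weight w_spatial, and the helper names softmax and fuse_two_streams are all illustrative choices, since the abstract does not report these settings.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 101  # UCF-101; HMDB-51 would use 51

# (a) Stacking flow for the temporal stream: L flow fields of shape
# (2, H, W) (x/y displacement per pixel) are channel-stacked into one
# (2*L, H, W) input for the 2D BN-Inception network. L = 10 is a common
# two-stream choice and an assumption here.
L, H, W = 10, 224, 224
flows = rng.normal(size=(L, 2, H, W)).astype(np.float32)  # stand-in for TVNet output
stacked_flow = flows.reshape(L * 2, H, W)                 # 20-channel network input

# (b) Weighted late fusion of the two streams' per-class scores.
def softmax(x):
    """Numerically stable softmax over the class axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_two_streams(spatial_logits, temporal_logits, w_spatial=0.5):
    """Weighted average of the two streams' class posteriors.

    spatial_logits:  scores from the ResNeXt-101 3D-conv spatial stream.
    temporal_logits: scores from the BN-Inception stream fed stacked flow.
    w_spatial:       spatial-stream fusion weight (0.5 is an assumed value).
    """
    fused = (w_spatial * softmax(spatial_logits)
             + (1.0 - w_spatial) * softmax(temporal_logits))
    return fused.argmax(axis=-1)  # predicted class index per clip

# Example: fuse random scores for a batch of 4 clips.
spatial = rng.normal(size=(4, NUM_CLASSES))
temporal = rng.normal(size=(4, NUM_CLASSES))
print(fuse_two_streams(spatial, temporal))
```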

     
