UAV Multi-Target Tracking Algorithm Jointly Optimized by YOLOv5 and Deep-SORT
-
Abstract: To address tracking failures caused by poor small-target detection performance, large target scale variation, and complex background interference on unmanned aerial vehicle (UAV) platforms, this paper proposes a UAV multi-target tracking algorithm that jointly optimizes the YOLOv5 (You Only Look Once) detector and the Deep-SORT (Simple Online and Realtime Tracking with a Deep Association Metric) tracker. The algorithm rebuilds the detector's feature extraction module with an improved CSPDarknet53 (Cross Stage Partial Darknet53) backbone network and adds a small-target detection layer through a top-down and bottom-up bidirectional fusion network; the optimized detection network is then trained on a UAV aerial photography dataset, alleviating the poor detection performance on small targets. In the tracking module, a residual network combined with a spatiotemporal attention module is proposed as the appearance feature extraction network, strengthening the network's ability to perceive subtle appearance features and resist interference; finally, a triplet loss function is adopted to improve the network's ability to distinguish intra-class differences. Experimental results show that the mean average precision of the optimized detector improves by 11% over the original YOLOv5, and that on the UAVDT (The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking) dataset the tracking accuracy and precision improve by 13.288% and 3.968% respectively over the original tracking algorithm, effectively reducing the frequency of target identity switches.
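The triplet loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the margin value and the use of Euclidean distance between embeddings are assumptions for illustration only.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss: pull embeddings of the same identity
    (anchor, positive) together and push the embedding of a different
    identity (negative) at least `margin` farther from the anchor.
    The margin value 0.3 is an assumed hyperparameter."""
    d_ap = np.linalg.norm(anchor - positive)  # same-identity distance
    d_an = np.linalg.norm(anchor - negative)  # different-identity distance
    return max(d_ap - d_an + margin, 0.0)

# A well-separated triple incurs zero loss:
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity, close to anchor
n = np.array([5.0, 0.0])   # different identity, far from anchor
print(triplet_loss(a, p, n))  # 0.0, since 0.1 - 5.0 + 0.3 < 0
```

During training, minimizing this loss over many (anchor, positive, negative) triples forces intra-class distances to stay smaller than inter-class distances by the margin, which is what the abstract means by distinguishing intra-class differences.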
-
Keywords:
- deep learning
- target tracking
- target detection
- unmanned aerial vehicle
-
Table 1 Performance comparison results before and after YOLOv5 improvement

| Network model | Car AP(%) | Bus AP(%) | Truck AP(%) | mAP(%) | mAP@.5:.95(%) |
|---|---|---|---|---|---|
| YOLOv5 | 64.5 | 42.1 | 14.3 | 21.0 | 10.3 |
| YOLOv5_1 | 72.9 | 53.1 | 35.1 | 32.0 | 17.1 |

Table 2 Tracking results of the algorithm on the UAVDT dataset

| Tracking model | MOTA(%) | MOTP(%) | FN (frames) | FP (frames) | IDs |
|---|---|---|---|---|---|
| YOLOv5+Deep-SORT | 23.237 | 71.332 | 102840 | 157750 | 1096 |
| YOLOv5_1+Deep-SORT | 15.692 | 70.894 | 87247 | 199230 | 934 |
| YOLOv5+Deep-SORT1 | 31.217 | 71.541 | 120900 | 112720 | 872 |
| Ours | 36.525 | 75.300 | 75750 | 140640 | 819 |
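For reference, the MOTA column in Table 2 follows the CLEAR MOT definition: one minus the combined rate of misses (FN), false positives (FP), and identity switches (IDs) over the total number of ground-truth objects. A minimal sketch follows; the ground-truth object count `num_gt` is not reported in the table, so the example value below is purely hypothetical.

```python
def mota(fn, fp, ids, num_gt):
    """CLEAR MOT accuracy: 1 minus the combined error rate
    (misses + false positives + identity switches) over all
    ground-truth objects in the sequence."""
    return 1.0 - (fn + fp + ids) / num_gt

# Hypothetical example (num_gt is assumed, NOT taken from the paper):
score = mota(fn=102840, fp=157750, ids=1096, num_gt=340000)
print(score)
```

Note that MOTA can be negative when the total error count exceeds the number of ground-truth objects, which is why detector quality (fewer FN/FP) directly bounds the achievable tracking accuracy.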