3D Point Cloud Target-Tracking Algorithm Fusing Motion Prediction

张远; 刘昭娣; 杨大林; 王伯伦; 王彦平

doi:10.16798/j.issn.1003-0530.2024.03.010

Abstract: With the development of artificial intelligence and deepening research into target-tracking theory, target tracking has become widely used in practice and is an essential technology for many vision applications. Research on target tracking in two-dimensional images has already achieved fruitful results, but target tracking based on three-dimensional point clouds is still at the research and development stage. In recent years, thanks to the widespread use of lidar and the application of deep learning to 3D point clouds, lidar point-cloud target tracking has also made some progress. Existing lidar point-cloud target-tracking algorithms fall into two main categories: traditional filtering algorithms and deep-learning algorithms. Although tracking algorithms based on traditional filtering can perform well, it is difficult to assign them optimal parameters, and they fail easily when a scene changes dramatically. Most deep-learning-based lidar point-cloud tracking algorithms use a "detection-tracking" architecture. The biggest problem with this architecture is that the back-end tracking task depends heavily on the front-end detection results: when the front-end detector fails, the back-end tracking module cannot track, which causes many target losses. To address these problems, this paper adopts a deep-learning architecture centered on motion prediction. The architecture combines detection with motion prediction and is divided into two stages. In the first stage, the target is detected through point-cloud feature extraction, segmented from the point cloud, and localized in consecutive frames. In the second stage, a motion-prediction update branch refines the target box to obtain a more accurate target position. Experimental results show that the method is effective. Compared with traditional filtering methods, it copes better with dramatic scene changes; compared with the "detection-tracking" architecture used in deep-learning methods, it reduces target losses. For lidar point-cloud target tracking, it yields more accurate and robust tracking results.
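The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not the authors' network: the function names (`crop_points`, `predict_motion`, `track_step`), the axis-aligned box, the search margin, and the use of a centroid shift in place of a learned motion-prediction branch are all assumptions made purely for illustration. It only shows how a box can be carried forward by estimated inter-frame motion rather than by a fresh detection in every frame.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def crop_points(points: List[Point], center: Point, size: Point,
                margin: float) -> List[Point]:
    """Stage 1 stand-in (assumption): 'segment' the target by keeping the
    points inside an axis-aligned box of `size` around `center`, enlarged
    by `margin` on every side."""
    half = [s / 2.0 + margin for s in size]
    return [p for p in points
            if all(abs(p[i] - center[i]) <= half[i] for i in range(3))]


def centroid(points: List[Point]) -> Point:
    """Mean position of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def predict_motion(prev_target: List[Point],
                   curr_target: List[Point]) -> Point:
    """Stage 2 stand-in (assumption): estimate inter-frame translation as
    the centroid shift. A learned branch would replace this function."""
    if not prev_target or not curr_target:
        return (0.0, 0.0, 0.0)  # no evidence: keep the previous position
    a, b = centroid(prev_target), centroid(curr_target)
    return tuple(b[i] - a[i] for i in range(3))


def track_step(prev_points: List[Point], curr_points: List[Point],
               prev_center: Point, size: Point,
               margin: float = 1.0) -> Point:
    """Move the previous box center by the estimated motion."""
    prev_target = crop_points(prev_points, prev_center, size, 0.0)
    curr_target = crop_points(curr_points, prev_center, size, margin)
    motion = predict_motion(prev_target, curr_target)
    return tuple(prev_center[i] + motion[i] for i in range(3))


if __name__ == "__main__":
    # Four target points in frame t-1, translated by (0.5, 0.25, 0) in frame t.
    prev = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0),
            (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
    curr = [(x + 0.5, y + 0.25, z) for x, y, z in prev]
    print(track_step(prev, curr, (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)))
```

When the current frame contains no points in the search region, this sketch falls back to the previous box position instead of dropping the target, which loosely mirrors the motivation given above for not depending entirely on a per-frame detector.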
