Abstract:
In visual object tracking, the appearance of the target is usually modeled by a bounding box containing the target, which inevitably introduces background interference. As the scene changes, the tracker's focus becomes blurred and ambiguous, which leads to tracking drift. To address these problems, a Siamese network with multi-attention maps for visual object tracking is proposed. First, a Siamese network that focuses on the foreground feature representation of the target is established. A gradient attention loss function is constructed to guide network training and improve the network's ability to distinguish the target from the interfering background. In addition, embedded channel attention and spatial attention further strengthen the feature representation of the target and automatically discover discriminative features. Extensive experiments on benchmark datasets demonstrate that the proposed tracker performs favorably and achieves real-time visual object tracking.
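As a rough illustration of what channel attention and spatial attention do to a feature map, the following minimal NumPy sketch may help. It is not the network proposed here: the squeeze-and-excitation-style channel gate, the parameter-free spatial gate, the weight shapes, and the names `channel_attention`, `spatial_attention`, `w1`, and `w2` are all assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map.

    Assumed form: global average pooling followed by a two-layer
    bottleneck (w1: (C // r, C), w2: (C, C // r)) and a sigmoid gate.
    """
    squeezed = feat.mean(axis=(1, 2))        # (C,) per-channel descriptor
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck
    gate = sigmoid(w2 @ hidden)              # (C,) channel weights in (0, 1)
    return feat * gate[:, None, None]


def spatial_attention(feat):
    """Reweight spatial positions of a (C, H, W) feature map.

    Assumed simplification: the gate is the sigmoid of the channel-wise
    mean map, with no learned parameters.
    """
    saliency = feat.mean(axis=0)             # (H, W) mean response per position
    gate = sigmoid(saliency)                 # (H, W) spatial weights in (0, 1)
    return feat * gate[None, :, :]


# Hypothetical usage on random features (8 channels, 6x6 spatial grid).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 6))
w1 = rng.standard_normal((2, 8)) * 0.1       # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((8, 2)) * 0.1
out = spatial_attention(channel_attention(feat, w1, w2))
print(out.shape)  # (8, 6, 6)
```

Both gates lie in (0, 1), so in this sketch attention can only attenuate responses. In a trained network the gates would be learned so that channels and positions dominated by background are suppressed more strongly than those covering the target.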