A Multi-Feature Fusion Method for Pedestrian Image Sequence Attribute Recognition Combined with a Temporal Attention Mechanism
Abstract: Most pedestrian attribute recognition methods operate on a single image, which carries limited information, whereas an image sequence contains richer information as well as temporal features. Exploiting sequence information is therefore an important way to improve pedestrian attribute recognition performance. This paper proposes a multi-feature fusion network for pedestrian sequence attribute recognition combined with a temporal attention mechanism. In addition to the common spatial-temporal quadratic average pooling feature aggregation and spatial-temporal average-max pooling feature aggregation, the network includes a spatial-temporal attention-factor-weighted feature aggregation branch to further extract sequence features. In this branch, a full-channel spatial-temporal attention factor generation network based on 3D convolution is designed to better capture the spatial-temporal features of a sequence. Fusing the sequence features output by the three branches gives the network richer information. For training, a Tversky loss, which constrains the numbers of false positives (FP) and false negatives (FN), is added to the weighted cross-entropy loss to form the overall loss function, so that the network achieves a better trade-off between precision and recall. Experimental results show that the proposed network outperforms methods based on a single still image, as well as several other common feature aggregation and temporal modeling approaches, on every evaluation metric.
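The loss described in the abstract, a weighted cross-entropy term plus a Tversky term that penalizes FP and FN, can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: the function names, the per-attribute positive weights `pos_weights`, the soft (probability-based) TP/FP/FN counts, the Tversky coefficients `alpha` and `beta`, and the mixing factor `lam` are hypothetical and may differ from the paper's settings.

```python
import math


def weighted_bce(probs, labels, pos_weights, eps=1e-7):
    """Mean weighted binary cross-entropy over attributes.

    pos_weights[j] scales the positive term of attribute j.
    This weighting scheme is an assumption, not the paper's.
    """
    total = 0.0
    for p, y, w in zip(probs, labels, pos_weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def tversky_loss(probs, labels, alpha=0.5, beta=0.5, eps=1e-7):
    """1 - Tversky index, computed from soft TP/FP/FN counts.

    alpha weights false positives and beta weights false negatives,
    so raising beta pushes the model toward higher recall.
    """
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def total_loss(probs, labels, pos_weights, lam=1.0, alpha=0.5, beta=0.5):
    """Weighted cross-entropy plus lam times the Tversky loss (assumed form)."""
    return (weighted_bce(probs, labels, pos_weights)
            + lam * tversky_loss(probs, labels, alpha, beta))


if __name__ == "__main__":
    # Predicted probabilities and ground truth for four binary attributes.
    probs = [0.9, 0.2, 0.7, 0.1]
    labels = [1, 0, 1, 1]
    pos_weights = [1.0, 1.0, 1.0, 2.0]  # up-weight a rare attribute
    print(total_loss(probs, labels, pos_weights, beta=0.7))
```

With `alpha = beta = 0.5` the Tversky index reduces to the Dice coefficient; choosing `beta > alpha` penalizes missed attributes more heavily than false alarms, which is one way the precision/recall trade-off mentioned in the abstract could be tuned.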