Fine-Grained Image Classification Method Based on Multi-View Fusion

黄伟锋; 张甜; 常东良; 闫冬; 王嘉希; 王丹; 马占宇

doi:10.16798/j.issn.1003-0530.2020.09.027

Multi-View Comprehensive Based Fine-Grained Image Classification

Abstract

Fine-grained image classification aims to distinguish sub-classes within a common category. Because such datasets typically exhibit large intra-class variance and high inter-class similarity, the task is more challenging than conventional image classification. Previous part-based and attention-based methods concentrate on mining discriminative regions in images but overlook the subtle differences that separate easily confused categories. To address this problem, this paper proposes a fine-grained image classification method based on multi-view fusion. The model has two branches: one mines local features of the image from feature maps, and the other learns global features of the image. An embedding loss is also introduced and combined with the conventional multi-class cross-entropy (softmax) loss to make the features more discriminative and thereby improve classification performance. Using only image-level labels, the proposed method achieves classification accuracies of 88.3%, 94.3%, and 92.4% on the CUB-200-2011, Stanford Cars, and FGVC Aircraft benchmark datasets, respectively. The experimental results indicate that the proposed method has certain advantages over other fine-grained image classification methods.
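The abstract states only that a local branch and a global branch are combined and that an embedding loss is added to the cross-entropy loss; it does not give the formulas. The following is a minimal sketch of how such a training objective could be assembled. Everything specific in it is an assumption for illustration: the pairwise contrastive form of the embedding loss, the averaging of the two branches' logits, the `margin` and `weight` parameters, and all function names are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Multi-class cross-entropy (softmax loss) for a single sample."""
    return -math.log(softmax(logits)[label])


def contrastive_embedding_loss(emb_a, emb_b, same_class, margin=1.0):
    """ASSUMED stand-in for the paper's embedding loss (pairwise contrastive form).

    Pulls same-class embeddings together and pushes different-class
    embeddings at least `margin` apart.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


def fuse_logits(local_logits, global_logits):
    """ASSUMED fusion rule: average the local-branch and global-branch logits."""
    return [(l + g) / 2.0 for l, g in zip(local_logits, global_logits)]


def total_loss(local_logits, global_logits, label,
               emb_a, emb_b, same_class, weight=0.5):
    """Cross-entropy on the fused prediction plus a weighted embedding loss.

    `weight` is a hypothetical balancing hyperparameter.
    """
    fused = fuse_logits(local_logits, global_logits)
    ce = cross_entropy(fused, label)
    emb = contrastive_embedding_loss(emb_a, emb_b, same_class)
    return ce + weight * emb
```

In this sketch the embedding term acts only on feature vectors, while the classification term acts on the fused prediction, so the two can be weighted independently. The actual loss and fusion used by the authors may differ.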
