Abstract:
Fine-grained image classification focuses on discriminating between different sub-classes of a common category. Because of large intra-class variance and high inter-class similarity, the task is considerably more challenging than traditional image classification. Previous part-based and attention-based approaches focused only on mining discriminative regions in images, while ignoring the subtle differences needed to distinguish easily confused categories. This paper proposes a multi-view comprehensive model for fine-grained image classification. The model has two branches: one mines local features of the image from feature maps, and the other learns global features of the image. A combination of embedding loss and softmax loss is introduced to make the features more discriminative, thereby improving the classification performance of the model. The proposed method uses only image-level labels and reaches classification accuracies of 88.3%, 94.3%, and 92.4% on the CUB-200-2011, Stanford Cars, and FGVC Aircraft benchmarks, respectively. Experimental results indicate that the proposed method offers advantages for fine-grained image classification.
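
As an illustration of how an embedding loss and a softmax loss might be combined, the following is a minimal sketch and not the formulation used in this paper. The abstract does not specify the form of the embedding loss, so a triplet margin loss is assumed here as a stand-in; the function names and the `weight` and `margin` parameters are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_embedding_loss(anchor, positive, negative, margin=1.0):
    """Assumed embedding loss: a hinge on the gap between the
    anchor-positive and anchor-negative distances."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def combined_loss(logits, label, anchor, positive, negative,
                  weight=1.0, margin=1.0):
    """Softmax loss plus a weighted embedding loss (weight is hypothetical)."""
    cls = softmax_cross_entropy(logits, label)
    emb = triplet_embedding_loss(anchor, positive, negative, margin)
    return cls + weight * emb
```

In this sketch the classification term pushes the correct class logit up, while the embedding term penalizes a sample whose features lie closer to a different sub-class than to its own, which is one way the features of confusing categories could be pulled apart.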