Research on a Classification Method Based on Inaccurate Image Dataset Cleaning
Abstract: Training a neural network classification model requires a large, accurately labeled image dataset, but image datasets used in practice often contain many mislabeled images, which hinders the training of an accurate classifier. Producing an accurately labeled dataset, however, costs considerable time and labor. This paper therefore proposes a classification framework based on cleaning inaccurately labeled image data. Experiments on natural images of cats and dogs show that a classification model with a cleaning stage achieves higher classification accuracy and a lower loss value. A study of the relationship between the proportion of mislabeled images in the dataset and classification accuracy shows that deeper neural networks are somewhat robust to mislabeled images. When the proportion of label-noise images is high, however, adding the cleaning stage allows a shallower neural network classifier to reach classification performance comparable to that of a deeper one, while the shallower network also runs faster. This paper offers a new approach to building fast and accurate classification models.