A Novel Method for Facial Beauty Prediction Based on Self-Supervised Learning

Abstract: Facial beauty prediction is a frontier research topic that studies how to enable computers to judge the beauty of a human face. With the continuous progress of deep learning, it has achieved promising results; however, deep-learning-based facial beauty prediction requires large amounts of training data and expensive facial beauty annotations, so how to obtain good results with few labeled samples remains to be studied in depth. Self-supervised learning can exploit unlabeled data in an upstream task to learn good features, thereby reducing the dependence on labeled data in downstream tasks. To this end, this paper applies self-supervised learning to facial beauty prediction through two upstream steps: intra-batch object recognition and multi-view feature clustering.

Unlabeled face images are easy to obtain, for example by extracting frames from videos or crawling images from the web, and learning universal face representations from such low-cost unlabeled images can improve facial beauty prediction when labeled samples are scarce. Through upstream tasks, self-supervised learning extracts rich general features from a large number of unlabeled images; transferring these features to the downstream task improves its feature representation ability.

The proposed method consists of two stages: unlabeled pretraining and labeled weight fine-tuning. The pretraining stage comprises two steps. Intra-batch object recognition learns the differences between samples (negative pairs): each unlabeled face image in a batch is assigned a distinct one-hot pseudo-label, so that the network learns to distinguish every image in the batch. Multi-view feature clustering learns the similarity within a single sample (positive pairs): a face image is augmented several times, the augmented views are passed through an encoder to obtain facial attribute features, and a self-supervised constraint pulls the features of the differently augmented views together. The facial attribute weights learned during pretraining are then transferred to the downstream facial beauty prediction task and fine-tuned on labeled facial beauty data to obtain the final prediction model.

Experiments were conducted on the Large-Scale Asian Facial Beauty Database (LSAFBD) and the SCUT-FBP5500 database. Under few-sample conditions, the proposed method outperforms a supervised baseline built on ResNet18 and achieves higher accuracy than conventional self-supervised methods: with only 1/32 of the original training set, accuracy improves by 14.7% on LSAFBD and 6.35% on SCUT-FBP5500 compared with supervised learning, and gains over traditional self-supervised learning are also observed when 1/2, 1/4, 1/8, 1/16, and 1/32 of the original training set are used. A linear evaluation experiment further confirms the effectiveness of the learned representations. On both databases, the proposed method makes full use of the available data samples, reduces the model's dependence on labeled data to a certain extent, and improves prediction accuracy; the approach can also be extended to related tasks such as object detection and image classification.
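To make the intra-batch object recognition step concrete, the following PyTorch-style code is a minimal sketch that treats every image in a batch as its own class. It is an illustration under assumptions, not the authors' released code: the encoder, the temperature value, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def intra_batch_recognition_loss(features: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Instance discrimination within one batch (a sketch).

    features: (batch_size, dim) encoder outputs for one batch.
    Each sample's pseudo-label is simply its index in the batch, so the
    pairwise-similarity matrix is classified row by row with cross-entropy,
    pushing the features of different samples apart (negative pairs).
    """
    features = F.normalize(features, dim=1)          # unit-length embeddings
    logits = features @ features.t() / temperature   # (batch, batch) cosine similarities
    pseudo_labels = torch.arange(features.size(0), device=features.device)
    return F.cross_entropy(logits, pseudo_labels)
```

In this sketch, row i's positive logit is the self-similarity on the diagonal; the multi-view clustering step below supplies non-trivial positives from augmented views of the same image.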

     
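The multi-view feature clustering step can be sketched in the same style. Here `augment` stands in for a stochastic augmentation pipeline (random crop, flip, color jitter, ...) and `encoder` for the shared backbone; both are assumptions, and the cosine-distance-to-centroid objective is one plausible instantiation of the paper's self-supervised aggregation constraint, not necessarily the exact one used.

```python
import torch
import torch.nn.functional as F

def multi_view_clustering_loss(encoder, images, augment,
                               n_views: int = 2) -> torch.Tensor:
    """Pull features of different augmented views of one image together.

    images: (batch_size, C, H, W) batch of face images.
    Each image is augmented n_views times and encoded; the views of one
    image are drawn toward their mean feature (positive pairs).
    """
    views = [F.normalize(encoder(augment(images)), dim=1) for _ in range(n_views)]
    centroid = F.normalize(torch.stack(views).mean(dim=0), dim=1)  # (batch, dim)
    # Cosine distance between each view and its image's centroid.
    loss = sum((1.0 - (v * centroid).sum(dim=1)).mean() for v in views)
    return loss / n_views
```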

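For the downstream stage, the pretrained encoder weights are loaded into a classifier and fine-tuned on labeled data. Below is a minimal sketch with torchvision's ResNet18, the backbone of the paper's supervised baseline; the checkpoint path and the number of beauty classes are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_BEAUTY_CLASSES = 5  # assumption: depends on the database's label scheme

# ResNet18 backbone; the classification head is re-initialized for beauty levels.
model = resnet18(num_classes=NUM_BEAUTY_CLASSES)
state = torch.load("ssl_pretrained_encoder.pt")  # hypothetical pretraining checkpoint
model.load_state_dict(state, strict=False)       # strict=False: head weights stay random

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def fine_tune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a labeled facial beauty batch."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

For the linear evaluation mentioned in the abstract, the backbone would instead be frozen (gradients disabled for everything except the final layer) so that accuracy reflects the quality of the pretrained features alone.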
