Dual-input Dual-task Attention Network Incorporating Noisy Label Correction Mechanism for Facial Beauty Prediction

Abstract: Facial beauty prediction is a frontier research topic that aims to give computers a human-like ability to judge facial beauty. Current approaches suffer from insufficient supervision information and from models that are easily affected by noisy labels. The Multi-Task Attention Network (MTAN) uses multiple label types from a single database for supervised training, but it performs poorly when multi-task training draws on several databases that each carry only one label type, and it does not account for the effect of noisy labels. A noisy label correction mechanism corrects noisy labels by comparing the maximum predicted probability with the predicted probability of the labeled class. Building on MTAN, this paper therefore proposes the Dual-Input Dual-Task Attention Network (DIDTAN) and incorporates a noisy label correction mechanism into it. DIDTAN uses the supervision information of two single-label-type facial beauty databases simultaneously, which addresses the lack of supervision information, while the correction mechanism addresses the influence of noisy labels; together they improve the accuracy of facial beauty prediction. Specifically, DIDTAN extends the task-shared Batch Normalization (BN) layers of MTAN into task-specific BN layers, introduces a Neural Discriminative Dimensionality Reduction (NDDR) module to constrain shallow feature representations, uses the Deep CORrelation Alignment (Deep CORAL) loss function to constrain the feature representations of the fully connected layers, and corrects noisy labels through the noisy label correction mechanism. Experiments were conducted on the Large Scale Facial Beauty Database (LSFBD), the SCUFBP-5500 database, and the CelebA database. Dual-input dual-task facial beauty prediction based on LSFBD and SCUFBP-5500 achieved a prediction accuracy of 65.4%, higher than the best accuracy of conventional methods. The proposed method enables dual-input dual-task training and addresses the influence of noisy labels, improves the accuracy of facial beauty prediction, and can be widely applied to other dual-input dual-task scenarios in which noisy labels are present.
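The abstract states only that the correction mechanism compares the maximum predicted probability with the predicted probability of the labeled class. A minimal sketch of that comparison follows. The function name `correct_noisy_labels`, the use of a probability gap, and the `threshold` value are illustrative assumptions, not the paper's exact rule.

```python
def correct_noisy_labels(probs, labels, threshold=0.5):
    """Relabel samples whose given label looks inconsistent with the model.

    probs:     list of per-class probability lists, one per sample
    labels:    list of integer class labels (possibly noisy)
    threshold: assumed hyperparameter; a label is replaced only when the
               top class beats the labeled class by more than this gap
    """
    corrected = []
    for p, y in zip(probs, labels):
        # Class with the maximum predicted probability.
        best = max(range(len(p)), key=p.__getitem__)
        # Gap between the maximum probability and the labeled class's probability.
        gap = p[best] - p[y]
        # Assumed rule: trust the prediction only when the gap is large.
        corrected.append(best if gap > threshold else y)
    return corrected


probs = [
    [0.05, 0.90, 0.05],  # confident in class 1, but labeled 0
    [0.40, 0.35, 0.25],  # slight preference for class 0, labeled 1
    [0.10, 0.20, 0.70],  # prediction agrees with label 2
]
labels = [0, 1, 2]
new_labels = correct_noisy_labels(probs, labels)
```

In this sketch only the first sample is relabeled, because its gap (0.85) exceeds the assumed threshold. The second sample keeps its label since the model's preference is weak.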

     
