Feature pyramid fusion representation network for cross-modal hashing

Abstract: With the explosive growth of multi-modal data, cross-modal retrieval, the most common way to search such data, has attracted increasing attention. However, most existing deep learning methods use only the output of the final fully connected layer as the modality-specific high-level semantic representation, ignoring the semantic correlations among features of different scales extracted at multiple levels, which limits their performance. This paper proposes a cross-modal hashing retrieval method based on a feature pyramid fusion representation network. The network extracts and fuses features at multiple levels and scales, mining the semantic correlations among multi-scale modality features at different levels and fully exploiting modality-specific features, so that the semantic representation produced by the network is more discriminative. Finally, a triple loss function consisting of an inter-modal loss, an intra-modal loss, and a Hamming-space loss is designed to train the model. Experimental results show that the proposed method achieves good cross-modal retrieval performance on both the MIRFLICKR-25K and NUS-WIDE datasets.
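Since the abstract only sketches the architecture, the following is a minimal PyTorch illustration of the core idea, not the authors' implementation: feature maps taken from several backbone stages are projected to a common width, pooled, and fused into one relaxed hash code. The class name `PyramidFusionHashHead`, the channel widths, and the lateral-conv + global-pooling + concatenation fusion scheme are all assumptions made for illustration.

```python
# Hedged sketch (not the paper's released code): fuse multi-level,
# multi-scale backbone features into a single K-bit hash representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusionHashHead(nn.Module):
    """Fuses feature maps from several backbone stages into relaxed hash codes."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), embed_dim=256, n_bits=64):
        super().__init__()
        # Lateral 1x1 convs project each pyramid level to a common channel width.
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels
        )
        # Map the concatenated multi-level descriptor to hash logits.
        self.hash_fc = nn.Linear(embed_dim * len(in_channels), n_bits)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps, from shallow to deep stages.
        pooled = []
        for lateral, f in zip(self.laterals, feats):
            x = lateral(f)                       # unify the channel dimension
            x = F.adaptive_avg_pool2d(x, 1)      # global context at this scale
            pooled.append(x.flatten(1))          # (B, embed_dim)
        fused = torch.cat(pooled, dim=1)         # multi-scale fusion by concat
        return torch.tanh(self.hash_fc(fused))   # relaxed codes in (-1, 1)
```

At retrieval time the relaxed codes would be binarized, e.g. with `torch.sign`, before computing Hamming distances.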

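The abstract names the three loss terms but gives no formulas; the sketch below shows one plausible instantiation, using the pairwise negative log-likelihood similarity loss and a sign-quantization penalty that are common in deep cross-modal hashing. The function names, the weights `alpha`, `beta`, `gamma`, and the exact form of each term are hypothetical.

```python
# Hedged sketch: one plausible form of the triple loss (inter-modal,
# intra-modal, Hamming-space) named in the abstract, for illustration only.
import torch
import torch.nn.functional as F

def pairwise_nll(u, v, sim):
    """Pairwise likelihood loss: <u_i, v_j>/2 should be large when sim_ij = 1."""
    theta = u @ v.t() / 2                          # (B, B) inner products
    # log(1 + e^theta) - sim * theta, the standard pairwise NLL
    return (F.softplus(theta) - sim * theta).mean()

def triple_loss(img_codes, txt_codes, sim, alpha=1.0, beta=1.0, gamma=0.1):
    # Inter-modal term: preserve image-text similarity across modalities.
    inter = pairwise_nll(img_codes, txt_codes, sim)
    # Intra-modal term: preserve similarity within each modality.
    intra = (pairwise_nll(img_codes, img_codes, sim)
             + pairwise_nll(txt_codes, txt_codes, sim))
    # Hamming-space (quantization) term: push relaxed codes toward {-1, +1}
    # so real-valued distances agree with Hamming distances after binarization.
    binary = torch.sign(img_codes.detach() + txt_codes.detach())
    quant = ((img_codes - binary) ** 2).mean() + ((txt_codes - binary) ** 2).mean()
    return alpha * inter + beta * intra + gamma * quant
```

Here `sim` is the (B, B) binary semantic-similarity matrix built from the label annotations, and `img_codes` / `txt_codes` are the relaxed codes output by the two modality networks.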
     
