Singing voice detection algorithm based on a scaled dot-product attention embedded convolutional neural network

  • Abstract: Conventional singing voice detection typically relies on complicated feature engineering, whereas algorithms built on a unified deep neural network framework can learn features directly and thus dispense with it. However, these learned features are usually not differentiated by importance and carry equal weight in the network. To address this problem, an algorithm that embeds a scaled dot-product self-attention module in a convolutional neural network is proposed. The algorithm learns an attention distribution over the features and adjusts the attention weights so that the convolutional neurons can tell the more important features from the less important ones when “observing” them, which improves the overall performance of the network. Experiments on two public datasets, with comparison against a baseline model, demonstrate that the algorithm is effective in improving singing voice detection.
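The abstract does not spell out how the attention module is wired into the network, so the following is only a minimal sketch of the standard scaled dot-product self-attention operation, softmax(QKᵀ/√d)V, that the title refers to. It assumes Q = K = V = the input features with no learned projections, and it treats each feature map as a flattened vector; the function names and the toy input are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_self_attention(features):
    """Compute softmax(Q K^T / sqrt(d)) V with Q = K = V = features.

    `features` is a list of n vectors, each of length d (for example,
    n flattened feature maps). Returns the re-weighted features and the
    n x n attention matrix.
    Assumption: no learned query/key/value projections are applied.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    outputs, attention = [], []
    for query in features:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        attention.append(weights)
        # Each output is the attention-weighted sum of the value vectors.
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, features))
            for j in range(d)
        ])
    return outputs, attention


# Hypothetical toy input: three 4-dimensional feature vectors.
toy_features = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
]
reweighted, attn = scaled_dot_product_self_attention(toy_features)
```

Each row of `attn` is a probability distribution over the input features, which is one way to read the "attention distribution" mentioned in the abstract. In a trained network the queries, keys, and values would normally come from learned projections rather than the raw features used here.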

     
