Time-Frequency Perception Neural Network for Speech Bandwidth Extension

    Abstract: To further improve deep-learning-based speech bandwidth extension, this paper proposes an encoder-decoder neural network. The encoder extracts deep features from the input, and the decoder reconstructs the wideband speech. A locality-sensitive hashing self-attention layer placed between the encoder and the decoder strengthens the model's ability to select effective deep features. Temporal convolutional networks are used inside the encoder and the decoder, which improves the model's ability to learn contextual dependencies in speech time series. To guide training toward a more accurate solution, a time-frequency perception loss function is also proposed, which helps the model find the optimal mapping from narrowband to wideband speech in the time, frequency, and perceptual domains. Subjective and objective experimental results show that the proposed method outperforms both traditional methods and recent deep-neural-network-based methods for speech bandwidth extension.
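The abstract does not give the form of the time-frequency perception loss, so the following is only a minimal sketch of the general idea of combining a time-domain term with a frequency-domain term. Everything in it is an assumption made for illustration: the function names `stft_magnitude` and `time_frequency_loss`, the weights `alpha` and `beta`, the frame settings, and the choice of mean squared error plus log-magnitude spectral error. The perceptual-domain term is omitted because the abstract gives no detail about it. A naive DFT is used to keep the example dependency-free; a practical implementation would use an FFT library.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectra of Hann-windowed frames (naive DFT, illustration only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectra.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ])
    return spectra


def time_frequency_loss(estimate, target, alpha=1.0, beta=1.0, eps=1e-8):
    """Hypothetical loss: alpha * waveform MSE + beta * log-spectral MSE."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    est_mag = stft_magnitude(estimate)
    tgt_mag = stft_magnitude(target)
    if not tgt_mag:
        raise ValueError("signals are shorter than one analysis frame")

    # Time-domain term: mean squared error between the waveforms.
    time_term = sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)

    # Frequency-domain term: mean squared error between log magnitudes.
    log_errors = [
        (math.log(e + eps) - math.log(t + eps)) ** 2
        for est_frame, tgt_frame in zip(est_mag, tgt_mag)
        for e, t in zip(est_frame, tgt_frame)
    ]
    freq_term = sum(log_errors) / len(log_errors)

    return alpha * time_term + beta * freq_term


# Toy usage: the "wideband" target has a low and a high tone,
# while the "narrowband-like" estimate is missing the high tone.
sample_rate = 16000
num_samples = 256
target = [math.sin(2 * math.pi * 1000 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 6000 * i / sample_rate)
          for i in range(num_samples)]
estimate = [math.sin(2 * math.pi * 1000 * i / sample_rate)
            for i in range(num_samples)]
loss = time_frequency_loss(estimate, target)
```

In this toy case the missing high-frequency tone is penalized by both terms: the waveforms differ sample by sample, and the log-magnitude spectra differ in the upper bins. The log scale is one common way to keep low-energy high-band content from being swamped by the louder low band, but it is a design choice of this sketch and not something stated in the abstract.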

     
