Research Review on Low Bit Rate Speech Coding Technology Based on Neural Networks

    Abstract: Speech coding plays an important role in wireless and network voice transmission, and reducing the coding bit rate while maintaining or even improving the quality of the coded speech has long been a core goal of codec designers. However, traditional speech codecs built on signal processing perform poorly at low bit rates: the quality, intelligibility, and effective bandwidth of the decoded speech all drop markedly compared with higher-rate coding, which significantly degrades the listening experience. With continued progress in artificial intelligence, deep neural network models are increasingly widely used in speech processing tasks, where they generally outperform traditional methods. In speech coding, many recent studies have examined how to integrate neural network modules into codecs to transmit speech more efficiently, aiming to reach at low bit rates a level of performance that traditional schemes cannot, and to offer a new solution for wireless and network voice transmission. This paper comprehensively reviews and categorizes recent neural-network-based low bit rate speech codec algorithms. It describes in detail the development history, working principles, characteristics, and evaluation metrics of two families: hybrid codecs, which combine traditional methods with neural networks, and end-to-end codecs, in which the encoder and decoder are trained jointly. It also summarizes the strengths and weaknesses of these methods. Some studies have shown that neural-network-based codecs can improve both objective metrics and subjective listening quality. Finally, the future of low bit rate speech codecs is discussed in light of the current state of the various codec types. Neural-network-based low bit rate speech coding is expected to address poor call quality when transmission bandwidth is limited in practical calls, to provide strong support for the further development of real-time voice communication, and to offer new ideas for future research on compression coding.

     
