Research Review on Low Bit Rate Speech Coding Technology Based on Neural Networks
-
Abstract
Speech coding technology has broad potential applications in wireless and network speech transmission, so designers aim to improve the quality of coded speech as the bit rate is reduced. However, speech codecs built on conventional signal processing perform poorly at low bit rates. Compared with high-bit-rate codec algorithms, low-bit-rate codec algorithms produce decoded speech with considerably lower quality, bandwidth, and intelligibility, which significantly degrades the user's listening experience. With continued progress in artificial intelligence, deep neural network models are now widely applied to speech processing tasks, where they have proven more effective than traditional methods. In the field of speech codecs, studies have focused on how to apply neural network modules to codecs to achieve efficient speech transmission. This approach can outperform traditional schemes at low bit rates and offers a new solution for wireless voice transmission. This study comprehensively reviews recent low-bit-rate speech codec algorithms based on neural networks. It introduces in detail the development history, working principles, evaluation metrics, and characteristics of two families: hybrid codecs, which combine traditional methods with neural networks, and end-to-end codecs, whose encoder and decoder are trained jointly. Some studies have shown that codecs based on neural networks can improve both objective metrics and the subjective listening quality perceived by users. Finally, future directions for low-bit-rate speech codecs are discussed in light of the current state of each codec type.
The author expects low-bit-rate speech coding based on neural networks to ease bandwidth limitations in practical calls, to provide strong support for the further development of real-time communication, and to suggest new directions for future research.
-
-