A Unified Speech and Audio Coding Method Using Empirical Mode Decomposition

  • Abstract: Existing single-model coders cannot deliver high-quality coding of both speech and audio signals at low and medium bit rates. To address this, this paper proposes a unified speech and audio coding method that exploits the harmonic structure of the input signal. First, Empirical Mode Decomposition (EMD) is used to extract the harmonic components of the input signal. Second, the harmonic components are parameterized and quantized using a perceptual matching pursuit algorithm combined with sinusoidal parametric modeling. Third, the residual left after harmonic quantization is coded with dithered lattice vector quantization to improve the subjective quality of the reconstructed audio, yielding a wideband unified speech and audio coder operating at 24 kbps and 32 kbps. Finally, objective PESQ/PEAQ evaluations and subjective A/B listening tests compare the proposed coder with the ITU-T G.722.1 and G.722.2 codecs; the results show that it outperforms both reference codecs on speech and audio signals.
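To make the second and third stages described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain (non-perceptual) greedy matching pursuit over a grid of real sinusoids, and it uses the scaled integer lattice with subtractive dither as the simplest possible stand-in for the dithered lattice vector quantizer; the paper's actual lattice, perceptual weighting, and EMD front end are not reproduced here. All function names and parameters (`sinusoid_matching_pursuit`, `n_grid`, `step`, `seed`) are hypothetical.

```python
import numpy as np


def sinusoid_matching_pursuit(x, n_atoms, n_grid=500):
    """Greedy matching pursuit over a grid of real sinusoids (assumed dictionary).

    Returns a list of (frequency in cycles/sample, amplitude, phase) tuples
    and the residual left after subtracting the selected sinusoids.
    """
    n = len(x)
    t = np.arange(n)
    # Candidate frequencies strictly between 0 and 0.5 cycles/sample.
    freqs = np.arange(1, n_grid) / (2.0 * n_grid)
    atoms = np.exp(2j * np.pi * np.outer(freqs, t))
    residual = np.asarray(x, dtype=float).copy()
    params = []
    for _ in range(n_atoms):
        # Select the frequency whose complex exponential correlates best.
        corr = atoms.conj() @ residual
        k = int(np.argmax(np.abs(corr)))
        f = freqs[k]
        # Least-squares fit of a cos/sin pair at that frequency.
        basis = np.column_stack((np.cos(2 * np.pi * f * t),
                                 np.sin(2 * np.pi * f * t)))
        (a, b), *_ = np.linalg.lstsq(basis, residual, rcond=None)
        residual = residual - basis @ np.array([a, b])
        # a*cos + b*sin == A*cos(2*pi*f*t + phi), A = hypot(a, b), phi = atan2(-b, a)
        params.append((f, float(np.hypot(a, b)), float(np.arctan2(-b, a))))
    return params, residual


def dithered_lattice_quantize(r, step, seed):
    """Encoder side: add shared pseudo-random dither, round to the lattice step * Z^n."""
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-step / 2, step / 2, size=np.shape(r))
    return np.round((np.asarray(r) + dither) / step).astype(int)


def dithered_lattice_dequantize(indices, step, seed):
    """Decoder side: regenerate the same dither from the seed and subtract it."""
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-step / 2, step / 2, size=np.shape(indices))
    return np.asarray(indices) * step - dither
```

In this sketch the encoder would transmit the sinusoid parameters together with the lattice indices of the residual, and the decoder would resynthesize the sinusoids and add the dequantized residual. With subtractive dither, the per-sample reconstruction error of the residual is bounded by half the step size and is statistically independent of the signal, which is the usual motivation for dithering.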

     
