Jointly Denoising and Dereverberation with Maximum Likelihood Beamformer Under Complex Generalized Gaussian Distribution
-
Abstract: This paper proposes a multichannel joint denoising and dereverberation beamformer based on a complex super-Gaussian distribution. Modelling the speech signal with a complex super-Gaussian distribution, we derive, for the first time, an analytical expression for the joint denoising and dereverberation beamformer under the maximum likelihood criterion, and we show that this expression generalizes many existing joint denoising and dereverberation beamformers. We also show theoretically that the proposed beamformer outperforms the conventional cascade of the weighted prediction error (WPE) algorithm and a minimum power distortionless response (MPDR) beamformer. Simulations and real-world experiments both show that the proposed beamformer clearly outperforms existing joint denoising and dereverberation algorithms on several objective measures.
-
Keywords:
- speech enhancement
- denoising and dereverberation
- beamformer
-
Table 1 Computational complexity analysis of three algorithms

| Method | Computational complexity | $M=6,\ L_w=10,\ b=4,\ I=5$ |
| --- | --- | --- |
| WPE+MPDR | $O(M^3 (L_w-b+1)^3 I) + O(M^3)$ | $O(370656)$ |
| WPD | $O((M(L_w-b+1)+M)^3 I)$ | $O(552960)$ |
| CGG-WPD | $O((M(L_w-b+1)+M)^3 I)$ | $O(552960)$ |
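The numeric column of Table 1 can be reproduced by plugging the listed parameter values into the two complexity expressions. The following is a minimal sketch of that arithmetic only. The function names are hypothetical, and because this excerpt does not define the symbols, reading `M` as the number of microphones, `Lw` as the prediction filter length, `b` as the prediction delay, and `I` as the number of iterations is an assumption.

```python
def wpe_mpdr_ops(M, Lw, b, I):
    """Operation count for the WPE+MPDR cascade in Table 1:
    M^3 (Lw - b + 1)^3 I + M^3."""
    taps = Lw - b + 1  # assumed: number of prediction taps per channel
    return M**3 * taps**3 * I + M**3


def wpd_ops(M, Lw, b, I):
    """Operation count for WPD and CGG-WPD in Table 1:
    (M (Lw - b + 1) + M)^3 I."""
    taps = Lw - b + 1
    return (M * taps + M) ** 3 * I


# Parameter values taken from the header of Table 1.
M, Lw, b, I = 6, 10, 4, 5
cascade = wpe_mpdr_ops(M, Lw, b, I)
joint = wpd_ops(M, Lw, b, I)
print(cascade)  # 370656
print(joint)    # 552960
print(f"joint / cascade = {joint / cascade:.2f}")
```

With these settings the joint formulations cost roughly 1.5 times as much as the cascade, since a single matrix of size M(Lw-b+1)+M is handled instead of two smaller ones.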
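Table 2 Results for the small room of the REVERB Challenge (SINR = 0 dB and 10 dB)

| Method | PESQ (0 dB) | ESTOI (0 dB) | SDR/dB (0 dB) | SRMR/dB (0 dB) | PESQ (10 dB) | ESTOI (10 dB) | SDR/dB (10 dB) | SRMR/dB (10 dB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NOISY | 1.35 | 0.38 | -0.92 | 2.30 | 1.98 | 0.60 | 6.49 | 5.29 |
| WPE+MPDR | 1.37 | 0.39 | -1.27 | 2.50 | 2.17 | 0.65 | 8.00 | 6.20 |
| WPD | 2.09 | 0.56 | 4.68 | 5.32 | 2.91 | 0.84 | 12.12 | 9.11 |
| CGG-WPD | 2.26 | 0.62 | 6.17 | 6.61 | 3.04 | 0.85 | 12.30 | 9.35 |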
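Table 3 Results for the medium room of the REVERB Challenge (SINR = 0 dB and 10 dB)

| Method | PESQ (0 dB) | ESTOI (0 dB) | SDR/dB (0 dB) | SRMR/dB (0 dB) | PESQ (10 dB) | ESTOI (10 dB) | SDR/dB (10 dB) | SRMR/dB (10 dB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NOISY | 1.10 | 0.31 | -2.81 | 1.84 | 1.63 | 0.48 | 2.32 | 3.69 |
| WPE+MPDR | 1.30 | 0.40 | -0.62 | 2.65 | 2.09 | 0.66 | 7.59 | 6.37 |
| WPD | 1.91 | 0.64 | 5.72 | 6.31 | 2.70 | 0.84 | 10.72 | 9.22 |
| CGG-WPD | 2.01 | 0.66 | 6.09 | 6.85 | 2.80 | 0.85 | 10.96 | 9.47 |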
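Table 4 Results for the large room of the REVERB Challenge (SINR = 0 dB and 10 dB)

| Method | PESQ (0 dB) | ESTOI (0 dB) | SDR/dB (0 dB) | SRMR/dB (0 dB) | PESQ (10 dB) | ESTOI (10 dB) | SDR/dB (10 dB) | SRMR/dB (10 dB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NOISY | 1.12 | 0.28 | -3.46 | 1.67 | 1.57 | 0.42 | 1.23 | 3.13 |
| WPE+MPDR | 1.39 | 0.40 | -0.38 | 2.82 | 2.10 | 0.63 | 7.15 | 6.43 |
| WPD | 1.75 | 0.51 | -1.27 | 5.60 | 2.55 | 0.77 | 9.87 | 8.77 |
| CGG-WPD | 2.02 | 0.62 | 5.92 | 6.64 | 2.61 | 0.78 | 9.97 | 8.98 |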
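Table 5 Results on CHiME-3

| Method | PESQ (BUS) | ESTOI (BUS) | PESQ (CAF) | ESTOI (CAF) | PESQ (PED) | ESTOI (PED) | PESQ (STR) | ESTOI (STR) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NOISY | 2.40 | 0.73 | 1.90 | 0.56 | 1.81 | 0.54 | 2.03 | 0.67 |
| WPE+MPDR | 2.77 | 0.80 | 2.35 | 0.70 | 2.20 | 0.65 | 2.51 | 0.76 |
| WPD | 2.92 | 0.87 | 2.48 | 0.78 | 2.33 | 0.74 | 2.73 | 0.85 |
| CGG-WPD | 3.02 | 0.88 | 2.54 | 0.80 | 2.39 | 0.76 | 2.83 | 0.86 |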
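To put the Table 5 comparison into a single figure, the per-environment scores can be averaged. The sketch below simply copies the WPD and CGG-WPD rows of Table 5 and computes the mean gain of CGG-WPD over WPD. The dictionary layout and the helper name `mean_gain` are illustrative choices, not part of the paper.

```python
# PESQ and ESTOI per environment (BUS, CAF, PED, STR), copied from Table 5.
table5 = {
    "WPD": {
        "PESQ": [2.92, 2.48, 2.33, 2.73],
        "ESTOI": [0.87, 0.78, 0.74, 0.85],
    },
    "CGG-WPD": {
        "PESQ": [3.02, 2.54, 2.39, 2.83],
        "ESTOI": [0.88, 0.80, 0.76, 0.86],
    },
}


def mean_gain(metric, method="CGG-WPD", baseline="WPD"):
    """Average per-environment difference (method minus baseline) for one metric."""
    diffs = [m - b for m, b in zip(table5[method][metric], table5[baseline][metric])]
    return sum(diffs) / len(diffs)


print(f"mean PESQ gain:  {mean_gain('PESQ'):.3f}")
print(f"mean ESTOI gain: {mean_gain('ESTOI'):.3f}")
```

Averaged over the four environments, the gain over WPD is about 0.08 PESQ points and 0.015 ESTOI points, so on this data set the improvement is consistent but modest.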