Multi-source Localization Based on Graph Signal Processing

吴晓欢; 李嘉宁

doi:10.12466/xhcl.2024.10.005

Abstract: Direction of arrival (DOA) estimation is an important tool in speech enhancement and acoustic detection, and it is essential to applications such as voice-interactive robots, video conferencing, hearing aids, and sonar. Recently proposed DOA estimation methods, such as those based on graph signal processing (GSP), have shown excellent angle estimation performance and are promising for sound source DOA estimation. However, when multiple sources are present, the orthogonal complement matrix of the eigenvectors of the received signal cannot be obtained directly from the adjacency matrix, so the GSP algorithm fails in multi-source scenarios. To address this problem, this paper separates multiple sources by detecting single-source zones of multi-source wideband speech signals in the frequency domain, and then localizes the wideband sources using GSP and a clustering algorithm. Specifically, the GSP method is first extended to the frequency domain. Next, the short-time Fourier transform is used to divide the signal into time-frequency zones; the zones dominated by a single source are selected, and frequency-domain GSP single-source localization is applied to each of them. Finally, all localization results are clustered, and the final angle estimates are obtained by weighted averaging. We construct source signals from the LibriSpeech speech corpus for multi-source localization simulations. The simulation results show that the proposed method outperforms the other algorithms compared, keeping the error within 3° at relatively high signal-to-noise ratios. In addition, we apply the proposed algorithm to several sets of real recordings made with a circular six-microphone array. The results show that the proposed algorithm achieves smaller localization errors and can still resolve sources that are relatively close to each other.
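The final stage described in the abstract (clustering the per-zone DOA estimates, then weighted averaging within each cluster) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract names neither the clustering algorithm nor the weights, so this sketch swaps in a plain k-means on the unit circle with a deterministic farthest-point initialisation, and the function name `cluster_doa_estimates` and the `weights` argument (for example, a per-zone energy or confidence score) are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def cluster_doa_estimates(angles_deg, weights, n_sources, n_iter=50):
    """Cluster per-zone DOA estimates and return one weighted mean per cluster.

    Illustrative assumption: k-means on unit-circle points, so that angles
    near 0 and 360 degrees are treated as neighbours.
    Returns the estimated directions in degrees, sorted ascending.
    """
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    w = np.asarray(weights, dtype=float)
    # Embed each angle as a point on the unit circle.
    pts = np.column_stack([np.cos(angles), np.sin(angles)])

    # Deterministic farthest-point initialisation (an assumption, not from the paper).
    centers = [pts[np.argmax(w)]]
    for _ in range(1, n_sources):
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in centers], axis=0)
        centers.append(pts[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign every estimate to its nearest cluster centre.
        dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = centers.copy()
        for k in range(n_sources):
            mask = labels == k
            if mask.any():
                # Weighted circular mean: weighted sum of unit vectors, renormalised.
                m = (w[mask, None] * pts[mask]).sum(axis=0)
                norm = np.linalg.norm(m)
                if norm > 0:
                    new_centers[k] = m / norm
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    est = np.rad2deg(np.arctan2(centers[:, 1], centers[:, 0])) % 360.0
    return np.sort(est)
```

Called as `cluster_doa_estimates(angles, weights, n_sources=2)`, the function returns one angle per assumed source. The unit-circle embedding is a design choice that avoids the wrap-around error a plain arithmetic mean would make for estimates near 0°/360°; the number of sources is taken as known here, which the abstract does not state.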

Multi-source Localization Based on Graph Signal Processing