Time Difference of Arrival Estimator Based on Generative Adversarial Networks
Abstract
The time difference of arrival (TDOA) is a key acoustic spatial feature that is widely used in multichannel audio signal processing. Traditional TDOA estimators, such as the generalized cross-correlation with phase transform (GCC-PHAT) method, perform well under ideal acoustic conditions, but their accuracy deteriorates significantly at low signal-to-noise ratios (SNRs) and under strong reverberation. Recent advances in deep learning have led to data-driven TDOA estimators that achieve high estimation accuracy, yet their robustness remains limited under severe noise and strong reverberation. To address these limitations, this paper proposes a generative adversarial network (GAN)-based TDOA estimator. This study is the first to propose a GAN-based TDOA estimation framework, in which adversarial training between the generator and the discriminator significantly improves model generalization and robustness in low-SNR and highly reverberant environments. The generator employs gated recurrent units (GRUs) to expand the dimensionality of the raw audio signals and extracts GCC-PHAT-based cross-correlation features to enhance the model’s sensitivity to time-delay information. The discriminator, based on a convolutional neural network, uses multilayer convolutional structures to extract high-dimensional features, which are then fused with either the ground-truth or the predicted TDOA values to produce confidence scores. The generator is optimized with a joint loss function that combines cross-entropy and adversarial losses, while the discriminator is trained to better distinguish ground-truth TDOA values from generated estimates. The design incorporates principles from Wasserstein GANs (WGANs) by integrating the discriminator’s output confidence scores into the generator’s loss function.
This approach not only substantially stabilizes model training but also effectively resolves mode collapse, allowing the model to exceed the performance limits of conventional training schemes that rely on a single loss function. To validate the proposed method, we conducted experiments on public datasets, comparing the proposed framework with the classical GCC-PHAT method and with state-of-the-art deep learning-based TDOA estimators. The experimental results demonstrate that our method achieves superior performance in acoustic environments characterized by low SNRs and strong reverberation, and that it statistically outperforms all baseline methods in terms of TDOA estimation accuracy.
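For readers unfamiliar with the classical baseline, the following is a minimal sketch of a GCC-PHAT delay estimator in NumPy. It illustrates the standard textbook technique only and is not the feature extractor or baseline implementation used in this work. The function name `gcc_phat`, its parameters, the small regularization constant, and the synthetic test signal are all assumptions introduced for illustration.

```python
import numpy as np


def gcc_phat(sig, ref, fs=1.0, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` in seconds (illustrative sketch)."""
    # Zero-pad so the circular correlation computed via the FFT behaves like a linear one.
    n = sig.shape[0] + ref.shape[0]
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep the phase, discard the magnitude.
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # small constant (assumed) avoids division by zero
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs), cc


# Synthetic example (assumed values): white noise delayed by 5 samples at 16 kHz.
fs = 16000
true_delay = 5
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(true_delay), ref[:-true_delay]))

tau, cc = gcc_phat(sig, ref, fs=fs, max_tau=0.001)
print("estimated delay in samples:", tau * fs)
```

The PHAT weighting whitens the cross-power spectrum so that the correlation peak is sharp for broadband signals. That same whitening amplifies noise-dominated frequency bins, which is one reason the method degrades at low SNR and under strong reverberation.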