ZHANG Sheng, YANG Jianming. A Multichannel Multitalker Speech Separation Method for Ad-hoc Microphones[J]. JOURNAL OF SIGNAL PROCESSING, 2021, 37(5): 757-762. DOI: 10.16798/j.issn.1003-0530.2021.05.008

A Multichannel Multitalker Speech Separation Method for Ad-hoc Microphones

  • For ad-hoc microphone arrays, a key challenge is how to make the best use of multichannel audio data to improve performance on multi-talker speech separation tasks. This paper introduces a new multichannel speech separation method, the Squeeze-Excitation-Spinal (SES) module, which explicitly learns latent channel-wise relationships and adaptively updates the weights of each channel's features without prior knowledge of the microphone positions, so that the gains in speech separation come at minimal cost. The SES module obtains a representation of global inter-channel dependency by squeezing the multichannel feature information into the channel dimension, and uses activation functions in a bottleneck unit to screen out valuable features based on this representation. The bottleneck unit consists of spinal modules that generate global information and redistribute weights through step-by-step input. On a simulated multichannel LibriSpeech corpus, the method achieves significant improvements over the single-channel baseline in the SDR and SI-SDR evaluation metrics, with results comparable to the state-of-the-art (SOTA) ad-hoc microphone multichannel approach.
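The squeeze-excitation-spinal flow described above can be sketched in NumPy. This is an illustrative toy, not the paper's implementation: the layer sizes, the `ses_reweight` function name, and the use of random untrained weights are all assumptions made for demonstration. It shows the three stages the abstract names: squeezing per-channel features to a channel descriptor, passing that descriptor segment by segment through a spinal-style bottleneck, and emitting one sigmoid gate per channel to reweight the input.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ses_reweight(features, n_seg=2, hidden=4):
    """Toy SES-style channel reweighting (hypothetical layer sizes).

    features: array of shape (C, T, F) -- per-channel time-frequency features.
    Returns (reweighted features, per-channel gates in (0, 1)).
    """
    C = features.shape[0]
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = features.mean(axis=(1, 2))                        # shape (C,)
    # Spinal bottleneck: the descriptor is fed segment by segment; each
    # sub-layer sees its segment concatenated with the previous output.
    segs = np.array_split(z, n_seg)
    h = np.zeros(hidden)
    outs = []
    for seg in segs:
        W = rng.standard_normal((hidden, len(seg) + hidden)) * 0.1
        h = np.tanh(W @ np.concatenate([seg, h]))
        outs.append(h)
    # Excitation: map the concatenated sub-layer outputs to one gate
    # per channel; the sigmoid screens out low-value channels.
    W_out = rng.standard_normal((C, hidden * n_seg)) * 0.1
    gates = sigmoid(W_out @ np.concatenate(outs))         # shape (C,)
    return features * gates[:, None, None], gates

# Toy multichannel input: 4 microphones, 10 frames, 8 frequency bins.
x = rng.standard_normal((4, 10, 8))
y, gates = ses_reweight(x)
```

With trained weights, channels carrying cleaner speech would receive gates near 1 and noisy or redundant channels near 0; here the random weights only demonstrate the data flow and shapes.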
