Auxiliary Detection Method of Parkinson’s Disease Based on Multi-Source Speech Information Fusion
Abstract: In the early stages of Parkinson’s disease, reduced flexibility and coordination of the vocal organs cause symptoms such as difficulty in pronunciation and unstable articulation. To analyze subjects’ speech ability, experts have designed multiple types of speech corpora based on these physiological phenomena, including sustained vowels, repeated syllables, and situational dialogues. Most existing studies on speech-based Parkinson’s disease detection rely on a single corpus type. Such data can assess the coordination of some of a subject’s vocal organs, but it cannot comprehensively reflect the subject’s vocal condition, and it is susceptible to factors such as the recording environment and individual differences. To address these problems, this paper proposes a multi-source speech information fusion model for the auxiliary detection of Parkinson’s disease. The model aims to make full use of the multi-source speech data obtained from multiple corpus types, extract rich and comprehensive pathological information, and resist the influence of non-pathological factors. The proposed model consists of an encoder module, a decoder module, and a classifier module. The encoder module uses multiple branches to learn the specific information in each single-source speech input. A multi-source information fusion branch based on a multi-head attention mechanism enables finer-grained information interaction and learns the information common to all sources, so that the pathological information carried by the multi-source data is extracted comprehensively. The decoder module helps the encoder module compress information and remove redundancy. The classifier module performs Parkinson’s disease detection from the encoder output and also helps the encoder module learn a compact representation of pathological information. To further ensure that both specific and common information are extracted, the model imposes an orthogonality constraint between them. Multiple comparative experiments were conducted on a self-collected dataset containing 340 speech samples. The results show that, compared with models based on single-source speech data, the proposed model improves accuracy, sensitivity, and F1 score for Parkinson’s disease detection by 6%, 3%, and 6%, respectively. In addition, by effectively integrating common and specific information, the proposed model improves accuracy by more than 2.8% compared with other information fusion models.
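The abstract does not state the exact form of the orthogonality constraint between specific and common information. As a minimal sketch, assuming one widely used formulation (the squared Frobenius norm of the cross-product between the two feature matrices), the penalty could look like the following. The function names, the feature shapes, and the use of NumPy are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def orthogonality_penalty(specific: np.ndarray, common: np.ndarray) -> float:
    """Squared Frobenius norm of specific^T @ common (assumed formulation).

    Both inputs have shape (batch, dim). The penalty is zero when every
    specific feature column is orthogonal to every common feature column.
    """
    cross = specific.T @ common  # shape: (dim_specific, dim_common)
    return float(np.sum(cross ** 2))


def total_orthogonality_loss(specific_list, common: np.ndarray) -> float:
    """Sum the penalty over all single-source branches (hypothetical helper)."""
    return sum(orthogonality_penalty(s, common) for s in specific_list)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical encoder outputs for three sources, e.g. sustained vowels,
    # repeated syllables, and situational dialogue: 8 samples, 16 dimensions.
    specific_feats = [rng.normal(size=(8, 16)) for _ in range(3)]
    common_feats = rng.normal(size=(8, 16))
    print(total_orthogonality_loss(specific_feats, common_feats))
```

In a training setup, a term of this kind would typically be added to the classification and reconstruction losses with a weighting coefficient, which pushes the specific and common branches toward encoding non-overlapping information.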