Phone Recognition Based on Speaking Rate Adjustment and Phonological Attribute Posterior Probabilities

A Speaking Rate Adaptation Technique and Phonological Attribute Posteriors for Phone Recognition

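  • Abstract: Automatic speech recognition based on speech event detection is currently an active research topic. To address the poor model adaptability caused by variation in speakers' speaking rates, a speaking-rate adaptation algorithm is proposed. Working on one utterance at a time, the algorithm normalizes each utterance with continuously varying frame lengths and frame shifts, so that the adjusted rate matches the average rate of the corpus and the influence of speaking rate on model training is reduced. In addition, the speaking rate of the test set is obtained by computing the angles between phonological attribute posterior probability vectors, which reduces the load on the system compared with rate detection methods that rely on trained models. The speaking-rate adjustment is applied to phonological attribute extraction, the phonological attribute features are then nonlinearly transformed, and hidden Markov models are finally used for modeling. Experiments show that after speaking-rate adjustment the average number of frames per phone is more stable and its dynamic range is reduced, raising the phone recognition rate by 1.3%.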
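The per-utterance normalization described above can be illustrated with a minimal Python sketch. It assumes that speaking rate is measured in phones per second and that the frame length and the frame shift are both multiplied by the ratio of the corpus average rate to the utterance rate. The function names, the 25 ms / 10 ms base values and this linear scaling rule are illustrative assumptions, not the authors' published formulas. Under this rule a fast utterance gets shorter, denser frames, so the number of frames per phone stays roughly the same across speakers.

```python
import numpy as np


def rate_adjusted_frame_params(utt_rate, corpus_rate, base_len=0.025, base_shift=0.010):
    """Return (frame_len, frame_shift) in seconds, scaled for one utterance.

    Hypothetical rule: both values are multiplied by corpus_rate / utt_rate,
    so fast speech (utt_rate > corpus_rate) gets shorter, denser frames.
    """
    if utt_rate <= 0 or corpus_rate <= 0:
        raise ValueError("speaking rates must be positive")
    alpha = corpus_rate / utt_rate  # continuous scale factor, not a discrete choice
    return base_len * alpha, base_shift * alpha


def frame_signal(x, sr, frame_len, frame_shift):
    """Cut a 1-D signal into overlapping frames of frame_len seconds every frame_shift seconds."""
    x = np.asarray(x, dtype=float)
    n_len = max(1, int(round(frame_len * sr)))      # frame length in samples
    n_shift = max(1, int(round(frame_shift * sr)))  # frame shift in samples
    if len(x) < n_len:                              # pad very short inputs to one frame
        x = np.pad(x, (0, n_len - len(x)))
    n_frames = 1 + (len(x) - n_len) // n_shift
    # Row i holds samples [i * n_shift, i * n_shift + n_len).
    idx = n_shift * np.arange(n_frames)[:, None] + np.arange(n_len)[None, :]
    return x[idx]


# Example: a 1 s utterance at 15 phones/s, corpus average 12 phones/s, 16 kHz audio.
frame_len, frame_shift = rate_adjusted_frame_params(15.0, 12.0)
frames = frame_signal(np.zeros(16000), 16000, frame_len, frame_shift)
print(frame_len, frame_shift, frames.shape)
```

In this example the scale factor is 12 / 15 = 0.8, so the shift shrinks from 10 ms to 8 ms. Because the phone duration (1 / rate) and the shift scale together, the expected number of frames per phone equals 1 / (base_shift × corpus_rate) for every utterance, which mirrors the "more stable frames per phone" effect reported in the abstract.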

     

    Abstract: Event detection-based methods have become an active research topic in automatic speech recognition (ASR). Differences in speaking rate can impair the adaptability of acoustic models. To address this, this paper proposes a novel adaptation algorithm that adjusts the frame length and frame shift at the front end of the system, one utterance at a time, so that after adaptation the speaking rate matches the average rate of the speech corpus and its effect on model training is reduced. In addition, the method estimates the speaking rate of the test set by computing the angles between posterior probability vectors, which eases the load on the system compared with rate detection based on trained models. The algorithm is applied as a pre-processing step before phonological feature detection; the detected features are then nonlinearly transformed and used as observations for a phone recognition system based on hidden Markov models (HMMs). After adaptation, the average number of frames per phone in an utterance becomes nearly constant and its dynamic range decreases, and the phone recognition rate increases by about 1.3%.
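The test-set rate estimate based on angles between posterior vectors might look like the following sketch. It assumes the input is an (n_frames, n_attributes) matrix of phonological attribute posteriors produced by an already-trained detector, and it uses the mean angle between consecutive frame vectors, divided by the frame shift, as a rate proxy: the faster the attribute posteriors change, the larger the angles. The function name `angular_rate`, the consecutive-frame pairing and the averaging are assumptions for illustration; the abstract does not specify which vectors are compared or how the angles are aggregated.

```python
import numpy as np


def angular_rate(posteriors, frame_shift=0.010, eps=1e-12):
    """Mean angle (radians per second) between consecutive posterior vectors.

    posteriors: array of shape (n_frames, n_attributes), one row per frame.
    Larger values mean the attribute posteriors change faster, i.e. faster speech.
    """
    P = np.asarray(posteriors, dtype=float)
    if P.ndim != 2 or P.shape[0] < 2:
        raise ValueError("expected a (n_frames, n_attributes) array with at least 2 frames")
    a, b = P[:-1], P[1:]                         # consecutive frame pairs
    dots = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = dots / np.maximum(norms, eps)          # guard against all-zero rows
    angles = np.arccos(np.clip(cos, -1.0, 1.0))  # clip rounding error outside [-1, 1]
    return float(angles.mean()) / frame_shift


# Toy posteriors over two attributes: a slow and a fast transition pattern.
slow = [[1, 0], [1, 0], [1, 0], [0, 1]]
fast = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(angular_rate(slow), angular_rate(fast))
```

Comparing this value for a test utterance with its average over the training corpus gives a relative speed that can drive the frame-length and frame-shift scaling. Only the posteriors the system already computes are reused, so no separate rate-detection model has to be trained, which is the saving the abstract describes.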

     
