A Music Cognition Based Music Melody Detection Approach
Abstract: Melody is the most important means of expressing the theme and motive of a piece of music, and melody analysis is therefore a central topic in Music Information Retrieval (MIR), where researchers have made great efforts to develop intelligent methods for melody estimation and analysis. Drawing on findings from neuroscience and cognitive psychology on how humans perceive music, this paper introduces the concept of auditory saliency (AS) and proposes a novel melody detection approach for polyphonic music that simulates the human music cognition process. In the preprocessing stage, the constant Q transform (CQT) is used to compute the spectrum and to build a statistical spectrum model of the music. The AS feature of each semitone sub-band is then calculated with Bayesian theory from the data distribution in that sub-band. A time-accumulation artificial neural network, which simulates the human neural system, detects the auditory changes at each time instant and yields the melody candidates. In the post-processing stage, the concept of a Melody Stream, a representation close to both music theory and music cognition, is introduced, and the melody candidates are regularized using human chord perception results as prior knowledge. The proposed method, and the similarity of its results to human perception, are evaluated on a small dataset of hundreds of music pieces covering a number of typical music styles. Experimental results show that the melody stream covers more than 75% of the traditional melody line and that its subjective acceptance, measured by listening scores, exceeds 90%.
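The semitone sub-bands used in the preprocessing stage follow from the standard definition of the constant Q transform: center frequencies are spaced geometrically, and each analysis window spans the same number of periods of its center frequency. The sketch below only illustrates that textbook definition and is not the authors' implementation. The function names, the Hann window, and the choice of 12 bins per octave are assumptions made for the example, and the statistical spectrum model, the AS feature, and the neural network are not covered.

```python
import cmath
import math


def semitone_center_frequencies(f_min, n_bins, bins_per_octave=12):
    """Geometrically spaced center frequencies (one per semitone by default)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def constant_q_magnitudes(signal, sample_rate, f_min, n_bins, bins_per_octave=12):
    """Naive constant-Q magnitudes for the frame starting at signal[0].

    Hypothetical helper for illustration only: it is O(N) per bin and
    analyzes a single frame.
    """
    # Q (center frequency / bandwidth) is identical for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    magnitudes = []
    for f_k in semitone_center_frequencies(f_min, n_bins, bins_per_octave):
        # The window covers about Q periods of f_k, so it shortens as f_k rises.
        n_k = math.ceil(q * sample_rate / f_k)
        if n_k > len(signal):
            raise ValueError("signal is too short for the lowest-frequency bin")
        acc = 0j
        for n in range(n_k):
            hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_k)
            acc += hann * signal[n] * cmath.exp(-2j * math.pi * f_k * n / sample_rate)
        magnitudes.append(abs(acc) / n_k)
    return magnitudes
```

Called on a pure 440 Hz tone with `f_min=220.0`, the largest magnitude falls in the bin one octave above `f_min`, which is the behavior a per-semitone salience feature would rely on.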