基于p-稳定分布局部敏感哈希地址的鲁棒音频检索方法

Robust audio retrieval method using Local-Sensitive Hashing addresses based on p-stable distribution

  • 摘要: 局部敏感哈希(Local-Sensitive Hashing, LSH)索引方法具有快速的优点,对数据规模具有子线性的时间复杂度。但是该方法对待检集合的选取要求苛刻,容易将带噪数据排除在待检集合之外,导致检索精度下降。针对这一缺点,本文从p-稳定分布理论出发,分析噪声对数据的局部敏感哈希地址的影响,并利用数据的哈希地址受噪声影响在原始地址附近偏移的特性,提出一种鲁棒的音频检索方法。该方法将LSH地址直接作为相似性判定的特征,并通过扩大检索范围来提高噪声鲁棒性。实验表明,所提方法在噪声鲁棒性方面优于LSH索引方法;进一步引入向量搜索算法优化后,其检索速度也可达到与LSH索引方法接近的水平。

     

    Abstract: The Local-Sensitive Hashing (LSH) which is originally proposed to solve approximate nearest neighbor problem is a well-known indexing method. It can deal with the curse of dimensionality and achieve query times that are sub-linear with respect to the data size. However, the strategy how to pick the potential searching collection made the LSH indexing method sensitive to the hash address of data, which means the robustness against noise of this method is poor. In order to avoid that drawback, this paper firstly analyzed the noise effect on the LSH addresses of data based on LSH and p-stable distribution theories, and then proposed a robust audio retrieval method. The main idea of the method is to consider the whole database as a potential searching collection to improve the robustness. At the same time, the LSH addresses are utilized to calculate the similarity so as to avoid the high dimensional vector comparisons between queries and the data in the potential searching collection. Experimental results showed that the robustness against noise of our scheme is superior to that of the LSH indexing method and the retrieval speed can achieve the same level of the LSH indexing method when it combined with vector search algorithms.

     

/

返回文章
返回