Abstract:
The Local-Sensitive Hashing (LSH) which is originally proposed to solve approximate nearest neighbor problem is a well-known indexing method. It can deal with the curse of dimensionality and achieve query times that are sub-linear with respect to the data size. However, the strategy how to pick the potential searching collection made the LSH indexing method sensitive to the hash address of data, which means the robustness against noise of this method is poor. In order to avoid that drawback, this paper firstly analyzed the noise effect on the LSH addresses of data based on LSH and p-stable distribution theories, and then proposed a robust audio retrieval method. The main idea of the method is to consider the whole database as a potential searching collection to improve the robustness. At the same time, the LSH addresses are utilized to calculate the similarity so as to avoid the high dimensional vector comparisons between queries and the data in the potential searching collection. Experimental results showed that the robustness against noise of our scheme is superior to that of the LSH indexing method and the retrieval speed can achieve the same level of the LSH indexing method when it combined with vector search algorithms.