Fast Multi-label Learning Based on Hashing

  • Abstract: Many current multi-label learning algorithms are time-consuming and struggle with large-scale data. To address this, we propose a fast multi-label learning algorithm based on hashing (HFMLL), which combines hashing with multi-label learning. HFMLL uses Locality Sensitive Hashing (LSH) to quickly retrieve the neighboring instances of each instance, and uses min-wise independent permutations locality sensitive hashing (MinHash) to estimate the similarity between labels and so quickly find the labels related to each label. The maximum a posteriori (MAP) principle is then applied to predict the label set of an unseen instance from the statistical information gathered over the related labels of its neighboring instances. Experiments show that HFMLL is significantly faster than state-of-the-art multi-label learning methods while achieving comparable classification performance, and that it can be widely applied to large-scale data sets.
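The label-correlation step can be illustrated with a minimal MinHash sketch. This is not the paper's implementation: it assumes each label is represented by the set of training-instance ids that carry it, so that label similarity is the Jaccard similarity of those sets. The function names, the toy labels, and the linear hash family `(a * x + b) mod prime` are all hypothetical choices made for illustration.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for the assumed hash family h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, hash_params):
    """Return, for each hash function, the minimum hash value over the set."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of agreeing signature slots."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


# Hypothetical toy data: label -> ids of the training instances carrying it.
label_to_instances = {
    "beach": {3, 17, 42, 58, 71, 96, 104, 133},
    "sea": {3, 17, 42, 58, 71, 96, 150, 187},
    "city": {5, 29, 64, 88},
}

params = make_hash_params(num_hashes=256)
signatures = {label: minhash_signature(ids, params)
              for label, ids in label_to_instances.items()}

beach_sea = estimate_jaccard(signatures["beach"], signatures["sea"])
beach_city = estimate_jaccard(signatures["beach"], signatures["city"])
print(f"beach/sea  ~ {beach_sea:.2f}")
print(f"beach/city ~ {beach_city:.2f}")
```

In this toy data the exact Jaccard similarity of "beach" and "sea" is 6/10, so the estimate should land roughly near 0.6, while the disjoint pair "beach" and "city" shares no minimum and scores 0. Comparing fixed-length signatures instead of full instance sets is what makes the related-label search cheap when there are many instances.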

     
