Clustering by Comparative Density Peaks Using Fuzzy Neighborhood

  • Abstract: Clustering is an important unsupervised learning method in machine learning, with wide application in image processing and biological gene classification. "Clustering by fast search and find of density peaks" (DPC) clusters data by searching for density peaks; it requires neither an iterative process nor many input parameters. However, DPC performs poorly on spherical datasets, tends to overlook potential cluster centers, and requires manual selection of the cluster centers. To address these problems, this paper computes data density from a fuzzy neighborhood relation and replaces the relative distance used in DPC with a comparative distance. In experiments on machine learning datasets, the proposed algorithm is compared with DBSCAN, OPTICS, and DPC in terms of accuracy and adjusted Rand index (ARI). The experimental results show that the proposed algorithm is feasible and effective.
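For orientation, the two quantities that baseline DPC assigns to each point can be sketched as follows. This is a minimal illustration of the original DPC definitions (cutoff-kernel local density and distance to the nearest denser point), not the fuzzy-neighborhood density or the comparative distance proposed in this paper, which the abstract does not define. The function name `dpc_quantities`, the cutoff `d_c`, and the toy `points` are assumptions made for the example.

```python
import math


def dpc_quantities(points, d_c):
    """Return (rho, delta) per point using the baseline DPC definitions.

    rho[i]   : number of other points closer than the cutoff d_c.
    delta[i] : distance to the nearest point with strictly higher rho;
               points with no denser neighbor get their largest distance.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density with a hard cutoff kernel (assumed variant of DPC).
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Toy data: two small, well-separated groups (illustrative only).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
rho, delta = dpc_quantities(points, d_c=0.12)

# Candidate centers have both large rho and large delta.
scores = [r * d for r, d in zip(rho, delta)]
centers = sorted(range(len(points)), key=lambda i: -scores[i])[:2]
```

In baseline DPC the user inspects a plot of `rho` against `delta` and picks the centers by hand, which is the manual step the abstract criticizes. Taking the top-scoring points, as done above, is only one simple stand-in for that choice.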

     
