Ensemble Method Combining Naive Bayes and Euclidean Distance for Classifying Binary Imbalanced Data

Abstract: With the development of data mining, ensemble methods have been widely applied to the classification of binary imbalanced data. However, the traditional ensemble rules, such as the Max rule, Min rule, Product rule, and Sum rule, no longer meet practical requirements for classification accuracy on such data. This paper therefore proposes an ensemble method for binary imbalanced data based on Naive Bayes and Euclidean distance. The method uses Naive Bayes as the base classifier, and its ensemble rule takes into account both the Euclidean distance between the test data and the training data and the relationship between the majority class and the minority class in the training data. In terms of spatial distance, this strengthens the link between the final classification result and the original training data. Experimental results show that, on binary imbalanced data, the proposed method achieves a significantly higher Area Under the ROC Curve (AUC) than existing ensemble methods and therefore offers better classification performance.
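For reference, the four traditional ensemble rules named in the abstract can be sketched as below. This is only a minimal illustration of the baseline rules, not the method proposed in the paper: the function names `combine` and `predict` and the example posteriors are hypothetical, and the distance-based rule itself is not reproduced because the abstract does not specify its form.

```python
from math import prod


def combine(posteriors, rule):
    """Fuse per-classifier posteriors into one normalised score per class.

    posteriors: one row per base classifier, each row being
                [P(majority | x), P(minority | x)] for a single test point.
    rule:       "max", "min", "product", or "sum".
    """
    fuse = {"max": max, "min": min, "product": prod, "sum": sum}[rule]
    # Apply the rule column-wise, giving one fused score per class.
    scores = [fuse(column) for column in zip(*posteriors)]
    total = sum(scores)
    # Normalise so the fused scores sum to 1.
    return [score / total for score in scores]


def predict(posteriors, rule):
    """Return the index of the class with the highest fused score."""
    scores = combine(posteriors, rule)
    return scores.index(max(scores))


# Hypothetical posteriors from three base classifiers (illustrative values only).
example = [[0.8, 0.2], [0.8, 0.2], [0.01, 0.99]]

for rule in ("max", "min", "product", "sum"):
    print(rule, predict(example, rule))
```

With these illustrative values the Sum rule selects class 0 while the other three rules select class 1, which shows how strongly the choice of combination rule can affect the decision for a single test point.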

     
