Ensemble Feature Selection Based on Constrained Niching Binary Particle Swarm Optimization for Omics Data Classification

Abstract: Because omics data are high-dimensional and have small sample sizes, classifiers trained on them often suffer from high error rates. To address this problem, this paper proposes an ensemble feature selection method for omics data classification based on constrained niching binary particle swarm optimization (PSO). The binary PSO searches for the feature subsets that yield the highest classification accuracy. A constraint on the number of set bits in each particle's encoding limits how many features are selected, and a niching technique from multimodal optimization enables the algorithm to obtain multiple, markedly different feature subsets in a single run. Finally, ensemble learning combines the base classifiers built on these feature subsets into a stronger classifier, which is then used to classify the omics data. Experimental results on real-world omics datasets show that the proposed method stably selects compact feature subsets and achieves good classification performance.
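The abstract combines four mechanisms: binary PSO search over feature masks, a cap on the number of set bits, niching to keep several distinct subsets, and an ensemble of base classifiers. The Python sketch below shows one way these pieces could fit together. It is not the authors' implementation, and several choices are assumptions. The velocity and sigmoid transfer follow the standard binary PSO formulation. The bit-count constraint is enforced by a hypothetical `repair` step that clears the lowest-velocity bits. Niching is approximated by a ring (lbest) neighbourhood, one known way to obtain niching behaviour in PSO; the abstract does not say which niching technique is used. Fitness is the training accuracy of a nearest-centroid classifier. The data come from a hypothetical `make_toy_data` generator. All function names are illustrative.

```python
import math
import random


def make_toy_data(rng, n_per_class=20, n_features=30, informative=(0, 1, 2, 3)):
    """Hypothetical toy data: few samples, many features, few informative ones."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 2.0 if label == 1 else -2.0
            X.append(row)
            y.append(label)
    return X, y


def centroids(X, y, feats):
    """Per-class mean vector over the selected features."""
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return cents


def predict(cents, x, feats):
    """Nearest-centroid base classifier (assumed stand-in for the paper's)."""
    def sq_dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(feats, c))
    return min(cents, key=lambda label: sq_dist(cents[label]))


def accuracy(X, y, bits):
    """Assumed fitness: training accuracy using the features whose bit is set."""
    feats = [j for j, b in enumerate(bits) if b]
    if not feats:
        return 0.0
    cents = centroids(X, y, feats)
    return sum(predict(cents, x, feats) == t for x, t in zip(X, y)) / len(y)


def repair(bits, vel, k_max):
    """Assumed constraint: at most k_max set bits (drop lowest velocity), at least one."""
    on = sorted((j for j, b in enumerate(bits) if b), key=lambda j: vel[j])
    for j in on[: max(0, len(on) - k_max)]:
        bits[j] = 0
    if not any(bits):
        bits[max(range(len(bits)), key=lambda j: vel[j])] = 1
    return bits


def niching_bpso(X, y, k_max=5, n_particles=12, iters=40, seed=0,
                 w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Binary PSO with a bit-count cap; a ring topology stands in for niching."""
    rng = random.Random(seed)
    d = len(X[0])
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_particles)]
    pos = [repair([int(rng.random() < 0.1) for _ in range(d)], v, k_max) for v in vel]
    pbest = [p[:] for p in pos]
    pfit = [accuracy(X, y, p) for p in pos]
    for _ in range(iters):
        for i in range(n_particles):
            # Ring (lbest) neighbourhood: particle i follows the best of i-1, i, i+1,
            # so different parts of the ring can settle on different subsets.
            nbrs = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            lb = pbest[max(nbrs, key=lambda k: pfit[k])]
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (lb[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))
                # Standard sigmoid transfer: P(bit j = 1) = 1 / (1 + exp(-v)).
                pos[i][j] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])))
            repair(pos[i], vel[i], k_max)
            fit = accuracy(X, y, pos[i])
            if fit > pfit[i]:
                pbest[i], pfit[i] = pos[i][:], fit
    return pbest, pfit


def ensemble_predict(X, y, subsets, x):
    """Majority vote of one nearest-centroid classifier per feature subset."""
    votes = []
    for bits in subsets:
        feats = [j for j, b in enumerate(bits) if b]
        votes.append(predict(centroids(X, y, feats), x, feats))
    return max(sorted(set(votes)), key=votes.count)


if __name__ == "__main__":
    X, y = make_toy_data(random.Random(42))
    pbest, pfit = niching_bpso(X, y)
    subsets = sorted({tuple(b) for b in pbest})  # distinct personal bests
    for bits in subsets:
        print([j for j, b in enumerate(bits) if b], accuracy(X, y, bits))
    print("ensemble vote for sample 0:", ensemble_predict(X, y, subsets, X[0]))
```

Running the script prints each distinct personal-best subset with its training accuracy, followed by the ensemble's vote for one sample. In a real study, both the fitness and the ensemble would need to be assessed with cross-validation or a held-out test set. Resubstitution accuracy, as used here, is optimistic on small-sample data.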

     
