FG-HFS: A feature filter and group evolution hybrid feature selection algorithm for high-dimensional gene expression data

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Abstract
Gene expression data are characterized by high dimensionality and small sample sizes, and they contain many genes unrelated to disease. Feature selection improves the efficiency of disease diagnosis by selecting a small number of important genes. However, existing algorithms do not consider the correlation between features, and their search procedures tend to become trapped in local optima. To address this, this paper proposes a feature filter and group evolution hybrid feature selection algorithm (FG-HFS) for high-dimensional gene expression data. Unlike existing algorithms, FG-HFS first uses spectral clustering to gather redundant features into groups. A redundant feature filter algorithm then filters each group according to the approximate Markov blanket principle, deleting the redundant features. The filtered features are then divided evenly by density according to a feature exponential strategy. Most importantly, a group evolution multi-objective genetic algorithm searches the filtered feature subsets and evaluates candidate feature subsets in-group and out-group, so as to select the subsets with the highest accuracy and the fewest features. Experimental results on 5 gene expression datasets show that the feature subsets (FSs) selected by FG-HFS achieve an average accuracy (ACC) of 92.76% and an average Matthews correlation coefficient (MCC) of 88.76%, significantly better than existing algorithms. The FSs and ACC/FSs indexes of FG-HFS are also better than those of existing algorithms, which further demonstrates the superiority of FG-HFS. Moreover, Wilcoxon and Friedman statistical tests show that the feature selection performance of FG-HFS is significantly better than that of existing algorithms in both pairwise and multiple comparisons.
Keywords
Gene expression data, Feature selection, Spectral clustering, Symmetric uncertainty, Multi-objective genetic algorithm
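
The abstract names the approximate Markov blanket principle and lists symmetric uncertainty as a keyword, but it does not spell out the filtering procedure. The following is a minimal sketch of how such a filter is commonly built, not the paper's implementation. It assumes features are already discretized, uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and treats a kept feature F_i as an approximate Markov blanket for F_j when SU(F_i, C) >= SU(F_j, C) and SU(F_i, F_j) >= SU(F_j, C). The function names `entropy`, `symmetric_uncertainty`, and `amb_filter`, and the toy gene names, are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)


def amb_filter(features, labels):
    """Drop features that have an approximate Markov blanket among kept ones.

    features: dict mapping a feature name to its list of discretized values
              (assumed here to be one group of mutually redundant candidates).
    Returns the retained feature names, most relevant first.
    """
    relevance = {
        name: symmetric_uncertainty(vals, labels)
        for name, vals in features.items()
    }
    # Visit features from most to least relevant, so every kept feature
    # already satisfies SU(kept, C) >= SU(candidate, C).
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    kept = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(features[k], features[name]) >= relevance[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data (hypothetical): four classes, two informative genes and a duplicate.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 1],
    "gene_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "gene_b": [0, 0, 1, 1, 0, 0, 1, 1],
}
kept = amb_filter(features, labels)
print(kept)
```

In this toy group, `gene_a_copy` is removed because `gene_a` is at least as relevant to the class and fully predicts it, while `gene_b` survives because it carries information that `gene_a` does not. In the pipeline the abstract describes, a filter of this kind would run inside each group produced by spectral clustering; that coupling is not shown here.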