Relevance-diversity algorithm for feature selection and modified Bayes for prediction

M. Shaheen, N. Naheed, A. Ahsan

Alexandria Engineering Journal (2023)

Abstract
Big data analytics uncovers hidden patterns through classification, prediction and reinforcement of big datasets. In these datasets, some features have a negligible connection with other features, and some may be insignificant because their presence does not affect the results of the analysis. Big data analytics algorithms generate better classification models when supplied with a dataset consisting of relevant, important and informative features. Features can therefore be classified as important or unimportant. Different filtration techniques are used to select the important features; they filter features on different bases, such as information gain, information dispersion and the Gini index, and they have a few drawbacks that are reviewed in this paper. The first contribution of this paper is a new feature selection technique, the “Relevance-diversity algorithm”, which selects important features based on two measures, relevance and diversity, in order to keep the number of selected features as low as possible and to reduce the search time spent on feature selection. The second contribution is a new supervised classification algorithm based on Naive Bayes classification. The naive assumption, i.e. feature independence, is discarded from the Naive Bayes algorithm: the features are considered to be dependent on each other, and their combined impact on the class value is evaluated. The newly proposed classification algorithm is then applied to the features selected by the relevance-diversity feature selection technique. The Weather, Tic-Tac-Toe, Lenses, Balance-scale and CarEval datasets are used to evaluate both techniques. The results of the proposed feature selection method are compared with those of existing methods, and the results of Modified-Bayes are compared with those of the existing Naive Bayes algorithm.
Analysis revealed that the proposed method performed better in terms of the number of features, accuracy and time complexity.
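
The abstract does not define how relevance and diversity are computed, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes, hypothetically, that relevance is the mutual information between a feature and the class, that diversity is one minus the largest normalized mutual information between a candidate and any already-selected feature, and that features are added greedily by the product of the two until no candidate scores above zero. This is an mRMR-style stand-in, not the authors' algorithm; the names `select_features`, `mutual_info` and `min_score` are illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_features(rows, labels, max_features=None, min_score=1e-9):
    """Greedy relevance-diversity selection (hypothetical scoring).

    relevance(f) = I(f; class)
    diversity(f) = 1 - max over selected s of I(f; s) / H(f)
    score(f)     = relevance(f) * diversity(f)
    Returns the indices of the selected feature columns, in selection order.
    """
    n_feat = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_feat)]
    relevance = [mutual_info(col, labels) for col in columns]
    selected, remaining = [], set(range(n_feat))
    limit = max_features or n_feat

    while remaining and len(selected) < limit:
        def score(j):
            if not selected:
                return relevance[j]
            # Normalized redundancy with the closest already-selected feature.
            redundancy = max(
                mutual_info(columns[j], columns[s]) / max(entropy(columns[j]), 1e-12)
                for s in selected
            )
            return relevance[j] * (1 - redundancy)

        best = max(sorted(remaining), key=score)
        if score(best) <= min_score:  # nothing relevant and non-redundant is left
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy table where feature 0 determines the class, feature 1 duplicates feature 0 and feature 2 is independent of the class, the sketch keeps only feature 0: the duplicate has zero diversity and the noise column has zero relevance.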
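
The idea of dropping the naive independence assumption and evaluating the combined impact of the features on the class can be illustrated with a Bayes classifier that estimates the likelihood of the whole feature combination directly, i.e. P(c | x1..xn) ∝ P(c) · P(x1..xn | c), rather than as a product of per-feature terms. The sketch below is an assumption about what such a classifier might look like, not the paper's Modified-Bayes: the Laplace smoothing over all possible value combinations and the class name `JointBayes` are hypothetical choices made for illustration.

```python
from collections import Counter
from math import prod


class JointBayes:
    """Bayes classifier without the naive independence assumption (hypothetical sketch).

    P(c | x1..xn) is proportional to P(c) * P(x1..xn | c), where P(x1..xn | c)
    is estimated from counts of the full feature combination with Laplace smoothing.
    """

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # Count each (full feature tuple, class) pair jointly.
        self.joint_counts = Counter(zip(map(tuple, rows), labels))
        # Number of possible feature combinations, used in the smoothing denominator.
        n_feat = len(rows[0])
        self.n_combos = prod(len({row[j] for row in rows}) for j in range(n_feat))
        return self

    def posterior(self, x):
        """Normalized class posteriors for one feature vector."""
        x = tuple(x)
        scores = {}
        for c, nc in self.class_counts.items():
            prior = nc / self.n
            likelihood = (self.joint_counts[(x, c)] + 1) / (nc + self.n_combos)
            scores[c] = prior * likelihood
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}

    def predict(self, x):
        post = self.posterior(x)
        return max(post, key=post.get)
```

On an XOR-style table, where no single feature says anything about the class but the pair does, this joint estimate recovers every label, which a classic Naive Bayes cannot do. The trade-off is that the number of combinations grows multiplicatively with each feature, so a joint model like this one benefits from running on a small, selected feature subset.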
Keywords
Naive Bayes, Feature Selection, Relevance, Attributes Selection, Classification