Text Authorship Identification Based On Ensemble Learning and Genetic Algorithm Combination in Turkish Text

Journal of Polytechnic - Politeknik Dergisi (2022)

Abstract
Aim: To identify the stylistic features that matter most when detecting the author of a Turkish text, and to detect authors automatically with machine learning methods that use these features.

Design & Methodology: Natural language processing techniques were used for feature extraction, a combination of Genetic Algorithms and the Bagging method for feature selection, and the Bagging algorithm with five different classifiers for model creation.

Originality: A total of six sub-datasets were examined for the author identification process, which ensures that the most appropriate dataset is selected. Classical machine learning algorithms were used within Bagging for both classification and feature selection.

Findings: Our study with 40 authors reached 89% accuracy.

Conclusion: High metric values were achieved despite the large number of authors compared with similar current studies. Using the Genetic Algorithm and Bagging together in the feature selection process increased the accuracy rate by 8%.

Declaration of Ethical Standards: The author(s) of this article declare that the materials and methods used in this study do not require ethical committee permission and/or legal-special permission.
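The abstract describes wrapper-style feature selection: a Genetic Algorithm searches over feature subsets, and a Bagging ensemble scores each subset. The pure-Python sketch below illustrates that idea under stated assumptions. The base learner is a hypothetical nearest-centroid classifier (the abstract does not name the paper's five classifiers). Fitness is plain hold-out accuracy. The population size, mutation rate, ensemble size, and the synthetic three-"author" demo data are illustrative choices, not the authors' settings.

```python
import random
from collections import Counter

def centroid_fit(X, y):
    # Hypothetical base learner: store the mean feature vector of each class.
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, row):
    # Predict the class whose centroid is closest in squared Euclidean distance.
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], row)))

def bagging_fit(X, y, n_estimators, rng):
    # Bagging: train each base model on a bootstrap sample (drawn with replacement).
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(centroid_fit([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    # Majority vote over the base models.
    return Counter(centroid_predict(m, row) for m in models).most_common(1)[0][0]

def project(X, mask):
    # Keep only the feature columns switched on in the boolean mask.
    return [[v for v, keep in zip(row, mask) if keep] for row in X]

def fitness(mask, train, test, rng, n_estimators=5):
    # Fitness of a feature subset = hold-out accuracy of a bagged ensemble on it.
    if not any(mask):
        return 0.0
    (Xtr, ytr), (Xte, yte) = train, test
    models = bagging_fit(project(Xtr, mask), ytr, n_estimators, rng)
    preds = [bagging_predict(models, r) for r in project(Xte, mask)]
    return sum(p == t for p, t in zip(preds, yte)) / len(yte)

def ga_select(train, test, n_features, pop_size=20, generations=15, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    # Each chromosome is a bit mask over the features.
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        scored = sorted(((fitness(m, train, test, rng), m) for m in pop),
                        key=lambda s: s[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1][:]
        elite = [m for _, m in scored[: pop_size // 2]]  # truncation selection
        children = [elite[0][:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = [(not g) if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]  # bit-flip mutation
            children.append(child)
        pop = children
    return best, best_fit

def make_demo(n_per_class=40, seed=1):
    # Synthetic data: 3 hypothetical authors, 2 informative features, 4 noise features.
    rng = random.Random(seed)
    X, y = [], []
    for c in range(3):
        for _ in range(n_per_class):
            signal = [c * 5 + rng.gauss(0, 0.5) for _ in range(2)]
            noise = [rng.gauss(0, 3) for _ in range(4)]
            X.append(signal + noise)
            y.append(c)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(0.7 * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr]), ([X[i] for i in te], [y[i] for i in te])

if __name__ == "__main__":
    train, test = make_demo()
    mask, acc = ga_select(train, test, n_features=6)
    print("selected features:", [i for i, keep in enumerate(mask) if keep])
    print("hold-out accuracy:", acc)
```

In this sketch, the same hold-out split drives both the search and the reported score, so the returned accuracy is optimistic. A more careful setup would score subsets with cross-validation and report accuracy on a separate, untouched test set.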
Keywords
Author identification, ensemble learning, genetic algorithm, feature selection