CS-BPSO: Hybrid feature selection based on chi-square and binary PSO algorithm for Arabic email authorship analysis

Knowledge-Based Systems (2021)

Abstract
Email authorship analysis is a challenging task that involves detecting an author's writing style to help determine their identity. Emails are a widespread application of big data, and email authorship analysis is widely performed in forensic linguistics. However, the high-dimensional feature space encountered in authorship analysis degrades classification performance. Moreover, Arabic is a highly inflected language with certain unique characteristics, which pose critical challenges in identifying context. Selecting prominent features is therefore a critical step in authorship analysis. Swarm intelligence (SI) algorithms are widely adopted to address such feature selection problems. In this study, an efficient hybrid feature selection algorithm based on chi-square and binary particle swarm optimization (BPSO), called chi-square BPSO (CS-BPSO), was developed to enhance the performance of Arabic email authorship analysis. Both static and dynamic features were considered. Experiments were conducted on Arabic email messages collected from a sample population to test the algorithm's performance with three popular classifiers: support vector machine (SVM), K-nearest neighbour (KNN), and naïve Bayes (NB). Accuracy, precision, recall, and F1-score were used as performance measures. The results showed that CS-BPSO achieves strong results with dynamic features. The findings were satisfactory in addressing several types of difficulties, e.g., imbalanced datasets, small datasets, and short texts.
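The abstract describes CS-BPSO only at a high level. The sketch below shows one plausible way to combine the two stages. It assumes a chi-square filter that keeps the top-k non-negative (count-style) features, followed by a standard BPSO wrapper with a sigmoid transfer function that searches for a subset of those k features. The chi-square formulation mirrors scikit-learn's `chi2` for count features. The wrapper fitness (leave-one-out 1-NN accuracy minus a small subset-size penalty), the parameter values (`w`, `c1`, `c2`, `v_max`, `alpha`), and all function names are hypothetical illustrations, not the authors' implementation. The paper itself evaluates SVM, KNN, and NB classifiers.

```python
import math
import random
from collections import Counter


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature w.r.t. the class labels."""
    class_freq = Counter(y)
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in sorted(class_freq):
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * class_freq[c] / n  # class prior * feature total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k_features(X, y, k):
    """Filter stage: indices of the k highest-scoring features, in column order."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def loo_1nn_accuracy(X, y, mask):
    """Hypothetical wrapper classifier: leave-one-out 1-NN on the selected columns."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_label = math.inf, None
        for t in range(len(X)):
            if t != i:
                d = sum((X[i][j] - X[t][j]) ** 2 for j in cols)
                if d < best_d:
                    best_d, best_label = d, y[t]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha=0.01):
    # Assumed fitness: reward accuracy, lightly penalise larger subsets.
    return loo_1nn_accuracy(X, y, mask) - alpha * sum(mask) / len(mask)


def bpso(X, y, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Wrapper stage: binary PSO over feature masks (illustrative parameters)."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # velocity clamping
                # Sigmoid transfer: probability that bit j is set to 1.
                pos[i][j] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][j])) else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def cs_bpso(X, y, k, **kwargs):
    """Chi-square filter to k features, then BPSO; returns original column indices."""
    keep = top_k_features(X, y, k)
    X_reduced = [[row[j] for j in keep] for row in X]
    mask, fit = bpso(X_reduced, y, **kwargs)
    return [j for j, bit in zip(keep, mask) if bit], fit
```

Running the cheap filter first shrinks the search space, so the more expensive wrapper only evaluates subsets of the k retained features. That is the usual motivation for such filter-wrapper hybrids.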
Keywords
Swarm intelligence, Particle swarm optimization (PSO), Hybrid feature selection, Short texts, Chi-square, Arabic email text, Forensic analysis