Feature selection from disaster tweets using Spark-based parallel meta-heuristic optimizers

Mohammed Ahsan Raza Noori, Bharti Sharma, Ritika Mehra

Social Network Analysis and Mining (2022)

Twitter is considered a useful tool for tracking and managing disaster-related incidents. However, textual data contains a large number of irrelevant features, and the resulting high dimensionality increases computational cost and degrades classification performance. To address this problem, this work presents Spark-BGWO and Spark-BWOA, Apache Spark-based parallel implementations of two nature-inspired meta-heuristic optimizers, binary gray wolf optimization (BGWO) and the binary whale optimization algorithm (BWOA), for optimal feature selection and classification of disaster tweets. A random forest (RF) classifier is applied during the wrapper-based feature subset selection and classification process. The performance of the proposed optimizers was analyzed on seven benchmark disaster tweet datasets, namely California Wildfires, Hurricane Harvey, Hurricane Irma, Hurricane Maria, Iraq–Iran Earthquake, Mexico Earthquake, and Sri Lanka Floods, and the results were compared with the most recent work on the same datasets. Both optimizers performed competently in the feature selection and classification process and outperformed the previous work on five of the seven datasets in terms of accuracy and F1-score.
Feature selection, Parallel meta-heuristic optimization, Apache Spark, Disaster, Twitter, Classification
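
The wrapper-based selection loop summarized in the abstract can be illustrated with a minimal, single-machine sketch of a binary gray wolf optimizer. This is not the authors' Spark-BGWO implementation: the function names, the parameter defaults, the sigmoid transfer function, and the toy fitness function are all assumptions made for illustration. In the paper's setting the fitness of a feature mask would come from a random forest trained on the selected features, and the per-wolf fitness evaluation (the `map` call below) is the kind of step one would expect to distribute with Spark.

```python
import math
import random


def sigmoid(x):
    """Logistic function used to turn a continuous position into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_gwo(fitness, n_features, n_wolves=8, n_iters=30, seed=0):
    """Hypothetical binary gray wolf optimizer; maximizes fitness(mask)."""
    rng = random.Random(seed)
    # Each wolf is a 0/1 mask over the features.
    wolves = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(n_wolves)]
    best_mask, best_score = None, float("-inf")

    for t in range(n_iters):
        # Evaluate every wolf. Assumption: this is the step a Spark version
        # would run in parallel, since each evaluation is independent.
        scores = list(map(fitness, wolves))
        ranked = sorted(zip(scores, wolves), key=lambda p: p[0], reverse=True)
        if ranked[0][0] > best_score:
            best_score, best_mask = ranked[0][0], list(ranked[0][1])

        # The three best wolves (alpha, beta, delta) guide the others.
        leaders = [list(w) for _, w in ranked[:3]]
        a = 2.0 * (1 - t / n_iters)  # decreases linearly from 2 towards 0

        new_wolves = []
        for w in wolves:
            new = []
            for d in range(n_features):
                x = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    x += leader[d] - A * D
                x /= len(leaders)
                # Sigmoid transfer to a bit; the scaling 10 * (x - 0.5) is
                # one common choice, assumed here rather than taken from the paper.
                prob = sigmoid(10 * (x - 0.5))
                new.append(1 if rng.random() < prob else 0)
            new_wolves.append(new)
        wolves = new_wolves

    # Score the final population as well.
    for w in wolves:
        s = fitness(w)
        if s > best_score:
            best_score, best_mask = s, list(w)
    return best_mask, best_score


# Toy stand-in for the wrapper fitness (assumption): reward selecting the
# "relevant" features and lightly penalize the number of selected features.
RELEVANT = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in RELEVANT)
    return hits / len(RELEVANT) - 0.01 * sum(mask)


mask, score = binary_gwo(toy_fitness, n_features=10)
print(mask, score)
```

Replacing `toy_fitness` with a function that trains a classifier on the columns selected by `mask` and returns a validation score would turn this into a wrapper method in the sense used above; the penalty term on subset size is a common design choice for favoring smaller feature sets, not a detail confirmed by the abstract.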