Feature selection from disaster tweets using Spark-based parallel meta-heuristic optimizers

Social Network Analysis and Mining (2022)

Abstract
Twitter is considered a useful tool for tracking and managing disaster-related incidents. However, textual data contain a large number of irrelevant features, and the resulting high dimensionality increases the computational cost and decreases classification performance. To address this problem, this work presents Spark-BGWO and Spark-BWOA, Apache Spark-based parallel implementations of two nature-inspired meta-heuristic optimizers, binary gray wolf optimization (BGWO) and the binary whale optimization algorithm (BWOA), for optimal feature selection and classification of disaster tweets. A random forest (RF) classifier is applied during the wrapper-based feature subset selection and classification process. The performance of the proposed optimizers was analyzed on seven benchmark disaster tweet datasets, namely California Wildfires, Hurricane Harvey, Hurricane Irma, Hurricane Maria, Iraq–Iran Earthquake, Mexico Earthquake, and Sri Lanka Floods, and the results were compared with the most recent work on the same datasets. Both optimizers performed competently in feature selection and classification, and outperformed the previous work on five of the seven datasets in terms of accuracy and F1-score.
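The wrapper-based selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: the sigmoid transfer function in `binarize`, the weighting `alpha` in `fitness`, and the toy data are hypothetical, and a nearest-centroid classifier stands in for the random forest so that the example runs with the standard library alone. The idea shown is that each candidate solution is a 0/1 mask over the features, and a mask is scored by training a classifier on the selected columns only.

```python
import math
import random

def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask (sigmoid transfer; an assumption)."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Accuracy of a nearest-centroid classifier on the masked features.

    Stand-in for the random forest named in the abstract.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, y in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == y
    return correct / len(y_test)

def fitness(mask, data, alpha=0.99):
    """Weighted sum of error rate and fraction of features kept; lower is better.

    The weighting alpha is a hypothetical choice, not taken from the paper.
    """
    acc = centroid_accuracy(*data, mask)
    return alpha * (1.0 - acc) + (1.0 - alpha) * sum(mask) / len(mask)

# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X_train = [[0.0, 5.0, 1.0], [0.1, 1.0, 4.0], [1.0, 4.0, 1.5], [0.9, 0.5, 3.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.2, 0.8, 2.0], [0.8, 4.5, 3.5]]
y_test = [0, 1]
data = (X_train, y_train, X_test, y_test)

# Score a few candidate masks and keep the best one.
candidates = [[1, 0, 0], [1, 1, 1], [0, 1, 1]]
best = min(candidates, key=lambda m: fitness(m, data))

# A binary optimizer would produce masks from continuous positions like this.
sample_mask = binarize([2.0, -2.0, 0.0], random.Random(0))
```

In a meta-heuristic such as BGWO or BWOA, the population of masks is updated each iteration and re-scored with a fitness function of this kind. Because each mask is scored independently, the scoring step is a natural candidate for parallel execution, although the abstract does not say how the Spark implementations distribute the work.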
Key words
Feature selection, Parallel meta-heuristic optimization, Apache Spark, Disaster, Twitter, Classification