Comparative Analysis Between Macro and Micro-Accuracy in Imbalance Dataset for Movie Review Classification

Suhaimi Nur Suhailayani, Othman Zalinda, Yaakub Mohd Ridzwan

Proceedings of Seventh International Congress on Information and Communication Technology (2022)

Abstract

Multi-class classification is an active and exploratory area of data science. Measuring the accuracy of a multi-class classifier, however, remains a challenge that merits detailed study. Because multi-class accuracy may be lower when the dataset is imbalanced, this paper analyzes the use of macro- and micro-accuracy in classifying text data with multi-class labels. The research focuses on movie review text classified by three multi-class classifiers: Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF). Five performance measures are analyzed with regard to micro- and macro-accuracy: recall, precision, F-score, sensitivity, and specificity. The comparative analysis yielded a significant result: on the imbalanced dataset, the average micro-accuracy (87.3%) was 14.8 percentage points higher than the macro-accuracy (72.5%). The results also showed a significant gap between the balanced and imbalanced datasets. As future work, the flexibility of class labels in the multi-class setting may be studied to observe how the classifier's learning behavior changes.
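
The abstract does not spell out how the micro and macro averages are computed. The following is a minimal sketch under one common convention, illustrated with recall: the micro average pools all predictions before dividing, while the macro average computes the score per class and then takes an unweighted mean. The function name `micro_macro_recall` and the toy labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter

def micro_macro_recall(y_true, y_pred):
    """Return (micro, macro) recall for single-label multi-class predictions.

    Assumed convention, not confirmed by the paper:
      micro = correct predictions pooled over all classes / total examples
      macro = unweighted mean of per-class recall
    """
    support = Counter(y_true)  # number of true examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    micro = sum(hits.values()) / len(y_true)
    macro = sum(hits[c] / support[c] for c in support) / len(support)
    return micro, macro

# Toy imbalanced example: 8 "pos" reviews, 1 "neg", 1 "neu".
# A classifier that always predicts the majority class looks good
# under the micro average but poor under the macro average.
y_true = ["pos"] * 8 + ["neg", "neu"]
y_pred = ["pos"] * 10
micro, macro = micro_macro_recall(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

In this toy case the micro average is dominated by the majority class, whereas the macro average weights every class equally, which is one plausible reason for a gap of the kind the abstract reports on imbalanced data.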

Keywords

Multi-class classification, Macro and micro-accuracy, Text classification
