Strategies for enhancing the performance of news article classification in Bangla: Handling imbalance and interpretation

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE(2023)

引用 3|浏览5
暂无评分
摘要
The rapid increase in obtainable online text data has made text categorization an important tool for data analysts to extract relevant information on the web. However, incorrect or incomplete classification of marginalized groups may result from using biased text data. In order to remedy the disparity in available data, this research suggests a system for classifying and analyzing Bangla news articles. The suggested approach first uses both Random Under-Sampling (RUS) and Synthetic Minority Oversampling Techniques to balance the massive unbalanced Bangla News dataset consisting of 4,37,948 instances (SMOTE). Secondly, the proposed system employs three machine learning models: Logistic Regression, Decision Tree, and Stochastic Gradient Descent along with three deep learning models: Artificial Neural Network (ANN), Convolutional Neural Network (CNN), and Bidirectional Encoder Representations from Transformers (BERT) for Bangla text categorization. The experimental results signify the superior performance of BERT to other classification models of the system as well as other existing methods in this domain. The proposed system achieves the maximum accuracy of 99.04% in balanced dataset and 72.23% in imbalanced dataset using BERT. K-fold cross validation with varied K values is used to determine the performance consistency of BERT. Finally, both LIME (Local Interpretable Model agnostic Explanations and SHAP (SHapley Additive exPlanations) techniques are applied for interpreting each prediction made by BERT.
更多
查看译文
关键词
news article classification,imbalance,bangla
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要