Business text classification with imbalanced data and moderately large label spaces for digital transformation

Applied Network Science（2024）

引用 0|浏览0

暂无评分

摘要

Digital transformation refers to an organization’s use of digital technology to improve its products, services, and operations, aligning them with evolving business requirements. To demonstrate this transformative process, we present a real-life case study where a company seeks to automate the classification of their textual data rather than relying on manual methods. Transitioning to automated classification involves deploying machine learning models, which rely on pre-labeled datasets for training and making predictions on new data. However, upon receiving the dataset from the company, we faced challenges due to the imbalanced distribution of labels and moderately large label spaces. To tackle text classification with such a business dataset, we evaluated four distinct methods for multi-label text classification: fine-tuned Bidirectional Encoder Representations from Transformers (BERT), Binary Relevance, Classifier Chains, and Label Powerset. The results revealed that fine-tuned BERT significantly outperformed the other methods across key metrics like Accuracy, F1-score, Precision, and Recall. Binary Relevance also displayed competence in handling the dataset effectively, while Classifier Chains and Label Powerset exhibited comparatively less impressive performance. These findings highlight the remarkable effectiveness of fine-tuned BERT model and the Binary Relevance classifier in multi-label text classification tasks, particularly when dealing with imbalanced training datasets and moderately large label spaces. This positions them as valuable assets for businesses aiming to automate data classification in the digital transformation era.

查看译文

关键词

Business,Digital transformation,News documents,Performance comparison

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要