
Practical Transformer-based Multilingual Text Classification

2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021)

Abstract
Transformer-based methods are appealing for multilingual text classification, but common research benchmarks like XNLI (Conneau et al., 2018) do not reflect the data availability and task variety of industry applications. We present an empirical comparison of transformer-based text classification models in a variety of practical monolingual and multilingual pretraining and fine-tuning settings. We evaluate these methods on two distinct tasks in five different languages. Departing from prior work, our results show that multilingual language models can outperform monolingual ones in some downstream tasks and target languages. We additionally show that practical modifications such as task- and domain-adaptive pretraining and data augmentation can improve classification performance without the need for additional labeled data.
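The multilingual fine-tuning setting the abstract compares can be illustrated with a minimal sketch: a multilingual pretrained transformer fine-tuned on a labeled classification dataset via Hugging Face Transformers. The checkpoint (xlm-roberta-base), file names, label count, and hyperparameters below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: fine-tuning a multilingual transformer for text
# classification. Model choice, data files, and hyperparameters are
# assumptions for illustration only.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-base"  # one plausible multilingual checkpoint
NUM_LABELS = 2                   # task-dependent; placeholder value

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS
)

# Any labeled corpus with "text" and "label" columns works here;
# train.csv / dev.csv are hypothetical file names.
dataset = load_dataset(
    "csv", data_files={"train": "train.csv", "validation": "dev.csv"}
)

def tokenize(batch):
    # Truncate/pad to a fixed length so batches are rectangular.
    return tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    )

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="clf-out",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```

The task- and domain-adaptive pretraining the abstract mentions would, under the usual recipe, insert a continued masked-language-modeling phase on unlabeled in-domain text before this fine-tuning step; the sketch above covers only the supervised stage.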
Keywords
Machine Translation, Natural Language Processing, Topic Modeling, Text Mining, Lexicon-Based Methods