Practical Transformer-based Multilingual Text Classification
2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021)
Abstract
Transformer-based methods are appealing for multilingual text classification, but common research benchmarks like XNLI (Conneau et al., 2018) do not reflect the data availability and task variety of industry applications. We present an empirical comparison of transformer-based text classification models in a variety of practical monolingual and multilingual pretraining and fine-tuning settings. We evaluate these methods on two distinct tasks in five different languages. Departing from prior work, our results show that multilingual language models can outperform monolingual ones in some downstream tasks and target languages. We additionally show that practical modifications such as task- and domain-adaptive pretraining and data augmentation can improve classification performance without the need for additional labeled data.
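The abstract does not include code, but the fine-tuning setting it describes can be illustrated with a minimal sketch. The sketch below assumes the HuggingFace transformers and datasets libraries, the xlm-roberta-base multilingual checkpoint, a binary classification task, and a toy two-example dataset; none of these specifics come from the paper itself.

```python
# Minimal sketch: fine-tuning a multilingual transformer for text classification.
# Assumptions (not from the paper): HuggingFace transformers/datasets,
# the xlm-roberta-base checkpoint, two labels, and a toy training set.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-base"  # hypothetical choice of multilingual model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy multilingual examples; real experiments would use task-specific data
# in the target languages.
train_data = Dataset.from_dict({
    "text": ["This product works well.", "Dieses Produkt ist defekt."],
    "label": [1, 0],
})

def tokenize(batch):
    # Truncate/pad so every example has a fixed-length input.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf-out",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_data,
)
trainer.train()
```

The same loop applies to monolingual baselines by swapping the checkpoint name; the paper's task- and domain-adaptive pretraining would correspond to an additional masked-language-modeling pass on unlabeled in-domain text before this fine-tuning step.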
Keywords
Machine Translation, Natural Language Processing, Topic Modeling, Text Mining, Lexicon-Based Methods