Linguistic Feature Injection for Efficient Natural Language Processing.

IJCNN (2023)

Abstract
Transformers have been established as one of the most effective neural approaches to a wide range of Natural Language Processing tasks. However, following the common trend in modern deep architectures, their scale has grown to a point where training such models from scratch is out of reach for many organizations. Indeed, despite their strong performance, Transformers have the general drawback of requiring a huge amount of training data, computational resources, and energy to be successfully optimized. For this reason, more recent architectures such as Bidirectional Encoder Representations from Transformers (BERT) rely on unlabeled data to pre-train the model, which is later fine-tuned for a specific downstream task using a relatively small amount of training data. In a similar fashion, this paper considers a plug-and-play framework that can be used to inject multiple syntactic features, such as Part-of-Speech tags or Dependency Parsing information, into any kind of pre-trained Transformer. This approach makes it possible to perform sequence-to-sequence labeling tasks by exploiting: (i) the (more abundant) available training data that is also used to learn the syntactic features, and (ii) the language data that is used to pre-train the Transformer model. The experimental results show that our approach improves over the baseline performance of the underlying model on different datasets, thus proving the effectiveness of employing syntactic language information for semantic regularization. In addition, we show that our architecture has a large efficiency advantage over pure large language models. Indeed, by using a model of limited size whose input data are enriched with syntactic information, we show that it is possible to obtain a significant reduction of CO2 emissions without decreasing prediction performance.
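The abstract describes the injection mechanism only at a high level. Below is a minimal sketch of how such a plug-and-play injection could look in practice, assuming PyTorch and the HuggingFace transformers library, a single syntactic feature (POS tags) already aligned to the subword tokens, and a simple additive combination of feature embeddings with the pre-trained token embeddings. The backbone name, the class name SyntaxInjectedTagger, and the tag/label counts are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): injecting POS-tag features into a
# pre-trained Transformer encoder for token-level labeling.
import torch
import torch.nn as nn
from transformers import AutoModel

class SyntaxInjectedTagger(nn.Module):
    def __init__(self, backbone="distilbert-base-uncased",
                 num_pos_tags=18, num_labels=9):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        # Learned embedding for each syntactic feature value (here: POS tags).
        self.pos_embed = nn.Embedding(num_pos_tags, hidden)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, pos_tag_ids, attention_mask=None):
        # Token embeddings from the pre-trained embedding table.
        tok = self.encoder.get_input_embeddings()(input_ids)
        # "Injection": add the syntactic feature embedding to each token.
        injected = tok + self.pos_embed(pos_tag_ids)
        out = self.encoder(inputs_embeds=injected,
                           attention_mask=attention_mask)
        # Per-token label logits for the sequence labeling task.
        return self.classifier(out.last_hidden_state)
```

Under this additive formulation, the same pattern would extend to further features (e.g. dependency relation labels) by adding extra embedding tables, and the backbone can remain small, which is where the efficiency argument of the abstract comes from; the paper's actual combination mechanism may differ.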
Keywords
plug-and-play framework, available training data, baseline performances, Bidirectional Encoder Representations, common trend, computational resources, concrete possibility, effective neural approach, efficient natural language processing, energy consumption, general drawback, high-level performances, huge efficiency advantage, language data, linguistic feature injection, modern deep architectures, multiple syntactic features, Natural Language Processing tasks, Part-of-Speech Tagging, pre-trained Transformer, prediction performances, pure large language models, recent architectures, sequence-to-sequence labeling tasks, specific downstream task, syntactic information, syntactic language information, transformer model, Transformers, unlabeled data