Improving BERT With Self-Supervised Attention

IEEE Access (2021)

Abstract
One of the most popular paradigms for applying large pre-trained NLP models such as BERT is to fine-tune them on a smaller dataset. However, one challenge remains: the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant or misleading words in a sentence, which are easy for humans to dismiss, can substantially degrade the performance of these fine-tuned BERT models. In this paper, we propose a novel technique, called Self-Supervised Attention (SSA), to help address this generalization challenge. Specifically, SSA automatically generates weak, token-level attention labels iteratively by probing the fine-tuned model from the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach to combine their benefits. Empirically, across a variety of public datasets, we demonstrate significant performance improvements with our SSA-enhanced BERT model.
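The core of SSA, as described in the abstract, is generating weak token-level attention labels by probing the model fine-tuned in the previous iteration. The sketch below illustrates one plausible probing scheme, leave-one-out token deletion with a confidence threshold; the helper names (`generate_ssa_labels`, `predict_proba`) and the threshold value are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Callable, List

def generate_ssa_labels(
    tokens: List[str],
    predict_proba: Callable[[List[str]], float],
    threshold: float = 0.05,  # assumed sensitivity cutoff, not from the paper
) -> List[int]:
    """Weak token-level attention labels via leave-one-out probing.

    A token is labeled 1 (attention-worthy) if deleting it changes the
    fine-tuned model's confidence in its prediction by more than
    `threshold`; otherwise it is labeled 0.
    """
    base_conf = predict_proba(tokens)
    labels = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]  # drop token i
        delta = abs(base_conf - predict_proba(perturbed))
        labels.append(1 if delta > threshold else 0)
    return labels

# Usage with a toy scorer: the token "great" drives the prediction.
if __name__ == "__main__":
    toy_scorer = lambda toks: 0.9 if "great" in toks else 0.6
    print(generate_ssa_labels(["the", "movie", "was", "great", "."], toy_scorer))
    # -> [0, 0, 0, 1, 0]
```

Per the abstract, these weak labels are then integrated back into BERT (two integration variants plus a hybrid are studied), and the probe-and-retrain loop is repeated over iterations.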
Keywords
Task analysis, Bit error rate, Predictive models, Data models, Training, Training data, Licenses, Natural language processing, attention model, text classification, BERT, pre-trained model