Multi-Scale Self-Attention for Text Classification

AAAI Conference on Artificial Intelligence, 2020.


Abstract:

In this paper, we introduce multi-scale structure, as a form of prior knowledge, into self-attention modules. We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales. Based on a linguistic perspective and an analysis of a pre-trained Transformer (BERT) on a huge corpus, we …
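The abstract only sketches the mechanism, so here is a minimal illustrative sketch in PyTorch of what multi-scale multi-head self-attention can look like: each head attends within a local window of a different size ("scale"), with one head left global. The class name, head count, scale list, and dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSelfAttention(nn.Module):
    """Sketch: multi-head self-attention where each head has its own scale.

    A head's scale is the maximum token distance it may attend to;
    scale=None means ordinary (global) attention. Hypothetical example,
    not the paper's released code.
    """

    def __init__(self, d_model=256, scales=(1, 3, 5, None)):
        super().__init__()
        self.n_heads = len(scales)
        assert d_model % self.n_heads == 0
        self.d_head = d_model // self.n_heads
        self.scales = scales
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):  # -> (batch, head, seq_len, d_head)
            return t.view(b, n, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)

        pos = torch.arange(n, device=x.device)
        dist = (pos[:, None] - pos[None, :]).abs()  # pairwise token distances

        head_outputs = []
        for h, scale in enumerate(self.scales):
            scores = q[:, h] @ k[:, h].transpose(-2, -1) / self.d_head ** 0.5
            if scale is not None:
                # restrict this head to a local window of radius `scale`
                scores = scores.masked_fill(dist > scale, float("-inf"))
            attn = F.softmax(scores, dim=-1)
            head_outputs.append(attn @ v[:, h])

        out = torch.cat(head_outputs, dim=-1)  # re-concatenate heads
        return self.out(out)


if __name__ == "__main__":
    layer = MultiScaleSelfAttention()
    y = layer(torch.randn(2, 32, 256))
    print(y.shape)  # torch.Size([2, 32, 256])
```

The design point illustrated here is that small-scale heads model local (phrase-level) features while the unrestricted head keeps global context; how the paper distributes scales across layers is described in the full text.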
