TBNF: A Transformer-based Noise Filtering Method for Chinese Long-form Text Matching

Applied Intelligence (2023)

Abstract
In deep text matching, the large amount of noise in Chinese long-form texts degrades matching performance. Most long-form text matching models use all of the text indiscriminately and therefore take in much of this noise. To filter it, this paper combines the PageRank algorithm with the Transformer. For sentence-level noise detection, sentence similarity is evaluated by word overlap rate, a sentence-level relationship graph is constructed, and the graph is filtered with PageRank. For word-level noise detection, a word graph is built from the Transformer's attention scores, and PageRank is run on this graph and combined with the self-attention weights to select keywords that highlight topic relevance; noisy words are filtered layer by layer in the module. During training, PolyLoss replaces the traditional binary cross-entropy loss, which reduces the difficulty of hyperparameter tuning. Finally, a better filtering strategy is proposed and verified experimentally on two Chinese long-form text matching datasets. The results show that the matching model based on this noise filtering strategy filters noise better and captures the matching signal more accurately.
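The sentence-level step described in the abstract (word overlap as similarity, a sentence graph, PageRank scores used for filtering) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the overlap formula, the damping factor, or the selection rule, so the Jaccard-style `overlap_rate`, `damping=0.85`, and the top-`keep` cutoff below are all assumptions, as are the function names. Sentences are assumed to be already segmented into word tokens.

```python
def overlap_rate(a, b):
    """Word overlap between two tokenized sentences.

    Assumption: Jaccard overlap of the word sets; the abstract only
    says "overlap rate of words".
    """
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an adjacency matrix.

    weights[j][i] is the edge weight from node j to node i.
    Nodes with no outgoing weight spread their score uniformly.
    """
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sums[j] > 0:
                    rank += weights[j][i] / out_sums[j] * scores[j]
                else:
                    rank += scores[j] / n  # dangling node
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def filter_sentences(sentences, keep):
    """Keep the `keep` highest-ranked sentences, in original order.

    Assumption: a fixed top-k cutoff; the paper's actual selection
    rule is not stated in the abstract.
    """
    n = len(sentences)
    weights = [
        [0.0 if i == j else overlap_rate(sentences[i], sentences[j])
         for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        ["model", "matches", "long", "text"],
        ["long", "text", "contains", "noise"],
        ["weather", "is", "sunny", "today"],   # off-topic sentence
        ["model", "filters", "noise", "text"],
    ]
    for sentence in filter_sentences(doc, keep=3):
        print(" ".join(sentence))
```

In this toy document the off-topic sentence shares no words with the others, so it has no edges in the graph and receives the lowest PageRank score. The word-level filter in the abstract follows the same pattern but builds its graph from Transformer attention scores instead of word overlap.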
Keywords
Long text matching, Noise filtering, Transformer, PageRank