COVIDSum: A linguistically enriched SciBERT-based summarization model for COVID-19 scientific papers

Journal of Biomedical Informatics (2022)

Abstract
COVIDSum (COVID-19 scientific paper Summarization) consists of four major modules: (1) Dataset Preprocessing, (2) Heuristic Sentence Extraction, (3) Word Co-occurrence Graph Construction, and (4) Linguistically Enriched Abstractive Summarization. The Dataset Preprocessing module retrieves the abstract and body text of each paper and removes papers that lack an abstract or are not written in English. The Heuristic Sentence Extraction module applies three heuristic methods to extract sentences from each paper. The Word Co-occurrence Graph Construction module extracts word co-occurrence relationships to construct an unweighted directed word co-occurrence graph. The Linguistically Enriched Abstractive Summarization module implements a hybrid summarization approach: it uses SciBERT and a GAT-based graph encoder to encode the word sequences and the word co-occurrence graphs, respectively; adopts highway networks to fuse the two encodings into context vectors for sentences; and applies a Transformer decoder to generate summaries.
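
To make the graph-construction step concrete, the following is a minimal sketch of building an unweighted directed word co-occurrence graph from a list of sentences. It is not the authors' implementation: the function name `build_cooccurrence_graph`, the regex tokenizer, the sliding window of size `window`, and the choice to point edges from an earlier word to a later word are all assumptions made for illustration, since the abstract does not specify them.

```python
import re


def build_cooccurrence_graph(sentences, window=2):
    """Build an unweighted directed word co-occurrence graph.

    Returns a dict mapping each word to the set of words that follow it
    within `window` tokens in the same sentence. The window size and the
    earlier-to-later edge direction are assumptions of this sketch.
    """
    graph = {}
    for sentence in sentences:
        # Hypothetical tokenizer: lowercase alphanumeric runs, keeping
        # internal hyphens (e.g. "co-occurrence" stays one token).
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())
        for i, source in enumerate(tokens):
            # Every word becomes a node, even if it has no outgoing edge.
            neighbours = graph.setdefault(source, set())
            for target in tokens[i + 1 : i + 1 + window]:
                if target != source:  # skip self-loops
                    # A set keeps the graph unweighted: repeated
                    # co-occurrences do not add extra edges.
                    neighbours.add(target)
    return graph


if __name__ == "__main__":
    demo = build_cooccurrence_graph(
        ["SciBERT encodes word sequences", "Word graphs encode co-occurrence"],
        window=1,
    )
    for word in sorted(demo):
        print(word, "->", sorted(demo[word]))
```

Storing neighbours in sets discards co-occurrence counts, which matches the unweighted graph described in the abstract. In a full pipeline, an adjacency structure like this would be converted to node indices and an edge list before being passed to a graph encoder.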
Keywords
COVID-19 scientific papers, Abstractive summarization, Linguistically enriched pre-trained language model, SciBERT