Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset
arXiv (2023)
Abstract
While recent pre-trained transformer-based models can perform named entity
recognition (NER) with great accuracy, their limited input range remains an
issue when they are applied to long documents such as whole novels. One way to
alleviate this issue is to retrieve relevant context at the document level.
Unfortunately, the lack of supervision for such a task means one has to settle
for unsupervised approaches. Instead, we propose to generate a synthetic context
retrieval training dataset using Alpaca, an instruction-tuned large language
model (LLM). Using this dataset, we train a neural context retriever based on a
BERT model that is able to find relevant context for NER. We show that our
method outperforms several retrieval baselines for the NER task on an English
literary dataset composed of the first chapters of 40 books.
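The abstract describes document-level context retrieval only at a high level. The following is a minimal sketch, under our own assumptions, of how retrieved context might be attached to a sentence before it is passed to an NER model. Every other sentence of the document is scored against the target sentence, the top-k are kept in document order, and they are appended after a separator. The `overlap_scorer` is a hypothetical lexical stand-in for the paper's trained BERT-based retriever. The names `retrieve_context`, `build_ner_input`, the parameter `k`, and the `[SEP]` separator are illustrative choices, not the authors' code.

```python
import re
from typing import Callable, List

# Hypothetical scorer type: (target sentence, candidate sentence) -> relevance score.
# In the paper this role is played by a trained neural (BERT-based) retriever.
Scorer = Callable[[str, str], float]


def overlap_scorer(query: str, candidate: str) -> float:
    """Toy stand-in for a learned retriever: fraction of query words found in the candidate."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", candidate.lower()))
    if not q:
        return 0.0
    return len(q & c) / len(q)


def retrieve_context(sentence_idx: int, document: List[str], scorer: Scorer, k: int = 2) -> List[str]:
    """Rank all other sentences of the document by relevance and return the top-k."""
    query = document[sentence_idx]
    candidates = [(i, s) for i, s in enumerate(document) if i != sentence_idx]
    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(candidates, key=lambda pair: scorer(query, pair[1]), reverse=True)
    top = sorted(ranked[:k], key=lambda pair: pair[0])  # restore reading order
    return [s for _, s in top]


def build_ner_input(sentence_idx: int, document: List[str], scorer: Scorer,
                    k: int = 2, sep: str = " [SEP] ") -> str:
    """Concatenate the target sentence with its retrieved context (hypothetical input format)."""
    context = retrieve_context(sentence_idx, document, scorer, k)
    return document[sentence_idx] + sep + " ".join(context)


if __name__ == "__main__":
    example_doc = [
        "Elizabeth Bennet walked to Netherfield.",
        "The weather was fine.",
        "Mr. Darcy was at Netherfield that morning.",
        "Elizabeth disliked Mr. Darcy.",
    ]
    print(build_ner_input(0, example_doc, overlap_scorer, k=2))
```

Keeping the scorer pluggable reflects the setting described above: a lexical or unsupervised scorer can serve as a baseline, while a retriever trained on synthetic data would simply replace `overlap_scorer`.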