The Fellowship of the Authors: Disambiguating Names from Social Network Context

arXiv (2022)

Abstract
Most NLP approaches to entity linking and coreference resolution focus on retrieving similar mentions using sparse or dense text representations. The common "Wikification" task, for instance, retrieves candidate Wikipedia articles for each entity mention. In many domains, such as bibliographic citations, there are no authority lists with extensive textual descriptions of each entity, and ambiguous named entities mostly occur in the context of other named entities. Unlike prior work, we therefore disambiguate names by leveraging association networks of individuals derived from textual evidence. We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods. We evaluate on lists of names from two domains: bibliographic citations from CrossRef and chains of transmission (isnads) from classical Arabic histories. We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora, and that the availability of bibliographic information, such as publication venue or title, can also increase performance on this task. We also present a novel supervised cluster inference model that achieves competitive performance at little computational cost, making it well suited to situations where individuals must be identified without relying on an exhaustive authority list.
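The core intuition above, that the names a mention co-occurs with help tell apart different people who share a name, can be illustrated with a deliberately simple sketch. This is not the authors' model. In place of BERT mention representations, graph induction, and their supervised cluster inference, it assumes each mention is represented only by the set of other names in the same citation (or isnad), merges same-name mentions whose sets overlap by at least a hypothetical Jaccard `threshold`, and groups them with union-find. The function names `jaccard` and `disambiguate` and the default threshold of 0.2 are illustrative choices. It requires Python 3.9+.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two co-occurrence sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(citations: list[list[str]], threshold: float = 0.2) -> dict:
    """Map each (citation index, name) mention to a cluster id.

    Hypothetical heuristic: mentions sharing a surface name are merged when
    the sets of names they co-occur with overlap by at least `threshold`
    (Jaccard). Merges are transitive (single-link) via union-find.
    """
    # Each mention: (citation index, name, set of co-occurring names).
    mentions = [
        (i, name, set(authors) - {name})
        for i, authors in enumerate(citations)
        for name in authors
    ]
    parent = list(range(len(mentions)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Blocking: only compare mentions with an identical name string.
    by_name: dict[str, list[int]] = {}
    for idx, (_, name, _) in enumerate(mentions):
        by_name.setdefault(name, []).append(idx)

    for idxs in by_name.values():
        for a, b in combinations(idxs, 2):
            if jaccard(mentions[a][2], mentions[b][2]) >= threshold:
                parent[find(b)] = find(a)  # merge the two clusters

    return {(i, name): find(idx) for idx, (i, name, _) in enumerate(mentions)}


if __name__ == "__main__":
    citations = [
        ["J. Smith", "A. Lee", "B. Chen"],
        ["J. Smith", "A. Lee"],
        ["J. Smith", "R. Patel", "K. Ito"],
    ]
    clusters = disambiguate(citations)
    # The first two "J. Smith" mentions share the co-author "A. Lee" and merge;
    # the third has no overlapping context and stays separate.
    print(clusters[(0, "J. Smith")] == clusters[(1, "J. Smith")])  # True
    print(clusters[(0, "J. Smith")] == clusters[(2, "J. Smith")])  # False
```

Single-link merging is the simplest option, but one shared co-author can chain otherwise unrelated mentions together. Raising the threshold or replacing the Jaccard test with a learned scorer trades such false merges against missed ones.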
Keywords
names, social network