Large scale author name disambiguation in digital libraries

BigData Conference(2014)

引用 33|浏览44
暂无评分
摘要
Person name disambiguation is essential to distinguish between persons that share the same name where unique identifiers are not present. In many domains this is a common problem including digital libraries where the same name can refer to multiple unique authors. Correctly attributing work and citations requires the digital library's database to be disambiguated. In this work we describe a large scale framework for disambiguating author names efficiently and effectively. The framework uses a density based clustering algorithm with a random forest based distance function to clusters unique authors. Effective use of blocking functions allows the clustering algorithm to be run in parallel. In our experiments we show that the framework disambiguates authors of more than 4 million papers in 24 hours.
更多
查看译文
关键词
digital libraries,pattern clustering,author name disambiguation,blocking functions,clustering algorithm,digital libraries,distance function,Clustering,Disambiguation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要