dstlr: Scalable Knowledge Graph Construction from Text Collections

Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)(2019)

Abstract
We present a scalable, open-source platform that “distills” a potentially large text collection into a knowledge graph. Our platform takes documents stored in Apache Solr and scales out the Stanford CoreNLP toolkit via Apache Spark integration to extract mentions and relations that are then ingested into the Neo4j graph database. The raw knowledge graph is then enriched with facts extracted from an external knowledge graph. The complete product can be manipulated by various applications using Neo4j’s native Cypher query language: We present a subgraph-matching approach to align extracted relations with external facts and show that fact verification, locating textual support for asserted facts, detecting inconsistent and missing facts, and extracting distantly-supervised training data can all be performed within the same framework.
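The alignment step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the platform runs Cypher queries against Neo4j, whereas this stand-in uses plain Python dictionaries. The function name `align`, the relation label `born_in`, the document ids, and the toy triples are all hypothetical, chosen only to show how one comparison can yield supported, inconsistent, and missing facts.

```python
# Toy stand-in for aligning extracted relations with external facts.
# All names and data here are hypothetical illustrations.

# Relations extracted from text: (subject, relation, object, source document).
extracted = [
    ("Ada Lovelace", "born_in", "London", "doc-1"),
    ("Ada Lovelace", "born_in", "Paris", "doc-2"),
    ("Alan Turing", "born_in", "London", "doc-3"),
]

# Facts from an external knowledge graph, keyed by (subject, relation).
external = {
    ("Ada Lovelace", "born_in"): "London",
}


def align(extracted, external):
    """Sort each extracted relation into one of three buckets.

    supported:    the external graph asserts the same object
    inconsistent: the external graph asserts a different object
    missing:      the external graph has no fact for (subject, relation)
    """
    report = {"supported": [], "inconsistent": [], "missing": []}
    for subj, rel, obj, doc in extracted:
        known = external.get((subj, rel))
        if known is None:
            report["missing"].append((subj, rel, obj, doc))
        elif known == obj:
            report["supported"].append((subj, rel, obj, doc))
        else:
            report["inconsistent"].append((subj, rel, obj, doc))
    return report


report = align(extracted, external)
for bucket, rows in report.items():
    print(bucket, rows)
```

In this sketch the supported bucket doubles as textual support for an asserted fact, since each row carries the id of the document it came from, and the same rows could serve as distantly supervised training examples.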
Keywords
Graph database, Query language, Apache Spark, Information retrieval, Scalability, Computer science, Knowledge graph, Training set