Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language
arxiv(2024)
摘要
Relation extraction is essential for extracting and understanding
biographical information in the context of digital humanities and related
subjects. There is a growing interest in the community to build datasets
capable of training machine learning models to extract relationships. However,
annotating such datasets can be expensive and time-consuming, in addition to
being limited to English. This paper applies guided distant supervision to
create a large biographical relationship extraction dataset for German. Our
dataset, composed of more than 80,000 instances for nine relationship types, is
the largest biographical German relationship extraction dataset. We also create
a manually annotated dataset with 2000 instances to evaluate the models and
release it together with the dataset compiled using guided distant supervision.
We train several state-of-the-art machine learning models on the automatically
created dataset and release them as well. Furthermore, we experiment with
multilingual and cross-lingual experiments that could benefit many low-resource
languages.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要