A Case Study on Start-up of Dataset Construction: In Case of Recipe Named Entity Corpus

Yoko Yamakata, Keishi Tajima, Shinsuke Mori

2018 IEEE International Conference on Big Data (Big Data), 2018

Abstract

In this paper, we report our experience in constructing a cooking recipe text corpus, describing the problems we encountered and how we handled them. One problem was the difficulty of establishing a clear, stable, and complete guideline that instructs annotators how to annotate. During annotation, we found many unexpected cases for which the pre-defined guideline was not clear enough, and even cases for which it provided no guidance at all. As a result, we had to update the guideline twice during annotation and to revise the annotations made before each update. This process involved several trade-offs, and it was not easy to decide when and how often the annotations should be revised. It was even unclear whether we should revise them at all or instead spend the human resources on annotating more data. We present an experiment whose result suggests that the old annotations should be revised. Another problem was managing the versions of the guideline, the sets of annotations corresponding to each version, and communication among participants.

Keywords

dataset construction, cooking recipe text corpus, clear guideline, unexpected cases, pre-defined guideline, old annotations, stable guideline, recipe named entity corpus
