Effective Crowd Annotation for Relation Extraction

HLT-NAACL, pp. 897-906, 2016.

Keywords:
positive training instance, distant supervision, false positive, human-annotated training, NLP task

Abstract:

Can crowdsourced annotation of training data boost performance for relation extraction over methods based solely on distant supervision? While crowdsourcing has been shown effective for many NLP tasks, previous researchers found only minimal improvement when applying the method to relation extraction. This paper demonstrates that a much larger boost is possible.

Introduction
  • Relation extraction (RE) is the task of identifying instances of relations, such as nationality or place of birth, in passages of natural text (a minimal sketch of one such training instance appears after this list).
  • The authors demonstrate that Gated Instruction increases the annotation quality of crowdsourced training data, raising precision from 0.50 to 0.77 and recall from 0.70 to 0.78, compared to Angeli et al.’s crowdsourced tagging of the same sentences.
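The bullets above refer to labeled training instances for person/location relations. The following minimal Python sketch is our own illustration, not code from the paper; the field names and the example sentence are assumptions about how such an instance might plausibly be represented.

```python
# Hypothetical sketch (not the authors' code): one crowdsourced training
# instance for the person/location relations studied in the paper.
from dataclasses import dataclass

@dataclass
class RelationInstance:
    sentence: str   # raw sentence from the corpus
    subject: str    # PERSON mention
    object: str     # LOCATION mention
    label: str      # one of the target relations, or "no_relation"

example = RelationInstance(
    sentence="Ada Lovelace was born in London in 1815.",
    subject="Ada Lovelace",
    object="London",
    label="place_of_birth",
)
```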
Highlights
  • Relation extraction (RE) is the task of identifying instances of relations, such as nationality or place of birth, in passages of natural text
  • Does higher quality crowdsourced training data result in higher extractor performance when adding crowdsourcing to distant supervision?
  • How does the boost in extractor performance on random training instances labeled with Gated Instruction compare to that with instances labeled using traditional crowdsourcing techniques selected with active learning?
  • In order to focus on the effect of crowdsourcing, we restricted our attention to four distinct relations between person and location that were used by previous researchers: nationality, place of birth, place of residence, and place of death
  • Two authors tagged the sample with 87% agreement and reconciled opinions to agree on consensus labels
  • To see how much of the boost over distant supervision comes from the active learning that went into Angeli et al.’s sample JS training, we used Gated Instruction on a randomly selected set of 10K newswire instances from the TAC KBP 2010 corpus (LDC2010E12) that contained at least one NER tag for person and one for location
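The selection step in the item above (keeping only newswire sentences that contain at least one PERSON mention and at least one LOCATION mention) could be sketched as follows. This uses spaCy purely as a stand-in NER tagger; the paper's actual NER pipeline and corpus handling are not specified here and may differ.

```python
# Illustrative sketch (assumption: spaCy as the NER tagger).
# Keep sentences with at least one PERSON and one location entity.
import spacy

nlp = spacy.load("en_core_web_sm")
LOCATION_LABELS = {"GPE", "LOC"}  # spaCy splits locations across these labels

def has_person_and_location(sentence: str) -> bool:
    doc = nlp(sentence)
    labels = {ent.label_ for ent in doc.ents}
    return "PERSON" in labels and bool(labels & LOCATION_LABELS)

# Toy stand-in for the newswire corpus.
newswire_sentences = [
    "Ada Lovelace was born in London in 1815.",
    "The committee met on Tuesday.",
]
candidates = [s for s in newswire_sentences if has_person_and_location(s)]
print(candidates)  # only the first sentence qualifies
```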
Results
  • Researchers have explored the idea of augmenting distant supervision with a small amount of crowdsourced annotated data in an effort to improve relation extraction performance (Angeli et al., 2014; Zhang et al., 2012; Pershina et al., 2014).
  • Does Gated Instruction produce training data with higher precision and recall than other research in crowdsourcing for relation extraction?
  • To evaluate the crowdsourced training data quality, the authors hand-tagged the crowdsourced annotations from both the Gated Instruction system and Angeli et al.’s work on 200 random instances.
  • The authors compared adding the 10K crowdsourced instances from the previous experiment to 700K instances from distant supervision, where the crowdsourced data carried tags from either Gated Instruction or the original crowdsourcing from Angeli et al. They compare only with Angeli et al., as they did not have annotations from Zhang et al. for the same training sentences (a rough training sketch follows this list).
  • To see how much of the boost over distant supervision comes from the active learning that went into Angeli et al.’s sample JS training, the authors used Gated Instruction on a randomly selected set of 10K newswire instances from the TAC KBP 2010 corpus (LDC2010E12) that contained at least one NER tag for person and one for location.
  • Figure 7 shows that when training a logistic regression classifier with high-quality crowdsourcing data, a single annotation is more cost-effective than using a simple majority of three, five, or more annotations.
  • The authors provide practical guidelines for a novel crowdsourcing protocol, Gated Instruction, an effective method for acquiring high-quality training data.
  • This paper describes the design of Gated Instruction, a crowdsourcing protocol that produces high quality training data.
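The combination experiment referenced above (10K crowdsourced instances added to 700K distantly supervised instances, with a logistic regression extractor trained on the union) could be sketched roughly as below. The feature extraction, data shapes, and use of scikit-learn are assumptions for illustration, not the authors' actual pipeline.

```python
# Rough sketch: train a logistic regression extractor on the union of
# distant-supervision and crowdsourced examples (feature vectors assumed given).
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_extractor(X_distant, y_distant, X_crowd, y_crowd):
    """Train on the union of distant-supervision and crowdsourced examples."""
    X = np.vstack([X_distant, X_crowd])
    y = np.concatenate([y_distant, y_crowd])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf

# Toy usage with random features; real runs would use ~700K DS and ~10K crowd rows.
rng = np.random.default_rng(0)
X_ds, y_ds = rng.normal(size=(1000, 20)), rng.integers(0, 2, size=1000)
X_cs, y_cs = rng.normal(size=(100, 20)), rng.integers(0, 2, size=100)
model = train_extractor(X_ds, y_ds, X_cs, y_cs)
```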
Conclusion
  • GI uses an interactive tutorial to teach the annotation task, provides feedback during training so workers understand their errors, refuses to let workers annotate new sentences until they have demonstrated competence, and adaptively screens low-accuracy workers with a schedule of test questions (a schematic sketch of this gating logic follows this list).
  • The authors show that in contrast to prior work, adding crowdsourced training data substantially improves the performance of the resulting extractor as long as care is taken to ensure high quality crowdsourced annotations.
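The control flow summarized in the first Conclusion bullet might be schematized as follows. The class name, accuracy threshold, and worker/question interfaces are illustrative assumptions, not the authors' published parameters or code.

```python
# Schematic sketch of the Gated Instruction workflow described in the paper.
# Thresholds and the worker/question interface are invented for illustration.
class GatedInstructionSession:
    def __init__(self, tutorial_questions, min_accuracy=0.8):
        self.tutorial_questions = tutorial_questions  # training items with gold labels
        self.min_accuracy = min_accuracy
        self.correct = 0
        self.attempted = 0

    def run_tutorial(self, worker) -> bool:
        """Gate: the worker may not annotate real data until the tutorial is passed."""
        for question in self.tutorial_questions:
            answer = worker.answer(question)
            worker.show_feedback(question, answer)   # explain errors immediately
            self.attempted += 1
            self.correct += int(answer == question.gold)
        return self.accuracy() >= self.min_accuracy

    def accuracy(self) -> float:
        return self.correct / max(self.attempted, 1)

    def should_screen_out(self, test_results) -> bool:
        """Adaptive screening: drop workers whose accuracy on embedded test questions falls too low."""
        if not test_results:
            return False
        return sum(test_results) / len(test_results) < self.min_accuracy
```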
Funding
  • This work was supported by NSF grant IIS-1420667, ONR grant N00014-15-1-2774, DARPA contract FA8750-13-2-0019, the WRF/Cable Professorship, and a gift from Google
Study subjects and analysis
workers: 2
Worker agreement with GI was surprisingly high. Two workers agreed on between 78% and 97% of the instances, depending on the relation. The average agreement was 88%.
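The pairwise agreement reported above is simple proportion agreement between the two workers' labels on the same instances; the per-relation percentages are this quantity computed per relation. A minimal sketch (our illustration, with made-up labels) is below.

```python
# Minimal sketch: proportion agreement between two workers' labels on the
# same instances (the 78-97% figures above are per-relation versions of this).
def pairwise_agreement(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

worker1 = ["place_of_birth", "no_relation", "nationality", "no_relation"]
worker2 = ["place_of_birth", "no_relation", "place_of_residence", "no_relation"]
print(pairwise_agreement(worker1, worker2))  # 0.75
```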

workers: 2
For the experiments presented, unless otherwise noted, we used a variant of majority vote to create a training set. We obtained annotations from two workers for each example sentence and kept the instances where both agreed as our training data. Finally, we ran a learning algorithm on the distant supervision training data, the crowdsourced training data, and a combination of the two.
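The agreement-based filter described above (keep an instance only when both workers assign the same label) could be sketched as follows; the tuple layout used for the annotations is an assumption for illustration.

```python
# Sketch of the two-worker agreement filter described above. The tuple layout
# (sentence, label_from_worker_1, label_from_worker_2) is an assumed format.
def agreed_training_set(double_annotations):
    """Keep only instances where both workers gave the same label."""
    return [
        (sentence, label1)
        for sentence, label1, label2 in double_annotations
        if label1 == label2
    ]

annotations = [
    ("Ada Lovelace was born in London.", "place_of_birth", "place_of_birth"),
    ("Turing visited Princeton.", "place_of_residence", "no_relation"),
]
print(agreed_training_set(annotations))  # only the first instance survives
```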

workers: 10
Is the quality of data produced by Gated Instruction high enough to rely on just one annotation per instance? We randomly selected 2K examples from the 20K newswire instances and used Gated Instruction to acquire labels from 10 workers for each sentence. Figure 7 shows that when training a logistic regression classifier with high-quality crowdsourcing data, a single annotation is, indeed, more cost-effective than using a simple majority of three, five, or more annotations (given a fixed budget).
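The fixed-budget comparison behind Figure 7 can be reasoned about directly: with a budget of B annotations, single labeling yields B training examples, while majority voting over k workers yields only B // k aggregated examples. The sketch below is our own scaffold making that trade-off concrete, not the paper's experimental code.

```python
# Illustrative sketch of the fixed-budget trade-off behind Figure 7: with budget B,
# single labeling gives B examples; majority-of-k gives B // k aggregated examples.
from collections import Counter

def majority_label(labels):
    """Return the most common label among k redundant annotations."""
    return Counter(labels).most_common(1)[0][0]

def examples_under_budget(budget, k):
    """Number of training examples obtainable with k annotations per example."""
    return budget // k

print(majority_label(["place_of_birth", "place_of_birth", "no_relation"]))  # place_of_birth

budget = 6000
for k in (1, 3, 5):
    print(f"k={k}: {examples_under_budget(budget, k)} training examples")
# With high-accuracy single annotations, the larger k=1 set tends to train a
# better classifier than the smaller majority-voted sets, per the paper's finding.
```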

Reference
  • Gabor Angeli, Julie Tibshirani, Jean Y. Wu, and Christopher D. Manning. 2014. Combining distant and partial supervision for relation extraction. In EMNLP.
  • Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics.
  • M. Califf and R. Mooney. 1997. Relational learning of pattern-match rules for information extraction. In Workshop in Natural Language Learning, Conf. Assoc. Computational Linguistics.
  • Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In ISMB.
  • Peng Dai, Jeffrey M. Rzeszotarski, Praveen Paritosh, and Ed H. Chi. 2015. And now for something completely different: Improving crowdsourcing workflows with micro-diversions. In CSCW.
  • A. P. Dawid and A. M. Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 28(1):20–28.
  • Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 363–370.
  • Matthew R. Gormley, Adam Gerber, Mary Harper, and Mark Dredze. 2010. Non-expert correction of automatically generated relation annotations. In Proceedings of NAACL-HLT 2010.
  • Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of ACL.
  • Panagiotis G. Ipeirotis and Evgeniy Gabrilovich. 2014. Quizz: Targeted crowdsourcing with a billion (potential) users. In WWW '14: Proceedings of the 23rd International Conference on the World Wide Web.
  • Heng Ji and Ralph Grishman. 2011. Knowledge base population: Successful approaches and challenges. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, HLT '11, pages 1148–1158.
  • John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML '01.
  • Christopher H. Lin, Mausam, and Daniel S. Weld. 2014. To re(label), or not to re(label). In HCOMP.
  • Christopher H. Lin, Mausam, and Daniel S. Weld. 2016. Reactive learning: Active learning with relabeling. In AAAI.
  • Andrew Mao, Yiling Chen, Eric Horvitz, Megan E. Schwamb, Chris J. Lintott, and Arfon M. Smith. 2013. Volunteering versus work for pay: Incentives and tradeoffs in crowdsourcing. In HCOMP.
  • Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of ACL.
  • Thien Huu Nguyen and Ralph Grishman. 2014. Employing word representations and regularization for domain adaptation of relation extraction. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Volume 2, pages 68–74.
  • David Oleson, Alexander Sorokin, Greg P. Laughlin, Vaughn Hester, John Le, and Lukas Biewald. 2011. Programmatic gold: Targeted and scalable quality assurance in crowdsourcing. In Human Computation Workshop, page 11.
  • Maria Pershina, Bonan Min, Wei Xu, and Ralph Grishman. 2014. Infusion of labeled data into distant supervision for relation extraction. In Proceedings of ACL.
  • Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of the European Conference on Machine Learning (ECML 2010), pages 148–163.
  • Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In NAACL-HLT 2013, pages 74–84.