Evaluating Automatic Learning of Structure for Event Extraction

ADVANCES IN CROSS-CULTURAL DECISION MAKING (2017)

Abstract
Analysts engaged in monitoring and forecasting benefit from structured representations of domain knowledge and societal events, which allow advanced analytics and predictive data models to be applied to large amounts of temporally extended data. However, extracting structured data from unstructured data typically requires domain-specific software that is costly, takes months to years to create, and cannot adapt to changing domains. In this paper we consider the operational usefulness of an approach pioneered by Chambers and Jurafsky (Template-based information extraction without the templates, 2011, [1]) that automatically learns structured domain knowledge, in the form of event templates, from unstructured text; the templates are then used to automatically extract structured events from text. We generalize this approach and apply it to operationally relevant corpora from Brazil, Mexico, Ukraine, and Pakistan that focus on societal protests and the provision of aid. We find that we can generate compelling event templates that correspond to event types described by Conflict and Mediation Event Observations (CAMEO) codes (Retrieved from Computational Event Data System, 2014, [2]), which existing state-of-the-art systems use to label event types. Additionally, we are able to learn event templates that capture more nuance than the CAMEO codes represent, as well as entirely new and interesting event types. To automate our experimentation, we describe novel automated metrics that allow us to run multiple experiments in batch while getting automated feedback on the quality of the results from each run. These metrics indicate significant overlap between the events we extract and those extracted by existing systems.
Keywords
NLP, Machine learning, Artificial intelligence, Data analytics, Text analytics, Forecasting, Event extracting, Event coding