Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks

Language Resources and Evaluation (2016)

Abstract
In discourse relation annotation, a variety of frameworks are currently in use, and most of them have been developed and applied mainly to written data. This raises questions about the interoperability of discourse relation annotation schemes, as well as about differences in discourse annotation between written and spoken domains. In this paper, we describe our work on annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according to two different discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mapped onto one another, and discuss differences in the operationalisation of discourse relation schemes that present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts, and find differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 with respect to the annotation of spoken discourse, and suggest further extensions. The new corpus is roughly the size of the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.
Keywords
Annotation of discourse relations (DRs), interoperability of annotation schemes, DRs in spoken and written genres