Annotation of discourse phenomena in the Prague Dependency Treebank

SLOVO A SLOVESNOST (2015)

Abstract
Annotation schemes for language corpora nowadays cover various layers of sentence description, from morphology to semantics. Annotation projects dealing with phenomena beyond the sentence boundary, however, have only recently begun to attract the attention of corpus linguists. In the present contribution, we describe a unified approach to the analysis of discourse phenomena, designed and developed for large-scale annotation of the Czech empirical data of the Prague Dependency Treebank. The approach rests on two fundamental pillars: (i) it builds on the results of one of the first comprehensive discourse annotation schemes, proposed and implemented in the Penn Discourse Treebank for English; (ii) it follows the Prague tradition of Functional Generative Description and treebanking, taking advantage of the tectogrammatical (underlying) layer of sentence analysis and extending it to a full discourse-level description. Our analysis concentrates on two major aspects of discourse coherence: (i) discourse relations (semantic relations between discourse segments) and discourse connectives as their lexical anchors; and (ii) coreference and so-called bridging anaphora. We present a detailed description of the annotation scheme and procedure, discuss specific problematic issues, and offer basic corpus statistics and an evaluation of the annotation.
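To make the two aspects of coherence concrete, here is a minimal Python sketch of how the annotated links might be represented, assuming a simple in-memory model. It is purely illustrative: the class names, the mention ids (m1 to m4), and the relation label "reason-result" are hypothetical and do not reproduce the actual PDT data format or its tag set. The sketch encodes a discourse relation between two segments anchored by a connective (here Czech "a proto", 'and therefore'), plus coreference and bridging links, and groups mentions into coreference chains with a union-find pass that deliberately ignores bridging links, since bridging relates distinct but associated entities (e.g. a car and its engine).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical data model for illustration only; it is NOT the PDT's own format.


@dataclass(frozen=True)
class DiscourseRelation:
    arg1: str                         # first discourse segment (text or node id)
    arg2: str                         # second discourse segment (text or node id)
    rel_type: str                     # illustrative semantic label, not the official tag set
    connective: Optional[str] = None  # lexical anchor of the relation, if annotated


@dataclass(frozen=True)
class AnaphoricLink:
    anaphor: str                      # hypothetical id of the referring expression
    antecedent: str                   # hypothetical id of its antecedent
    kind: str                         # "coreference" or "bridging"


@dataclass
class DiscourseAnnotation:
    relations: List[DiscourseRelation] = field(default_factory=list)
    links: List[AnaphoricLink] = field(default_factory=list)

    def connectives(self) -> List[str]:
        """Return the connectives anchoring the annotated relations."""
        return [r.connective for r in self.relations if r.connective is not None]

    def coreference_chains(self) -> List[List[str]]:
        """Group mentions into chains using coreference links only (union-find)."""
        parent: Dict[str, str] = {}

        def find(x: str) -> str:
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for link in self.links:
            if link.kind == "coreference":  # bridging links are skipped on purpose
                parent[find(link.anaphor)] = find(link.antecedent)

        chains: Dict[str, List[str]] = {}
        for mention in parent:
            chains.setdefault(find(mention), []).append(mention)
        return sorted(sorted(chain) for chain in chains.values())


# Usage: "Petr byl unavený, a proto šel spát." ('Peter was tired, and therefore went to sleep.')
doc = DiscourseAnnotation()
doc.relations.append(DiscourseRelation(
    arg1="Petr byl unavený", arg2="šel spát",
    rel_type="reason-result", connective="a proto"))
# Hypothetical mentions: m1 "Petr", m2 "on" ('he'), m3 "auto" ('car'), m4 "motor" ('engine')
doc.links += [
    AnaphoricLink("m2", "m1", "coreference"),
    AnaphoricLink("m4", "m3", "bridging"),
]
print(doc.connectives())         # ['a proto']
print(doc.coreference_chains())  # [['m1', 'm2']]
```

Keeping bridging out of the chain computation mirrors the distinction the abstract draws between coreference and bridging anaphora: the engine is linked to the car, but it is not the same entity.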
Keywords
text, discourse, phenomena beyond the sentence boundary, discourse relations, discourse connectives, coreference, bridging anaphora, Prague Dependency Treebank