Discourse unit and rhetorical relations: a study of discourse units in the annotation of a Basque corpus.
Procesamiento del Lenguaje Natural (2011)
Abstract
This article describes a study of the features used for labelling discourse structure, according to Rhetorical Structure Theory, at the inter-sentential and intra-sentential levels. The tagged corpus is composed of medical texts written in Basque and extracted from the medical journal 'Gaceta Médica de Bilbao'. The difficulties encountered both in identifying discourse units and in establishing relations are analysed at each level, based on the agreements and disagreements observed in texts annotated by two annotators. The results suggest that segmentation into discourse units is more complex at the intra-sentential level, while the assignment of rhetorical relations is more difficult at the inter-sentential level. We also note that some relations occur more frequently at the intra-sentential level and others at the inter-sentential level. However, some relations appear at both levels without distinction. This study lays the foundations for the automatic annotation process that the authors intend to carry out shortly.
Keywords: Annotation, Discourse Analysis, Segmentation, Rhetorical Relations.