Analyzing Chinese text with clause relevance structure

Neurocomputing (2023)

Abstract
Discourse structure is generally represented as a hierarchical structure; the two best-known representations are rhetorical structure theory (RST) and the Penn Discourse Treebank (PDTB). The main problem with the hierarchical structure is that it cannot describe the direct semantic relevance between elementary discourse units (EDUs), especially non-adjacent and cross-level EDUs. Discourse dependency structure (DDS) has been put forward in recent years to describe the head-dependent relation between EDUs. However, DDS offers no theoretical account of how the head should be determined. This problem is particularly serious in Chinese discourse analysis, because Chinese lacks formal differences between main clauses and subordinate clauses. In this paper, we propose the clause relevance structure to represent discourse structure. Compared with the hierarchical discourse structure and DDS, the clause relevance structure can effectively describe the direct semantic association between discontinuous and cross-level clauses in a text, and its construction does not presuppose head recognition. We propose judgment criteria and formal constraints for the clause relevance structure, and build a human-annotated corpus of Chinese text. Based on this corpus, we explore the automatic recognition of the clause relevance structure. The clause relevance recognition task is formalized as a classification problem and performed by a BERT-based model. A bidirectional LSTM layer is added on top of BERT to improve performance, and the resulting BERT-LSTM model achieves a recognition accuracy of 90.77%. Experimental results show that long-distance clause pairs are the main difficulty in clause relevance recognition, and that this difficulty is concentrated in the positive examples, whereas short-distance clause pairs are especially hard to recognize correctly as negative (not relevant).
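To make the classification framing concrete, the following is a minimal sketch of how a text could be turned into labeled clause pairs. It is an illustration only: the abstract does not say how pairs are enumerated, so the all-pairs scheme, the function name `build_clause_pairs`, the dictionary fields, and the `(i, j)` annotation format are all assumptions, and the toy clauses are placeholders rather than corpus data.

```python
from itertools import combinations


def build_clause_pairs(clauses, relevant_pairs):
    """Turn one text into binary classification examples (hypothetical format).

    clauses: list of clause strings in their order of appearance.
    relevant_pairs: set of (i, j) index tuples, i < j, marking clause pairs
        annotated as directly relevant (assumed annotation format).
    """
    examples = []
    # Assumption: every ordered pair (i < j) is a candidate, adjacent or not.
    for i, j in combinations(range(len(clauses)), 2):
        examples.append({
            "first": clauses[i],
            "second": clauses[j],
            "distance": j - i,  # 1 means the clauses are adjacent
            "label": 1 if (i, j) in relevant_pairs else 0,
        })
    return examples


# Toy input: four placeholder clauses and three annotated relevance links.
clauses = ["c0", "c1", "c2", "c3"]
gold = {(0, 1), (0, 3), (2, 3)}
examples = build_clause_pairs(clauses, gold)

# Group examples the way the error analysis does: by distance and label.
long_positive = [e for e in examples if e["distance"] >= 3 and e["label"] == 1]
short_negative = [e for e in examples if e["distance"] == 1 and e["label"] == 0]
```

Each example could then be passed to a sentence-pair classifier such as the BERT-LSTM model described above; keeping the `distance` field makes it straightforward to break accuracy down by clause distance.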