Reference Extraction from Vietnamese Legal Documents

Ngo Xuan Bach, Nguyen Thi Thanh Thuy, Dang Bao Chien, Trieu Khuong Duy, To Minh Hien, Tu Minh Phuong

Proceedings of the Tenth International Symposium on Information and Communication Technology (2019)


Abstract

Legal and regulatory texts are ubiquitous and important in our lives. Automated processing of such documents using natural language processing and information retrieval techniques is therefore desirable. Many legal text processing problems require information extraction as a base component. In this paper, we address the task of extracting references from law and regulatory documents; these references are necessary for recognizing the relations between documents and between document parts, as well as for other problems. We formulate the task as a sequence labeling problem and introduce several extraction models, including both traditional (conditional random fields) and more advanced (deep neural network) methods. In addition to features learned by deep networks, we investigate various types of manually engineered features that reflect the characteristics of legal documents. Our best model, which combines bidirectional long short-term memory networks and conditional random fields, achieves an F1 score of 95.35% on a corpus of more than 11 thousand sentences from Vietnamese law and regulatory documents.
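Framing reference extraction as sequence labeling means each token receives a tag, and references are recovered as contiguous tagged spans. The sketch below is illustrative only and assumes details the abstract does not state: a BIO scheme with a single hypothetical label `REF`, whitespace-split syllables as tokens, and exact-match span-level F1 as the metric. The helper names (`spans_to_bio`, `bio_to_spans`, `span_f1`) and the example sentence are hypothetical, and the BiLSTM-CRF tagger itself is not shown.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span representation: (start, end_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode labelled token spans as BIO tags, one tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags into spans; a stray I- tag is treated as a span start."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match span F1 over a list of tagged sentences (assumed metric)."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sentence: "according to the provisions of Article 5 of the Land Law"
    tokens = "theo quy định tại Điều 5 Luật Đất đai".split()
    gold = spans_to_bio(len(tokens), [(4, 9, "REF")])
    partial = spans_to_bio(len(tokens), [(4, 6, "REF")])  # misses "Luật Đất đai"
    print(gold)
    print(span_f1([gold, gold], [gold, partial]))  # → 0.5
```

Under exact matching, a prediction that truncates a reference counts as both a false positive and a false negative, so the second sentence above halves precision and recall. Whether the paper scores partial matches differently is not stated in the abstract.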


Keywords

Bidirectional Long Short-Term Memory Networks, Conditional Random Fields, Legal Text, Reference Extraction
