Comparison of Transformer Models for Information Extraction from Court Room Records in Pakistan

2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME)(2022)

Abstract
The legal domain offers many opportunities for improvement and innovation through computational advances. In Pakistan, as the number of reported judgments continues to grow at a rapid rate, it has become essential to process this massive volume of data to better meet the requirements of the respective stakeholders. However, extracting the required information from this unstructured legal text is challenging. In this paper, we compare different variants of BERT to determine which is best suited for a machine learning system that automatically extracts information from the publicly available judgments of the Supreme Court of Pakistan. A labelled dataset comprising thirteen entity types was created from these publicly available Supreme Court judgments. Three pre-trained BERT models, namely BERT-BASE-uncased, BERT-BASE-cased and LegalBERT, were then further trained and fine-tuned on the created dataset for Named Entity Recognition, achieving F1 scores of 92.47%, 94.72% and 92.5%, respectively. The same BERT models were also found to improve on the F1 scores reported in previous studies on a dataset from the Lahore High Court, which has a smaller number of labels, scoring 82.3%, 93.21% and 85.06%, respectively.
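As a rough illustration of the fine-tuning step described in the abstract, the sketch below fine-tunes a pre-trained BERT checkpoint for token classification (NER) with the Hugging Face Transformers library. The checkpoint choice, label list, dataset files and hyperparameters are illustrative assumptions, not the authors' actual setup or data.

```python
# Minimal sketch (not the authors' code): fine-tuning a pre-trained BERT
# checkpoint for token classification (NER) with Hugging Face Transformers.
# Label names, file paths and hyperparameters are placeholders; the paper's
# thirteen entity types and Supreme Court dataset are not reproduced here.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer,
                          DataCollatorForTokenClassification)
from datasets import load_dataset

checkpoint = "bert-base-cased"        # or "bert-base-uncased" / "nlpaueb/legal-bert-base-uncased"
labels = ["O", "B-JUDGE", "I-JUDGE"]  # placeholder tag set; the paper uses 13 entity types
id2label = dict(enumerate(labels))
label2id = {l: i for i, l in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(labels), id2label=id2label, label2id=label2id)

# Assumes a dataset with "tokens" (word lists) and "ner_tags" (label-id lists) columns.
raw = load_dataset("json", data_files={"train": "train.json", "validation": "dev.json"})

def tokenize_and_align(batch):
    # BERT uses sub-word tokenization, so word-level tags must be re-aligned
    # to sub-tokens; only the first sub-token of each word keeps its label.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = enc.word_ids(batch_index=i)
        prev, lab = None, []
        for wid in word_ids:
            if wid is None:
                lab.append(-100)       # special tokens are ignored in the loss
            elif wid != prev:
                lab.append(tags[wid])  # label the first sub-token of the word
            else:
                lab.append(-100)       # ignore remaining sub-tokens
            prev = wid
        enc["labels"].append(lab)
    return enc

tokenized = raw.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("ner-out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```

Swapping the `checkpoint` string between the cased, uncased and LegalBERT models is all that changes between the three configurations compared in the paper; entity-level F1 would then be computed on the held-out judgments.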
Keywords
court room records,information extraction,transformer models