Collection And Processing Of A Medical Corpus In Ukrainian

Olga Cherednichenko, Olga Kanishcheva, Olena Yakovleva, Denis Arkatov

COMPUTATIONAL LINGUISTICS AND INTELLIGENT SYSTEMS (COLINS 2020), VOL I: MAIN CONFERENCE (2020)

Abstract

Text corpora are the basis of natural language study. We describe the structure of a Ukrainian-language corpus (UKRMED) that contains a variety of medical text genres (clinical protocols, blogs, and Wikipedia). The paper shows the process of collecting, creating, and processing a corpus of medical data in Ukrainian. We present our own framework for creating a text corpus. The corpus targets the medical domain and the task of text simplification. We report statistical characteristics of the corpus and provide a part-of-speech analysis. Lemma frequencies for this medical corpus are also analyzed. The UKRMED corpus can be used for solving the task of natural language simplification.

Keywords

Medicine Corpus, Corpus Linguistics, Ukrainian, Text Collection
