Native Language Identification for Russian.

ICDM Workshops (2019)

Abstract

Native Language Identification (NLI) is the task of automatically recognizing an author's native language (L1) from texts written in a language that is not native to the author. The NLI task has been studied in detail for English, and two shared tasks were conducted in 2013 [1] and 2017 [2], where TOEFL English essays and essay samples were used as data. A small number of works address the NLI problem for other languages, but Russian has not yet been studied. This paper discusses the use of approaches that were well established in the NLI Shared Task 2013 and 2017 competitions to recognize the author's native language, as well as the type of speaker: learners of Russian or Heritage Russian speakers. The classifier presented in this paper is based on a support vector machine (SVM) with TF-IDF weighting. The study is data-driven and is made possible by the Russian Learner Corpus, developed by the HSE Learner Russian Research Group [3], on which the experiments are conducted.
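As a rough illustration of the TF-IDF weighting mentioned in the abstract, the sketch below computes TF-IDF vectors for a few toy sentences using only the Python standard library. The abstract does not say which TF-IDF variant, tokenization, or feature types the authors used, so the whitespace tokenizer, the smoothed IDF formula, the function name `tf_idf`, and the toy texts are all assumptions made for this example. In a full pipeline, vectors like these would be passed to an SVM classifier, which is not shown here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant (not specified in the abstract): term frequency
    normalized by document length, multiplied by the smoothed IDF
    log((1 + N) / (1 + df)) + 1.
    """
    # Assumption: simple lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents in which each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


# Hypothetical toy texts, not taken from the Russian Learner Corpus.
toy_docs = [
    "я читать книга вчера",
    "я читал книгу вчера",
    "вчера я книгу читал дома",
]
toy_vectors = tf_idf(toy_docs)
```

Terms that appear in every toy text (such as "я") receive a lower weight than terms confined to a single text (such as "читать"), which is the property that makes TF-IDF features useful for separating author groups.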

Key words

native language identification, NLI, support vector machine, SVM, term frequency, inverse term frequency, TF-IDF, Russian Learner Corpus
