A spelling corrector for Basque based on morphology

Itziar Aduriz, Iñaki Alegria, Xabier Artola, Nerea Ezeiza, Kepa Sarasola, Miriam Urkia

Literary and Linguistic Computing (1997)

Abstract

This paper describes the components used in building Xuxen, a commercial spelling checker/corrector for Basque. Because Basque is a highly inflected, agglutinative language, the spelling checker/corrector has been conceived as a by-product of a general-purpose morphological analyser/generator (Alegria et al., 96). The two-level model of morphology (Koskenniemi, 83) that we use is based on two main components (see Sproat, 1992):

• A lexicon in which the morphemes (lemmas and affixes) and the possible links among them (morphotactics) are defined.
• A set of rules that controls the mapping between the lexical level and the surface level arising from morphophonological transformations (morphophonemics).

There are four kinds of rules: context restriction rules "=>" (the lexical character may be realized as the surface one only in the given context), surface coercion rules "<=" (the lexical character must be realized as the surface one in the given context), composite rules "<=>" (the lexical character must be realized as the surface one in the given context, and this change is licit only in this context) and exclusion rules (the lexical character may not be realized as the surface one in the given context). The rules are independent of the morphotactics. They are compiled into transducers, so the system can be applied to both analysis and generation.

To increase coverage and robustness, the analyser has been designed incrementally and consists of three main modules: the standard analyser; the analyser of linguistic variants, which arise from dialectal uses and competence errors; and the analyser without lexicon, which can recognize word-forms whose lemmas are not in the lexicon. An important feature of the analyser is its homogeneity: all three steps are based on two-level morphology, in contrast to ad hoc solutions.
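The four rule types can be illustrated with a small checker over an aligned sequence of lexical:surface character pairs. This is only a minimal sketch under simplifying assumptions, not the compiled transducers the paper describes: contexts are reduced to predicates over the immediately neighbouring pair, the exclusion operator is written "/<=" (a notation assumed here from common two-level rule compilers; the abstract does not name a symbol), and the rule N:m before p is a toy example unrelated to Basque. The names `Rule`, `satisfies` and `toy_rule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]                      # (lexical character, surface character)
Context = Callable[[Optional[Pair]], bool]  # predicate over the neighbouring pair


@dataclass
class Rule:
    lexical: str
    surface: str
    operator: str  # one of "=>", "<=", "<=>", "/<="
    left: Context
    right: Context


def in_context(rule: Rule, pairs: List[Pair], i: int) -> bool:
    """True if position i has the left and right context the rule names."""
    prev = pairs[i - 1] if i > 0 else None
    nxt = pairs[i + 1] if i + 1 < len(pairs) else None
    return rule.left(prev) and rule.right(nxt)


def satisfies(rule: Rule, pairs: List[Pair]) -> bool:
    """Check one aligned lexical/surface string against one two-level rule."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != rule.lexical:
            continue
        ctx = in_context(rule, pairs, i)
        realized = surf == rule.surface
        # Context restriction: the realization is allowed only in the context.
        if rule.operator in ("=>", "<=>") and realized and not ctx:
            return False
        # Surface coercion: in the context, the realization is obligatory.
        if rule.operator in ("<=", "<=>") and ctx and not realized:
            return False
        # Exclusion: in the context, the realization is forbidden.
        if rule.operator == "/<=" and ctx and realized:
            return False
    return True


def toy_rule(operator: str) -> Rule:
    """Toy rule (not Basque): lexical N realized as m before lexical p."""
    return Rule("N", "m", operator,
                left=lambda p: True,
                right=lambda p: p is not None and p[0] == "p")


in_ctx_realized = [("i", "i"), ("N", "m"), ("p", "p")]
in_ctx_unrealized = [("i", "i"), ("N", "n"), ("p", "p")]
out_of_ctx_realized = [("i", "i"), ("N", "m"), ("t", "t")]
```

With this toy rule, `in_ctx_unrealized` is accepted under "=>" but rejected under "<=" and "<=>", while `out_of_ctx_realized` is rejected under "=>" but accepted under "<=", which mirrors the may/must distinction in the rule descriptions.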
This analyser is a basic tool for current and future work on the automatic processing of Basque, and its first application is the commercial spelling corrector Xuxen, which is presented here. First we describe the subsystem added to the analyser in order to significantly increase the coverage of competence errors
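The incremental design mentioned in the abstract (standard analyser, then analyser of linguistic variants, then analyser without lexicon) can be pictured as a simple cascade in which each later module is tried only when the earlier ones return nothing. The sketch below is an illustration under stated assumptions rather than Xuxen's implementation: the three stage functions are dictionary and suffix stand-ins for what the paper implements with two-level transducers, all word-forms and analyses are invented placeholders (not real Basque), and every name (`analyse_incrementally`, `STAGES`, and so on) is hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Analyser = Callable[[str], List[str]]

# Placeholder data, invented for illustration only (not real Basque).
TOY_STANDARD = {"wordA": ["lemmaA+sfx"]}
TOY_VARIANTS = {"wurdA": ["lemmaA+sfx"]}   # a variant spelling of "wordA"
TOY_SUFFIXES = ["zz"]                      # suffixes known without a lemma lexicon


def standard_analyser(word: str) -> List[str]:
    return TOY_STANDARD.get(word, [])


def variant_analyser(word: str) -> List[str]:
    return TOY_VARIANTS.get(word, [])


def lexiconless_analyser(word: str) -> List[str]:
    # Guess "unknown stem + known suffix" when the lemma is not in the lexicon.
    return [f"{word[:-len(s)]}?+{s}" for s in TOY_SUFFIXES
            if word.endswith(s) and len(word) > len(s)]


STAGES: List[Tuple[str, Analyser]] = [
    ("standard", standard_analyser),
    ("variant", variant_analyser),
    ("no_lexicon", lexiconless_analyser),
]


def analyse_incrementally(
    word: str, stages: List[Tuple[str, Analyser]] = STAGES
) -> Tuple[Optional[str], List[str]]:
    """Return the first stage that yields analyses, with those analyses."""
    for name, analyser in stages:
        analyses = analyser(word)
        if analyses:
            return name, analyses
    return None, []
```

Returning the name of the stage that succeeded lets a caller treat the three outcomes differently, for example accepting a word analysed by the standard module while flagging one that is recognized only as a variant; how Xuxen actually uses each outcome is not stated in the abstract.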
