Unification-Based Persian Morphology
msra(2000)
Abstract
We present a complete formalization of Persian inflectional morphology using a unification-based framework. The morphological analyzer was developed for use in a Persian-English machine translation system; it computes the part-of-speech categories and returns all syntactically relevant inflectional features for a word. The morphological analyses are represented as feature structures, which can easily be used by a syntactic parser. The morphological formalism consists of a declarative description of rules utilizing typed feature structures. Persian morphotactics include a few prefixes and sequences of suffixes with co-occurrence constraints between non-adjacent morphemes. The verbal inflectional morphology is rich and is characterized by a complex system of conjugations. A morphological rule associates a regular expression describing a set of character strings with a typed feature structure. Rules can be combined using regular expression operators and they can be factorized in conjugation tables. The morphological engine is implemented as a finite-state transducer where the left projection is the input string and the right projection is a typed feature structure.

In this paper, we describe the implementation of an inflectional morphological analyzer for Persian, which is based on finite-state transducers and typed feature structures with unification. The analyzer was designed to provide an interface to the syntactic parser in the Shiraz Persian-English machine translation system (http://crl.nmsu.edu/shiraz) and was tested on online newspaper articles. The system includes a dictionary with 50,000 entries, which is used for lookup after morphological analysis has been performed.

This paper also provides a detailed description of Persian inflectional morphology. Persian is an affixal system consisting mainly of suffixes and a few prefixes. The nominal paradigm consists of a relatively small number of affixes, but the language has a complete verbal inflectional system, which can be obtained by the combination of prefixes, stems, inflections and auxiliaries. The affixes in the language follow a strict morphotactic order. One of the main problems for the analysis of Persian written text is discontinuity in the word structure. Certain affixes in the language are always bound to the stem, while others may appear as either bound or free morphemes. For instance, the plural
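The pairing of a regular expression with a feature structure, combined by unification, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: feature structures are plain nested dictionaries rather than typed ones, the rules are applied one by one with Python's `re` module rather than compiled into a finite-state transducer, the lexicon is consulted inline rather than after analysis, and the names `unify`, `SUFFIX_RULES`, `LEXICON` and `analyze` as well as the transliterated toy entries are hypothetical.

```python
import re


def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns the merged structure, or None when the two conflict.
    Simplification: untyped structures, no reentrancy.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # conflicting values under the same feature
                result[key] = merged
            else:
                result[key] = b_val
        return result
    # Atomic values unify only when they are equal.
    return a if a == b else None


# Hypothetical rules: each pairs a regular expression over the word's
# ending with the feature structure it contributes.
SUFFIX_RULES = [
    (re.compile(r"ha$"), {"cat": "noun", "agr": {"num": "pl"}}),
    (re.compile(r"$"), {"cat": "noun", "agr": {"num": "sg"}}),
]

# Hypothetical toy lexicon keyed by transliterated stem.
LEXICON = {
    "ketab": {"cat": "noun", "lemma": "ketab"},
    "raft": {"cat": "verb", "lemma": "raft"},
}


def analyze(word):
    """Return every feature structure licensed by a rule and the lexicon."""
    analyses = []
    for pattern, rule_fs in SUFFIX_RULES:
        match = pattern.search(word)
        if match is None:
            continue
        stem = word[:match.start()]
        entry = LEXICON.get(stem)
        if entry is None:
            continue
        fs = unify(entry, rule_fs)
        if fs is not None:  # drop analyses whose features clash
            analyses.append(fs)
    return analyses


print(analyze("ketabha"))
print(analyze("raftha"))
```

In this sketch the noun stem with the plural ending yields a single analysis carrying `num: pl`, while attaching the same ending to the verb stem yields no analysis, because the rule's `cat: noun` fails to unify with the entry's `cat: verb`. A real transducer would encode such constraints in its states instead of filtering after the fact.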
Keywords
part of speech, complex system, regular expression, morphological analysis, finite state transducer, machine translation