A Research on Length Based Sentence Alignment for Chinese-English Parallel Corpus

Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference (2008)

Abstract
Many existing length-based Chinese-English sentence alignment methods compute sentence length as a number of bytes. In this paper, we examine the effectiveness of six ways of computing sentence length, which take, respectively, the number of verbs, nouns, adjectives, content words, bytes, and all words in a sentence as its length. We also find most previous methods to be memory-consuming and inefficient, and propose an alignment method that saves memory and time by grouping sentences for alignment. Our experimental results show that counting all words in the sentence length computation further improves alignment performance, giving 99.01% precision and 99.5% recall.
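The six length measures named in the abstract can be illustrated with a minimal sketch. It assumes each sentence has already been segmented and part-of-speech tagged; the function name `sentence_length`, the tag names (`NOUN`, `VERB`, `ADJ`), the definition of content words as nouns, verbs, and adjectives, and the UTF-8 encoding are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple

# (surface form, coarse POS tag). The tag set is hypothetical.
Token = Tuple[str, str]

# Assumption: content words are nouns, verbs, and adjectives.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}

# Maps a single-category measure name to the tag it counts.
SINGLE_TAG = {"verbs": "VERB", "nouns": "NOUN", "adjectives": "ADJ"}


def sentence_length(text: str, tokens: List[Token], measure: str,
                    encoding: str = "utf-8") -> int:
    """Return the length of one sentence under the chosen measure."""
    if measure == "bytes":
        # Byte length depends on the encoding; UTF-8 is an assumption here.
        return len(text.encode(encoding))
    if measure == "words":
        return len(tokens)
    if measure == "content":
        return sum(1 for _, tag in tokens if tag in CONTENT_TAGS)
    if measure in SINGLE_TAG:
        wanted = SINGLE_TAG[measure]
        return sum(1 for _, tag in tokens if tag == wanted)
    raise ValueError(f"unknown measure: {measure}")


# Toy sentence pair, segmented and tagged by hand.
zh_text = "我喜欢红苹果"
zh_tokens = [("我", "PRON"), ("喜欢", "VERB"), ("红", "ADJ"), ("苹果", "NOUN")]
en_text = "I like red apples"
en_tokens = [("I", "PRON"), ("like", "VERB"), ("red", "ADJ"), ("apples", "NOUN")]

for m in ("verbs", "nouns", "adjectives", "content", "bytes", "words"):
    print(m, sentence_length(zh_text, zh_tokens, m),
          sentence_length(en_text, en_tokens, m))
```

In this toy pair the word-based counts agree across the two languages while the byte counts differ, which hints at why the choice of measure could matter to a length-based aligner; the figures reported in the abstract come from the authors' experiments, not from this sketch.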
Keywords
different way, sentence alignment, alignment method, chinese-english parallel corpus, memory consuming, content word, sentence length computation, sentence length, existing length, alignment performance, grouping sentence, chinese-english sentence alignment method, computational modeling, nlp, computational linguistics, natural language processing, pattern matching, correlation, noun