TAMS: Translation-Assisted Morphological Segmentation

Enora Rice, Ali Marashian, Luke Gessler, Alexis Palmer, Katharina von der Wense

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2024)

Abstract
Canonical morphological segmentation is the process of analyzing words into the standard (aka underlying) forms of their constituent morphemes. This is a core task in language documentation, and NLP systems have the potential to dramatically speed up this process. But in typical language documentation settings, training data for canonical morpheme segmentation is scarce, making it difficult to train high quality models. However, translation data is often much more abundant, and, in this work, we present a method that attempts to leverage this data in the canonical segmentation task. We propose a character-level sequence-to-sequence model that incorporates representations of translations obtained from pretrained high-resource monolingual language models as an additional signal. Our model outperforms the baseline in a super-low resource setting but yields mixed results on training splits with more data. While further work is needed to make translations useful in higher-resource settings, our model shows promise in severely resource-constrained settings.
Key words
Neural Machine Translation, Language Modeling, Part-of-Speech Tagging, Syntax-based Translation Models, Statistical Machine Translation