Unsupervised Abbreviation Disambiguation Contextual disambiguation using word embeddings.

Manuel R. Ciosici, Tobias Sommer,Ira Assent

arXiv: Computation and Language(2019)

引用 24|浏览23
暂无评分
摘要
As abbreviations often have several distinct meanings, disambiguating their intended meaning in context is important for Machine Reading tasks such as document search, recommendation and question answering. Existing approaches mostly rely on labelled examples of abbreviations and their correct long forms, which is costly to generate and limits their applicability and flexibility. Importantly, they need to be subjected to a full empirical evaluation, which is cumbersome in practice. In this paper, we present an entirely unsupervised abbreviation disambiguation method (called UAD) that picks up abbreviation definitions from text. Creating distinct tokens per meaning, we learn context representations as word embeddings. We demonstrate how to further boost abbreviation disambiguation performance by obtaining better context representations from additional unstructured text. Our method is the first abbreviation disambiguation approach which features a transparent model that allows performance analysis without requiring full-scale evaluation, making it highly relevant for real-world deployments. In our thorough empirical evaluation, UAD achieves high performance on large real world document data sets from different domains and outperforms both baseline and state-of-the-art methods. UAD scales well and supports thousands of abbreviations with many different meanings with a single model.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要