FETD\(^{2}\): A Framework for Enabling Textual Data Denoising via Robust Contextual Embeddings

Govind, Céline Alec, Jean-Luc Manguin, Marc Spaniol

Springer

Abstract
National libraries, institutions, and (inter-)national projects have increased their efforts to preserve textual content, including data that was not born digital, for future generations. These activities have resulted in novel initiatives to preserve cultural heritage through digitization. However, a systematic approach to Textual Data Denoising (TD\(^{2}\)) is still in its infancy and is commonly limited to a single dominant language (mostly English). Digital preservation, by contrast, requires a universal approach. To this end, we introduce a “Framework for Enabling Textual Data Denoising via robust contextual embeddings” (FETD\(^{2}\)). FETD\(^{2}\) improves data quality by training language-specific data denoising models on a small amount of language-specific training data. Our approach employs bi-directional language modeling to produce noise-resilient deep contextualized embeddings. In experiments, we show its superiority over the state of the art.
Keywords
Textual Data Denoising, AI, Contextual representations