A Hierarchical Clustering Approach to Fuzzy Semantic Representation of Rare Words in Neural Machine Translation

IEEE Transactions on Fuzzy Systems (2020)

Abstract
Rare words are usually replaced with a single <unk> token in current encoder–decoder neural machine translation, which obscures the context and makes translation modeling harder. In this article, we propose a fuzzy semantic representation (FSR) method for rare words that groups them through hierarchical clustering, and we integrate it into the encoder–decoder framework. This hierarchical structure can compensate for the semantic information on both the source and target sides and provides fuzzy context information to capture the semantics of rare words. The introduced FSR can also alleviate data sparseness, which is the bottleneck in dealing with rare words in neural machine translation. In particular, our method is easily extended to the transformer-based neural machine translation model, where it learns FSRs for all in-vocabulary words, in addition to rare words, to enhance the sentence representations. Our experiments on Chinese-to-English translation tasks confirm a significant improvement in translation quality brought by the proposed method.
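The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the average-linkage criterion, the cosine distance, the frequency threshold `min_count`, the cluster count `num_clusters`, and the `<unk_k>` token naming are all assumptions made for illustration, and the sketch covers only the step of replacing rare words with cluster-level tokens, not the FSR itself or its integration into the encoder–decoder model.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cluster_rare_words(vectors, num_clusters):
    """Bottom-up (agglomerative) clustering with average linkage.

    `vectors` maps each word to an embedding; the result is a list of
    word lists. Quadratic search per merge, so suitable for toy inputs only.
    """
    clusters = [[w] for w in sorted(vectors)]
    while len(clusters) > max(num_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [cosine_distance(vectors[a], vectors[b])
                         for a in clusters[i] for b in clusters[j]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] += clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters


def replace_rare_words(sentences, vectors, min_count=2, num_clusters=2):
    """Replace each rare word with a cluster-level token (hypothetical naming).

    Rare words without an embedding fall back to the plain <unk> token.
    """
    counts = Counter(w for s in sentences for w in s)
    rare = {w for w, c in counts.items() if c < min_count}
    clusters = cluster_rare_words(
        {w: vectors[w] for w in rare if w in vectors}, num_clusters)
    token_of = {w: f"<unk_{k}>"
                for k, members in enumerate(clusters) for w in members}
    return [[token_of.get(w, "<unk>") if w in rare else w for w in s]
            for s in sentences]


# Toy corpus and toy 2-D embeddings (made up for this example).
corpus = [["the", "kitten", "sleeps"], ["the", "puppy", "sleeps"],
          ["the", "paris", "trip"], ["the", "berlin", "trip"]]
toy_vectors = {"kitten": (1.0, 0.1), "puppy": (0.9, 0.2),
               "paris": (0.1, 1.0), "berlin": (0.2, 0.9)}
print(replace_rare_words(corpus, toy_vectors))
```

In this sketch, semantically close rare words share one token, so the model sees a coarser but still informative symbol where a single <unk> would erase the distinction entirely.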
Keywords
Fuzzy semantic representation (FSR), hierarchical clustering, neural network, neural machine translation (NMT)