What Is Best For Spoken Language Understanding: Small But Task-Dependant Embeddings Or Huge But Out-Of-Domain Embeddings?

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Abstract
Word embeddings have proven to be a great asset for several Natural Language and Speech Processing tasks. While they have already been evaluated on various NLP tasks, their evaluation on spoken language understanding (SLU) is less studied. The goal of this study is two-fold: firstly, it focuses on the semantic evaluation of common word embedding approaches for the SLU task; secondly, it investigates the use of two different data sets to train the embeddings: a small and task-dependent corpus, or a huge and out-of-domain corpus. Experiments are carried out on 5 benchmark corpora (ATIS, SNIPS, SNIPS70, M2M, MEDIA), for which a relevance ranking has been proposed in the literature. Interestingly, the performance of the embeddings is independent of the difficulty of the corpora. Moreover, the embeddings trained on a huge and out-of-domain corpus yield better results than the ones trained on a small and task-dependent corpus.
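The semantic evaluation of embedding spaces mentioned above typically relies on comparing word-pair similarities. A minimal sketch of that idea, using purely hypothetical toy vectors (not the paper's actual embeddings) for the two training regimes:

```python
import math

# Hypothetical toy embeddings for illustration only: one set imagined as
# trained on a small in-domain corpus, the other on a huge out-of-domain one.
small_in_domain = {"flight": [0.9, 0.1, 0.0], "book": [0.7, 0.3, 0.1]}
huge_out_of_domain = {"flight": [0.6, 0.5, 0.2], "book": [0.5, 0.4, 0.3]}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Probe the same word pair in each embedding space; a semantic evaluation
# would aggregate many such pairs against human similarity judgments.
for name, emb in [("small/in-domain", small_in_domain),
                  ("huge/out-of-domain", huge_out_of_domain)]:
    print(name, round(cosine(emb["flight"], emb["book"]), 3))
```

This only illustrates the comparison mechanism; the study itself evaluates full embedding models on the five SLU benchmarks listed above.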
Keywords
spoken language understanding, word embeddings