
Extensions of recurrent neural network language model

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5528-5531, 2011

Citations: 1352 | Views: 266
Indexed in: EI, WOS
Abstract

We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is the computational complexity. In this work, we show approaches that lead to more than 1...
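The computational complexity the abstract refers to is dominated by the output layer of the simple recurrent (Elman) network used as the language model: every step computes a softmax over the full vocabulary. A minimal forward-step sketch, assuming the standard RNN LM formulation from [4] (the weight names and sizes below are illustrative, not the authors' code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Illustrative sizes: vocabulary V, hidden layer H.
V, H = 10000, 200
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(H, V))     # input (1-of-V word) -> hidden weights
W = rng.normal(scale=0.1, size=(H, H))     # hidden -> hidden (recurrent) weights
Vout = rng.normal(scale=0.1, size=(V, H))  # hidden -> output weights

def forward_step(word_idx, s_prev):
    """One time step: s(t) = sigmoid(U w(t) + W s(t-1)), y(t) = softmax(V s(t))."""
    s = sigmoid(U[:, word_idx] + W @ s_prev)   # hidden state, cost ~ H*H
    y = softmax(Vout @ s)                      # next-word distribution, cost ~ V*H  <- bottleneck
    return s, y

s = np.zeros(H)
s, y = forward_step(42, s)   # y[w] = P(next word = w | history so far)
```

The V*H term in the output layer is what the paper's class-based factorization (Table 3) attacks.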

Introduction
  • Statistical models of natural language are a key part of many systems today.
  • There has always been a struggle between those who follow the statistical way and those who claim that linguistics and expert knowledge must be adopted to build models of natural language.
  • The criticism of linguistic approaches was even more straightforward: despite all the efforts of linguists, statistical approaches dominated whenever performance in real-world applications was the measure.
Highlights
  • Statistical models of natural language are a key part of many systems today
  • The most serious criticism of statistical approaches is that no true understanding occurs in these models, which are typically limited by the Markov assumption and represented by n-gram models
  • We presented, to our knowledge, the first published results using an RNN trained by backpropagation through time (BPTT) in the context of statistical language modeling
  • We have shown how to obtain significantly better accuracy from RNN models by combining them linearly (see the interpolation sketch after this list)
  • We plan to show how to further improve accuracy by combining statically and dynamically evaluated RNN models [4] and by using complementary language modeling techniques to obtain even lower perplexity
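"Combining them linearly" means interpolating the per-word predictive distributions of the individual models with non-negative weights that sum to one; the same scheme covers interpolation with the KN backoff model mentioned in Table 1. A minimal sketch under that assumption (the toy distributions and weights below are made up for illustration):

```python
import numpy as np

def interpolate(distributions, weights):
    """Linear combination of next-word distributions: P(w) = sum_k lambda_k * P_k(w)."""
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "interpolation weights must sum to 1"
    return sum(lam * p for lam, p in zip(weights, distributions))

# Toy example: two RNN models and a backoff model predicting over a 4-word vocabulary.
p_rnn1 = np.array([0.70, 0.10, 0.10, 0.10])
p_rnn2 = np.array([0.60, 0.20, 0.10, 0.10])
p_kn   = np.array([0.40, 0.30, 0.20, 0.10])   # e.g. a Kneser-Ney backoff model
p_mix  = interpolate([p_rnn1, p_rnn2, p_kn], [0.4, 0.4, 0.2])
```

In practice the weights would be tuned on held-out data; equal weights over the RNN models are a common default.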
Results
  • The authors have shown how to obtain significantly better accuracy from RNN models by combining them linearly.
  • The authors plan to show how to further improve accuracy by combining statically and dynamically evaluated RNN models [4] (the static vs. dynamic evaluation protocol is sketched after this list) and by using complementary language modeling techniques to obtain even lower perplexity
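Dynamic evaluation of an RNN LM, as introduced in [4], means the model continues to be updated on the test text right after scoring it, while the static copy stays fixed; the two then make complementary predictions that can be interpolated. The sketch below only illustrates that evaluation protocol, with a trivial adaptive unigram model standing in for the RNN so the example stays self-contained:

```python
import numpy as np

class AdaptiveUnigram:
    """Stand-in for an RNN LM: a smoothed unigram model that can keep learning."""
    def __init__(self, vocab_size):
        self.counts = np.ones(vocab_size)          # add-one smoothing
    def predict(self):
        return self.counts / self.counts.sum()
    def update(self, word):
        self.counts[word] += 1

def log_prob(model, test_words, dynamic=False):
    total = 0.0
    for w in test_words:
        total += np.log(model.predict()[w])  # score the word first
        if dynamic:
            model.update(w)                  # dynamic evaluation: adapt on the test stream
    return total

test = [0, 1, 0, 0, 2, 0, 1]
static_lp  = log_prob(AdaptiveUnigram(3), test, dynamic=False)
dynamic_lp = log_prob(AdaptiveUnigram(3), test, dynamic=True)
```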
Conclusion
  • The authors presented, to their knowledge, the first published results using an RNN trained by BPTT in the context of statistical language modeling.
  • The authors have shown how to obtain significantly better accuracy from RNN models by combining them linearly.
  • The authors plan to show how to further improve accuracy by combining statically and dynamically evaluated RNN models [4] and by using complementary language modeling techniques to obtain even lower perplexity.
  • In the ongoing ASR experiments, the authors have observed good correlation between perplexity improvements and word error rate reduction (perplexity is sketched after this list)
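Perplexity, the metric used in Tables 1-3 and correlated with word error rate above, is the exponentiated average negative log-probability the model assigns to the test words. A minimal sketch (the probabilities below are made up):

```python
import numpy as np

def perplexity(word_probs):
    """PPL = exp(-(1/N) * sum_i ln P(w_i | history_i)); lower is better."""
    logp = np.log(np.asarray(word_probs, dtype=float))
    return float(np.exp(-logp.mean()))

# e.g. probabilities a model assigned to five consecutive test words:
print(perplexity([0.1, 0.05, 0.2, 0.01, 0.08]))
```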
Tables
  • Table 1: Comparison of different language modeling techniques on the Penn Corpus. Models are interpolated with the KN backoff model
  • Table 2: Comparison of different neural network architectures on the Penn Corpus (1M words) and Switchboard (4M words)
  • Table 3: Perplexities on the Penn Corpus with factorization of the output layer by the class model (this factorization is sketched after the table list). All models share the same basic configuration (200 hidden units, BPTT=5). The Full model is the baseline and uses the whole 10K vocabulary instead of classes
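The class factorization in Table 3 replaces the single softmax over the 10K vocabulary with two smaller ones: the network first predicts the class of the next word and then the word within that class, so P(w | history) = P(c(w) | s(t)) * P(w | c(w), s(t)), and the per-step output cost drops from |V|*H to roughly (|C| + |V|/|C|)*H multiplications. A minimal scoring sketch, with hypothetical weight names and a random class assignment standing in for the paper's frequency-based classes:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Illustrative sizes: 10K words partitioned into 100 classes.
V, C, H = 10000, 100, 200
rng = np.random.default_rng(0)
word_to_class = rng.integers(0, C, size=V)   # hypothetical class assignment c(w)
class_members = [np.flatnonzero(word_to_class == c) for c in range(C)]
W_class = rng.normal(scale=0.1, size=(C, H))  # hidden -> class scores
W_word  = rng.normal(scale=0.1, size=(V, H))  # hidden -> word scores (rows sliced per class)

def word_probability(w, hidden):
    """P(w | history) = P(c(w) | s(t)) * P(w | c(w), s(t)); only |C| + |class| outputs are evaluated."""
    c = word_to_class[w]
    p_class = softmax(W_class @ hidden)[c]
    members = class_members[c]
    p_word_in_class = softmax(W_word[members] @ hidden)[np.where(members == w)[0][0]]
    return p_class * p_word_in_class

print(word_probability(42, np.zeros(H)))
```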
Funding
  • This work was partly supported by European project DIRAC (FP6027787), Grant Agency of Czech Republic project No 102/08/0707, Czech Ministry of Education project No MSM0021630528 and by BUT FIT grant No FIT-10-S-2
References
  • [1] Yoshua Bengio, Rejean Ducharme, and Pascal Vincent. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155.
  • [2] Joshua T. Goodman. 2001. A bit of progress in language modeling, extended version. Technical report MSR-TR-2001-72.
  • [3] Holger Schwenk and Jean-Luc Gauvain. 2005. Training neural network language models on very large corpora. In Proc. Joint Conference HLT/EMNLP.
  • [4] Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proc. INTERSPEECH.
  • [5] Yoshua Bengio and Yann LeCun. 2007. Scaling learning algorithms towards AI. In Large-Scale Kernel Machines, MIT Press.
  • [6] Jeffrey L. Elman. 1990. Finding structure in time. Cognitive Science, 14:179-211.
  • [7] Mikael Boden. 2002. A guide to recurrent neural networks and backpropagation. In the Dallas project.
  • [8] Peng Xu. 2005. Random forests and the data sparseness problem in language modeling. Ph.D. thesis, Johns Hopkins University.
  • [9] Denis Filimonov and Mary Harper. 2009. A joint language model with fine-grain syntactic tags. In EMNLP.
  • [10] Ahmad Emami and Frederick Jelinek. 2004. Exact training of a neural syntactic language model. In ICASSP.
  • [11] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. 1986. Learning internal representations by back-propagating errors. Nature, 323:533-536.
  • [12] Tomas Mikolov, Jiri Kopecky, Lukas Burget, Ondrej Glembek, and Jan Cernocky. 2009. Neural network based language models for highly inflective languages. In Proc. ICASSP.
  • [13] F. Morin and Y. Bengio. 2005. Hierarchical probabilistic neural network language model. In AISTATS.
  • [14] J. Goodman. 2001. Classes for fast maximum entropy training. In Proc. ICASSP.
  • [15] A. Alexandrescu and K. Kirchhoff. 2006. Factored neural language models. In HLT-NAACL.
  • [16] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5:157-166.
  • [17] Y. Bengio and J.-S. Senecal. 2008. Adaptive importance sampling to accelerate training of a neural probabilistic language model. IEEE Transactions on Neural Networks.
  • [18] Ahmad Emami. 2006. A neural syntactic language model. Ph.D. thesis, Johns Hopkins University.