End-To-End Training Of Acoustic Models For Large Vocabulary Continuous Speech Recognition With Tensorflow

Ehsan Variani,Tom Bagby,Erik McDermott,Michiel Bacchiani

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION（2017）

引用 28|浏览59

暂无评分

摘要

This article discusses strategies for end-to-end training of state-of-the-art acoustic models for Large Vocabulary Continuous Speech Recognition (LVCSR), with the goal of leveraging TensorFlow components so as to make efficient use of large-scale training sets, large model sizes, and high-speed computation units such as Graphical Processing Units (GPUs). Benchmarks are presented that evaluate the efficiency of different approaches to batching of training data, unrolling of recurrent acoustic models, and device placement of TensorFlow variables and operations. An overall training architecture developed in light of those findings is then described. The approach makes it possible to take advantage of both data parallelism and high speed computation on GPU for state-of-the-art sequence training of acoustic models. The effectiveness of the design is evaluated for different training schemes and model sizes, on a 15.000 hour Voice Search task.

查看译文

关键词

speech recognition, tensorflow

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要