Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing.

Abhinav Garg, Gowtham P. Vadisetti, Dhananjaya Gowda, Sichen Jin, Aditya Jayasimha, Youngho Han, Jiyeon Kim, Junmo Park, Kwangyoun Kim, Sooyeon Kim, Young-Yoon Lee, Kyungbo Min, Chanwoo Kim

INTERSPEECH (2020)

Abstract

In this paper, we present our streaming on-device end-to-end speech recognition solution for a privacy-sensitive voice-typing application, which primarily involves typing users' private details and passwords. We highlight challenges specific to the voice-typing scenario in the Korean language and propose solutions to these problems within the framework of a streaming attention-based speech recognition system. Some important challenges in voice-typing are the choice of output units, the coupling of multiple characters into longer byte-pair encoded units, and the lack of sufficient training data. Apart from customizing a high-accuracy open-domain streaming speech recognition model for voice-typing applications, we retain the performance of the model on open-domain tasks without significant degradation. We also explore domain biasing using shallow fusion with a weighted finite-state transducer (WFST). We obtain approximately 13% relative word error rate (WER) improvement on our internal Korean voice-typing dataset without a WFST and about 30% additional WER improvement with WFST fusion.
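
The abstract does not describe how the WFST fusion is implemented. As a rough illustration of shallow fusion in general (not the authors' method), the sketch below adds a weighted biasing log-score to each candidate token's ASR log-probability at a single decoding step. The function names, the `fusion_weight` value, the `floor` for unseen tokens, and the use of a plain dictionary in place of a real WFST are all assumptions made for this example.

```python
import math


def shallow_fusion_scores(asr_log_probs, bias_log_probs, fusion_weight=0.5, floor=1e-6):
    """Return fused scores: log p_asr(token) + fusion_weight * log p_bias(token).

    asr_log_probs:  dict of token -> log-probability from the ASR decoder.
    bias_log_probs: dict of token -> log-probability from a biasing model.
                    A plain dict stands in for a WFST here (assumption).
    floor:          probability assigned to tokens the biasing model does not
                    cover (hypothetical back-off value).
    """
    floor_lp = math.log(floor)
    return {
        token: asr_lp + fusion_weight * bias_log_probs.get(token, floor_lp)
        for token, asr_lp in asr_log_probs.items()
    }


def best_token(scores):
    """Pick the highest-scoring token."""
    return max(scores, key=scores.get)


# Toy single-step example with made-up probabilities.
asr_step = {"a": math.log(0.5), "b": math.log(0.4), "c": math.log(0.1)}
bias_step = {"a": math.log(0.1), "b": math.log(0.9)}

fused_step = shallow_fusion_scores(asr_step, bias_step, fusion_weight=0.5)
print(best_token(asr_step), best_token(fused_step))
```

In this toy step the ASR model alone prefers `a`, while the fused score prefers `b` because the biasing model strongly favors it. A real system would apply the same combination to every hypothesis in the beam at each step, with the biasing score read from the WFST state reached by that hypothesis.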

Key words

on-device, end-to-end, privacy-sensitive, voice-typing
