End-to-End Speech Recognition for Arabic Dialects

Nasr Seham, Duwairi Rehab, Quwaider Muhannad

Arabian Journal for Science and Engineering (2023)

Abstract

Automatic speech recognition (ASR), or speech-to-text, is a challenging human–machine interaction task that has attracted many researchers and companies such as Google, Amazon, and Facebook. End-to-end speech recognition is still in its infancy for low-resource languages such as Arabic and its dialects because transcribed corpora are lacking. In this paper, we introduce novel transcribed corpora for Yamani Arabic, Jordanian Arabic, and multi-dialectal Arabic. We also design several baseline sequence-to-sequence deep neural models for end-to-end speech recognition of Arabic dialects. Moreover, Mozilla’s DeepSpeech2 model was trained from scratch on our corpora. The Bidirectional Long Short-Term Memory (Bi-LSTM) with attention model achieved encouraging results on the Yamani speech corpus, with 59% Word Error Rate (WER) and 51% Character Error Rate (CER). On the Jordanian speech corpus, the same model achieved 83% WER and 70% CER; on the multi-dialectal Yem-Jod-Arab speech corpus, it achieved 53% WER and 39% CER. The DeepSpeech2 model outperformed the baseline models, with 31% WER and 24% CER on the Yamani corpus and 68% WER and 40% CER on the Jordanian corpus. Lastly, DeepSpeech2 also gave better results on the multi-dialectal Arabic corpus, with 30% WER and 20% CER.
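
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how WER and CER are commonly computed: the edit (Levenshtein) distance between the reference transcript and the model output, divided by the reference length. The function names and the choice to ignore spaces when computing CER are assumptions made for illustration; the abstract does not describe the authors' scoring tool or text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions,
    and deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate over non-space characters (an assumption here)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Under this definition, a WER of 59% means that the number of word-level edits equals 59% of the number of reference words. Because insertions are counted, both rates can exceed 100%.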

Keywords

Automatic speech recognition, Arabic dialectal ASR, End-to-end Arabic ASR, Yemeni ASR, Jordanian ASR
