Toward Accurate and Flexible Arabic Speech Recognition: A Comprehensive Framework

Ezzaldeen Mahyoub Naji, Ajit A Maslekar, Zeyad A.T. Ahmed, Ali Manour Almadani, Mohammed Tawfik, Alhasan Alharbi

2023 Global Conference on Information Technologies and Communications (GCITC) (2023)

Abstract
A continuous speech recognition (ASR/CSR) system is crucial in any language for interaction between people and computers or machines. ASR systems play a key role in numerous applications, including voice-activated personal assistants and speech-to-text transcription. This paper aims to outline breakthroughs in CSR for the Arabic language and to provide a comprehensive framework for Arabic CSR. The study addresses several aspects of speech recognition, including acoustic models, language models, and corpora. A review of current Arabic speech recognition (SR) research was conducted, with priority given to Arabic-language speech. We outline the research approach, which combines end-to-end (E2E) and modular approaches to improve the flexibility and performance of SR systems. The study proposes a comprehensive framework for building an accurate and flexible CSR system. This framework uses an ensemble technique to integrate two CSR models, one representing the E2E approach and one representing the modular approach. The study discusses audio capture, audio pre-processing techniques, feature extraction methods, and the use of acoustic and language models. It applies machine learning techniques, including deep neural networks (DNN), hidden Markov models (HMM), transfer learning, and the ensembling of more than two models, to train and optimise the ASR systems.
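The abstract does not say how the E2E and modular models are combined. One common option, assumed here purely for illustration and not confirmed as the authors' method, is score-level fusion of the N-best hypotheses from each system: every candidate transcript receives the interpolated score w * s_E2E + (1 - w) * s_modular, and the highest-scoring candidate wins. The function name `fuse_nbest`, the interpolation weight `weight`, and the `floor` score for hypotheses missing from one list are all hypothetical choices for this sketch.

```python
def fuse_nbest(e2e_nbest, modular_nbest, weight=0.5, floor=-1e4):
    """Hypothetical score-level ensemble of two ASR N-best lists.

    e2e_nbest, modular_nbest: dicts mapping hypothesis text -> log-score.
    weight: interpolation weight for the E2E system (assumed, tune on dev data).
    floor: log-score assigned when a hypothesis is absent from one list (assumed).
    Returns (hypothesis, fused_score) pairs sorted best-first.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    fused = {}
    # Consider every hypothesis proposed by either system.
    for hyp in set(e2e_nbest) | set(modular_nbest):
        s_e2e = e2e_nbest.get(hyp, floor)
        s_mod = modular_nbest.get(hyp, floor)
        # Linear interpolation of the two systems' log-scores.
        fused[hyp] = weight * s_e2e + (1.0 - weight) * s_mod
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy Arabic N-best lists with made-up log-scores, for illustration only.
    e2e = {"مرحبا بكم": -1.2, "مرحبا بك": -1.5, "مرحبا": -3.0}
    modular = {"مرحبا بك": -0.9, "مرحبا بكم": -2.0}
    for hyp, score in fuse_nbest(e2e, modular, weight=0.5):
        print(f"{score:.2f}\t{hyp}")
```

In this toy run the E2E system alone would prefer "مرحبا بكم", but the modular system's much stronger score for "مرحبا بك" tips the fused ranking toward it. The weight would normally be tuned on a development set; word-level voting schemes such as ROVER are an alternative when the systems' hypotheses differ only in a few words.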
Keywords
Speech recognition, Arabic language, deep neural network, language model, acoustic model