Towards more flexible human-machine speech communication

2023 31st Telecommunications Forum (TELFOR), 2023

Abstract
The research presented in this paper addresses challenges in developing more flexible systems for speech communication between humans and machines. Specifically, the paper presents the main results of the speech technology research group at the Faculty of Technical Sciences, University of Novi Sad, Serbia, in the development of a multilingual human-machine communication system. The approach, which fully exploits recent advances in machine learning and artificial intelligence, extends the basic functionality of a text-to-speech system by using neural network embeddings to make it more flexible with respect to speaking style, speaker identity and even language. At the same time, automatic speech recognition is made more adaptable to different channels and speakers by means of machine learning algorithms originally developed for image processing. Domain transfer and the creation of dynamic dictionaries have played a crucial role in the most recent developments in speech recognition. The research focuses on cases in which only a very small amount of adaptation data is available, which increases the practical usability of the proposed approaches in many real-world scenarios.
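As a rough illustration of what conditioning a text-to-speech model "by means of neural network embedding" typically involves, the sketch below looks up a speaker, a style and a language vector and concatenates them to every frame of a text-encoder output, so that a downstream decoder can be steered toward a given voice, style and language. This is a generic, minimal sketch under stated assumptions, not the authors' architecture, which the abstract does not describe: the `EmbeddingTable` class, the `condition_encoder_outputs` function, the label names and the dimensions are all hypothetical, and the random vectors stand in for parameters that a real system would learn during training.

```python
import random
from typing import Dict, List

Vector = List[float]


class EmbeddingTable:
    """Hypothetical lookup table mapping a discrete label (a speaker, a style
    or a language) to a vector. The vectors here are random placeholders;
    in a real system they would be trained jointly with the TTS model."""

    def __init__(self, labels: List[str], dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.table: Dict[str, Vector] = {
            label: [rng.gauss(0.0, 0.1) for _ in range(dim)] for label in labels
        }

    def __call__(self, label: str) -> Vector:
        return self.table[label]


def condition_encoder_outputs(
    frames: List[Vector],
    speaker: Vector,
    style: Vector,
    language: Vector,
) -> List[Vector]:
    """Append the same speaker, style and language embeddings to every
    encoder frame. Returns new lists; the input frames are not modified."""
    conditioning = speaker + style + language  # list concatenation
    return [frame + conditioning for frame in frames]


if __name__ == "__main__":
    speakers = EmbeddingTable(["speaker_a", "speaker_b"], dim=4, seed=1)
    styles = EmbeddingTable(["neutral", "happy"], dim=2, seed=2)
    languages = EmbeddingTable(["sr", "en"], dim=2, seed=3)

    # Stand-in for a text-encoder output: 3 frames with 8 features each.
    encoder_frames = [[0.0] * 8 for _ in range(3)]
    out = condition_encoder_outputs(
        encoder_frames, speakers("speaker_b"), styles("happy"), languages("sr")
    )
    print(len(out), len(out[0]))  # prints "3 16" (8 + 4 + 2 + 2 features)
```

Switching the voice, style or language then amounts to passing a different label to the corresponding table. Concatenation is only one option; adding the embeddings to the encoder frames, or injecting them into the decoder instead, are common alternatives.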
Keywords
speech synthesis, speech recognition, audio-visual, embedding, neural networks