Analysis of transcription tools for Brazilian Portuguese with focus on disfluency detection.

Simpósio Brasileiro de Fatores Humanos em Sistemas Computacionais (IHC), 2022

Abstract
Advances in technology and easier access to it have led to greater demand for applications that interact through voice recognition, since multimedia content has become a valuable source for computational analysis. Accordingly, vocal representations are extracted for applications in areas such as convenience, accessibility, security and sentiment analysis. The main challenge of speech recognition lies in the variability of speakers, environments and devices, and in the presence of disfluencies in speech. These factors affect transcription tools, which are essential when users interact by voice and text must be produced from that interaction. In particular, detecting disfluencies can help identify aspects related to the emotional state of the speaker. This work presents an analysis of text transcription tools with a focus on disfluency detection, covering the most commonly used evaluation metrics and the databases used for evaluation in the context of Brazilian Portuguese. An experiment was conducted to evaluate the performance of three tools (IBM Watson, Google Speech and Vosk). Google Speech achieved the best performance, with an average Word Error Rate of 9.69% for fluent sentences and 17.15% for disfluent sentences, followed by IBM Watson with 11.86% and 23.44%, and Vosk with 14.39% and 22.56%, respectively.
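For readers unfamiliar with the metric, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the tool's output, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name, the lowercasing and whitespace tokenization, and the example sentences are assumptions for illustration and are not taken from the paper's experimental setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace; the paper's
    actual normalization steps are not described in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the transcript drops the filled pause "é".
reference = "eu é quero um café"
hypothesis = "eu quero um café"
print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a tool that silently drops a filled pause is charged one deletion, which is one way disfluent sentences can end up with a higher error rate than fluent ones.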