A Whisper transformer for audio captioning trained with synthetic captions and transfer learning

Marek Kadlčík, Adam Hájek, Jürgen Kieslich, Radosław Winiecki

CoRR (2023)

Abstract

The field of audio captioning has seen significant advancements in recent years, driven by the availability of large-scale audio datasets and progress in deep learning techniques. In this technical report, we present our approach to audio captioning, focusing on the use of a pretrained speech-to-text Whisper model and pretraining on synthetic captions. We describe our training procedures and report experimental results covering variations in model size, dataset mixtures, and other hyperparameters. Our findings demonstrate the impact of different training strategies on the performance of the audio captioning model. Our code and trained models are publicly available on GitHub and Hugging Face Hub.

Keywords

audio captioning, captions, Whisper transformer
