Speech Translation with Large Language Models: An Industrial Practice
CoRR (2023)
Abstract
Given the great success of large language models (LLMs) across various tasks,
we introduce LLM-ST, a novel and effective speech translation model built on
a pre-trained LLM. By integrating the LLM with a speech encoder and employing
multi-task instruction tuning,
LLM-ST can produce accurate timestamped transcriptions and translations, even
from long audio inputs. Furthermore, we find that Chain-of-Thought (CoT)
prompting can benefit LLM-ST. Through rigorous experiments on English and Chinese
datasets, we showcase the exceptional performance of LLM-ST, establishing a new
benchmark in the field of speech translation. Demo:
https://speechtranslation.github.io/llm-st/.