MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
arXiv (2024)
Abstract
The development of multimodal models has marked a significant step forward in
how machines understand videos. These models have shown promise in analyzing
short video clips. However, when it comes to longer formats like movies, they
often fall short. The main hurdles are the lack of high-quality, diverse video
data and the intensive work required to collect or annotate such data. In the
face of these challenges, we propose MovieLLM, a novel framework designed to
create synthetic, high-quality data for long videos. This framework leverages
the power of GPT-4 and text-to-image models to generate detailed scripts and
corresponding visuals. Our approach stands out for its flexibility and
scalability, making it a superior alternative to traditional data collection
methods. Our extensive experiments validate that the data produced by MovieLLM
significantly improves the performance of multimodal models in understanding
complex video narratives, overcoming the limitations of existing datasets
regarding scarcity and bias.