GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation

Ji Qi, Jifan Yu, Teng Tu, Kunyu Gao, Yifan Xu, Xinyu Guan, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li, Jie Tang

PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023 (2023)

Abstract

Despite the recent emergence of video captioning models, generating vivid, fine-grained video descriptions grounded in background knowledge (i.e., long and informative commentary on domain-specific scenes with appropriate reasoning) remains far from solved, even though it has valuable applications such as automatic sports narration. Based on soccer game videos and synchronized commentary data, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples, which introduces a challenging new task setting, Knowledge-grounded Video Captioning (KGVC). We evaluate existing state-of-the-art (SOTA) methods on this resource to identify directions for future improvement on this task. We hope that our data resource (now available at https://github.com/THU-KEG/goal) can serve researchers and developers interested in knowledge-grounded cross-modal applications.

Key words

Video Captioning, Knowledge Grounding, Open-source Dataset
