BioVL-QR: Egocentric Biochemical Video-and-Language Dataset Using Micro QR Codes
arXiv (2024)
Abstract
This paper introduces a biochemical vision-and-language dataset consisting of
24 egocentric experiment videos, the corresponding protocols, and
video-and-language alignments. The key challenge in the wet-lab domain is that
detecting equipment, reagents, and containers is difficult: the lab
environment is cluttered with objects on the table, and some objects are
indistinguishable. Therefore, previous studies assume that objects are
manually annotated and provided for downstream tasks, which is costly and
time-consuming. To address this issue, this study uses Micro QR Codes to
detect objects automatically. In a preliminary study, we found that detecting
objects using Micro QR Codes alone is still difficult because researchers
frequently manipulate objects, causing blur and occlusion. To address this,
we also propose a novel object labeling method that combines a Micro QR Code
detector with an off-the-shelf hand-object detector. As one application of
our dataset, we address the task of generating protocols from experiment
videos and find that our approach can generate accurate protocols.