Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images.
CoRR(2023)
摘要
Due to deteriorating environmental conditions and increasing human activity,
conservation efforts directed towards wildlife is crucial. Motion-activated
camera traps constitute an efficient tool for tracking and monitoring wildlife
populations across the globe. Supervised learning techniques have been
successfully deployed to analyze such imagery, however training such techniques
requires annotations from experts. Reducing the reliance on costly labelled
data therefore has immense potential in developing large-scale wildlife
tracking solutions with markedly less human labor. In this work we propose
WildMatch, a novel zero-shot species classification framework that leverages
multimodal foundation models. In particular, we instruction tune
vision-language models to generate detailed visual descriptions of camera trap
images using similar terminology to experts. Then, we match the generated
caption to an external knowledge base of descriptions in order to determine the
species in a zero-shot manner. We investigate techniques to build instruction
tuning datasets for detailed animal description generation and propose a novel
knowledge augmentation technique to enhance caption quality. We demonstrate the
performance of WildMatch on a new camera trap dataset collected in the
Magdalena Medio region of Colombia.
更多查看译文
关键词
trap,animal,images,species,zero-shot
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要