BioCLIP: A Vision Foundation Model for the Tree of Life
CVPR 2024 (2023)
Abstract
Images of the natural world, collected by a variety of cameras, from drones
to individual phones, are increasingly abundant sources of biological
information. There is an explosion of computational methods and tools,
particularly computer vision, for extracting biologically relevant information
from images for science and conservation. Yet most of these are bespoke
approaches designed for a specific task and are not easily adaptable or
extendable to new questions, contexts, and datasets. A vision model for general
organismal biology questions on images is of timely need. To approach this, we
curate and release TreeOfLife-10M, the largest and most diverse ML-ready
dataset of biology images. We then develop BioCLIP, a foundation model for the
tree of life, leveraging the unique properties of biology captured by
TreeOfLife-10M, namely the abundance and variety of images of plants, animals,
and fungi, together with the availability of rich structured biological
knowledge. We rigorously benchmark our approach on diverse fine-grained biology
classification tasks, and find that BioCLIP consistently and substantially
outperforms existing baselines (by 17% to 20% absolute). Intrinsic evaluation
reveals that BioCLIP has learned a hierarchical representation conforming to
the tree of life, shedding light on its strong generalizability. Our code,
models and data will be made available at
https://github.com/Imageomics/bioclip.