ComFe: Interpretable Image Classifiers With Foundation Models, Transformers and Component Features
arXiv (2024)
Abstract
Interpretable computer vision models can explain their reasoning by
comparing distances between image patch embeddings and prototypes within
a latent space. However, many of these approaches introduce additional
complexity, can require multiple training steps, and often incur a
performance cost relative to black-box approaches. In this work, we
introduce Component Features (ComFe), a novel interpretable-by-design image
classification approach that is highly scalable and can obtain better accuracy
and robustness in comparison to non-interpretable methods. Inspired by recent
developments in computer vision foundation models, ComFe uses a
transformer-decoder head and a hierarchical mixture-modelling approach with a
foundation model backbone to obtain higher accuracy compared to previous
interpretable models across a range of fine-grained vision benchmarks, without
the need to individually tune hyper-parameters for each dataset. With only
global image labels and no segmentation or part annotations, ComFe can identify
consistent component features within an image and determine which of these
features are informative in making a prediction.
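The prototype-based reasoning described above — comparing patch embeddings against learned prototypes and explaining the prediction via the nearest prototypes — can be illustrated with a minimal sketch. This is not ComFe's actual architecture (which uses a transformer decoder and hierarchical mixture modelling); the function and array shapes below are illustrative assumptions only.

```python
import numpy as np

def prototype_predict(patch_embeddings, prototypes, prototype_labels):
    """Illustrative prototype-based classification (not ComFe itself).

    patch_embeddings: (P, D) patch features from a backbone.
    prototypes:       (K, D) learned prototype vectors.
    prototype_labels: (K,) class index for each prototype.
    Returns the predicted class and, per patch, the index of the
    nearest prototype, which serves as the explanation.
    """
    # Euclidean distance from every patch to every prototype: shape (P, K)
    dists = np.linalg.norm(
        patch_embeddings[:, None, :] - prototypes[None, :, :], axis=-1
    )
    nearest = dists.argmin(axis=1)  # nearest prototype per patch

    # Score each prototype by its best match anywhere in the image,
    # then score each class by its best-matching prototype.
    proto_scores = -dists.min(axis=0)  # higher = closer match
    n_classes = int(prototype_labels.max()) + 1
    class_scores = np.full(n_classes, -np.inf)
    for k, c in enumerate(prototype_labels):
        class_scores[c] = max(class_scores[c], proto_scores[k])
    return int(class_scores.argmax()), nearest
```

The per-patch nearest-prototype assignment is what makes such models interpretable-by-design: each region of the image is attributed to a specific, inspectable prototype rather than to an opaque feature vector.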