Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
CoRR (2024)
Abstract
Recently, interpretable machine learning has re-explored concept bottleneck
models (CBMs), which first predict high-level concepts from the raw features
and then predict the target variable from the predicted concepts. A
compelling advantage of this model class is the user's ability to intervene on
the predicted concept values, affecting the model's downstream output. In this
work, we introduce a method to perform such concept-based interventions on
already-trained neural networks, which are not interpretable by design, given
an annotated validation set. Furthermore, we formalise the model's
intervenability as a measure of the effectiveness of concept-based
interventions and leverage this definition to fine-tune black-box models.
Empirically, we explore the intervenability of black-box classifiers on
synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning
improves intervention effectiveness and often yields better-calibrated
predictions. To showcase the practical utility of the proposed techniques, we
apply them to deep chest X-ray classifiers and show that fine-tuned black boxes
can be as intervenable as, and more performant than, CBMs.
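The two-stage structure of a CBM, and the way a user intervention propagates to the output, can be illustrated with a toy model. The following is a minimal pure-Python sketch that assumes logistic layers for both stages. The function names, the weights and the dictionary of concept overrides are hypothetical illustrations, not the paper's code. Real CBMs use learned neural networks for both stages.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def predict_concepts(x: Vector, W: List[Vector], b: Vector) -> Vector:
    """Stage 1 (x -> c): one logistic unit per concept (hypothetical toy layer)."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bk) for row, bk in zip(W, b)]


def predict_target(c: Vector, v: Vector, a: float) -> float:
    """Stage 2 (c -> y): the prediction sees the input only through the concepts."""
    return sigmoid(sum(vi * ci for vi, ci in zip(v, c)) + a)


def cbm_forward(x: Vector, W: List[Vector], b: Vector, v: Vector, a: float,
                interventions: Optional[Dict[int, float]] = None) -> float:
    c_hat = predict_concepts(x, W, b)
    # An intervention overwrites selected predicted concepts (e.g. with ground truth);
    # the change then flows into the downstream target prediction.
    for k, value in (interventions or {}).items():
        c_hat[k] = value
    return predict_target(c_hat, v, a)


# Toy usage: both concepts are predicted as 0.5, so the target logit is 0.
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
v, a = [2.0, -2.0], 0.0
x = [0.0, 0.0]
p_before = cbm_forward(x, W, b, v, a)
# Setting concept 0 to 1 and concept 1 to 0 raises the target logit to 2.
p_after = cbm_forward(x, W, b, v, a, interventions={0: 1.0, 1: 0.0})
print(p_before, p_after)
```

Because the target head reads only the concept vector, editing a concept is guaranteed to reach the output. This is the property the paper seeks to recover for black-box models.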
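One plausible way to intervene on a black box and to score how well such interventions work is sketched below. It is a minimal sketch under stated assumptions, not the authors' exact procedure. It assumes the network factors into an encoder producing a representation `z` and a frozen head `g`, and that a linear-sigmoid probe, fit on the annotated validation set, maps `z` to concepts. An intervention then searches for a nearby `z'` whose probe output matches the user-specified concepts. It does so by gradient descent on `||z' - z||^2 + lam * BCE(probe(z'), c_new)`, and the head is applied to `z'`. "Intervenability" is approximated here as the mean drop in target cross-entropy when intervening with ground-truth concepts. The names `probe`, `head`, `intervene`, `intervenability`, `lam`, `lr` and `steps` are all hypothetical, and a practical version would use an autodiff framework instead of hand-written gradients.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def dot(u: Sequence[float], w: Sequence[float]) -> float:
    return sum(ui * wi for ui, wi in zip(u, w))


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def probe(z: Vector, Q: List[Vector], q0: Vector) -> Vector:
    """Hypothetical linear probe: representation z -> concept probabilities."""
    return [sigmoid(dot(row, z) + b) for row, b in zip(Q, q0)]


def head(z: Vector, w: Vector, w0: float) -> float:
    """Frozen output layer of the black box: representation z -> P(y = 1)."""
    return sigmoid(dot(w, z) + w0)


def intervene(z: Vector, c_new: Vector, Q: List[Vector], q0: Vector,
              lam: float = 5.0, lr: float = 0.05, steps: int = 500) -> Vector:
    """Gradient descent on ||z' - z||^2 + lam * sum_k BCE(probe_k(z'), c_new[k])."""
    z_new = list(z)
    for _ in range(steps):
        p = probe(z_new, Q, q0)
        # Gradient of the distance term keeps z' close to the original representation.
        grad = [2.0 * (zn - zo) for zn, zo in zip(z_new, z)]
        # Gradient of the concept BCE for a sigmoid-linear probe: Q^T (p - c_new).
        for row, pk, ck in zip(Q, p, c_new):
            for j, qkj in enumerate(row):
                grad[j] += lam * (pk - ck) * qkj
        z_new = [zj - lr * gj for zj, gj in zip(z_new, grad)]
    return z_new


def intervenability(data: List[Tuple[Vector, Vector, float]], Q: List[Vector],
                    q0: Vector, w: Vector, w0: float, **kw) -> float:
    """Mean reduction in target loss after intervening with ground-truth concepts.

    Each item is (z, c_true, y), where z is the black box's representation of x.
    """
    gains = []
    for z, c_true, y in data:
        before = bce(head(z, w, w0), y)
        after = bce(head(intervene(z, c_true, Q, q0, **kw), w, w0), y)
        gains.append(before - after)
    return sum(gains) / len(gains)


# Toy usage: the probe and the head both read the first coordinate of z.
Q, q0 = [[1.0, 0.0]], [0.0]
w, w0 = [3.0, 0.0], 0.0
z, c_true, y = [-1.0, 0.5], [1.0], 1.0
z_after = intervene(z, c_true, Q, q0)
gain = intervenability([(z, c_true, y)], Q, q0, w, w0)
print(z_after, gain)
```

In this toy case the intervention moves only the coordinate the probe reads, and the target loss drops. Under this sketch's assumptions, fine-tuning "for intervenability" would amount to adding a term that rewards a larger `gain` to the training objective and updating the encoder and head jointly.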