Climbing the Ladder of Interpretability with Counterfactual Concept Bottleneck Models
CoRR (2024)
Abstract
Current deep learning models are not designed to simultaneously address three
fundamental questions: predict class labels to solve a given classification
task (the "What?"), explain task predictions (the "Why?"), and imagine
alternative scenarios that could result in different predictions (the "What
if?"). The inability to answer these questions represents a crucial gap in
deploying reliable AI agents, calibrating human trust, and deepening
human-machine interaction. To bridge this gap, we introduce CounterFactual
Concept Bottleneck Models (CF-CBMs), a class of models designed to efficiently
address the above queries all at once without the need to run post-hoc
searches. Our results show that CF-CBMs produce: accurate predictions (the
"What?"), simple explanations for task predictions (the "Why?"), and
interpretable counterfactuals (the "What if?"). CF-CBMs can also sample or
estimate the most probable counterfactual to: (i) explain the effect of concept
interventions on tasks, (ii) show users how to get a desired class label, and
(iii) propose concept interventions via "task-driven" interventions.