Saliency strikes back: How filtering out high frequencies improves white-box explanations
arXiv (2023)
Abstract
Attribution methods are a class of explainability (XAI) methods
that aim to assess how individual inputs contribute to a model's
decision-making process. We have identified a significant limitation in one
type of attribution method, known as "white-box" methods. Although highly
efficient, these methods rely on a gradient signal that is often contaminated
by high-frequency noise. To overcome this limitation, we introduce a new
approach called "FORGrad". This simple method effectively filters out noise
artifacts by using optimal cut-off frequencies tailored to the unique
characteristics of each model architecture. Our findings show that FORGrad
consistently enhances the performance of existing white-box methods,
enabling them to compete effectively with more accurate yet computationally
demanding "black-box" methods. We anticipate that our research will foster
broader adoption of simpler and more efficient white-box methods for
explainability, offering a better balance between faithfulness and
computational efficiency.
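
The general idea of removing high-frequency content from a gradient-based attribution map can be illustrated with a short sketch. This is not the authors' implementation: the function name `low_pass_filter`, the ideal circular mask, and the `cutoff` value are assumptions made for illustration, and the abstract does not say how the per-architecture cut-off is chosen.

```python
import numpy as np


def low_pass_filter(saliency: np.ndarray, cutoff: float) -> np.ndarray:
    """Remove spatial frequencies above `cutoff` from a 2D saliency map.

    `cutoff` is a radial frequency in cycles per pixel (hypothetical
    parameter; 0.5 is the Nyquist limit along each axis).
    """
    h, w = saliency.shape
    # Frequency coordinates of each FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Ideal (hard) circular mask: keep only bins at or below the cut-off.
    mask = radius <= cutoff
    spectrum = np.fft.fft2(saliency)
    # The mask is symmetric, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(spectrum * mask))


# Toy example: a smooth pattern contaminated by a pixel-level checkerboard.
yy, xx = np.mgrid[0:16, 0:16]
smooth = np.cos(2 * np.pi * xx / 16)   # low frequency: 1/16 cycles per pixel
noise = (-1.0) ** (xx + yy)            # highest representable frequency
filtered = low_pass_filter(smooth + noise, cutoff=0.25)
```

In this toy case the checkerboard sits far above the cut-off and is removed, while the smooth component passes through unchanged. In practice the input would be the gradient of a model output with respect to the image, and the cut-off would need to be selected per architecture, as the abstract describes.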