Deeper Understanding of Black-box Predictions via Generalized Influence Functions
arXiv (2023)
Abstract
Influence functions (IFs) elucidate how training data changes model behavior.
However, the increasing size and non-convexity in large-scale models make IFs
inaccurate. We suspect that the fragility comes from the first-order
approximation which may cause nuisance changes in parameters irrelevant to the
examined data. However, simply computing influence from the chosen parameters
can be misleading, as it fails to nullify the hidden effects of unselected
parameters on the analyzed data. Thus, our approach introduces generalized IFs,
precisely estimating target parameters' influence while nullifying nuisance
gradient changes on fixed parameters. We identify the target parameters to
update, those closely associated with the input data, using output- and
gradient-based parameter selection methods. We evaluate generalized IFs against
various alternative IFs on class removal and label change tasks. The
experiments align with the "less is more" philosophy, demonstrating that
updating only 5% of the model produces more accurate results than other
influence functions across all tasks. We believe our proposal can serve as a
foundational tool for optimizing models, conducting data analysis, and
enhancing AI interpretability beyond the limitations of IFs. Code is available
at https://github.com/hslyu/GIF.
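
For context on the first-order approximation mentioned in the abstract, the following is a minimal sketch of the classic (full-parameter) influence function, not the generalized variant proposed in the paper. It assumes a small ridge regression problem, chosen because the Hessian is exact and the removal of one training point can be checked in closed form. All function names, the synthetic data, and the regularization value are illustrative assumptions and do not come from the paper or its repository.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize (1/n) * sum 0.5*(x_i.theta - y_i)^2 + 0.5*lam*||theta||^2."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)        # exact Hessian of the objective
    theta = np.linalg.solve(H, X.T @ y / n)  # closed-form minimizer
    return theta, H

def influence_of_removal(X, y, theta, H, j):
    """First-order estimate of the parameter change when point j is removed.

    Removing point j is the same as upweighting it by -1/n, so the classic
    influence estimate is (1/n) * H^{-1} * grad of the loss at point j.
    """
    n = X.shape[0]
    grad_j = X[j] * (X[j] @ theta - y[j])
    return np.linalg.solve(H, grad_j) / n

def exact_removal_change(X, y, lam, j):
    """Exact parameter change from refitting without point j (1/n kept fixed)."""
    n, d = X.shape
    keep = np.arange(n) != j
    H_minus = X[keep].T @ X[keep] / n + lam * np.eye(d)
    theta_minus = np.linalg.solve(H_minus, X[keep].T @ y[keep] / n)
    theta, _ = fit_ridge(X, y, lam)
    return theta_minus - theta

# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

theta, H = fit_ridge(X, y, lam=0.1)
approx = influence_of_removal(X, y, theta, H, j=0)
exact = exact_removal_change(X, y, 0.1, j=0)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print("relative error of the first-order estimate:", rel_err)
```

In this convex setting the first-order estimate tracks the exact change closely, and the gap is governed by the leverage of the removed point. The abstract's concern is the large, non-convex regime, where this approximation degrades and where restricting the update to selected parameters is argued to help.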