On the (In)feasibility of ML Backdoor Detection as an Hypothesis Testing Problem
CoRR (2024)
Abstract
We introduce a formal statistical definition for the problem of backdoor
detection in machine learning systems and use it to analyze the feasibility of
such problems, providing evidence for the utility and applicability of our
definition. The main contributions of this work are an impossibility result and
an achievability result for backdoor detection. We show a no-free-lunch
theorem, proving that universal (adversary-unaware) backdoor detection is
impossible, except for very small alphabet sizes. Thus, we argue that backdoor
detection methods need to be either explicitly or implicitly adversary-aware.
However, our work does not imply that backdoor detection cannot work in
specific scenarios, as evidenced by successful backdoor detection methods in
the scientific literature. Furthermore, we connect our definition to the
probably approximately correct (PAC) learnability of the out-of-distribution
detection problem.