Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion
CoRR (2024)
Abstract
With the rise of Machine Learning as a Service (MLaaS) platforms, safeguarding
the intellectual property of deep learning models is becoming paramount. Among
various protective measures, trigger set watermarking has emerged as a flexible
and effective strategy for preventing unauthorized model distribution. However,
this paper identifies an inherent flaw in the current paradigm of trigger set
watermarking: evasion adversaries can readily exploit the shortcuts created by
models memorizing watermark samples that deviate from the main task
distribution, significantly impairing their generalization in adversarial
settings. To counteract this, we leverage diffusion models to synthesize
unrestricted adversarial examples as trigger sets. By training the model to
accurately recognize them, unique watermark behaviors are promoted through
knowledge injection rather than error memorization, thus avoiding exploitable
shortcuts. Furthermore, we uncover that the resistance of current trigger set
watermarking against removal attacks primarily relies on significantly damaging
the decision boundaries during embedding, intertwining unremovability with
adverse impacts. By optimizing the knowledge transfer properties of protected
models, our approach conveys watermark behaviors to extraction surrogates
without aggressive decision boundary perturbation. Experimental results on
CIFAR-10/100 and Imagenette datasets demonstrate the effectiveness of our
method, showing not only improved robustness against evasion adversaries but
also superior resistance to watermark removal attacks compared to
state-of-the-art solutions.
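The verification side of trigger set watermarking described above can be sketched generically: the owner queries a suspect model on a secret trigger set and claims ownership if the fraction of watermark-consistent predictions clears a threshold. The sketch below is a minimal illustration of this general scheme, not the paper's specific method; all names and the threshold value are hypothetical.

```python
# Hypothetical sketch of trigger-set watermark verification.
# The function names and the 0.9 threshold are illustrative assumptions,
# not details taken from the paper.

def trigger_set_accuracy(predictions, target_labels):
    """Fraction of trigger samples the suspect model classifies
    with the owner-chosen watermark labels."""
    if len(predictions) != len(target_labels):
        raise ValueError("prediction/label length mismatch")
    matches = sum(p == t for p, t in zip(predictions, target_labels))
    return matches / len(target_labels)

def watermark_verified(predictions, target_labels, threshold=0.9):
    """Claim ownership only if trigger-set accuracy exceeds the
    threshold; an independently trained model should score near
    chance on the secret trigger set."""
    return trigger_set_accuracy(predictions, target_labels) >= threshold
```

In practice the trigger set is kept secret, and the threshold is chosen so that the false-positive rate on unwatermarked models (which should behave near chance on the trigger inputs) is negligible.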