The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin Li, Ann-Kathrin Dombrowski, Shashwat Goel, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Ariel Herbert-Voss, Cort Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam Hunt, Justin Tienken-Harder, Kevin Shih, Kemper Talley, John Guan, Ian Steneker, David Campbell, Brad Jokubaitis, Steven Basart, Stephen Fitz, Ponnurangam Kumaraguru, Kallol Karmakar, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin Esvelt, Alexandr Wang, Dan Hendrycks
ICML 2024 (2024)
Key words
Intrusion Detection