Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
CoRR (2024)
Abstract
Large Language Models (LLMs) are now commonplace in conversation
applications. However, their risks of misuse for generating harmful responses
have raised serious societal concerns and spurred recent research on LLM
conversation safety. Therefore, in this survey, we provide a comprehensive
overview of recent studies, covering three critical aspects of LLM conversation
safety: attacks, defenses, and evaluations. Our goal is to provide a structured
summary that enhances understanding of LLM conversation safety and encourages
further investigation into this important subject. For easy reference, we have
categorized all the studies mentioned in this survey according to our taxonomy,
available at: https://github.com/niconi19/LLM-conversation-safety.