GenAIPABench: A Benchmark for Generative AI-based Privacy Assistants

Aamir Hamid,Hemanth Reddy Samidi,Tim Finin,Primal Pappachan,Roberto Yus

CoRR（2023）

引用 0|浏览9

暂无评分

摘要

Privacy policies inform users about the data management practices of organizations. Yet, their complexity often renders them largely incomprehensible to the average user, necessitating the development of privacy assistants. With the advent of generative AI (genAI) technologies, there is an untapped potential to enhance privacy assistants in answering user queries effectively. However, the reliability of genAI remains a concern due to its propensity for generating incorrect or misleading information. This study introduces GenAIPABench, a novel benchmarking framework designed to evaluate the performance of Generative AI-based Privacy Assistants (GenAIPAs). GenAIPABench comprises: 1) A comprehensive set of questions about an organization's privacy policy and a data protection regulation, along with annotated answers for several organizations and regulations; 2) A robust set of evaluation metrics for assessing the accuracy, relevance, and consistency of the generated responses; and 3) An evaluation tool that generates appropriate prompts to introduce the system to the privacy document and different variations of the privacy questions to evaluate its robustness. We use GenAIPABench to assess the potential of three leading genAI systems in becoming GenAIPAs: ChatGPT, Bard, and Bing AI. Our results demonstrate significant promise in genAI capabilities in the privacy domain while also highlighting challenges in managing complex queries, ensuring consistency, and verifying source accuracy.

查看译文

关键词

privacy assistants,benchmark,ai-based

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要