Using Feature Selection to Improve the Utility of Differentially Private Data Publishing.

Yasser Jafer, Stan Matwin, Marina Sokolova

Procedia Computer Science (2014)


Abstract

Protecting patient privacy is an obligation enforced by laws and regulations in the US, Canada, and other jurisdictions. With the exponential growth in the exchange of personal health information (PHI) brought about by e-health, there is a need for smart algorithms that help the data publisher protect PHI. Among existing privacy models, differential privacy is considered one of the strongest privacy protection techniques because it makes no assumptions about the attacker's background knowledge. One way to achieve differential privacy in the non-interactive mode is to derive a contingency table of the raw data over the database domain, add noise to each count, and publish the resulting noisy table of counts. This approach, however, is not suitable for high-dimensional data with large domains, as the added noise substantially destroys the utility of the data. In this work, we show that when K-anonymity is preceded by feature selection, it is possible to obtain a contingency table with higher counts. As a result, when noise is added to satisfy differential privacy, its distorting effect is minimized and high utility of the data is preserved. We propose the TOP_Diff algorithm, which offers a trade-off between the anonymization level K and the privacy budget ε and enables us to publish privacy-preserving datasets with high utility. Our approach is capable of handling both numerical and categorical features.
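
The noisy contingency table baseline described in the abstract can be illustrated with a minimal sketch. This is not the TOP_Diff algorithm, whose details the abstract does not give. It assumes the Laplace mechanism as the noise source, a count sensitivity of 1 (neighboring datasets differ by adding or removing one record), and categorical attributes whose domains are known in advance. The names `noisy_contingency_table`, `laplace_noise`, `records`, and `domains` are hypothetical.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_contingency_table(records, domains, epsilon, seed=None):
    """Return {cell: noisy count} for every cell of the full domain.

    records: iterable of tuples, one value per attribute.
    domains: list of value sequences, one per attribute (assumed known).
    epsilon: privacy budget; smaller values mean more noise.
    """
    rng = random.Random(seed)
    counts = Counter(records)
    # Assumption: adding or removing one record changes exactly one
    # count by 1, so the sensitivity is 1 and the scale is 1 / epsilon.
    scale = 1.0 / epsilon
    table = {}
    # Noise is added to every cell of the domain, including empty ones.
    for cell in itertools.product(*domains):
        table[cell] = counts.get(cell, 0) + laplace_noise(scale, rng)
    return table


if __name__ == "__main__":
    records = [("F", "young"), ("F", "young"), ("M", "old")]
    domains = [("F", "M"), ("young", "old")]
    for cell, value in noisy_contingency_table(records, domains, epsilon=1.0).items():
        print(cell, round(value, 2))
```

The number of cells is the product of the attribute domain sizes, so it grows quickly with dimensionality, while the noise scale depends only on ε and not on the size of a count. Cells with small counts are therefore distorted proportionally more than cells with large counts, which is why a table with higher counts retains more utility.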


Keywords

Privacy, Feature Selection, K-anonymity, Differential Privacy, Classification
