Differentially Private Projected Histograms of Multi-Attribute Data for Classification.

CoRR (2015)

Abstract
In this paper, we tackle the problem of constructing a differentially private synopsis for classification analysis. Several state-of-the-art methods follow the structure of existing classification algorithms and are all iterative, which is suboptimal because they make locally optimal choices and over-divide the privacy budget among many sequentially composed steps. Instead, we propose PrivPfC, a new differentially private method for releasing data for classification. The key idea is to privately select an optimal partition of the underlying dataset in one step, using the given privacy budget. Given a dataset and a privacy budget, PrivPfC constructs a pool of candidate grids in which the number of cells of each grid is below a data-aware and privacy-budget-aware threshold. PrivPfC then selects an optimal grid via the exponential mechanism, using a novel quality function that minimizes the expected number of records misclassified by a histogram classifier built from the published grid. Finally, PrivPfC injects noise into each cell of the selected grid and releases the noisy grid as the private synopsis of the data. If the candidate grid pool is larger than the processing capability threshold set by the data curator, we add a step at the beginning of PrivPfC that privately prunes the set of attributes. For this step, we introduce a modified $\chi^2$ quality function with low sensitivity and use it to evaluate an attribute's relevance to the classification label variable. Through extensive experiments on real datasets, we demonstrate PrivPfC's superiority over the state-of-the-art methods.
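The select-then-perturb pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the even budget split (`select_fraction=0.5`), the binary-label cell layout, and the simplified quality score (the plain count of minority-label records per cell, rather than the paper's expected misclassification under noise) are all assumptions made for illustration.

```python
import math
import random


def misclassified(grid):
    """Errors of a majority-vote histogram classifier on a grid.

    `grid` is a list of (positive_count, negative_count) cells; each cell
    predicts its majority label, so the minority records are the errors.
    (Simplified stand-in for the paper's quality function.)
    """
    return sum(min(pos, neg) for pos, neg in grid)


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shift scores for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_synopsis(candidate_grids, epsilon, rng, select_fraction=0.5):
    """Pick one grid privately, then add Laplace noise to its cell counts.

    The budget split between selection and noise is an assumption.
    """
    eps_select = epsilon * select_fraction
    eps_noise = epsilon - eps_select
    # Higher score is better, so negate the error count. Adding or removing
    # one record changes the error count by at most 1 (sensitivity 1).
    scores = [-misclassified(g) for g in candidate_grids]
    chosen = exponential_mechanism(scores, eps_select, 1.0, rng)
    scale = 1.0 / eps_noise  # one record affects a single count by 1
    noisy = [
        (pos + laplace_noise(scale, rng), neg + laplace_noise(scale, rng))
        for pos, neg in candidate_grids[chosen]
    ]
    return chosen, noisy


if __name__ == "__main__":
    rng = random.Random(0)
    grids = [
        [(5, 5), (5, 5)],    # mixed cells: many errors
        [(10, 0), (0, 10)],  # pure cells: no errors
    ]
    index, synopsis = release_synopsis(grids, epsilon=1.0, rng=rng)
    print(index, synopsis)
```

Because the grid is chosen in a single exponential-mechanism call, the budget is split only once between selection and noise, rather than across many iterative steps.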