DAG: A General Model for Privacy-Preserving Data Mining

IEEE Transactions on Knowledge and Data Engineering (2020)

Abstract
Rapid advances in automated data collection tools and data storage technology have led to the wide availability of huge amounts of distributed data owned by different parties. Data mining can use distributed data to discover rules, patterns or knowledge that would normally not be discovered from data owned by a single party. Thus, data mining on distributed data can lead to insights and economic advantages. However, in recent years, privacy laws have been enacted to protect individuals' sensitive information from privacy violation and misuse. To address this issue, researchers have proposed privacy-preserving data mining (PPDM) based on secure multi-party computation (SMC), which can mine the distributed data with privacy preservation (i.e., privacy protection). However, most solutions are ad hoc. They are proposed for specific applications, and thus cannot be applied to other applications directly. Another limitation of current PPDM is that it supports only a limited set of operators such as +, −, × and log (logarithm). In data mining primitives, some functions can involve operators such as / and √. These issues have motivated us to investigate a general SMC-based solution that addresses the current limitations of PPDM. In this thesis, we propose a general model for privacy-preserving data mining, namely DAG. We apply a hybrid model that combines a homomorphic encryption protocol with the circuit approach. The hybrid model has been shown to be efficient in computation and effective in protecting data privacy through theoretical proofs and experiments. Specifically, our research objectives are as follows: (i) We want to propose a general model for privacy-preserving data mining (i.e., DAG) that consists of a set of secure operators. The secure operators can support many data mining primitives. The two-party model, which is efficient and effective, is applied to develop the secure protocols in DAG. Our secure operators can provide complete privacy under the semi-honest model.
Moreover, the secure operators are efficient in computation. (ii) We will integrate DAG into various classification problems by proposing new privacy-preserving classification algorithms. (iii) To make our model support wider applications, we will integrate DAG into other application domains. We will integrate DAG into ant colony optimization (ACO) to solve the traveling salesman problem (TSP) by proposing a privacy-preserving traveling salesman problem (PPTSP). In this report, we present most of the results for the objectives mentioned above. The DAG model is general: its operators, if pipelined together, can implement various functions. It is also extendable: new secure operators can be defined to expand the functions the model supports. All secure operators are strictly proven secure via the simulation paradigm (Goldreich, 2004). In addition, the error bounds and complexities of the secure operators are derived so as to investigate the accuracy and computation performance of our model. We apply our model to various application domains. We first apply DAG to data mining classification algorithms such as support vector machine, kernel regression, and Naïve Bayes. Experiment results show that DAG generates outputs that are almost the same as those of the non-private setting, where multiple parties simply disclose their data. DAG is also efficient in computation for data mining tasks. For example, in kernel regression, when the training data size is 683,093, one prediction in the non-private setting takes 5.93 sec, and one by our model takes 12.38 sec. In the PPTSP experiment, a salesman can find an approximately optimal traveled distance without disclosing any city locations in the TSP. Studies across these application domains show that our DAG model is a general yet efficient model for secure multi-party computation.
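The claim that operators can be pipelined to implement various functions can be illustrated with a small plaintext sketch. It assumes that DAG denotes a directed acyclic graph of operator nodes, which the abstract does not spell out. The names `Node`, `evaluate`, `distance_graph` and `demo` are hypothetical, and the operators are ordinary arithmetic stand-ins: nothing here is encrypted or secure, and this is not the authors' protocol. The example wires −, ×, + and √ into a Euclidean distance between two points, each imagined as held by a different party.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Node:
    """One operator node; inputs are upstream nodes or named party inputs."""
    op: Callable[..., float]
    inputs: List[Union["Node", str]] = field(default_factory=list)


def evaluate(node: Union[Node, str], values: Dict[str, float]) -> float:
    """Evaluate the graph recursively; a string is looked up as an input value."""
    if isinstance(node, str):
        return values[node]
    args = [evaluate(child, values) for child in node.inputs]
    return node.op(*args)


# Plaintext stand-ins for operators (assumption: real secure operators
# would run as two-party protocols over protected values instead).
def add(a: float, b: float) -> float:
    return a + b


def sub(a: float, b: float) -> float:
    return a - b


def mul(a: float, b: float) -> float:
    return a * b


def sqrt(a: float) -> float:
    return math.sqrt(a)


def distance_graph() -> Node:
    """Pipeline -, x, + and sqrt into sqrt((ax-bx)^2 + (ay-by)^2)."""
    dx = Node(sub, ["ax", "bx"])
    dy = Node(sub, ["ay", "by"])
    squared_sum = Node(add, [Node(mul, [dx, dx]), Node(mul, [dy, dy])])
    return Node(sqrt, [squared_sum])


def demo() -> float:
    # Party A is imagined to hold (ax, ay); party B holds (bx, by).
    values = {"ax": 0.0, "ay": 0.0, "bx": 3.0, "by": 4.0}
    return evaluate(distance_graph(), values)
```

Calling `demo()` evaluates the graph on the points (0, 0) and (3, 4). Supporting a different function only requires wiring the same operator nodes differently, which is the sense of generality the abstract describes; adding a new operator function corresponds to the extensibility it mentions.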
Keywords
Data models, Protocols, Cryptography, Task analysis, Data mining, Computational modeling