谷歌浏览器插件
订阅小程序
在清言上使用

Improved 3D Rotation-based Geometric Data Perturbation Based on Medical Data Preservation in Big Data

International journal of advanced computer science and applications/International journal of advanced computer science & applications(2023)

引用 0|浏览2
暂无评分
摘要
With the rise in technology, a huge volume of data is being processed using data mining, especially in the healthcare sector. Usually, medical data consist of a lot of personal data, and third parties utilize it for the data mining process. Perturbation in health care data highly aids in preventing intruders from utilizing the patient's privacy. One of the challenges in data perturbation is managing data utility and privacy protection. Medical data mining has certain special properties compared with other data mining fields. Hence, in this work, the machine learning (ML) based perturbation approach is introduced to provide more privacy to healthcare data. Here, clustering and IGDP-3DR processes are applied to improve healthcare privacy preservation. Initially, the dataset is pre-processed using data normalization. Then, the dimensionality is reduced by SVD with PCA (singular value decomposition with Principal component analysis). Then, the clustering process is performed by IFCM (Improved Fuzzy C means). The high-dimensional data are divided into several segments by IFCM, and every partition is set as a cluster. Then, improved Geometric Data Perturbation (IGDP) is used to perturb the clustered data. IGDP is a combination of GDP with 3D rotation (3DR). Finally, the perturbed data are classified using a machine learning (ML) classifier -kernel Support Vector Machine-Horse Herd Optimization (KSVM-HHO) to classify the perturbed data and ensure better accuracy. The overall evaluation of the proposed KSVM-HHO is carried out in the Python platform. The performance of the IGDP-KSVM-HHO is compared over the two benchmark datasets, Wisconsin prognostic breast cancer (WBC) and Pima Indians Diabetes (PID) dataset. For the WBC dataset, the proposed method obtains an overall accuracy of 98.08% perturbed data, and for the PID dataset, the proposed method obtains an overall accuracy of 98.04%.
更多
查看译文
关键词
Data mining,privacy,health care data,machine learning,perturbation,improved fuzzy c-means,horse herd optimization,kernel based support vector machine
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要