Big Data Quality: A Quality Dimensions Evaluation

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld)(2016)

引用 34|浏览19
暂无评分
摘要
Data is the most valuable asset companies are proud of. When its quality degrades, the consequences are unpredictable and can lead to complete wrong insights. In Big Data context, evaluating the data quality is challenging and must be done prior to any Big data analytics by providing some data quality confidence. Given the huge data size and its fast generation, it requires mechanisms and strategies to evaluate and assess data quality in a fast and efficient way. However, checking the Quality of Big Data is a very costly process if it is applied on the entire data. In this paper, we propose an efficient data quality evaluation scheme by applying sampling strategies on Big data sets. The Sampling will reduce the data size to a representative population samples for fast quality evaluation. The evaluation targeted some data quality dimensions like completeness and consistency. The experimentations have been conducted on Sleep disorder's data set by applying Big data bootstrap sampling techniques. The results showed that the mean quality score of samples is representative for the original data and illustrate the importance of sampling to reduce computing costs when Big data quality evaluation is concerned. We applied the Quality results generated as quality proposals on the original data to increase its quality.
更多
查看译文
关键词
Big Data,data quality dimensions,data quality evaluation,Big data sampling
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要