A Multi-layer Approach for Data Cleaning in the Healthcare Domain.

International Conferences on Computing and Data Engineering (ICCDE) (2022)

Abstract
Data today are generated by a plethora of sources, are complex and often error-prone, and come from multiple origins. These sources vary in complexity, but most share a common characteristic: they perform no quality checks on the data they collect. Consequently, every platform that uses data from such sources needs a mechanism that assures the reliability of the collected data, so that the platform's other mechanisms (e.g., risk analysis and prediction mechanisms) receive high-quality data that support the best possible knowledge extraction for decision making. The need is even greater in the healthcare domain, where clean data are essential to bring consistency to healthcare data that might be inaccurate, outdated, redundant or incomplete. To address these challenges, this paper proposes a data cleaning mechanism that assures the quality and reliability of data regardless of their origin. The mechanism consists of three (3) sub-components, which are responsible for ingesting and storing the data and for applying a set of cleaning actions. These actions, namely “Validation”, “Cleaning”, “Verification” and “Logging”, combine multiple well-established data cleaning techniques to ensure the effectiveness and efficiency of the whole data cleaning procedure. The mechanism is evaluated on three (3) separate healthcare datasets that contain different types of data and errors in their records. The cleaned data produced by the mechanism are compared with the ground truth of these datasets; the comparison indicates that data cleaning was performed successfully and efficiently, and provides extensive insight into the mechanism's capabilities.
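The four actions named in the abstract, and the comparison against ground truth, can be illustrated with a minimal sketch. The abstract does not specify the concrete techniques behind each action, so everything below is an assumption made for illustration: the `RULES` table, the field names `age` and `heart_rate`, the repair strategy (parse the value, otherwise mark it as missing), and the cell-level `accuracy` measure are hypothetical and are not the authors' implementation.

```python
# Hypothetical rule set (not from the paper): field name -> (type, min, max).
RULES = {"age": (int, 0, 120), "heart_rate": (int, 20, 250)}


def validate(record):
    """Validation: return names of fields that are missing or break a rule."""
    bad = []
    for name, (typ, lo, hi) in RULES.items():
        value = record.get(name)
        if not isinstance(value, typ) or not (lo <= value <= hi):
            bad.append(name)
    return bad


def clean(record, bad_fields):
    """Cleaning: try to repair flagged fields, else mark them missing (None)."""
    fixed = dict(record)
    for name in bad_fields:
        typ, lo, hi = RULES[name]
        try:
            candidate = typ(str(fixed.get(name)).strip())
        except ValueError:
            candidate = None
        in_range = candidate is not None and lo <= candidate <= hi
        fixed[name] = candidate if in_range else None
    return fixed


def verify(record):
    """Verification: every field is now either valid or explicitly missing."""
    still_bad = [n for n in validate(record) if record.get(n) is not None]
    return not still_bad


def run_pipeline(records):
    """Apply Validation, Cleaning, Verification and Logging to each record."""
    cleaned, log = [], []
    for row, record in enumerate(records):
        flagged = validate(record)
        result = clean(record, flagged) if flagged else dict(record)
        # Logging: keep an audit trail of what was flagged and the outcome.
        log.append({"row": row, "flagged": flagged, "verified": verify(result)})
        cleaned.append(result)
    return cleaned, log


def accuracy(cleaned, ground_truth):
    """Fraction of cells in the cleaned data that match the ground truth."""
    total = matches = 0
    for got, want in zip(cleaned, ground_truth):
        for name in RULES:
            total += 1
            matches += got.get(name) == want.get(name)
    return matches / total if total else 0.0
```

In this sketch, a record such as `{"age": " 42 ", "heart_rate": 70}` is flagged on `age`, repaired to the integer 42, and logged as verified, while a value that cannot be repaired is set to `None` instead of being guessed.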