A Framework for Detecting Unnecessary Industrial Data in ETL Processes

Philip Woodall, Duncan McFarlane, Torben Jess, Amar Shah, Mark Harrison, William Krechel, Eric Nicks

Industrial Informatics (2014)

Abstract

Extract, transform and load (ETL) is a critical process that industrial organisations use to move data from one database to another, for example from an operational system to a data warehouse. As the volume of data stored by industrial organisations grows, some ETL processes take more than 12 hours to complete, which can leave decision makers waiting for the data they need to support their decisions. Once ETL processes have been designed, data requirements inevitably change, and much of the data that passes through an ETL process may never be used or needed. This paper therefore proposes a framework for dynamically detecting and predicting unnecessary data and preventing it from slowing down ETL processes, either by removing it entirely or by deprioritising it. The framework also makes it possible to prioritise data cleansing tasks and to determine which data should be processed first and placed into fast-access memory. We present existing algorithms that can serve as examples for each component of the framework, together with initial test results from our ongoing research into whether the framework can help reduce ETL time.
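The abstract does not say which algorithms fill each component, so the following is only a minimal sketch of one plausible detection-and-prioritisation step. It assumes that "unnecessary" data can be approximated by columns that no user query has read within a recent window of a query log, and it shows both outcomes the abstract names: deprioritising such data (loading it last) or removing it (leaving it out). `QueryLogEntry`, `recent_usage`, `plan_etl`, the 90-day window and all column names are hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class QueryLogEntry:
    """Hypothetical log record: the warehouse columns one user query read, and when."""
    timestamp: datetime
    columns: frozenset


def recent_usage(log, now, window):
    """Count how many logged queries inside [now - window, now] read each column."""
    cutoff = now - window
    counts = Counter()
    for entry in log:
        if cutoff <= entry.timestamp <= now:
            counts.update(entry.columns)
    return counts


def plan_etl(etl_columns, log, now, window=timedelta(days=90), drop_unused=False):
    """Return (load_order, unused) for the columns an ETL job would move.

    Assumption: a column never read in the window is candidate unnecessary data.
    Used columns are loaded first, most-read first; unused columns are either
    appended at the end (deprioritised) or left out entirely (drop_unused=True).
    """
    counts = recent_usage(log, now, window)
    used = [c for c in etl_columns if counts[c] > 0]
    unused = [c for c in etl_columns if counts[c] == 0]
    # list.sort() is stable, so equally used columns keep their original ETL order.
    used.sort(key=lambda c: -counts[c])
    load_order = used if drop_unused else used + unused
    return load_order, unused


if __name__ == "__main__":
    # Toy, made-up log purely for illustration.
    now = datetime(2014, 6, 1)
    log = [
        QueryLogEntry(datetime(2014, 5, 20), frozenset({"order_id", "qty"})),
        QueryLogEntry(datetime(2014, 5, 25), frozenset({"order_id"})),
        QueryLogEntry(datetime(2013, 1, 1), frozenset({"legacy_code"})),  # outside window
    ]
    columns = ["legacy_code", "qty", "order_id", "fax_number"]
    order, unused = plan_etl(columns, log, now)
    print(order)   # ['order_id', 'qty', 'legacy_code', 'fax_number']
    print(unused)  # ['legacy_code', 'fax_number']
```

A purely usage-based rule like this only looks backwards; the abstract also mentions *predicting* unnecessary data, which would need some forecast of future data requirements that this sketch does not attempt to model.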

Keywords

Extract, transform and load (ETL), data warehouse, reduce ETL, unnecessary data, data overload, detecting unnecessary data