Determining Essential Statistics for Cost Based Optimization of an ETL Workflow.

EDBT (2014)

Abstract
Many of the ETL products in the market today provide tools for the design of ETL workflows, with very little or no support for optimization of such workflows. Optimization of ETL workflows poses several new challenges compared to traditional query optimization in database systems. There have been many attempts, both in industry and in the research community, to support cost-based optimization techniques for ETL workflows, but with limited success. Non-availability of source statistics in ETL is one of the major challenges that precludes the use of a cost-based optimization strategy. However, the basic philosophy of ETL workflows, design once and execute repeatedly, allows interesting possibilities for determining the statistics of the input. In this paper, we propose a framework to determine various sets of statistics to collect for a given workflow, using which the optimizer can estimate the cost of any alternative plan for the workflow. The initial few runs of the workflow are used to collect the statistics, and future runs are optimized based on the learned statistics. Since there can be several alternative sets of statistics that are sufficient, we propose an optimization framework to choose a set of statistics that can be measured with the least overhead. We experimentally demonstrate the effectiveness and efficiency of the proposed algorithms.
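The final selection step described in the abstract (choosing, among several sufficient sets of statistics, the one that is cheapest to measure) can be illustrated with a minimal sketch. This is not the paper's algorithm: the statistic names, the overhead values, and the function `cheapest_sufficient_set` are all hypothetical, and the sketch assumes the sufficient sets have already been enumerated and that the overhead of a set is simply the sum of the overheads of its statistics.

```python
def set_overhead(stat_set, overhead):
    """Total measurement overhead of a set, assumed here to be additive."""
    return sum(overhead[stat] for stat in stat_set)


def cheapest_sufficient_set(candidates, overhead):
    """Return the candidate set of statistics with the lowest total overhead.

    candidates: iterable of frozensets; each is assumed to be sufficient on
                its own for costing every alternative plan.
    overhead:   dict mapping a statistic name to its measurement cost.
    """
    return min(candidates, key=lambda s: set_overhead(s, overhead))


# Hypothetical statistics for a workflow joining inputs A and B.
overhead = {
    "card(A)": 1.0,          # row count of input A
    "card(B)": 1.0,          # row count of input B
    "distinct(A.k)": 5.0,    # distinct values of the join key in A
    "card(A join B)": 4.0,   # row count observed at the join output
}

candidates = [
    frozenset({"card(A)", "card(B)", "distinct(A.k)"}),
    frozenset({"card(A join B)"}),
]

best = cheapest_sufficient_set(candidates, overhead)
best_cost = set_overhead(best, overhead)
```

With these illustrative numbers the first set costs 7.0 and the second 4.0, so the second is chosen. Exhaustive enumeration like this only works for a handful of candidate sets; the abstract indicates the paper formulates the choice as an optimization problem instead.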
Key words
cost based optimization, essential statistics