SAM: Database Generation from Query Workloads with Supervised Autoregressive Models

Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), 2022

Abstract
With the prevalence of cloud databases, database users increasingly rely on cloud database providers to manage their data. This makes it challenging for cloud providers to benchmark different DBMSs for a specific database instance without access to the underlying data. One viable solution is to leverage a query workload, which contains a set of queries and their corresponding cardinalities, to generate a synthetic database with similar query performance. Existing methods for database generation with cardinality constraints, however, can only handle very small query workloads due to their high complexity, and they encounter challenges when handling join queries. In this work, we propose SAM, a supervised deep autoregressive model-based method for database generation from query workloads. First, SAM is able to process large-scale query workloads efficiently, as its complexity is linear in the size of the query workload, the number of attributes, and the attribute domain size. Second, we develop algorithms to obtain unbiased samples of base relations from the deep autoregressive model and to assign join keys in a way that accurately recovers the full outer join of the target database. Comprehensive experiments on real-world datasets demonstrate that SAM efficiently generates a high-fidelity database that not only satisfies the input cardinality constraints but is also close to the target database.
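To make the notion of a query workload with cardinality constraints concrete, the following is a minimal sketch, not SAM itself. It represents each workload entry as a predicate paired with the row count observed on the target database, and scores a candidate synthetic table by how closely it reproduces those counts. All names (`CardinalityConstraint`, `count_matches`, `q_error`, `evaluate`) and the toy data are hypothetical, and the use of Q-error as the score is an assumption for illustration; the abstract does not say which metric the paper uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class CardinalityConstraint:
    """One workload entry: a filter predicate and the number of rows
    it returned on the (inaccessible) target database."""
    predicate: Callable[[Row], bool]
    cardinality: int


def count_matches(table: List[Row], predicate: Callable[[Row], bool]) -> int:
    """Count the rows of a table that satisfy a predicate."""
    return sum(1 for row in table if predicate(row))


def q_error(estimate: int, truth: int) -> float:
    """Symmetric ratio error; both counts are clamped to at least 1
    so that empty results do not cause a division by zero."""
    est = max(estimate, 1)
    tru = max(truth, 1)
    return max(est / tru, tru / est)


def evaluate(table: List[Row], workload: List[CardinalityConstraint]) -> List[float]:
    """Q-error of the synthetic table on every constraint in the workload."""
    return [
        q_error(count_matches(table, c.predicate), c.cardinality)
        for c in workload
    ]


# Toy synthetic table and workload (made-up values, for illustration only).
synthetic = [
    {"age": 25, "dept": 1},
    {"age": 37, "dept": 2},
    {"age": 41, "dept": 2},
    {"age": 52, "dept": 3},
]
workload = [
    CardinalityConstraint(lambda r: r["age"] >= 30, 3),
    CardinalityConstraint(lambda r: r["dept"] == 2, 1),
]

errors = evaluate(synthetic, workload)
print(errors)
```

A score of 1.0 on a constraint means the synthetic table reproduces that query's cardinality exactly; larger values indicate a larger multiplicative deviation.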
Keywords
Database Generation, Supervised Autoregressive Models