## AI帮你理解科学

## AI 精读

AI抽取本论文的概要总结

微博一下：

# Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

Clinical Orthopaedics and Related Research, no. 1 (2007): 29-53

EI

摘要

Data analysis applications typically aggregate data across manydimensions looking for anomalies or unusual patterns. The SQL aggregatefunctions and the GROUP BY operator produce zero-dimensional orone-dimensional aggregates. Applications need the N-dimensionalgeneralization of these operators. This paper defines that operator, calledthe d...更多

代码：

数据：

简介

- Data analysis applications look for unusual patterns in data. They categorize data values and trends, extract statistical information, and contrast one category with another.
- DATA CUBE: A RELATIONAL AGGREGATION OPERATOR
- Using the GROUP BY construct, SQL can create a table of many aggregate values indexed by a set of attributes.
- These aggregate functions are optionally reset each time a grouping value changes in an ordered selection.

重点内容

- Data analysis applications look for unusual patterns in data
- Some of the most exciting work in visualization focuses on presenting new graphical metaphors that allow people to discover data trends and anomalies. Many of these visualization and data analysis tools represent the dataset as an N -dimensional space
- This paper examines how a relational engine can support efficient extraction of information from a SQL database that matches the above requirements of the visualization and data analysis
- The cube operator is based on a relational representation of aggregate data using the ALL value to denote the set over which each aggregation is computed
- The data cube is easy to compute for a wide class of functions

结果

- In standard SQL, histograms are computed indirectly from a table-valued expression which is aggregated.
- The representation suggested by Table 5 and unioned GROUP BYs “solve” the problem of representing aggregate data in a relational data model.
- The data cube operator builds a table containing all these aggregate values.
- Cumulative aggregates, like running sum or running average, work especially well with ROLLUP because the answer set is naturally sequential while the full data cube is naturally non-linear.
- In the Table 5 SalesSummary data cube, the respective sets are: Model.ALL = ALL(Model) = {Chevy, Ford} Year.ALL = ALL(Year) = {1990,1991,1992} Color.ALL = ALL(Color) = {red,white,blue}
- FROM Sales GROUP BY CUBE Model, Year, Color; Wherever the ALL value appeared before, the corresponding value will be NULL in the data field and TRUE in the corresponding grouping field.
- GROUP BY sales.department_number; The department.name column in the answer set is not allowed in current SQL, since it is neither an aggregation column nor is it an aggregate.
- Section 5 discusses how to compute data cubes and how users can add new aggregate operators.
- With this notation added to the SQL programming language, it should be fairly easy to compute super-super-aggregates from the base cube.
- The cube operator allows many aggregate functions in the aggregation list of the GROUP BY clause.
- Assume in this discussion that there is a single aggregate function F() being computed on an N -dimensional cube.
- The authors know of no more efficient way of computing super-aggregates of holistic functions than the 2N -algorithm using the standard GROUP BY techniques.
- When the core GROUP BY operation completes, the CUBE algorithm passes the set of handles to each N − 1 dimensional super-aggregate.

结论

- The cube operator generalizes and unifies several common and popular concepts: aggregates, group by, histograms, roll-ups and drill-downs and, cross tabs.
- The cube operator is based on a relational representation of aggregate data using the ALL value to denote the set over which each aggregation is computed.
- SQL’s basic set of five aggregate functions needs careful extension to include functions such as rank, N tile, cumulative, and percent of total to ease typical data mining operations.
- These extensions require a new super-aggregate mechanism to allow efficient computation of cubes.

- Table1: Time (UCT) 96/6/1:1500 96/6/7:1500
- Table2: SQL aggregates in standard benchmarks
- Table3: Table 3
- Table4: An Excel pivot table representation of Table 3 with Ford sales data included
- Table5: Table 5
- Table6: Table 6
- Table7: Demonstrating decorations and ALL

引用论文

- Agrawal, R., Deshpande, P., Gupta, A., Naughton, J.F., Ramakrishnan, R., and Sarawagi, S. 1996. On the Computation of Multidimensional Aggregates. Proc. 21st VLDB, Bombay.
- Chamberlin, D. 1996. Using the New DB2—IBM’s Object-Relational Database System. San Francisco, CA: Morgan Kaufmann.
- DataBlade Developer’s Kit: Users Guide 2.0. Informix Software, Menlo Park, CA, 1996.
- Date, C.J. 1995. Introduction to Database Systems. 6th edition, N.Y.: Addison Wesley.
- Date, C.J. 1996. Aggregate functions. Database Programming and Design, 9(4): 17–19.
- Graefe, C.J. 1993. Query evaluation techniques for large databases. ACM Computing Surveys, 25.2, pp. 73–170.
- Gray, J. (Ed.) 1991. The Benchmark Handbook. San Francisco, CA: Morgan Kaufmann.
- Gray, J., Bosworth, A., Layman, A., and Pirahesh, H. 1996. Data cube: A relational operator generalizing group-by, cross-tab, and roll-up. Proc. International Conf. on Data Engineering. New Orleans: IEEE Press.
- Harinarayn, V., Rajaraman, A., and Ullman, J.D. 1996. Implementing data cubes efficiently. Proc. ACM SIGMOD.
- Montreal, pp. 205–216. 1992. IS 9075 International Standard for Database Language SQL, document ISO/IEC 9075:1992, J. Melton (Ed.). 1996. ISO/IEC DBL:MCI-006 (ISO Working Draft) Database Language SQL—Part 4: Persistent Stored Modules (SQL/PSM), J. Melton (Ed.). Melton, J. and Simon, A.R. 1993. Understanding the New SQL: A Complete Guide. San Francisco, CA: Morgan
- Kaufmann. 1994. Method and Apparatus for Storing and Retrieving Multi-Dimensional Data in Computer Memory. Inventor: Earle; Robert J., Assignee: Arbor Software Corporation, US Patent 05359724. 1994. Microsoft Access Relational Database Management System for Windows, Language Reference—Functions, Statements, Methods, Properties, and Actions, DB26142, Microsoft, Redmond, WA. 1995. Microsoft Excel—User’s Guide. Microsoft. Redmond, WA. 1996. Microsoft SQL Server: Transact-SQL Reference, Document 63900. Microsoft Corp. Redmond, WA. 1994. RISQL Reference Guide, Red Brick Warehouse VPT Version 3, Part no.: 401530, Red Brick Systems, Los
- Gatos. CA. Shukla, A., Deshpande, P., Naughton, J.F., and Ramaswamy, K. 1996. Storage estimation for multidimensional aggregates in the presence of hierarchies. Proc. 21st VLDB, Bombay. 1993. The Benchmark Handbook for Database and Transaction Processing Systems—2nd edition, J. Gray (Ed.), San Francisco, CA: Morgan Kaufmann. Or http://www.tpc.org/
- Jim Gray is a specialist in database and transaction processing computer systems. At Microsoft his research focuses on scaleable computing: building super-servers and workgroup systems from commodity software and hardware. Prior to joining Microsoft, he worked at Digital, Tandem, IBM and AT&T on database and transaction processing systems including Rdb, ACMS, NonStopSQL, Pathway, System R, SQL/DS, DB2, and IMS-Fast Path. He is editor of the Performance Handbook for Database and Transaction Processing Systems, and coauthor of Transaction Processing Concepts and Techniques. He holds doctorates from Berkely and Stuttgart, is a Member of the National Academy of Engineering, Fellow of the ACM, a member of the National Research council’s computer Science and Telecommunications Board, Editor in Chief of the VLDB Journal, Trustee of the VLDB Foundation, and Editor of the Morgan Kaufmann series on Data Management.
- Surajit Chaudhuri is a researcher in the Database research Group of Microsoft Research. From 1992 to 1995, he was a Member of the Technical Staff at Hewlett-Packard Laboratories, Palo Alto. He did his B.Tech from Indian Institute of Technology, Kharagpur and his Ph.D. from Stanford University. Surajit has published in SIGMOD, VLDB and PODS in the area of optimization of queries and multimedia systems. He served in the program committees for VLDB 1996 and International Conference on Database Theory (ICDT), 1997. He is a vice-chair of the Program Committee for the upcoming International Conference on Data Engineering (ICDE), 1997. In addition to query processing and optimization, Surajit is interested in the areas of data mining, database design and uses of databases for nontraditional applications.
- Andrew Layman has been a Senior Program Manager at Microsoft Corp. since 1992. He is currently working on language integration for Internet Explorer. Before that, he designed and built a number of high-performance, data-bound Active-X controls for use across several Microsoft products and worked on the original specs for Active-X controls (nee “OLE Controls”). Formerly he was Vice-President of Symantec.

标签

评论

数据免责声明

页面数据均来自互联网公开来源、合作出版商和通过AI技术自动分析结果，我们不对页面数据的有效性、准确性、正确性、可靠性、完整性和及时性做出任何承诺和保证。若有疑问，可以通过电子邮件方式联系我们：report@aminer.cn