A Compact Multivariate Histogram Representation for Query-Driven Visualization

IEEE Symposium on Large Data Analysis and Visualization (2015)

Abstract
As data sizes continue to grow, distribution-based methods become increasingly important for data summarization and queries. To represent the distribution of a dataset without relying on a particular parametric model, histograms are widely used in many applications because they are simple to create and efficient to query. For multivariate scientific datasets, however, storing multivariate histograms as multi-dimensional arrays is very expensive, as the size of the histogram grows exponentially with the number of variables. In this paper, we present a compact structure for storing multivariate histograms that reduces their large space cost while efficiently supporting different kinds of histogram queries. A data space transformation is first employed to transform the large multi-dimensional array into a much smaller array, and dictionaries are constructed to encode this transformation. The multivariate histogram is then represented as a sequence of index and frequency pairs, where the indices are bitstrings computed from a space-filling curve traversal of the transformed array. This compact representation reduces the storage cost of the histograms. Based on our representation, we also present several common types of queries, such as histogram marginalization, bin merging, and computation of conditional probability. We parallelize both the histogram computation and the queries to improve their efficiency. We present several query-driven visualization applications to explore and analyze multivariate scientific datasets. Experimental results on the scalability and space cost of our framework are also discussed.
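
The pipeline described in the abstract (dictionary-encoded transformation, space-filling curve indices, index and frequency pairs, then queries) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that the transformation simply drops unoccupied bins per variable, that the space-filling curve is a Z-order (Morton) curve, and that input samples are already given as tuples of bin ids. All function names are hypothetical.

```python
from collections import Counter


def build_dictionaries(bin_tuples, n_vars):
    """Assumed transformation: map each variable's occupied bin ids to dense ids."""
    dicts = []
    for v in range(n_vars):
        used = sorted({t[v] for t in bin_tuples})
        dicts.append({b: i for i, b in enumerate(used)})
    return dicts


def morton_encode(coords, bits):
    """Interleave the bits of all coordinates into one Z-order index."""
    code, n = 0, len(coords)
    for b in range(bits):
        for v, c in enumerate(coords):
            code |= ((c >> b) & 1) << (b * n + v)
    return code


def morton_decode(code, n, bits):
    """Inverse of morton_encode: recover the n dense bin ids."""
    coords = [0] * n
    for b in range(bits):
        for v in range(n):
            coords[v] |= ((code >> (b * n + v)) & 1) << b
    return tuple(coords)


def build_compact_histogram(bin_tuples):
    """Return sorted (index, frequency) pairs plus the dictionaries and bit width."""
    n = len(bin_tuples[0])
    dicts = build_dictionaries(bin_tuples, n)
    bits = max(1, max(len(d) - 1 for d in dicts).bit_length())
    counts = Counter(
        morton_encode([dicts[v][t[v]] for v in range(n)], bits)
        for t in bin_tuples
    )
    return sorted(counts.items()), dicts, bits


def marginalize(pairs, n, bits, keep):
    """Sum frequencies over every variable not listed in keep."""
    out = Counter()
    for code, freq in pairs:
        coords = morton_decode(code, n, bits)
        out[tuple(coords[v] for v in keep)] += freq
    return dict(out)


def conditional(pairs, n, bits, target, given, given_bin):
    """P(target | given == given_bin), expressed over dense bin ids."""
    joint = marginalize(pairs, n, bits, [target, given])
    denom = sum(f for (_, g), f in joint.items() if g == given_bin)
    return {t: f / denom for (t, g), f in joint.items() if g == given_bin}


# Six samples over two variables; only bins 3 and 7, and 100 and 250, are occupied.
samples = [(3, 100), (3, 100), (7, 100), (7, 250), (3, 250), (3, 250)]
pairs, dicts, bits = build_compact_histogram(samples)
marginal_var0 = marginalize(pairs, 2, bits, [0])
cond_var0 = conditional(pairs, 2, bits, target=0, given=1, given_bin=0)
```

In this sketch only occupied bins are stored, so the space cost scales with the number of non-empty bins rather than with the full multi-dimensional array. Queries decode each index back to dense bin ids, and the dictionaries map those ids back to the original bins.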
Key words
multivariate histogram representation, query-driven visualization, distribution-based methods, data summarization, data queries, parametric model, multivariate scientific datasets, multidimensional arrays, data space transformation, space filling curve, histogram marginalization, bin-merging, conditional probability computation, histogram computation, query-driven visualization applications