Adapting to data sparsity for efficient parallel PARAFAC tensor decomposition in Hadoop

2016 IEEE International Conference on Big Data (Big Data), 2016

Cited by 7 | Views 39
Abstract
Parallel Factor Analysis (PARAFAC) is used in many scientific disciplines to decompose multimodal datasets ('tensors') into principal factors that uncover multilinear relationships in the data. Today's popular implementations of PARAFAC are single-server solutions that do not scale well to big datasets. This paper presents the design, implementation, and testing of a Big Data-enabled parallel PARAFAC algorithm written in Java and run via Hadoop. To optimize performance across diverse datasets, three computational modes have been implemented: (i) dense, (ii) sparse, and (iii) hybrid. The input tensor is first divided into slices for parallel processing; a one-pass test is then performed on each slice to determine its size and sparsity; finally, the best mode for each slice is chosen to minimize the overall runtime. For tensors with non-uniform data density distributions, we demonstrate that a per-slice selection of the computational mode minimizes the runtime compared with using the same mode for all slices. In one representative example, the runtime with the auto-selected mode was 3% and 25% faster than pure hybrid and pure sparse executions, respectively, resulting in a faster execution of PARAFAC on Big Data tensors than previously possible.
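A minimal sketch in Java of how such a per-slice mode selection might look. This is an illustration only, not the authors' implementation: the class names (SliceStats, ModeSelector), the Mode enum, and the density thresholds are hypothetical, since the abstract does not specify the actual heuristic.

```java
// Hypothetical sketch of per-slice computational mode selection.
// Thresholds and names are assumptions; a real system would calibrate
// the cut-offs against measured runtimes on the target cluster.
enum Mode { DENSE, SPARSE, HYBRID }

final class SliceStats {
    final long rows, cols;   // dimensions of the 2-D tensor slice
    final long nonZeros;     // counted in a single pass over the slice

    SliceStats(long rows, long cols, long nonZeros) {
        this.rows = rows;
        this.cols = cols;
        this.nonZeros = nonZeros;
    }

    double density() {
        return (double) nonZeros / ((double) rows * (double) cols);
    }
}

final class ModeSelector {
    // Hypothetical cut-offs separating the three regimes.
    private static final double DENSE_THRESHOLD  = 0.50;
    private static final double SPARSE_THRESHOLD = 0.05;

    static Mode chooseMode(SliceStats s) {
        double d = s.density();
        if (d >= DENSE_THRESHOLD)  return Mode.DENSE;   // mostly filled: dense kernels win
        if (d <= SPARSE_THRESHOLD) return Mode.SPARSE;  // mostly empty: skip zero entries
        return Mode.HYBRID;                             // in between: mixed strategy
    }
}
```

In a Hadoop setting, the one-pass statistics gathering would naturally run as a map step over the slices, with each mapper emitting its slice's chosen mode before the decomposition proper begins.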
Keywords
Parallel factor analysis,Hadoop,tensor,parallel computing,Big Data,data sparsity,performance testing