ForestLayer: Efficient training of deep forests on distributed task-parallel platforms

Journal of Parallel and Distributed Computing (2019)

Abstract
Most existing deep models are deep neural networks. Recently, the deep forest has opened the door to an alternative to deep neural networks for many tasks and has attracted increasing attention; it is now used in a growing number of real-world applications. However, the existing deep forest system is inefficient and lacks scalability. In this paper, we present ForestLayer, an efficient and scalable deep forest system built on distributed task-parallel platforms. First, to improve computing concurrency and reduce communication overhead, we propose a fine-grained, sub-forest-based task-parallel algorithm. Next, we design a novel task-splitting mechanism that reduces training time without decreasing the accuracy of the original method. To further improve performance, we propose three system-level optimization techniques: lazy scan, pre-pooling, and partial transmission. Beyond these system-level optimizations, we also provide a set of high-level programming APIs to improve the ease of use of ForestLayer. Finally, we have implemented ForestLayer on the distributed task-parallel platform Ray. The experimental results show that ForestLayer outperforms the existing deep forest system gcForest with 7× to 20.9× speedups on a range of datasets. In addition, ForestLayer outperforms a TensorFlow-based implementation on most of the datasets while achieving better predictive performance. Furthermore, ForestLayer achieves good scalability and load balance.
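The paper's concrete APIs are not shown on this page. As a rough illustration of the general idea behind sub-forest-level task parallelism on Ray, the following minimal sketch trains slices of a random forest as independent Ray tasks and soft-votes their predictions. All function and parameter names (train_sub_forest, train_forest_parallel, n_sub_forests, and so on) are hypothetical, not ForestLayer's actual interface; only the Ray primitives (ray.remote, ray.put, ray.get) and scikit-learn's RandomForestClassifier are real APIs.

# Hypothetical sketch of sub-forest task parallelism on Ray.
# Not the paper's ForestLayer API; names here are illustrative only.
import numpy as np
import ray
from sklearn.ensemble import RandomForestClassifier

ray.init()

@ray.remote
def train_sub_forest(X, y, n_trees, seed):
    """Train one sub-forest (a slice of a larger random forest) as an
    independent Ray task, enabling fine-grained parallelism."""
    sub_forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    sub_forest.fit(X, y)
    return sub_forest

def train_forest_parallel(X, y, total_trees=500, n_sub_forests=10):
    """Split a forest of `total_trees` into `n_sub_forests` tasks and
    train them concurrently; the union of sub-forests is the full forest."""
    trees_per_task = total_trees // n_sub_forests
    # Put the training data in Ray's object store once so every task
    # shares it instead of re-serializing it per task.
    X_ref, y_ref = ray.put(X), ray.put(y)
    futures = [
        train_sub_forest.remote(X_ref, y_ref, trees_per_task, seed)
        for seed in range(n_sub_forests)
    ]
    return ray.get(futures)

def predict_proba(sub_forests, X):
    """Average class probabilities across sub-forests, equivalent to a
    single forest's soft vote over all of its trees."""
    return np.mean([f.predict_proba(X) for f in sub_forests], axis=0)

if __name__ == "__main__":
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    forests = train_forest_parallel(X, y)
    print(predict_proba(forests, X[:5]))

Splitting one large forest into many sub-forest tasks increases scheduling granularity, which is the intuition behind the fine-grained parallelism and load balancing the abstract attributes to ForestLayer.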
Keywords
Deep forest, Distributed computing, Task-parallel, Random forest, Ray