Logarithmic Radix Binning and Vectorized Triangle Counting

2018 IEEE High Performance Extreme Computing Conference (HPEC), 2018

Abstract
Triangle counting is a building block for numerous graph applications, and because graphs continue to grow in size, its scalability is important. As such, numerous algorithms have been designed for triangle counting, some of which are compute-bound rather than memory-bound. Even for compute-bound algorithms, one of the key challenges is the limited control flow available on the processor. This is in part due to the high dependency between the control flow and the input data, and to the limited utilization of vector instructions. Not surprisingly, compilers are not always able to detect these data dependencies and vectorize the algorithms. Using the branch-avoiding model, we show how to remove control flow restrictions by replacing branches with an equivalent set of arithmetic operations. Moreover, we show how these can be vectorized using Intel's AVX-512 instruction set and that our new vectorized algorithms are 2×–5× faster than their scalar counterparts. We also present a new load balancing method, Logarithmic Radix Binning (LRB), that ensures that threads and the vector data lanes execute a near-equal amount of work at any given time. Altogether, our algorithm outperforms several 2017 HPEC Graph Challenge Champions, such as the KOKKOS framework and a GPU-based algorithm, by 1.5× to 14×.
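The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the function names `intersect_count_branch_avoiding` and `lrb_bin` are hypothetical, the intersection is a scalar stand-in for the vectorized AVX-512 kernels, and the binning rule (group by the ceiling of log2 of the work size) is one plausible reading of "logarithmic radix binning", not necessarily the authors' exact scheme. Python is used only to show the logic; it does not reproduce the performance effect of avoiding branches.

```python
def intersect_count_branch_avoiding(a, b):
    """Count common elements of two sorted, duplicate-free lists.

    The usual if/elif/else on (a[i] < b[j], a[i] > b[j], equal) is replaced
    by arithmetic on comparison results, which is the core idea of the
    branch-avoiding model. (Illustrative sketch, not the paper's kernel.)
    """
    i = j = count = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        count += int(x == y)  # adds 1 on a match, 0 otherwise
        i += int(x <= y)      # advance a when its element is not larger
        j += int(y <= x)      # advance b when its element is not larger
    return count


def lrb_bin(work_sizes, num_bins=32):
    """Group task indices into bins by ceil(log2(size)).

    Assumed binning rule: tasks in the same bin differ in size by less
    than 2x, so threads or vector lanes that draw work from a single bin
    do a similar amount of work. Sizes too large for the last bin are
    clamped into it.
    """
    bins = [[] for _ in range(num_bins)]
    for task, size in enumerate(work_sizes):
        # (size - 1).bit_length() equals ceil(log2(size)) for size >= 1
        b = max(size - 1, 0).bit_length()
        bins[min(b, num_bins - 1)].append(task)
    return bins
```

In a triangle counting setting, `work_sizes` could be, for example, the adjacency list lengths of the vertices, and `intersect_count_branch_avoiding` would be applied to the sorted adjacency lists of the two endpoints of each edge; both choices are assumptions for this sketch.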
Keywords
AVX-512 instruction, logarithmic radix binning, compilers, GPU based algorithm, 2017 HPEC Graph Challenge Champions, vector data lanes, control flow restrictions, branch-avoiding model, data dependencies, vector instructions, compute-bound algorithms, numerous graph applications, building block, vectorized triangle counting