Parallel Algorithm for Connected-Component Analysis Using CUDA.

Algorithms(2023)

引用 1|浏览6
暂无评分
摘要
In this article, we introduce a parallel algorithm for connected-component analysis (CCA) on GPUs which drastically reduces the volume of data to transfer from GPU to the host. CCA algorithms targeting GPUs typically store the extracted features in arrays large enough to potentially hold the maximum possible number of objects for the given image size. Transferring these large arrays to the host requires large portions of the overall execution time. Therefore, we propose an algorithm which uses a CUDA kernel to merge trees of connected component feature structs. During the tree merging, various connected-component properties, such as total area, centroid and bounding box, are extracted and accumulated. The tree structure then enables us to only transfer features of valid objects to the host for further processing or storing. Our benchmarks show that this implementation significantly reduces memory transfer volume for processing results on the host whilst maintaining similar performance to state-of-the-art CCA algorithms.
更多
查看译文
关键词
connected-component analysis,image stream processing,parallel computing,CUDA
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要