Experiences Porting Distributed Applications to Asynchronous Tasks: A Multidimensional FFT Case-study
International Workshop on Asynchronous Many-Task Systems and Applications (2024)
Abstract
Parallel algorithms relying on synchronous parallelization libraries often
experience adverse performance due to global synchronization barriers.
Asynchronous many-task runtimes offer task futurization capabilities that
minimize or remove the need for global synchronization barriers. This paper
conducts a case study of the multidimensional Fast Fourier Transform to
identify which applications will benefit from the asynchronous many-task model.
Our basis is the popular FFTW library. We use the asynchronous many-task model
HPX and a one-dimensional FFTW backend to implement multiple versions using
different HPX features and highlight overheads and pitfalls during migration.
Furthermore, we add an HPX threading backend to FFTW. The case study analyzes
shared memory scaling properties between our HPX-based parallelization and FFTW
with its pthreads, OpenMP, and HPX backends. The case study also compares
FFTW's MPI+X backend to a purely HPX-based distributed implementation. The FFT
application does not profit from asynchronous task execution. On the contrary,
enforcing task synchronization results in better cache performance and thus
better runtime. Nonetheless, the HPX backend for FFTW is competitive with
existing backends. Our distributed HPX implementation, based on HPX collectives
with the MPI parcelport, performs similarly to FFTW's MPI+OpenMP backend. However,
the LCI parcelport of HPX accelerates communication by up to a factor of 5.
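
To illustrate how a multidimensional FFT can be assembled from a one-dimensional backend, the following is a minimal sketch and not the paper's implementation. It assumes a naive O(n^2) DFT (`dft_1d`) as a stand-in for a one-dimensional FFTW call and `std::async` futures as a stand-in for HPX tasks; the names `dft_1d`, `transform_rows`, `transpose`, and `fft_2d` are hypothetical. Each row is transformed in its own task, and the wait at the end of `transform_rows` is a synchronization point of the kind the abstract refers to when it discusses enforcing task synchronization.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <future>
#include <vector>

using cplx = std::complex<double>;
using Matrix = std::vector<std::vector<cplx>>;

// Naive O(n^2) 1D DFT. Assumption: stands in for a 1D FFTW transform.
std::vector<cplx> dft_1d(const std::vector<cplx>& in) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum{0.0, 0.0};
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * j) / static_cast<double>(n);
            sum += in[j] * cplx{std::cos(angle), std::sin(angle)};
        }
        out[k] = sum;
    }
    return out;
}

// Launch one task per row, then wait for every task to finish.
// Assumption: std::async stands in for an HPX task launch.
Matrix transform_rows(const Matrix& m) {
    std::vector<std::future<std::vector<cplx>>> tasks;
    tasks.reserve(m.size());
    for (const auto& row : m) {
        tasks.push_back(
            std::async(std::launch::async, [&row] { return dft_1d(row); }));
    }
    Matrix result;
    result.reserve(m.size());
    for (auto& task : tasks) {
        result.push_back(task.get());  // synchronization point for this phase
    }
    return result;
}

// Out-of-place transpose so the second dimension becomes contiguous rows.
Matrix transpose(const Matrix& m) {
    const std::size_t rows = m.size();
    const std::size_t cols = m[0].size();
    Matrix t(cols, std::vector<cplx>(rows));
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            t[j][i] = m[i][j];
        }
    }
    return t;
}

// 2D DFT: row transforms, transpose, row transforms again, transpose back.
Matrix fft_2d(const Matrix& input) {
    assert(!input.empty() && !input[0].empty());
    Matrix step = transform_rows(input);
    step = transpose(step);
    step = transform_rows(step);
    return transpose(step);
}
```

Calling `fft_2d` on a rectangular matrix of complex values returns its two-dimensional DFT with the same shape. In a real setting, `dft_1d` would presumably be replaced by an FFTW plan execution and `std::async` by the task facilities of the chosen runtime; those substitutions are not shown here.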