
An Efficient and Flexible Parallel FFT Implementation Based on FFTW

Competence in High Performance Computing 2010 (2011)

Abstract
In this paper we describe a new open source software library called PFFT [12], which was developed for calculating parallel complex-to-complex FFTs on massively parallel architectures. It combines the flexible user interface and hardware adaptiveness of FFTW [7] with a highly scalable two-dimensional data decomposition. We use a transpose FFT algorithm that consists of one-dimensional FFTs and global data transpositions. For the implementation we utilize the FFTW software library. Therefore we are able to generalize our algorithm in a straightforward way to d-dimensional FFTs, d ≥ 3, real-to-complex FFTs, and even completely in-place transformations. Further retained FFTW features, such as the selection of planning effort via flags and a separate communicator handle, distinguish PFFT from other publicly available parallel FFT implementations. Automatic ghost cell creation and support of oversampled FFTs complete the outstanding flexibility of PFFT. Our runtime tests on up to 262144 cores of the BlueGene/P supercomputer show PFFT to be as fast as the well-known P3DFFT [11] software package, while the flexibility of FFTW is still preserved.
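The transpose idea named in the abstract (one-dimensional transforms alternating with data transpositions) can be illustrated with a minimal serial sketch. This is not PFFT's API and involves no MPI or FFTW: a naive O(n²) DFT stands in for the one-dimensional FFTs, an in-memory axis rotation stands in for the global data transposition, and all function names (`dft_1d`, `rotate_axes`, `transpose_fft_3d`, and so on) are hypothetical, chosen only for this illustration.

```python
import cmath


def dft_1d(x):
    """Naive forward DFT of a 1D list; a stand-in for a real 1D FFT."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transform_last_axis(a):
    """Apply the 1D transform along the last axis of a nested 3D list."""
    return [[dft_1d(row) for row in plane] for plane in a]


def rotate_axes(a):
    """Reorder a[i][j][k] into b[k][i][j].

    In a parallel code this step would be a global data transposition
    across processes; here it is a plain in-memory reordering.
    """
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    return [[[a[i][j][k] for j in range(n1)] for i in range(n0)] for k in range(n2)]


def transpose_fft_3d(a):
    """3D DFT as three rounds of (1D transforms on last axis, rotate axes).

    After three rotations the axes are back in their original order.
    """
    for _ in range(3):
        a = rotate_axes(transform_last_axis(a))
    return a


def dft_3d_direct(a):
    """Reference: 3D DFT evaluated directly from its definition."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])

    def coeff(k0, k1, k2):
        return sum(
            a[j0][j1][j2]
            * cmath.exp(-2j * cmath.pi * (j0 * k0 / n0 + j1 * k1 / n1 + j2 * k2 / n2))
            for j0 in range(n0)
            for j1 in range(n1)
            for j2 in range(n2)
        )

    return [[[coeff(k0, k1, k2) for k2 in range(n2)] for k1 in range(n1)] for k0 in range(n0)]


def max_abs_diff(a, b):
    """Largest elementwise deviation between two nested 3D lists."""
    return max(
        abs(x - y)
        for pa, pb in zip(a, b)
        for ra, rb in zip(pa, pb)
        for x, y in zip(ra, rb)
    )


if __name__ == "__main__":
    # Small 2 x 3 x 4 complex test array with arbitrary deterministic entries.
    data = [[[complex(i + 2 * j, k - i) for k in range(4)] for j in range(3)] for i in range(2)]
    err = max_abs_diff(transpose_fft_3d(data), dft_3d_direct(data))
    print("max deviation from direct 3D DFT:", err)
```

Because every round works only on the last axis, each one-dimensional transform touches contiguous data. In a distributed setting this property is what lets the transforms run locally between transposition steps.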
Keywords
Fast Fourier Transform, Software Library, Wall Clock Time, Fast Fourier Transform Algorithm, Input Array