Multiscale Approximation with Graphical Processing Units for Multiplicative Speedup in Molecular Dynamics.

Ramu Anandakrishnan, Mayank Daga, Alexey Onufriev, Wu-chun Feng

BCB (2016)

Abstract

The timescales and structure sizes accessible via atomistic molecular dynamics (MD) simulations can be advanced substantially by two independent techniques: (1) many-core parallelization with graphics processing units (GPUs) and (2) multiscale approximation with hierarchical charge partitioning (HCP). Achieving efficient many-core parallelization on the GPU generally requires highly synchronized and regular computation across the GPU. However, multiscale methods can result in highly asynchronous and irregular processing. Thus, one might expect that realizing such multiscale algorithms on the GPU would result in an overall loss of performance, and that the total speedup obtained would be less than the product of the individual speedups of the two techniques, i.e., less than multiplicative speedup. To test this expectation in the context of atomistic MD, we designed and implemented our HCP multiscale method on NVIDIA GPU platforms. The HCP code was implemented in NAB (short for nucleic acid builder), the molecular dynamics module in the open-source AmberTools v1.4, and tested using the distance-dependent-dielectric implicit solvent model. We show that for the HCP multiscale approximation and the common MD simulation model considered here, the degradation in performance due to asynchronous and irregular processing is mostly offset by a corresponding reduction in other asynchronous operations and slow global memory accesses. As a result, we realize near-multiplicative speedups. For example, for a 475,000-atom virus capsid we were able to achieve an 11,071-fold combined speedup, only slightly less than the 11,706-fold multiplicative limit (48.0-fold from the parallelization on the GPU times 243.9-fold from the multiscale approximation). The overall speedup depends on structure size, with smaller structures having lower speedups. An additional benefit of the HCP implementation on the GPU is the reduced memory requirement, which allows the processing of much larger structures that would otherwise be impossible on the memory-limited GPU platform.
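
The "multiplicative limit" arithmetic in the abstract can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and not from the paper, and the inputs are the rounded figures quoted above. The product of the rounded factors comes out marginally above the quoted 11,706-fold limit, presumably because the published factors are rounded (an assumption, not something the abstract states).

```python
# Illustrative check of the speedup figures quoted in the abstract.
# Function names are hypothetical; the inputs are the rounded values reported above.

def multiplicative_limit(gpu_speedup: float, multiscale_speedup: float) -> float:
    """Upper bound expected if the two speedups compose without any loss."""
    return gpu_speedup * multiscale_speedup


def fraction_of_limit(combined_speedup: float, limit: float) -> float:
    """Share of the multiplicative limit that the combined method achieves."""
    return combined_speedup / limit


gpu = 48.0          # GPU parallelization speedup (from the abstract)
hcp = 243.9         # HCP multiscale speedup (from the abstract)
combined = 11071.0  # measured combined speedup (from the abstract)

limit = multiplicative_limit(gpu, hcp)
frac = fraction_of_limit(combined, limit)

print(f"product of rounded factors: {limit:.1f}")
print(f"fraction of limit achieved: {frac:.3f}")
```

With these inputs the combined speedup is a little under 95% of the product, which is what "near-multiplicative" means here.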
