A CUDA fast multipole method with highly efficient M2L far field evaluation

Kohnke, B., Kutzner, C., Beckmann, A., Lube, G., Kabadshow, I., Dachsel, H., et al. (2021). A CUDA fast multipole method with highly efficient M2L far field evaluation. The International Journal of High Performance Computing Applications, 35(1), 97-117. doi:10.1177/1094342020964857.

Basic

Genre: Journal Article

Files

3260951.pdf (Publisher version), 3MB
Name: 3260951.pdf
Description: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

 Creators:
Kohnke, B. (1), Author
Kutzner, C. (2), Author
Beckmann, A., Author
Lube, G., Author
Kabadshow, I., Author
Dachsel, H., Author
Grubmüller, H. (2), Author
Affiliations:
(1) Department of Theoretical and Computational Biophysics, MPI for Biophysical Chemistry, Max Planck Society, ou_578631
(2) Department of Theoretical and Computational Biophysics, MPI for Biophysical Chemistry, Max Planck Society, ou_578631

Content

Free keywords: Fast multipole method, Multipole-to-Local, molecular dynamics, electrostatics, CUDA
Abstract: Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications. Its direct solution is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the naive pairwise summation has O(N^2) computational complexity. The fast multipole method (FMM) can reduce runtime and complexity to O(N) for any specified precision. Here, we present a CUDA-accelerated, C++ FMM implementation for multi particle systems with r^(-1) potential that are found, e.g. in biomolecular simulations. The algorithm involves several operators to exchange information in an octree data structure. We focus on the Multipole-to-Local (M2L) operator, as its runtime is limiting for the overall performance. We propose, implement and benchmark three different M2L parallelization approaches. Approach (1) utilizes Unified Memory to minimize programming and porting efforts. It achieves decent speedups for only little implementation work. Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies. The presorted list-based approach (3) fits periodic boundary conditions particularly well. It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications. The result is a compute-bound implementation, i.e. performance is limited by arithmetic operations rather than by memory accesses. The complete CUDA parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver.

Details

Language(s): eng - English
 Dates: 2020-10-12; 2021-01
 Publication Status: Published in print
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: DOI: 10.1177/1094342020964857
 Degree: -

Project information

Project name: -
Grant ID: -
Funding program: Software for Exascale Computing (SPP 1648)
Funding organization: DFG

Source 1

Title: The International Journal of High Performance Computing Applications
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 35 (1)
Sequence Number: -
Start / End Page: 97 - 117
Identifier: -