Talk

Overlapping communication and computation using the Intel MPI library's asynchronous progress control

MPS-Authors

Ohlmann, Sebastian
Max Planck Computing and Data Facility, Max Planck Society

Rampp, Markus
Max Planck Computing and Data Facility, Max Planck Society

Citation

Ohlmann, S., Baruffa, F., & Rampp, M. (2020). Overlapping communication and computation using the Intel MPI library's asynchronous progress control. Talk presented at IXPUG Annual Meeting 2020 (IXPUG: Intel eXtreme Performance Users Group). Virtually Hosted by TACC. 2020-10-13 - 2020-10-16.


Cite as: http://hdl.handle.net/21.11116/0000-0007-AACB-5
Abstract
When scaling HPC applications to large-scale systems, the time spent in communication often becomes a bottleneck. A well-known technique to tackle this problem is overlapping communication with computation to hide communication time. In MPI codes, however, using non-blocking functions is not enough: the progress of the communication needs to be triggered explicitly, either by the application code or by special features of the MPI library. In this talk, we explore overlapping communication and computation using the asynchronous progress control feature of the Intel® MPI Library by applying it to stencil codes with a domain decomposition. With this feature, the MPI library transparently handles the progress of non-blocking MPI communication, removing the need for explicit control in the application. First, we introduce the asynchronous progress control of the Intel® MPI Library and show how it can be used to improve the performance and scalability of a simple domain-decomposition code. Second, we show how a real-world application can benefit from this feature: the electronic structure code Octopus, which uses finite-difference stencils to solve the time-dependent DFT equations. Moreover, we try to generalize the conditions under which other stencil codes with a domain decomposition can benefit as well. All tests have been run on Cobra, the current flagship system of the Max Planck Society with about 3400 nodes (each with two Intel® Xeon® Gold 6148 "Skylake" sockets) and an Omni-Path interconnect, hosted at the Max Planck Computing and Data Facility (MPCDF).
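The overlap pattern described above can be illustrated with a minimal sketch (not taken from the talk) in C, assuming a 1D domain decomposition with one ghost cell per side and an illustrative 3-point averaging stencil; all function and variable names here are hypothetical:

```c
/* Sketch: non-blocking halo exchange overlapped with interior computation.
 * u has n interior points at indices 1..n plus ghost cells at 0 and n+1.
 * left/right are neighbor ranks (MPI_PROC_NULL at domain boundaries). */
#include <mpi.h>

void halo_exchange_and_compute(double *u, double *unew, int n,
                               int left, int right, MPI_Comm comm)
{
    MPI_Request reqs[4];

    /* Post non-blocking transfers for the ghost cells. */
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, comm, &reqs[3]);

    /* Overlap window: update interior points that need no halo data. */
    for (int i = 2; i < n; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* Without asynchronous progress, the transfers may effectively only
     * advance inside this call, shrinking the overlap. */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);

    /* Boundary points that depend on the received halo values. */
    unew[1] = 0.5 * (u[0] + u[2]);
    unew[n] = 0.5 * (u[n - 1] + u[n + 1]);
}
```

With the Intel MPI Library, asynchronous progress can then be enabled at run time via the documented environment variables, e.g. `I_MPI_ASYNC_PROGRESS=1` and `I_MPI_ASYNC_PROGRESS_THREADS=1`, so that dedicated progress threads advance the `MPI_Isend`/`MPI_Irecv` transfers during the interior update instead of only inside `MPI_Waitall`.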