Optimizing Application Performance with BlueField: Accelerating Large-Message 
Blocking and Nonblocking Collective Operations

Graham, Richard; Bosilca, George; Qin, Yong; Settlemyer, Bradley; Shainer, Gilad; Stunkel, Craig; Vallee, Geoffroy; Williams, Brody; Cisneros-Stoianowski, Gerardo; Ohlmann, Sebastian; Rampp, Markus

doi:10.23919/ISC.2024.10528935

Graham, R., Bosilca, G., Qin, Y., Settlemyer, B., Shainer, G., Stunkel, C., Vallee, G., Williams, B., Cisneros-Stoianowski, G., Ohlmann, S., & Rampp, M. (2024). Optimizing Application Performance with BlueField: Accelerating Large-Message Blocking and Nonblocking Collective Operations. In ISC High Performance 2024 Research Paper Proceedings (39th International Conference). Prometeus GmbH. doi:10.23919/ISC.2024.10528935.

Item is public

Basic information

Item permalink: https://hdl.handle.net/21.11116/0000-000F-5B88-2
Version permalink: https://hdl.handle.net/21.11116/0000-000F-5B89-1

Genre: Conference paper

Files

Optimizing Application Performance with BlueField Accelerating Large-Message Blocking and Nonblocking Collective Operations.pdf (Any fulltext), 375KB

File permalink:
-

File name:
Optimizing Application Performance with BlueField Accelerating Large-Message Blocking and Nonblocking Collective Operations.pdf

Description:
-

OA-Status:

Visibility:
Private

MIME type / checksum:
application/pdf

Technical metadata:

Copyright date:
-

Copyright info:
-

CC license:
-

Creators

Creators:
Graham, Richard, Author
Bosilca, George, Author
Qin, Yong, Author
Settlemyer, Bradley, Author
Shainer, Gilad, Author
Stunkel, Craig, Author
Vallee, Geoffroy, Author
Williams, Brody, Author
Cisneros-Stoianowski, Gerardo, Author
Ohlmann, Sebastian¹, Author
Rampp, Markus¹, Author

Affiliations:
¹ Max Planck Computing and Data Facility, Max Planck Society, ou_2364734

Content

Keywords: -

Abstract: With the end of Dennard scaling, specializing and distributing compute engines throughout the system is a promising technique to improve application performance. For example, NVIDIA's BlueField Data Processing Unit (DPU) integrates programmable processing elements within the network and offers specialized network processing capabilities. These capabilities enable communication via offloads onto DPUs and present new application opportunities for offloading nonblocking or complex communication patterns such as collective communication operations. This paper discusses the lessons learned in enabling DPU-based acceleration for collective communication algorithms by describing the impact of such offloaded collective operations on two applications: Octopus and P3DFFT++. We present and evaluate new algorithms, which leverage DPU offloading, for the nonblocking MPI_Ialltoallv and blocking MPI_Allgatherv collective operations used by these applications. Our experiments show a performance improvement in the range of 14% to 49% for P3DFFT++ and 17% for Octopus, even though in well-balanced OSU latency benchmarks these collectives perform comparably to well-optimized host-based implementations. This demonstrates that taking load imbalance into account in communication algorithms can help improve application performance where such imbalance is common and large in magnitude.

Details

Language:

Dates: Published online: 2024-05-10

Publication status: Published online

Pages: -

Publishing info: -

Table of contents: -

Review: -

Identifiers (DOI, ISBN, etc.): DOI: 10.23919/ISC.2024.10528935

Degree: -

Legal case

Project information

Source 1

Title: ISC High Performance 2024 Research Paper Proceedings (39th International Conference)

Source genre: Proceedings

Creators/editors:

Affiliations:

Publisher, place: Prometeus GmbH

Pages: -
Volume / Issue: -
Sequence number: -
Start / End page: -
Identifiers (ISBN, ISSN, DOI, etc.): ISBN: 978-3-9826336-0-2
