

Journal Article

A portable C++ library for memory and compute abstraction on multi-core CPUs and GPUs.

MPS-Authors
Incardona, Pietro
Max Planck Institute for Molecular Cell Biology and Genetics, Max Planck Society

Gupta, Aryaman
Max Planck Institute for Molecular Cell Biology and Genetics, Max Planck Society

Yaskovets, Serhii
Max Planck Institute for Molecular Cell Biology and Genetics, Max Planck Society

Sbalzarini, Ivo F.
Max Planck Institute for Molecular Cell Biology and Genetics, Max Planck Society

Citation

Incardona, P., Gupta, A., Yaskovets, S., & Sbalzarini, I. F. (2023). A portable C++ library for memory and compute abstraction on multi-core CPUs and GPUs. Concurrency and Computation: Practice and Experience, 35(25): e7870, pp. 1-15. doi:10.1002/cpe.7870.


Cite as: https://hdl.handle.net/21.11116/0000-000E-AA96-9
Abstract
We present a C++ library for transparent memory and compute abstraction across CPU and GPU architectures. Our library combines generic data structures such as vectors, multi-dimensional arrays, maps, graphs, and sparse grids with basic generic algorithms such as arbitrary-dimensional convolutions, copying, merging, sorting, prefix sums, reductions, neighbor search, and filtering. The memory layout of the data structures is adapted at compile time using C++ tuples, with optional double-mapping of memory between host and device and the ability to use memory managed by external libraries without copying data. We combine this transparent memory layout with generic thread-parallel algorithms behind two alternative interfaces: a CUDA-like kernel interface and a lambda-function interface. We quantify the memory and compute performance and the portability of our implementation using micro-benchmarks, showing that the abstractions introduce negligible performance overhead, and we compare performance against the current state of the art in a real-world scientific application from computational fluid mechanics.
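The two ideas named in the abstract, compile-time layout selection over a C++ tuple of properties and a lambda-function interface for thread-parallel kernels, can be pictured with a small self-contained sketch. The following is a minimal illustration of the general technique only, not the library's actual API: the names TupleVector, Layout, and for_each are hypothetical, and the sequential loop stands in for what would be an OpenMP or CUDA dispatch on a real backend.

// Minimal sketch (hypothetical names, not the library's API): a container
// whose memory layout -- array-of-structs (AoS) or struct-of-arrays (SoA) --
// is chosen at compile time from a property list given as a std::tuple,
// plus a lambda-based for_each in the spirit of a lambda-function interface.
// Requires C++14.
#include <cstddef>
#include <iostream>
#include <tuple>
#include <vector>

enum class Layout { AoS, SoA };

// Primary template: AoS -- one std::vector holding complete tuples.
template <Layout L, typename... Props>
struct TupleVector {
    std::vector<std::tuple<Props...>> data;
    explicit TupleVector(std::size_t n) : data(n) {}
    template <std::size_t I>
    auto& get(std::size_t i) { return std::get<I>(data[i]); }
    std::size_t size() const { return data.size(); }
};

// Partial specialization: SoA -- a tuple of std::vectors, one per property.
template <typename... Props>
struct TupleVector<Layout::SoA, Props...> {
    std::tuple<std::vector<Props>...> data;
    explicit TupleVector(std::size_t n) : data(std::vector<Props>(n)...) {}
    template <std::size_t I>
    auto& get(std::size_t i) { return std::get<I>(data)[i]; }
    std::size_t size() const { return std::get<0>(data).size(); }
};

// Lambda-function interface: apply a kernel to every element index.
// A GPU backend would replace this loop with a device kernel launch.
template <typename Container, typename Kernel>
void for_each(Container& c, Kernel k) {
    for (std::size_t i = 0; i < c.size(); ++i) k(c, i);
}

int main() {
    // The same kernel works for either layout; the accessor hides it.
    TupleVector<Layout::SoA, float, int> v(4);
    for_each(v, [](auto& c, std::size_t i) {
        c.template get<0>(i) = 0.5f * static_cast<float>(i);  // property 0
        c.template get<1>(i) = static_cast<int>(i);           // property 1
    });
    std::cout << v.get<0>(3) << " " << v.get<1>(3) << "\n";   // prints: 1.5 3
}

Because the kernel touches properties only through the indexed accessor, the same user code compiles unchanged against either layout; this is the essence of the compile-time layout adaptation the abstract describes, which the paper additionally extends with host/device double-mapping and externally managed memory.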