  Cache Oblivious Parallelograms in Iterative Stencil Computations

Strzodka, R., Shaheen, M., Pajak, D., & Seidel, H.-P. (2010). Cache Oblivious Parallelograms in Iterative Stencil Computations. In ICS'10 (pp. 49-59). New York, NY: ACM. doi:10.1145/1810085.1810096.

Files

CORALS.pdf (Any fulltext), 216KB
Name: CORALS.pdf
Visibility: Private
MIME-Type: application/pdf

Creators

Strzodka, Robert (1, 2), Author
Shaheen, Mohammed (1, 3), Author
Pajak, Dawid (1), Author
Seidel, Hans-Peter (1), Author
Affiliations:
(1) Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047
(2) Graphics - Optics - Vision, MPI for Informatics, Max Planck Society, ou_1116549
(3) International Max Planck Research School, MPI for Informatics, Max Planck Society, ou_1116551

Content

Abstract: We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations, as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double precision elements, for constant and variable stencils, against hand-optimized naive code and the automatic polyhedral parallelizer and locality optimizer PluTo, and demonstrate the clear superiority of our results. The performance benefits stem from a tiling structure that caters for data locality, parallelism and vectorization simultaneously. Rather than tiling the iteration space from inside, we take an exterior approach with a predefined hierarchy, simple regular parallelogram tiles and a locality-preserving parallelization. These advantages come at the cost of an irregular workload distribution, but a tightly integrated load balancer ensures high utilization of all resources.
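The basic idea of tiling the space-time iteration space of an iterative stencil with parallelogram tiles can be illustrated with a minimal Python sketch. It assumes a 1D three-point averaging (Jacobi-style) stencil with fixed boundary values and two alternating buffers. The function names `naive` and `parallelogram` and the `width`/`height` tile parameters are hypothetical. This sketch uses a single level of fixed-size tiles, so it is not cache oblivious and omits the paper's predefined hierarchy, parallelization, vectorization and load balancer. It only shows how skewed tiles can advance a small region through several time steps and still produce the same result as whole-domain sweeps.

```python
def naive(u0, steps):
    """Reference: sweep the whole 1D domain once per time step.

    Hypothetical example stencil: u[t+1][x] = (u[t][x-1] + u[t][x] + u[t][x+1]) / 3,
    with u[0] and u[n-1] held fixed.
    """
    n = len(u0)
    buf = [list(u0), list(u0)]  # both buffers carry the fixed boundary values
    for t in range(steps):
        src, dst = buf[t % 2], buf[(t + 1) % 2]
        for x in range(1, n - 1):
            dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0
    return buf[steps % 2]


def parallelogram(u0, steps, width=8, height=4):
    """Same computation, traversed in skewed (parallelogram) space-time tiles.

    Each tile spans up to `height` time steps, and its x-range shifts left by
    one cell per step (the stencil radius). Every value a tile reads from the
    previous step therefore lies inside the tile, in the already finished tile
    to its left, or on the fixed boundary. Because of the same skew, a tile
    never overwrites a previous-step value that the next tile to its right
    still has to read, so two buffers suffice.
    """
    n = len(u0)
    buf = [list(u0), list(u0)]
    for t0 in range(0, steps, height):       # bands of time steps
        tt = min(height, steps - t0)         # the last band may be shorter
        x0 = 1
        # Walk tiles left to right until a tile's last row starts past the interior.
        while x0 - (tt - 1) < n - 1:
            for s in range(tt):              # time steps inside the tile, in order
                t = t0 + s
                src, dst = buf[t % 2], buf[(t + 1) % 2]
                lo = max(1, x0 - s)          # skewed row, clipped to the interior
                hi = min(n - 1, x0 - s + width)
                for x in range(lo, hi):
                    dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0
            x0 += width
    return buf[steps % 2]
```

For example, `parallelogram(u0, 13, width=5, height=4) == naive(u0, 13)` holds for an input list `u0`, because every point is computed from the same operands in the same order, only in a different traversal. In this sketch, `height` controls how many time steps are performed on a region while its data is still nearby in memory.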

Details

Language(s): eng - English
Dates: 2011-01-19; 2010; 2010
Publication Status: Issued
Identifiers: eDoc: 537274
DOI: 10.1145/1810085.1810096
BibTex Citekey: StShPa_10CORALS

Event

Title: 24th ACM International Conference on Supercomputing
Place of Event: Tsukuba, Ibaraki, Japan
Start-/End Date: 2010-06-01 - 2010-06-05

Source 1

Title: ICS'10
  Abbreviation: ICS 2010
  Subtitle: 2010 International Conference on Supercomputing
Source Genre: Proceedings
Publ. Info: New York, NY : ACM
Start / End Page: 49 - 59
Identifier: ISBN: 978-1-4503-0018-6