  Performance Optimization and Evaluation of Scalable Optoelectronics Application on Large Scale KNL Cluster

Hirokawa, Y., Boku, T., Uemoto, M., Sato, S., & Yaban, K. (2018). Performance Optimization and Evaluation of Scalable Optoelectronics Application on Large Scale KNL Cluster. In R. Yokota, M. Weiland, J. Shalf, & S. Alam (Eds.), High Performance Computing. Basel, Switzerland: Springer International Publishing. doi:10.1007/978-3-319-92040-5_11.

Genre: Conference Paper

Files

Hirokawa2018_Chapter_PerformanceOptimizationAndEval.pdf (Publisher version), 2MB
 
File Permalink: -
Name: Hirokawa2018_Chapter_PerformanceOptimizationAndEval.pdf
Description: -
OA-Status:
Visibility: Private
MIME-Type / Checksum: application/pdf
Technical Metadata:
Copyright Date: -
Copyright Info: -
License: -

Locators

Description:
-
OA-Status:

Creators

Creators:
Hirokawa, Y. [1], Author
Boku, T. [1, 2], Author
Uemoto, M. [2], Author
Sato, S. [3], Author
Yaban, K. [2], Author
Affiliations:
[1] Graduate School of Systems and Information Engineering, University of Tsukuba, ou_persistent22
[2] Center for Computational Sciences, University of Tsukuba, ou_persistent22
[3] Theory Group, Theory Department, Max Planck Institute for the Structure and Dynamics of Matter, Max Planck Society, ou_2266715

Content

Free keywords: -
Abstract: “ARTED” is an advanced scientific code for electron dynamics simulation. It has been ported to various large-scale parallel systems, including the “K” Computer, formerly the fastest supercomputer in the world, and many other MPP and cluster systems.

In this paper, we describe ARTED’s code optimization and performance evaluation on a large-scale cluster with Intel’s latest many-core processor, KNL (Knights Landing), building on past research on porting ARTED to the KNC (Knights Corner) coprocessor. The dominant computation has been thoroughly optimized for KNL to achieve the highest performance, with detailed tuning of memory access, vectorization for the AVX-512 instruction set, cache utilization, etc. For further tuning, we investigated various KNL-specific techniques such as combining MCDRAM and DDR4 memories and parallel vector summation.

After detailed per-core performance tuning, which achieved up to 25% of theoretical peak in the kernel part with 3-D stencil computation, we evaluated the application performance on the full system (25 PFLOPS of theoretical peak) of the KNL cluster “Oakforest-PACS”, the largest KNL-based cluster in the world using the Intel Omni-Path Architecture. The application shows excellent weak scaling, with a dominant Hamiltonian performance of up to 4 PFLOPS (16% efficiency of the system) in double precision irrespective of simulation size, as well as reasonable strong scaling on material simulations requiring a high degree of parallelism.
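
The 16% system efficiency quoted in the abstract follows directly from the two figures it gives (4 PFLOPS sustained against 25 PFLOPS theoretical peak). A minimal sketch of that arithmetic, where the function and variable names are illustrative and not taken from the paper:

```python
def efficiency(achieved_pflops: float, peak_pflops: float) -> float:
    """Return the fraction of theoretical peak that was sustained."""
    if peak_pflops <= 0:
        raise ValueError("peak must be positive")
    return achieved_pflops / peak_pflops


# Figures as quoted in the abstract (names are hypothetical).
system_peak_pflops = 25.0   # full-system theoretical peak
hamiltonian_pflops = 4.0    # best reported Hamiltonian performance

ratio = efficiency(hamiltonian_pflops, system_peak_pflops)
print(f"{ratio:.0%}")  # prints "16%"
```

The same ratio applies to the per-core kernel result: "up to 25% of theoretical peak" is sustained kernel FLOPS divided by the core's peak FLOPS, although the abstract does not give those absolute values.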

Details

Language(s): eng - English
 Dates: 2018
 Publication Status: Issued
 Pages: 21
 Publishing info: -
 Table of Contents: -
 Rev. Type: Internal
 Identifiers: DOI: 10.1007/978-3-319-92040-5_11
 Degree: -

Event

Title: 33rd International Conference on High Performance Computing (ISC High Performance)
Place of Event: Frankfurt/Main, Germany
Start-/End Date: 2018-06-24 - 2018-06-28

Source 1

Title: High Performance Computing
Subtitle: ISC High Performance 2018 International Workshops, Frankfurt/Main, Germany, June 24 - 28, 2018, Revised Selected Papers
Source Genre: Book
Creator(s):
Yokota, R. [1], Editor
Weiland, M. [1], Editor
Shalf, J. [1], Editor
Alam, S. [1], Editor
Affiliations:
[1] external, ou_persistent22
Publ. Info: Basel, Switzerland : Springer International Publishing
Pages: -
Volume / Issue: 11203
Sequence Number: -
Start / End Page: -
Identifier: ISBN: 978-3-030-02464-2
DOI: 10.1007/978-3-030-02465-9