Free keywords:
Intel Xeon Phi, Knights Landing, Electron Dynamics Simulation
Abstract:
We have been developing an advanced scientific code called "ARTED" for electron dynamics simulation based on first-principles calculations of materials, with the aim of porting it to various large-scale parallel systems, including the "K" Computer, which was previously Japan's fastest supercomputer. In this paper, we describe the implementation and performance evaluation of the ARTED code on a stand-alone cluster of Intel's latest many-core processor, Knights Landing (KNL), building on our past research on porting the code to the Knights Corner (KNC) accelerator. Our target system is Oakforest-PACS, which is currently the fastest supercomputer in Japan. For performance tuning on KNL, the largest issue is how to utilize multiple levels of parallelism: the instruction level (512-bit SIMD instructions), hardware threads (4 threads/core), and a large number of cores. We focus on the dominant computation part of the code, which is a 25-point 3D stencil computation.
We successfully optimize this part to achieve 758.4 GFLOPS per node, which corresponds to 24.8% of the theoretical peak of an Oakforest-PACS node using an Intel Xeon Phi 7250 (3046 GFLOPS peak). We also show that the sustained performance of KNL is better than that of two KNC accelerator cards. The entire ARTED code includes other computations in each time step and is designed for large-scale parallel execution using MPI, whereas single-node parallelization is achieved using OpenMP. Finally, we evaluate the parallel execution performance of the entire code on up to 128 nodes.
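To make the 25-point 3D stencil computation named in the abstract concrete, the following is a minimal sketch in C with OpenMP. It is not the ARTED kernel. It assumes a real-valued field, periodic boundaries, and a stencil made of the centre point plus 4 neighbours on each side along each axis (1 + 3 x 2 x 4 = 25 points). The names stencil25, wrap, idx, and HALO, as well as the coefficient layout, are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

#define HALO 4  /* assumed: 4 neighbours per side per axis, 1 + 3*2*4 = 25 points */

/* Wrap index i into [0, n) for periodic boundaries (valid for i >= -n). */
static inline int wrap(int i, int n) { return (i % n + n) % n; }

/* Linear index into an nx*ny*nz grid stored with z as the fastest dimension. */
static inline size_t idx(int ix, int iy, int iz, int ny, int nz)
{
    return ((size_t)ix * (size_t)ny + (size_t)iy) * (size_t)nz + (size_t)iz;
}

/* Apply the stencil to `in` and write the result to `out`.
   c0 is the centre weight; cx, cy, cz hold the weights for offsets 1..HALO.
   `in` and `out` must not overlap. */
void stencil25(int nx, int ny, int nz, double c0,
               const double cx[HALO], const double cy[HALO],
               const double cz[HALO], const double *in, double *out)
{
    /* Core-level and thread-level parallelism: distribute (ix, iy) pairs. */
    #pragma omp parallel for collapse(2)
    for (int ix = 0; ix < nx; ++ix) {
        for (int iy = 0; iy < ny; ++iy) {
            /* The contiguous z loop is the natural candidate for SIMD. */
            for (int iz = 0; iz < nz; ++iz) {
                double acc = c0 * in[idx(ix, iy, iz, ny, nz)];
                for (int d = 1; d <= HALO; ++d) {
                    acc += cx[d - 1] * (in[idx(wrap(ix + d, nx), iy, iz, ny, nz)]
                                      + in[idx(wrap(ix - d, nx), iy, iz, ny, nz)]);
                    acc += cy[d - 1] * (in[idx(ix, wrap(iy + d, ny), iz, ny, nz)]
                                      + in[idx(ix, wrap(iy - d, ny), iz, ny, nz)]);
                    acc += cz[d - 1] * (in[idx(ix, iy, wrap(iz + d, nz), ny, nz)]
                                      + in[idx(ix, iy, wrap(iz - d, nz), ny, nz)]);
                }
                out[idx(ix, iy, iz, ny, nz)] = acc;
            }
        }
    }
}
```

The sketch exposes two of the parallelism levels listed above: the outer loops are shared among OpenMP threads, and the unit-stride inner loop is left for the compiler to vectorize. A tuned kernel would likely avoid the per-point modulo in wrap, for example by precomputing neighbour index tables, since that arithmetic can hinder vectorization. Without OpenMP enabled at compile time, the pragma is ignored and the code runs serially.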