Details
Original language | English |
---|---|
Pages (from-to) | 297-309 |
Number of pages | 13 |
Journal | International Journal of Computational Science and Engineering |
Volume | 17 |
Issue number | 3 |
Publication status | E-pub ahead of print - 25 Oct 2018 |
Abstract
The computational power and availability of graphics processing units (GPUs) and many integrated core (MIC) processors on high performance computing (HPC) systems are rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using OpenACC and OpenMP. PALM is written in Fortran, comprises 140 kLOC and runs on HPC farms of up to 43,200 cores. The main porting challenges are the size and complexity of PALM, its inconsistent modularisation and its lack of unit tests. We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps, describe the problems and bottlenecks we encountered and present separate performance tests for both architectures. However, we do not provide benchmark information.
Keywords
- CFD, Computational fluid dynamics, GPU, Graphics processing unit, High performance computing, HPC, Large-eddy simulation, LES, Many integrated core processors, MIC, MPI, OpenACC, OpenMP, Porting, Xeon Phi
ASJC Scopus subject areas
- Computer Science(all): Software
- Mathematics(all): Modelling and Simulation
- Computer Science(all): Hardware and Architecture
- Mathematics(all): Computational Mathematics
- Computer Science(all): Computational Theory and Mathematics
Cite this
In: International Journal of Computational Science and Engineering, Vol. 17, No. 3, 25.10.2018, p. 297-309.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report
AU - Knoop, Helge
AU - Gronemeier, Tobias
AU - Sühring, Matthias
AU - Steinbach, Peter
AU - Noack, Matthias
AU - Wende, Florian
AU - Steinke, Thomas
AU - Knigge, Christoph
AU - Raasch, Siegfried
AU - Ketelsen, Klaus
N1 - Publisher Copyright: © 2018 Inderscience Enterprises Ltd. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2018/10/25
Y1 - 2018/10/25
KW - CFD
KW - Computational fluid dynamics
KW - GPU
KW - Graphics processing unit
KW - High performance computing
KW - HPC
KW - Large-eddy simulation
KW - LES
KW - Many integrated core processors
KW - MIC
KW - MPI
KW - OpenACC
KW - OpenMP
KW - Porting
KW - Xeon Phi
UR - http://www.scopus.com/inward/record.url?scp=85055875454&partnerID=8YFLogxK
U2 - 10.1504/IJCSE.2018.095850
DO - 10.1504/IJCSE.2018.095850
M3 - Article
AN - SCOPUS:85055875454
VL - 17
SP - 297
EP - 309
JO - International Journal of Computational Science and Engineering
JF - International Journal of Computational Science and Engineering
SN - 1742-7185
IS - 3
ER -