Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Helge Knoop
  • Tobias Gronemeier
  • Matthias Sühring
  • Peter Steinbach
  • Matthias Noack
  • Florian Wende
  • Thomas Steinke
  • Christoph Knigge
  • Siegfried Raasch
  • Klaus Ketelsen

External Research Organisations

  • Scionics Computer Innovation GmbH
  • Zuse Institute Berlin (ZIB)

Details

Original language: English
Pages (from-to): 297-309
Number of pages: 13
Journal: International Journal of Computational Science and Engineering
Volume: 17
Issue number: 3
Publication status: E-pub ahead of print - 25 Oct 2018

Abstract

The computational power and availability of graphics processing units (GPUs) and many integrated core (MIC) processors on high performance computing (HPC) systems are rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using OpenACC and OpenMP. PALM is written in Fortran, comprises 140 kLOC and runs on HPC farms of up to 43,200 cores. The main porting challenges are the size and complexity of PALM, its inconsistent modularisation, and its lack of unit tests. We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps, describe the problems and bottlenecks we encountered, and present separate performance tests for both architectures. We do not, however, provide benchmark information.

Keywords

    CFD, Computational fluid dynamics, GPU, Graphics processing unit, High performance computing, HPC, Large-eddy simulation, LES, Many integrated core processors, MIC, MPI, OpenACC, OpenMP, Porting, Xeon Phi


Cite this

Knoop, H., Gronemeier, T., Sühring, M., Steinbach, P., Noack, M., Wende, F., Steinke, T., Knigge, C., Raasch, S., & Ketelsen, K. (2018). Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. International Journal of Computational Science and Engineering, 17(3), 297-309. https://doi.org/10.1504/IJCSE.2018.095850