Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report

Publication: Contribution to journal › Article › Research › Peer reviewed

Authors

  • Helge Knoop
  • Tobias Gronemeier
  • Matthias Sühring
  • Peter Steinbach
  • Matthias Noack
  • Florian Wende
  • Thomas Steinke
  • Christoph Knigge
  • Siegfried Raasch
  • Klaus Ketelsen

External organisations

  • Scionics Computer Innovation GmbH
  • Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)

Details

Original language: English
Pages (from - to): 297-309
Number of pages: 13
Journal: International Journal of Computational Science and Engineering
Volume: 17
Issue number: 3
Publication status: Electronically published (e-pub) - 25 Oct 2018

Abstract

The computational power and availability of graphics processing units (GPUs) and many integrated core (MIC) processors on high performance computing (HPC) systems are rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using OpenACC and OpenMP. PALM is written in Fortran, comprises 140 kLOC and runs on HPC farms of up to 43,200 cores. The main porting challenges are the size and complexity of PALM, its inconsistent modularisation and the lack of unit tests. We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps, describe the problems and bottlenecks we encountered and present separate performance tests for both architectures. We do not, however, provide benchmark information.
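
As a purely illustrative aside (this is not code from the paper or from PALM, which is written in Fortran): the directive-based offloading approach the abstract refers to annotates existing loop nests with compiler directives rather than rewriting them in a device-specific language. A minimal OpenACC sketch in C, with invented array names, sizes and a simple Jacobi-style stencil standing in for a real LES kernel, could look like the following; compilers without OpenACC support simply ignore the pragmas and run the code serially.

    /* Illustrative OpenACC example in C (not taken from PALM).
     * Compile e.g. with: nvc -acc stencil.c   (NVIDIA HPC SDK) */
    #include <stdio.h>
    #include <stdlib.h>

    #define N     512
    #define STEPS 100

    int main(void)
    {
        double *u     = malloc((size_t)N * N * sizeof *u);
        double *u_new = malloc((size_t)N * N * sizeof *u_new);

        for (int i = 0; i < N * N; ++i)
            u[i] = (double)rand() / RAND_MAX;

        /* Keep both arrays resident on the device for all time steps, so data
         * moves between host and device only once in each direction. */
        #pragma acc data copy(u[0:N*N]) create(u_new[0:N*N])
        {
            for (int step = 0; step < STEPS; ++step) {
                /* Each (i, j) point is independent, so the loop nest can be
                 * collapsed and mapped onto the accelerator's parallelism. */
                #pragma acc parallel loop collapse(2) present(u[0:N*N], u_new[0:N*N])
                for (int i = 1; i < N - 1; ++i)
                    for (int j = 1; j < N - 1; ++j)
                        u_new[i * N + j] = 0.25 * (u[(i - 1) * N + j] + u[(i + 1) * N + j]
                                                 + u[i * N + j - 1] + u[i * N + j + 1]);

                /* Copy the updated field back into u, still on the device. */
                #pragma acc parallel loop collapse(2) present(u[0:N*N], u_new[0:N*N])
                for (int i = 1; i < N - 1; ++i)
                    for (int j = 1; j < N - 1; ++j)
                        u[i * N + j] = u_new[i * N + j];
            }
        }

        printf("centre value after %d steps: %f\n", STEPS, u[(N / 2) * N + N / 2]);
        free(u);
        free(u_new);
        return 0;
    }

The analogous port to MIC processors mentioned in the abstract relies on OpenMP directives instead; the paper itself reports the actual porting steps and the performance tests for both targets.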

ASJC Scopus subject areas

Cite this

Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. / Knoop, Helge; Gronemeier, Tobias; Sühring, Matthias et al.
In: International Journal of Computational Science and Engineering, Vol. 17, No. 3, 25.10.2018, p. 297-309.


Knoop, H, Gronemeier, T, Sühring, M, Steinbach, P, Noack, M, Wende, F, Steinke, T, Knigge, C, Raasch, S & Ketelsen, K 2018, 'Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report', International Journal of Computational Science and Engineering, vol. 17, no. 3, pp. 297-309. https://doi.org/10.1504/IJCSE.2018.095850
Knoop, H., Gronemeier, T., Sühring, M., Steinbach, P., Noack, M., Wende, F., Steinke, T., Knigge, C., Raasch, S., & Ketelsen, K. (2018). Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. International Journal of Computational Science and Engineering, 17(3), 297-309. Advance online publication. https://doi.org/10.1504/IJCSE.2018.095850
Knoop H, Gronemeier T, Sühring M, Steinbach P, Noack M, Wende F et al. Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. International Journal of Computational Science and Engineering. 2018 Oct 25;17(3):297-309. Epub 2018 Oct 25. doi: 10.1504/IJCSE.2018.095850
Knoop, Helge ; Gronemeier, Tobias ; Sühring, Matthias et al. / Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. In: International Journal of Computational Science and Engineering. 2018 ; Vol. 17, No. 3. pp. 297-309.
BibTeX
@article{7ce77215a87d4023b8f89265eb2199d3,
title = "Porting the MPI-parallelised les model PALM to multi-GPU systems and many integrated core processors - an experience report",
abstract = "The computational power and availability of graphics processing units (GPUs) and many integrated core (MIC) processors on high performance computing (HPC) systems is rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using OpenACC and OpenMP. PALM is written in Fortran, entails 140 kLOC and runs on HPC farms of up to 43,200 cores. The main porting challenges are the size and complexity of PALM, its inconsistent modularisation and no unit-tests. We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps, describe the problems and bottlenecks we encountered and present separate performance tests for both architectures. We however, do not provide benchmark information.",
keywords = "CFD, Computational fluid dynamics, GPU, Graphics processing unit, High performance computing, HPC, Large-eddy simulation, LES, Many integrated core processors, MIC, MPI, OpenACC, OpenMP, Porting, Xeon Phi",
author = "Helge Knoop and Tobias Gronemeier and Matthias S{\"u}hring and Peter Steinbach and Matthias Noack and Florian Wende and Thomas Steinke and Christoph Knigge and Siegfried Raasch and Klaus Ketelsen",
note = "Publisher Copyright: {\textcopyright} 2018 Inderscience Enterprises Ltd. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.",
year = "2018",
month = oct,
day = "25",
doi = "10.1504/IJCSE.2018.095850",
language = "English",
volume = "17",
pages = "297--309",
number = "3",

}
