Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Helge Knoop
  • Tobias Gronemeier
  • Matthias Sühring
  • Peter Steinbach
  • Matthias Noack
  • Florian Wende
  • Thomas Steinke
  • Christoph Knigge
  • Siegfried Raasch
  • Klaus Ketelsen

External Research Organisations

  • Scionics Computer Innovation GmbH
  • Zuse Institute Berlin (ZIB)

Details

Original language: English
Pages (from-to): 297-309
Number of pages: 13
Journal: International Journal of Computational Science and Engineering
Volume: 17
Issue number: 3
Publication status: E-pub ahead of print - 25 Oct 2018

Abstract

The computational power and availability of graphics processing units (GPUs) and many integrated core (MIC) processors on high performance computing (HPC) systems are rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using OpenACC and OpenMP. PALM is written in Fortran, comprises 140 kLOC and runs on HPC farms of up to 43,200 cores. The main porting challenges are the size and complexity of PALM, its inconsistent modularisation, and its lack of unit tests. We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps, describe the problems and bottlenecks we encountered, and present separate performance tests for both architectures. We do not, however, provide benchmark information.

Keywords

    CFD, Computational fluid dynamics, GPU, Graphics processing unit, High performance computing, HPC, Large-eddy simulation, LES, Many integrated core processors, MIC, MPI, OpenACC, OpenMP, Porting, Xeon Phi


Cite this

Knoop, H., Gronemeier, T., Sühring, M., Steinbach, P., Noack, M., Wende, F., Steinke, T., Knigge, C., Raasch, S., & Ketelsen, K. (2018). Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. International Journal of Computational Science and Engineering, 17(3), 297-309. https://doi.org/10.1504/IJCSE.2018.095850