Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report

Publication: Contribution to journal › Article › Research › Peer reviewed

Authors

  • Helge Knoop
  • Tobias Gronemeier
  • Matthias Sühring
  • Peter Steinbach
  • Matthias Noack
  • Florian Wende
  • Thomas Steinke
  • Christoph Knigge
  • Siegfried Raasch
  • Klaus Ketelsen

External organisations

  • Scionics Computer Innovation GmbH
  • Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)

Details

Original language: English
Pages (from - to): 297-309
Number of pages: 13
Journal: International Journal of Computational Science and Engineering
Volume: 17
Issue number: 3
Publication status: Electronically published (e-pub) - 25 Oct 2018

Abstract

The computational power and availability of graphics processing units (GPUs) and many integrated core (MIC) processors on high performance computing (HPC) systems are rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using OpenACC and OpenMP. PALM is written in Fortran, comprises 140 kLOC and runs on HPC farms of up to 43,200 cores. The main porting challenges are the size and complexity of PALM, its inconsistent modularisation and the lack of unit tests. We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps, describe the problems and bottlenecks we encountered and present separate performance tests for both architectures. We do not, however, provide benchmark information.
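
As a purely illustrative aside (this is not code from the paper or from PALM, which is written in Fortran): the directive-based offloading approach the abstract refers to annotates existing loop nests with compiler directives rather than rewriting them in a device-specific language. A minimal OpenACC sketch in C, with invented array names, sizes and a simple Jacobi-style stencil standing in for a real LES kernel, could look like the following; compilers without OpenACC support simply ignore the pragmas and run the code serially.

    /* Illustrative OpenACC example in C (not taken from PALM).
     * Compile e.g. with: nvc -acc stencil.c   (NVIDIA HPC SDK) */
    #include <stdio.h>
    #include <stdlib.h>

    #define N     512
    #define STEPS 100

    int main(void)
    {
        double *u     = malloc((size_t)N * N * sizeof *u);
        double *u_new = malloc((size_t)N * N * sizeof *u_new);

        for (int i = 0; i < N * N; ++i)
            u[i] = (double)rand() / RAND_MAX;

        /* Keep both arrays resident on the device for all time steps, so data
         * moves between host and device only once in each direction. */
        #pragma acc data copy(u[0:N*N]) create(u_new[0:N*N])
        {
            for (int step = 0; step < STEPS; ++step) {
                /* Each (i, j) point is independent, so the loop nest can be
                 * collapsed and mapped onto the accelerator's parallelism. */
                #pragma acc parallel loop collapse(2) present(u[0:N*N], u_new[0:N*N])
                for (int i = 1; i < N - 1; ++i)
                    for (int j = 1; j < N - 1; ++j)
                        u_new[i * N + j] = 0.25 * (u[(i - 1) * N + j] + u[(i + 1) * N + j]
                                                 + u[i * N + j - 1] + u[i * N + j + 1]);

                /* Copy the updated field back into u, still on the device. */
                #pragma acc parallel loop collapse(2) present(u[0:N*N], u_new[0:N*N])
                for (int i = 1; i < N - 1; ++i)
                    for (int j = 1; j < N - 1; ++j)
                        u[i * N + j] = u_new[i * N + j];
            }
        }

        printf("centre value after %d steps: %f\n", STEPS, u[(N / 2) * N + N / 2]);
        free(u);
        free(u_new);
        return 0;
    }

The analogous port to MIC processors mentioned in the abstract relies on OpenMP directives instead; the paper itself reports the actual porting steps and the performance tests for both targets.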

ASJC Scopus subject areas

Cite this

Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. / Knoop, Helge; Gronemeier, Tobias; Sühring, Matthias et al.
In: International Journal of Computational Science and Engineering, Vol. 17, No. 3, 25.10.2018, p. 297-309.


Knoop, H, Gronemeier, T, Sühring, M, Steinbach, P, Noack, M, Wende, F, Steinke, T, Knigge, C, Raasch, S & Ketelsen, K 2018, 'Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report', International Journal of Computational Science and Engineering, vol. 17, no. 3, pp. 297-309. https://doi.org/10.1504/IJCSE.2018.095850
Knoop, H., Gronemeier, T., Sühring, M., Steinbach, P., Noack, M., Wende, F., Steinke, T., Knigge, C., Raasch, S., & Ketelsen, K. (2018). Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. International Journal of Computational Science and Engineering, 17(3), 297-309. Advance online publication. https://doi.org/10.1504/IJCSE.2018.095850
Knoop H, Gronemeier T, Sühring M, Steinbach P, Noack M, Wende F et al. Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. International Journal of Computational Science and Engineering. 2018 Oct 25;17(3):297-309. Epub 2018 Oct 25. doi: 10.1504/IJCSE.2018.095850
Knoop, Helge ; Gronemeier, Tobias ; Sühring, Matthias et al. / Porting the MPI-parallelised LES model PALM to multi-GPU systems and many integrated core processors - an experience report. In: International Journal of Computational Science and Engineering. 2018 ; Vol. 17, No. 3. pp. 297-309.
BibTeX
@article{7ce77215a87d4023b8f89265eb2199d3,
title = "Porting the MPI-parallelised les model PALM to multi-GPU systems and many integrated core processors - an experience report",
abstract = "The computational power and availability of graphics processing units (GPUs) and many integrated core (MIC) processors on high performance computing (HPC) systems is rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using OpenACC and OpenMP. PALM is written in Fortran, entails 140 kLOC and runs on HPC farms of up to 43,200 cores. The main porting challenges are the size and complexity of PALM, its inconsistent modularisation and no unit-tests. We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps, describe the problems and bottlenecks we encountered and present separate performance tests for both architectures. We however, do not provide benchmark information.",
keywords = "CFD, Computational fluid dynamics, GPU, Graphics processing unit, High performance computing, HPC, Large-eddy simulation, LES, Many integrated core processors, MIC, MPI, OpenACC, OpenMP, Porting, Xeon Phi",
author = "Helge Knoop and Tobias Gronemeier and Matthias S{\"u}hring and Peter Steinbach and Matthias Noack and Florian Wende and Thomas Steinke and Christoph Knigge and Siegfried Raasch and Klaus Ketelsen",
note = "Publisher Copyright: {\textcopyright} 2018 Inderscience Enterprises Ltd. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.",
year = "2018",
month = oct,
day = "25",
doi = "10.1504/IJCSE.2018.095850",
language = "English",
volume = "17",
pages = "297--309",
number = "3",

}
