Normalized to: Baruffa, F.
[1]
oai:arXiv.org:2002.08161 [pdf] - 2109019
Honing and proofing Astrophysical codes on the road to Exascale.
Experiences from code modernization on many-core systems
Submitted: 2020-02-19
The complexity of modern and upcoming computing architectures poses severe
challenges for code developers and application specialists, and forces them to
expose the highest possible degree of parallelism, in order to make the best
use of the available hardware. The second-generation Intel$^{(R)}$ Xeon
Phi$^{(TM)}$ (code-named Knights Landing, henceforth KNL) is the latest
many-core system, which implements several interesting hardware features, for
example a large number of cores per node (up to 72), 512-bit-wide vector
registers and high-bandwidth memory. The unique features of KNL make this platform a
powerful testbed for modern HPC applications. The performance of codes on KNL
is therefore a useful proxy of their readiness for future architectures. In
this work we describe the lessons learnt during the optimisation of the widely
used codes for computational astrophysics P-Gadget-3, Flash and Echo. Moreover,
we present results for the visualisation and analysis tools VisIt and yt. These
examples show that modern architectures benefit from code optimisation at
different levels, even more than traditional multi-core systems. However,
typical community codes still need further modernisation before they can fully
utilise the resources of novel architectures.
[2]
oai:arXiv.org:1910.07855 [pdf] - 1981856
Speeding simulation analysis up with yt and Intel Distribution for
Python
Submitted: 2019-10-17
As modern scientific simulations grow ever larger and more complex, even
their analysis and post-processing become increasingly demanding, calling for
the use of HPC resources and methods. yt is a parallel, open-source
post-processing Python package for numerical simulations in astrophysics, made
popular by its cross-format compatibility, its active community of developers
and its integration with several other professional Python instruments. The
Intel Distribution for Python enhances yt's performance and parallel
scalability through the optimization of the lower-level libraries NumPy and SciPy,
which make use of the optimized Intel Math Kernel Library (Intel-MKL) and the
Intel MPI library for distributed computing. The library package yt is used for
several analysis tasks, including integration of derived quantities, volumetric
rendering, 2D phase plots, cosmological halo analysis and production of
synthetic X-ray observations. In this paper, we provide a brief tutorial for the
installation of yt and the Intel Distribution for Python, and the execution of
each analysis task. Compared to the Anaconda Python distribution, the provided
solution achieves net speedups of up to 4.6x on Intel Xeon Scalable processors
(codename Skylake).
[3]
oai:arXiv.org:1810.04597 [pdf] - 1764494
ECHO-3DHPC: Advance the performance of astrophysics simulations with
code modernization
Submitted: 2018-10-10
We present recent developments in the parallelization scheme of ECHO-3DHPC,
an efficient astrophysical code used in the modelling of relativistic plasmas.
With the help of the Intel Software Development Tools, such as the Fortran
compiler and Profile-Guided Optimization (PGO), the Intel MPI library, VTune
Amplifier and Inspector, we have investigated the performance issues and
improved the application scalability and the time to solution. The node-level performance is
improved by $2.3 \times$ and, thanks to the improved threading parallelisation,
the hybrid MPI-OpenMP version of the code outperforms the MPI-only one, thus
lowering the MPI communication overhead.
[4]
oai:arXiv.org:1612.06090 [pdf] - 1580943
Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms
for Multi/Many-Core Architectures
Submitted: 2016-12-19, last modified: 2017-05-10
We describe a strategy for code modernisation of Gadget, a widely used
community code for computational astrophysics. The focus of this work is on
node-level performance optimisation, targeting current multi/many-core Intel$^{(R)}$
architectures. We identify and isolate a sample code kernel, which is
representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm.
The code modifications include threading parallelism optimisation, change of
the data layout into Structure of Arrays (SoA), auto-vectorisation and
algorithmic improvements in the particle sorting. We obtain shorter execution
time and improved threading scalability both on Intel Xeon$^{(R)}$ ($2.6 \times$ on
Ivy Bridge) and Xeon Phi$^{(TM)}$ ($13.7 \times$ on Knights Corner) systems. First
tests of the optimised code result in $19.1 \times$ faster execution on second
generation Xeon Phi (Knights Landing), thus demonstrating the portability of
the devised optimisation solutions to upcoming architectures.
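
The change of data layout mentioned in the abstract above, from an Array of
Structures (AoS) to a Structure of Arrays (SoA), can be illustrated with a
minimal sketch. This is not code from Gadget: the field names, the helper
functions aos_to_soa and weighted_r2_sum, and the sample values are all
hypothetical, and Python's standard array module stands in for the contiguous
buffers that a compiled code would hand to the vectoriser.

```python
from array import array

# Array of Structures (AoS): one record per particle, so the fields of
# different particles are interleaved in memory. (Hypothetical sample data.)
particles_aos = [
    {"x": 0.0, "y": 0.0, "z": 0.0, "mass": 1.0},
    {"x": 1.0, "y": 2.0, "z": 2.0, "mass": 2.0},
    {"x": 3.0, "y": 0.0, "z": 4.0, "mass": 0.5},
]

FIELDS = ("x", "y", "z", "mass")


def aos_to_soa(particles):
    """Convert a list of particle records into one contiguous array per field."""
    return {f: array("d", (p[f] for p in particles)) for f in FIELDS}


def weighted_r2_sum(soa):
    """Sum mass * r^2 over all particles by streaming through the field arrays."""
    return sum(
        m * (x * x + y * y + z * z)
        for x, y, z, m in zip(soa["x"], soa["y"], soa["z"], soa["mass"])
    )


# Structure of Arrays (SoA): each field is stored contiguously, which is the
# access pattern that unit-stride vector loads favour in compiled code.
particles_soa = aos_to_soa(particles_aos)
total = weighted_r2_sum(particles_soa)
print(total)
```

In a compiled language the same layout lets a loop over particles read each
field with unit stride, which is what makes auto-vectorisation effective; the
Python version only shows the reorganisation of the data, not the speedup.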
[5]
oai:arXiv.org:1411.1289 [pdf] - 1251885
The 3D MHD code GOEMHD3 for large-Reynolds-number astrophysical plasmas
Submitted: 2014-11-05, last modified: 2015-04-08
The numerical simulation of turbulence and flows in almost ideal,
large-Reynolds-number astrophysical plasmas motivates the implementation of
almost conservative MHD computer codes. Such codes should compute efficiently,
use highly parallelized schemes that scale well to large numbers of CPU cores,
allow a high grid resolution over large simulation domains, be easily
adaptable to new computer architectures as well as to new initial and boundary
conditions, and allow modular extensions. The new massively parallel
simulation code GOEMHD3 enables efficient and fast simulations of almost ideal,
large-Reynolds-number astrophysical plasma flows, well resolved and on huge
grids covering large domains. Its abilities are validated by major tests of
ideal and weakly dissipative plasma phenomena. The high-resolution ($2048^3$
grid points) simulation of a large part of the solar corona above an observed
active region proved the excellent parallel scalability of the code using more
than 30,000 processor cores.