Full-text search for arXiv

Baruffa, F.

Normalized to: Baruffa, F.

5 article(s) in total. 9 co-authors, from 1 to 4 common article(s). Median position in authors list is 3.0.

[1] oai:arXiv.org:2002.08161 [pdf] - 2109019

Honing and proofing Astrophysical codes on the road to Exascale. Experiences from code modernization on many-core systems

Cielo, Salvatore; Iapichino, Luigi; Baruffa, Fabio; Bugli, Matteo; Federrath, Christoph

Comments: 16 pages, 10 figures, 4 tables. To be published in Future Generation Computer Systems (FGCS), Special Issue on "On The Road to Exascale II: Advances in High Performance Computing and Simulations"

Submitted: 2020-02-19

The complexity of modern and upcoming computing architectures poses severe challenges for code developers and application specialists, and forces them to expose the highest possible degree of parallelism in order to make the best use of the available hardware. The second-generation Intel® Xeon Phi™ (code-named Knights Landing, henceforth KNL) is the latest many-core system and implements several notable hardware features, such as a large number of cores per node (up to 72), 512-bit-wide vector registers and high-bandwidth memory. These unique features make KNL a powerful testbed for modern HPC applications, and the performance of codes on KNL is therefore a useful proxy for their readiness for future architectures. In this work we describe the lessons learnt while optimising the widely used computational astrophysics codes P-Gadget-3, Flash and Echo. Moreover, we present results for the visualisation and analysis tools VisIt and yt. These examples show that modern architectures benefit from code optimisation at different levels, even more than traditional multi-core systems do. However, the level of modernisation of typical community codes still needs improvement before they can fully utilise the resources of novel architectures.

[2] oai:arXiv.org:1910.07855 [pdf] - 1981856

Speeding simulation analysis up with yt and Intel Distribution for Python

Cielo, Salvatore; Iapichino, Luigi; Baruffa, Fabio

Comments: 3 pages, 1 figure, published in Intel Parallel Universe Magazine

Submitted: 2019-10-17

As modern scientific simulations grow ever larger and more complex, even their analysis and post-processing become increasingly demanding, calling for the use of HPC resources and methods. yt is a parallel, open-source Python package for post-processing numerical simulations in astrophysics, popular for its cross-format compatibility, its active community of developers and its integration with several other professional Python tools. The Intel Distribution for Python enhances yt's performance and parallel scalability through the optimization of the lower-level libraries NumPy and SciPy, which make use of the optimized Intel Math Kernel Library (Intel MKL) and the Intel MPI library for distributed computing. We use yt for several analysis tasks, including integration of derived quantities, volume rendering, 2D phase plots, cosmological halo analysis and production of synthetic X-ray observations. In this paper, we provide a brief tutorial for the installation of yt and the Intel Distribution for Python, and for the execution of each analysis task. Compared with the Anaconda Python distribution, the provided setup achieves net speedups of up to 4.6x on Intel Xeon Scalable processors (code-named Skylake).

[3] oai:arXiv.org:1810.04597 [pdf] - 1764494

ECHO-3DHPC: Advance the performance of astrophysics simulations with code modernization

Bugli, Matteo; Iapichino, Luigi; Baruffa, Fabio

Comments: 7 pages, 6 figures. Accepted for publication in The Parallel Universe Magazine (https://software.intel.com/en-us/parallel-universe-magazine)

Submitted: 2018-10-10

We present recent developments in the parallelization scheme of ECHO-3DHPC, an efficient astrophysical code used in the modelling of relativistic plasmas. Using Intel Software Development Tools such as the Fortran compiler with Profile-Guided Optimization (PGO), the Intel MPI library, VTune Amplifier and Inspector, we investigated the performance issues and improved the application's scalability and time to solution. The node-level performance is improved by $2.3 \times$ and, thanks to the improved threading parallelisation, the hybrid MPI-OpenMP version of the code outperforms the MPI-only version and lowers the MPI communication overhead.

[4] oai:arXiv.org:1612.06090 [pdf] - 1580943

Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

Baruffa, Fabio; Iapichino, Luigi; Hammer, Nicolay J.; Karakasis, Vasileios

Comments: 8 pages, 2 columns, 4 figures, accepted as a paper in the HPCS 2017 Proceedings, IEEE Xplore

Submitted: 2016-12-19, last modified: 2017-05-10

We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel® architectures. We identify and isolate a sample code kernel that is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, a change of the data layout to Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle sorting. We obtain shorter execution times and improved threading scalability on both Intel® Xeon® ($2.6 \times$ on Ivy Bridge) and Xeon Phi™ ($13.7 \times$ on Knights Corner) systems. First tests of the optimised code result in $19.1 \times$ faster execution on the second-generation Xeon Phi (Knights Landing), demonstrating the portability of the devised optimisation solutions to upcoming architectures.
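The Array of Structures (AoS) to Structure of Arrays (SoA) change mentioned above can be illustrated with a minimal NumPy sketch. It assumes particles carry only x, y and z positions; the particle count, reference point and smoothing length are hypothetical and not taken from Gadget, which is written in C. NumPy is used here only to expose memory strides: in AoS, consecutive x values are one whole record apart, whereas in SoA they are adjacent, which is the unit-stride access pattern that compilers can auto-vectorise most easily.

```python
import numpy as np

N = 1000                    # hypothetical particle count
FIELDS = ("x", "y", "z")    # positions only; real SPH particles carry more fields

# Array of Structures (AoS): one record per particle, fields interleaved,
# so consecutive x values are one record (3 * 8 = 24 bytes) apart.
aos = np.zeros(N, dtype=[(f, "f8") for f in FIELDS])

# Structure of Arrays (SoA): one contiguous float64 array per field,
# so consecutive x values are adjacent in memory (8 bytes apart).
soa = {f: np.zeros(N) for f in FIELDS}

# Fill both layouts with the same random positions.
rng = np.random.default_rng(seed=0)
pos = rng.random((N, 3))
for i, f in enumerate(FIELDS):
    aos[f] = pos[:, i]
    soa[f] = pos[:, i].copy()   # copy() yields a contiguous array


def dist2(particles, ref):
    """Squared distance of every particle to ref; works on both layouts."""
    return sum((particles[f] - ref[i]) ** 2 for i, f in enumerate(FIELDS))


ref = (0.5, 0.5, 0.5)   # hypothetical centre of a neighbour search
h = 0.1                 # hypothetical smoothing length
n_aos = int(np.count_nonzero(dist2(aos, ref) < h * h))
n_soa = int(np.count_nonzero(dist2(soa, ref) < h * h))

print(aos["x"].strides, aos["x"].flags["C_CONTIGUOUS"])  # (24,) False
print(soa["x"].strides, soa["x"].flags["C_CONTIGUOUS"])  # (8,) True
print(n_aos == n_soa)                                     # True
```

Both layouts yield the same neighbour count; only the memory layout differs. In a compiled kernel, the SoA layout turns strided gathers of x, y and z into unit-stride loads that can fill a full vector register, which plausibly explains why SoA appears alongside auto-vectorisation in the list of code modifications.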

[5] oai:arXiv.org:1411.1289 [pdf] - 1251885

The 3D MHD code GOEMHD3 for large-Reynolds-number astrophysical plasmas

Skála, J.; Baruffa, F.; Büchner, J.; Rampp, M.

Comments: The revised version

Submitted: 2014-11-05, last modified: 2015-04-08

The numerical simulation of turbulence and flows in almost ideal, large-Reynolds-number astrophysical plasmas motivates the implementation of almost conservative MHD computer codes. Such codes should compute efficiently, use highly parallelized schemes that scale well to large numbers of CPU cores, allow a high grid resolution over large simulation domains, be easily adaptable to new computer architectures as well as to new initial and boundary conditions, and allow modular extensions. The new massively parallel simulation code GOEMHD3 enables efficient and fast simulations of almost ideal, large-Reynolds-number astrophysical plasma flows, well resolved on huge grids covering large domains. Its capabilities are validated by major tests of ideal and weakly dissipative plasma phenomena. The high-resolution ($2048^3$ grid points) simulation of a large part of the solar corona above an observed active region demonstrated the excellent parallel scalability of the code on more than 30,000 processor cores.