Normalized to: Goz, D.
[1]
oai:arXiv.org:2003.03283 [pdf] - 2077028
Performance and energy footprint assessment of FPGAs and GPUs on HPC
systems using Astrophysics application
Goz, David;
Ieronymakis, Georgios;
Papaefstathiou, Vassilis;
Dimou, Nikolaos;
Bertocco, Sara;
Simula, Francesco;
Ragagnin, Antonio;
Tornatore, Luca;
Coretti, Igor;
Taffoni, Giuliano
Submitted: 2020-03-06, last modified: 2020-04-10
New challenges in Astronomy and Astrophysics (AA) demand a large number of
exceptionally computationally intensive simulations. "Exascale"
(and beyond) computational facilities are mandatory to address the size of
theoretical problems and data coming from the new generation of observational
facilities in AA. Currently, the High Performance Computing (HPC) sector is
undergoing a profound phase of innovation, in which the primary challenge to
achieving "Exascale" is power consumption. The goal of this
work is to give some insight into the performance and energy footprint of
contemporary architectures for a real astrophysical application in an HPC
context. We use a state-of-the-art N-body application that we re-engineered and
optimized to exploit the heterogeneous underlying hardware fully. We
quantitatively evaluate the impact of computation on energy consumption when
running on four different platforms. Two of them represent the current HPC
systems (Intel-based and equipped with NVIDIA GPUs), one is a micro-cluster
based on ARM-MPSoC, and one is a "prototype towards Exascale" equipped with
ARM-MPSoCs tightly coupled with FPGAs. We investigate the behavior of the
different devices: high-end GPUs excel in time-to-solution, while MPSoC-FPGA
systems outperform GPUs in power consumption. Our experience suggests that
FPGAs are very promising for computationally intensive applications, as their
performance is improving to meet the requirements of scientific applications.
This work can serve as a reference for the development of future platforms for
astrophysics applications that require computationally intensive
calculations.
[2]
oai:arXiv.org:2003.10850 [pdf] - 2069198
Gadget3 on GPUs with OpenACC
Submitted: 2020-03-24
We present preliminary results of a GPU porting of all main Gadget3 modules
(gravity computation, SPH density computation, SPH hydrodynamic force, and
thermal conduction) using OpenACC directives. Here we assign one GPU to each
MPI rank and exploit both the host and accelerator capabilities by overlapping
computations on the CPUs and GPUs: while GPUs asynchronously compute
interactions between particles within their MPI ranks, CPUs perform tree-walks
and MPI communications of neighbouring particles. We profile various portions
of the code to understand the origin of our speedup, and find that peak
speedup is not achieved because of time-steps with few active particles. We run
a hydrodynamic cosmological simulation from the Magneticum project, with
$2\cdot10^{7}$ particles, where we find a final total speedup of $\approx 2$.
We also present the results of an encouraging scaling test of a preliminary
gravity-only OpenACC porting, run in the context of the EuroHack17 event, where
the prototype maintained a constant speedup up to $1024$
GPUs.
[3]
oai:arXiv.org:1912.05340 [pdf] - 2013400
INAF Trieste Astronomical Observatory Information Technology Framework
Submitted: 2019-12-11
INAF Trieste Astronomical Observatory (OATs) has a long tradition in
information technology applied to Astronomical and Astrophysical use cases,
particularly regarding computing for data reduction, analysis and
simulations; data and archive management; space mission data processing;
design and software development for ground-based instruments. In recent years,
these activities have driven the need to acquire new computing resources and
technologies and to deepen competences in their management. In this paper we
describe the technological infrastructure of the INAF-OATs computing centre;
our involvement in different EU projects, both in building EOSC, the European
Open Science Cloud, and in the design and prototyping of new Exascale
supercomputers in Europe; and the main research activities carried out using
our computing centre.
[4]
oai:arXiv.org:1910.14496 [pdf] - 1989177
Direct N-body application on low-power and energy-efficient parallel
architectures
Submitted: 2019-10-31
The aim of this work is to quantitatively evaluate the impact of computation
on energy consumption on ARM MPSoC platforms, exploiting CPUs, embedded
GPUs and FPGAs. One of these platforms possibly represents the future of High
Performance Computing systems: a prototype of an Exascale supercomputer.
Performance and energy measurements are made using a state-of-the-art direct
$N$-body code from the astrophysical domain. We provide a comparison of the
time-to-solution and energy delay product metrics for different software
configurations. We show that FPGA technologies can be used for application
kernel acceleration and are emerging as a promising alternative to
"traditional" technologies for HPC, which focus purely on peak performance
rather than on power efficiency.
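As an illustration of the energy delay product (EDP) metric named above, the following is a minimal sketch assuming the common definition EDP = E × T, with energy E estimated as average power times time-to-solution T. The function names and the power/time figures are hypothetical and are not taken from the paper.

```python
def energy_joules(avg_power_watts: float, time_seconds: float) -> float:
    """Energy-to-solution, assuming constant average power over the run."""
    return avg_power_watts * time_seconds


def energy_delay_product(energy: float, time_seconds: float) -> float:
    """Plain EDP: energy-to-solution multiplied by time-to-solution."""
    return energy * time_seconds


def rank_by_edp(runs: dict) -> list:
    """Return configuration names sorted from lowest (best) to highest EDP.

    `runs` maps a configuration name to (average power in W, time in s).
    """
    def edp(name: str) -> float:
        power, time_s = runs[name]
        return energy_delay_product(energy_joules(power, time_s), time_s)

    return sorted(runs, key=edp)


# Hypothetical measurements, for illustration only.
example_runs = {
    "config_a": (10.0, 120.0),  # low power, slow
    "config_b": (25.0, 40.0),   # higher power, fast
}
ranking = rank_by_edp(example_runs)
```

With these made-up numbers the faster configuration wins on EDP despite drawing more power, which is the kind of trade-off the metric is meant to expose.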
[5]
oai:arXiv.org:1904.11720 [pdf] - 1873855
Shall numerical astrophysics step into the era of Exascale computing?
Submitted: 2019-04-26
High performance computing numerical simulations are today one of the most
effective instruments to implement and study new theoretical models, and they
are mandatory during the preparatory phase and operational phase of any
scientific experiment. New challenges in Cosmology and Astrophysics will
require a large number of new extremely computationally intensive simulations
to investigate physical processes at different scales. Moreover, the size and
complexity of the new generation of observational facilities also imply a new
generation of high performance data reduction and analysis tools pushing toward
the use of Exascale computing capabilities. Exascale supercomputers cannot be
produced today. We discuss the major technological challenges in the design,
development and use of such computing capabilities and we report on the
progress that has been made in recent years in Europe, in particular in the
framework of the ExaNeSt European funded project. We also discuss the impact of
these new computing resources on the numerical codes in Astronomy and
Astrophysics.
[6]
oai:arXiv.org:1901.08532 [pdf] - 1989046
Direct $N$-body code on low-power embedded ARM GPUs
Submitted: 2019-01-24
This work arises in the context of the ExaNeSt project, which aims at the
design and development of an exascale-ready supercomputer with a low energy
consumption profile that is nonetheless able to support the most demanding
scientific and technical applications. The ExaNeSt compute unit consists of
densely-packed low-power 64-bit ARM processors, embedded within Xilinx FPGA
SoCs. SoC boards are heterogeneous architectures where computing power is
supplied both by CPUs and GPUs, and are emerging as a possible low-power and
low-cost alternative to clusters based on traditional CPUs. A state-of-the-art
direct $N$-body code suitable for astrophysical simulations has been
re-engineered in order to exploit SoC heterogeneous platforms based on ARM CPUs
and embedded GPUs. Performance tests show that embedded GPUs can be effectively
used to accelerate real-life scientific calculations, and that they are also
promising because of their energy efficiency, which is a crucial design
consideration for future exascale platforms.
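For readers unfamiliar with the term, a "direct" $N$-body code evaluates every pairwise gravitational interaction explicitly, at a cost that scales as $N^2$. The sketch below shows only that force kernel in plain Python. It is a generic textbook illustration, not the code described in the abstract: the function name, the Plummer softening parameter and the choice of units ($G=1$ by default) are assumptions.

```python
import math


def direct_accelerations(positions, masses, softening=0.0, G=1.0):
    """Gravitational acceleration on each particle by direct summation.

    positions: list of (x, y, z) tuples; masses: list of floats.
    softening: Plummer softening length (assumed; with softening=0,
    coincident particles would cause a division by zero).
    """
    n = len(positions)
    eps2 = softening * softening
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            # a_i += G * m_j * (r_j - r_i) / |r_j - r_i|^3
            acc[i][0] += G * masses[j] * dx * inv_r3
            acc[i][1] += G * masses[j] * dy * inv_r3
            acc[i][2] += G * masses[j] * dz * inv_r3
    return acc


# Two unit masses one length unit apart attract each other equally.
two_body = direct_accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])
```

The double loop is the part that maps naturally onto accelerators, since each particle's sum is independent of the others.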
[7]
oai:arXiv.org:1812.00367 [pdf] - 1791897
Astrophysical code migration into Exascale Era
Submitted: 2018-12-02
The ExaNeSt and EuroExa H2020 EU-funded projects aim to design and develop an
exascale ready computing platform prototype based on low-energy-consumption
ARM64 cores and FPGA accelerators. We participate in the application-driven
design of the hardware solutions and prototype validation. To carry out this
work we are using, among others, Hy-Nbody, a state-of-the-art direct N-body
code. Core algorithms of Hy-Nbody have been improved so as to fit them
increasingly well to the exascale target platform. While waiting for the
ExaNeSt prototype release, we are performing tests and code tuning operations
on an ARM64 SoC facility: a SLURM managed HPC cluster based on 64-bit ARMv8
Cortex-A72/Cortex-A53 core design and powered by a Mali-T864 embedded GPU. In
parallel, we are porting a kernel of Hy-Nbody to FPGA, aiming to test and
compare the performance-per-watt of our algorithms on different platforms. In
this paper we describe how we re-engineered the application and show first
results on the ARM SoC.
[8]
oai:arXiv.org:1712.00252 [pdf] - 1596487
Cosmological Simulations in Exascale Era
Submitted: 2017-12-01
The architecture of Exascale computing facilities, which involves millions of
heterogeneous processing units, will deeply impact scientific applications.
Future astrophysical HPC applications must be designed to make such computing
systems exploitable. The ExaNeSt H2020 EU-funded project aims to design and
develop an exascale ready prototype based on low-energy-consumption ARM64 cores
and FPGA accelerators. We participate in the design of the platform and in the
validation of the prototype with cosmological N-body and hydrodynamical codes
suited to perform large-scale, high-resolution numerical simulations of cosmic
structure formation and evolution. We discuss our activities on astrophysical
applications to take advantage of the underlying architecture.
[9]
oai:arXiv.org:1610.09843 [pdf] - 1580498
Panchromatic Spectral Energy Distributions of simulated galaxies:
results at redshift $z=0$
Submitted: 2016-10-31
We present predictions of Spectral Energy Distributions (SEDs), from the UV
to the FIR, of simulated galaxies at $z=0$. These were obtained by
post-processing the results of an N-body+hydro simulation of a small
cosmological volume, which uses the Multi-Phase Particle Integrator (MUPPI) for
star formation and stellar feedback, with the GRASIL-3D radiative transfer
code, which includes reprocessing of UV light by dust. Physical properties of
galaxies resemble observed ones, though with some tension at small and large
stellar masses. Comparing predicted SEDs of simulated galaxies with different
samples of local galaxies, we find that these resemble observed ones, when
normalised at 3.6 $\mu$m. A comparison with the Herschel Reference Survey shows
that, when binning galaxies in Star Formation Rate (SFR), average SEDs are
reproduced to within a factor of $\sim2$ even in normalization, while binning
in stellar mass highlights the same tension that is present in the stellar mass
-- SFR plane. We use our sample to investigate the correlation of IR luminosity
in Spitzer and Herschel bands with several galaxy properties. SFR is the
quantity that best correlates with IR light up to $160\ \mu$m, while at longer
wavelengths better correlations are found with molecular mass and, at $500\
\mu$m, with dust mass. However, using the position of the FIR peak as a proxy
for cold dust temperature, we assess that heating of cold dust is mostly
determined by SFR, with stellar mass giving only a minor contribution. We
finally show how our sample of simulated galaxies can be used as a guide to
understand the physical properties and selection biases of observed samples.
[10]
oai:arXiv.org:1412.2883 [pdf] - 910067
Properties of barred spiral disks in hydrodynamical cosmological
simulations
Submitted: 2014-12-09
We present a quantification of the properties of bars in two N-body+SPH
cosmological simulations of spiral galaxies, named GA and AqC. The initial
conditions were obtained using the zoom-in technique and represent two dark
matter (DM) halos of $2-3\times10^{12}\ {\rm M}_\odot$, available at two
different resolutions. The resulting galaxies are presented in the companion
paper of Murante et al. (2014). We find that the GA galaxy has a bar of length
$8.8$ kpc, present at both resolution levels, albeit with a slightly
different strength. Classical bar signatures (e.g. pattern of streaming
motions, high $m=2$ Fourier mode with roughly constant phase) are consistently
found at both resolutions. Though a close encounter with a merging satellite at
$z\sim0.6$ (mass ratio $1:50$) causes a strong, transient spiral pattern and
some heating of the disk, we find that bar instability is due to a secular
process, caused by a low Toomre parameter $Q\lesssim1$ due to accumulation of
mass in the disk. The AqC galaxy has a slightly different history: it suffers a
similar tidal disturbance due to a merging satellite at $z\sim0.5$ but with a
mass ratio of $1:32$, that triggers a bar in the high-resolution simulation,
while at low resolution the merging is found to take place at a later time, so
that both secular evolution and merging are plausible triggers for bar
instability.
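For reference, the Toomre parameter mentioned above is conventionally defined, for a stellar disk, as shown below. This is the standard textbook form and is given here as an assumption, since the abstract does not state which variant (stellar or gaseous) the authors adopt. Here $\sigma_R$ is the radial velocity dispersion, $\kappa$ the epicyclic frequency and $\Sigma$ the disk surface density.

$$
Q = \frac{\sigma_R\,\kappa}{3.36\,G\,\Sigma}
$$

Because $\Sigma$ appears in the denominator, accumulating mass in the disk lowers $Q$, consistent with the secular trigger described in the abstract.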
[11]
oai:arXiv.org:1411.3671 [pdf] - 898716
Simulating realistic disk galaxies with a novel sub-resolution ISM model
Submitted: 2014-11-13, last modified: 2014-11-14
We present results of cosmological simulations of disk galaxies carried out
with the GADGET-3 TreePM+SPH code, where star formation and stellar feedback
are described using our MUlti Phase Particle Integrator (MUPPI) model. This
description is based on a simple multi-phase model of the interstellar medium at
unresolved scales, where mass and energy flows among the components are
explicitly followed by solving a system of ordinary differential equations.
Thermal energy from SNe is injected into the local hot phase, so that it is
not promptly radiated away. A kinetic feedback prescription generates
the massive outflows needed to avoid the over-production of stars. We use two
sets of zoomed-in initial conditions of isolated cosmological halos with masses
$(2-3)\times10^{12}\ {\rm M}_\odot$, both available at several resolution
levels. In all cases we obtain spiral galaxies with small bulge-over-total
stellar mass ratios ($B/T \approx 0.2$), extended stellar and gas disks, flat
rotation curves and
realistic values of stellar masses. Gas profiles are relatively flat, molecular
gas is found to dominate at the centre of galaxies, with star formation rates
following the observed Schmidt-Kennicutt relation. Stars kinematically
belonging to the bulge form early, while disk stars show a clear inside-out
formation pattern and mostly form after redshift $z=2$. However, the baryon
conversion efficiencies in our simulations differ from the relation given by
Moster et al. (2010) at a $3\sigma$ level, thus indicating that our stellar disks
are still too massive for the Dark Matter halo in which they reside. Results
are found to be remarkably stable against resolution. This further demonstrates
the feasibility of carrying out simulations producing a realistic population of
galaxies within representative cosmological volumes, at a relatively modest
resolution.