Normalized to: Roffler, C.
[1]
oai:arXiv.org:2003.10850 [pdf] - 2069198
Gadget3 on GPUs with OpenACC
Submitted: 2020-03-24
We present preliminary results of a GPU port of all the main Gadget3 modules
(gravity computation, SPH density computation, SPH hydrodynamic force, and
thermal conduction) using OpenACC directives. Here we assign one GPU to each
MPI rank and exploit both the host and accelerator capabilities by overlapping
computations on the CPUs and GPUs: while GPUs asynchronously compute
interactions between particles within their MPI ranks, CPUs perform tree-walks
and MPI communications of neighbouring particles. We profile various portions
of the code to understand the origin of our speedup, and find that the peak
speedup is not reached because of time-steps with few active particles. We run
a hydrodynamic cosmological simulation from the Magneticum project, with
$2\cdot10^{7}$ particles, where we find a final total speedup of $\approx 2$.
We also present the results of an encouraging scaling test of a preliminary
gravity-only OpenACC port, run in the context of the EuroHack17 event, where
the prototype maintained a constant speedup up to $1024$ GPUs.
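
The host/accelerator overlap described in the abstract can be illustrated with a minimal C sketch using the OpenACC `async` clause and `wait` directive. This is not the Gadget3 code: the function names, the toy force law, and the use of queue id 1 are all assumptions made for illustration, and the MPI exchange is reduced to counting active particles on the host. Built without OpenACC support, the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPU kernel (not a Gadget3 routine):
 * a toy per-particle "force", enqueued on async queue 1 so that
 * control returns to the host before the kernel finishes. */
void gpu_local_interactions(const double *restrict pos,
                            double *restrict acc, int n)
{
    #pragma acc parallel loop copyin(pos[0:n]) copyout(acc[0:n]) async(1)
    for (int i = 0; i < n; i++) {
        acc[i] = -pos[i];   /* toy harmonic restoring term */
    }
}

/* Hypothetical stand-in for the host-side work (tree walk and MPI
 * exchange): here it only counts the active particles. */
int cpu_tree_walk_and_exchange(const int *active, int n)
{
    int n_active = 0;
    for (int i = 0; i < n; i++) {
        if (active[i]) {
            n_active++;
        }
    }
    return n_active;
}

/* One overlapped step: launch the device work, do the host work
 * while it runs, then synchronise before acc is read on the host. */
int overlapped_step(const double *pos, double *acc,
                    const int *active, int n)
{
    assert(n >= 0);

    gpu_local_interactions(pos, acc, n);              /* asynchronous launch */
    int n_active = cpu_tree_walk_and_exchange(active, n);  /* host work */

    #pragma acc wait(1)   /* acc[] is valid on the host only after this */
    return n_active;
}
```

The host must not read `acc` before the `wait`, since the copy back from the device is queued behind the kernel on the same asynchronous queue. When only a few particles are active, the device work per step is small and there is little to overlap, which is consistent with the abstract's remark about time-steps with few active particles.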