Normalized to: Bal, H.
[1]
oai:arXiv.org:1806.06606 [pdf] - 1990016
On optimising cost and value in compute systems for radio astronomy
Submitted: 2018-06-18, last modified: 2019-10-17
Large-scale science instruments, such as the distributed radio telescope
LOFAR, show that we are in an era of data-intensive scientific discovery. Such
instruments rely critically on significant computing resources, both hardware
and software, to do science. Considering limited science budgets, and the small
fraction of these that can be dedicated to compute hardware and software, there
is a strong and obvious desire for low-cost computing. However, optimising for
cost is only part of the equation; the value potential over the lifetime of the
solution should also be taken into account. Using a tangible example, compute
hardware, we introduce a conceptual model to approximate the lifetime relative
science value of such a system. While the introduced model is not intended to
result in a numeric value for merit, it does enumerate some components that
define this metric. The intent of this paper is to show how compute system
related design and procurement decisions in data-intensive science projects
should be weighed and valued. By using both total cost and science value as
drivers, the science output per invested Euro is maximised. With a number of
case studies, focused on computing applications in radio astronomy past,
present and future, we show that the hardware-based analysis can be, and has
been, applied more broadly.
[2]
oai:arXiv.org:1601.05052 [pdf] - 1343114
Auto-Tuning Dedispersion for Many-Core Accelerators
Submitted: 2016-01-18
In this paper, we study the parallelization of the dedispersion algorithm on
many-core accelerators, including GPUs from AMD and NVIDIA, and the Intel Xeon
Phi. An important contribution is the computational analysis of the algorithm,
from which we conclude that dedispersion is inherently memory-bound in any
realistic scenario, in contrast to earlier reports. We also provide empirical
proof that, even in unrealistic scenarios, hardware limitations keep the
arithmetic intensity low, thus limiting performance. We exploit auto-tuning to
adapt the algorithm, not only to different accelerators, but also to different
observations, and even telescopes. Our experiments show how the algorithm is
tuned automatically for different scenarios and how it exploits and highlights
the underlying specificities of the hardware: in some observations, the tuner
automatically optimizes device occupancy, while in others it optimizes memory
bandwidth. We quantitatively analyze the problem space, and by comparing the
results of optimal auto-tuned versions against the best performing fixed codes,
we show the impact that auto-tuning has on performance, and conclude that it is
statistically relevant.
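
As an illustration of the auto-tuning idea described in this abstract, the following is a minimal sketch of an exhaustive tuner: it tries every combination in a small search space and keeps the cheapest one. It is not the authors' tuner. The function names (`autotune`, `time_once`) and the parameter names used in the usage note are hypothetical, and a real tuner would launch an accelerator kernel rather than call a Python function.

```python
import itertools
import time


def time_once(kernel, cfg):
    """Wall-clock time of one kernel run with the given configuration."""
    start = time.perf_counter()
    kernel(cfg)
    return time.perf_counter() - start


def autotune(kernel, search_space, measure=time_once, repeats=3):
    """Exhaustively search `search_space` (dict: name -> candidate values).

    Returns (best_config, best_cost). The cost of a configuration is the
    minimum over `repeats` measurements, which reduces timing noise.
    """
    names = list(search_space)
    best_cfg, best_cost = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        cfg = dict(zip(names, values))
        cost = min(measure(kernel, cfg) for _ in range(repeats))
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost
```

Calling `autotune(kernel, {"threads": [32, 64, 128], "items": [1, 2, 4, 8]})` would time twelve configurations. Re-running the search per accelerator and per observational setup is what lets the optimum differ between scenarios, as the abstract reports.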
[3]
oai:arXiv.org:1601.01165 [pdf] - 1336516
Real-Time Dedispersion for Fast Radio Transient Surveys, using Auto
Tuning on Many-Core Accelerators
Submitted: 2016-01-06
Dedispersion, the removal of deleterious smearing of impulsive signals by the
interstellar matter, is one of the most intensive processing steps in any radio
survey for pulsars and fast transients. Here we present a study of the
parallelization of this algorithm on many-core accelerators, including GPUs
from AMD and NVIDIA, and the Intel Xeon Phi. We find that dedispersion is
inherently memory-bound. Even in a perfect scenario, hardware limitations keep
the arithmetic intensity low, thus limiting performance. We next exploit
auto-tuning to adapt dedispersion to different accelerators, observations, and
even telescopes. We demonstrate that the optimal settings differ between
observational setups, and that auto-tuning significantly improves performance.
This impacts time-domain surveys from Apertif to SKA.
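
To make the memory-bound claim concrete, here is a minimal sketch of brute-force incoherent dedispersion in NumPy. It is an illustration under stated assumptions, not the authors' accelerator code: the function names are hypothetical, the delay is referenced to the highest frequency channel, and delays are rounded to whole samples. It uses the standard cold-plasma dispersion delay with frequencies in MHz and dispersion measure (DM) in pc cm^-3.

```python
import numpy as np

K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3


def delays_in_samples(freqs_mhz, dm, sample_time_s):
    """Per-channel delay relative to the highest frequency, in samples."""
    f_ref = freqs_mhz.max()
    delay_s = K_DM * dm * (freqs_mhz ** -2.0 - f_ref ** -2.0)
    return np.rint(delay_s / sample_time_s).astype(int)


def dedisperse(data, freqs_mhz, dms, sample_time_s):
    """Sum channels along the dispersion curve for each trial DM.

    data has shape (n_channels, n_samples); the result has shape
    (n_dms, n_samples - largest_shift).
    """
    n_chan, n_samp = data.shape
    max_shift = delays_in_samples(freqs_mhz, max(dms), sample_time_s).max()
    n_out = n_samp - max_shift
    out = np.zeros((len(dms), n_out), dtype=data.dtype)
    for i, dm in enumerate(dms):
        shifts = delays_in_samples(freqs_mhz, dm, sample_time_s)
        for c in range(n_chan):
            # One addition per loaded input sample: very low arithmetic intensity.
            out[i] += data[c, shifts[c]:shifts[c] + n_out]
    return out
```

In this naive form every accumulated value costs one load and one addition, so throughput is set by memory traffic rather than arithmetic. Any gain has to come from reusing loaded samples across neighbouring trial DMs, and how much reuse is possible depends on the observational setup.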
[4]
oai:arXiv.org:1203.0321 [pdf] - 483444
High-Performance Distributed Multi-Model / Multi-Kernel Simulations: A
Case-Study in Jungle Computing
Submitted: 2012-03-01
High-performance scientific applications require more and more compute power.
The concurrent use of multiple distributed compute resources is vital for
making scientific progress. The resulting distributed system, a so-called
Jungle Computing System, is both highly heterogeneous and hierarchical,
potentially consisting of grids, clouds, stand-alone machines, clusters,
desktop grids, mobile devices, and supercomputers, possibly with accelerators
such as GPUs.
One striking example of applications that can benefit greatly from Jungle
Computing Systems is Multi-Model / Multi-Kernel simulations. In these
simulations, multiple models, possibly implemented using different techniques
and programming models, are coupled into a single simulation of a physical
system. Examples arise in computational astrophysics and climate
modeling.
In this paper we investigate the use of Jungle Computing Systems for such
Multi-Model / Multi-Kernel simulations. We make use of the software developed
in the Ibis project, which addresses many of the problems faced when running
applications on Jungle Computing Systems. We create a prototype Jungle-aware
version of AMUSE, an astrophysical simulation framework. We show preliminary
experiments with the resulting system, using clusters, grids, stand-alone
machines, and GPUs.