Bal, Henri E.

Normalized to: Bal, H.

4 article(s) in total. 14 co-authors, from 1 to 3 common article(s). Median position in authors list is 3.5.

[1] oai:arXiv.org:1806.06606 [pdf] - 1990016

On optimising cost and value in compute systems for radio astronomy

Broekema, P. Chris; Allan, Verity L.; van Nieuwpoort, Rob V.; Bal, Henri E.

Submitted: 2018-06-18, last modified: 2019-10-17

Large-scale science instruments, such as the distributed radio telescope LOFAR, show that we are in an era of data-intensive scientific discovery. Such instruments rely critically on significant computing resources, both hardware and software, to do science. Considering limited science budgets, and the small fraction of these that can be dedicated to compute hardware and software, there is a strong and obvious desire for low-cost computing. However, optimising for cost is only part of the equation; the value potential over the lifetime of the solution should also be taken into account. Using a tangible example, compute hardware, we introduce a conceptual model to approximate the lifetime relative science value of such a system. While the introduced model is not intended to result in a numeric value for merit, it does enumerate some components that define this metric. The intent of this paper is to show how compute system related design and procurement decisions in data-intensive science projects should be weighed and valued. By using both total cost and science value as a driver, the science output per invested Euro is maximised. With a number of case studies, focused on computing applications in radio astronomy past, present and future, we show that the hardware-based analysis can be, and has been, applied more broadly.

[2] oai:arXiv.org:1601.05052 [pdf] - 1343114

Auto-Tuning Dedispersion for Many-Core Accelerators

Sclocco, Alessio; Bal, Henri E.; Hessels, Jason; van Leeuwen, Joeri; van Nieuwpoort, Rob V.

Comments: 10 pages, published in the proceedings of IPDPS 2014

Submitted: 2016-01-18

In this paper, we study the parallelization of the dedispersion algorithm on many-core accelerators, including GPUs from AMD and NVIDIA, and the Intel Xeon Phi. An important contribution is the computational analysis of the algorithm, from which we conclude that dedispersion is inherently memory-bound in any realistic scenario, in contrast to earlier reports. We also provide empirical proof that, even in unrealistic scenarios, hardware limitations keep the arithmetic intensity low, thus limiting performance. We exploit auto-tuning to adapt the algorithm, not only to different accelerators, but also to different observations, and even telescopes. Our experiments show how the algorithm is tuned automatically for different scenarios and how it exploits and highlights the underlying specificities of the hardware: in some observations, the tuner automatically optimizes device occupancy, while in others it optimizes memory bandwidth. We quantitatively analyze the problem space, and by comparing the results of optimal auto-tuned versions against the best performing fixed codes, we show the impact that auto-tuning has on performance, and conclude that it is statistically relevant.
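
The auto-tuning idea in this abstract can be illustrated with a minimal sketch. The abstract does not describe the tuner itself, so everything here is an assumption: `tune` does a plain exhaustive search over a small configuration space, and `tiled_sum` is a hypothetical toy kernel standing in for a real accelerator kernel with a tunable tile size.

```python
import itertools
import time


def tune(kernel, search_space, args, repeats=3):
    """Time kernel under every configuration; return (best_config, best_seconds).

    Exhaustive search is an assumption made for illustration only.
    """
    best_config, best_time = None, float("inf")
    names = sorted(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(*args, **config)
            # Keep the fastest of the repeated runs to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


def tiled_sum(values, tile=64):
    """Hypothetical tunable kernel: sum a list in tiles of a configurable size."""
    total = 0.0
    for start in range(0, len(values), tile):
        total += sum(values[start:start + tile])
    return total


if __name__ == "__main__":
    space = {"tile": [16, 64, 256, 1024]}
    config, seconds = tune(tiled_sum, space, ([1.0] * 100_000,))
    print(config, seconds)
```

Which tile size wins depends on the machine and the input, which is the point the abstract makes about different accelerators and observations needing different settings.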

[3] oai:arXiv.org:1601.01165 [pdf] - 1336516

Real-Time Dedispersion for Fast Radio Transient Surveys, using Auto Tuning on Many-Core Accelerators

Sclocco, Alessio; van Leeuwen, Joeri; Bal, Henri E.; van Nieuwpoort, Rob V.

Comments: 8 pages, accepted for publication in Astronomy and Computing

Submitted: 2016-01-06

Dedispersion, the removal of deleterious smearing of impulsive signals by interstellar matter, is one of the most intensive processing steps in any radio survey for pulsars and fast transients. Here we present a study of the parallelization of this algorithm on many-core accelerators, including GPUs from AMD and NVIDIA, and the Intel Xeon Phi. We find that dedispersion is inherently memory-bound. Even in a perfect scenario, hardware limitations keep the arithmetic intensity low, thus limiting performance. We next exploit auto-tuning to adapt dedispersion to different accelerators, observations, and even telescopes. We demonstrate that the optimal settings differ between observational setups, and that auto-tuning significantly improves performance. This impacts time-domain surveys from Apertif to SKA.
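
As a rough illustration of what dedispersion computes, the following is a minimal brute-force sketch. It is not the implementation from the paper. It assumes the standard cold-plasma dispersion relation, with the commonly used constant 4.148808e3 MHz^2 pc^-1 cm^3 s, which the abstract does not state. The names `delay_samples` and `dedisperse` and the `data[channel][sample]` layout are hypothetical.

```python
# Dispersion constant in MHz^2 pc^-1 cm^3 s (standard value, not from the abstract).
K_DM = 4.148808e3


def delay_samples(dm, freq_mhz, ref_freq_mhz, sampling_time_s):
    """Delay of freq_mhz relative to ref_freq_mhz, rounded to whole samples."""
    delay_s = K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)
    return round(delay_s / sampling_time_s)


def dedisperse(data, freqs_mhz, dm, sampling_time_s):
    """Sum all channels after undoing the per-channel dispersion delay.

    data[c][t] is the power in channel c at time sample t (assumed layout).
    Lower frequencies arrive later, so each channel is read at a later offset.
    """
    ref = max(freqs_mhz)
    shifts = [delay_samples(dm, f, ref, sampling_time_s) for f in freqs_mhz]
    n_out = len(data[0]) - max(shifts)
    return [
        sum(data[c][t + shifts[c]] for c in range(len(data)))
        for t in range(n_out)
    ]


if __name__ == "__main__":
    freqs = [1400.0, 1300.0, 1200.0]
    dt = 1e-3
    shifts = [delay_samples(50.0, f, max(freqs), dt) for f in freqs]
    # Synthetic pulse emitted at sample 10, dispersed with DM = 50 pc cm^-3.
    data = [[0.0] * 100 for _ in freqs]
    for c, s in enumerate(shifts):
        data[c][10 + s] = 1.0
    series = dedisperse(data, freqs, 50.0, dt)
    print(series.index(max(series)), max(series))
```

In this sketch the inner loop performs one addition per input sample read, which is consistent with the low arithmetic intensity the abstract reports.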

[4] oai:arXiv.org:1203.0321 [pdf] - 483444

High-Performance Distributed Multi-Model / Multi-Kernel Simulations: A Case-Study in Jungle Computing

Drost, Niels; Maassen, Jason; van Meersbergen, Maarten A. J.; Bal, Henri E.; Pelupessy, F. Inti; Zwart, Simon Portegies; Kliphuis, Michael; Dijkstra, Henk A.; Seinstra, Frank J.

Submitted: 2012-03-01

High-performance scientific applications require more and more compute power. The concurrent use of multiple distributed compute resources is vital for making scientific progress. The resulting distributed system, a so-called Jungle Computing System, is both highly heterogeneous and hierarchical, potentially consisting of grids, clouds, stand-alone machines, clusters, desktop grids, mobile devices, and supercomputers, possibly with accelerators such as GPUs. Striking examples of applications that can benefit greatly from Jungle Computing Systems are Multi-Model / Multi-Kernel simulations. In these simulations, multiple models, possibly implemented using different techniques and programming models, are coupled into a single simulation of a physical system. Examples include the domains of computational astrophysics and climate modeling. In this paper we investigate the use of Jungle Computing Systems for such Multi-Model / Multi-Kernel simulations. We make use of the software developed in the Ibis project, which addresses many of the problems faced when running applications on Jungle Computing Systems. We create a prototype Jungle-aware version of AMUSE, an astrophysical simulation framework. We show preliminary experiments with the resulting system, using clusters, grids, stand-alone machines, and GPUs.