Normalized to: Novotný, J.
[1]
oai:arXiv.org:1912.07704 [pdf] - 2032882
Development of production-ready GPU data processing pipeline software
for AstroAccelerate
Submitted: 2019-12-16, last modified: 2020-01-16
Upcoming large-scale telescope projects such as the Square Kilometre Array
(SKA) will see high data rates and large data volumes, requiring tools that can
analyse telescope event data quickly and accurately. In modern radio
telescopes, analysis software forms a core part of the data readout, and
long-term software stability and maintainability are essential. AstroAccelerate
is a many-core accelerated software package that uses NVIDIA(R) GPUs to perform
real-time analysis of radio telescope data, and it has been shown to be
substantially faster than real time at processing simulated SKA-like data.
AstroAccelerate contains optimised GPU implementations of signal processing
tools used in radio astronomy including dedispersion, Fourier domain
acceleration search, single pulse detection, and others. This article describes
the transformation of AstroAccelerate from a C-like prototype code to a
production-ready software library with a C++ API and a Python interface, while
preserving compatibility with legacy software that is implemented in C. The
design of the software library interfaces, refactoring aspects, and coding
techniques are discussed.
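
As a rough illustration of the dedispersion step named in this abstract, the
sketch below shifts each frequency channel by its cold-plasma dispersion delay
and sums the channels. It is a minimal pure-Python sketch under stated
assumptions, not AstroAccelerate's GPU implementation or its API: the function
names `dispersion_delay_s` and `dedisperse` are hypothetical, delays are rounded
to whole samples, and the dispersion constant is the commonly quoted value for
frequencies in MHz and dispersion measure in pc cm^-3.

```python
# Commonly quoted dispersion constant, in MHz^2 pc^-1 cm^3 s.
K_DM = 4.148808e3


def dispersion_delay_s(dm, freq_mhz, ref_freq_mhz):
    """Arrival delay (seconds) at freq_mhz relative to ref_freq_mhz."""
    return K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)


def dedisperse(channels, freqs_mhz, dm, sample_time_s):
    """Incoherent dedispersion: align channels at one trial DM and sum them.

    channels   -- list of per-channel time series (hypothetical layout)
    freqs_mhz  -- centre frequency of each channel, same order as channels
    """
    ref = max(freqs_mhz)  # the highest frequency arrives first
    shifts = [
        round(dispersion_delay_s(dm, f, ref) / sample_time_s)
        for f in freqs_mhz
    ]
    n_out = len(channels[0]) - max(shifts)
    return [
        sum(ch[t + s] for ch, s in zip(channels, shifts))
        for t in range(n_out)
    ]
```

A real-time search repeats this for many trial dispersion measures, which is
the part that benefits from GPU parallelism.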
[2]
oai:arXiv.org:1911.01353 [pdf] - 1990479
Searching for pulsars in extreme orbits -- GPU acceleration of the
Fourier domain 'jerk' search
Submitted: 2019-11-04
Binary pulsars are an important target for radio surveys because they present
a natural laboratory for a wide range of astrophysics, for example testing
general relativity, including the detection of gravitational waves. The orbital
motion of a pulsar that is locked in a binary system causes a frequency shift
(a Doppler shift) in its normally very periodic pulse emissions. These shifts
cause a reduction in the sensitivity of traditional periodicity searches. To
correct this smearing, Ransom [2001] and Ransom et al. [2002] developed the
Fourier domain acceleration search (FDAS), which uses a matched filtering
technique. This method is, however, limited to a constant pulsar
acceleration. Therefore,
Andersen and Ransom [2018] broadened the Fourier domain acceleration search to
account also for a linear change in the acceleration by implementing the
Fourier domain "jerk" search into the PRESTO software package. This extension
significantly increases the number of matched filters used. We have implemented
the Fourier domain "jerk" search (JERK) on GPUs using CUDA. We have achieved a
90x performance increase when compared to the parallel implementation of JERK
in PRESTO. This work is part of the AstroAccelerate project (Armour et al.
[2019]), a many-core accelerated time-domain signal processing library for radio
astronomy.
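
To make the Doppler smearing concrete, the sketch below computes how many
Fourier bins a signal drifts over an observation for a given line-of-sight
acceleration and jerk. It uses the usual definitions from the acceleration and
jerk search literature, z = a h f T^2 / c and w = j h f T^3 / c. The function
name `drift_bins` and its parameters are hypothetical; this is not code from
PRESTO or AstroAccelerate.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def drift_bins(spin_freq_hz, obs_time_s, accel_ms2, jerk_ms3=0.0, harmonic=1):
    """Return (z, w): Fourier-bin drift due to acceleration and to jerk.

    z -- bins drifted over the observation by a constant acceleration
    w -- additional drift term from a linearly changing acceleration
    """
    z = accel_ms2 * harmonic * spin_freq_hz * obs_time_s ** 2 / SPEED_OF_LIGHT
    w = jerk_ms3 * harmonic * spin_freq_hz * obs_time_s ** 3 / SPEED_OF_LIGHT
    return z, w
```

An acceleration search needs one matched filter per trial z, whereas a jerk
search needs one per (z, w) pair, which is why the number of filters grows so
quickly.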
[3]
oai:arXiv.org:1611.05327 [pdf] - 1515844
General relativistic polytropes with a repulsive cosmological constant
Submitted: 2016-11-15
Spherically symmetric equilibrium configurations of a perfect fluid obeying a
polytropic equation of state are studied in spacetimes with a repulsive
cosmological constant. The configurations are specified in terms of three
parameters: the polytropic index $n$, the ratio of central pressure and
central energy density of matter $\sigma$, and the ratio of energy density of
vacuum and central density of matter $\lambda$. The static equilibrium
configurations are determined by two coupled first-order nonlinear differential
equations that are solved by numerical methods, with the exception of polytropes
with $n=0$, corresponding to configurations with a uniform distribution of
energy density, for which the solution is given in terms of elementary functions.
The geometry of the polytropes is conveniently represented by embedding
diagrams of both the ordinary space geometry and the optical reference geometry
reflecting some dynamical properties of the geodesic motion. The polytropes are
represented by radial profiles of energy density, pressure, mass, and metric
coefficients. For all tested values of $n>0$, the static equilibrium
configurations with fixed parameters $n$, $\sigma$, are allowed only up to a
critical value of the cosmological parameter
$\lambda_{\mathrm{c}}=\lambda_{\mathrm{c}}(n,\sigma)$. In the case of $n>3$,
the critical value $\lambda_{\mathrm{c}}$ tends to zero for special values of
$\sigma$. The gravitational potential energy and the binding energy of the
polytropes are determined and studied by numerical methods. We discuss in
detail the polytropes with an extension comparable to that of the dark matter
halos related to galaxies, i.e., with extension $\ell > 100\,\mathrm{kpc}$ and
mass $M > 10^{12}\,\mathrm{M}_{\odot}$. ...
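
For orientation, the three parameters described in this abstract can be written
out as follows. This is a sketch of one common convention, not a quotation from
the paper: the symbols $K$ (polytropic constant), $p_{\mathrm{c}}$ and
$\rho_{\mathrm{c}}$ (central pressure and central mass density), and
$\rho_{\mathrm{vac}}$ (vacuum density) are notation assumed here.

```latex
p = K \rho^{1 + 1/n}, \qquad
\sigma = \frac{p_{\mathrm{c}}}{\rho_{\mathrm{c}} c^{2}}, \qquad
\lambda = \frac{\rho_{\mathrm{vac}}}{\rho_{\mathrm{c}}}
```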
[4]
oai:arXiv.org:1511.03599 [pdf] - 1394532
A polyphase filter for many-core architectures
Submitted: 2015-11-11, last modified: 2016-04-21
In this article we discuss our implementation of a polyphase filter for
real-time data processing in radio astronomy. We describe in detail our
implementation of the polyphase filter algorithm and its behaviour on three
generations of NVIDIA GPU cards, on dual Intel Xeon CPUs and the Intel Xeon Phi
(Knights Corner) platforms. All of our implementations aim to exploit the
potential for data reuse that the algorithm offers. Our GPU implementations
explore two different methods for achieving this: the first makes use of the
L1/Texture cache, the second uses shared memory. We discuss the usability of
each of our implementations along with their behaviours. We measure performance
in execution time, which is a critical factor for real-time systems; we also
present results in terms of bandwidth (GB/s), compute (GFlop/s) and type
conversions (GTc/s). We include a presentation of our results in terms of the
sample rate which can be processed in real-time by a chosen platform, which
more intuitively describes the expected performance in a signal processing
setting. Our findings show that, for the GPUs considered, the performance of
our polyphase filter when using lower precision input data is limited by type
conversions rather than device bandwidth. We compare these results to an
implementation on the Xeon Phi. We show that our Xeon Phi implementation has a
performance that is 1.47x to 1.95x greater than our CPU implementation, but
is not sufficient to compete with the performance of GPUs. We conclude with a
comparison of our best performing code to two other implementations of the
polyphase filter, showing that our implementation is faster in nearly all
cases. This work forms part of the Astro-Accelerate project, a many-core
accelerated real-time data processing library for digital signal processing of
time-domain radio astronomy data.
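
The textbook structure of a polyphase filter bank can be sketched in a few
lines: a weighted sum over several taps per channel (the FIR stage), followed
by a discrete Fourier transform across channels. The code below is a minimal
pure-Python sketch under stated assumptions, not the implementation described
in the article: the function names, the Hamming-windowed sinc prototype filter,
and the naive O(N^2) DFT are all choices made here for illustration.

```python
import cmath
import math


def sinc_hamming_coeffs(n_channels, n_taps):
    """Hamming-windowed sinc prototype filter of length n_channels * n_taps."""
    length = n_channels * n_taps
    coeffs = []
    for i in range(length):
        # Centre the sinc on the filter midpoint, in units of n_channels samples.
        x = (i - (length - 1) / 2) / n_channels
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        coeffs.append(sinc * window)
    return coeffs


def polyphase_spectrum(samples, coeffs, n_channels, n_taps):
    """One output spectrum from the first n_channels * n_taps input samples."""
    # FIR stage: each channel sums n_taps samples spaced n_channels apart.
    # Neighbouring outputs reuse most of the same inputs and coefficients.
    summed = [
        sum(
            samples[t * n_channels + k] * coeffs[t * n_channels + k]
            for t in range(n_taps)
        )
        for k in range(n_channels)
    ]
    # DFT stage: naive transform across channels (an FFT in practice).
    return [
        sum(
            summed[k] * cmath.exp(-2j * math.pi * ch * k / n_channels)
            for k in range(n_channels)
        )
        for ch in range(n_channels)
    ]
```

Successive spectra come from sliding the input window forward by `n_channels`
samples, so most samples and all coefficients are read repeatedly. That
repeated access is the data reuse the abstract refers to.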