Normalized to: Adámek, K.
[1]
oai:arXiv.org:1912.07704 [pdf] - 2032882
Development of production-ready GPU data processing pipeline software
for AstroAccelerate
Submitted: 2019-12-16, last modified: 2020-01-16
Upcoming large-scale telescope projects such as the Square Kilometre Array
(SKA) will see high data rates and large data volumes, requiring tools that can
analyse telescope event data quickly and accurately. In modern radio
telescopes, analysis software forms a core part of the data read out, and
long-term software stability and maintainability are essential. AstroAccelerate
is a many-core accelerated software package that uses NVIDIA(R) GPUs to perform
realtime analysis of radio telescope data, and it has been shown to be
substantially faster than realtime at processing simulated SKA-like data.
AstroAccelerate contains optimised GPU implementations of signal processing
tools used in radio astronomy including dedispersion, Fourier domain
acceleration search, single pulse detection, and others. This article describes
the transformation of AstroAccelerate from a C-like prototype code to a
production-ready software library with a C++ API and a Python interface, while
preserving compatibility with legacy software that is implemented in C. The
design of the software library interfaces, refactoring aspects, and coding
techniques are discussed.
[2]
oai:arXiv.org:1911.01353 [pdf] - 1990479
Searching for pulsars in extreme orbits -- GPU acceleration of the
Fourier domain 'jerk' search
Submitted: 2019-11-04
Binary pulsars are an important target for radio surveys because they present
a natural laboratory for a wide range of astrophysics, for example testing
general relativity, including detection of gravitational waves. The orbital
motion of a pulsar that is locked in a binary system causes a frequency shift
(a Doppler shift) in its normally very periodic pulse emissions. These shifts
cause a reduction in the sensitivity of traditional periodicity searches. To
correct this smearing, Ransom [2001] and Ransom et al. [2002] developed the
Fourier domain acceleration search (FDAS), which uses a matched filtering
technique. This method is, however, limited to a constant pulsar acceleration.
Therefore,
Andersen and Ransom [2018] broadened the Fourier domain acceleration search to
account also for a linear change in the acceleration by implementing the
Fourier domain "jerk" search into the PRESTO software package. This extension
increases the number of matched filters used significantly. We have implemented
the Fourier domain "jerk" search (JERK) on GPUs using CUDA. We have achieved
a 90x performance increase when compared to the parallel implementation of JERK
in PRESTO. This work is part of the AstroAccelerate project (Armour et al.
[2019]), a many-core accelerated time-domain signal processing library for radio
astronomy.
[3]
oai:arXiv.org:1812.02647 [pdf] - 1793668
A GPU implementation of the harmonic sum algorithm
Submitted: 2018-12-06
Time-domain radio astronomy utilizes a harmonic sum algorithm as part of the
Fourier domain periodicity search; this type of search is used to discover
single pulsars. The harmonic sum algorithm is also used as part of the Fourier
domain acceleration search, which aims to discover pulsars that are locked in
orbit around another pulsar or compact object. However, porting the harmonic sum
to many-core architectures like GPUs is not a straightforward task. The main
problem that must be overcome is the very unfavourable memory access pattern,
which gets worse as the dimensionality of the harmonic sum increases. We
present a set of algorithms for calculating the harmonic sum that are more
suited to many-core architectures such as GPUs. We present an evaluation of the
sensitivity of these different approaches, and their performance. This work
forms part of the AstroAccelerate project which is a GPU accelerated software
package for processing time-domain radio astronomy data.
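
As a rough illustration of the memory access problem described in this abstract, the sketch below implements a naive textbook form of the harmonic sum, in which the power at bins k, 2k, ..., hk is accumulated for each fundamental bin k. It is not one of the algorithms presented in the paper; the function name `harmonic_sum` and the handling of out-of-range harmonics are assumptions made for this example.

```python
def harmonic_sum(power, max_harmonics):
    """Naive harmonic sum over a power spectrum (illustrative sketch only).

    Returns a list `sums` where sums[h - 1][k] is the sum of the power at
    bins k, 2k, ..., h*k. Harmonics that fall beyond the end of the
    spectrum are skipped (an assumption of this sketch).
    """
    n = len(power)
    running = [0.0] * n
    sums = []
    for h in range(1, max_harmonics + 1):
        for k in range(n):
            idx = h * k  # strided read: the stride grows with the harmonic number
            if idx < n:
                running[k] += power[idx]
        sums.append(list(running))
    return sums


# Hypothetical spectrum: fundamental in bin 3 with harmonics in bins 6 and 9.
spectrum = [0.0] * 16
spectrum[3], spectrum[6], spectrum[9] = 4.0, 3.0, 2.0
result = harmonic_sum(spectrum, 3)
print(result[2][3])  # power summed over the first three harmonics of bin 3
```

The read at index `h * k` is the point of interest: neighbouring fundamentals touch bins that are `h` apart, so the accesses become more scattered as more harmonics are summed.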
[4]
oai:arXiv.org:1711.10855 [pdf] - 1595318
Improved Acceleration of the GPU Fourier Domain Acceleration Search
Algorithm
Submitted: 2017-11-29
We present an improvement of our implementation of the Correlation Technique
for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics
Processor Units (GPUs) (Dimoudi & Armour 2015; Dimoudi et al. 2017). Our new
improved convolution code, which uses our custom GPU FFT code, is between 2.5 and
3.9 times faster than our cuFFT-based implementation (on an NVIDIA P100)
and allows for a wider range of filter sizes than our previous version. By
using this new version of our convolution code in FDAS we have achieved a 44%
performance increase over our previous best implementation. It is also
approximately 8 times faster than the existing PRESTO GPU implementation of
FDAS (Luo 2013). This work is part of the AstroAccelerate project (Armour et
al. 2012), a many-core accelerated time-domain signal processing library for
radio astronomy.
[5]
oai:arXiv.org:1611.09704 [pdf] - 1523952
A Real-time Single Pulse Detection Algorithm for GPUs
Submitted: 2016-11-29
The detection of non-repeating events in the radio spectrum has become an
important area of study in radio astronomy over the last decade due to the
discovery of fast radio bursts (FRBs). We have implemented a single pulse
detection algorithm for NVIDIA GPUs, which uses boxcar filters of varying
widths. Our code performs the calculation of standard deviation, matched
filtering by using boxcar filters and thresholding based on the signal-to-noise
ratio. We present our parallel implementation of our single pulse detection
algorithm. Our GPU algorithm is approximately 17x faster than our current CPU
OpenMP code (NVIDIA Titan XP vs Intel E5-2650v3). This code is part of the
AstroAccelerate project which is a many-core accelerated time-domain signal
processing code for radio astronomy. This work allows our AstroAccelerate code
to perform a single pulse search on SKA-like data 4.3x faster than real-time.
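
The three steps named in this abstract (standard deviation, boxcar matched filtering, signal-to-noise thresholding) can be sketched on the CPU as follows. This is a minimal serial illustration, not the GPU algorithm of the paper; the function name `single_pulse_search`, the use of prefix sums, and the plain (non-robust) mean and standard deviation are assumptions of this example.

```python
import math


def single_pulse_search(series, widths, threshold):
    """Boxcar search sketch: returns (start, width, snr) for each detection."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0.0:
        return []  # a constant series contains no detectable pulse

    # Prefix sums let every boxcar sum be computed in constant time.
    prefix = [0.0]
    for x in series:
        prefix.append(prefix[-1] + x)

    candidates = []
    for w in widths:
        for start in range(n - w + 1):
            box = prefix[start + w] - prefix[start]
            # Sum of w samples has expected value w*mean and scatter std*sqrt(w).
            snr = (box - w * mean) / (std * math.sqrt(w))
            if snr >= threshold:
                candidates.append((start, w, snr))
    return candidates


# Hypothetical time series: a two-sample pulse on a flat baseline.
data = [0.0] * 100
data[50] = data[51] = 10.0
for start, width, snr in single_pulse_search(data, [1, 2, 4], 8.0):
    print(start, width, round(snr, 2))
```

The boxcar whose width matches the pulse gives the highest signal-to-noise ratio, which is why a range of widths is searched. Note that in this sketch the pulse itself biases the mean and standard deviation; a production pipeline would normally use estimates that are less sensitive to outliers.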
[6]
oai:arXiv.org:1611.06087 [pdf] - 1532708
Constraining models of twin peak quasi-periodic oscillations with
realistic neutron star equations of state
Török, Gabriel;
Goluchová, Kateřina;
Urbanec, Martin;
Šrámková, Eva;
Adámek, Karel;
Urbancová, Gabriela;
Pecháček, Tomáš;
Bakala, Pavel;
Stuchlík, Zdeněk;
Horák, Jiří;
Juryšek, Jakub
Submitted: 2016-11-18
Twin-peak quasi-periodic oscillations (QPOs) are observed in the X-ray
power-density spectra of several accreting low-mass neutron star (NS) binaries.
In our previous work we have considered several QPO models. We have identified
and explored mass-angular-momentum relations implied by individual QPO models
for the atoll source 4U 1636-53. In this paper we extend our study and confront
QPO models with various NS equations of state (EoS). We start with simplified
calculations assuming Kerr background geometry and then present results of
detailed calculations considering the influence of NS quadrupole moment
(related to rotationally induced NS oblateness) assuming Hartle-Thorne
spacetimes. We show that the application of concrete EoS together with a
particular QPO model yields a specific mass-angular-momentum relation. However,
we demonstrate that the degeneracy in mass and angular momentum can be removed
when the NS spin frequency inferred from the X-ray burst observations is
considered. We inspect a large set of EoS and discuss their compatibility with
the considered QPO models. We conclude that when the NS spin frequency in 4U
1636-53 is close to 580 Hz, we can exclude 51 of the 90 considered
combinations of EoS and QPO models. We also discuss additional restrictions
that may exclude even more combinations. Namely, there are 13 EoS compatible
with the observed twin peak QPOs and the relativistic precession model.
However, when considering the low frequency QPOs and Lense-Thirring precession,
only 5 EoS are compatible with the model.
[7]
oai:arXiv.org:1511.03599 [pdf] - 1394532
A polyphase filter for many-core architectures
Submitted: 2015-11-11, last modified: 2016-04-21
In this article we discuss our implementation of a polyphase filter for
real-time data processing in radio astronomy. We describe in detail our
implementation of the polyphase filter algorithm and its behaviour on three
generations of NVIDIA GPU cards, on dual Intel Xeon CPUs and the Intel Xeon Phi
(Knights Corner) platforms. All of our implementations aim to exploit the
potential for data reuse that the algorithm offers. Our GPU implementations
explore two different methods for achieving this: the first makes use of
L1/Texture cache, the second uses shared memory. We discuss the usability of
each of our implementations along with their behaviours. We measure performance
in execution time, which is a critical factor for real-time systems; we also
present results in terms of bandwidth (GB/s), compute (GFlop/s) and type
conversions (GTc/s). We include a presentation of our results in terms of the
sample rate which can be processed in real-time by a chosen platform, which
more intuitively describes the expected performance in a signal processing
setting. Our findings show that, for the GPUs considered, the performance of
our polyphase filter when using lower precision input data is limited by type
conversions rather than device bandwidth. We compare these results to an
implementation on the Xeon Phi. We show that our Xeon Phi implementation has a
performance that is 1.47x to 1.95x greater than our CPU implementation, but
this is not sufficient to compete with the performance of GPUs. We conclude with a
comparison of our best performing code to two other implementations of the
polyphase filter, showing that our implementation is faster in nearly all
cases. This work forms part of the Astro-Accelerate project, a many-core
accelerated real-time data processing library for digital signal processing of
time-domain radio astronomy data.
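
To make the data reuse mentioned in this abstract concrete, the sketch below gives a plain reference form of a polyphase filter: a per-channel weighted sum over several consecutive blocks of samples, followed by a Fourier transform across channels. It is a serial illustration under stated assumptions, not the optimised implementation described in the paper; the function names, the coefficient layout (`coeffs[t * channels + c]`), and the direct DFT are choices made for this example.

```python
import cmath


def polyphase_fir(samples, coeffs, channels, taps):
    """FIR stage: out[s][c] = sum over t of x[(s + t)*channels + c] * w[t*channels + c]."""
    n_spectra = len(samples) // channels - taps + 1
    out = []
    for s in range(n_spectra):
        row = []
        for c in range(channels):
            acc = 0.0
            for t in range(taps):
                # The same input block is read again by the next (taps - 1)
                # output spectra; this is the data reuse opportunity.
                acc += samples[(s + t) * channels + c] * coeffs[t * channels + c]
            row.append(acc)
        out.append(row)
    return out


def dft(row):
    """Direct discrete Fourier transform across channels (O(n^2), for clarity)."""
    n = len(row)
    return [
        sum(row[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# Hypothetical input: 3 blocks of 4 channels, 2 taps, unit coefficients.
x = [float(i) for i in range(12)]
w = [1.0] * 8
filtered = polyphase_fir(x, w, channels=4, taps=2)
spectra = [dft(row) for row in filtered]
print(filtered[0])
```

In practice the coefficients would come from a window function chosen for the desired channel response, and the DFT would be replaced by an FFT; both are left out here to keep the indexing visible.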