Full-text search for arXiv

Adámek, Karel

Normalized to: Adámek, K.

7 article(s) in total. 15 co-authors, from 1 to 6 common article(s). Median position in authors list is 1.0.

[1] oai:arXiv.org:1912.07704 [pdf] - 2032882

Development of production-ready GPU data processing pipeline software for AstroAccelerate

Carels, Cees; Adámek, Karel; Novotný, Jan; Armour, Wesley

Comments: 4 pages, 2 figures. ASP Conference Series in preparation. Credit is hereby given to the ASP Conference Series. To appear in proceedings of The Annual Conference on Astronomical Data Analysis and Software Systems (ADASS) 2019

Submitted: 2019-12-16, last modified: 2020-01-16

Upcoming large-scale telescope projects such as the Square Kilometre Array (SKA) will see high data rates and large data volumes, requiring tools that can analyse telescope event data quickly and accurately. In modern radio telescopes, analysis software forms a core part of the data readout, and long-term software stability and maintainability are essential. AstroAccelerate is a many-core accelerated software package that uses NVIDIA(R) GPUs to perform real-time analysis of radio telescope data, and it has been shown to be substantially faster than real-time at processing simulated SKA-like data. AstroAccelerate contains optimised GPU implementations of signal processing tools used in radio astronomy, including dedispersion, Fourier domain acceleration search, single pulse detection, and others. This article describes the transformation of AstroAccelerate from a C-like prototype code to a production-ready software library with a C++ API and a Python interface, while preserving compatibility with legacy software that is implemented in C. The design of the software library interfaces, refactoring aspects, and coding techniques are discussed.

[2] oai:arXiv.org:1911.01353 [pdf] - 1990479

Searching for pulsars in extreme orbits -- GPU acceleration of the Fourier domain 'jerk' search

Adámek, Karel; Novotný, Jan; Dimoudi, Sofia; Armour, Wesley

Comments: Proceedings of ADASSXXIX (2019)

Submitted: 2019-11-04

Binary pulsars are an important target for radio surveys because they present a natural laboratory for a wide range of astrophysics, for example testing general relativity, including the detection of gravitational waves. The orbital motion of a pulsar that is locked in a binary system causes a frequency shift (a Doppler shift) in its normally very periodic pulse emissions. These shifts reduce the sensitivity of traditional periodicity searches. To correct this smearing, Ransom [2001] and Ransom et al. [2002] developed the Fourier domain acceleration search (FDAS), which uses a matched filtering technique. This method is, however, limited to a constant pulsar acceleration. Therefore, Andersen and Ransom [2018] broadened the Fourier domain acceleration search to also account for a linear change in the acceleration by implementing the Fourier domain "jerk" search in the PRESTO software package. This extension significantly increases the number of matched filters used. We have implemented the Fourier domain "jerk" search (JERK) on GPUs using CUDA. We have achieved a 90x performance increase compared to the parallel implementation of JERK in PRESTO. This work is part of the AstroAccelerate project (Armour et al. [2019]), a many-core accelerated time-domain signal processing library for radio astronomy.
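
Illustration (not part of the abstract): the growth in the number of matched filters can be sketched with simple arithmetic. The sketch below assumes one filter per point on a regular grid, where z is the drift in Fourier bins caused by a constant acceleration and w is the additional term caused by jerk. The function name, grid limits, and step sizes are hypothetical values chosen for illustration and are not taken from PRESTO or AstroAccelerate.

```python
def template_count(z_max, dz, w_max=0, dw=1):
    """Count matched filters on a regular (z, w) grid (illustrative only).

    z_max, dz: half-range and step of the acceleration drift, in Fourier bins.
    w_max, dw: half-range and step of the jerk term (w_max=0 means no jerk).
    """
    if dz <= 0 or dw <= 0:
        raise ValueError("step sizes must be positive")
    n_z = 2 * (z_max // dz) + 1  # grid points from -z_max to +z_max
    n_w = 2 * (w_max // dw) + 1  # grid points from -w_max to +w_max
    return n_z * n_w


# Hypothetical search ranges, chosen only to show the scaling.
accel_only = template_count(z_max=200, dz=2)
with_jerk = template_count(z_max=200, dz=2, w_max=600, dw=20)
print(accel_only, with_jerk, with_jerk // accel_only)  # prints: 201 12261 61
```

With these assumed ranges the jerk dimension multiplies the filter count by 61, which is the kind of increase that motivates a GPU implementation.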

[3] oai:arXiv.org:1812.02647 [pdf] - 1793668

A GPU implementation of the harmonic sum algorithm

Adámek, Karel; Armour, Wesley

Comments: 4 pages, 2 figures

Submitted: 2018-12-06

Time-domain radio astronomy utilizes a harmonic sum algorithm as part of the Fourier domain periodicity search; this type of search is used to discover single pulsars. The harmonic sum algorithm is also used as part of the Fourier domain acceleration search, which aims to discover pulsars that are locked in orbit around another pulsar or compact object. However, porting the harmonic sum to many-core architectures like GPUs is not a straightforward task. The main problem that must be overcome is the very unfavourable memory access pattern, which gets worse as the dimensionality of the harmonic sum increases. We present a set of algorithms for calculating the harmonic sum that are more suited to many-core architectures such as GPUs. We present an evaluation of the sensitivity of these different approaches, and their performance. This work forms part of the AstroAccelerate project, which is a GPU accelerated software package for processing time-domain radio astronomy data.
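
Illustration (not part of the abstract): a minimal serial sketch of a one-dimensional harmonic sum, assuming the common definition in which the power at bin k is accumulated with the power at bins 2k, 3k, and so on. It is a plain reference version, not one of the GPU algorithms the paper presents, and the function name and test spectrum are hypothetical. The strided read `power[h * k]` shows why the memory access pattern is unfavourable: for harmonic h, neighbouring output bins read inputs that are h elements apart.

```python
def harmonic_sum(power, max_harmonics):
    """Return sums where sums[h - 1][k] = power[k] + power[2k] + ... + power[h*k].

    Harmonics that fall beyond the end of the spectrum are skipped.
    """
    n = len(power)
    running = [0.0] * n
    sums = []
    for h in range(1, max_harmonics + 1):
        for k in range(n):
            idx = h * k  # strided access: stride grows with the harmonic number
            if idx < n:
                running[k] += power[idx]
        sums.append(list(running))
    return sums


# Hypothetical spectrum: a signal with fundamental at bin 5 and three more harmonics.
spectrum = [0.0] * 64
for b in (5, 10, 15, 20):
    spectrum[b] = 1.0

sums = harmonic_sum(spectrum, max_harmonics=4)
print(sums[0][5], sums[3][5])  # prints: 1.0 4.0
```

Summing four harmonics collects all of the signal power into the fundamental bin, which is what makes the harmonic sum useful for detecting narrow pulses.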

[4] oai:arXiv.org:1711.10855 [pdf] - 1595318

Improved Acceleration of the GPU Fourier Domain Acceleration Search Algorithm

Adámek, Karel; Dimoudi, Sofia; Giles, Mike; Armour, Wesley

Comments: Proceedings of the ADASS XXVII conference, 4 pages

Submitted: 2017-11-29

We present an improvement of our implementation of the Correlation Technique for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics Processing Units (GPUs) (Dimoudi & Armour 2015; Dimoudi et al. 2017). Our new improved convolution code, which uses our custom GPU FFT code, is between 2.5 and 3.9 times faster than our cuFFT-based implementation (on an NVIDIA P100) and allows for a wider range of filter sizes than our previous version. By using this new version of our convolution code in FDAS we have achieved a 44% performance increase over our previous best implementation. It is also approximately 8 times faster than the existing PRESTO GPU implementation of FDAS (Luo 2013). This work is part of the AstroAccelerate project (Armour et al. 2002), a many-core accelerated time-domain signal processing library for radio astronomy.

[5] oai:arXiv.org:1611.09704 [pdf] - 1523952

A Real-time Single Pulse Detection Algorithm for GPUs

Adámek, Karel; Armour, Wesley

Comments: Proceedings from ADASSXXVI, Trieste, Italy

Submitted: 2016-11-29

The detection of non-repeating events in the radio spectrum has become an important area of study in radio astronomy over the last decade due to the discovery of fast radio bursts (FRBs). We have implemented a single pulse detection algorithm for NVIDIA GPUs that uses boxcar filters of varying widths. Our code performs the calculation of the standard deviation, matched filtering using boxcar filters, and thresholding based on the signal-to-noise ratio. We present our parallel implementation of our single pulse detection algorithm. Our GPU algorithm is approximately 17x faster than our current CPU OpenMP code (NVIDIA Titan XP vs Intel E5-2650v3). This code is part of the AstroAccelerate project, which is a many-core accelerated time-domain signal processing code for radio astronomy. This work allows our AstroAccelerate code to perform a single pulse search on SKA-like data 4.3x faster than real-time.
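
Illustration (not part of the abstract): the three steps named above (standard deviation, boxcar matched filtering, and signal-to-noise thresholding) can be sketched serially as follows. This is a minimal CPU sketch, not the authors' GPU code. It assumes uncorrelated noise, so that a boxcar sum over `width` samples has a standard deviation of `std * sqrt(width)`. The function names, the set of widths, and the threshold are hypothetical.

```python
import math


def mean_and_std(samples):
    """Mean and (population) standard deviation of the time series."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(var)


def boxcar_snr(samples, width, mean, std):
    """Signal-to-noise ratio of every boxcar window of the given width."""
    window = sum(samples[:width])
    snrs = []
    for start in range(len(samples) - width + 1):
        if start > 0:
            # Slide the window: add the new sample, drop the oldest one.
            window += samples[start + width - 1] - samples[start - 1]
        snrs.append((window - width * mean) / (std * math.sqrt(width)))
    return snrs


def detect_single_pulses(samples, widths=(1, 2, 4, 8), threshold=6.0):
    """Return (start, width, snr) for every window at or above the threshold."""
    mean, std = mean_and_std(samples)
    candidates = []
    for width in widths:
        for start, snr in enumerate(boxcar_snr(samples, width, mean, std)):
            if snr >= threshold:
                candidates.append((start, width, snr))
    return candidates


# Hypothetical test signal: alternating +1/-1 with a 4-sample pulse at index 100.
signal = [1 if i % 2 == 0 else -1 for i in range(256)]
for i in range(100, 104):
    signal[i] += 10

best = max(detect_single_pulses(signal), key=lambda c: c[2])
print(best[0], best[1])  # prints: 100 4
```

The highest signal-to-noise ratio is obtained for the boxcar whose width matches the pulse, which is why a range of widths is searched.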

[6] oai:arXiv.org:1611.06087 [pdf] - 1532708

Constraining models of twin peak quasi-periodic oscillations with realistic neutron star equations of state

Török, Gabriel; Goluchová, Kateřina; Urbanec, Martin; Šrámková, Eva; Adámek, Karel; Urbancová, Gabriela; Pecháček, Tomáš; Bakala, Pavel; Stuchlík, Zdeněk; Horák, Jiří; Juryšek, Jakub

Comments: 12 pages, 9 figures, 3 tables, accepted for publication in The Astrophysical Journal

Submitted: 2016-11-18

Twin-peak quasi-periodic oscillations (QPOs) are observed in the X-ray power-density spectra of several accreting low-mass neutron star (NS) binaries. In our previous work we have considered several QPO models. We have identified and explored mass-angular-momentum relations implied by individual QPO models for the atoll source 4U 1636-53. In this paper we extend our study and confront QPO models with various NS equations of state (EoS). We start with simplified calculations assuming Kerr background geometry and then present results of detailed calculations considering the influence of the NS quadrupole moment (related to rotationally induced NS oblateness) assuming Hartle-Thorne spacetimes. We show that the application of a concrete EoS together with a particular QPO model yields a specific mass-angular-momentum relation. However, we demonstrate that the degeneracy in mass and angular momentum can be removed when the NS spin frequency inferred from the X-ray burst observations is considered. We inspect a large set of EoS and discuss their compatibility with the considered QPO models. We conclude that when the NS spin frequency in 4U 1636-53 is close to 580 Hz we can exclude 51 of the 90 considered combinations of EoS and QPO models. We also discuss additional restrictions that may exclude even more combinations. Namely, there are 13 EoS compatible with the observed twin-peak QPOs and the relativistic precession model. However, when considering the low frequency QPOs and Lense-Thirring precession, only 5 EoS are compatible with the model.

[7] oai:arXiv.org:1511.03599 [pdf] - 1394532

A polyphase filter for many-core architectures

Adámek, Karel; Novotný, Jan; Armour, Wes

Comments: 19 pages, 20 figures, 5 tables

Submitted: 2015-11-11, last modified: 2016-04-21

In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards, on dual Intel Xeon CPUs and on the Intel Xeon Phi (Knights Corner) platform. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this: the first makes use of the L1/Texture cache, and the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems; we also present results in terms of bandwidth (GB/s), compute (GFlop/s) and type conversions (GTc/s). We include a presentation of our results in terms of the sample rate which can be processed in real-time by a chosen platform, which more intuitively describes the expected performance in a signal processing setting. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower precision input data is limited by type conversions rather than device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation has a performance that is 1.47x to 1.95x greater than our CPU implementation; however, it is insufficient to compete with the performance of GPUs. We conclude with a comparison of our best performing code to two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data.