Normalized to: Barsdell, B.
[1]
oai:arXiv.org:1709.09313 [pdf] - 1682470
Design and characterization of the Large-Aperture Experiment to Detect
the Dark Age (LEDA) radiometer systems
Price, D. C.;
Greenhill, L. J.;
Fialkov, A.;
Bernardi, G.;
Garsden, H.;
Barsdell, B. R.;
Kocz, J.;
Anderson, M. M.;
Bourke, S. A.;
Craig, J.;
Dexter, M. R.;
Dowell, J.;
Eastwood, M. W.;
Eftekhari, T.;
Ellingson, S. W.;
Hallinan, G.;
Hartman, J. M.;
Kimberk, R.;
Lazio, T. J. W.;
Leiker, S.;
MacMahon, D.;
Monroe, R.;
Schinzel, F.;
Taylor, G. B.;
Tong, E.;
Werthimer, D.;
Woody, D. P.
Submitted: 2017-09-26, last modified: 2018-05-15
The Large-Aperture Experiment to Detect the Dark Age (LEDA) was designed to
detect the predicted O(100) mK sky-averaged absorption of the Cosmic Microwave
Background by hydrogen in the neutral pre- and intergalactic medium just after
the cosmological Dark Age. The spectral signature would be associated with
emergence of a diffuse Ly$\alpha$ background from starlight during 'Cosmic
Dawn'. Recently, Bowman et al. (2018) have reported detection of this predicted
absorption feature, with an unexpectedly large amplitude of 530 mK, centered at
78 MHz. Verification of this result by an independent experiment, such as LEDA,
is pressing. In this paper, we detail design and characterization of the LEDA
radiometer systems, and a first-generation pipeline that instantiates a signal
path model. Sited at the Owens Valley Radio Observatory Long Wavelength Array,
LEDA systems include the station correlator, five well-separated, redundant
dual-polarization radiometers, and backend electronics. The radiometers deliver a
30-85 MHz band (16<z<34) and operate as part of the larger interferometric
array, ultimately for the purpose of in situ calibration. Here, we report on the
LEDA system design, calibration approach, and progress in characterization as
of January 2016. The LEDA systems are currently being modified to improve
performance near 78 MHz in order to verify the purported absorption feature.
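The observed frequency of the redshifted 21-cm line maps to redshift through $1+z = \nu_{21}/\nu_{\rm obs}$, with rest frequency $\nu_{21} \approx 1420.406$ MHz. The short sketch below (function names are illustrative, not from any LEDA software) shows that the reported 78 MHz absorption centre corresponds to $z \approx 17.2$.

```python
# Rest frequency of the neutral-hydrogen 21-cm hyperfine line, in MHz.
NU_21CM_MHZ = 1420.405751768


def redshift_from_freq(nu_obs_mhz: float) -> float:
    """Redshift at which the 21-cm line is observed at nu_obs_mhz."""
    if nu_obs_mhz <= 0:
        raise ValueError("observed frequency must be positive")
    return NU_21CM_MHZ / nu_obs_mhz - 1.0


def freq_from_redshift(z: float) -> float:
    """Observed frequency (MHz) of the 21-cm line emitted at redshift z."""
    if z <= -1:
        raise ValueError("redshift must exceed -1")
    return NU_21CM_MHZ / (1.0 + z)


if __name__ == "__main__":
    print(f"78 MHz -> z = {redshift_from_freq(78.0):.2f}")
```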
[2]
oai:arXiv.org:1711.00466 [pdf] - 1716958
The Radio Sky at Meter Wavelengths: m-Mode Analysis Imaging with the
Owens Valley Long Wavelength Array
Eastwood, Michael W.;
Anderson, Marin M.;
Monroe, Ryan M.;
Hallinan, Gregg;
Barsdell, Benjamin R.;
Bourke, Stephen A.;
Clark, M. A.;
Ellingson, Steven W.;
Dowell, Jayce;
Garsden, Hugh;
Greenhill, Lincoln J.;
Hartman, Jacob M.;
Kocz, Jonathon;
Lazio, T. Joseph W.;
Price, Danny C.;
Schinzel, Frank K.;
Taylor, Gregory B.;
Vedantham, Harish K.;
Wang, Yuankun;
Woody, David P.
Submitted: 2017-10-26
A host of new low-frequency radio telescopes seek to measure the 21-cm
transition of neutral hydrogen from the early universe. These telescopes have
the potential to directly probe star and galaxy formation at redshifts $20
\gtrsim z \gtrsim 7$, but are limited by the dynamic range they can achieve
against foreground sources of low-frequency radio emission. Consequently, there
is a growing demand for modern, high-fidelity maps of the sky at frequencies
below 200 MHz for use in foreground modeling and removal. We describe a new
widefield imaging technique for drift-scanning interferometers,
Tikhonov-regularized $m$-mode analysis imaging. This technique constructs
images of the entire sky in a single synthesis imaging step with exact
treatment of widefield effects. We describe how the CLEAN algorithm can be
adapted to deconvolve maps generated by $m$-mode analysis imaging. We
demonstrate Tikhonov-regularized $m$-mode analysis imaging using the Owens
Valley Long Wavelength Array (OVRO-LWA) by generating 8 new maps of the sky
north of $\delta=-30^\circ$ with 15 arcmin angular resolution, at frequencies
evenly spaced between 36.528 MHz and 73.152 MHz, and $\sim$800 mJy/beam thermal
noise. These maps are a 10-fold improvement in angular resolution over existing
full-sky maps at comparable frequencies, which have angular resolutions $\ge
2^\circ$. Each map is constructed exclusively from interferometric observations
and does not represent the globally averaged sky brightness. Future
improvements will incorporate total-power radiometry, improved thermal noise,
and improved angular resolution owing to the planned expansion of the OVRO-LWA
to 2.6 km baselines. These maps serve as a first step on the path to the use of
more sophisticated foreground filters in 21-cm cosmology incorporating the
measured angular and frequency structure of all foreground contaminants.
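In the standard $m$-mode formalism, each $m$ gives a linear system $v = B a$ linking the measured $m$-modes $v$ to the sky's spherical-harmonic coefficients $a$ through a transfer matrix $B$. Tikhonov regularization picks the $a$ that minimises $\|v - B a\|^2 + \varepsilon \|a\|^2$, i.e. it solves $(B^\dagger B + \varepsilon I)\,a = B^\dagger v$. A minimal numpy sketch, assuming a random complex matrix as a stand-in for a real transfer matrix (which is not reproduced here) and an arbitrary choice of $\varepsilon$:

```python
import numpy as np


def tikhonov_solve(B: np.ndarray, v: np.ndarray, eps: float) -> np.ndarray:
    """Minimise ||v - B a||^2 + eps * ||a||^2 over a.

    Solves the regularised normal equations (B^H B + eps I) a = B^H v.
    """
    BH = B.conj().T
    n = B.shape[1]
    return np.linalg.solve(BH @ B + eps * np.eye(n), BH @ v)


# Toy problem: a random complex matrix standing in for one m-block.
rng = np.random.default_rng(0)
B = rng.normal(size=(40, 10)) + 1j * rng.normal(size=(40, 10))
a_true = rng.normal(size=10) + 1j * rng.normal(size=10)
noise = 0.01 * (rng.normal(size=40) + 1j * rng.normal(size=40))
v = B @ a_true + noise

a_hat = tikhonov_solve(B, v, eps=1e-3)
print("max coefficient error:", np.max(np.abs(a_hat - a_true)))
```

Larger $\varepsilon$ suppresses poorly constrained modes at the cost of biasing the solution toward zero; the trade-off is the usual one for regularized inversion.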
[3]
oai:arXiv.org:1708.00720 [pdf] - 1586679
Bifrost: a Python/C++ Framework for High-Throughput Stream Processing in
Astronomy
Cranmer, Miles D.;
Barsdell, Benjamin R.;
Price, Danny C.;
Dowell, Jayce;
Garsden, Hugh;
Dike, Veronica;
Eftekhari, Tarraneh;
Hegedus, Alexander M.;
Malins, Joseph;
Obenberger, Kenneth S.;
Schinzel, Frank;
Stovall, Kevin;
Taylor, Gregory B.;
Greenhill, Lincoln J.
Submitted: 2017-08-02
Radio astronomy observatories with high-throughput back-end instruments
require real-time data processing. While computing hardware continues to
advance rapidly, development of real-time processing pipelines remains
difficult and time-consuming, which can limit scientific productivity.
Motivated by this, we have developed Bifrost: an open-source software framework
for rapid pipeline development. Bifrost combines a high-level Python interface
with highly efficient reconfigurable data transport and a library of computing
blocks for CPU and GPU processing. The framework is generalizable, but
initially it emphasizes the needs of high-throughput radio astronomy pipelines,
such as the ability to process data buffers as if they were continuous streams,
the capacity to partition processing into distinct data sequences (e.g.,
separate observations), and the ability to extract specific intervals from
buffered data. Computing blocks in the library are designed for applications
such as interferometry, pulsar dedispersion and timing, and transient search
pipelines. We describe the design and implementation of the Bifrost framework
and demonstrate its use as the backbone in the correlation and beamforming back
end of the Long Wavelength Array station in the Sevilleta National Wildlife
Refuge, NM.
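Bifrost's own API is not reproduced here. The sketch below is a hypothetical, single-threaded illustration of the ring-buffer idea the abstract describes: a writer appends to a fixed-size circular buffer, and a reader requests spans by absolute stream offset, so the buffer behaves like a continuous stream and a specific interval can be extracted while it is still buffered. The class and method names are invented for illustration.

```python
class RingBuffer:
    """Fixed-capacity circular buffer addressed by absolute stream offset."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = [None] * capacity
        self.head = 0  # absolute offset of the next sample to be written

    def write(self, samples):
        for s in samples:
            self.data[self.head % self.capacity] = s
            self.head += 1

    @property
    def tail(self) -> int:
        """Oldest absolute offset still held in the buffer."""
        return max(0, self.head - self.capacity)

    def read_span(self, start: int, length: int):
        """Return samples [start, start + length) if they are still buffered."""
        end = start + length
        if start < self.tail or end > self.head:
            raise IndexError("requested span is not (or no longer) buffered")
        return [self.data[i % self.capacity] for i in range(start, end)]


ring = RingBuffer(capacity=8)
ring.write(range(0, 6))     # first gulp of a continuous stream
ring.write(range(6, 12))    # second gulp wraps around the buffer
print(ring.read_span(5, 4))  # a span that crosses the wrap point
```

Addressing by absolute offset rather than buffer index is what lets a reader treat wrapped storage as one continuous stream.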
[4]
oai:arXiv.org:1505.06421 [pdf] - 1296158
HDFITS: porting the FITS data model to HDF5
Submitted: 2015-05-24, last modified: 2015-10-20
The FITS (Flexible Image Transport System) data format has been the de facto
data format for astronomy-related data products since its inception in the late
1970s. While the FITS file format is widely supported, it lacks many of the
features of more modern data serialization, such as the Hierarchical Data
Format (HDF5). The HDF5 file format offers considerable advantages over FITS,
such as improved I/O speed and compression, but has yet to gain widespread
adoption within astronomy. One of the major holdbacks is that HDF5 is not well
supported by data reduction software packages and image viewers. Here, we
present a comparison of FITS and HDF5 as a format for storage of astronomy
datasets. We show that the underlying data model of FITS can be ported to HDF5
in a straightforward manner, and that by doing so the advantages of the HDF5
file format can be leveraged immediately. In addition, we present a software
tool, fits2hdf, for converting between FITS and a new `HDFITS' format, where
data are stored in HDF5 in a FITS-like manner. We show that HDFITS allows
faster reading of data (up to 100 times the speed of FITS in some use cases) and improved
compression (higher compression ratios and higher throughput). Finally, we show
that by only changing the import lines in Python-based FITS utilities, HDFITS
formatted data can be presented transparently as an in-memory FITS equivalent.
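Part of porting the FITS data model to HDF5 is mapping each header's 80-character keyword cards onto key/value attributes. The stdlib-only parser below is a simplified illustration, not the fits2hdf implementation; it ignores CONTINUE cards, doubled quotes inside strings, and complex values. The resulting dictionary could then be attached to an HDF5 dataset, for example through h5py's `attrs` mapping.

```python
def parse_card(card: str):
    """Split one 80-character FITS header card into (keyword, value, comment).

    Simplified: no CONTINUE cards, no doubled '' quotes, no complex values.
    """
    card = card.ljust(80)[:80]
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY, blank, END) carry no value.
        return keyword, None, card[8:].strip()
    rest = card[10:].strip()
    if rest.startswith("'"):
        close = rest.index("'", 1)           # closing quote of the string
        value = rest[1:close].rstrip()       # trailing blanks are not significant
        comment = rest[close + 1:].partition("/")[2]
        return keyword, value, comment.strip()
    raw, _, comment = rest.partition("/")
    raw = raw.strip()
    if not raw:
        value = None                         # undefined value
    elif raw in ("T", "F"):
        value = raw == "T"                   # FITS logical
    else:
        try:
            value = int(raw)
        except ValueError:
            value = float(raw.replace("D", "E"))  # FITS allows D exponents
    return keyword, value, comment.strip()


def header_to_attrs(cards):
    """Collect valued cards into a dict suitable for HDF5 attributes."""
    attrs = {}
    for card in cards:
        key, value, _ = parse_card(card)
        if value is not None:
            attrs[key] = value
    return attrs


cards = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "NAXIS   =                    2 / number of array dimensions",
    "CDELT1  =          -2.7777E-04",
    "OBJECT  = 'M31     '           / target name",
    "END",
]
print(header_to_attrs(cards))
```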
[5]
oai:arXiv.org:1407.8116 [pdf] - 1296073
Optimizing performance per watt on GPUs in High Performance Computing:
temperature, frequency and voltage effects
Submitted: 2014-07-30, last modified: 2015-10-20
The magnitude of the real-time digital signal processing challenge attached
to large radio astronomical antenna arrays motivates use of high performance
computing (HPC) systems. The need for high power efficiency (performance per
watt) at remote observatory sites parallels that in HPC broadly, where
efficiency is an emerging critical metric. We investigate how the performance
per watt of graphics processing units (GPUs) is affected by temperature, core
clock frequency and voltage. Our results highlight how the underlying physical
processes that govern transistor operation affect power efficiency. In
particular, we show experimentally that GPU power consumption grows
non-linearly with both temperature and supply voltage, as predicted by physical
transistor models. We show lowering GPU supply voltage and increasing clock
frequency while maintaining a low die temperature increases the power
efficiency of an NVIDIA K20 GPU by up to 37-48% over default settings when
running xGPU, a compute-bound code used in radio astronomy. We discuss how
temperature-aware power models could be used to reduce power consumption for
future HPC installations. Automatic temperature-aware and application-dependent
voltage and frequency scaling (T-DVFS and A-DVFS) may provide a mechanism to
achieve better power efficiency for a wider range of codes running on GPUs.
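The voltage and frequency dependence can be illustrated with the textbook CMOS approximation, in which dynamic power scales as $P_{\rm dyn} \propto C V^2 f$ while leakage (static) power rises with both voltage and temperature. The sketch below is a toy model with made-up coefficients and clock values, not the paper's power model or measured K20 data. It only shows why lowering supply voltage, even at a higher clock and a cooler die, can raise performance per watt for a compute-bound code whose throughput is assumed to scale with clock frequency.

```python
import math


def power_watts(freq_mhz, volts, temp_c,
                c_eff=0.1, leak0=10.0, k_v=2.0, k_t=0.02):
    """Toy GPU power: dynamic C*V^2*f term plus an exponential leakage term.

    All coefficients are illustrative placeholders, not measured values.
    """
    dynamic = c_eff * volts ** 2 * freq_mhz
    # Leakage grows non-linearly (here exponentially) with voltage and temperature.
    static = leak0 * math.exp(k_v * (volts - 1.0) + k_t * (temp_c - 40.0))
    return dynamic + static


def perf_per_watt(freq_mhz, volts, temp_c):
    """Throughput of a compute-bound kernel is assumed to scale with clock rate."""
    return freq_mhz / power_watts(freq_mhz, volts, temp_c)


default = perf_per_watt(700, 1.00, 80)   # hypothetical stock settings, hot die
tuned = perf_per_watt(750, 0.90, 40)     # higher clock, lower voltage, cool die
print(f"default: {default:.2f} MHz/W, tuned: {tuned:.2f} MHz/W")
```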
[6]
oai:arXiv.org:1508.04884 [pdf] - 1284959
A survey of FRB fields: Limits on repeatability
Petroff, E.;
Johnston, S.;
Keane, E. F.;
van Straten, W.;
Bailes, M.;
Barr, E. D.;
Barsdell, B. R.;
Burke-Spolaor, S.;
Caleb, M.;
Champion, D. J.;
Flynn, C.;
Jameson, A.;
Kramer, M.;
Ng, C.;
Possenti, A.;
Stappers, B. W.
Submitted: 2015-08-20
Several theories exist to explain the source of the bright, millisecond
duration pulses known as fast radio bursts (FRBs). If the progenitors of FRBs
are non-cataclysmic, such as giant pulses from pulsars, pulsar-planet binaries,
or magnetar flares, FRB emission may be seen to repeat. We have undertaken a
survey of the fields of eight known FRBs from the High Time Resolution Universe
survey to search for repeating pulses. Although no repeat pulses were detected,
the survey yielded the detection of a new FRB, described in Petroff et al.
(2015a). From our observations we rule out periodic repeating sources with
periods P $\leq$ 8.6 hours and rule out sources with periods 8.6 < P < 21 hours
at the 90% confidence level. At P $\geq$ 21 hours our limits fall off as ~1/P.
Dedicated and persistent observations of FRB source fields are needed to rule
out repetition on longer timescales, a task well-suited to next generation
wide-field transient detectors.
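A simplified version of the periodicity argument: for a strictly periodic source of unknown phase, the chance that at least one pulse falls inside the observing windows equals the fraction of the phase circle those windows cover once folded at period $P$. Any single window at least as long as $P$ guarantees a pulse in it, and once $P$ greatly exceeds the total time on source the probability falls off as ~1/P. This sketch ignores sensitivity and pulse-energy variation, and the observing schedule below is hypothetical rather than the survey's.

```python
def detection_probability(windows, period):
    """Fraction of pulse phases for which a strictly periodic source would
    emit at least one pulse inside the windows.

    windows: list of (start, end) times in the same units as period.
    """
    arcs = []
    for start, end in windows:
        length = end - start
        if length >= period:
            return 1.0              # some pulse must land in this window
        s = start % period
        if s + length <= period:
            arcs.append((s, s + length))
        else:                       # folded window wraps past phase = period
            arcs.append((s, period))
            arcs.append((0.0, s + length - period))
    covered, cur_start, cur_end = 0.0, None, None
    for s, e in sorted(arcs):       # union of arcs on the phase circle
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / period


# Hypothetical schedule: two 2-hour pointings one day apart (times in hours).
obs = [(0.0, 2.0), (24.0, 26.0)]
for p in (1.0, 3.0, 10.0, 100.0):
    print(p, detection_probability(obs, p))
```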
[7]
oai:arXiv.org:1411.3751 [pdf] - 918380
Digital Signal Processing using Stream High Performance Computing: A
512-input Broadband Correlator for Radio Astronomy
Kocz, J.;
Greenhill, L. J.;
Barsdell, B. R.;
Price, D.;
Bernardi, G.;
Bourke, S.;
Clark, M. A.;
Craig, J.;
Dexter, M.;
Dowell, J.;
Eftekhari, T.;
Ellingson, S.;
Hallinan, G.;
Hartman, J.;
Jameson, A.;
MacMahon, D.;
Taylor, G.;
Schinzel, F.;
Werthimer, D.
Submitted: 2014-11-13, last modified: 2015-01-06
A "large-N" correlator that makes use of Field Programmable Gate Arrays and
Graphics Processing Units has been deployed as the digital signal processing
system for the Long Wavelength Array station at Owens Valley Radio Observatory
(LWA-OV), to enable the Large Aperture Experiment to Detect the Dark Ages
(LEDA). The system samples a ~100 MHz baseband and processes signals from 256
dual-polarization antennas (512 inputs) over a ~58 MHz instantaneous sub-band,
achieving 16.8 Tops/s and 0.236 Tbit/s throughput in a 9 kW envelope and a
single-rack footprint. The output data rate is 260 MB/s for 9 s time averaging of
cross-power and 1 second averaging of total-power data. At deployment, the
LWA-OV correlator was the largest in production in terms of N and is the third
largest in terms of complex multiply accumulations, after the Very Large Array
and Atacama Large Millimeter Array. The correlator's comparatively fast
development time and low cost establish a practical foundation for the
scalability of a modular, heterogeneous, computing architecture.
[8]
oai:arXiv.org:1412.0342 [pdf] - 1441465
A real-time fast radio burst: polarization detection and multiwavelength
follow-up
Petroff, E.;
Bailes, M.;
Barr, E. D.;
Barsdell, B. R.;
Bhat, N. D. R.;
Bian, F.;
Burke-Spolaor, S.;
Caleb, M.;
Champion, D.;
Chandra, P.;
Da Costa, G.;
Delvaux, C.;
Flynn, C.;
Gehrels, N.;
Greiner, J.;
Jameson, A.;
Johnston, S.;
Kasliwal, M. M.;
Keane, E. F.;
Keller, S.;
Kocz, J.;
Kramer, M.;
Leloudas, G.;
Malesani, D.;
Mulchaey, J. S.;
Ng, C.;
Ofek, E. O.;
Perley, D. A.;
Possenti, A.;
Schmidt, B. P.;
Shen, Yue;
Stappers, B.;
Tisserand, P.;
van Straten, W.;
Wolf, C.
Submitted: 2014-11-30
Fast radio bursts (FRBs) are one of the most tantalizing mysteries of the
radio sky; their progenitors and origins remain unknown and until now no rapid
multiwavelength follow-up of an FRB has been possible. New instrumentation has
decreased the time between observation and discovery from years to seconds, and
enables polarimetry to be performed on FRBs for the first time. We have
discovered an FRB (FRB 140514) in real time on 14 May 2014 at 17:14:11.06 UTC
at the Parkes radio telescope and triggered follow-up at other wavelengths
within hours of the event. FRB 140514 was found with a dispersion measure (DM)
of 562.7(6) cm$^{-3}$ pc, giving an upper limit on source redshift of $z
\lesssim 0.5$. FRB 140514 was found to be 21$\pm$7% (3-$\sigma$) circularly
polarized on the leading edge with a 1-$\sigma$ upper limit on linear
polarization $<10\%$. We conclude that this polarization is intrinsic to the
FRB. If there were any intrinsic linear polarization, as might be expected from
coherent emission, then it may have been depolarized by Faraday rotation caused
by passing through strong magnetic fields and/or high density environments. FRB
140514 was discovered during a campaign to re-observe known FRB fields, and
lies close to a previous discovery, FRB 110220; based on the difference in DMs
of these bursts and time-on-sky arguments, we attribute the proximity to
sampling bias and conclude that they are distinct objects. Follow-up conducted
by 12 telescopes observing from X-ray to radio wavelengths was unable to
identify a variable multiwavelength counterpart, allowing us to rule out models
in which FRBs originate from nearby ($z < 0.3$) supernovae and long duration
gamma-ray bursts.
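The polarization fractions quoted above are ratios of Stokes parameters: the circular fraction is $V/I$ and the linear fraction is $L/I$ with $L = \sqrt{Q^2 + U^2}$. A minimal sketch with made-up Stokes values (not the FRB 140514 data); real analyses also correct $L$ for its positive noise bias, which is omitted here.

```python
import math


def polarization_fractions(I, Q, U, V):
    """Return (linear, circular) polarization fractions from Stokes parameters."""
    if I <= 0:
        raise ValueError("total intensity must be positive")
    linear = math.hypot(Q, U) / I   # L/I with L = sqrt(Q^2 + U^2)
    circular = V / I                # signed: the sign gives the handedness
    return linear, circular


lin, circ = polarization_fractions(I=10.0, Q=0.3, U=0.4, V=1.5)
print(f"linear {lin:.0%}, circular {circ:.0%}")
```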
[9]
oai:arXiv.org:1411.0507 [pdf] - 891542
Is HDF5 a good format to replace UVFITS?
Submitted: 2014-11-03
The FITS (Flexible Image Transport System) data format was developed in the
late 1970s for storage and exchange of astronomy-related image data. Since
then, it has become a standard file format not only for images, but also for
radio interferometer data (e.g. UVFITS, FITS-IDI). But is FITS the right format
for next-generation telescopes to adopt? The newer Hierarchical Data Format
(HDF5) file format offers considerable advantages over FITS, but has yet to
gain widespread adoption within radio astronomy. One of the major holdbacks is
that HDF5 is not well supported by data reduction software packages. Here, we
present a comparison of FITS, HDF5, and the MeasurementSet (MS) format for
storage of interferometric data. In addition, we present a tool for converting
between formats. We show that the underlying data model of FITS can be ported
to HDF5, a first step toward achieving wider HDF5 support.
[10]
oai:arXiv.org:1401.8288 [pdf] - 785252
A Scalable Hybrid FPGA/GPU FX Correlator
Kocz, J.;
Greenhill, L. J.;
Barsdell, B. R.;
Bernardi, G.;
Jameson, A.;
Clark, M. A.;
Craig, J.;
Price, D.;
Taylor, G. B.;
Schinzel, F.;
Werthimer, D.
Submitted: 2014-01-31, last modified: 2014-02-18
Radio astronomical imaging arrays comprising large numbers of antennas,
O(10^2-10^3), have posed a signal processing challenge because of the required
O(N^2) cross correlation of signals from each antenna and requisite signal
routing. This motivated the implementation of a Packetized Correlator
architecture that applies Field Programmable Gate Arrays (FPGAs) to the O(N)
"F-stage" transforming time domain to frequency domain data, and Graphics
Processing Units (GPUs) to the O(N^2) "X-stage" performing an outer product
among spectra for each antenna. The design is readily scalable to at least
O(10^3) antennas. Fringes, visibility amplitudes and sky image results obtained
during field testing are presented.
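The F/X split can be sketched in a few lines of numpy. The F stage channelizes each antenna's time stream independently (O(N) work); here a plain FFT on non-overlapping blocks stands in for the polyphase filterbank an FPGA F-engine commonly uses. The X stage forms, for every channel, the outer product of the N-element spectrum vector with its conjugate and accumulates it over time (O(N^2) work). Array shapes, parameter names and the synthetic data are illustrative assumptions, not the deployed system's.

```python
import numpy as np


def f_stage(voltages: np.ndarray, n_chan: int) -> np.ndarray:
    """Channelize real voltages of shape (n_ant, n_samples).

    Returns spectra of shape (n_spectra, n_chan, n_ant). A plain FFT on
    non-overlapping blocks stands in for a polyphase filterbank.
    """
    n_ant, n_samp = voltages.shape
    n_spec = n_samp // (2 * n_chan)
    blocks = voltages[:, : n_spec * 2 * n_chan].reshape(n_ant, n_spec, 2 * n_chan)
    spectra = np.fft.rfft(blocks, axis=-1)[..., :n_chan]  # drop the Nyquist bin
    return spectra.transpose(1, 2, 0)


def x_stage(spectra: np.ndarray) -> np.ndarray:
    """Accumulate per-channel outer products: V[c, i, j] = sum_t X_i X_j^*."""
    return np.einsum("tci,tcj->cij", spectra, spectra.conj())


rng = np.random.default_rng(1)
sky = rng.normal(size=4096)                 # common signal seen by all antennas
volts = np.stack([sky + 0.5 * rng.normal(size=4096) for _ in range(4)])
vis = x_stage(f_stage(volts, n_chan=64))
print(vis.shape)  # one N x N Hermitian matrix per channel
```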
[11]
oai:arXiv.org:1307.1628 [pdf] - 689147
A Population of Fast Radio Bursts at Cosmological Distances
Thornton, D.;
Stappers, B.;
Bailes, M.;
Barsdell, B. R.;
Bates, S. D.;
Bhat, N. D. R.;
Burgay, M.;
Burke-Spolaor, S.;
Champion, D. J.;
Coster, P.;
D'Amico, N.;
Jameson, A.;
Johnston, S.;
Keith, M. J.;
Kramer, M.;
Levin, L.;
Milia, S.;
Ng, C.;
Possenti, A.;
van Straten, W.
Submitted: 2013-07-05
Searches for transient astrophysical sources often reveal unexpected classes
of objects that are useful physical laboratories. In a recent survey for
pulsars and fast transients we have uncovered four millisecond-duration radio
transients all more than 40$^\circ$ from the Galactic plane. The bursts'
properties indicate that they are of celestial rather than terrestrial origin.
Host galaxy and intergalactic medium models suggest that they have cosmological
redshifts of 0.5 to 1, and distances of up to 3 gigaparsecs. No temporally
coincident X-ray or gamma-ray signature was identified in association with the
bursts. Characterization of the source population and identification of host
galaxies offers an opportunity to determine the baryonic content of the
Universe.
[12]
oai:arXiv.org:1306.4190 [pdf] - 1172117
The High Time Resolution Universe Pulsar Survey VIII: The Galactic
millisecond pulsar population
Levin, L.;
Bailes, M.;
Barsdell, B. R.;
Bates, S. D.;
Bhat, N. D. R.;
Burgay, M.;
Burke-Spolaor, S.;
Champion, D. J.;
Coster, P.;
D'Amico, N.;
Jameson, A.;
Johnston, S.;
Keith, M. J.;
Kramer, M.;
Milia, S.;
Ng, C.;
Possenti, A.;
Stappers, B.;
Thornton, D.;
van Straten, W.
Submitted: 2013-06-18
We have used millisecond pulsars (MSPs) from the southern High Time
Resolution Universe (HTRU) intermediate latitude survey area to simulate the
distribution and total population of MSPs in the Galaxy. Our model makes use of
the scale factor method, which estimates the ratio of the total number of MSPs
in the Galaxy to the known sample. Using our best-fit value for the z-height,
$z=500$ pc, we find an underlying population of MSPs of $8.3(\pm 4.2)\times10^4$
sources down to a limiting luminosity of $L_{\rm min}=0.1$ mJy kpc$^2$ and a
luminosity distribution with a steep slope of $d\log N/d\log L = -1.45(\pm 0.14)$.
However, at the low end of the luminosity distribution, the uncertainties
introduced by small-number statistics are large. By omitting very low luminosity
pulsars, we find a Galactic population above $L_{\rm min}=0.2$ mJy kpc$^2$ of
only $3.0(\pm 0.7)\times10^4$ MSPs. We have also simulated pulsars with periods
shorter than any known MSP, and estimate the maximum number of sub-MSPs in the
Galaxy to be $7.8(\pm 5.0)\times10^4$ pulsars at $L=0.1$ mJy kpc$^2$. In
addition, we estimate that the high
and low latitude parts of the southern HTRU survey will detect 68 and 42 MSPs
respectively, including 78 new discoveries. Pulsar luminosity, and hence flux
density, is an important input parameter in the model. Some of the published
flux densities for the pulsars in our sample do not agree with the observed
flux densities from our data set, and we have instead calculated average
luminosities from archival data from the Parkes Telescope. We found many
luminosities to be very different from their catalogue values, leading to very
different population estimates. Large variations in flux density highlight the
importance of including scintillation effects in MSP population studies.
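Assuming the quoted slope describes the cumulative distribution, $N(>L) \propto L^{\alpha}$ with $\alpha = d\log N/d\log L$, moving the limiting luminosity from $L_1$ to $L_2$ rescales the population by $(L_2/L_1)^{\alpha}$. As a rough illustration using only the numbers quoted above (and ignoring their uncertainties), raising $L_{\rm min}$ from 0.1 to 0.2 mJy kpc$^2$ with $\alpha = -1.45$ scales $8.3\times10^4$ to about $3.0\times10^4$, close to the quoted figure; the paper's estimate itself comes from its own modelling, not from this arithmetic.

```python
def rescale_population(n_ref, l_ref, l_new, slope):
    """Rescale N(>L) for a cumulative power law N(>L) proportional to L**slope."""
    return n_ref * (l_new / l_ref) ** slope


n_above_02 = rescale_population(8.3e4, 0.1, 0.2, -1.45)
print(f"{n_above_02:.2e}")
```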
[13]
oai:arXiv.org:1209.0793 [pdf] - 1151125
The High Time Resolution Universe Survey VI: An Artificial Neural
Network and Timing of 75 Pulsars
Bates, S. D.;
Bailes, M.;
Barsdell, B. R.;
Bhat, N. D. R.;
Burgay, M.;
Burke-Spolaor, S.;
Champion, D. J.;
Coster, P.;
D'Amico, N.;
Jameson, A.;
Johnston, S.;
Keith, M. J.;
Kramer, M.;
Levin, L.;
Lyne, A.;
Milia, S.;
Ng, C.;
Nietner, C.;
Possenti, A.;
Stappers, B.;
Thornton, D.;
van Straten, W.
Submitted: 2012-09-04, last modified: 2012-09-09
We present 75 pulsars discovered in the mid-latitude portion of the High Time
Resolution Universe survey, 54 of which have full timing solutions. All the
pulsars have spin periods greater than 100 ms, and none of those with timing
solutions are in binaries. Two display particularly interesting behaviour; PSR
J1054-5944 is found to be an intermittent pulsar, and PSR J1809-0119 has
glitched twice since its discovery.
In the second half of the paper we discuss the development and application of
an artificial neural network in the data-processing pipeline for the survey. We
discuss the tests that were used to generate scores and find that our neural
network was able to reject over 99% of the candidates produced in the data
processing, and able to blindly detect 85% of pulsars. We suggest that
improvements to the accuracy should be possible if further care is taken when
training an artificial neural network, for example by ensuring that a
representative sample of the pulsar population is used during the training
process, or by using different artificial neural networks for the detection
of different types of pulsars.
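The two figures of merit above, the fraction of non-pulsar candidates rejected and the fraction of known pulsars recovered, can be computed from classifier scores and labels as sketched below. The scores, labels and the 0.5 threshold are hypothetical, and the network that would produce the scores is not reproduced.

```python
def rejection_and_recall(scores, labels, threshold=0.5):
    """labels: True for real pulsars. Candidates with score >= threshold pass.

    Returns (fraction of non-pulsars rejected, fraction of pulsars detected).
    """
    passed = [s >= threshold for s in scores]
    non_pulsars = [p for p, is_psr in zip(passed, labels) if not is_psr]
    pulsars = [p for p, is_psr in zip(passed, labels) if is_psr]
    rejected = sum(not p for p in non_pulsars) / len(non_pulsars)
    recall = sum(pulsars) / len(pulsars)
    return rejected, recall


scores = [0.9, 0.2, 0.7, 0.1, 0.05, 0.6, 0.3, 0.8]
labels = [True, False, True, False, False, False, True, True]
print(rejection_and_recall(scores, labels))
```

Raising the threshold trades recall for rejection, which is the balance the survey pipeline has to strike.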
[14]
oai:arXiv.org:1201.5380 [pdf] - 1093248
Accelerating incoherent dedispersion
Submitted: 2012-01-25
Incoherent dedispersion is a computationally intensive problem that appears
frequently in pulsar and transient astronomy. For current and future transient
pipelines, dedispersion can dominate the total execution time, meaning its
computational speed acts as a constraint on the quality and quantity of science
results. It is thus critical that the algorithm be able to take advantage of
trends in commodity computing hardware. With this goal in mind, we present
analysis of the 'direct', 'tree' and 'sub-band' dedispersion algorithms with
respect to their potential for efficient execution on modern graphics
processing units (GPUs). We find all three to be excellent candidates, and
proceed to describe implementations in C for CUDA using insight gained from the
analysis. Using recent CPU and GPU hardware, the transition to the GPU provides
a speed-up of 9x for the direct algorithm when compared to an optimised
quad-core CPU code. For realistic recent survey parameters, these speeds are
high enough that further optimisation is unnecessary to achieve real-time
processing. Where further speed-ups are desirable, we find that the tree and
sub-band algorithms are able to provide 3-7x better performance at the cost of
certain smearing, memory consumption and development time trade-offs. We finish
with a discussion of the implications of these results for future transient
surveys. Our GPU dedispersion code is publicly available as a C library at:
http://dedisp.googlecode.com/
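The 'direct' algorithm is the simplest of the three: for each trial DM, shift every frequency channel by its cold-plasma dispersion delay, $\Delta t \approx 4.15\,{\rm ms}\times{\rm DM}\times(\nu^{-2} - \nu_{\rm ref}^{-2})$ with frequencies in GHz and DM in cm$^{-3}$ pc, then sum across channels. The numpy sketch below is illustrative only and is not the dedisp library's API; it uses integer-sample shifts and, for brevity, wraps at the array edge via `np.roll`. The band, sampling time and DM grid are arbitrary choices.

```python
import numpy as np

K_DM_MS = 4.148808  # dispersion constant, ms GHz^2 cm^3 pc^-1


def dispersion_delay_samples(dm, freqs_ghz, tsamp_ms):
    """Delay of each channel relative to the highest frequency, in samples."""
    f_ref = np.max(freqs_ghz)
    delay_ms = K_DM_MS * dm * (freqs_ghz ** -2 - f_ref ** -2)
    return np.round(delay_ms / tsamp_ms).astype(int)


def direct_dedisperse(data, freqs_ghz, tsamp_ms, dm_trials):
    """data: (n_chan, n_samp) filterbank. Returns (n_dm, n_samp) time series."""
    out = np.zeros((len(dm_trials), data.shape[1]))
    for i, dm in enumerate(dm_trials):
        shifts = dispersion_delay_samples(dm, freqs_ghz, tsamp_ms)
        for chan, shift in enumerate(shifts):
            # Advance late-arriving low-frequency channels to align the pulse.
            out[i] += np.roll(data[chan], -shift)
    return out


# Synthetic check: a noise-free pulse dispersed at DM = 100, arriving at sample 500.
freqs = np.linspace(1.5, 1.2, 32)           # GHz, channel 0 highest
tsamp = 0.064                               # ms
data = np.zeros((32, 4096))
for c, s in enumerate(dispersion_delay_samples(100.0, freqs, tsamp)):
    data[c, 500 + s] = 1.0
dms = np.arange(0.0, 200.0, 10.0)
series = direct_dedisperse(data, freqs, tsamp, dms)
best = np.unravel_index(np.argmax(series), series.shape)
print(dms[best[0]], best[1])
```

The work is O(N_DM x N_chan x N_samp) with regular, independent memory accesses per DM trial, which is why the direct method maps well onto GPUs.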
[15]
oai:arXiv.org:1112.4532 [pdf] - 1092508
Three-dimensional shapelets and an automated classification scheme for
dark matter haloes
Submitted: 2011-12-19
We extend the two-dimensional Cartesian shapelet formalism to d-dimensions.
Concentrating on the three-dimensional case, we derive shapelet-based equations
for the mass, centroid, root-mean-square radius, and components of the
quadrupole moment and moment of inertia tensors. Using cosmological N-body
simulations as an application domain, we show that three-dimensional shapelets
can be used to replicate the complex sub-structure of dark matter halos and
demonstrate the basis of an automated classification scheme for halo shapes. We
investigate the shapelet decomposition process from an algorithmic viewpoint,
and consider opportunities for accelerating the computation of shapelet-based
representations using graphics processing units (GPUs).
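Cartesian shapelets are built from Gauss-Hermite functions, $\phi_n(x) = [2^n \sqrt{\pi}\, n!]^{-1/2} H_n(x)\, e^{-x^2/2}$, scaled by a size parameter $\beta$ as $B_n(x;\beta) = \beta^{-1/2}\phi_n(x/\beta)$. The $d$-dimensional basis is a product of one such function per axis, so in 3D $B_{n_1 n_2 n_3}(\mathbf{x};\beta) = B_{n_1}(x_1;\beta)\,B_{n_2}(x_2;\beta)\,B_{n_3}(x_3;\beta)$. The stdlib-only sketch below evaluates them with the standard Hermite recurrence and checks orthonormality numerically; the integration grid is an arbitrary choice.

```python
import math


def phi(n: int, x: float) -> float:
    """Dimensionless 1D shapelet (Gauss-Hermite function) of order n."""
    # Physicists' Hermite polynomials: H_{k+1} = 2x H_k - 2k H_{k-1}.
    if n == 0:
        h = 1.0
    else:
        h_prev, h = 1.0, 2.0 * x
        for k in range(1, n):
            h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    norm = 1.0 / math.sqrt(2.0 ** n * math.sqrt(math.pi) * math.factorial(n))
    return norm * h * math.exp(-x * x / 2.0)


def shapelet_3d(n1, n2, n3, x, y, z, beta):
    """3D Cartesian shapelet basis function with scale beta (separable product)."""
    return phi(n1, x / beta) * phi(n2, y / beta) * phi(n3, z / beta) / beta ** 1.5


def inner_product_1d(m, n, half_width=10.0, steps=4000):
    """Midpoint-rule estimate of the integral of phi_m * phi_n over x."""
    dx = 2.0 * half_width / steps
    xs = (-half_width + (i + 0.5) * dx for i in range(steps))
    return sum(phi(m, x) * phi(n, x) for x in xs) * dx


print(inner_product_1d(3, 3), inner_product_1d(2, 4))
```

Because the 3D basis is a separable product, its orthonormality follows directly from the 1D case checked here.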
[16]
oai:arXiv.org:1112.0065 [pdf] - 446151
Spotting Radio Transients with the help of GPUs
Submitted: 2011-11-30
Exploration of the time-domain radio sky has huge potential for advancing our
knowledge of the dynamic universe. Past surveys have discovered large numbers
of pulsars, rotating radio transients and other transient radio phenomena;
however, they have typically relied upon off-line processing to cope with the
high data and processing rate. This paradigm rules out the possibility of
obtaining high-resolution base-band dumps of significant events or of
performing immediate follow-up observations, limiting analysis power to what
can be gleaned from detection data alone. To overcome this limitation,
real-time processing and detection of transient radio events is required. By
exploiting the significant computing power of modern graphics processing units
(GPUs), we are developing a transient-detection pipeline that runs in real-time
on data from the Parkes radio telescope. In this paper we discuss the
algorithms used in our pipeline, the details of their implementation on the GPU
and the challenges posed by the presence of radio frequency interference.
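The abstract does not list the pipeline's detection algorithms, so the following is only a generic illustration of one common single-pulse search step. It normalizes a dedispersed time series with a robust median/MAD noise estimate, which is less sensitive to bright RFI spikes than a mean and standard deviation, then convolves with boxcars of several widths and reports peaks above a threshold. The widths, threshold and injected pulse are hypothetical.

```python
import numpy as np


def boxcar_search(series, widths=(1, 2, 4, 8, 16), threshold=6.0):
    """Return (snr, width, index) candidates above threshold."""
    med = np.median(series)
    mad = np.median(np.abs(series - med))
    sigma = 1.4826 * mad                      # MAD -> Gaussian-equivalent sigma
    norm = (series - med) / sigma
    candidates = []
    for w in widths:
        # Sum over w samples; the noise in that sum grows as sqrt(w).
        snr = np.convolve(norm, np.ones(w), mode="same") / np.sqrt(w)
        peak = int(np.argmax(snr))
        if snr[peak] >= threshold:
            candidates.append((float(snr[peak]), w, peak))
    return candidates


rng = np.random.default_rng(2)
ts = rng.normal(size=10000)
ts[5000:5008] += 4.0                          # injected 8-sample pulse
best = max(boxcar_search(ts))
print(best[1], best[2])
```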
[17]
oai:arXiv.org:1101.2254 [pdf] - 956017
Fitting Galaxies on GPUs
Submitted: 2011-01-11
Structural parameters are normally extracted from observed galaxies by
fitting analytic light profiles to the observations. Obtaining accurate fits to
high-resolution images is a computationally expensive task, requiring many
model evaluations and convolutions with the imaging point spread function.
While these algorithms contain high degrees of parallelism, current
implementations do not exploit this property. With ever-growing volumes of
observational data, an inability to make use of advances in computing power can
act as a constraint on scientific outcomes. This is the motivation behind our
work, which aims to implement the model-fitting procedure on a graphics
processing unit (GPU). We begin by analysing the algorithms involved in model
evaluation with respect to their suitability for modern many-core computing
architectures like GPUs, finding them to be well-placed to take advantage of
the high memory bandwidth offered by this hardware. Following our analysis, we
briefly describe a preliminary implementation of the model fitting procedure
using freely-available GPU libraries. Early results suggest a speed-up of
around 10x over a CPU implementation. We discuss the opportunities such a
speed-up could provide, including the ability to use more computationally
expensive but better-performing fitting routines to increase the quality and
robustness of fits.
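As an example of one model evaluation inside such a fit, the sketch below evaluates a circular Sérsic profile, $I(r) = I_e \exp\{-b_n[(r/r_e)^{1/n} - 1]\}$, on a pixel grid and convolves it with a point spread function via FFTs. The choice of the Sérsic profile, the asymptotic approximation $b_n \approx 2n - 1/3 + 4/(405n)$, the Gaussian PSF and all parameter values are assumptions for illustration; the abstract does not name the profiles or libraries used.

```python
import numpy as np


def sersic_image(shape, x0, y0, i_e, r_e, n):
    """Circular Sersic profile I(r) = I_e exp(-b_n [(r/r_e)^(1/n) - 1])."""
    b_n = 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n)   # asymptotic approximation
    y, x = np.indices(shape)
    r = np.hypot(x - x0, y - y0)
    return i_e * np.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))


def gaussian_psf(shape, sigma):
    """Unit-sum Gaussian PSF centred on pixel (0, 0) with periodic wrap-around."""
    y, x = np.indices(shape)
    dy = np.minimum(y, shape[0] - y)
    dx = np.minimum(x, shape[1] - x)
    psf = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()


def model_image(shape, params, psf):
    """Evaluate the model and convolve it with the PSF via FFTs."""
    galaxy = sersic_image(shape, *params)
    return np.real(np.fft.ifft2(np.fft.fft2(galaxy) * np.fft.fft2(psf)))


def chi_squared(data, model, sigma):
    """Goodness of fit that an optimizer would minimise over the parameters."""
    return float(np.sum(((data - model) / sigma) ** 2))


shape = (64, 64)
psf = gaussian_psf(shape, sigma=2.0)
model = model_image(shape, (32.0, 32.0, 1.0, 8.0, 4.0), psf)
```

Every pixel of the model and every frequency of the FFT product is computed independently, which is the parallelism a GPU implementation can exploit on each of the many model evaluations a fit requires.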
[18]
oai:arXiv.org:1008.4623 [pdf] - 294777
Astrophysical Supercomputing with GPUs: Critical Decisions for Early
Adopters
Submitted: 2010-08-26
General purpose computing on graphics processing units (GPGPU) is
dramatically changing the landscape of high performance computing in astronomy.
In this paper, we identify and investigate several key decision areas, with a
goal of simplifying the early adoption of GPGPU in astronomy. We consider the
merits of OpenCL as an open standard in order to reduce risks associated with
coding in a native, vendor-specific programming environment, and present a GPU
programming philosophy based on using brute-force solutions. We assert that
effective use of new GPU-based supercomputing facilities will require a change
in approach from astronomers. This will likely include improved programming
training, an increased need for software development best-practice through the
use of profiling and related optimisation tools, and a greater reliance on
third-party code libraries. As with any new technology, those willing to take
the risks, and make the investment of time and effort to become early adopters
of GPGPU in astronomy, stand to reap great benefits.
[19]
oai:arXiv.org:1007.1660 [pdf] - 1033610
Analysing Astronomy Algorithms for GPUs and Beyond
Submitted: 2010-07-09
Astronomy depends on ever-increasing computing power. Processor clock rates
have plateaued, and increased performance is now appearing in the form of
additional processor cores on a single chip. This poses significant challenges
to the astronomy software community. Graphics Processing Units (GPUs), now
capable of general-purpose computation, exemplify both the difficult
learning-curve and the significant speedups exhibited by massively-parallel
hardware architectures. We present a generalised approach to tackling this
paradigm shift, based on the analysis of algorithms. We describe a small
collection of foundation algorithms relevant to astronomy and explain how they
may be used to ease the transition to massively-parallel computing
architectures. We demonstrate the effectiveness of our approach by applying it
to four well-known astronomy problems: Högbom CLEAN, inverse ray-shooting for
gravitational lensing, pulsar dedispersion and volume rendering. Algorithms
with well-defined memory access patterns and high arithmetic intensity stand to
receive the greatest performance boost from massively-parallel architectures,
while those that involve a significant amount of decision-making may struggle
to take advantage of the available processing power.
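Högbom CLEAN, one of the four test problems, illustrates the split. Each iteration finds the brightest residual pixel (a reduction), records a fraction of it (the loop gain) as a point-source component, and subtracts a correspondingly scaled, shifted copy of the dirty beam (a regular, data-parallel update). A minimal 1D numpy sketch with illustrative parameters and a Gaussian stand-in for a real dirty beam, which would have sidelobes:

```python
import numpy as np


def hogbom_clean(dirty, psf, gain=0.1, threshold=1e-3, max_iter=1000):
    """1D Högbom CLEAN.

    dirty: dirty image of length N; psf: dirty beam of length 2N - 1 with
    its peak (normalized to 1) at index N - 1.
    Returns (clean-component model, residual image).
    """
    n = dirty.size
    residual = dirty.astype(float).copy()
    model = np.zeros(n)
    for _ in range(max_iter):
        peak = int(np.argmax(np.abs(residual)))   # reduction step
        amp = residual[peak]
        if abs(amp) < threshold:
            break
        model[peak] += gain * amp
        # Subtract the beam shifted so its centre lies on the peak pixel.
        residual -= gain * amp * psf[n - 1 - peak: 2 * n - 1 - peak]
    return model, residual


n = 64
x = np.arange(2 * n - 1) - (n - 1)
beam = np.exp(-0.5 * (x / 2.0) ** 2)          # Gaussian stand-in dirty beam
# Two point sources (flux 2 at pixel 20, flux 1 at pixel 45) seen through the beam.
dirty = 2.0 * beam[n - 1 - 20: 2 * n - 1 - 20] + beam[n - 1 - 45: 2 * n - 1 - 45]
model, residual = hogbom_clean(dirty, beam)
print(np.nonzero(model)[0])
```

The argmax at each iteration forces a serial dependency between iterations, which is the kind of decision-making the abstract identifies as harder to accelerate than the beam subtraction itself.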
[20]
oai:arXiv.org:1005.5198 [pdf] - 1032807
Computational advances in gravitational microlensing: a comparison of
CPU, GPU, and parallel, large data codes
Submitted: 2010-05-27
To assess how future progress in gravitational microlensing computation at
high optical depth will rely on both hardware and software solutions, we
compare a direct inverse ray-shooting code implemented on a graphics processing
unit (GPU) with both a widely-used hierarchical tree code on a single-core CPU,
and a recent implementation of a parallel tree code suitable for a CPU-based
cluster supercomputer. We examine the accuracy of the tree codes through
comparison with a direct code over a much wider range of parameter space than
has been feasible before. We demonstrate that all three codes present
comparable accuracy, and choice of approach depends on considerations relating
to the scale and nature of the microlensing problem under investigation. On
current hardware, there is little difference in the processing speed of the
single-core CPU tree code and the GPU direct code; however, the recent plateau
in single-core CPU speeds means the existing tree code is no longer able to
take advantage of Moore's law-like increases in processing speed. Instead, we
anticipate a rapid increase in GPU capabilities in the next few years, which is
advantageous to the direct code. We suggest that progress in other areas of
astrophysical computation may benefit from a transition to GPUs through the use
of "brute force" algorithms, rather than attempting to port the current best
solution directly to a GPU language -- for certain classes of problems, the
simple implementation on GPUs may already be no worse than an optimised
single-core CPU version.
[21]
oai:arXiv.org:1001.2048 [pdf] - 32714
Advanced Architectures for Astrophysical Supercomputing
Submitted: 2010-01-12
Astronomers have come to rely on the increasing performance of computers to
reduce, analyze, simulate and visualize their data. In this environment, faster
computation can mean more science outcomes or the opening up of new parameter
spaces for investigation. If we are to avoid major issues when implementing
codes on advanced architectures, it is important that we have a solid
understanding of our algorithms. A recent addition to the high-performance
computing scene that highlights this point is the graphics processing unit
(GPU). The hardware originally designed for speeding up graphics rendering in
video games is now achieving speed-ups of $O(100\times)$ in general-purpose
computation -- performance that cannot be ignored. We are using a generalized
approach, based on the analysis of astronomy algorithms, to identify the
optimal problem-types and techniques for taking advantage of both current GPU
hardware and future developments in computing architectures.
[22]
oai:arXiv.org:0905.2453 [pdf] - 24296
Teraflop per second gravitational lensing ray-shooting using graphics
processing units
Submitted: 2009-05-14
Gravitational lensing calculation using a direct inverse ray-shooting
approach is a computationally expensive way to determine magnification maps,
caustic patterns, and light-curves (e.g. as a function of source profile and
size). However, as an easily parallelisable calculation, gravitational
ray-shooting can be accelerated using programmable graphics processing units
(GPUs). We present our implementation of inverse ray-shooting for the NVIDIA
G80 generation of graphics processors using the NVIDIA Compute Unified Device
Architecture (CUDA) software development kit. We also extend our code to
multiple-GPU systems, including a 4-GPU NVIDIA S1070 Tesla unit. We achieve
sustained processing performance of 182 Gflop/s on a single GPU, and 1.28
Tflop/s using the Tesla unit. We demonstrate that billion-lens microlensing
simulations can be run on a single computer with a Tesla unit in timescales of
order a day without the use of a hierarchical tree code.
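The direct approach evaluates the lens equation for every ray against every lens, so each ray is independent and the work parallelizes trivially. The numpy sketch below vectorizes over rays on a CPU to show the structure. It is not the authors' CUDA code; it uses the point-mass lens equation $\mathbf{y} = \mathbf{x} - \sum_i m_i (\mathbf{x} - \mathbf{x}_i)/|\mathbf{x} - \mathbf{x}_i|^2$ in Einstein-radius units, omits the smooth-matter and external-shear terms often included in microlensing models, and uses illustrative grid sizes.

```python
import numpy as np


def ray_shoot(lens_pos, lens_mass, half_width, n_rays, src_half_width, n_pix):
    """Brute-force inverse ray-shooting for point-mass lenses.

    Rays on a regular n_rays x n_rays image-plane grid are mapped to the
    source plane and binned into n_pix x n_pix pixels; counts are divided by
    the unlensed ray density, giving a magnification map. Use an even n_rays
    so that no ray lands exactly on a lens placed at the origin.
    """
    axis = np.linspace(-half_width, half_width, n_rays)
    x1, x2 = np.meshgrid(axis, axis)
    y1, y2 = x1.copy(), x2.copy()
    for (l1, l2), m in zip(lens_pos, lens_mass):   # direct sum over all lenses
        d1, d2 = x1 - l1, x2 - l2
        r2 = d1 ** 2 + d2 ** 2
        y1 -= m * d1 / r2
        y2 -= m * d2 / r2
    src_range = [[-src_half_width, src_half_width]] * 2
    counts, _, _ = np.histogram2d(y1.ravel(), y2.ravel(), bins=n_pix,
                                  range=src_range)
    ray_spacing = 2.0 * half_width / (n_rays - 1)
    pix_size = 2.0 * src_half_width / n_pix
    rays_per_pixel_unlensed = (pix_size / ray_spacing) ** 2
    return counts / rays_per_pixel_unlensed


mag = ray_shoot([(0.0, 0.0)], [1.0], 2.0, 1000, 1.0, 10)
print(mag.round(1))
```

Adding lenses multiplies the per-ray work but leaves every ray independent, which is what makes the brute-force method scale well across GPU threads and multiple GPUs.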