Full-text search for arXiv

22 article(s) in total. 87 co-authors, from 1 to 9 common article(s). Median position in authors list is 3.0.

[1] oai:arXiv.org:1709.09313 [pdf] - 1682470

Design and characterization of the Large-Aperture Experiment to Detect the Dark Age (LEDA) radiometer systems

Comments: Accepted to MNRAS

Submitted: 2017-09-26, last modified: 2018-05-15

The Large-Aperture Experiment to Detect the Dark Age (LEDA) was designed to detect the predicted O(100)mK sky-averaged absorption of the Cosmic Microwave Background by Hydrogen in the neutral pre- and intergalactic medium just after the cosmological Dark Age. The spectral signature would be associated with emergence of a diffuse Ly$\alpha$ background from starlight during 'Cosmic Dawn'. Recently, Bowman et al. (2018) have reported detection of this predicted absorption feature, with an unexpectedly large amplitude of 530 mK, centered at 78 MHz. Verification of this result by an independent experiment, such as LEDA, is pressing. In this paper, we detail design and characterization of the LEDA radiometer systems, and a first-generation pipeline that instantiates a signal path model. Sited at the Owens Valley Radio Observatory Long Wavelength Array, LEDA systems include the station correlator, five well-separated redundant dual polarization radiometers and backend electronics. The radiometers deliver a 30-85MHz band (16<z<34) and operate as part of the larger interferometric array, for purposes ultimately of in situ calibration. Here, we report on the LEDA system design, calibration approach, and progress in characterization as of January 2016. The LEDA systems are currently being modified to improve performance near 78 MHz in order to verify the purported absorption feature.
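
As a rough illustration of how an observing frequency maps to redshift for the 21-cm line, the sketch below applies the standard relation z = nu_rest / nu_obs - 1. The function names are hypothetical and not taken from the LEDA pipeline; the rest frequency is the laboratory value of the hydrogen hyperfine transition.

```python
# Rest-frame frequency of the neutral-hydrogen hyperfine (21-cm) line, in MHz.
REST_FREQ_MHZ = 1420.405751768


def redshift_from_frequency(observed_mhz):
    """Redshift at which the 21-cm line is observed at `observed_mhz`."""
    if observed_mhz <= 0:
        raise ValueError("frequency must be positive")
    return REST_FREQ_MHZ / observed_mhz - 1.0


def frequency_from_redshift(z):
    """Observed frequency (MHz) of the 21-cm line emitted at redshift z."""
    return REST_FREQ_MHZ / (1.0 + z)


# The absorption feature quoted in the abstract is centered at 78 MHz.
z78 = redshift_from_frequency(78.0)
print(f"78 MHz -> z = {z78:.2f}")
```

Under this relation the 78 MHz feature corresponds to a redshift of about 17.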

[2] oai:arXiv.org:1711.00466 [pdf] - 1716958

The Radio Sky at Meter Wavelengths: m-Mode Analysis Imaging with the Owens Valley Long Wavelength Array

Comments: 27 pages, 18 figures

Submitted: 2017-10-26

A host of new low-frequency radio telescopes seek to measure the 21-cm transition of neutral hydrogen from the early universe. These telescopes have the potential to directly probe star and galaxy formation at redshifts $20 \gtrsim z \gtrsim 7$, but are limited by the dynamic range they can achieve against foreground sources of low-frequency radio emission. Consequently, there is a growing demand for modern, high-fidelity maps of the sky at frequencies below 200 MHz for use in foreground modeling and removal. We describe a new widefield imaging technique for drift-scanning interferometers, Tikhonov-regularized $m$-mode analysis imaging. This technique constructs images of the entire sky in a single synthesis imaging step with exact treatment of widefield effects. We describe how the CLEAN algorithm can be adapted to deconvolve maps generated by $m$-mode analysis imaging. We demonstrate Tikhonov-regularized $m$-mode analysis imaging using the Owens Valley Long Wavelength Array (OVRO-LWA) by generating 8 new maps of the sky north of $\delta=-30^\circ$ with 15 arcmin angular resolution, at frequencies evenly spaced between 36.528 MHz and 73.152 MHz, and $\sim$800 mJy/beam thermal noise. These maps are a 10-fold improvement in angular resolution over existing full-sky maps at comparable frequencies, which have angular resolutions $\ge 2^\circ$. Each map is constructed exclusively from interferometric observations and does not represent the globally averaged sky brightness. Future improvements will incorporate total power radiometry, improved thermal noise, and improved angular resolution -- due to the planned expansion of the OVRO-LWA to 2.6 km baselines. These maps serve as a first step on the path to the use of more sophisticated foreground filters in 21-cm cosmology incorporating the measured angular and frequency structure of all foreground contaminants.
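
For orientation, the regularized inversion behind this kind of imaging can be written schematically. The notation is assumed here rather than quoted from the abstract: $v$ is the vector of measured $m$-modes for one value of $m$, $a$ the spherical-harmonic coefficients of the sky, $B$ the transfer matrix set by the beam and baseline geometry, $n$ the noise, and $\varepsilon$ the regularization parameter.

```latex
v = B a + n, \qquad
\hat{a} = \arg\min_{a} \left( \lVert v - B a \rVert^2 + \varepsilon \lVert a \rVert^2 \right)
        = \left( B^\dagger B + \varepsilon I \right)^{-1} B^\dagger v
```

In this generic form, a larger $\varepsilon$ suppresses poorly constrained modes at the cost of biasing the recovered coefficients toward zero.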

[3] oai:arXiv.org:1708.00720 [pdf] - 1586679

Bifrost: a Python/C++ Framework for High-Throughput Stream Processing in Astronomy

Cranmer, Miles D.; Barsdell, Benjamin R.; Price, Danny C.; Dowell, Jayce; Garsden, Hugh; Dike, Veronica; Eftekhari, Tarraneh; Hegedus, Alexander M.; Malins, Joseph; Obenberger, Kenneth S.; Schinzel, Frank; Stovall, Kevin; Taylor, Gregory B.; Greenhill, Lincoln J.

Comments: 25 pages, 13 figures, submitted to JAI. For the code, see https://github.com/ledatelescope/bifrost

Submitted: 2017-08-02

Radio astronomy observatories with high throughput back end instruments require real-time data processing. While computing hardware continues to advance rapidly, development of real-time processing pipelines remains difficult and time-consuming, which can limit scientific productivity. Motivated by this, we have developed Bifrost: an open-source software framework for rapid pipeline development. Bifrost combines a high-level Python interface with highly efficient reconfigurable data transport and a library of computing blocks for CPU and GPU processing. The framework is generalizable, but initially it emphasizes the needs of high-throughput radio astronomy pipelines, such as the ability to process data buffers as if they were continuous streams, the capacity to partition processing into distinct data sequences (e.g., separate observations), and the ability to extract specific intervals from buffered data. Computing blocks in the library are designed for applications such as interferometry, pulsar dedispersion and timing, and transient search pipelines. We describe the design and implementation of the Bifrost framework and demonstrate its use as the backbone in the correlation and beamforming back end of the Long Wavelength Array station in the Sevilleta National Wildlife Refuge, NM.
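
The three capabilities named in the abstract (treating buffers as continuous streams, partitioning into sequences, and extracting intervals from buffered data) can be illustrated with a toy pipeline. This is a minimal sketch using only the Python standard library; it is not Bifrost's API, and every class and function name below is hypothetical.

```python
from collections import deque


class Ring:
    """Fixed-capacity buffer that keeps only the most recent samples."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)
        self.total_written = 0  # number of samples written since start

    def write(self, samples):
        self.buf.extend(samples)
        self.total_written += len(samples)

    def read_interval(self, start, stop):
        """Return samples [start, stop) by absolute index, if still buffered."""
        oldest = self.total_written - len(self.buf)
        if start < oldest or stop > self.total_written:
            raise IndexError("interval is not available in the ring")
        return list(self.buf)[start - oldest:stop - oldest]


def source(sequences, gulp):
    """Yield (sequence_name, chunk) pairs, one gulp of samples at a time."""
    for name, data in sequences:
        for i in range(0, len(data), gulp):
            yield name, data[i:i + gulp]


def tee_to_ring(stream, ring):
    """Pass chunks through unchanged while copying them into the ring."""
    for name, chunk in stream:
        ring.write(chunk)
        yield name, chunk


def running_sum(stream):
    """Example block: accumulate per sequence, resetting at each new sequence."""
    current, total = None, 0
    for name, chunk in stream:
        if name != current:
            current, total = name, 0
        total += sum(chunk)
        yield name, total


ring = Ring(capacity=8)
sequences = [("obs_A", list(range(10))), ("obs_B", [1, 1, 1, 1])]
results = list(running_sum(tee_to_ring(source(sequences, gulp=4), ring)))
recent = ring.read_interval(8, 12)  # samples 8..11 counted from stream start
```

Chaining generators keeps each block unaware of chunk boundaries, which is the sense in which buffered data behave like a continuous stream in this sketch.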

[4] oai:arXiv.org:1505.06421 [pdf] - 1296158

HDFITS: porting the FITS data model to HDF5

Price, D. C.; Barsdell, B. R.; Greenhill, L. J.

Comments: In Astronomy and Computing special issue on the future of astronomical data formats. Volume 12, September 2015, Pages 212-220. doi:10.1016/j.ascom.2015.05.001

Submitted: 2015-05-24, last modified: 2015-10-20

The FITS (Flexible Image Transport System) data format has been the de facto data format for astronomy-related data products since its inception in the late 1970s. While the FITS file format is widely supported, it lacks many of the features of more modern data serialization, such as the Hierarchical Data Format (HDF5). The HDF5 file format offers considerable advantages over FITS, such as improved I/O speed and compression, but has yet to gain widespread adoption within astronomy. One of the major holdbacks is that HDF5 is not well supported by data reduction software packages and image viewers. Here, we present a comparison of FITS and HDF5 as a format for storage of astronomy datasets. We show that the underlying data model of FITS can be ported to HDF5 in a straightforward manner, and that by doing so the advantages of the HDF5 file format can be leveraged immediately. In addition, we present a software tool, fits2hdf, for converting between FITS and a new 'HDFITS' format, where data are stored in HDF5 in a FITS-like manner. We show that HDFITS allows faster reading of data (up to 100x faster than FITS in some use cases), and improved compression (higher compression ratios and higher throughput). Finally, we show that by only changing the import lines in Python-based FITS utilities, HDFITS formatted data can be presented transparently as an in-memory FITS equivalent.

[5] oai:arXiv.org:1407.8116 [pdf] - 1296073

Optimizing performance per watt on GPUs in High Performance Computing: temperature, frequency and voltage effects

Price, D. C.; Clark, M. A.; Barsdell, B. R.; Babich, R.; Greenhill, L. J.

Comments: In Computer Science - Research and Development special issue on Energy-Aware High-Performance Computing. The final publication is available at Springer via http://dx.doi.org/10.1007/s00450-015-0300-5

Submitted: 2014-07-30, last modified: 2015-10-20

The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates use of high performance computing (HPC) systems. The need for high power efficiency (performance per watt) at remote observatory sites parallels that in HPC broadly, where efficiency is an emerging critical metric. We investigate how the performance per watt of graphics processing units (GPUs) is affected by temperature, core clock frequency and voltage. Our results highlight how the underlying physical processes that govern transistor operation affect power efficiency. In particular, we show experimentally that GPU power consumption grows non-linearly with both temperature and supply voltage, as predicted by physical transistor models. We show lowering GPU supply voltage and increasing clock frequency while maintaining a low die temperature increases the power efficiency of an NVIDIA K20 GPU by up to 37-48% over default settings when running xGPU, a compute-bound code used in radio astronomy. We discuss how temperature-aware power models could be used to reduce power consumption for future HPC installations. Automatic temperature-aware and application-dependent voltage and frequency scaling (T-DVFS and A-DVFS) may provide a mechanism to achieve better power efficiency for a wider range of codes running on GPUs.

[6] oai:arXiv.org:1508.04884 [pdf] - 1284959

A survey of FRB fields: Limits on repeatability

Petroff, E.; Johnston, S.; Keane, E. F.; van Straten, W.; Bailes, M.; Barr, E. D.; Barsdell, B. R.; Burke-Spolaor, S.; Caleb, M.; Champion, D. J.; Flynn, C.; Jameson, A.; Kramer, M.; Ng, C.; Possenti, A.; Stappers, B. W.

Comments: 6 pages, 1 figure; accepted for publication in MNRAS

Submitted: 2015-08-20

Several theories exist to explain the source of the bright, millisecond duration pulses known as fast radio bursts (FRBs). If the progenitors of FRBs are non-cataclysmic, such as giant pulses from pulsars, pulsar-planet binaries, or magnetar flares, FRB emission may be seen to repeat. We have undertaken a survey of the fields of eight known FRBs from the High Time Resolution Universe survey to search for repeating pulses. Although no repeat pulses were detected, the survey yielded the detection of a new FRB, described in Petroff et al. (2015a). From our observations we rule out periodic repeating sources with periods P $\leq$ 8.6 hours and rule out sources with periods 8.6 < P < 21 hours at the 90% confidence level. At P $\geq$ 21 hours our limits fall off as ~1/P. Dedicated and persistent observations of FRB source fields are needed to rule out repetition on longer timescales, a task well-suited to next generation wide-field transient detectors.

[7] oai:arXiv.org:1411.3751 [pdf] - 918380

Digital Signal Processing using Stream High Performance Computing: A 512-input Broadband Correlator for Radio Astronomy

Kocz, J.; Greenhill, L. J.; Barsdell, B. R.; Price, D.; Bernardi, G.; Bourke, S.; Clark, M. A.; Craig, J.; Dexter, M.; Dowell, J.; Eftekhari, T.; Ellingson, S.; Hallinan, G.; Hartman, J.; Jameson, A.; MacMahon, D.; Taylor, G.; Schinzel, F.; Werthimer, D.

Comments: 10 pages, 8 figures, submitted to JAI

Submitted: 2014-11-13, last modified: 2015-01-06

A "large-N" correlator that makes use of Field Programmable Gate Arrays and Graphics Processing Units has been deployed as the digital signal processing system for the Long Wavelength Array station at Owens Valley Radio Observatory (LWA-OV), to enable the Large Aperture Experiment to Detect the Dark Ages (LEDA). The system samples a ~100MHz baseband and processes signals from 512 antennas (256 dual polarization) over a ~58MHz instantaneous sub-band, achieving 16.8Tops/s and 0.236 Tbit/s throughput in a 9kW envelope and single rack footprint. The output data rate is 260MB/s for 9 second time averaging of cross-power and 1 second averaging of total-power data. At deployment, the LWA-OV correlator was the largest in production in terms of N and is the third largest in terms of complex multiply accumulations, after the Very Large Array and Atacama Large Millimeter Array. The correlator's comparatively fast development time and low cost establish a practical foundation for the scalability of a modular, heterogeneous, computing architecture.

[8] oai:arXiv.org:1412.0342 [pdf] - 1441465

A real-time fast radio burst: polarization detection and multiwavelength follow-up

Comments: 11 pages, 3 figures, accepted for publication in MNRAS

Submitted: 2014-11-30

Fast radio bursts (FRBs) are one of the most tantalizing mysteries of the radio sky; their progenitors and origins remain unknown and until now no rapid multiwavelength follow-up of an FRB has been possible. New instrumentation has decreased the time between observation and discovery from years to seconds, and enables polarimetry to be performed on FRBs for the first time. We have discovered an FRB (FRB 140514) in real-time on 14 May, 2014 at 17:14:11.06 UTC at the Parkes radio telescope and triggered follow-up at other wavelengths within hours of the event. FRB 140514 was found with a dispersion measure (DM) of 562.7(6) cm$^{-3}$ pc, giving an upper limit on source redshift of $z \lesssim 0.5$. FRB 140514 was found to be 21$\pm$7% (3-$\sigma$) circularly polarized on the leading edge with a 1-$\sigma$ upper limit on linear polarization $<10\%$. We conclude that this polarization is intrinsic to the FRB. If there was any intrinsic linear polarization, as might be expected from coherent emission, then it may have been depolarized by Faraday rotation caused by passing through strong magnetic fields and/or high density environments. FRB 140514 was discovered during a campaign to re-observe known FRB fields, and lies close to a previous discovery, FRB 110220; based on the difference in DMs of these bursts and time-on-sky arguments, we attribute the proximity to sampling bias and conclude that they are distinct objects. Follow-up conducted by 12 telescopes observing from X-ray to radio wavelengths was unable to identify a variable multiwavelength counterpart, allowing us to rule out models in which FRBs originate from nearby ($z < 0.3$) supernovae and long duration gamma-ray bursts.

[9] oai:arXiv.org:1411.0507 [pdf] - 891542

Is HDF5 a good format to replace UVFITS?

Price, Danny C.; Barsdell, Benjamin R.; Greenhill, Lincoln J.

Comments: Submitted to Proceedings of ADASS XXIV

Submitted: 2014-11-03

The FITS (Flexible Image Transport System) data format was developed in the late 1970s for storage and exchange of astronomy-related image data. Since then, it has become a standard file format not only for images, but also for radio interferometer data (e.g. UVFITS, FITS-IDI). But is FITS the right format for next-generation telescopes to adopt? The newer Hierarchical Data Format (HDF5) file format offers considerable advantages over FITS, but has yet to gain widespread adoption within radio astronomy. One of the major holdbacks is that HDF5 is not well supported by data reduction software packages. Here, we present a comparison of FITS, HDF5, and the MeasurementSet (MS) format for storage of interferometric data. In addition, we present a tool for converting between formats. We show that the underlying data model of FITS can be ported to HDF5, a first step toward achieving wider HDF5 support.

[10] oai:arXiv.org:1401.8288 [pdf] - 785252

A Scalable Hybrid FPGA/GPU FX Correlator

Kocz, J.; Greenhill, L. J.; Barsdell, B. R.; Bernardi, G.; Jameson, A.; Clark, M. A.; Craig, J.; Price, D.; Taylor, G. B.; Schinzel, F.; Werthimer, D.

Comments: 8 pages, 10 figures, accepted by JAI

Submitted: 2014-01-31, last modified: 2014-02-18

Radio astronomical imaging arrays comprising large numbers of antennas, O(10^2-10^3) have posed a signal processing challenge because of the required O(N^2) cross correlation of signals from each antenna and requisite signal routing. This motivated the implementation of a Packetized Correlator architecture that applies Field Programmable Gate Arrays (FPGAs) to the O(N) "F-stage" transforming time domain to frequency domain data, and Graphics Processing Units (GPUs) to the O(N^2) "X-stage" performing an outer product among spectra for each antenna. The design is readily scalable to at least O(10^3) antennas. Fringes, visibility amplitudes and sky image results obtained during field testing are presented.
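
The split between an O(N) F-stage and an O(N^2) X-stage can be sketched in a few lines. This is a toy illustration under stated assumptions, not the deployed design: a naive DFT stands in for the FPGA channelizer, a Python loop stands in for the GPU outer product, there is no time accumulation, autocorrelations are included in the pair count, and all function names are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (the F stage for one antenna)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fx_correlate(voltages):
    """voltages[a][t] -> visibilities[(a, b)][k] for all antenna pairs a <= b."""
    spectra = [dft(v) for v in voltages]  # F stage: one transform per antenna
    n_ant = len(spectra)
    vis = {}
    for a in range(n_ant):  # X stage: every pair of spectra, O(N^2)
        for b in range(a, n_ant):
            vis[(a, b)] = [x * y.conjugate() for x, y in zip(spectra[a], spectra[b])]
    return vis


def n_pairs(n_inputs):
    """Number of correlation products, including autocorrelations."""
    return n_inputs * (n_inputs + 1) // 2


# Two antennas see the same tone; antenna 1 receives it one sample later.
n = 8
ant0 = [math.cos(2 * math.pi * t / n) for t in range(n)]
ant1 = [math.cos(2 * math.pi * (t - 1) / n) for t in range(n)]
vis = fx_correlate([ant0, ant1])
phase = cmath.phase(vis[(0, 1)][1])  # cross-spectrum phase in the tone's channel
```

The one-sample delay shows up as a phase of pi/4 in the tone's channel, which is the fringe information the X-stage preserves, while `n_pairs` shows why the X-stage cost dominates as the array grows.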

[11] oai:arXiv.org:1307.1628 [pdf] - 689147

A Population of Fast Radio Bursts at Cosmological Distances

Comments: 20 pages, 6 figures, 2 tables. Energies in table 1 corrected from hard-copy version

Submitted: 2013-07-05

Searches for transient astrophysical sources often reveal unexpected classes of objects that are useful physical laboratories. In a recent survey for pulsars and fast transients we have uncovered four millisecond-duration radio transients all more than 40$^\circ$ from the Galactic plane. The bursts' properties indicate that they are of celestial rather than terrestrial origin. Host galaxy and intergalactic medium models suggest that they have cosmological redshifts of 0.5 to 1, and distances of up to 3 gigaparsecs. No temporally coincident x- or gamma-ray signature was identified in association with the bursts. Characterization of the source population and identification of host galaxies offers an opportunity to determine the baryonic content of the Universe.

[12] oai:arXiv.org:1306.4190 [pdf] - 1172117

The High Time Resolution Universe Pulsar Survey VIII: The Galactic millisecond pulsar population

Comments: 13 pages, 5 figures, 5 tables. Accepted for publication in MNRAS

Submitted: 2013-06-18

We have used millisecond pulsars (MSPs) from the southern High Time Resolution Universe (HTRU) intermediate latitude survey area to simulate the distribution and total population of MSPs in the Galaxy. Our model makes use of the scale factor method, which estimates the ratio of the total number of MSPs in the Galaxy to the known sample. Using our best fit value for the z-height, $z=500$ pc, we find an underlying population of MSPs of $(8.3 \pm 4.2) \times 10^4$ sources down to a limiting luminosity of $L_{\rm min}=0.1$ mJy kpc$^2$ and a luminosity distribution with a steep slope of $d\log N/d\log L = -1.45 \pm 0.14$. However, at the low end of the luminosity distribution, the uncertainties introduced by small number statistics are large. By omitting very low luminosity pulsars, we find a Galactic population above $L_{\rm min}=0.2$ mJy kpc$^2$ of only $(3.0 \pm 0.7) \times 10^4$ MSPs. We have also simulated pulsars with periods shorter than any known MSP, and estimate the maximum number of sub-MSPs in the Galaxy to be $(7.8 \pm 5.0) \times 10^4$ pulsars at $L=0.1$ mJy kpc$^2$. In addition, we estimate that the high and low latitude parts of the southern HTRU survey will detect 68 and 42 MSPs respectively, including 78 new discoveries. Pulsar luminosity, and hence flux density, is an important input parameter in the model. Some of the published flux densities for the pulsars in our sample do not agree with the observed flux densities from our data set, and we have instead calculated average luminosities from archival data from the Parkes Telescope. We found many luminosities to be very different than their catalogue values, leading to very different population estimates. Large variations in flux density highlight the importance of including scintillation effects in MSP population studies.

[13] oai:arXiv.org:1209.0793 [pdf] - 1151125

The High Time Resolution Universe Survey VI: An Artificial Neural Network and Timing of 75 Pulsars

Comments: 15 pages, 8 figures

Submitted: 2012-09-04, last modified: 2012-09-09

We present 75 pulsars discovered in the mid-latitude portion of the High Time Resolution Universe survey, 54 of which have full timing solutions. All the pulsars have spin periods greater than 100 ms, and none of those with timing solutions are in binaries. Two display particularly interesting behaviour; PSR J1054-5944 is found to be an intermittent pulsar, and PSR J1809-0119 has glitched twice since its discovery. In the second half of the paper we discuss the development and application of an artificial neural network in the data-processing pipeline for the survey. We discuss the tests that were used to generate scores and find that our neural network was able to reject over 99% of the candidates produced in the data processing, and able to blindly detect 85% of pulsars. We suggest that improvements to the accuracy should be possible if further care is taken when training an artificial neural network; for example ensuring that a representative sample of the pulsar population is used during the training process, or the use of different artificial neural networks for the detection of different types of pulsars.

[14] oai:arXiv.org:1201.5380 [pdf] - 1093248

Accelerating incoherent dedispersion

Barsdell, Benjamin R.; Bailes, Matthew; Barnes, David G.; Fluke, Christopher J.

Comments: 15 pages, 4 figures, 2 tables, accepted for publication in MNRAS

Submitted: 2012-01-25

Incoherent dedispersion is a computationally intensive problem that appears frequently in pulsar and transient astronomy. For current and future transient pipelines, dedispersion can dominate the total execution time, meaning its computational speed acts as a constraint on the quality and quantity of science results. It is thus critical that the algorithm be able to take advantage of trends in commodity computing hardware. With this goal in mind, we present analysis of the 'direct', 'tree' and 'sub-band' dedispersion algorithms with respect to their potential for efficient execution on modern graphics processing units (GPUs). We find all three to be excellent candidates, and proceed to describe implementations in C for CUDA using insight gained from the analysis. Using recent CPU and GPU hardware, the transition to the GPU provides a speed-up of 9x for the direct algorithm when compared to an optimised quad-core CPU code. For realistic recent survey parameters, these speeds are high enough that further optimisation is unnecessary to achieve real-time processing. Where further speed-ups are desirable, we find that the tree and sub-band algorithms are able to provide 3-7x better performance at the cost of certain smearing, memory consumption and development time trade-offs. We finish with a discussion of the implications of these results for future transient surveys. Our GPU dedispersion code is publicly available as a C library at: http://dedisp.googlecode.com/
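
The 'direct' algorithm mentioned above can be sketched as a shift-and-sum over frequency channels. The following is a minimal CPU illustration in plain Python, assuming the standard cold-plasma delay law with the usual dispersion constant; it is not the dedisp library's interface, and the function names and synthetic data are hypothetical.

```python
K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3


def delay_seconds(dm, freq_mhz, ref_mhz):
    """Arrival delay at freq_mhz relative to ref_mhz for dispersion measure dm."""
    return K_DM * dm * (freq_mhz ** -2 - ref_mhz ** -2)


def dedisperse(data, freqs_mhz, dt, dm):
    """Direct incoherent dedispersion of data[channel][time_sample]."""
    ref = max(freqs_mhz)
    # Delay of each channel in whole samples, relative to the top of the band.
    shifts = [round(delay_seconds(dm, f, ref) / dt) for f in freqs_mhz]
    n_out = len(data[0]) - max(shifts)
    # Undo each channel's delay, then sum across frequency.
    return [sum(chan[t + s] for chan, s in zip(data, shifts)) for t in range(n_out)]


# Synthetic test: a single-sample pulse dispersed with DM = 100 pc cm^-3.
freqs = [1500.0, 1400.0, 1300.0, 1200.0]
dt, true_dm, t0, n_samples = 1e-3, 100.0, 50, 200
data = []
for f in freqs:
    chan = [0.0] * n_samples
    chan[t0 + round(delay_seconds(true_dm, f, max(freqs)) / dt)] = 1.0
    data.append(chan)

# Search a small grid of trial DMs and keep the one with the strongest peak.
trials = {dm: max(dedisperse(data, freqs, dt, dm)) for dm in (0.0, 50.0, 100.0, 150.0)}
best_dm = max(trials, key=trials.get)
```

Each trial DM costs one pass over every channel and time sample, which is why the direct method becomes expensive for large DM grids and why the tree and sub-band variants trade accuracy for speed.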

[15] oai:arXiv.org:1112.4532 [pdf] - 1092508

Three-dimensional shapelets and an automated classification scheme for dark matter haloes

Fluke, C. J.; Malec, A. L.; Lasky, P. D.; Barsdell, B. R.

Comments: 19 pages, 11 figures, accepted for publication in MNRAS

Submitted: 2011-12-19

We extend the two-dimensional Cartesian shapelet formalism to d-dimensions. Concentrating on the three-dimensional case, we derive shapelet-based equations for the mass, centroid, root-mean-square radius, and components of the quadrupole moment and moment of inertia tensors. Using cosmological N-body simulations as an application domain, we show that three-dimensional shapelets can be used to replicate the complex sub-structure of dark matter halos and demonstrate the basis of an automated classification scheme for halo shapes. We investigate the shapelet decomposition process from an algorithmic viewpoint, and consider opportunities for accelerating the computation of shapelet-based representations using graphics processing units (GPUs).
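
For readers unfamiliar with the basis, the sketch below evaluates Cartesian shapelets under the common Gauss-Hermite definition and builds a three-dimensional basis function as a product of one-dimensional ones. The normalization and all names are assumptions for illustration and may differ from the paper's conventions.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def shapelet_1d(n, x, beta=1.0):
    """One-dimensional shapelet of order n with scale beta."""
    u = x / beta
    norm = 1.0 / math.sqrt(2.0 ** n * math.sqrt(math.pi) * math.factorial(n) * beta)
    return norm * hermite(n, u) * math.exp(-0.5 * u * u)


def shapelet_3d(orders, position, beta=1.0):
    """Three-dimensional shapelet as a product of three 1-D shapelets."""
    return math.prod(shapelet_1d(n, x, beta) for n, x in zip(orders, position))


def inner_product(m, n, beta=1.0, half_width=12.0, steps=4800):
    """Midpoint-rule estimate of the overlap integral of orders m and n."""
    dx = 2.0 * half_width * beta / steps
    xs = (-half_width * beta + (i + 0.5) * dx for i in range(steps))
    return sum(shapelet_1d(m, x, beta) * shapelet_1d(n, x, beta) for x in xs) * dx


same = inner_product(2, 2)       # expected to be close to 1
different = inner_product(1, 3)  # expected to be close to 0
```

Orthonormality is what allows quantities such as mass and centroid to be written as simple sums over the decomposition coefficients.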

[16] oai:arXiv.org:1112.0065 [pdf] - 446151

Spotting Radio Transients with the help of GPUs

Barsdell, Benjamin R.; Bailes, Matthew; Barnes, David G.; Fluke, Christopher J.

Comments: 4 Pages. To appear in the proceedings of ADASS XXI, ed. P.Ballester and D.Egret, ASP Conf. Series

Submitted: 2011-11-30

Exploration of the time-domain radio sky has huge potential for advancing our knowledge of the dynamic universe. Past surveys have discovered large numbers of pulsars, rotating radio transients and other transient radio phenomena; however, they have typically relied upon off-line processing to cope with the high data and processing rate. This paradigm rules out the possibility of obtaining high-resolution base-band dumps of significant events or of performing immediate follow-up observations, limiting analysis power to what can be gleaned from detection data alone. To overcome this limitation, real-time processing and detection of transient radio events is required. By exploiting the significant computing power of modern graphics processing units (GPUs), we are developing a transient-detection pipeline that runs in real-time on data from the Parkes radio telescope. In this paper we discuss the algorithms used in our pipeline, the details of their implementation on the GPU and the challenges posed by the presence of radio frequency interference.

[17] oai:arXiv.org:1101.2254 [pdf] - 956017

Fitting Galaxies on GPUs

Barsdell, Benjamin R.; Barnes, David G.; Fluke, Christopher J.

Comments: 4 pages, to appear in the proceedings of the ADASS XX conference, Nov. 7-11 2010, Boston

Submitted: 2011-01-11

Structural parameters are normally extracted from observed galaxies by fitting analytic light profiles to the observations. Obtaining accurate fits to high-resolution images is a computationally expensive task, requiring many model evaluations and convolutions with the imaging point spread function. While these algorithms contain high degrees of parallelism, current implementations do not exploit this property. With ever-growing volumes of observational data, an inability to make use of advances in computing power can act as a constraint on scientific outcomes. This is the motivation behind our work, which aims to implement the model-fitting procedure on a graphics processing unit (GPU). We begin by analysing the algorithms involved in model evaluation with respect to their suitability for modern many-core computing architectures like GPUs, finding them to be well-placed to take advantage of the high memory bandwidth offered by this hardware. Following our analysis, we briefly describe a preliminary implementation of the model fitting procedure using freely-available GPU libraries. Early results suggest a speed-up of around 10x over a CPU implementation. We discuss the opportunities such a speed-up could provide, including the ability to use more computationally expensive but better-performing fitting routines to increase the quality and robustness of fits.

[18] oai:arXiv.org:1008.4623 [pdf] - 294777

Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

Fluke, Christopher J.; Barnes, David G.; Barsdell, Benjamin R.; Hassan, Amr H.

Comments: 13 pages, 5 figures, accepted for publication in PASA

Submitted: 2010-08-26

General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best-practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.

[19] oai:arXiv.org:1007.1660 [pdf] - 1033610

Analysing Astronomy Algorithms for GPUs and Beyond

Barsdell, Benjamin R.; Barnes, David G.; Fluke, Christopher J.

Comments: 10 pages, 3 figures, accepted for publication in MNRAS

Submitted: 2010-07-09

Astronomy depends on ever increasing computing power. Processor clock-rates have plateaued, and increased performance is now appearing in the form of additional processor cores on a single chip. This poses significant challenges to the astronomy software community. Graphics Processing Units (GPUs), now capable of general-purpose computation, exemplify both the difficult learning-curve and the significant speedups exhibited by massively-parallel hardware architectures. We present a generalised approach to tackling this paradigm shift, based on the analysis of algorithms. We describe a small collection of foundation algorithms relevant to astronomy and explain how they may be used to ease the transition to massively-parallel computing architectures. We demonstrate the effectiveness of our approach by applying it to four well-known astronomy problems: Hogbom CLEAN, inverse ray-shooting for gravitational lensing, pulsar dedispersion and volume rendering. Algorithms with well-defined memory access patterns and high arithmetic intensity stand to receive the greatest performance boost from massively-parallel architectures, while those that involve a significant amount of decision-making may struggle to take advantage of the available processing power.

[20] oai:arXiv.org:1005.5198 [pdf] - 1032807

Computational advances in gravitational microlensing: a comparison of CPU, GPU, and parallel, large data codes

Bate, N. F.; Fluke, C. J.; Barsdell, B. R.; Garsden, H.; Lewis, G. F.

Comments: 11 pages, 4 figures, accepted for publication in New Astronomy

Submitted: 2010-05-27

To assess how future progress in gravitational microlensing computation at high optical depth will rely on both hardware and software solutions, we compare a direct inverse ray-shooting code implemented on a graphics processing unit (GPU) with both a widely-used hierarchical tree code on a single-core CPU, and a recent implementation of a parallel tree code suitable for a CPU-based cluster supercomputer. We examine the accuracy of the tree codes through comparison with a direct code over a much wider range of parameter space than has been feasible before. We demonstrate that all three codes present comparable accuracy, and choice of approach depends on considerations relating to the scale and nature of the microlensing problem under investigation. On current hardware, there is little difference in the processing speed of the single-core CPU tree code and the GPU direct code; however, the recent plateau in single-core CPU speeds means the existing tree code is no longer able to take advantage of Moore's law-like increases in processing speed. Instead, we anticipate a rapid increase in GPU capabilities in the next few years, which is advantageous to the direct code. We suggest that progress in other areas of astrophysical computation may benefit from a transition to GPUs through the use of "brute force" algorithms, rather than attempting to port the current best solution directly to a GPU language -- for certain classes of problems, the simple implementation on GPUs may already be no worse than an optimised single-core CPU version.

[21] oai:arXiv.org:1001.2048 [pdf] - 32714

Advanced Architectures for Astrophysical Supercomputing

Barsdell, Benjamin R.; Barnes, David G.; Fluke, Christopher J.

Comments: 4 pages, 1 figure, to appear in the proceedings of ADASS XIX, Oct 4-8 2009, Sapporo, Japan (ASP Conf. Series)

Submitted: 2010-01-12

Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding-up graphics rendering in video games is now achieving speed-ups of $O(100\times)$ in general-purpose computation -- performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem-types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures.

[22] oai:arXiv.org:0905.2453 [pdf] - 24296

Teraflop per second gravitational lensing ray-shooting using graphics processing units

Thompson, Alexander C.; Fluke, Christopher J.; Barnes, David G.; Barsdell, Benjamin R.

Comments: 21 pages, 4 figures, submitted to New Astronomy

Submitted: 2009-05-14

Gravitational lensing calculation using a direct inverse ray-shooting approach is a computationally expensive way to determine magnification maps, caustic patterns, and light-curves (e.g. as a function of source profile and size). However, as an easily parallelisable calculation, gravitational ray-shooting can be accelerated using programmable graphics processing units (GPUs). We present our implementation of inverse ray-shooting for the NVIDIA G80 generation of graphics processors using the NVIDIA Compute Unified Device Architecture (CUDA) software development kit. We also extend our code to multiple-GPU systems, including a 4-GPU NVIDIA S1070 Tesla unit. We achieve sustained processing performance of 182 Gflop/s on a single GPU, and 1.28 Tflop/s using the Tesla unit. We demonstrate that billion-lens microlensing simulations can be run on a single computer with a Tesla unit in timescales of order a day without the use of a hierarchical tree code.
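
The direct inverse ray-shooting approach can be sketched as follows: rays are fired on a regular grid in the image plane, deflected by every lens in turn, and binned on the source plane, so that the count in each pixel is proportional to the magnification there. This is a serial Python illustration under simplifying assumptions (point-mass lenses only, lengths in Einstein-radius units for unit mass, no external shear or smooth matter); it is not the CUDA code described above, and all names are hypothetical.

```python
def shoot_rays(lenses, image_half_width, rays_per_side, source_half_width, pixels_per_side):
    """Count rays per source-plane pixel; lenses is a list of (x, y, mass)."""
    counts = [[0] * pixels_per_side for _ in range(pixels_per_side)]
    step = 2.0 * image_half_width / rays_per_side
    pix = 2.0 * source_half_width / pixels_per_side
    for i in range(rays_per_side):
        x1 = -image_half_width + (i + 0.5) * step
        for j in range(rays_per_side):
            x2 = -image_half_width + (j + 0.5) * step
            y1, y2 = x1, x2
            for lx, ly, mass in lenses:  # direct sum over every lens, no tree
                d1, d2 = x1 - lx, x2 - ly
                r2 = d1 * d1 + d2 * d2
                if r2 == 0.0:
                    break  # ray lands exactly on a lens: discard this ray
                y1 -= mass * d1 / r2
                y2 -= mass * d2 / r2
            else:
                col = int((y1 + source_half_width) // pix)
                row = int((y2 + source_half_width) // pix)
                if 0 <= row < pixels_per_side and 0 <= col < pixels_per_side:
                    counts[row][col] += 1
    return counts


# Without lenses every ray maps to itself, so the map is uniform.
uniform = shoot_rays([], 1.0, 8, 1.0, 4)

# A single unit-mass lens at the origin concentrates rays near the center.
lensed = shoot_rays([(0.0, 0.0, 1.0)], 3.0, 120, 1.0, 4)
unlensed_rays_per_pixel = (120 / 6.0) ** 2 * 0.25  # ray density times pixel area
```

Because each ray is independent, the two outer loops map naturally onto GPU threads, while the cost of the inner loop grows linearly with the number of lenses.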