Normalized to: Dale, K.
[1]
oai:arXiv.org:1003.5575 [pdf] - 1025993
Enabling a High Throughput Real Time Data Pipeline for a Large Radio
Telescope Array with GPUs
Submitted: 2010-03-29, last modified: 2010-06-14
The Murchison Widefield Array (MWA) is a next-generation radio telescope
currently under construction in the remote Western Australian outback. Raw data
will be generated continuously at 5 GiB/s, grouped into 8 s cadences. This high
throughput motivates the development of on-site, real-time processing and
reduction in preference to archiving, transport and off-line processing. Each
8 s batch of data must be completely reduced before the next batch arrives.
Maintaining real-time operation will require a sustained performance of around
2.5 TFLOP/s (including convolutions, FFTs, interpolations and matrix
multiplications). We describe a scalable heterogeneous computing pipeline
implementation, exploiting both the high computing density and FLOP-per-Watt
ratio of modern GPUs. The architecture is highly parallel within and across
nodes, with all major processing elements performed by GPUs. Necessary
scatter-gather operations along the pipeline are loosely synchronized between
the nodes hosting the GPUs. The MWA will be a frontier scientific instrument
and a pathfinder for planned peta- and exascale facilities.
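
The figures quoted in this abstract imply a per-batch budget that is easy to check by hand. The following is only a back-of-the-envelope sketch, not part of the paper: the function name `batch_budget` and its parameter names are hypothetical, and the defaults are simply the values stated above (5 GiB/s, 8 s cadence, about 2.5 TFLOP/s sustained).

```python
# Back-of-the-envelope check of the per-batch budget implied by the abstract.
# All names here are illustrative assumptions; only the three default values
# (5 GiB/s, 8 s, ~2.5 TFLOP/s) come from the text.

GIB = 2**30  # bytes in one gibibyte


def batch_budget(data_rate_gib_s=5.0, cadence_s=8.0, sustained_tflops=2.5):
    """Return data volume, compute volume and arithmetic intensity per batch."""
    bytes_per_batch = data_rate_gib_s * GIB * cadence_s
    flop_per_batch = sustained_tflops * 1e12 * cadence_s
    return {
        "gib_per_batch": data_rate_gib_s * cadence_s,
        "tflop_per_batch": sustained_tflops * cadence_s,
        # Average floating-point operations available per input byte.
        "flop_per_byte": flop_per_batch / bytes_per_batch,
    }


budget = batch_budget()
print(budget)
```

Under these assumptions each 8 s batch carries 40 GiB of raw data and about 20 TFLOP of work, which is on the order of a few hundred floating-point operations per input byte.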
[2]
oai:arXiv.org:0902.0915 [pdf] - 21063
GPUs for data processing in the MWA
Submitted: 2009-02-05
The MWA is a next-generation radio interferometer under construction in
remote Western Australia. The data rate from the correlator makes storing the
raw data infeasible, so the data must be processed in real-time. The processing
task is of order 10 TFLOP/s. The remote location of the MWA limits the power
that can be allocated to computing. We describe the design and implementation
of elements of the MWA real-time data processing system which leverage the
computing abilities of modern graphics processing units (GPUs). The matrix
algebra and texture mapping capabilities of GPUs are well suited to the
majority of tasks involved in real-time calibration and imaging. Considerable
performance advantages over a conventional CPU-based reference implementation
are obtained.