Normalized to: Marais, P.
[1]
oai:arXiv.org:1501.07719 [pdf] - 1219164
Montblanc: GPU accelerated Radio Interferometer Measurement Equations in
support of Bayesian Inference for Radio Observations
Submitted: 2015-01-30, last modified: 2015-06-19
We present Montblanc, a GPU implementation of the Radio Interferometer
Measurement Equation (RIME) in support of the Bayesian inference for radio
observations (BIRO) technique. BIRO uses Bayesian inference to select sky
models that best match the visibilities observed by a radio interferometer. To
accomplish this, BIRO evaluates the RIME multiple times, varying sky model
parameters to produce multiple model visibilities. Chi-squared values computed
from the model and observed visibilities are used as likelihood values to drive
the Bayesian sampling process and select the best sky model.
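
As a rough illustration of the chi-squared step described above, the following is a minimal sketch, not Montblanc's actual code. It assumes a simplified scalar RIME for point sources only (no polarisation or direction-dependent terms) and a single noise level sigma for all visibilities; the function names are hypothetical.

```python
import cmath
import math


def point_source_visibility(uvw, sources):
    """Scalar model visibility for one baseline (simplified, assumed form).

    uvw:     (u, v, w) baseline coordinates in wavelengths.
    sources: iterable of (flux, l, m), with l and m direction cosines.
    """
    u, v, w = uvw
    vis = 0j
    for flux, l, m in sources:
        n = math.sqrt(1.0 - l * l - m * m)
        phase = -2.0 * math.pi * (u * l + v * m + w * (n - 1.0))
        vis += flux * cmath.exp(1j * phase)
    return vis


def chi_squared(observed, model, sigma):
    """Sum of squared residual amplitudes, scaled by the noise variance."""
    return sum(abs(o - m) ** 2 for o, m in zip(observed, model)) / sigma ** 2
```

A lower chi-squared value means the model visibilities lie closer to the observed ones, which is what lets the value act as a likelihood for the sampler.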
As most of the elements of the RIME and chi-squared calculation are
independent of one another, they are highly amenable to parallel computation.
Additionally, Montblanc caters for iterative RIME evaluation to produce
multiple chi-squared values. Modified model parameters are transferred to the
GPU between each iteration.
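
The iterative evaluation can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the package's implementation: it uses a single point source with a scalar RIME, runs on the CPU, and replaces the Bayesian sampler with a fixed list of candidate parameters. All names and numbers are hypothetical. The point is only that the baselines and observed visibilities stay fixed while the model parameters change from one iteration to the next.

```python
import cmath
import math


def model_visibilities(baselines, flux, l, m):
    """Scalar visibilities of one point source on every baseline."""
    n = math.sqrt(1.0 - l * l - m * m)
    return [
        flux * cmath.exp(-2j * math.pi * (u * l + v * m + w * (n - 1.0)))
        for u, v, w in baselines
    ]


def chi_squared(observed, model, sigma):
    return sum(abs(o - m) ** 2 for o, m in zip(observed, model)) / sigma ** 2


def evaluate_candidates(baselines, observed, candidates, sigma):
    """One RIME evaluation and one chi-squared value per parameter set."""
    results = []
    for flux, l, m in candidates:
        # Only (flux, l, m) changes per iteration; the rest is reused.
        model = model_visibilities(baselines, flux, l, m)
        results.append(chi_squared(observed, model, sigma))
    return results


# Made-up baselines (in wavelengths) and a synthetic "observation".
baselines = [(100.0, 50.0, 0.0), (-30.0, 80.0, 5.0), (60.0, -20.0, 2.0)]
observed = model_visibilities(baselines, 1.5, 0.01, -0.02)

candidates = [(1.0, 0.01, -0.02), (1.5, 0.01, -0.02), (2.0, 0.01, -0.02)]
chi2 = evaluate_candidates(baselines, observed, candidates, sigma=0.1)
best = min(range(len(chi2)), key=chi2.__getitem__)
```

In a real run the candidate parameters would be proposed by the sampler based on earlier likelihood values, not taken from a fixed list.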
We implemented Montblanc as a Python package based upon NVIDIA's CUDA
architecture. As such, it is easy to extend and to use for implementing
different pipelines.
At present, Montblanc supports point and Gaussian morphologies, but is designed
for easy addition of new source profiles.
Montblanc's RIME implementation is performant: on an NVIDIA K40, it is
approximately 250 times faster than MeqTrees on a dual hexacore Intel E5-2620v2
CPU. Compared to the OSKAR simulator's GPU-implemented RIME components, it is
7.7 and 12 times faster on the same K40 for single- and double-precision
floating point, respectively. However, OSKAR's RIME implementation is more
general than Montblanc's BIRO-tailored RIME.
Theoretical analysis of Montblanc's dominant CUDA kernel suggests that it is
memory bound. In practice, profiling shows that it is balanced between compute and
memory, as much of the data required by the problem is retained in L1 and L2
cache.