Normalized to: Grigori, L.
[1]
oai:arXiv.org:2002.02833 [pdf] - 2124618
Accelerating linear system solvers for time domain component separation
of cosmic microwave background data
Submitted: 2020-02-07, last modified: 2020-06-01
Component separation is one of the key stages of any modern, cosmic microwave
background (CMB) data analysis pipeline. It is an inherently non-linear
procedure and typically involves a series of sequential solutions of linear
systems with similar, albeit not identical system matrices, derived for
different data models of the same data set. Sequences of this kind arise for
instance in the maximization of the data likelihood with respect to foreground
parameters or sampling of their posterior distribution. However, they are also
common in many other contexts. In this work we consider solving the component
separation problem directly in the measurement (time) domain, which can have a
number of important advantages over the more standard pixel-based methods, in
particular if non-negligible time-domain noise correlations are present, as is
commonly the case. The time-domain based approach implies, however,
significant computational effort due to the need to manipulate the full volume
of the time-domain data set. To address this challenge, we propose and study
efficient solvers adapted to time-domain-based component separation systems
and their sequences, which are capable of capitalizing on
information derived from the previous solutions. This is achieved either via
adapting the initial guess of the subsequent system or through so-called
subspace recycling, which allows one to construct progressively more efficient,
two-level preconditioners. We report an overall speed-up over solving the
systems independently of a factor of nearly 7, or 5, in the worked examples
inspired respectively by the likelihood maximization and likelihood sampling
procedures we consider in this work.
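
The idea of adapting the initial guess can be illustrated with a minimal
sketch, assuming a small synthetic sequence of symmetric positive definite
systems rather than real component separation systems. The helper `cg`, the
matrices `A0` and `E`, and the perturbation size are all hypothetical choices
made for illustration. Subspace recycling and two-level preconditioning are
not shown.

```python
import numpy as np

def cg(A, b, x0, tol=1e-8, max_iter=1000):
    """Plain conjugate gradients for a symmetric positive definite A.

    Returns the approximate solution and the number of iterations used.
    """
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            return x, k
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# Synthetic stand-in for a sequence of similar systems (assumption, not CMB data).
rng = np.random.default_rng(0)
n = 100
M = rng.standard_normal((n, n))
A0 = M @ M.T / n + np.eye(n)            # SPD base matrix
E = np.diag(rng.uniform(0.0, 1.0, n))   # small perturbation that keeps A SPD
b = rng.standard_normal(n)

cold_total = 0   # every system started from the zero vector
warm_total = 0   # every system started from the previous solution
x_prev = np.zeros(n)
for step in range(5):
    A_k = A0 + 0.01 * step * E
    _, it_cold = cg(A_k, b, np.zeros(n))
    x_prev, it_warm = cg(A_k, b, x_prev)
    cold_total += it_cold
    warm_total += it_warm

print("iterations, zero initial guess:", cold_total)
print("iterations, previous solution as guess:", warm_total)
```

In this toy setting the saving comes only from the smaller initial residual.
The speed-ups quoted in the abstract refer to the authors' own solvers and
data, not to this sketch.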
[2]
oai:arXiv.org:1803.03462 [pdf] - 1770546
Solving linear equations with messenger-field and conjugate gradients
techniques - an application to CMB data analysis
Submitted: 2018-03-09, last modified: 2018-10-22
We discuss linear system solvers invoking a messenger-field and compare them
with (preconditioned) conjugate gradients approaches. We show that the
messenger-field techniques correspond to fixed point iterations of an
appropriately preconditioned initial system of linear equations. We then argue
that a conjugate gradient solver applied to the same preconditioned system, or
equivalently a preconditioned conjugate gradient solver using the same
preconditioner and applied to the original system, will in general ensure at
least a comparable and typically better performance in terms of the number of
iterations to convergence and time-to-solution. We illustrate our conclusions
on two common examples drawn from the Cosmic Microwave Background data
analysis: Wiener filtering and map-making. In addition, and contrary to the
standard lore in the CMB field, we show that the performance of the
preconditioned conjugate gradient solver can depend importantly on the starting
vector. This observation seems of particular importance in the cases of
map-making of high signal-to-noise sky maps and therefore should be of
relevance for the next generation of CMB experiments.
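
The comparison between a fixed point iteration on a preconditioned system and
a preconditioned conjugate gradient (PCG) solver using the same preconditioner
can be sketched as follows. This is only an illustration under stated
assumptions: a diagonal (Jacobi) preconditioner stands in for the
messenger-field one, the tridiagonal test matrix is synthetic, and the
function names `fixed_point` and `pcg` are hypothetical.

```python
import numpy as np

def fixed_point(A, b, M_inv_diag, tol=1e-8, max_iter=10_000):
    """Iterate x <- x + M^{-1} (b - A x) until the relative residual is below tol."""
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        x = x + M_inv_diag * r
    return x, max_iter

def pcg(A, b, M_inv_diag, tol=1e-8, max_iter=10_000):
    """Preconditioned conjugate gradients with a diagonal preconditioner."""
    x = np.zeros_like(b)
    r = b.copy()
    z = M_inv_diag * r
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Synthetic, strictly diagonally dominant SPD matrix (assumption for the demo).
rng = np.random.default_rng(0)
n = 200
diag = 2.5 + rng.uniform(0.0, 2.0, n)
off = np.ones(n - 1)
A = np.diag(diag) - np.diag(off, 1) - np.diag(off, -1)
b = rng.standard_normal(n)
M_inv_diag = 1.0 / diag   # Jacobi preconditioner, a stand-in only

x_fp, it_fp = fixed_point(A, b, M_inv_diag)
x_pcg, it_pcg = pcg(A, b, M_inv_diag)
print("fixed point iterations:", it_fp)
print("PCG iterations:", it_pcg)
```

Both solvers start from the zero vector here. Passing a different starting
vector would be the natural way to probe the dependence on the initial guess
mentioned in the abstract.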
[3]
oai:arXiv.org:1408.3048 [pdf] - 907961
Accelerating Cosmic Microwave Background map-making procedure through
preconditioning
Submitted: 2014-08-13, last modified: 2014-12-15
Estimation of the sky signal from sequences of time ordered data is one of
the key steps in Cosmic Microwave Background (CMB) data analysis, commonly
referred to as the map-making problem. Some of the most popular and general
methods proposed for this problem involve solving generalised least squares
(GLS) equations with non-diagonal noise weights given by a block-diagonal
matrix with Toeplitz blocks. In this work we study new map-making solvers
potentially suitable for applications to the largest anticipated data sets.
They are based on iterative conjugate gradient (CG) approaches enhanced with
novel, parallel, two-level preconditioners. We apply the proposed solvers to
examples of simulated non-polarised and polarised CMB observations, and a set
of idealised scanning strategies with sky coverage ranging from nearly a full
sky down to small sky patches. We discuss in detail their implementation for
massively parallel computational platforms and their performance for a broad
range of parameters characterising the simulated data sets. We find that our
best new solver can outperform carefully-optimised standard solvers used today
by a factor of as much as 5 in terms of the convergence rate and a factor of up
to 4 in terms of the time to solution, and to do so without significantly
increasing the memory consumption and the volume of inter-processor
communication. The performance of the new algorithms is also found to be more
stable and robust, and less dependent on specific characteristics of the
analysed data set. We therefore conclude that the proposed approaches are well
suited to address successfully challenges posed by new and forthcoming CMB data
sets.
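
For orientation, the GLS map-making problem referred to above is commonly
written as follows. The symbols are assumptions introduced here, not taken
from the abstract: d is the time-ordered data, P the pointing matrix, m the
sky map, n the noise, and N the noise covariance.

```latex
% Data model and GLS estimate (notation assumed for illustration)
d = P\,m + n, \qquad N \equiv \langle n\,n^{T} \rangle ,
\qquad
\hat{m} = \left( P^{T} N^{-1} P \right)^{-1} P^{T} N^{-1} d .
% Iterative solvers such as CG avoid the explicit inverse and instead solve
\left( P^{T} N^{-1} P \right) \hat{m} = P^{T} N^{-1} d .
```

In this notation, the block-diagonal matrix with Toeplitz blocks mentioned in
the abstract plays the role of the noise weights N^{-1}.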
[4]
oai:arXiv.org:1106.0159 [pdf] - 646350
Parallel Spherical Harmonic Transforms on heterogeneous architectures
(GPUs/multi-core CPUs)
Submitted: 2011-06-01, last modified: 2013-04-01
Spherical Harmonic Transforms (SHT) are at the heart of many scientific and
practical applications ranging from climate modelling to cosmological
observations. In many of these areas new, cutting-edge science goals have been
recently proposed requiring simulations and analyses of experimental or
observational data at very high resolutions and of unprecedented volumes. Both
these aspects pose a formidable challenge for the currently existing
implementations of the transforms.
This paper describes parallel algorithms for computing SHT with two variants
of intra-node parallelism appropriate for novel supercomputer architectures,
multi-core processors and Graphic Processing Units (GPU). It also discusses
their performance, alone and embedded within a top-level, MPI-based
parallelisation layer ported from the S2HAT library, in terms of their
accuracy, overall efficiency and scalability. We show that our inverse SHT run
on GeForce 400 Series GPUs equipped with the latest CUDA architecture ("Fermi")
outperforms the state-of-the-art implementation for a multi-core processor
executed on a current Intel Core i7-2600K. Furthermore, we show that an
MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla
S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed
on the same number of quad-core Intel Nehalem processors for problem sizes
motivated by our target applications. Performance of the direct transforms is
however found to be at best comparable in these cases. We discuss in detail
the algorithmic solutions devised for major steps involved in the calculation
of the transforms, emphasising those with a major impact on their overall
performance, and elucidate the sources of the dichotomy between the direct and
the inverse operations.
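
For reference, the direct and inverse transforms discussed above can be
written in the standard form below. The notation is assumed here: s is a
field on the sphere, a_{\ell m} are its harmonic coefficients, and
\ell_{\max} is the band limit.

```latex
% Direct transform (analysis): an integral over the sphere,
% approximated in practice by a quadrature over the pixels
a_{\ell m} = \int s(\theta,\phi)\, Y^{*}_{\ell m}(\theta,\phi)\, d\Omega ,
% Inverse transform (synthesis): a finite sum over the coefficients
s(\theta,\phi) = \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell}
                 a_{\ell m}\, Y_{\ell m}(\theta,\phi) .
```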
[5]
oai:arXiv.org:1010.1260 [pdf] - 514578
Spherical harmonic transform with GPUs
Submitted: 2010-10-06
We describe an algorithm for computing an inverse spherical harmonic
transform suitable for graphic processing units (GPU). We use CUDA and base our
implementation on a Fortran90 routine included in a publicly available parallel
package, S2HAT. We focus our attention on the two major sequential steps
involved in the transform computation, retaining the efficient parallel
framework of the original code. We detail optimization techniques used to
enhance the performance of the CUDA-based code and contrast them with those
implemented in the Fortran90 version. We also present performance comparisons
of a single CPU plus GPU unit with the S2HAT code running on either 1 or
4 processors. In particular we find that use of the latest generation of GPUs,
such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms
by as much as 18 times with respect to S2HAT executed on one core, and by as
much as 5.5 times with respect to S2HAT on 4 cores, with the overall performance
being limited by the fast Fourier transforms. The work presented here has been
performed in the context of the Cosmic Microwave Background simulations and
analysis. However, we expect that the developed software will be of more
general interest and applicability.
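
On grids made of iso-latitude rings, the inverse transform is usually split
into two successive steps, which presumably correspond to the two sequential
steps mentioned above: a Legendre-type sum over \ell for each ring, followed
by a Fourier sum over m along the ring. The notation is assumed here, with
\bar{P}_{\ell m} denoting normalised associated Legendre functions and
\theta_r the colatitude of ring r.

```latex
% Separation of the spherical harmonics
Y_{\ell m}(\theta,\phi) = \bar{P}_{\ell m}(\cos\theta)\, e^{i m \phi} ,
% Step 1: sum over \ell for every ring r and every m
\Delta_{m}(\theta_r) = \sum_{\ell=|m|}^{\ell_{\max}}
                       a_{\ell m}\, \bar{P}_{\ell m}(\cos\theta_r) ,
% Step 2: Fourier sum over m along each ring, typically done with an FFT
s(\theta_r,\phi) = \sum_{m=-\ell_{\max}}^{\ell_{\max}}
                   \Delta_{m}(\theta_r)\, e^{i m \phi} .
```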