Full-text search for arXiv

Grigori, Laura

Normalized to: Grigori, L.

5 article(s) in total. 7 co-authors, from 1 to 5 common article(s). Median position in authors list is 2.0.

[1] oai:arXiv.org:2002.02833 [pdf] - 2124618

Accelerating linear system solvers for time domain component separation of cosmic microwave background data

Submitted: 2020-02-07, last modified: 2020-06-01

Component separation is one of the key stages of any modern cosmic microwave background (CMB) data analysis pipeline. It is an inherently non-linear procedure and typically involves a series of sequential solutions of linear systems with similar, albeit not identical, system matrices, derived for different data models of the same data set. Sequences of this kind arise, for instance, in the maximization of the data likelihood with respect to foreground parameters or in sampling of their posterior distribution. However, they are also common in many other contexts. In this work we consider solving the component separation problem directly in the measurement (time) domain, which can have a number of important advantages over the more standard pixel-based methods, in particular if non-negligible time-domain noise correlations are present, as is commonly the case. The time-domain-based approach implies, however, significant computational effort due to the need to manipulate the full volume of the time-domain data set. To address this challenge, we propose and study efficient solvers adapted to time-domain-based component separation systems and their sequences, which are capable of capitalizing on information derived from the previous solutions. This is achieved either by adapting the initial guess of the subsequent system or through so-called subspace recycling, which allows one to construct progressively more efficient two-level preconditioners. We report an overall speed-up over solving the systems independently of a factor of nearly 7, or 5, in the worked examples inspired respectively by the likelihood maximization and likelihood sampling procedures we consider in this work.
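The first of the two strategies named in the abstract, reusing the previous solution as the initial guess for the next system, can be illustrated with a small sketch. Everything below is a toy assumption rather than the paper's setup: the tridiagonal matrix `A0`, the perturbation `B`, the step sizes in `betas`, and the plain conjugate gradient routine `cg` are hypothetical stand-ins for the component separation systems, and subspace recycling is not shown.

```python
import numpy as np

def cg(A, b, x0, tol=1e-8, max_iter=1000):
    """Plain conjugate gradients; returns (solution, iterations used)."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    stop = (tol * np.linalg.norm(b)) ** 2  # squared relative-residual threshold
    for k in range(max_iter):
        if rr <= stop:
            return x, k
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter

# Toy sequence of similar SPD systems A_k = A0 + beta_k * B (all assumed).
n = 200
A0 = 2.2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
B = np.diag(np.linspace(0.0, 1.0, n))
b = np.random.default_rng(0).standard_normal(n)
betas = 0.01 * np.arange(10)

def solve_sequence(reuse_previous):
    """Solve all systems in order; optionally warm-start from the last solution."""
    x_prev, total = np.zeros(n), 0
    for beta in betas:
        x0 = x_prev if reuse_previous else np.zeros(n)
        x_prev, iters = cg(A0 + beta * B, b, x0)
        total += iters
    return x_prev, total

x_cold, iters_cold = solve_sequence(reuse_previous=False)
x_warm, iters_warm = solve_sequence(reuse_previous=True)
print("total CG iterations, zero guess:", iters_cold)
print("total CG iterations, warm start:", iters_warm)
```

Because consecutive matrices differ only slightly, the previous solution already has a small residual for the next system, so the warm-started run needs fewer iterations in total. The size of the saving in this toy problem says nothing about the speed-ups reported in the abstract.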

[2] oai:arXiv.org:1803.03462 [pdf] - 1770546

Solving linear equations with messenger-field and conjugate gradients techniques - an application to CMB data analysis

Papez, J.; Grigori, L.; Stompor, R.

Submitted: 2018-03-09, last modified: 2018-10-22

We discuss linear system solvers invoking a messenger field and compare them with (preconditioned) conjugate gradient approaches. We show that the messenger-field techniques correspond to fixed-point iterations of an appropriately preconditioned initial system of linear equations. We then argue that a conjugate gradient solver applied to the same preconditioned system, or equivalently a preconditioned conjugate gradient solver using the same preconditioner and applied to the original system, will in general ensure at least comparable, and typically better, performance in terms of the number of iterations to convergence and time-to-solution. We illustrate our conclusions on two common examples drawn from Cosmic Microwave Background data analysis: Wiener filtering and map-making. In addition, and contrary to the standard lore in the CMB field, we show that the performance of the preconditioned conjugate gradient solver can depend significantly on the starting vector. This observation seems of particular importance in the case of map-making of high signal-to-noise sky maps and therefore should be of relevance for the next generation of CMB experiments.
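The comparison described in the abstract, a preconditioned fixed-point iteration against preconditioned conjugate gradients using the same preconditioner, can be sketched on a toy problem. The sketch does not reproduce the messenger-field preconditioner: it assumes a simple Jacobi (diagonal) preconditioner `m_inv` and a hypothetical diagonally dominant tridiagonal matrix `A`, chosen only so that the fixed-point iteration is guaranteed to converge.

```python
import numpy as np

def fixed_point(A, b, m_inv, tol=1e-8, max_iter=10_000):
    """Preconditioned fixed-point iteration: x <- x + M^{-1} (b - A x)."""
    x = np.zeros_like(b)
    stop = tol * np.linalg.norm(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) <= stop:
            return x, k
        x = x + m_inv * r
    return x, max_iter

def pcg(A, b, m_inv, tol=1e-8, max_iter=10_000):
    """Preconditioned conjugate gradients with a diagonal preconditioner."""
    x = np.zeros_like(b)
    r = b.copy()
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    stop = tol * np.linalg.norm(b)
    for k in range(max_iter):
        if np.linalg.norm(r) <= stop:
            return x, k
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Toy SPD, strictly diagonally dominant system (assumed for illustration).
n = 100
diag = 2.05 + np.linspace(0.0, 0.95, n)
A = np.diag(diag) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.random.default_rng(1).standard_normal(n)
m_inv = 1.0 / diag  # Jacobi preconditioner M = diag(A), a stand-in choice

x_fp, it_fp = fixed_point(A, b, m_inv)
x_cg, it_cg = pcg(A, b, m_inv)
print("fixed-point iterations:", it_fp)
print("PCG iterations:        ", it_cg)
```

Both solvers apply the same operator `M^{-1}` once per iteration, so the per-iteration cost is similar. The difference lies in how the search direction and step length are chosen, which is why the conjugate gradient variant needs fewer iterations here.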

[3] oai:arXiv.org:1408.3048 [pdf] - 907961

Accelerating Cosmic Microwave Background map-making procedure through preconditioning

Szydlarski, Mikolaj; Grigori, Laura; Stompor, Radek

Comments: 19 pages // Final version submitted to A&A

Submitted: 2014-08-13, last modified: 2014-12-15

Estimation of the sky signal from sequences of time-ordered data is one of the key steps in Cosmic Microwave Background (CMB) data analysis, commonly referred to as the map-making problem. Some of the most popular and general methods proposed for this problem involve solving generalised least squares (GLS) equations with non-diagonal noise weights given by a block-diagonal matrix with Toeplitz blocks. In this work we study new map-making solvers potentially suitable for applications to the largest anticipated data sets. They are based on iterative conjugate gradient (CG) approaches enhanced with novel, parallel, two-level preconditioners. We apply the proposed solvers to examples of simulated non-polarised and polarised CMB observations, and a set of idealised scanning strategies with sky coverage ranging from nearly the full sky down to small sky patches. We discuss in detail their implementation for massively parallel computational platforms and their performance for a broad range of parameters characterising the simulated data sets. We find that our best new solver can outperform carefully optimised standard solvers used today by a factor of as much as 5 in terms of the convergence rate and a factor of up to 4 in terms of the time to solution, and does so without significantly increasing the memory consumption or the volume of inter-processor communication. The performance of the new algorithms is also found to be more stable and robust, and less dependent on specific characteristics of the analysed data set. We therefore conclude that the proposed approaches are well suited to successfully address the challenges posed by new and forthcoming CMB data sets.
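For orientation, the GLS map-making system mentioned in the abstract is usually written as follows. The symbols are the conventional ones and are assumed here, since the abstract does not define any: $d$ is the time-ordered data, $P$ the pointing matrix, $m$ the sky map, $n$ the noise, and $N$ the noise covariance (block-diagonal with Toeplitz blocks in the case described above).

```latex
% Data model (assumed notation): time-ordered data = pointing * map + noise
d = P\,m + n, \qquad N \equiv \langle n\,n^{T} \rangle

% GLS normal equations solved iteratively for the map estimate \hat{m}
\left( P^{T} N^{-1} P \right) \hat{m} = P^{T} N^{-1} d
```

The system matrix $P^{T} N^{-1} P$ is symmetric and positive definite whenever every pixel is sufficiently observed, which is what makes conjugate gradient solvers, with or without a preconditioner, applicable.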

[4] oai:arXiv.org:1106.0159 [pdf] - 646350

Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)

Szydlarski, Mikolaj; Esterie, Pierre; Falcou, Joel; Grigori, Laura; Stompor, R.

Submitted: 2011-06-01, last modified: 2013-04-01

Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas new, cutting-edge science goals have recently been proposed, requiring simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both of these aspects pose a formidable challenge for the currently existing implementations of the transforms. This paper describes parallel algorithms for computing SHT with two variants of intra-node parallelism appropriate for novel supercomputer architectures: multi-core processors and Graphics Processing Units (GPUs). It also discusses their performance, alone and embedded within a top-level, MPI-based parallelisation layer ported from the S2HAT library, in terms of their accuracy, overall efficiency and scalability. We show that our inverse SHT run on GeForce 400 Series GPUs equipped with the latest CUDA architecture ("Fermi") outperforms the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that an MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed on the same number of quad-core Intel Nehalem processors for problem sizes motivated by our target applications. Performance of the direct transforms is, however, found to be at best comparable in these cases. We discuss in detail the algorithmic solutions devised for the major steps involved in the transform calculation, emphasising those with a major impact on overall performance, and elucidate the sources of the dichotomy between the direct and the inverse operations.

[5] oai:arXiv.org:1010.1260 [pdf] - 514578

Spherical harmonic transform with GPUs

Hupca, Ioan O.; Falcou, Joel; Grigori, Laura; Stompor, Radek

Submitted: 2010-10-06

We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphics processing units (GPUs). We use CUDA and base our implementation on a Fortran90 routine included in a publicly available parallel package, S2HAT. We focus our attention on the two major sequential steps involved in the transform computation, retaining the efficient parallel framework of the original code. We detail optimization techniques used to enhance the performance of the CUDA-based code and contrast them with those implemented in the Fortran90 version. We also present performance comparisons of a single CPU plus GPU unit with the S2HAT code running on either a single processor or 4 processors. In particular, we find that use of the latest generation of GPUs, such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms by as much as 18 times with respect to S2HAT executed on one core, and by as much as 5.5 times with respect to S2HAT on 4 cores, with the overall performance being limited by the fast Fourier transforms. The work presented here has been performed in the context of Cosmic Microwave Background simulations and analysis. However, we expect that the developed software will be of more general interest and applicability.
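The abstract does not spell out the two sequential steps. A common way to split an inverse transform on an iso-latitude grid, assumed here for illustration and consistent with the remark that the fast Fourier transforms limit performance, is a sum over multipoles for each ring followed by a Fourier synthesis along the ring. The notation ($a_{\ell m}$ for harmonic coefficients, $\tilde{P}_{\ell m}$ for normalised associated Legendre functions, $\theta_r$ and $\phi_p$ for ring and pixel coordinates) is likewise an assumption.

```latex
% Step 1 (assumed): for every ring r and every m, sum over multipoles ell
\Delta_m(\theta_r) = \sum_{\ell=|m|}^{\ell_{\max}} a_{\ell m}\,
                     \tilde{P}_{\ell m}(\cos\theta_r)

% Step 2 (assumed): for every ring r, Fourier synthesis in phi, done via FFT
s(\theta_r, \phi_p) = \sum_{m=-\ell_{\max}}^{\ell_{\max}}
                      \Delta_m(\theta_r)\, e^{\, i m \phi_p}
```

In this form the first step costs of order $\ell_{\max}^2$ operations per ring and dominates for large problems, while the second step is a one-dimensional FFT per ring.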