Full-text search for arXiv

2 article(s) in total. 20 co-authors. Median position in authors list is 6.0.

[1] oai:arXiv.org:1808.04728 [pdf] - 1782849

CosmoFlow: Using Deep Learning to Learn the Universe at Scale

Mathuriya, Amrita; Bard, Deborah; Mendygral, Peter; Meadows, Lawrence; Arnemann, James; Shao, Lei; He, Siyu; Karna, Tuomas; Moise, Daina; Pennycook, Simon J.; Maschoff, Kristyn; Sewall, Jason; Kumar, Nalini; Ho, Shirley; Ringenburg, Mike; Prabhat; Lee, Victor

Comments: 11 pages, 6 pages, presented at SuperComputing 2018

Submitted: 2018-08-14, last modified: 2018-11-09

Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel(C) Xeon Phi(TM) processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters $\Omega_M$, $\sigma_8$ and $n_s$ with unprecedented accuracy.
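The "fully synchronous data-parallel training" named in the abstract can be illustrated with a toy sketch. This is not CosmoFlow code and does not use TensorFlow or the Cray plugin: the one-parameter model, the function names (`local_gradient`, `allreduce_mean`, `synchronous_step`) and the data are all assumptions made for illustration. Each simulated worker computes a gradient on its own shard, the gradients are averaged (the role an allreduce collective would play), and every worker applies the same update.

```python
def local_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)**2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for a collective allreduce: returns the mean over workers."""
    return sum(values) / len(values)


def synchronous_step(w, shards, lr):
    """One synchronous step: all workers finish before the shared update."""
    grads = [local_gradient(w, s) for s in shards]  # parallel in a real system
    return w - lr * allreduce_mean(grads)


# Hypothetical data generated from y = 3x, split evenly over 4 workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = synchronous_step(w, shards, lr=0.01)
print(w)
```

Because the shards here are equal in size, averaging the per-worker gradients gives the same update as a single worker processing the full batch, which is the defining property of the synchronous scheme.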

[2] oai:arXiv.org:1503.08809 [pdf] - 1347438

Separable projection integrals for higher-order correlators of the cosmic microwave sky: Acceleration by factors exceeding 100

Briggs, J. P.; Pennycook, S. J.; Fergusson, J. R.; Jäykkä, J.; Shellard, E. P. S.

Comments: Accepted by Journal of Computational Physics

Submitted: 2015-03-30, last modified: 2016-01-26

We present a case study describing efforts to optimise and modernise "Modal", the simulation and analysis pipeline used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum (or three-point correlator) of the cosmic microwave background radiation. We focus on one particular element of the code: the projection of bispectra from the end of inflation to the spherical shell at decoupling, which defines the CMB we observe today. This code involves a three-dimensional inner product between two functions, one of which requires an integral, on a non-rectangular domain containing a sparse grid. We show that by employing separable methods this calculation can be reduced to a one-dimensional summation plus two integrations, reducing the overall dimensionality from four to three. The introduction of separable functions also solves the issue of the non-rectangular sparse grid. The separable method can become unstable in certain cases, in which the slower non-separable integral must be calculated instead. We present a discussion of the optimisation of both approaches. We show significant speed-ups of ~100x, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of executing on clusters containing processors and/or coprocessors, with strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3x and that running the same code across a combination of both microarchitectures improves performance-per-node by a factor of 3.38x. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator) we are now able to consider joint analysis for cosmological science exploitation of new data.
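The general idea behind separable methods can be shown with a toy example. It is not the Modal projection integral: the integrand, the rectangular unit-cube domain, the midpoint rule and all function names are assumptions chosen for illustration. When an integrand factorises as f(x) g(y) h(z), a nested three-dimensional quadrature costing n^3 evaluations collapses to a product of three one-dimensional quadratures costing 3n.

```python
import math


def midpoint_nodes(n, a=0.0, b=1.0):
    """Midpoint-rule nodes and spacing on [a, b]."""
    step = (b - a) / n
    return [a + (i + 0.5) * step for i in range(n)], step


def integrate_1d(f, n):
    """Midpoint-rule integral of f over [0, 1]."""
    xs, step = midpoint_nodes(n)
    return step * sum(f(x) for x in xs)


def integrate_3d_brute(f, g, h, n):
    """Nested midpoint rule over the unit cube: n**3 integrand evaluations."""
    xs, step = midpoint_nodes(n)
    total = 0.0
    for x in xs:
        for y in xs:
            for z in xs:
                total += f(x) * g(y) * h(z)
    return total * step ** 3


def integrate_3d_separable(f, g, h, n):
    """Same quadrature using separability: three 1D sums, 3*n evaluations."""
    return integrate_1d(f, n) * integrate_1d(g, n) * integrate_1d(h, n)


def square(z):
    return z * z


brute = integrate_3d_brute(math.sin, math.exp, square, 20)
separable = integrate_3d_separable(math.sin, math.exp, square, 20)
exact = (1.0 - math.cos(1.0)) * (math.e - 1.0) / 3.0
print(brute, separable, exact)
```

The two quadratures agree to rounding error because they are the same sum rearranged; only the operation count differs. The abstract's reduction (a one-dimensional summation plus two integrations on a sparse, non-rectangular grid) is more involved than this sketch.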