Normalized to: Pennycook, S.
[1]
oai:arXiv.org:1808.04728 [pdf] - 1782849
CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Mathuriya, Amrita;
Bard, Deborah;
Mendygral, Peter;
Meadows, Lawrence;
Arnemann, James;
Shao, Lei;
He, Siyu;
Karna, Tuomas;
Moise, Daina;
Pennycook, Simon J.;
Maschoff, Kristyn;
Sewall, Jason;
Kumar, Nalini;
Ho, Shirley;
Ringenburg, Mike;
Prabhat;
Lee, Victor
Submitted: 2018-08-14, last modified: 2018-11-09
Deep learning is a promising tool to determine the physical model that
describes our universe. To handle the considerable computational cost of this
problem, we present CosmoFlow: a highly scalable deep learning application
built on top of the TensorFlow framework. CosmoFlow uses efficient
implementations of 3D convolution and pooling primitives, together with
improvements in threading for many element-wise operations, to improve training
performance on Intel(R) Xeon Phi(TM) processors. We also utilize the Cray PE
Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate
fully synchronous data-parallel training on 8192 nodes of Cori with 77%
parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our
knowledge, this is the first large-scale science application of the TensorFlow
framework at supercomputer scale with fully synchronous training. These
enhancements enable us to process large 3D dark matter distributions and predict
the cosmological parameters $\Omega_M$, $\sigma_8$ and $n_s$ with unprecedented
accuracy.
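The scaling figures quoted in this abstract can be unpacked with a little arithmetic. The sketch below is illustrative only: it assumes that parallel efficiency means speedup divided by node count relative to a single-node baseline, which the abstract does not state explicitly, and the function names are hypothetical.

```python
# Figures taken from the abstract above.
NODES = 8192              # Cori nodes used for training
EFFICIENCY = 0.77         # reported parallel efficiency
SUSTAINED_FLOPS = 3.5e15  # reported sustained performance, flop/s


def implied_speedup(efficiency: float, nodes: int) -> float:
    """Speedup implied by a given efficiency (assumed single-node baseline)."""
    return efficiency * nodes


def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Efficiency as speedup per node; the inverse of implied_speedup."""
    return speedup / nodes


speedup = implied_speedup(EFFICIENCY, NODES)
per_node_flops = SUSTAINED_FLOPS / NODES

print(f"implied speedup: {speedup:.2f}x over the assumed baseline")
print(f"sustained rate per node: {per_node_flops / 1e9:.1f} Gflop/s")
```

Under that assumption, 77% efficiency on 8192 nodes corresponds to the throughput of roughly 6300 ideally scaling nodes, and the sustained rate averages a little over 400 Gflop/s per node.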
[2]
oai:arXiv.org:1503.08809 [pdf] - 1347438
Separable projection integrals for higher-order correlators of the
cosmic microwave sky: Acceleration by factors exceeding 100
Submitted: 2015-03-30, last modified: 2016-01-26
We present a case study describing efforts to optimise and modernise "Modal",
the simulation and analysis pipeline used by the Planck satellite experiment
for constraining general non-Gaussian models of the early universe via the
bispectrum (or three-point correlator) of the cosmic microwave background
radiation. We focus on one particular element of the code: the projection of
bispectra from the end of inflation to the spherical shell at decoupling, which
defines the CMB we observe today. This code involves a three-dimensional inner
product between two functions, one of which requires an integral, on a
non-rectangular domain containing a sparse grid. We show that by employing
separable methods this calculation can be reduced to a one-dimensional
summation plus two integrations, reducing the overall dimensionality from four
to three. The introduction of separable functions also solves the issue of the
non-rectangular sparse grid. The separable method can become unstable in
certain cases, however, and the slower non-separable integral must then be
calculated instead. We present a discussion of the optimisation of both approaches. We
show significant speed-ups of ~100x, arising from a combination of algorithmic
improvements and architecture-aware optimisations targeted at improving thread
and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of
executing on clusters containing processors and/or coprocessors, with
strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single
coprocessor outperforms two processor sockets by a factor of 1.3 and that
running the same code across a combination of both microarchitectures improves
performance-per-node by a factor of 3.38. By making bispectrum calculations
competitive with those for the power spectrum (or two-point correlator) we are
now able to consider joint analysis for cosmological science exploitation of
new data.
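The benefit of separability described in this abstract can be illustrated with a generic toy example. The sketch below is not the Modal algorithm: it assumes a function written as a short sum of products of one-dimensional factors, a full rectangular grid rather than the non-rectangular sparse grid mentioned above, and simple uniform quadrature weights. All names are hypothetical.

```python
import math

# Toy separable function: f(x, y, z) = sum_n a_n * X_n(x) * Y_n(y) * Z_n(z)
TERMS = [
    (1.0, math.sin, math.cos, math.exp),
    (0.5, lambda t: t, lambda t: t * t, math.cos),
]

N = 16
GRID = [(i + 0.5) / N for i in range(N)]  # midpoints on [0, 1]
WEIGHTS = [1.0 / N] * N                   # uniform quadrature weights


def brute_force_sum(terms, grid, weights):
    """Weighted sum over the full 3D grid: O(N^3) evaluations per term."""
    total = 0.0
    for x, wx in zip(grid, weights):
        for y, wy in zip(grid, weights):
            for z, wz in zip(grid, weights):
                f = sum(a * fx(x) * fy(y) * fz(z) for a, fx, fy, fz in terms)
                total += wx * wy * wz * f
    return total


def separable_sum(terms, grid, weights):
    """Same quantity as a sum over terms of products of 1D sums: O(N) per term."""
    def sum_1d(h):
        return sum(w * h(t) for t, w in zip(grid, weights))
    return sum(a * sum_1d(fx) * sum_1d(fy) * sum_1d(fz)
               for a, fx, fy, fz in terms)


full = brute_force_sum(TERMS, GRID, WEIGHTS)
fast = separable_sum(TERMS, GRID, WEIGHTS)
print(full, fast)
```

Both routines compute the same weighted sum up to floating-point rounding, but the factorised form replaces a triple loop over the grid with three independent one-dimensional sums per term, which is the kind of dimensional reduction that separable methods exploit.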