Normalized to: Ivkin, N.
[1]
oai:arXiv.org:1912.11432 [pdf] - 2020737
Six Dimensional Streaming Algorithm for Cluster Finding in N-Body
Simulations
Submitted: 2019-12-24
Cosmological N-body simulations are crucial for understanding how the
Universe evolves. Studying large-scale distributions of matter in these
simulations and comparing them to observations usually involves detecting dense
clusters of particles called "halos," which are gravitationally bound and
expected to form galaxies. However, traditional cluster finders are
computationally expensive and use massive amounts of memory. Recent work by Liu
et al. (2015) showed the connection between cluster detection and
memory-efficient streaming algorithms and presented a halo finder based on
a heavy-hitter algorithm. Later, Ivkin et al. (2018) improved the
scalability of that streaming halo finder with an efficient GPU
implementation. Both works map particles' positions onto a discrete grid and
therefore discard the remaining information, such as their velocities.
As a result, two halos travelling through each other are indistinguishable in
positional space, even though their velocity distributions can help to
identify such an event, which is worth further study. In this project we
analyze data from the Millennium Simulation Project (Springel et al. 2005) to
motivate including velocity in the streaming method we introduce. We
then demonstrate the proposed method, which finds the same
halos as before while also detecting those that were indistinguishable to
prior methods.
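
The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which heavy-hitter algorithm is used, so the Misra-Gries summary stands in here, and the names `cell_of`, `misra_gries`, the cell sizes, and the toy particle stream are all assumptions made for illustration. The sketch bins each particle into a grid cell, either by position only (3D) or by position and velocity (6D), and keeps only a small fixed number of counters.

```python
from typing import Dict, Iterable, Tuple

Particle = Tuple[float, float, float, float, float, float]  # (x, y, z, vx, vy, vz)
Cell = Tuple[int, ...]


def cell_of(p: Particle, pos_size: float, vel_size: float,
            use_velocity: bool = True) -> Cell:
    """Map a particle to a grid cell: 3D (position) or 6D (position + velocity)."""
    pos = tuple(int(c // pos_size) for c in p[:3])
    if not use_velocity:
        return pos
    vel = tuple(int(c // vel_size) for c in p[3:])
    return pos + vel


def misra_gries(cells: Iterable[Cell], k: int) -> Dict[Cell, int]:
    """One-pass Misra-Gries summary with at most k - 1 counters.

    Any cell holding more than a 1/k fraction of the stream is guaranteed
    to survive; the stored counts are underestimates.
    """
    counters: Dict[Cell, int] = {}
    for cell in cells:
        if cell in counters:
            counters[cell] += 1
        elif len(counters) < k - 1:
            counters[cell] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Toy stream (hypothetical data): two clumps at the same position moving in
# opposite directions, plus sparse background particles.
clump_a = [(1.2, 1.2, 1.2, 120.0, 0.0, 0.0)] * 40
clump_b = [(1.2, 1.2, 1.2, -120.0, 0.0, 0.0)] * 40
background = [(10.0 + i, 0.0, 0.0, 0.0, 0.0, 0.0) for i in range(20)]
stream = clump_a + clump_b + background


def dense_cells(use_velocity: bool) -> set:
    """Heavy cells whose positional part is the clump location (1, 1, 1)."""
    cells = (cell_of(p, 1.0, 50.0, use_velocity) for p in stream)
    summary = misra_gries(cells, k=4)
    return {c for c in summary if c[:3] == (1, 1, 1)}


dense_3d = dense_cells(use_velocity=False)  # the two clumps merge into one cell
dense_6d = dense_cells(use_velocity=True)   # the two clumps fall in separate cells
```

In this toy case the positional grid reports a single dense cell, while the six-dimensional grid separates the two clumps by their velocity bins, which is the distinction the abstract describes.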
[2]
oai:arXiv.org:1711.00975 [pdf] - 1677254
Scalable Streaming Tools for Analyzing $N$-body Simulations: Finding
Halos and Investigating Excursion Sets in One Pass
Submitted: 2017-11-02, last modified: 2018-04-28
Cosmological $N$-body simulations play a vital role in studying models for
the evolution of the Universe. To compare with observations and draw scientific
inferences, statistical analysis of large simulation datasets, e.g., finding
halos or obtaining multi-point correlation functions, is crucial. However,
traditional in-memory methods for these tasks do not scale to the
prohibitively large datasets of modern simulations. Our prior paper proposed
memory-efficient streaming algorithms that can find the largest halos in a
simulation with up to $10^9$ particles on a small server or desktop. However,
this approach fails when scaled directly to larger datasets. This paper
presents a robust streaming tool that leverages state-of-the-art techniques in
GPU acceleration, sampling, and parallel I/O to significantly improve performance
and scalability. Our rigorous analysis of the sketch parameters improves the
previous results from finding the centers of the $10^3$ largest halos to
$\sim 10^4$-$10^5$, and reveals the trade-offs between memory, running time, and number
of halos. Our experiments show that our tool can scale to datasets with up to
$\sim 10^{12}$ particles while using less than an hour of running time on a
single Nvidia GTX 1080 GPU.