Normalized to: Iwasawa, M.
[1]
oai:arXiv.org:2006.16560 [pdf] - 2125009
PeTar: a high-performance N-body code for modeling massive collisional
stellar systems
Submitted: 2020-06-30
The numerical simulations of massive collisional stellar systems, such as
globular clusters (GCs), are very time-consuming. Until now, only a few
realistic million-body simulations of GCs with a small fraction of binaries
(5%) have been performed by using the NBODY6++GPU code. Such models took half a
year of computational time on a GPU-based supercomputer. In this work, we develop
a new N-body code, PeTar, by combining the methods of Barnes-Hut tree, Hermite
integrator and slow-down algorithmic regularization (SDAR). The code can
accurately handle an arbitrary fraction of multiple systems (e.g. binaries,
triples) while keeping a high performance by using the hybrid parallelization
methods with MPI, OpenMP, SIMD instructions and GPU. A few benchmarks indicate
that PeTar and NBODY6++GPU agree very well on the long-term
evolution of the global structure, binary orbits and escapers. On a highly
configured GPU desktop computer, a million-body simulation
with all stars in binaries runs 11 times faster with PeTar than with
NBODY6++GPU. Moreover, on the Cray XC50 supercomputer, PeTar scales well as the
number of cores increases. The ten million-body problem, which covers the region
of ultra compact dwarfs and nuclear star clusters, becomes solvable.
[2]
oai:arXiv.org:1907.02290 [pdf] - 2046222
Accelerated FDPS --- Algorithms to Use Accelerators with FDPS
Submitted: 2019-07-04
In this paper, we describe the algorithms we implemented in FDPS to make
efficient use of accelerator hardware such as GPGPUs. We have developed FDPS to
make it possible for many researchers to develop their own high-performance
parallel particle-based simulation programs without spending a large amount of
time on parallelization and performance tuning. The basic idea of FDPS is to
provide a high-performance implementation of parallel algorithms for
particle-based simulations in a "generic" form, so that researchers can define
their own particle data structure and interparticle interaction functions and
supply them to FDPS. FDPS compiled with the user-supplied data type and interaction
function provides all necessary functions for parallelization, and using those
functions researchers can write their programs as though they were writing
a simple non-parallel program. It has been possible to use accelerators with
FDPS by writing an interaction function that uses the accelerator. However,
the efficiency was limited by the latency and bandwidth of communication
between the CPU and the accelerator and also by the mismatch between the
available degree of parallelism of the interaction function and that of the
hardware parallelism. We have modified the interface of the user-provided
interaction function so that accelerators are used more efficiently. We also
implemented new techniques which reduce the amount of work on the CPU side
and the amount of communication between the CPU and accelerators. We have measured the
performance of N-body simulations on a system with an NVIDIA Volta GPGPU using
FDPS and the achieved performance is around 27\% of the theoretical peak
limit. We have constructed a detailed performance model, and found that the
current implementation can achieve good performance on systems with much
smaller memory and communication bandwidth.
[3]
oai:arXiv.org:1907.02289 [pdf] - 1910783
Implementation and Performance of Barnes-Hut N-body algorithm on
Extreme-scale Heterogeneous Many-core Architectures
Iwasawa, Masaki;
Namekata, Daisuke;
Sakamoto, Ryo;
Nakamura, Takashi;
Kimura, Yasuyuki;
Nitadori, Keigo;
Wang, Long;
Tsubouchi, Miyuki;
Makino, Jun;
Liu, Zhao;
Fu, Haohuan;
Yang, Guangwen
Submitted: 2019-07-04
In this paper, we report the implementation and measured performance of our
extreme-scale global simulation code on Sunway TaihuLight and two PEZY-SC2
systems: Shoubu System B and Gyoukou. The numerical algorithm is the parallel
Barnes-Hut tree algorithm, which has been used in many large-scale
astrophysical particle-based simulations. Our implementation is based on our
FDPS framework. However, the extremely large numbers of cores of the systems
used (10M on TaihuLight and 16M on Gyoukou) and their relatively poor memory
and network bandwidth pose new challenges. We describe the new algorithms
introduced to achieve high efficiency on machines with low memory bandwidth.
The measured performance is 47.9 PF, 10.6 PF, and 1.01 PF on TaihuLight, Gyoukou
and Shoubu System B (efficiency 40\%, 23.5\% and 35.5\%), respectively. The current code is
developed for the simulation of planetary rings, but most of the new algorithms
are useful for other simulations, and are now available in the FDPS framework.
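
As a quick consistency check on the figures quoted above: efficiency is measured performance divided by theoretical peak, so each (performance, efficiency) pair implies a peak value. The sketch below only rearranges the numbers given in the abstract. The function and variable names are illustrative, and the implied peaks inherit the rounding of the quoted percentages, so they should be read as rough values rather than official machine specifications.

```python
def implied_peak_pf(measured_pf: float, efficiency_percent: float) -> float:
    """Return the theoretical peak (in PF) implied by a measured
    performance and an efficiency given in percent."""
    return measured_pf / (efficiency_percent / 100.0)


# Values as quoted in the abstract: (measured PF, efficiency in percent).
systems = {
    "TaihuLight": (47.9, 40.0),
    "Gyoukou": (10.6, 23.5),
    "Shoubu System B": (1.01, 35.5),
}

# Implied peak for each system, keyed by system name.
implied = {name: implied_peak_pf(pf, eff) for name, (pf, eff) in systems.items()}

for name, peak in implied.items():
    print(f"{name}: implied peak of roughly {peak:.1f} PF")
```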
[4]
oai:arXiv.org:1903.03138 [pdf] - 1868141
A Mean-Field Approach to Simulating the Merging of Collisionless Stellar
Systems Using a Particle-Based Method
Submitted: 2019-03-07
We present a mean-field approach to simulating merging processes of two
spherical collisionless stellar systems. This approach is realized with a
self-consistent field (SCF) method in which the full spatial dependence of the
density and potential of a system is expanded in a set of basis functions for
solving Poisson's equation. In order to apply this SCF method to a merging
situation where two systems are moving in space, we assign the expansion center
to the center of mass of each system, the position of which is followed by a
mass-less particle placed at that position initially. Merging simulations over
a wide range of impact parameters are performed using both an SCF code
developed here and a tree code. The results of each simulation produced by the
two codes show excellent agreement in the evolving morphology of the merging
systems and in the density and velocity dispersion profiles of the merged
systems. However, comparing the results generated by the tree code to those
obtained with the softening-free SCF code, we have found that in large impact
parameter cases, a softening length of the Plummer type introduced in the tree
code has the effect of advancing the orbital phase of the two systems in the
merging process at late times. We demonstrate that the faster orbital phase
originates from the larger convergence length to the pure Newtonian force.
Other application problems suitable to the current SCF code are also discussed.
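
To make the remark about convergence to the Newtonian force concrete, the following minimal sketch evaluates the standard Plummer-softened pair force relative to the unsoftened one, F_soft / F_Newton = (1 + (eps/r)^2)^(-3/2). This is the textbook form of Plummer softening, not code from the paper; the function name and the sample separations are assumptions chosen for illustration.

```python
def plummer_to_newtonian_ratio(r: float, eps: float) -> float:
    """Ratio of the Plummer-softened to the Newtonian force magnitude
    for two point masses at separation r with softening length eps."""
    return (1.0 + (eps / r) ** 2) ** -1.5


eps = 1.0  # softening length in arbitrary units (illustrative value)
for r in (1.0, 3.0, 10.0, 30.0):
    ratio = plummer_to_newtonian_ratio(r, eps)
    print(f"r/eps = {r / eps:5.1f}   F_soft/F_Newton = {ratio:.4f}")
```

The ratio approaches unity only slowly with separation, so the softened force remains slightly weaker than the Newtonian one out to many softening lengths.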
[5]
oai:arXiv.org:1810.11970 [pdf] - 1774889
PENTACLE: Parallelized Particle-Particle Particle-Tree Code for Planet
Formation
Submitted: 2018-10-29
We have newly developed a Parallelized Particle-Particle Particle-tree code
for Planet formation, PENTACLE, which is a parallelized hybrid $N$-body
integrator executed on a CPU-based (super)computer. PENTACLE uses a 4th-order
Hermite algorithm to calculate gravitational interactions between particles
within a cutoff radius and a Barnes-Hut tree method for gravity from particles
beyond. It also uses an open-source library designed for fully automatic
parallelization of particle simulations, FDPS (Framework for Developing
Particle Simulator), to parallelize a Barnes-Hut tree algorithm for a
distributed-memory supercomputer. These allow us to handle $1-10$ million
particles in a high-resolution $N$-body simulation on CPU clusters for
collisional dynamics, including physical collisions in a planetesimal disc. In
this paper, we show the performance and the accuracy of PENTACLE in terms of
$\tilde{R}_{\rm cut}$ and a time-step $\Delta t$. It turns out that the
accuracy of a hybrid $N$-body simulation is controlled through $\Delta t /
\tilde{R}_{\rm cut}$ and $\Delta t / \tilde{R}_{\rm cut} \sim 0.1$ is necessary
to accurately simulate the accretion process of a planet for $\geq 10^6$ years. For
all those who are interested in large-scale particle simulations, PENTACLE
customized for planet formation will be freely available from
https://github.com/PENTACLE-Team/PENTACLE under the MIT license.
[6]
oai:arXiv.org:1804.08935 [pdf] - 1705276
Fortran interface layer of the framework for developing particle
simulator FDPS
Submitted: 2018-04-24, last modified: 2018-04-25
Numerical simulations based on particle methods have been widely used in
various fields including astrophysics. To date, simulation software has been
developed by individual researchers or research groups in each field, with a
huge amount of time and effort, even though the numerical algorithms used are very
similar. To improve the situation, we have developed a framework, called FDPS,
which enables researchers to easily develop massively parallel particle
simulation codes for arbitrary particle methods. Until version 3.0, FDPS
provided an API only for the C++ programming language. This limitation comes from the
fact that FDPS is developed using the template feature in C++, which is
essential to support arbitrary particle data types. However, there are many
researchers who use Fortran to develop their codes. Thus, the previous versions
of FDPS require such people to invest much time to learn C++. This is
inefficient. To cope with this problem, we have developed a Fortran interface
layer in FDPS, which provides an API for Fortran. In order to support arbitrary
particle data types in Fortran, we design the Fortran interface layer as
follows. Based on a given derived data type in Fortran representing a particle, a
Python script provided by us automatically generates a library that manipulates
the C++ core part of FDPS. This library is seen as a Fortran module providing
API of FDPS from the Fortran side and uses C programs internally to
interoperate Fortran with C++. In this way, we have overcome several technical
issues when emulating `template' in Fortran. By using the Fortran interface,
users can develop all parts of their codes in Fortran. We show that the
overhead of the Fortran interface part is sufficiently small and a code written
in Fortran shows a performance practically identical to the one written in C++.
[7]
oai:arXiv.org:1612.06984 [pdf] - 1580972
Unconvergence of Very Large Scale GI Simulations
Submitted: 2016-12-21
The giant impact (GI) is one of the most important hypotheses both in
planetary science and geoscience, since it is related to the origin of the Moon
and also the initial condition of the Earth. A number of numerical simulations
have been done using the smoothed particle hydrodynamics (SPH) method. However,
the GI hypothesis is currently in crisis. The "canonical" GI scenario failed to
explain the identical isotope ratio between the Earth and the Moon. On the
other hand, little is known about the reliability of the results of GI
simulations. In this paper, we discuss the effect of the resolution on the
results of the GI simulations by varying the number of particles from $3
\times10^3$ to $10^8$. We found that the results do not converge, but show
oscillatory behaviour. We discuss the origin of this oscillatory behaviour.
[8]
oai:arXiv.org:1601.03138 [pdf] - 1422207
Implementation and performance of FDPS: A Framework Developing Parallel
Particle Simulation Codes
Submitted: 2016-01-13, last modified: 2016-04-24
We present the basic idea, implementation, measured performance and
performance model of FDPS (Framework for developing particle simulators). FDPS
is an application-development framework which helps researchers to develop
particle-based simulation programs for large-scale distributed-memory parallel
supercomputers. A particle-based simulation program for distributed-memory
parallel computers needs to perform domain decomposition, redistribution of
particles, and gathering of particle information for interaction calculation.
Also, even if distributed-memory parallel computers are not used, in order to
reduce the amount of computation, algorithms such as the Barnes-Hut tree method
should be used for long-range interactions. For short-range interactions, some
methods to limit the calculation to neighbor particles are necessary. FDPS
provides all of these necessary functions for efficient parallel execution of
particle-based simulations as "templates", which are independent of the actual
data structure of particles and the functional form of the interaction. By
using FDPS, researchers can write their programs with the amount of work
necessary to write a simple, sequential and unoptimized program of O(N^2)
calculation cost, and yet the program, once compiled with FDPS, will run
efficiently on large-scale parallel supercomputers. A simple gravitational
N-body program can be written in around 120 lines. We report the actual
performance of these programs and the performance model. The weak scaling
performance is very good, and almost linear speedup was obtained for up to the
full system of the K computer. The minimum calculation time per timestep is in the
range of 30 ms (N=10^7) to 300 ms (N=10^9). These are currently limited by the
time for the calculation of the domain decomposition and communication
necessary for the interaction calculation. We discuss how we can overcome these
bottlenecks.
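
For orientation, the kind of "simple, sequential and unoptimized program of O(N^2) calculation cost" mentioned above can be sketched as a direct-summation gravity kernel. This is plain Python written for illustration and does not use or imitate the FDPS API; the function name, argument names, and the optional softening parameter are assumptions.

```python
import math


def direct_accelerations(pos, mass, eps=0.0, G=1.0):
    """Gravitational accelerations by O(N^2) direct summation.

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    eps  : Plummer softening length (0 gives the pure Newtonian force)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
    return acc
```

A tree method replaces the inner loop over all j with a walk over grouped, distant particles, which is the part a framework of this kind supplies on the user's behalf.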
[9]
oai:arXiv.org:1506.04553 [pdf] - 1176045
GPU-Enabled Particle-Particle Particle-Tree Scheme for Simulating Dense
Stellar Cluster System
Submitted: 2015-06-15
We describe the implementation and performance of the ${\rm P^3T}$
(Particle-Particle Particle-Tree) scheme for simulating dense stellar systems.
In ${\rm P^3T}$, the force experienced by a particle is split into short-range
and long-range contributions. Short-range forces are evaluated by direct
summation and integrated with the fourth-order Hermite predictor-corrector
method with block timesteps. For long-range forces, we use a combination of
the Barnes-Hut tree code and the leapfrog integrator. The tree part of our
simulation environment is accelerated using graphics processing units (GPUs),
whereas the direct summation is carried out on the host CPU. Our code gives
excellent performance and accuracy for star cluster simulations with a large
number of particles even when the core size of the star cluster is small.
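
The short-range/long-range split described above can be illustrated with a smooth changeover function K(r) that rises from 0 to 1 between an inner radius and a cutoff radius. The sketch below is a simplification made for brevity: it splits the pair force magnitude directly and uses a cubic smoothstep for K(r), which is an assumption and not the cutoff function of the paper. The names `changeover`, `split_pair_force`, `r_in`, and `r_cut` are hypothetical.

```python
def changeover(r, r_in, r_cut):
    """Smooth switch K(r): 0 for r <= r_in, 1 for r >= r_cut.
    A cubic smoothstep is used here purely as an illustrative choice."""
    if r <= r_in:
        return 0.0
    if r >= r_cut:
        return 1.0
    x = (r - r_in) / (r_cut - r_in)
    return x * x * (3.0 - 2.0 * x)


def split_pair_force(r, m1, m2, r_in, r_cut, G=1.0):
    """Return (short_range, long_range) magnitudes of the pair force at
    separation r. The two parts always sum to the full Newtonian force."""
    total = G * m1 * m2 / (r * r)
    k = changeover(r, r_in, r_cut)
    return (1.0 - k) * total, k * total
```

With such a split, the short-range part vanishes beyond `r_cut`, so only close pairs need direct summation with small timesteps, while the smooth long-range part can be handed to a tree code with a larger shared timestep.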
[10]
oai:arXiv.org:1011.4017 [pdf] - 1042028
Eccentric evolution of SMBH binaries
Submitted: 2010-11-17
In recent numerical simulations \citep{matsubayashi07,lockmann08}, it has
been found that the eccentricity of supermassive black hole (SMBH) -
intermediate-mass black hole (IMBH) binaries grows toward unity through interactions
with the stellar background. This increase of eccentricity reduces the merging
timescale of the binary through gravitational radiation to a value well
below the Hubble time. It also gives a theoretical explanation for the
existence of eccentric binaries such as that in OJ287 \citep{lehto96,
valtonen08}. In self-consistent N-body simulations, this increase of
eccentricity is always observed. On the other hand, the results of scattering
experiments between SMBH binaries and field stars \citep{quinlan96} indicated no
increase of eccentricity. This discrepancy leaves the high eccentricity of the
SMBH binaries in $N$-body simulations unexplained. Here we present a
stellar-dynamical mechanism that drives the increase of the eccentricity of an
SMBH binary with large mass ratio. There are two key processes involved. The
first one is the Kozai mechanism under a non-axisymmetric potential, which
effectively randomizes the angular momenta of surrounding stars. The other is
the selective ejection of stars with prograde orbits. Through these two
mechanisms, field stars extract the orbital angular momentum of the SMBH
binary. Our proposed mechanism causes an increase in the eccentricity of most
SMBH binaries, resulting in rapid mergers through gravitational wave
radiation. Our result has given a definite solution to the "last-parsec
problem".
[11]
oai:arXiv.org:1003.4125 [pdf] - 1025850
The origin of S-stars and a young stellar disk: distribution of debris
stars of a sinking star cluster
Submitted: 2010-03-22, last modified: 2010-05-21
Within the distance of 1 pc from the Galactic center (GC), more than 100
young massive stars have been found. The massive stars at 0.1-1 pc from the GC
are located in one or two disks, while those within 0.1 pc from the GC,
S-stars, have an isotropic distribution. How these stars are formed is not well
understood, especially for S-stars. Here we propose that a young star cluster
with an intermediate-mass black hole (IMBH) can form both the disks and
S-stars. We performed a fully self-consistent $N$-body simulation of a star
cluster near the GC. Stars that escaped from the tidally disrupted star cluster were
carried to the GC due to a 1:1 mean motion resonance with the IMBH formed in
the cluster. In the final phase of the evolution, the eccentricity of the IMBH
becomes very high. In this phase, stars carried by the 1:1 resonance with the
IMBH were dropped from the resonance and their orbits were randomized by a
chaotic Kozai mechanism. The mass function of these carried stars is extremely
top-heavy within 10". The surface density distribution of young massive stars
has a slope of -1.5 within 10" from the GC. The distribution of stars in the
most central region is isotropic. These characteristics agree well with those
of stars observed within 10" from the GC.
[12]
oai:arXiv.org:0807.2818 [pdf] - 314996
Trojan Stars in the Galactic Center
Submitted: 2008-07-17, last modified: 2009-01-15
We performed, for the first time, a simulation of the spiral-in of a star
cluster formed close to the Galactic center (GC) using a fully self-consistent
$N$-body model. In our model, the central super-massive black hole (SMBH) is
surrounded by stars and the star cluster. Not only are the orbits of stars and
the cluster stars integrated self-consistently, but the stellar evolution,
collisions and merging of the cluster stars are also included. We found that an
intermediate-mass black hole (IMBH) is formed in the star cluster and stars
that escaped from the cluster are captured into a 1:1 mean motion resonance with the
IMBH. These "Trojan" stars are brought close to the SMBH by the IMBH, which
spirals into the GC due to dynamical friction. Our results show that, once
the IMBH is formed, it brings the massive stars to the vicinity of the central
SMBH even after the star cluster itself is disrupted. Stars carried by the IMBH
form a disk similar to the observed disks and the core of the cluster including
the IMBH has properties similar to those of IRS13E, which is a compact assembly
of several young stars.
[13]
oai:arXiv.org:0708.3719 [pdf] - 4313
Evolution of Star Clusters near the Galactic Center: Fully
Self-consistent N-body Simulations
Submitted: 2007-08-28, last modified: 2008-07-08
We have performed fully self-consistent $N$-body simulations of star clusters
near the Galactic center (GC). Such simulations have not been performed because
it is difficult to perform fast and accurate simulations of such systems using
conventional methods. We used the Bridge code, which integrates the parent
galaxy using the tree algorithm and the star cluster using the fourth-order
Hermite scheme with individual timesteps. The interaction between the parent
galaxy and the star cluster is calculated with the tree algorithm. Therefore,
the Bridge code can handle both the orbital and internal evolutions of star
clusters correctly at the same time. We investigated the evolution of star
clusters using the Bridge code and compared the results with previous studies.
We found that 1) the inspiral timescale of the star clusters is shorter than
that obtained with "traditional" simulations, in which the orbital evolution of
star clusters is calculated analytically using the dynamical friction formula
and 2) the core collapse of the star cluster increases the core density and
helps the cluster survive. The initial conditions of star clusters are not as
severe as previously suggested.
[14]
oai:arXiv.org:0801.0859 [pdf] - 8675
Evolution of Massive Blackhole Triples II -- The effect of the BH
triples dynamics on the structure of the galactic nuclei
Submitted: 2008-01-06
In this paper, we investigate the structures of galaxies which either have or
have had three BHs using $N$-body simulations, and compare them with those of
galaxies with binary BHs. We found that the cusp region of a galaxy which has
(or had) triple BHs is significantly larger and less dense than that of a
galaxy with binary BHs of the same mass. Moreover, the size of the cusp region
depends strongly on the evolution history of triple BHs, while in the case of
binary BHs, the size of the cusp is determined by the mass of the BHs. In
galaxies which have (or had) three BHs, there is a region with significant
radial velocity anisotropy, while such a region is not observed in galaxies
with binary BHs. These differences come from the fact that with triple BHs the
energy deposited in the central region of the galaxy can be much larger due to
multiple binary-single BH scatterings. Our result suggests that we can
discriminate galaxies which experienced triple BH interactions from
those which did not, through observable signatures such as the cusp size
and velocity anisotropy.
[15]
oai:arXiv.org:0706.2059 [pdf] - 1000421
BRIDGE: A Direct-tree Hybrid N-body Algorithm for Fully Self-consistent
Simulations of Star Clusters and their Parent Galaxies
Submitted: 2007-06-14, last modified: 2007-07-27
We developed a new direct-tree hybrid N-body algorithm for fully
self-consistent N-body simulations of star clusters in their parent galaxies.
In such simulations, star clusters need high accuracy, while galaxies need a
fast scheme because of the large number of particles required to model them.
In our new algorithm, the internal motion of the star cluster is calculated
accurately using the direct Hermite scheme with individual timesteps and all
other motions are calculated using the tree code with a second-order leapfrog
integrator. The direct and tree schemes are combined using an extension of the
mixed variable symplectic (MVS) scheme. Thus, the Hamiltonian corresponding to
everything other than the internal motion of the star cluster is integrated
with the leapfrog, which is symplectic. Using this algorithm, we performed
fully self-consistent N-body simulations of star clusters in their parent
galaxy. The internal and orbital evolutions of the star cluster agreed well
with those obtained using the direct scheme. We also performed fully
self-consistent N-body simulations for large-N models ($N=2\times 10^6$). In
this case, the calculation was seven times faster than it would have been if
the direct scheme had been used.
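
The splitting described above can be sketched as a kick-evolve-kick step: a half-step velocity kick from all interactions other than the cluster's internal ones, a full-step evolution of the cluster's internal motion (plus a free drift of the galaxy particles), and a second half-step kick. The sketch below is a toy under stated assumptions, not the BRIDGE implementation: direct summation stands in for the tree code, a sub-stepped leapfrog stands in for the Hermite scheme with individual timesteps, G = 1, and all function and parameter names are hypothetical.

```python
def pair_acc(pos, mass, targets, sources, eps2):
    """Softened accelerations on `targets` due to `sources` (index lists)."""
    acc = {i: [0.0, 0.0, 0.0] for i in targets}
    for i in targets:
        for j in sources:
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps2
            w = mass[j] / (r2 * r2 ** 0.5)
            for k in range(3):
                acc[i][k] += w * d[k]
    return acc


def external_kick(pos, vel, mass, cluster, galaxy, dt, eps2):
    """Kick with every interaction except cluster-cluster ones
    (direct summation here stands in for the tree code)."""
    acc = pair_acc(pos, mass, galaxy, cluster + galaxy, eps2)
    acc.update(pair_acc(pos, mass, cluster, galaxy, eps2))
    for i, a in acc.items():
        for k in range(3):
            vel[i][k] += dt * a[k]


def internal_evolve(pos, vel, mass, cluster, galaxy, dt, n_sub, eps2):
    """Drift galaxy particles; integrate the cluster's internal motion with
    n_sub leapfrog sub-steps (a stand-in for the Hermite scheme)."""
    for i in galaxy:
        for k in range(3):
            pos[i][k] += dt * vel[i][k]
    h = dt / n_sub
    for _ in range(n_sub):
        a = pair_acc(pos, mass, cluster, cluster, eps2)
        for i in cluster:
            for k in range(3):
                vel[i][k] += 0.5 * h * a[i][k]
                pos[i][k] += h * vel[i][k]
        a = pair_acc(pos, mass, cluster, cluster, eps2)
        for i in cluster:
            for k in range(3):
                vel[i][k] += 0.5 * h * a[i][k]


def bridge_step(pos, vel, mass, cluster, galaxy, dt, n_sub, eps2=1e-4):
    """One kick-evolve-kick step of the hybrid scheme sketched above."""
    external_kick(pos, vel, mass, cluster, galaxy, 0.5 * dt, eps2)
    internal_evolve(pos, vel, mass, cluster, galaxy, dt, n_sub, eps2)
    external_kick(pos, vel, mass, cluster, galaxy, 0.5 * dt, eps2)
```

Because every pairwise force is applied symmetrically and the step is a symmetric composition, this toy version conserves total momentum to round-off and is time-reversible, which are the structural properties the leapfrog-based splitting is meant to preserve.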
[16]
oai:arXiv.org:astro-ph/0511391 [pdf] - 77749
Evolution of Massive Blackhole Triples I -- Equal-mass binary-single
systems
Submitted: 2005-11-14
We present the results of $N$-body simulations of the dynamical evolution of
triple massive blackhole (BH) systems in galactic nuclei. We found that in most
cases two of the three BHs merge through gravitational wave (GW) radiation on
a timescale much shorter than the Hubble time, before ejecting one BH through
a slingshot. In order for a binary BH to merge before ejecting the third
one, it has to become highly eccentric since the gravitational wave timescale
would be much longer than the Hubble time unless the eccentricity is very high.
We found that two mechanisms drive the increase of the eccentricity of the
binary. One is the strong binary-single BH interaction resulting in the
thermalization of the eccentricity. The second is the Kozai mechanism which
drives the cyclic change of the inclination and eccentricity of the inner
binary of a stable hierarchical triple system. Our result implies that many
supermassive blackholes are binaries.
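
The dependence of the gravitational wave timescale on eccentricity noted above can be illustrated with the standard Peters (1964) estimates: the merger time of a circular binary, and the approximation for eccentricities close to unity, which scales as (1 - e^2)^(7/2). The sketch below is not taken from the paper; the function names, the rounded physical constants, and the example parameters in the usage note are assumptions for illustration.

```python
G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8       # speed of light, m/s
MSUN = 1.989e30   # solar mass, kg
PC = 3.086e16     # parsec, m
YR = 3.156e7      # year, s


def circular_merger_time_yr(m1_msun, m2_msun, a_pc):
    """Peters (1964) merger time, in years, of a circular binary with
    component masses in solar masses and semi-major axis in parsecs."""
    m1, m2, a = m1_msun * MSUN, m2_msun * MSUN, a_pc * PC
    t = 5.0 * C ** 5 * a ** 4 / (256.0 * G ** 3 * m1 * m2 * (m1 + m2))
    return t / YR


def high_e_merger_time_yr(m1_msun, m2_msun, a_pc, e):
    """Peters' approximation for e close to 1, in years.
    It is not accurate for small eccentricities."""
    t_circ = circular_merger_time_yr(m1_msun, m2_msun, a_pc)
    return (768.0 / 425.0) * (1.0 - e * e) ** 3.5 * t_circ
```

As a usage example with purely illustrative values, comparing `circular_merger_time_yr(1e8, 1e8, 1.0)` with `high_e_merger_time_yr(1e8, 1e8, 1.0, 0.99)` shows the timescale dropping by several orders of magnitude at high eccentricity for the same masses and semi-major axis.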