Normalized to: Abe, G.
[1]
oai:arXiv.org:astro-ph/0606105 [pdf] - 82545
High-Performance Small-Scale Simulation of Star Clusters Evolution on
Cray XD1
Submitted: 2006-06-06, last modified: 2006-06-07
In this paper, we describe the performance of an $N$-body simulation of a star
cluster with 64k stars on a Cray XD1 system with 400 dual-core Opteron
processors. A number of astrophysical $N$-body simulations have been reported at
SCxy conferences. All previous entries for Gordon-Bell prizes used at least
700k particles. The reason for this preference for large numbers of particles is
parallel efficiency: it is very difficult to achieve high performance on
large parallel machines if the number of particles is small. However, for many
scientifically important problems the calculation cost scales as $O(N^{3.3})$,
and it is very important to use large machines for a relatively small number of
particles. We achieved 2.03 Tflops, or 57.7% of the theoretical peak
performance, using a direct $O(N^2)$ calculation with the individual timestep
algorithm, on 64k particles. The best efficiency previously reported for a similar
calculation with 64k or fewer particles is 12% (9 Gflops) on a Cray
T3E-600 with 128 processors. Our implementation is based on a highly scalable
two-dimensional parallelization scheme, and the low-latency communication network
of the Cray XD1 turned out to be essential to achieving this level of performance.
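
To make the "direct $O(N^2)$ calculation" concrete, the following is a minimal serial sketch of pairwise gravitational summation. It is not the authors' implementation: it omits the individual timestep algorithm and the two-dimensional parallelization entirely. The function name `direct_accelerations`, the softening length `eps`, and the unit choice `G = 1` are assumptions made for illustration.

```python
import math


def direct_accelerations(pos, mass, eps=1e-3, G=1.0):
    """Accelerations from direct pairwise summation (O(N^2) work).

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    eps  : softening length (an illustrative assumption, not from the abstract)
    G    : gravitational constant in code units (assumed to be 1)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            # Softened squared separation avoids division by zero
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            f = G * mass[j] / (r2 * math.sqrt(r2))
            acc[i][0] += f * dx
            acc[i][1] += f * dy
            acc[i][2] += f * dz
    return acc
```

Each of the $N$ particles sums over the other $N-1$, so the cost per full force evaluation grows as $N^2$. For example, `direct_accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0], eps=0.0)` gives two unit-mass particles equal and opposite accelerations along the x axis.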