Normalized to: Belleman, R.
[1]
oai:arXiv.org:0707.0438 [pdf] - 2763
High Performance Direct Gravitational N-body Simulations on Graphics
Processing Units -- II: An implementation in CUDA
Submitted: 2007-07-03, last modified: 2007-07-16
We present the results of gravitational direct $N$-body simulations using the
Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed
for gaming computers. The force evaluation of the $N$-body problem is
implemented in ``Compute Unified Device Architecture'' (CUDA) using the GPU to
speed up the calculations. We tested the implementation on three different
$N$-body codes: two direct $N$-body integration codes, using the 4th-order
predictor-corrector Hermite integrator with block time steps, and one
Barnes-Hut treecode, which uses a 2nd-order leapfrog integration scheme. The
integration of the equations of motion for all codes is performed on the host
CPU.
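As a rough illustration of the force evaluation that is offloaded to the GPU, the following is a minimal host-side sketch of direct $O(N^2)$ summation with Plummer-type softening. It is not the authors' CUDA kernel; the function name `accelerations`, the softening length `eps`, and the choice of units with $G = 1$ are assumptions made for this example only.

```python
import math

def accelerations(pos, mass, eps=0.0, G=1.0):
    """Direct-summation gravitational accelerations (illustrative sketch).

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    eps  : softening length (assumed Plummer form: r^2 -> r^2 + eps^2)
    G    : gravitational constant (1.0 in assumed N-body units)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] * dx[0] + dx[1] * dx[1] + dx[2] * dx[2] + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
    return acc
```

Each particle's acceleration depends only on the positions and masses of the others, so the outer loop over `i` is the part that maps naturally onto many parallel GPU threads, while the integrator stays on the host.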
We find that for $N > 512$ particles the GPU outperforms the GRAPE-6Af, if
some softening in the force calculation is accepted. Without softening and for
very small integration time steps the GRAPE still outperforms the GPU. We
conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special
purpose hardware. Using the same time-step criterion, the total energy of the
$N$-body system was conserved to better than one part in $10^6$ on the GPU,
only about an order of magnitude worse than obtained with GRAPE-6Af. For
$N \gtrsim 10^5$ the 8800GTX outperforms the host CPU by a factor of about 100
and runs at about the same speed as the GRAPE-6Af.
[2]
oai:arXiv.org:astro-ph/0702058 [pdf] - 89112
High Performance Direct Gravitational N-body Simulations on Graphics
Processing Unit I: An implementation in Cg
Submitted: 2007-02-02, last modified: 2007-04-23
We present the results of gravitational direct $N$-body simulations using the
commercial graphics processing units (GPU) NVIDIA Quadro FX1400 and GeForce
8800GTX, and compare the results with GRAPE-6Af special purpose hardware. The
force evaluation of the $N$-body problem was implemented in Cg using the GPU
directly to speed up the calculations. The integration of the equations of
motion, running on the host computer, was implemented in C using the 4th-order
predictor-corrector Hermite integrator with block time steps. We find
that for a large number of particles ($N \gtrsim 10^4$) modern graphics
processing units offer an attractive low cost alternative to GRAPE special
purpose hardware. A modern GPU continues to give a relatively flat scaling with
the number of particles, comparable to that of the GRAPE. The GRAPE is designed
to reach double precision, whereas the GPU is intrinsically single-precision.
For relatively large time steps, the total energy of the $N$-body system was
conserved to better than one part in $10^6$ on the GPU, which is impressive
given the single-precision nature of the GPU. For the same time steps, the
GRAPE gave somewhat more accurate results, by about an order of magnitude.
However, smaller time steps allowed more energy accuracy on the GRAPE, around
$10^{-11}$, whereas on the GPU the accuracy saturates at machine precision,
around $10^{-6}$. For $N \gtrsim 10^6$ the GeForce 8800GTX was about 20 times
faster than the host computer. Though still a factor of a few slower than
GRAPE, modern GPUs outperform GRAPE in their low cost, long mean time between
failures, and much larger onboard memory; the GRAPE-6Af holds at most 256k
particles whereas the GeForce 8800GTX can hold 9 million particles in memory.
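The single-precision limit mentioned above can be illustrated on the host by rounding every intermediate result to IEEE 754 binary32. The sketch below is an assumption-laden illustration, not a model of the GPU's actual arithmetic: the helper names `f32` and `sum_f32` are hypothetical, and it only shows that increments much smaller than about $10^{-7}$ relative to the running value are lost, which is one plausible reason an energy error cannot be pushed far below roughly $10^{-6}$ to $10^{-7}$ by shrinking the time step.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate values while rounding every partial sum to binary32."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

# Spacing of binary32 numbers just above 1.0 is 2**-23 (about 1.2e-7).
resolved = f32(1.0 + 2.0 ** -23) != 1.0   # increment of one spacing survives
lost = f32(1.0 + 2.0 ** -25) == 1.0       # increment below half a spacing is lost

# Rounding errors also accumulate over many additions.
approx = sum_f32([0.1] * 1000)
relative_error = abs(approx - 100.0) / 100.0
```

Here `relative_error` is small but nonzero in general, and it grows with the number of accumulated terms, which is why the order of summation and the number of force contributions matter more in single precision than in double.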