Normalized to: Groen, D.
[1]
oai:arXiv.org:1507.01138 [pdf] - 1241861
From Thread to Transcontinental Computer: Disturbing Lessons in
Distributed Supercomputing
Submitted: 2015-07-04
We describe the political and technical complications encountered during the
astronomical CosmoGrid project. CosmoGrid is a numerical study on the formation
of large scale structure in the universe. The simulations are challenging due
to the enormous dynamic range in spatial and temporal coordinates, as well as
the enormous computer resources required. In CosmoGrid we dealt with the
computational requirements by connecting up to four supercomputers via an
optical network and making them operate as a single machine. This was
challenging, if only for the fact that the supercomputers of our choice are
separated by half the planet, as three of them are scattered across
Europe and the fourth is in Tokyo. The co-scheduling of multiple computers and
the 'gridification' of the code enabled us to achieve an efficiency of up to
$93\%$ for this distributed intercontinental supercomputer. In this work, we
find that high-performance computing on a grid can be done much more
effectively if the sites involved are willing to be flexible about their user
policies, and that having facilities to provide such flexibility could be key
to strengthening the position of the HPC community in an increasingly
Cloud-dominated computing landscape. Given that smaller computer clusters owned
by research groups or university departments usually have flexible user
policies, we argue that it could be easier to instead realize distributed
supercomputing by combining tens, hundreds or even thousands of these
resources.
[2]
oai:arXiv.org:1101.2020 [pdf] - 648806
The Cosmogrid Simulation: Statistical Properties of Small Dark Matter
Halos
Ishiyama, Tomoaki;
Rieder, Steven;
Makino, Junichiro;
Zwart, Simon Portegies;
Groen, Derek;
Nitadori, Keigo;
de Laat, Cees;
McMillan, Stephen;
Hiraki, Kei;
Harfst, Stefan
Submitted: 2011-01-10, last modified: 2013-04-08
We present the results of the "Cosmogrid" cosmological N-body simulation
suites based on the concordance LCDM model. The Cosmogrid simulation was
performed in a 30Mpc box with 2048^3 particles. The mass of each particle is
1.28x10^5 Msun, which is sufficient to resolve ultra-faint dwarfs. We found
that the halo mass function shows good agreement with the Sheth & Tormen
fitting function down to ~10^7 Msun. We have analyzed the spherically averaged
density profiles of the three most massive halos which are of galaxy group size
and contain at least 170 million particles. The slopes of these density
profiles become shallower than -1 at the innermost radius. We also find a
clear correlation of halo concentration with mass. The mass dependence of the
concentration parameter cannot be expressed by a single power law; however, a
simple model based on the Press-Schechter theory proposed by Navarro et al.
gives reasonable agreement with this dependence. The spin parameter does not
show a correlation with the halo mass. The probability distribution functions
for both concentration and spin are well fitted by the log-normal distribution
for halos with masses larger than ~10^8 Msun. The subhalo abundance depends
on the halo mass. Galaxy-sized halos have 50% more subhalos than ~10^{11} Msun
halos have.
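The quoted particle mass can be cross-checked from the box size and the particle count. The Python sketch below does this for a box filled at the mean matter density. The cosmological parameters (Omega_m = 0.3, h = 0.7) are assumptions that are not stated in the abstract, and the function and constant names are purely illustrative.

```python
# Critical density of the universe in units of h^2 Msun / Mpc^3.
RHO_CRIT_H2 = 2.775e11

def particle_mass(box_mpc, n_per_dim, omega_m=0.3, h=0.7):
    """Mass per particle (Msun) for a box filled at the mean matter density.

    omega_m and h are ASSUMED values, not taken from the abstract.
    """
    mean_density = omega_m * RHO_CRIT_H2 * h**2   # Msun / Mpc^3
    total_mass = mean_density * box_mpc**3        # Msun in the whole box
    return total_mass / n_per_dim**3

m_p = particle_mass(box_mpc=30.0, n_per_dim=2048)
print(f"particle mass ~ {m_p:.3g} Msun")
# Mass of a halo resolved with 170 million particles.
print(f"170M-particle halo ~ {170e6 * m_p:.3g} Msun")
```

Under these assumed parameters the particle mass comes out at about 1.28x10^5 Msun, consistent with the abstract, and a halo of 170 million particles weighs roughly 2x10^13 Msun, which is galaxy group scale.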
[3]
oai:arXiv.org:1109.5559 [pdf] - 416352
High performance cosmological simulations on a grid of supercomputers
Submitted: 2011-09-26
We present results from our cosmological N-body simulation which consisted of
2048x2048x2048 particles and ran distributed across three supercomputers
throughout Europe. The run, which was performed as the concluding phase of the
Gravitational Billion Body Problem DEISA project, integrated a 30 Mpc box of
dark matter using an optimized Tree/Particle Mesh N-body integrator. We ran the
simulation up to the present day (z=0), and obtained an efficiency of about
0.93 over 2048 cores compared to a single supercomputer run. In addition, we
share our experiences on using multiple supercomputers for high performance
computing and provide several recommendations for future projects.
[4]
oai:arXiv.org:1101.0605 [pdf] - 294823
High Performance Gravitational N-body Simulations on a Planet-wide
Distributed Supercomputer
Submitted: 2011-01-03
We report on the performance of our cold dark matter cosmological N-body
simulation which was carried out concurrently using supercomputers across the
globe. We ran simulations on 60 to 750 cores distributed over a variety of
supercomputers in Amsterdam (the Netherlands, Europe), in Tokyo (Japan, Asia),
Edinburgh (UK, Europe) and Espoo (Finland, Europe). Despite the network
latency of 0.32 seconds and the communication over 30,000 km of optical network
cable, we are able to achieve about 87% of the performance compared to an equal
number of cores on a single supercomputer. We argue that using widely
distributed supercomputers in order to acquire more compute power is
technically feasible, and that the largest obstacle is introduced by local
scheduling and reservation policies.
[5]
oai:arXiv.org:1001.0773 [pdf] - 1019013
Simulating the universe on an intercontinental grid of supercomputers
Zwart, Simon Portegies;
Ishiyama, Tomoaki;
Groen, Derek;
Nitadori, Keigo;
Makino, Junichiro;
de Laat, Cees;
McMillan, Stephen;
Hiraki, Kei;
Harfst, Stefan;
Grosso, Paola
Submitted: 2010-01-05
Understanding the universe is hampered by the elusiveness of its most common
constituent, cold dark matter. Almost impossible to observe, dark matter can be
studied effectively by means of simulation and there is probably no other
research field where simulation has led to so much progress in the last decade.
Cosmological N-body simulations are an essential tool for evolving density
perturbations in the nonlinear regime. Simulating the formation of large-scale
structures in the universe, however, is still a challenge due to the enormous
dynamic range in spatial and temporal coordinates, and due to the enormous
computer resources required. The dynamic range is generally dealt with by the
hybridization of numerical techniques. We deal with the computational
requirements by connecting two supercomputers via an optical network and making
them operate as a single machine. This is challenging, if only for the fact
that the supercomputers of our choice are separated by half the planet, as one
is located in Amsterdam and the other is in Tokyo. The co-scheduling of the two
computers and the 'gridification' of the code enables us to achieve a 90%
efficiency for this distributed intercontinental supercomputer.
[6]
oai:arXiv.org:0907.4036 [pdf] - 416183
The Living Application: a Self-Organising System for Complex Grid Tasks
Submitted: 2009-07-23
We present the living application, a method to autonomously manage
applications on the grid. During its execution on the grid, the living
application makes choices on the resources to use in order to complete its
tasks. These choices can be based on the internal state, or on autonomously
acquired knowledge from external sensors. By giving limited user capabilities
to a living application, the living application is able to port itself from one
resource topology to another. The application performs these actions at
run-time without depending on users or external workflow tools. We demonstrate
this new concept in a special case of a living application: the living
simulation. Today, many simulations require a wide range of numerical solvers
and run most efficiently if specialized nodes are matched to the solvers. The
idea of the living simulation is that it decides itself which grid machines to
use based on the numerical solver currently in use. In this paper we apply the
living simulation to modelling the collision between two galaxies in a test
setup with two specialized computers. This simulation switches at run-time
between a GPU-enabled computer in the Netherlands and a GRAPE-enabled machine
that resides in the United States, using an oct-tree N-body code whenever it
runs in the Netherlands and a direct N-body solver in the United States.
[7]
oai:arXiv.org:0807.1996 [pdf] - 14464
A multiphysics and multiscale software environment for modeling
astrophysical systems
Submitted: 2008-07-12, last modified: 2008-11-01
We present MUSE, a software framework for combining existing computational
tools for different astrophysical domains into a single multiphysics,
multiscale application. MUSE facilitates the coupling of existing codes written
in different languages by providing inter-language tools and by specifying an
interface between each module and the framework that represents a balance
between generality and computational efficiency. This approach allows
scientists to use combinations of codes to solve highly-coupled problems
without the need to write new codes for other domains or significantly alter
their existing codes. MUSE currently incorporates the domains of stellar
dynamics, stellar evolution and stellar hydrodynamics for studying generalized
stellar systems. We have now reached a "Noah's Ark" milestone, with (at least)
two available numerical solvers for each domain. MUSE can treat multi-scale and
multi-physics systems in which the time- and size-scales are well separated,
like simulating the evolution of planetary systems, small stellar associations,
dense stellar clusters, galaxies and galactic nuclei.
In this paper we describe three examples calculated using MUSE: the merger of
two galaxies, the merger of two evolving stars, and a hybrid N-body simulation.
In addition, we demonstrate an implementation of MUSE on a distributed computer
which may also include special-purpose hardware, such as GRAPEs or GPUs, to
accelerate computations. The current MUSE code base is publicly available as
open source at http://muse.li
[8]
oai:arXiv.org:0709.4552 [pdf] - 5456
Distributed N-body Simulation on the Grid Using Dedicated Hardware
Submitted: 2007-09-28, last modified: 2007-11-05
We present performance measurements of direct gravitational N-body
simulation on the grid, with and without specialized (GRAPE-6) hardware. Our
inter-continental virtual organization consists of three sites, one in Tokyo,
one in Philadelphia and one in Amsterdam. We run simulations with up to 196608
particles for a variety of topologies. In many cases, high performance
simulations over the entire planet are dominated by network bandwidth rather
than latency. With this global grid of GRAPEs our calculation time remains
dominated by communication over the entire range of N, which was limited due to
the use of three sites. Increasing the number of particles will result in a
more efficient execution. Based on these timings we construct and calibrate a
model to predict the performance of our simulation on any grid infrastructure
with or without GRAPE. We apply this model to predict the simulation
performance on the Netherlands DAS-3 wide area computer. Equipping the DAS-3
with GRAPE-6Af hardware would achieve break-even between calculation and
communication at a few million particles, resulting in a compute time of just
over ten hours for 1 N-body time unit. Key words: high-performance computing,
grid, N-body simulation, performance modelling
[9]
oai:arXiv.org:0711.0643 [pdf] - 6709
A parallel gravitational N-body kernel
Submitted: 2007-11-05
We describe source code level parallelization for the {\tt kira} direct
gravitational $N$-body integrator, the workhorse of the {\tt starlab}
production environment for simulating dense stellar systems. The
parallelization strategy, called ``j-parallelization'', involves the partition
of the computational domain by distributing all particles in the system among
the available processors. Partial forces on the particles to be advanced are
calculated in parallel by their parent processors, and are then summed in a
final global operation. Once total forces are obtained, the computing elements
proceed to the computation of their particle trajectories. We report the
results of timing measurements on four different parallel computers, and
compare them with theoretical predictions. The computers either employ a
high-speed interconnect or a NUMA architecture to minimize the communication
overhead, or are distributed in a grid. The code scales well in the domain
tested, which ranges from 1024 to 65536 stars on 1 to 128 processors, providing
satisfactory speedup. Running the production environment on a grid becomes
inefficient for more than 60 processors distributed across three sites.
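The j-parallelization scheme described in this abstract can be illustrated with a small Python sketch. It is not the kira implementation: the "processors" are emulated one after another in a single process, the softening length, the round-robin particle distribution and all function names are assumptions made for illustration, and the final summation stands in for what a message-passing code would do with a global reduction.

```python
import math
import random

def partial_acc(targets, sources, eps2=1e-4):
    """Softened gravitational acceleration (G = 1) on each target from `sources` only."""
    out = []
    for xi, yi, zi in targets:
        ax = ay = az = 0.0
        for m, xj, yj, zj in sources:
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            inv_r3 = (dx * dx + dy * dy + dz * dz + eps2) ** -1.5
            ax += m * dx * inv_r3
            ay += m * dy * inv_r3
            az += m * dz * inv_r3
        out.append((ax, ay, az))
    return out

def j_parallel_acc(particles, active, n_proc):
    """Emulate j-parallelization with n_proc workers run one after another."""
    targets = [particles[i][1:] for i in active]
    # Distribute all particles (the j-particles) among the processors.
    subsets = [particles[p::n_proc] for p in range(n_proc)]
    # Each processor computes partial forces from its own subset only.
    partials = [partial_acc(targets, subset) for subset in subsets]
    # Final global sum of the partial forces (a reduction in a real parallel code).
    return [tuple(sum(part[k][c] for part in partials) for c in range(3))
            for k in range(len(targets))]

random.seed(1)
n = 64
# Each particle is (mass, x, y, z); equal masses, random positions in a unit cube.
particles = [(1.0 / n, random.random(), random.random(), random.random())
             for _ in range(n)]
active = [0, 5, 17]  # indices of the particles to be advanced in this step
parallel = j_parallel_acc(particles, active, n_proc=4)
serial = partial_acc([particles[i][1:] for i in active], particles)
```

The summed partial forces in `parallel` agree with the single-processor result in `serial` up to floating-point rounding, since only the order of summation differs. The sketch also shows where the communication cost arises: every force evaluation ends in a global sum across all processors.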