Normalized to: Sterling, T.
[1]
oai:arXiv.org:astro-ph/9710212 [pdf] - 98971
Smooth Particle Hydrodynamics: Models, Applications, and Enabling
Technologies
Submitted: 1997-10-20
We present the results from a two-day study in which we discussed various
implementations of Smooth Particle Hydrodynamics (SPH), one of the leading
methods used across a variety of areas of large-scale astrophysical
simulations. In particular, we evaluated the suitability of designing special
hardware extensions to further boost the performance of the high-end
general-purpose computers currently used for those simulations. We considered a range
of hybrid architectures, consisting of a mix of custom LSI and reconfigurable
logic, combining the extremely high throughput of Special-Purpose Devices
(SPDs) with the flexibility of reconfigurable structures, based on Field
Programmable Gate Arrays (FPGAs). The main findings of our workshop consist of
a clarification of the decomposition of the computational requirements,
together with specific estimates for cost/performance improvements that can be
obtained at each stage in this decomposition, by using enabling hardware
technology to accelerate the performance of general-purpose computers.
[2]
oai:arXiv.org:astro-ph/9704183 [pdf] - 97145
GRAPE-6: A Petaflops Prototype
Submitted: 1997-04-17
We present the outline of a research project aimed at designing and
constructing a hybrid computing system that can be easily scaled up to
petaflops speeds. As a first step, we envision building a prototype which will
consist of three main components: a general-purpose, programmable front end, a
special-purpose, fully hardwired computing engine, and a multi-purpose,
reconfigurable system. The driving application will be a suite of
particle-based large-scale simulations in various areas of physics. The
prototype system will achieve performance in the $\sim 50 - 100$ teraflops
range for a broad class of applications in this area. The combination of a
hardwired petaflops-class computational engine and a front end with sustained
speed on the order of 10 gigaflops can produce extremely high performance, but
only for the limited class of problems in which there exists a single
bottleneck with computing cost dominating the total. While the calculation for
which the GRAPE-4 (our system's immediate predecessor) was designed is a prime
example of such a problem, in many other applications the primary computational
bottleneck, while still related to an inverse-square (gravitational, Coulomb,
etc.) force, requires less than 99% of the computing power. Although the
remainder of the CPU time is typically dominated by just one secondary
bottleneck, its nature varies greatly from problem to problem. It is not
cost-effective to attempt to design custom chips for each new problem that
arises. FPGA-based systems can restore the balance, guaranteeing scalability
from the teraflops to the petaflops domain, while still retaining significant
flexibility. (abbreviated abstract)
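
The balance argument in this abstract can be illustrated with a short Amdahl-style estimate. The sketch below is not taken from the paper. It assumes the stages run one after another, takes the 10 gigaflops front end and a 1 petaflops engine from the abstract, and uses a hypothetical reconfigurable-stage speed (RECONFIG) and hypothetical work fractions chosen only for illustration.

```python
def effective_speed(stages):
    """Sustained speed (flops) when work is split across sequential stages.

    stages: list of (fraction_of_operations, speed_in_flops) pairs whose
    fractions sum to 1. Assumes no overlap between stages.
    """
    total_fraction = sum(fraction for fraction, _ in stages)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Time per operation is the fraction-weighted sum of 1/speed.
    time_per_op = sum(fraction / speed for fraction, speed in stages)
    return 1.0 / time_per_op


FRONT_END = 1e10  # 10 gigaflops front end (figure from the abstract)
ENGINE = 1e15     # petaflops-class hardwired engine (from the abstract)
RECONFIG = 1e13   # hypothetical FPGA-stage speed, NOT from the abstract

# 99% of operations on the engine, the remaining 1% on the front end.
two_stage = effective_speed([(0.99, ENGINE), (0.01, FRONT_END)])

# Same split, but most of the remainder moved to a reconfigurable stage
# (the 0.0099 / 0.0001 split is an illustrative assumption).
three_stage = effective_speed(
    [(0.99, ENGINE), (0.0099, RECONFIG), (0.0001, FRONT_END)]
)

print(f"engine + front end:            {two_stage:.3e} flops")
print(f"engine + reconfig + front end: {three_stage:.3e} flops")
```

Under these assumptions, leaving 1% of the operations on the front end holds the whole system to about 1 teraflops, roughly a thousand times below the engine's peak. Offloading most of that remainder to a faster intermediate stage recovers much of the gap, which is the role the abstract assigns to the FPGA-based component.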