Normalized to: Shroyer, B.
[1]
oai:arXiv.org:1112.1710 [pdf] - 967235
Efficient Parallelization for AMR MHD Multiphysics Calculations;
Implementation in AstroBEAR
Submitted: 2011-12-07, last modified: 2013-02-21
Current Adaptive Mesh Refinement (AMR) simulations require algorithms that
are highly parallelized and manage memory efficiently. As compute engines grow
larger, AMR simulations will require algorithms that achieve new levels of
efficient parallelization and memory management. We have attempted to employ
new techniques to achieve both of these goals. Patch- or grid-based AMR often
employs ghost cells to decouple the hyperbolic advances of each grid on a given
refinement level. This decoupling allows each grid to be advanced
independently. In AstroBEAR we utilize this independence by threading the grid
advances on each level with preference going to the finer-level grids. This
allows for global load balancing instead of level-by-level load balancing and
allows for greater parallelization across both physical space and AMR level.
Threading of level advances can also improve performance by interleaving
communication with computation, especially in deep simulations with many levels
of refinement. While we see improvements of up to 30% on deep simulations run
on a few cores, the speedup is typically more modest (5-20%) for larger scale
simulations. To improve memory management we have employed a distributed tree
algorithm that requires processors to store and communicate only local sections
of the AMR tree structure with neighboring processors. Using this distributed
approach we are able to get reasonable scaling efficiency (> 80%) out to 12288
cores and up to 8 levels of AMR, independent of the use of threading.
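The preference for finer-level grids when threading grid advances can be illustrated with a small priority queue. This is only a sketch of the prioritization idea, not AstroBEAR's scheduler: the names `AdvanceTask` and `order_advances` are hypothetical, and the example ignores time sub-cycling, ghost-cell exchange, and real threads.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class AdvanceTask:
    """A pending grid advance (hypothetical structure for illustration)."""
    priority: int                          # negative level, so finer levels pop first
    seq: int                               # tie-breaker: insertion order
    grid_id: str = field(compare=False)
    level: int = field(compare=False)


def order_advances(grids):
    """Return grid ids in the order a single worker would advance them.

    `grids` is an iterable of (grid_id, refinement_level) pairs. Finer
    levels (larger level numbers) are served before coarser ones.
    """
    heap = []
    for seq, (grid_id, level) in enumerate(grids):
        heapq.heappush(heap, AdvanceTask(-level, seq, grid_id, level))
    order = []
    while heap:
        order.append(heapq.heappop(heap).grid_id)
    return order


pending = [("coarse-a", 0), ("fine-a", 2), ("mid-a", 1), ("fine-b", 2)]
print(order_advances(pending))
```

Because all pending advances share one queue regardless of level, idle workers can pick up coarse-grid work while fine-grid advances wait on communication, which is the sense in which the load balancing becomes global rather than level by level.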
[2]
oai:arXiv.org:1110.1616 [pdf] - 422694
Efficient Parallelization for AMR MHD Multiphysics Calculations;
Implementation in AstroBEAR
Submitted: 2011-10-07
Current AMR simulations require algorithms that are highly parallelized and
manage memory efficiently. As compute engines grow larger, AMR simulations will
require algorithms that achieve new levels of efficient parallelization and
memory management. We have attempted to employ new techniques to achieve both
of these goals. Patch- or grid-based AMR often employs ghost cells to decouple
the hyperbolic advances of each grid on a given refinement level. This
decoupling allows each grid to be advanced independently. In AstroBEAR we
utilize this independence by threading the grid advances on each level with
preference going to the finer-level grids. This allows for global load
balancing instead of level-by-level load balancing and allows for greater
parallelization across both physical space and AMR level. Threading of level
advances can also improve performance by interleaving communication with
computation, especially in deep simulations with many levels of refinement. To
improve memory management we have employed a distributed tree algorithm that
requires processors to store and communicate only local sections of the AMR
tree structure with neighboring processors.
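The distributed tree idea, in which each processor keeps only its local part of the AMR tree plus references to directly linked nodes on other processors, can be sketched as follows. The function `local_view` and the `owner`/`links` inputs are hypothetical; the global dictionaries exist only to set up the example, whereas a genuinely distributed code would never assemble them on one processor.

```python
def local_view(owner, links, rank):
    """Compute what a single rank would need to store.

    owner: dict mapping node name -> owning rank (illustrative global map)
    links: list of (node_a, node_b) pairs for parent/child or neighbor relations
    rank:  the rank whose local view is requested

    Returns (owned, remote), where `owned` is the set of nodes on this rank
    and `remote` maps each neighboring rank to the set of its nodes that
    are directly linked to a node owned here.
    """
    owned = {node for node, r in owner.items() if r == rank}
    remote = {}
    for a, b in links:
        # Check the link in both directions.
        for mine, other in ((a, b), (b, a)):
            if mine in owned and owner[other] != rank:
                remote.setdefault(owner[other], set()).add(other)
    return owned, remote


owner = {"root": 0, "c1": 0, "c2": 1, "c2a": 1, "c2b": 2}
links = [("root", "c1"), ("root", "c2"), ("c2", "c2a"),
         ("c2", "c2b"), ("c1", "c2")]

for r in range(3):
    print(r, local_view(owner, links, r))
```

In this toy example rank 2 only ever learns about its own node and the one linked node on rank 1, so its storage and communication do not grow with the size of the full tree.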