Normalized to: Cavelan, A.
oai:arXiv.org:1911.06714
Two-level Dynamic Load Balancing for High Performance Scientific Applications
Submitted: 2019-11-15
Scientific applications are often complex, irregular, and
computationally intensive. To accommodate the ever-increasing computational
demands of scientific applications, high-performance computing (HPC) systems
have become larger and more complex, offering parallelism at multiple levels
(e.g., nodes, cores per node, threads per core). To harness this computational
power, scientific applications need to exploit all available levels of hardware
parallelism. The performance of applications executing on such HPC systems
may be adversely affected by load imbalance at multiple levels, caused by
problem, algorithmic, and systemic characteristics.
Nevertheless, most existing load balancing methods do not simultaneously
address load imbalance at multiple levels. This work investigates the impact of
load imbalance on the performance of three scientific applications at the
thread and process levels. We jointly apply and evaluate selected dynamic loop
self-scheduling (DLS) techniques to both levels. Specifically, we employ the
extended LaPeSD OpenMP runtime library at the thread level and extend the
DLS4LB MPI-based dynamic load balancing library at the process level. This
approach is generic and applicable to any multiprocess-multithreaded
computationally intensive application (programmed using MPI and OpenMP). We
conduct an exhaustive set of experiments to assess and compare six DLS
techniques at the thread level and eleven at the process level. The results
show that performance improvements of up to 21% can only be achieved by
jointly addressing load imbalance at both levels. We offer insights into
the performance of the selected DLS techniques and discuss the interplay of
load balancing at the thread level and process level.
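To make the idea of dynamic loop self-scheduling more concrete, the sketch below computes the sequence of chunk sizes that a few well-known DLS techniques would hand out for a loop of n iterations and p workers: STATIC (one near-equal chunk per worker), SS (one iteration per request), GSS (ceil(R/p) of the R remaining iterations per request), and FAC2 (batches of p equal chunks, each batch covering about half of the remaining iterations). These follow the standard textbook chunk-size rules and are not necessarily the exact techniques or implementations evaluated in this work, in the LaPeSD OpenMP runtime, or in DLS4LB. The helper `two_level` is a hypothetical illustration of the two-level idea: one technique splits the loop into process-level chunks, and a second technique splits each such chunk among the threads of the process that receives it. All function names are assumptions made for illustration; in a real runtime the chunks are assigned on request at execution time rather than precomputed.

```python
import math


def static_chunks(n, p):
    """STATIC: split n iterations into p near-equal chunks, one per worker."""
    base, extra = divmod(n, p)
    return [base + (1 if i < extra else 0) for i in range(p)]


def ss_chunks(n, p):
    """SS (pure self-scheduling): every request receives a single iteration."""
    return [1] * n


def gss_chunks(n, p):
    """GSS (guided self-scheduling): each request gets ceil(R / p) of R remaining."""
    chunks, remaining = [], n
    while remaining > 0:
        c = math.ceil(remaining / p)
        chunks.append(c)
        remaining -= c
    return chunks


def fac2_chunks(n, p):
    """FAC2 (factoring): batches of p equal chunks of size ceil(R / (2p))."""
    chunks, remaining = [], n
    while remaining > 0:
        c = math.ceil(remaining / (2 * p))  # >= 1 while remaining > 0
        for _ in range(p):
            if remaining == 0:
                break
            take = min(c, remaining)
            chunks.append(take)
            remaining -= take
    return chunks


def two_level(n, procs, threads, outer, inner):
    """Hypothetical two-level split: `outer` creates process-level chunks,
    `inner` divides each process-level chunk among that process's threads."""
    return [inner(c, threads) for c in outer(n, procs)]


if __name__ == "__main__":
    print(gss_chunks(100, 4))  # [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]
    print(fac2_chunks(100, 4))
    print(two_level(100, 2, 4, gss_chunks, static_chunks))
```

Decreasing chunk sizes (GSS, FAC2) trade a few more scheduling requests for smaller final chunks, which limits how long the last workers can lag behind the others; pairing different techniques for `outer` and `inner` mirrors, in a simplified way, the choice of one DLS technique per level.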