Normalized to: Plaat, A.
[1]
oai:arXiv.org:1906.11516 [pdf] - 1907642
Scalability Model for the LOFAR Direction Independent Pipeline
Submitted: 2019-06-27
LOFAR is a leading aperture synthesis telescope operated in the Netherlands
with stations across Europe. The LOFAR Two-meter Sky Survey (LoTSS) will
produce more than 3000 data sets of 14 TB each, mapping the entire northern sky at low
frequencies. The data produced by this survey is important for understanding
the formation and evolution of galaxies, supermassive black holes and other
astronomical phenomena. All of the LoTSS data needs to be processed by the
LOFAR Direction Independent (DI) pipeline, prefactor. Understanding the
performance of this pipeline is important when trying to optimize the
throughput for large projects, such as LoTSS and other deep surveys. Making a
model of its completion time will enable us to predict the time taken to
process large data sets, optimize our parameter choices, help schedule other
LOFAR processing services, and predict processing time for future large radio
telescopes. We tested the prefactor pipeline by scaling several parameters,
notably the number of CPUs, the data size and the size of the calibration sky model. We present
these results as a comprehensive model which will be used to predict processing
time for a wide range of processing parameters. We also discover that smaller
calibration models lead to significantly faster calibration times, while the
calibration results do not significantly degrade in quality. Finally, we
validate the model and compare predictions with production runs from the past
six months, quantifying the performance penalties incurred by processing on a
shared cluster. We conclude by noting the utility of the results and model for
the LoTSS Survey, LOFAR as a whole and for other telescopes.
[2]
oai:arXiv.org:1808.10735 [pdf] - 1742204
Fast and Reproducible LOFAR Workflows with AGLOW
Submitted: 2018-08-31
The LOFAR radio telescope creates petabytes of data per year. This data is
important for many scientific projects. The data needs to be efficiently
processed within the timespan of these projects in order to maximize the
scientific impact. We present a workflow orchestration system that integrates
LOFAR processing with a distributed computing platform. The system is named
Automated Grid-enabled LOFAR Workflows (AGLOW). AGLOW makes it fast and easy to
develop, test and deploy complex LOFAR workflows, and to accelerate them on a
distributed cluster architecture. AGLOW provides a significant reduction in
time for setting up complex workflows: typically, from months to days. We lay
out two case studies that process the data from the LOFAR Surveys Key Science
Project. We have implemented these in the AGLOW environment. We also describe
the capabilities of AGLOW, paving the way for use by other LOFAR science cases.
In the future, AGLOW will automatically produce multiple science products from
a single dataset, serving several of the LOFAR Key Science Projects.
[3]
oai:arXiv.org:1807.05733 [pdf] - 1716272
Pipeline Collector: gathering performance data for distributed
astronomical pipelines
Submitted: 2018-07-16
Modern astronomical data processing requires complex software pipelines to
process ever-growing datasets. For radio astronomy, these pipelines have become
so large that they need to be distributed across a computational cluster. This
makes it difficult to monitor the performance of each pipeline step. To gain
insight into the performance of each step, a performance monitoring utility
needs to be integrated with the pipeline execution. In this work we have
developed such a utility and integrated it with the calibration pipeline of the
Low Frequency Array, LOFAR, a leading radio telescope. We tested the tool by
running the pipeline on several different compute platforms and collected the
performance data. Based on this data, we make well-informed recommendations on
future hardware and software upgrades. The aim of these upgrades is to
accelerate the slowest processing steps for this LOFAR pipeline. The pipeline
collector suite is open source and will be incorporated in future LOFAR
pipelines to create a performance database for all LOFAR processing.
[4]
oai:arXiv.org:1712.00312 [pdf] - 1663241
An Automated Scalable Framework for Distributing Radio Astronomy
Processing Across Clusters and Clouds
Submitted: 2017-12-01, last modified: 2018-04-10
The Low Frequency Array (LOFAR) radio telescope is an international aperture
synthesis radio telescope used to study the Universe at low frequencies. One of
the goals of the LOFAR telescope is to conduct deep wide-field surveys. Here we
will discuss a framework for the processing of the LOFAR Two Meter Sky Survey
(LoTSS). This survey will produce close to 50 PB of data within five years.
These data rates require processing at locations with high-speed access to the
archived data. To complete the LoTSS project, the processing software needs to
be made portable and moved to clusters with a high bandwidth connection to the
data archive. This work presents a framework that makes the LOFAR software
portable, and is used to scale out LOFAR data reduction. Previous work was
successful in preprocessing LOFAR data on a cluster of isolated nodes. This
framework builds upon it and is currently operational. It is designed to be
portable, scalable, automated and general. This paper describes its design, its
high-level operation and the initial results of processing LoTSS data.