Normalized to: Juve, G.
[1]
oai:arXiv.org:1312.6723 [pdf] - 763725
Creating A Galactic Plane Atlas With Amazon Web Services
Submitted: 2013-12-23
This paper describes by example how astronomers can use cloud-computing
resources offered by Amazon Web Services (AWS) to create new datasets at scale.
We have created from existing surveys an atlas of the Galactic Plane at 16
wavelengths from 1 µm to 24 µm with pixels co-registered at spatial
sampling of 1 arcsec. We explain how open source tools support management and
operation of a virtual cluster on AWS platforms to process data at scale, and
describe the technical issues that users will need to consider, such as
optimization of resources, resource costs, and management of virtual machine
instances.
[2]
oai:arXiv.org:1211.4055 [pdf] - 591793
A Tale Of 160 Scientists, Three Applications, A Workshop and A Cloud
Submitted: 2012-11-16
The NASA Exoplanet Science Institute (NExScI) hosts the annual Sagan
Workshops, thematic meetings aimed at introducing researchers to the latest
tools and methodologies in exoplanet research. The theme of the Summer 2012
workshop, held from July 23 to July 27 at Caltech, was to explore the use of
exoplanet light curves to study planetary system architectures and atmospheres.
A major part of the workshop was to use hands-on sessions to instruct attendees
in the use of three open source tools for the analysis of light curves,
especially from the Kepler mission. Each hands-on session involved the 160
attendees using their laptops to follow step-by-step tutorials given by
experts. We describe how we used the Amazon Elastic Compute Cloud (EC2) to run these
applications.
[3]
oai:arXiv.org:1010.4813 [pdf] - 955510
The Application of Cloud Computing to Astronomy: A Study of Cost and
Performance
Submitted: 2010-10-22
Cloud computing is a powerful new technology that is widely used in the
business world. Recently, we have been investigating the benefits it offers to
scientific computing. We have used three workflow applications to compare the
performance of processing data on the Amazon EC2 cloud with the performance on
the Abe high-performance cluster at the National Center for Supercomputing
Applications (NCSA). We show that the Amazon EC2 cloud offers better
performance and value for processor- and memory-limited applications than for
I/O-bound applications. We provide an example of how the cloud is well suited
to the generation of a science product: an atlas of periodograms for the
210,000 light curves released by the NASA Kepler Mission. This atlas will
support the identification of periodic signals, including those due to
transiting exoplanets, in the Kepler data sets.
[4]
oai:arXiv.org:1010.4822 [pdf] - 955511
Data Sharing Options for Scientific Workflows on Amazon EC2
Submitted: 2010-10-22
Efficient data management is a key component in achieving good performance
for scientific workflows in distributed environments. Workflow applications
typically communicate data between tasks using files. When tasks are
distributed, these files are either transferred from one computational node to
another, or accessed through a shared storage system. In grids and clusters,
workflow data is often stored on network and parallel file systems. In this
paper we investigate some of the ways in which data can be managed for
workflows in the cloud. We ran experiments using three typical workflow
applications on Amazon's EC2. We discuss the various storage and file systems
we used, describe the issues and problems we encountered deploying them on EC2,
and analyze the resulting performance and cost of the workflows.
[5]
oai:arXiv.org:1006.4860 [pdf] - 1033307
The Application of Cloud Computing to the Creation of Image Mosaics and
Management of Their Provenance
Submitted: 2010-06-24
We have used the Montage image mosaic engine to investigate the cost and
performance of processing images on the Amazon EC2 cloud, and to inform the
requirements that higher-level products impose on provenance management
technologies. We will present a detailed comparison of the performance of
Montage on the cloud and on the Abe high-performance cluster at the National
Center for Supercomputing Applications (NCSA). Because Montage generates many
intermediate products, we have used it to understand the science requirements
that higher-level products impose on provenance management technologies. We
describe experiments with provenance management technologies such as the
"Provenance Aware Service Oriented Architecture" (PASOA).
[6]
oai:arXiv.org:1005.4457 [pdf] - 170900
Pipeline-Centric Provenance Model
Submitted: 2010-05-24
In this paper we propose a new provenance model which is tailored to a class
of workflow-based applications. We motivate the approach with use cases from
the astronomy community. We generalize the class of applications the approach
is relevant to and propose a pipeline-centric provenance model. Finally, we
evaluate the benefits in terms of storage needed by the approach when applied
to an astronomy application.
[7]
oai:arXiv.org:1005.2718 [pdf] - 1513445
Scientific Workflow Applications on Amazon EC2
Submitted: 2010-05-15
The proliferation of commercial cloud computing providers has generated
significant interest in the scientific computing community. Much recent
research has attempted to determine the benefits and drawbacks of cloud
computing for scientific applications. Although clouds have many attractive
features, such as virtualization, on-demand provisioning, and "pay as you go"
usage-based pricing, it is not clear whether they are able to deliver the
performance required for scientific applications at a reasonable price. In this
paper we examine the performance and cost of clouds from the perspective of
scientific workflow applications. We use three characteristic workflows to
compare the performance of a commercial cloud with that of a typical HPC
system, and we analyze the various costs associated with running those
workflows in the cloud. We find that the performance of clouds is not
unreasonable given the hardware resources provided, and that performance
comparable to HPC systems can be achieved given similar resources. We also find
that the cost of running workflows on a commercial cloud can be reduced by
storing data in the cloud rather than transferring it from outside.