Normalized to: Groth, P.
[1]
oai:arXiv.org:1401.2134 [pdf] - 1202657
10 Simple Rules for the Care and Feeding of Scientific Data
Goodman, Alyssa;
Pepe, Alberto;
Blocker, Alexander W.;
Borgman, Christine L.;
Cranmer, Kyle;
Crosas, Mercè;
Di Stefano, Rosanne;
Gil, Yolanda;
Groth, Paul;
Hedstrom, Margaret;
Hogg, David W.;
Kashyap, Vinay;
Mahabal, Ashish;
Siemiginowska, Aneta;
Slavkovic, Aleksandra
Submitted: 2014-01-09
This article offers a short guide to the steps scientists can take to ensure
that their data and associated analyses continue to be of value and to be
recognized. In just the past few years, hundreds of scholarly papers and
reports have been written on questions of data sharing, data provenance,
research reproducibility, licensing, attribution, privacy, and more, but our
goal here is not to review that literature. Instead, we present a short guide
intended for researchers who want to know why it is important to "care for and
feed" data, with some practical advice on how to do that.
[2]
oai:arXiv.org:1006.4860 [pdf] - 1033307
The Application of Cloud Computing to the Creation of Image Mosaics and
Management of Their Provenance
Submitted: 2010-06-24
We have used the Montage image mosaic engine to investigate the cost and
performance of processing images on the Amazon EC2 cloud, and to inform the
requirements that higher-level products impose on provenance management
technologies. We will present a detailed comparison of the performance of
Montage on the cloud and on the Abe high performance cluster at the National
Center for Supercomputing Applications (NCSA). Because Montage generates many
intermediate products, we have used it to understand the science requirements
that higher-level products impose on provenance management technologies. We
describe experiments with provenance management technologies such as the
"Provenance Aware Service Oriented Architecture" (PASOA).
[3]
oai:arXiv.org:1005.4457 [pdf] - 170900
Pipeline-Centric Provenance Model
Submitted: 2010-05-24
In this paper we propose a new provenance model that is tailored to a class
of workflow-based applications. We motivate the approach with use cases from
the astronomy community. We generalize the class of applications to which the
approach is relevant and propose a pipeline-centric provenance model. Finally, we
evaluate the benefits in terms of storage needed by the approach when applied
to an astronomy application.
[4]
oai:arXiv.org:1005.2643 [pdf] - 166170
Metadata and provenance management
Submitted: 2010-05-14
Scientists today collect, analyze, and generate terabytes and petabytes of
data. These data are often shared and further processed and analyzed among
collaborators. To facilitate sharing and interpretation, data
need to carry metadata about how they were collected or generated,
and provenance information about how they were processed. This chapter
describes metadata and provenance in the context of the data lifecycle. It also
gives an overview of the approaches to metadata and provenance management,
followed by examples of how applications use metadata and provenance in their
scientific processes.