Full-text search for arXiv

Groth, Paul

Normalized to: Groth, P.

4 article(s) in total. 22 co-authors, from 1 to 3 common article(s). Median position in authors list is 4.0.

[1] oai:arXiv.org:1401.2134 - 1202657

10 Simple Rules for the Care and Feeding of Scientific Data

Goodman, Alyssa; Pepe, Alberto; Blocker, Alexander W.; Borgman, Christine L.; Cranmer, Kyle; Crosas, Mercè; Di Stefano, Rosanne; Gil, Yolanda; Groth, Paul; Hedstrom, Margaret; Hogg, David W.; Kashyap, Vinay; Mahabal, Ashish; Siemiginowska, Aneta; Slavkovic, Aleksandra

Comments: Accepted in PLOS Computational Biology. This paper was written collaboratively, on the web, in the open, using Authorea. The living version of this article, which includes sources and history, is available at http://www.authorea.com/3410/

Submitted: 2014-01-09

This article offers a short guide to the steps scientists can take to ensure that their data and associated analyses continue to be of value and to be recognized. In just the past few years, hundreds of scholarly papers and reports have been written on questions of data sharing, data provenance, research reproducibility, licensing, attribution, privacy, and more, but our goal here is not to review that literature. Instead, we present a short guide intended for researchers who want to know why it is important to "care for and feed" data, with some practical advice on how to do that.

[2] oai:arXiv.org:1006.4860 - 1033307

The Application of Cloud Computing to the Creation of Image Mosaics and Management of Their Provenance

Berriman, G. Bruce; Deelman, Ewa; Groth, Paul; Juve, Gideon

Comments: 15 pages, 3 figures

Submitted: 2010-06-24

We have used the Montage image mosaic engine to investigate the cost and performance of processing images on the Amazon EC2 cloud. We present a detailed comparison of the performance of Montage on the cloud and on the Abe high-performance cluster at the National Center for Supercomputing Applications (NCSA). Because Montage generates many intermediate products, we have also used it to understand the science requirements that higher-level products impose on provenance management technologies. We describe experiments with such technologies, including the Provenance Aware Service Oriented Architecture (PASOA).

[3] oai:arXiv.org:1005.4457 - 170900

Pipeline-Centric Provenance Model

Groth, Paul; Deelman, Ewa; Juve, Gideon; Mehta, Gaurang; Berriman, Bruce

Comments: 9 pages, 4 figures

Submitted: 2010-05-24

In this paper we propose a new provenance model tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We then generalize the class of applications to which the approach is relevant and propose a pipeline-centric provenance model. Finally, we evaluate the storage benefits of the approach when applied to an astronomy application.

[4] oai:arXiv.org:1005.2643 - 166170

Metadata and provenance management

Deelman, Ewa; Berriman, Bruce; Chervenak, Ann; Corcho, Oscar; Groth, Paul; Moreau, Luc

Submitted: 2010-05-14

Scientists today collect, analyze, and generate terabytes and petabytes of data. These data are often shared among collaborators and further processed and analyzed. To facilitate sharing and interpretation, data need to carry with them metadata about how they were collected or generated, and provenance information about how they were processed. This chapter describes metadata and provenance in the context of the data lifecycle. It also gives an overview of approaches to metadata and provenance management, followed by examples of how applications use metadata and provenance in their scientific processes.