Capturing Provenance of Data Curation at BCO-DMO

Date
2020-05-15
Authors
Shepherd, Adam
York, Amber
Schloer, Conrad
Kinkade, Danie
Rauch, Shannon
Biddle, Matt
Copley, Nancy
Haskins, Christina
Soenen, Karen
Saito, Mak A.
Wiebe, Peter
DOI
10.1575/1912/25777
Keywords
Data Curation
Provenance
Workflows
Frictionless Data
Data management
Data repository
Abstract
At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) in an effort to improve the efficiency of its data curation process. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions into final data products. Because these pipelines are defined using a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While there may still be some curation steps that cannot be easily automated, this method is a step towards reproducible transforms that bridge the original data submission to its published state in machine-actionable ways, benefiting the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools that makes it easier for data managers to process submissions, reuse existing workflows, and make transparent the added value of domain-specific data curation.
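The serialization idea in the abstract can be illustrated with a minimal sketch. It is not BCO-DMO's implementation: the pipeline contents, the `pipeline_to_prov` helper, and the `urn:example:` identifiers are all hypothetical, and the mapping (one `prov:Activity` per step, one `prov:Entity` per intermediate data state) is only one plausible way to express a declarative pipeline in PROV-O terms. The output is a plain JSON-LD-style dictionary built with the Python standard library.

```python
import json

PROV = "http://www.w3.org/ns/prov#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical pipeline, shaped like a datapackage-pipelines spec:
# an ordered list of steps, each naming a processor ("run") and its parameters.
PIPELINE = {
    "example-dataset": {
        "pipeline": [
            {"run": "load", "parameters": {"from": "submission.csv", "name": "data"}},
            {"run": "delete_fields", "parameters": {"fields": ["notes"]}},
            {"run": "dump_to_path", "parameters": {"out-path": "published"}},
        ]
    }
}


def pipeline_to_prov(pipeline_id, steps, base="urn:example:"):
    """Assumed mapping: each step becomes a prov:Activity and each data
    state before/after a step becomes a prov:Entity."""
    # State 0 stands for the original submission.
    previous = f"{base}{pipeline_id}/state/0"
    graph = [{"@id": previous, "@type": PROV + "Entity"}]
    for i, step in enumerate(steps, start=1):
        activity = f"{base}{pipeline_id}/step/{i}"
        output = f"{base}{pipeline_id}/state/{i}"
        graph.append({
            "@id": activity,
            "@type": PROV + "Activity",
            RDFS_LABEL: step["run"],
            PROV + "used": {"@id": previous},
        })
        graph.append({
            "@id": output,
            "@type": PROV + "Entity",
            PROV + "wasGeneratedBy": {"@id": activity},
            PROV + "wasDerivedFrom": {"@id": previous},
        })
        previous = output
    return {"@graph": graph}


doc = pipeline_to_prov("example-dataset", PIPELINE["example-dataset"]["pipeline"])
print(json.dumps(doc, indent=2))
```

In this sketch the last entity in the graph represents the published data product, and following `prov:wasDerivedFrom` links from it leads back to the original submission.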
Description
Presented at Data Curation Network, May 15, 2020
Except where otherwise noted, this item's license is described as Attribution 4.0 International