Capturing Provenance of Data Curation at BCO-DMO
Date
2020-11-09
Author
Shepherd, Adam
York, Amber
Schloer, Conrad
Kinkade, Danie
Rauch, Shannon
Copley, Nancy
Gerlach, Dana
Haskins, Christina
Soenen, Karen
Saito, Mak A.
Wiebe, Peter
Citable URI
https://hdl.handle.net/1912/26373
DOI
10.1575/1912/26373
Abstract
At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) to make its data curation process more efficient. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions into final data products. Because these pipelines are defined in a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While some curation steps may still be hard to automate, this method is a step towards reproducible transforms that link the original data submission to its published state in machine-actionable ways, benefiting the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools that makes it easier for data managers to process submissions, reuse existing workflows, and make transparent the added value of domain-specific data curation.
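As a rough illustration of the serialization idea in the abstract, the sketch below maps a declarative list of pipeline steps onto PROV-O terms expressed as JSON-LD. It is not BCO-DMO's implementation: the `PIPELINE` dictionary, the `pipeline_to_prov` function, the file paths, and the `urn:example:` identifiers are all hypothetical, and the step layout only loosely mirrors the `run`/`parameters` structure of a datapackage-pipelines spec. Only the PROV-O property names (`prov:used`, `prov:wasInformedBy`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`) come from the ontology itself.

```python
import json

# Hypothetical pipeline: step names, parameters, and paths are illustrative
# only and are not taken from any BCO-DMO workflow.
PIPELINE = {
    "source": "orig/submission.csv",
    "product": "data/final.csv",
    "steps": [
        {"run": "load", "parameters": {"from": "orig/submission.csv"}},
        {"run": "set_types", "parameters": {"types": {"depth": {"type": "number"}}}},
        {"run": "dump_to_path", "parameters": {"out-path": "data"}},
    ],
}


def pipeline_to_prov(pipeline, base="urn:example:"):
    """Describe each step as a prov:Activity, chained in execution order."""
    source_id = base + "entity/source"
    product_id = base + "entity/product"
    graph = [
        {"@id": source_id, "@type": "prov:Entity", "rdfs:label": pipeline["source"]},
    ]
    previous = None
    for index, step in enumerate(pipeline["steps"]):
        activity = {
            "@id": f"{base}activity/{index}",
            "@type": "prov:Activity",
            "rdfs:label": step["run"],
            # Keep the declarative parameters so the step could be replayed.
            "rdfs:comment": json.dumps(step.get("parameters", {}), sort_keys=True),
        }
        if previous is None:
            # The first step consumes the original submission.
            activity["prov:used"] = {"@id": source_id}
        else:
            # Later steps depend on the step that ran before them.
            activity["prov:wasInformedBy"] = {"@id": previous}
        graph.append(activity)
        previous = activity["@id"]
    product = {
        "@id": product_id,
        "@type": "prov:Entity",
        "rdfs:label": pipeline["product"],
        "prov:wasDerivedFrom": {"@id": source_id},
    }
    if previous is not None:
        # The published product is generated by the last step.
        product["prov:wasGeneratedBy"] = {"@id": previous}
    graph.append(product)
    return {
        "@context": {
            "prov": "http://www.w3.org/ns/prov#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        },
        "@graph": graph,
    }


if __name__ == "__main__":
    print(json.dumps(pipeline_to_prov(PIPELINE), indent=2))
```

Because the steps are plain data, the same structure that drives the transformation can also be walked to produce the provenance record, which is the property the abstract relies on. A production serialization would likely also record agents, timestamps, and intermediate entities.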
Description
Presented at USGS Data Management Working Group, 9 November 2020
Suggested Citation
Presentation: Shepherd, Adam, York, Amber, Schloer, Conrad, Kinkade, Danie, Rauch, Shannon, Copley, Nancy, Gerlach, Dana, Haskins, Christina, Soenen, Karen, Saito, Mak A., Wiebe, Peter, "Capturing Provenance of Data Curation at BCO-DMO", Presented at USGS Data Management Working Group, 9 November 2020, DOI:10.1575/1912/26373, https://hdl.handle.net/1912/26373
Related items
Showing items related by title, author, creator and subject.
- Capturing Provenance of Data Curation at BCO-DMO
Shepherd, Adam; York, Amber; Schloer, Conrad; Kinkade, Danie; Rauch, Shannon; Biddle, Matt; Copley, Nancy; Haskins, Christina; Soenen, Karen; Saito, Mak A.; Wiebe, Peter (Woods Hole Oceanographic Institution, 2020-05-15)
At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office ...
- Recovering complete and draft population genomes from metagenome datasets
Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A. (BioMed Central, 2016-03-08)
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. ...
- The Role Played by NaFIRRI Library and Data Centre in Managing Scientific Fisheries Data From Ugandan Waters
Endra, Alice (IAMSLIC, 2015)
National Fisheries Resources Research Institute (NaFIRRI) is one of the Public Agricultural Research Institutes under Uganda's National Agricultural Research Organisation (NARO). National Fisheries Resources Research ...