Capturing Provenance of Data Curation at BCO-DMO
Saito, Mak A.
At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) to make its data curation process more efficient. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions into final data products. Because these pipelines are defined in a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). Although some curation steps may still resist automation, this method is a step towards reproducible transforms that link the original data submission to its published state in machine-actionable ways, benefiting the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools that makes it easier for data managers to process submissions, reuse existing workflows, and make transparent the added value of domain-specific data curation.
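Because each pipeline step is declarative data rather than opaque code, mapping it to PROV-O can be largely mechanical. The sketch below is a minimal illustration under stated assumptions, not BCO-DMO's implementation. It takes a pipeline description laid out like a datapackage-pipelines `pipeline-spec.yaml` (already parsed into a Python dict, so no YAML library is needed), models each step as a `prov:Activity` that `prov:used` the previous `prov:Entity` and generated a new one, and emits N-Triples. The base IRI `https://example.org/curation/`, the pipeline id `example-dataset`, the function names, and the processor names and parameters in the sample spec are all hypothetical.

```python
"""Sketch: serialize a declarative curation pipeline into PROV-O triples.

Assumptions (hypothetical, not BCO-DMO's actual code):
- the spec mirrors the datapackage-pipelines layout
  {pipeline_id: {"pipeline": [{"run": ..., "parameters": {...}}, ...]}}
- BASE, the pipeline id, and the processor names are illustrative only.
"""
import json

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "https://example.org/curation/"  # hypothetical namespace


def iri(value: str) -> str:
    """Wrap an IRI for N-Triples output."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal; JSON escaping is N-Triples-compatible for ASCII text."""
    return json.dumps(value)


def pipeline_to_prov(pipeline_id: str, spec: dict) -> list[str]:
    """Return N-Triples lines that chain the submission to the final product.

    Each step becomes a prov:Activity that used the entity produced by the
    previous step and generated a new entity derived from it.
    """
    triples = []

    def add(s: str, p: str, o: str) -> None:
        triples.append(f"{s} {p} {o} .")

    steps = spec[pipeline_id]["pipeline"]
    current = iri(f"{BASE}{pipeline_id}/submission")  # the original data submission
    add(current, iri(RDF_TYPE), iri(PROV + "Entity"))

    for index, step in enumerate(steps, start=1):
        activity = iri(f"{BASE}{pipeline_id}/step/{index}")
        output = iri(f"{BASE}{pipeline_id}/entity/{index}")
        add(activity, iri(RDF_TYPE), iri(PROV + "Activity"))
        # Keep the processor name and its parameters so the step can be re-run
        add(activity, iri(RDFS + "label"), literal(step["run"]))
        params = json.dumps(step.get("parameters", {}), sort_keys=True)
        add(activity, iri(RDFS + "comment"), literal(params))
        add(activity, iri(PROV + "used"), current)
        add(output, iri(RDF_TYPE), iri(PROV + "Entity"))
        add(output, iri(PROV + "wasGeneratedBy"), activity)
        add(output, iri(PROV + "wasDerivedFrom"), current)
        current = output  # the next step consumes this step's output

    return triples


# Hypothetical three-step pipeline; processor names are illustrative only.
spec = {
    "example-dataset": {
        "pipeline": [
            {"run": "load", "parameters": {"from": "submission.csv", "name": "default"}},
            {"run": "set_types", "parameters": {"types": {"depth": {"type": "number"}}}},
            {"run": "dump_to_path", "parameters": {"out-path": "final"}},
        ]
    }
}

if __name__ == "__main__":
    print("\n".join(pipeline_to_prov("example-dataset", spec)))
```

For the three-step sample, the script prints 22 triples: one typing the submission plus seven per step. Recording each step's processor name and serialized parameters as `rdfs:label` and `rdfs:comment` is simply the easiest choice. A fuller model could instead attach the whole pipeline as a `prov:Plan` through `prov:qualifiedAssociation` and `prov:hadPlan`.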
Presented at Data Curation Network, May 15, 2020
Suggested Citation: Presentation: Shepherd, Adam, York, Amber, Schloer, Conrad, Kinkade, Danie, Rauch, Shannon, Biddle, Matt, Copley, Nancy, Haskins, Christina, Soenen, Karen, Saito, Mak A., Wiebe, Peter, "Capturing Provenance of Data Curation at BCO-DMO", Presented at Data Curation Network, May 15, 2020, DOI:10.1575/1912/25777, https://hdl.handle.net/1912/25777
Showing items related by title, author, creator and subject.
Shepherd, Adam; York, Amber; Schloer, Conrad; Kinkade, Danie; Rauch, Shannon; Copley, Nancy; Gerlach, Dana; Haskins, Christina; Soenen, Karen; Saito, Mak A.; Wiebe, Peter (Woods Hole Oceanographic Institution, 2020-11-09). At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office ...
Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies. Delmont, Tom O.; Eren, A. Murat (PeerJ, 2016-03-29). High-throughput sequencing provides a fast and cost-effective means to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target ...
Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A. (BioMed Central, 2016-03-08). Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. ...