The Frictionless Data Package : data containerization for addressing big data challenges [poster]
Saito, Mak A.
Keyword: Frictionless Data; Data management; Data exchange; Data Transport; Distributed data; Data tools; Big data
At the Biological and Chemical Oceanography Data Management Office (BCO-DMO), Big Data challenges have been steadily increasing. The sizes of data submissions have grown as instrumentation improves, and complex data types are sometimes stored across different repositories. This signals a paradigm shift: data and information that are meant to be tightly coupled, and that have traditionally been stored under the same roof, are now distributed across repositories and data stores. For domain-specific repositories like BCO-DMO, a new mechanism for assembling data, metadata and supporting documentation is needed. Traditionally, data repositories have relied on a human's involvement throughout discovery and access workflows. This human could assess fitness for purpose by reading loosely coupled, unstructured information from web pages and documentation. Distributed storage was something that could be communicated in text that a human could read and understand. However, as machines play larger roles in the discovery and access of data, distributed resources must be described and packaged in ways that fit into machine-automated discovery and access workflows, so that fitness for purpose can still be assessed on behalf of the end user. Once machines have recommended a data resource as relevant to an investigator's needs, the data should be easy to integrate into that investigator's toolkits for analysis and visualization. BCO-DMO is exploring the idea of data containerization, or packaging data and related information for easier transport, interpretation, and use. Data containerization reduces friction not only for data repositories trying to describe complex data resources, but also for end users trying to access data with their own toolkits. Within the landscape of data containerization, the Frictionlessdata Data Package (http://frictionlessdata.io/) provides a number of valuable advantages over similar solutions. This presentation will focus on these advantages and on how the Frictionlessdata Data Package addresses a number of real-world use cases in data discovery, access, analysis and visualization in the age of Big Data.
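As a rough illustration of the containerization idea described above, the following minimal Python sketch assembles a Data Package style descriptor that ties together resources held in two different repositories. The top-level keys (`name`, `resources`) and the per-resource keys (`name`, `path`, `format`, `schema` with `fields`) follow the general layout of a Frictionless Data Package descriptor. Everything else is an assumption made for illustration: the dataset name, the `example.org` URLs, the column names, and the helper functions `build_descriptor` and `remote_resources` are hypothetical and are not taken from BCO-DMO or from the poster.

```python
import json


def build_descriptor():
    """Return a hypothetical Data Package style descriptor as a plain dict."""
    return {
        # Hypothetical package name (lowercase, hyphen-separated).
        "name": "example-cruise-dataset",
        "resources": [
            {
                # Tabular resource stored in a first (hypothetical) repository.
                "name": "ctd-profiles",
                "path": "https://repo-a.example.org/data/ctd_profiles.csv",
                "format": "csv",
                # Column descriptions a machine can read to judge fitness for purpose.
                "schema": {
                    "fields": [
                        {"name": "station", "type": "string"},
                        {"name": "depth_m", "type": "number"},
                        {"name": "temperature_c", "type": "number"},
                    ]
                },
            },
            {
                # Related non-tabular resource stored in a second (hypothetical) repository.
                "name": "protein-spectra",
                "path": "https://repo-b.example.org/data/spectra.zip",
                "format": "zip",
            },
        ],
    }


def remote_resources(descriptor):
    """List the names of resources whose path is an http(s) URL."""
    return [
        resource["name"]
        for resource in descriptor["resources"]
        if resource["path"].startswith(("http://", "https://"))
    ]


if __name__ == "__main__":
    # Serialize the descriptor; by convention it would be saved as datapackage.json.
    print(json.dumps(build_descriptor(), indent=2))
```

Because the descriptor is plain JSON, a harvester or an investigator's own toolkit could read it to locate each distributed resource and inspect its column types without parsing free-text web pages.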
Presented at AGU Ocean Sciences, 11 - 16 February 2018, Portland, OR
Suggested Citation
Presentation: Shepherd, Adam, Fils, Douglas, Kinkade, Danie, Saito, Mak A., "The Frictionless Data Package : data containerization for addressing big data challenges [poster]", Presented at AGU Ocean Sciences, 11 - 16 February 2018, Portland, OR, https://hdl.handle.net/1912/9577
Showing items related by title, author, creator and subject.
Shepherd, Adam; Rauch, Shannon; Schloer, Conrad; Kinkade, Danie; Biddle, Matt; Copley, Nancy; Saito, Mak A.; Wiebe, Peter; York, Amber (2018-12-14) Data repositories often transform submissions to improve understanding and reuse of data by researchers other than the original submitter. However, scientific workflows built by the data submitters often depend on the ...
Shepherd, Adam; Schloer, Conrad; York, Amber; Kinkade, Danie (2018-10-10) At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management ...