The Frictionless Data Package : data containerization for addressing big data challenges [poster]

Date
2018-02-15
Authors
Shepherd, Adam
Fils, Douglas
Kinkade, Danie
Saito, Mak A.
Keywords
Frictionless Data
Data management
Data exchange
Data Transport
Distributed data
Data tools
Big data
Abstract
At the Biological and Chemical Oceanography Data Management Office (BCO-DMO), Big Data challenges have been steadily increasing. The sizes of data submissions have grown as instrumentation improves, and complex data types are sometimes stored across different repositories. This signals a paradigm shift: data and information that are meant to be tightly coupled, and that have traditionally been stored under the same roof, are now distributed across repositories and data stores. For domain-specific repositories like BCO-DMO, a new mechanism for assembling data, metadata and supporting documentation is needed. Traditionally, data repositories have relied on human involvement throughout discovery and access workflows. A human could assess fitness for purpose by reading loosely coupled, unstructured information from web pages and documentation, and distributed storage could simply be communicated in text that a human could read and understand. However, as machines play larger roles in the discovery and access of data, distributed resources must be described and packaged in ways that fit into machine-automated discovery and access workflows, so that the end-user can still assess fitness for purpose. Once machines have recommended a data resource as relevant to an investigator's needs, the data should be easy to integrate into that investigator's toolkits for analysis and visualization. BCO-DMO is exploring the idea of data containerization, or packaging data and related information for easier transport, interpretation, and use. Data containerization reduces friction not only for data repositories trying to describe complex data resources, but also for end-users trying to access data with their own toolkits. Within the landscape of data containerization, the Frictionless Data Package (http://frictionlessdata.io/) provides a number of valuable advantages over similar solutions.
This presentation will focus on these advantages and on how the Frictionless Data Package addresses a number of real-world use cases in data discovery, access, analysis and visualization in the age of Big Data.
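As a rough illustration of what packaging distributed resources can look like, the sketch below builds a minimal descriptor in the style of a Data Package `datapackage.json`, using only Python's standard library. The dataset name, resource names, field names and example.org URLs are hypothetical placeholders, not BCO-DMO holdings, and the helper `build_descriptor` is an assumption made for this example rather than part of any Frictionless Data library.

```python
import json


def build_descriptor(name, resources):
    """Return a Data Package-style descriptor as a plain dict.

    Hypothetical helper for illustration only: it copies each resource
    entry into the "resources" list of a single descriptor, so that data
    and documentation held in different places are described together.
    """
    return {"name": name, "resources": [dict(r) for r in resources]}


# Hypothetical resources stored in two different repositories.
# The URLs are placeholders and do not point to real data.
resources = [
    {
        "name": "ctd-profiles",
        "path": "https://example.org/repo-a/ctd_profiles.csv",
        "format": "csv",
        # A small table schema so a machine can interpret the columns.
        "schema": {
            "fields": [
                {"name": "depth", "type": "number"},
                {"name": "temperature", "type": "number"},
            ]
        },
    },
    {
        "name": "methods-document",
        "path": "https://example.org/repo-b/methods.pdf",
        "format": "pdf",
    },
]

descriptor = build_descriptor("example-cruise-dataset", resources)

# Serialize to the JSON text that would be saved as datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Because the descriptor is plain JSON, an automated workflow could read it to locate each resource and its column definitions without a person interpreting a web page first.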
Description
Presented at AGU Ocean Sciences, 11 - 16 February 2018, Portland, OR
Except where otherwise noted, this item's license is described as Attribution 4.0 International