Informatics solutions for large ocean optics datasets
Lack of observations that span the wide range of critical space and time scales continues to limit many aspects of oceanography. As ocean observatories and observing networks mature, the role of optical technologies and approaches in helping to overcome this limitation continues to grow. As a result, the quantity and complexity of the data produced are increasing at a pace that threatens to overwhelm individual researchers, who must cope with large high-resolution datasets; complex, multi-stage analyses; and the challenges of preserving sufficient metadata and provenance information to ensure reproducibility and avoid costly reprocessing or data loss. We have developed approaches to address these new challenges in the context of a case study involving very large numbers (~1 billion) of images collected at coastal observatories by Imaging FlowCytobot, an automated submersible flow cytometer that produces high-resolution images of plankton and other microscopic particles at rates up to 10 Hz for months to years. By developing partnerships between oceanographers who generate and use such data and computer scientists focused on improving science outcomes, we have prototyped a replicable system. It provides simple and ubiquitous access to observational data and products via web services in standard formats; accelerates image processing by enabling algorithms developed with desktop applications to be rapidly deployed and evaluated on shared, high-performance servers; and improves data integrity by replacing error-prone manual data management processes with generalized, automated services. The informatics system is currently in operation for multiple Imaging FlowCytobot datasets and is being tested with other types of ocean imagery.
Ocean Optics XXI, Glasgow, Scotland, October 8-12, 2012