Fils
Douglas
Search Results: showing 1 - 6 of 6
-
Article
Knowledge graphs to support real‐time flood impact evaluation
(Association for the Advancement of Artificial Intelligence, 2022-03-31) Johnson, J. Michael ; Narock, Tom ; Singh-Mohudpur, Justin ; Fils, Douglas ; Clarke, Keith C. ; Saksena, Siddharth ; Shepherd, Adam ; Arumugam, Sankar ; Yeghiazarian, Lilit
A digital map of the built environment is useful for a range of economic, emergency response, and urban planning exercises, such as helping find places in app-driven interfaces, helping emergency managers know what locations might be impacted by a flood or fire, and helping city planners proactively identify vulnerabilities and plan for how a city is growing. Since its inception in 2004, OpenStreetMap (OSM) has set the benchmark for open geospatial data and has become a key player in the public, research, and corporate realms. Following the foundations laid by OSM, several open geospatial products describing the built environment have blossomed, including the Microsoft USA building footprint layer and the OpenAddress project. Each of these products uses a different data collection method, ranging from public contributions to artificial intelligence, and taken together they could provide a comprehensive description of the built environment. Yet these projects are still siloed, and their variety makes integration and interoperability a major challenge. Here, we document an approach for merging data from these three major open building datasets and outline a workflow that is scalable to the continental United States (CONUS). We show how the results can be structured as a knowledge graph over which machine learning models are built. These models can help propagate and complete unknown quantities that can then be leveraged in disaster management.
-
Presentation
The Frictionless Data Package : data containerization for addressing big data challenges [poster]
(2018-02-15) Shepherd, Adam ; Fils, Douglas ; Kinkade, Danie ; Saito, Mak A.
At the Biological and Chemical Oceanography Data Management Office (BCO-DMO), Big Data challenges have been steadily increasing. The sizes of data submissions have grown as instrumentation improves. Complex data types can sometimes be stored across different repositories. This signals a paradigm shift where data and information that are meant to be tightly coupled and have traditionally been stored under the same roof are now distributed across repositories and data stores. For domain-specific repositories like BCO-DMO, a new mechanism for assembling data, metadata and supporting documentation is needed. Traditionally, data repositories have relied on a human's involvement throughout discovery and access workflows. This human could assess fitness for purpose by reading loosely coupled, unstructured information from web pages and documentation. Distributed storage was something that could be communicated in text that a human could read and understand. However, as machines play larger roles in the discovery of and access to data, distributed resources must be described and packaged in ways that fit into machine-automated discovery and access workflows, so that the end user can assess fitness for purpose. Once machines have recommended a data resource as relevant to an investigator's needs, the data should be easy to integrate into that investigator's toolkits for analysis and visualization. BCO-DMO is exploring the idea of data containerization, or packaging data and related information for easier transport, interpretation, and use. Data containerization reduces friction not only for data repositories trying to describe complex data resources, but also for end users trying to access data with their own toolkits.
In researching the landscape of data containerization, the Frictionlessdata Data Package (http://frictionlessdata.io/) provides a number of valuable advantages over similar solutions. This presentation will focus on these advantages and how the Frictionlessdata Data Package addresses a number of real-world use cases in data discovery, access, analysis and visualization in the age of Big Data.
-
Presentation
The advantages of machine aided co-reference resolution for research cruise metadata
(2017-05-31) Shepherd, Adam ; Chandler, Cynthia L. ; Arko, Robert A. ; Fils, Douglas ; Kinkade, Danie
One of the central incentives of deploying linked open data is the opportunity to leverage the linkages between source datasets to retrieve related information. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) reaps these benefits by linking its cruise-level metadata to the Rolling Deck to Repository (R2R) – the trusted, authoritative source for cruises undertaken by the U.S. academic research fleet. Even though the process of identifying a link between these two repositories is easy for a human, this talk will explore the advantages of using a machine-aided process to suggest links to R2R cruises to a BCO-DMO data manager.
-
Presentation
The Frictionless Data Package : data containerization for automated scientific workflows [poster]
(2017-12-13) Shepherd, Adam ; Fils, Douglas ; Kinkade, Danie ; Saito, Mak A.
As cross-disciplinary geoscience research increasingly relies on machines to discover and access data, one of the critical questions facing data repositories is how data and supporting materials should be packaged for consumption. Traditionally, data repositories have relied on a human's involvement throughout discovery and access workflows. This human could assess fitness for purpose by reading loosely coupled, unstructured information from web pages and documentation. In attempts to shorten the time to science and to access data resources across many disciplines, expectations for machines to mediate the process of discovery and access are challenging data repository infrastructure. The challenge is to deliver data and information in ways that enable machines to make better decisions by enabling them to understand the data and metadata of many data types. Additionally, once machines have recommended a data resource as relevant to an investigator's needs, the data resource should be easy to integrate into that investigator's toolkits for analysis and visualization. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) supports NSF-funded OCE and PLR investigators with their projects' data management needs. These needs involve a number of varying data types, some of which require multiple files with differing formats. Presently, BCO-DMO has described these data types and the important relationships between each type's data files through human-readable documentation on web pages. For machines directly accessing data files from BCO-DMO, this documentation could be overlooked, leading to misinterpretation of the data.
Instead, BCO-DMO is exploring the idea of data containerization, or packaging data and related information for easier transport, interpretation, and use. In researching the landscape of data containerization, the Frictionlessdata Data Package (http://frictionlessdata.io/) provides a number of valuable advantages over similar solutions. This presentation will focus on these advantages and how the Frictionlessdata Data Package addresses a number of real-world use cases in data discovery, access, analysis and visualization.
-
Dataset
GeoLink Triple Store Data
(2018-01-31) Arko, Robert A. ; Carbotte, Suzanne M. ; Chandler, Cynthia ; Cheatham, Michelle ; Fils, Douglas ; Hitzler, Pascal ; Hu, Yingjie ; Janowicz, Kyzysztof ; Ji, Peng ; Jones, Matt ; Krisnadhi, Adila ; Lehnert, Kerstin A. ; Mecum, Bryce ; Mickle, Audrey ; Narock, Tom ; Raymond, Lisa ; Schildhauer, Mark ; Shepherd, Adam ; Wiebe, Peter H.
A growing collection of standard protocols, formats, and vocabularies, often characterized as the Semantic Web, offers a powerful approach for publishing research data online. The GeoLink project brings together experts from the geosciences, computer science, and library science in an effort to develop Semantic Web components that support discovery and reuse of data and knowledge. GeoLink's participating repositories include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology. The outcomes of this project include a network of Linked Data published by participating repositories using the project's Ontology Design Patterns (ODPs), and tools to facilitate discovery of related content in multiple repositories. This item will be versioned periodically as the data is re-harvested and processed. The live dataset is currently available for query at http://data.geolink.org/sparql. A demo data application is available at http://demo.geolink.org/.
-
Presentation
Leveraging the GeoLink Knowledge Base for Cruise Information
(Federation of Earth Science Information Partners, 2016-12-21) Mickle, Audrey ; Fils, Douglas ; Shepherd, Adam
Linked Open Data (LOD) provides an excellent opportunity for repositories, libraries, and archives to expand the use of their holdings and advance the work of researchers. The implementation of the GeoLink Knowledgebase has created an exciting LOD framework for organizations specializing in Earth Sciences. As an NSF EarthCube Building Block, GeoLink brings together several powerful data sources, such as BCO-DMO, Rolling Deck to Repository (R2R), Data One, IEDA, IODP, and LTER, with publication providers such as the MBLWHOI Library’s Woods Hole Open Access Server (WHOAS), ESIP, and AGU. While publishing to the GeoLink Knowledgebase offers a great way to make collections and metadata more findable and relevant, becoming a linked data publisher is not the only way to engage with linked data or the GeoLink project. Any repository can use simple, easily customizable code developed by members of the GeoLink team to add live GeoLink content to a page based on the item's metadata, leveraging GeoLink’s powerful framework for searching across repositories, organizations, and disciplines.