Labeling poststorm coastal imagery for machine learning: measurement of interrater agreement

Date
2021-09-03
Authors
Goldstein, Evan B.
Buscombe, Daniel
Lazarus, Eli
Mohanty, Somya D.
Rafique, Shah Nafis
Anarde, Katherine A.
Ashton, Andrew D.
Beuzen, Tomas
Castagno, Katherine
Cohn, Nicholas
Conlin, Matthew P.
Ellenson, Ashley
Gillen, Megan N.
Hovenga, Paige A.
Over, Jin-Si
Palermo, Rose V.
Ratliff, Katherine M.
Reeves, Ian R. B.
Sanborn, Lily H.
Straub, Jessamin A.
Taylor, Luke A.
Wallace, Elizabeth J.
Warrick, Jonathan
Wernette, Phillipe
Williams, Hannah E.
DOI
10.1029/2021EA001896
Keywords
Data labeling
Classification
Hurricane impacts
Machine learning
Imagery
Data annotation
Abstract
Classifying images using supervised machine learning (ML) relies on labeled training data, such as classes or text descriptions associated with each image. Data-driven models are only as good as the data used for training, which makes high-quality labeled data essential for developing an ML model that has predictive skill. Labeling data is typically a time-consuming, manual process. Here, we investigate the process of labeling data, with a specific focus on coastal aerial imagery captured in the wake of hurricanes that affected the Atlantic and Gulf Coasts of the United States. The imagery data set is a rich observational record of storm impacts and coastal change, but the imagery requires labeling to render that information accessible. We created an online interface that served labelers a stream of images and a fixed set of questions. A total of 1,600 images were each labeled by between two and seven coastal scientists. We used the resulting data set to investigate interrater agreement: the extent to which labelers labeled each image similarly. Interrater agreement scores, assessed with percent agreement and Krippendorff's alpha, are higher when the questions posed to labelers are relatively simple, when the labelers are provided with a user manual, and when images are smaller. Experiments in interrater agreement point toward the benefit of multiple labelers for understanding the uncertainty in labeling data for machine learning research.
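The two agreement measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes nominal (categorical) answers, a variable number of labelers per image, and percent agreement defined as the mean fraction of agreeing labeler pairs per image; the paper may define it differently. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(units):
    """Mean fraction of agreeing labeler pairs per image.

    `units` is a list of label lists, one list per image.
    Images with fewer than two labels are skipped.
    """
    scores = []
    for labels in units:
        if len(labels) < 2:
            continue
        pairs = list(combinations(labels, 2))
        scores.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels, allowing missing raters."""
    # Coincidence matrix: each ordered pair of labels from different
    # labelers on the same image contributes 1 / (m - 1), where m is the
    # number of labels that image received.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        counts = Counter(labels)
        for c, n_c in counts.items():
            for k, n_k in counts.items():
                pairs = n_c * (n_c - 1) if c == k else n_c * n_k
                coincidences[(c, k)] += pairs / (m - 1)

    # Marginal totals per category and overall number of pairable labels.
    totals = Counter()
    for (c, _k), value in coincidences.items():
        totals[c] += value
    n = sum(totals.values())

    # Observed and chance-expected disagreement (nominal distance: 0 or 1).
    observed = sum(v for (c, k), v in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one category was used; alpha is undefined
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical toy data: four images, two labelers each.
    demo = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
    print(percent_agreement(demo))                     # 0.75
    print(round(krippendorff_alpha_nominal(demo), 3))  # 0.533
```

In the toy data, three of four images are labeled identically, so percent agreement is 0.75, while alpha is lower (about 0.53) because it discounts the agreement expected by chance given how often each label is used.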
Description
© The Author(s), 2021. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Goldstein, E. B., Buscombe, D., Lazarus, E. D., Mohanty, S. D., Rafique, S. N., Anarde, K. A., Ashton, A. D., Beuzen, T., Castagno, K. A., Cohn, N., Conlin, M. P., Ellenson, A., Gillen, M., Hovenga, P. A., Over, J.-S. R., Palermo, R., Ratliff, K. M., Reeves, I. R. B., Sanborn, L. H., Straub, J. A., Taylor, L. A., Wallace, E. J., Warrick, J., Wernette, P., Williams, H. E. Labeling poststorm coastal imagery for machine learning: measurement of interrater agreement. Earth and Space Science, 8(9), (2021): e2021EA001896, https://doi.org/10.1029/2021EA001896.
Citation
Goldstein, E. B., Buscombe, D., Lazarus, E. D., Mohanty, S. D., Rafique, S. N., Anarde, K. A., Ashton, A. D., Beuzen, T., Castagno, K. A., Cohn, N., Conlin, M. P., Ellenson, A., Gillen, M., Hovenga, P. A., Over, J.-S. R., Palermo, R., Ratliff, K. M., Reeves, I. R. B., Sanborn, L. H., Straub, J. A., Taylor, L. A., Wallace, E. J., Warrick, J., Wernette, P., Williams, H. E. (2021). Labeling poststorm coastal imagery for machine learning: measurement of interrater agreement. Earth and Space Science, 8(9), e2021EA001896.
Except where otherwise noted, this item's license is described as Attribution 4.0 International