Applications of natural language processing in biodiversity science

dc.contributor.author Thessen, Anne E.
dc.contributor.author Cui, Hong
dc.contributor.author Mozzherin, Dmitry
dc.date.accessioned 2012-06-20T18:06:18Z
dc.date.available 2012-06-20T18:06:18Z
dc.date.issued 2012
dc.description © The Author(s), 2012. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Advances in Bioinformatics 2012 (2012): 391574, doi:10.1155/2012/391574. en_US
dc.description.abstract Centuries of biological knowledge are contained in the massive body of scientific literature, written for human-readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science. en_US
dc.description.sponsorship This work was funded in part by the MacArthur Foundation Grant to the Encyclopedia of Life, the National Science Foundation Data Net Program Grant no. 0830976, and the National Science Foundation Emerging Front Grant no. 0849982. en_US
dc.format.mimetype application/pdf
dc.identifier.citation Advances in Bioinformatics 2012 (2012): 391574 en_US
dc.identifier.doi 10.1155/2012/391574
dc.identifier.uri https://hdl.handle.net/1912/5235
dc.language.iso en_US en_US
dc.publisher Hindawi Publishing en_US
dc.relation.uri https://doi.org/10.1155/2012/391574
dc.rights Attribution 3.0 Unported *
dc.rights.uri http://creativecommons.org/licenses/by/3.0/ *
dc.title Applications of natural language processing in biodiversity science en_US
dc.type Article en_US
dspace.entity.type Publication
Files
Original bundle: 391574.pdf (1.08 MB, Adobe Portable Document Format)