NetiNeti: Discovery of Scientific Names from Text Using Machine Learning Methods, Figure 2
Scientific names found in American Seashell Book by manual recognition including names with OCR error (63.62Kb)
Unique NetiNeti results, not including names also found using manual markup or TaxonFinder, with reasons (2.869Kb)
Akella, Lakshmi Manohar
A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks that aim to extract useful information from biological, biomedical, and biodiversity text sources. A scientific name acts as an important metadata element for linking biological information. We present NetiNeti, a machine learning based approach for the identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org. We present the comparison results of various machine learning algorithms on our annotated corpus. Naïve Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the two top-performing algorithms.
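To make the idea of classifying candidate strings with Naïve Bayes concrete, here is a minimal sketch in Python. It is not the NetiNeti implementation: the `features` function, the feature names, the class labels, and the tiny training set are all hypothetical and chosen only for illustration. The classifier itself is a plain multinomial Naïve Bayes with add-one (Laplace) smoothing.

```python
import math
from collections import Counter, defaultdict


def features(candidate):
    """Hypothetical surface features for a non-empty candidate string."""
    words = candidate.split()
    first, last = words[0], words[-1]
    return [
        "first_cap=%s" % first[:1].isupper(),  # genus names are capitalized
        "last_lower=%s" % last.islower(),      # species epithets are lower case
        "suffix=%s" % last[-2:].lower(),       # crude Latin-ending cue
        "n_words=%d" % len(words),
    ]


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def log_scores(self, text):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in features(text):
                score += math.log((counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, hand-made training data (an assumption, not the annotated corpus).
TRAINING = [
    ("Mytilus edulis", "name"),
    ("Littorina littorea", "name"),
    ("Crassostrea virginica", "name"),
    ("Homo sapiens", "name"),
    ("The shell", "not_name"),
    ("found along", "not_name"),
    ("Atlantic coast", "not_name"),
    ("is common", "not_name"),
]

model = NaiveBayes()
model.train(TRAINING)
print(model.classify("Busycon carica"))
print(model.classify("found along"))
```

A real system would need far richer features and a much larger annotated corpus; the sketch only shows how per-class feature counts and a prior combine into a decision.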
Figure 2 summarizes the results of running NetiNeti with the Naïve Bayes algorithm on the annotated corpus (the “American Seashell” book). We also compare our results with those of TaxonFinder.
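One simple way to compare name lists from different sources is with set operations, as in the sketch below. The helper names (`normalize`, `unique_to`, `precision_recall`), the normalization rule, and the three short name lists are assumptions made for illustration; they are not the data behind Figure 2, and precision and recall are used here only as common example metrics.

```python
def normalize(name):
    """Collapse whitespace and ignore case so trivially different spellings match."""
    return " ".join(name.split()).lower()


def unique_to(tool_names, *other_sources):
    """Names reported by one tool that appear in none of the other sources."""
    result = {normalize(n) for n in tool_names}
    for source in other_sources:
        result -= {normalize(n) for n in source}
    return result


def precision_recall(found, reference):
    """Precision and recall of `found` against a manually marked `reference`."""
    found_set = {normalize(n) for n in found}
    ref_set = {normalize(n) for n in reference}
    hits = found_set & ref_set
    precision = len(hits) / len(found_set) if found_set else 0.0
    recall = len(hits) / len(ref_set) if ref_set else 0.0
    return precision, recall


# Hypothetical example lists, not results from the paper.
manual = ["Mytilus edulis", "Busycon carica", "Crassostrea virginica"]
netineti = ["Mytilus edulis", "Busycon  carica", "Littorina littorea"]
taxonfinder = ["Mytilus edulis"]

unique = unique_to(netineti, manual, taxonfinder)
precision, recall = precision_recall(netineti, manual)
print(unique, precision, recall)
```

Names that end up in `unique` still need manual review, since a name missing from the manual markup can be either a tool error or an omission by the annotator.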