NetiNeti: Discovery of Scientific Names from Text Using Machine Learning Methods Figure 1

Akella, Lakshmi Manohar

Date

2011-12-30

Authors

Akella, Lakshmi Manohar

Files

NPR-Experiment.txt (26.03 KB)

NPR-Experiment-nosl.txt (26.02 KB)

Citable URI

https://hdl.handle.net/1912/4965

DOI

10.1575/1912/4965

Related Materials

https://hdl.handle.net/1912/6236

Keywords

Naïve Bayes classifier, training experiments

Abstract

A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text-mining tasks that aim to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element for linking biological information. We present NetiNeti, a machine-learning-based approach for the identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org. We present comparison results for various machine learning algorithms on our annotated corpus. Naïve Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the two top-performing algorithms.
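To illustrate what a Naïve Bayes name classifier of this kind involves, here is a minimal Python sketch. It is not NetiNeti's implementation: this page does not list NetiNeti's features, so the `features` function, its three string features, and the toy training words are hypothetical. The sketch uses a Bernoulli-style Naïve Bayes over binary features with add-one smoothing, which is only one of several possible Naïve Bayes variants.

```python
import math
from collections import Counter, defaultdict


def features(token):
    """Hypothetical string features for a candidate name token (illustrative only)."""
    return {
        f"cap={token[:1].isupper()}",
        f"suffix3={token[-3:].lower()}",
        f"latin_end={token.lower().endswith(('us', 'um', 'ae', 'is', 'a'))}",
    }


class NaiveBayes:
    """Bernoulli-style Naive Bayes over binary feature sets, with add-one smoothing."""

    def fit(self, examples):
        # examples: list of (feature_set, label) pairs
        self.label_counts = Counter(label for _, label in examples)
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()
        for feats, label in examples:
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.total = sum(self.label_counts.values())
        return self

    def log_posterior(self, feats, label):
        n = self.label_counts[label]
        score = math.log(n / self.total)  # log prior P(label)
        for f in self.vocab:  # features never seen in training are ignored
            # smoothed P(feature present | label)
            p = (self.feature_counts[label][f] + 1) / (n + 2)
            score += math.log(p if f in feats else 1 - p)
        return score

    def classify(self, feats):
        # pick the label with the highest (unnormalized) log posterior
        return max(self.label_counts, key=lambda lab: self.log_posterior(feats, lab))


# Toy positive ("name") and negative ("other") training examples.
train = [(features(w), "name") for w in ["Quercus", "Homo", "sapiens", "Felis"]]
train += [(features(w), "other") for w in ["The", "protein", "was", "observed"]]
clf = NaiveBayes().fit(train)
print(clf.classify(features("Canis")))  # → name
```

The log-space scoring avoids floating-point underflow when the feature vocabulary grows large.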

Description

Figure 1 shows the results of a series of training experiments with the Naïve Bayes classifier, using different neighborhoods for contextual features and different sizes of positive and negative training examples; the resulting classifiers were evaluated against our annotated gold-standard corpus. The data sets are the results of running NetiNeti on a subset of 136 PubMed Central tagged open access articles, with and without a stop list.
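The two experimental knobs described above, the size of the contextual neighborhood and evaluation against a gold standard, can be sketched in a few lines of Python. This is an illustrative sketch only. The `w[offset]=word` feature scheme, the `<pad>` token, and the helper names `context_features` and `precision_recall_f1` are hypothetical and do not come from NetiNeti. The evaluation assumes predicted and gold-standard names are compared as exact-match sets.

```python
def context_features(tokens, i, window):
    """Hypothetical neighboring-word features within `window` tokens of position i."""
    feats = set()
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the candidate token itself
        j = i + offset
        # positions outside the sentence are filled with a padding symbol
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats.add(f"w[{offset:+d}]={word}")
    return feats


def precision_recall_f1(predicted, gold):
    """Exact-match comparison of predicted names against a gold-standard set."""
    tp = len(predicted & gold)  # names found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


tokens = "the species Quercus alba was found".split()
for window in (1, 2, 3):  # vary the neighborhood size, as in the experiments
    print(window, sorted(context_features(tokens, 2, window)))

print(precision_recall_f1(
    {"Quercus alba", "Homo sapiens", "Figure"},
    {"Quercus alba", "Homo sapiens", "Felis catus"},
))
```

A wider window gives the classifier more context per candidate at the cost of a larger, sparser feature space. That trade-off is what sweeping the neighborhood size is meant to probe.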

Collections

CLI Data Sets
