Mozzherin Dmitry

Search Results

  • Preprint
    SeaBase : a multispecies transcriptomic resource and platform for gene network inference
    (2014-05) Fischer, Antje H. L. ; Mozzherin, Dmitry ; Eren, A. Murat ; Lans, Kristen D. ; Wilson, Nathan ; Cosentino, Carlo ; Smith, Joel
    Marine and aquatic animals are extraordinarily useful as models for identifying mechanisms of development and evolution, regeneration, resistance to cancer, longevity and symbiosis, among many other areas of research. This is due to the great diversity of these organisms and their wide-ranging capabilities. Genomics tools are essential for taking advantage of these “free lessons” of nature. However, genomics and transcriptomics are challenging in emerging model systems. Here, we present SeaBase, a tool for helping to meet these needs. Specifically, SeaBase provides a platform for sharing and searching transcriptome data. More importantly, SeaBase will support a growing number of tools for inferring gene network mechanisms. The first dataset available on SeaBase is a developmental transcriptome profile of the sea anemone Nematostella vectensis (Anthozoa, Cnidaria). Additional datasets are currently being prepared and we are aiming to expand SeaBase to include user-supplied data for any number of marine and aquatic organisms, thereby supporting many potentially new models for gene network studies.
  • Article
    Applications of natural language processing in biodiversity science
    (Hindawi Publishing, 2012) Thessen, Anne E. ; Cui, Hong ; Mozzherin, Dmitry
    Centuries of biological knowledge are contained in the massive body of scientific literature, written for human-readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.
  • Article
    The taxonomic name resolution service : an online tool for automated standardization of plant names
    (BioMed Central, 2013-01-16) Boyle, Brad ; Hopkins, Nicole ; Lu, Zhenyuan ; Garay, Juan Antonio Raygoza ; Mozzherin, Dmitry ; Rees, Tony ; Matasci, Naim ; Narro, Martha L. ; Piel, William H. ; Mckay, Sheldon J. ; Lowry, Sonya ; Freeland, Chris ; Peet, Robert K. ; Enquist, Brian J.
    The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this ‘names problem’ has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science. The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets. We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface and as a RESTful web service and application programming interface. Source code is available.