CLI Publications

Recent Submissions

  • Article
    NetiNeti : discovery of scientific names from text using machine learning methods
    (BioMed Central, 2012-08-22) Akella, Lakshmi Manohar ; Norton, Cathy N. ; Miller, Holly
    A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) compared to a popular dictionary based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central’s full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.
  • Article
    The taxonomic name resolution service : an online tool for automated standardization of plant names
    (BioMed Central, 2013-01-16) Boyle, Brad ; Hopkins, Nicole ; Lu, Zhenyuan ; Garay, Juan Antonio Raygoza ; Mozzherin, Dmitry ; Rees, Tony ; Matasci, Naim ; Narro, Martha L. ; Piel, William H. ; Mckay, Sheldon J. ; Lowry, Sonya ; Freeland, Chris ; Peet, Robert K. ; Enquist, Brian J.
    The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this ‘names problem’ has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science. The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets. We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface.
    Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
  • Working Paper
    Envisioning the future of science libraries at academic research institutions : a discussion
    (2012-12-20) Feltes, Carol ; Gibson, Donna S. ; Miller, Holly ; Norton, Cathy N. ; Pollock, Ludmila
    A group of librarians, other information professionals, scientists and research administrators met to discuss the challenges that research libraries are currently facing. After the meeting a survey was conducted to obtain additional input from the group on several key challenges that arose from the discussions. The purpose of the meeting and survey was threefold: 1. Examine in detail, from a variety of perspectives, how the world of research is changing and the impact these changes have on the direction of research libraries. 2. Create an informed vision of how research libraries can be a vital partner to researchers. 3. Suggest a strategic approach for realizing this vision. The strategic approach presented in this white paper incorporates feedback from various sized research libraries, each with its own mission. The expectation is that individual libraries will use it as a guide in formulating strategies that are appropriate to their research communities, financial circumstances, and organizational reporting structure.
  • Article
    The taxonomic significance of species that have only been observed once : the genus Gymnodinium (Dinoflagellata) as an example
    (Public Library of Science, 2012-08-30) Thessen, Anne E. ; Patterson, David J. ; Murray, Shauna A.
    Taxonomists have been tasked with cataloguing and quantifying the Earth’s biodiversity. Their progress is measured in code-compliant species descriptions that include text, images, type material and molecular sequences. It is from this material that other researchers are to identify individuals of the same species in future observations. It has been estimated that 13% to 22% (depending on taxonomic group) of described species have only ever been observed once. Species that have only been observed at the time and place of their original description are referred to as oncers. Oncers are important to our current understanding of biodiversity. They may be validly described species that are members of a rare biosphere, or they may indicate endemism, or that these species are limited to very constrained niches. Alternatively, they may reflect that taxonomic practices are too poor to allow the organism to be re-identified or that the descriptions are unknown to other researchers. If the latter are true, our current tally of species will not be an accurate indication of what we know. In order to investigate this phenomenon and its potential causes, we examined the microbial eukaryote genus Gymnodinium. This genus contains 268 extant species, 103 (38%) of which have not been observed since their original description. We report traits of the original descriptions and interpret them in respect to the status of the species. We conclude that the majority of oncers were poorly described and their identity is ambiguous. As a result, we argue that the genus Gymnodinium contains only 234 identifiable species. Species that have been observed multiple times tend to have longer descriptions, written in English. The styles of individual authors have a major effect, with a few authors describing a disproportionate number of oncers. 
    The information about the taxonomy of Gymnodinium that is available via the internet is incomplete, and reliance on it will not give access to all necessary knowledge. Six new names are presented – Gymnodinium campbelli for the homonymous name Gymnodinium translucens Campbell 1973, Gymnodinium antarcticum for the homonymous name Gymnodinium frigidum Balech 1965, Gymnodinium manchuriensis for the homonymous name Gymnodinium autumnale Skvortzov 1968, Gymnodinium christenum for the homonymous name Gymnodinium irregulare Christen 1959, Gymnodinium conkufferi for the homonymous name Gymnodinium irregulare Conrad & Kufferath 1954 and Gymnodinium chinensis for the homonymous name Gymnodinium frigidum Skvortzov 1968.
  • Article
    Biological nomenclature terms for facilitating communication in the naming of organisms
    (Pensoft, 2012-05-08) David, John ; Garrity, George M. ; Greuter, Werner ; Hawksworth, David L. ; Jahn, Regine ; Kirk, Paul M. ; McNeill, John ; Michel, Ellinor ; Knapp, Sandra ; Patterson, David J. ; Tindall, Brian J. ; Todd, Jonathan A. ; Tol, Jan van ; Turland, Nicholas J.
    A set of terms recommended for use in facilitating communication in biological nomenclature is presented as a table showing broadly equivalent terms used in the traditional Codes of nomenclature. These terms are intended to help those engaged in naming across organism groups, and are the result of the work of the International Committee on Bionomenclature, whose aim is to promote harmonisation and communication amongst those naming life on Earth.
  • Article
    Applications of natural language processing in biodiversity science
    (Hindawi Publishing, 2012) Thessen, Anne E. ; Cui, Hong ; Mozzherin, Dmitry
    Centuries of biological knowledge are contained in the massive body of scientific literature, written for human-readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.
  • Presentation
    Building research networks to support campus programs [poster]
    (2012-04-04) Furfey, John F. ; Devenish, Ann ; Hurter, Colleen ; Stafford, Nancy
    This poster focuses on the methods, tools and outcomes involved in creating two targeted research networks to support large, long-running research programs in the Woods Hole scientific community.
  • Article
    Mapping the biosphere : exploring species to understand the origin, organization and sustainability of biodiversity
    (Taylor & Francis, 2012-03-27) Wheeler, Q. D. ; Knapp, Sandra ; Stevenson, D. W. ; Stevenson, J. ; Blum, Stan D. ; Boom, B. M. ; Borisy, Gary G. ; Buizer, James L. ; De Carvalho, M. R. ; Cibrian, A. ; Donoghue, M. J. ; Doyle, V. ; Gerson, E. M. ; Graham, C. H. ; Graves, P. ; Graves, Sara J. ; Guralnick, Robert P. ; Hamilton, A. L. ; Hanken, J. ; Law, W. ; Lipscomb, D. L. ; Lovejoy, Thomas E. ; Miller, Holly ; Miller, J. S. ; Naeem, Shahid ; Novacek, M. J. ; Page, L. M. ; Platnick, N. I. ; Porter-Morgan, H. ; Raven, Peter H. ; Solis, M. A. ; Valdecasas, A. G. ; Van Der Leeuw, S. ; Vasco, A. ; Vermeulen, N. ; Vogel, J. ; Walls, R. L. ; Wilson, E. O. ; Woolley, J. B.
    The time is ripe for a comprehensive mission to explore and document Earth's species. This calls for a campaign to educate and inspire the next generation of professional and citizen species explorers, investments in cyber-infrastructure and collections to meet the unique needs of the producers and consumers of taxonomic information, and the formation and coordination of a multi-institutional, international, transdisciplinary community of researchers, scholars and engineers with the shared objective of creating a comprehensive inventory of species and detailed map of the biosphere. We conclude that an ambitious goal to describe 10 million species in less than 50 years is attainable based on the strength of 250 years of progress, worldwide collections, existing experts, technological innovation and collaborative teamwork. Existing digitization projects are overcoming obstacles of the past, facilitating collaboration and mobilizing literature, data, images and specimens through cyber technologies. Charting the biosphere is enormously complex, yet necessary expertise can be found through partnerships with engineers, information scientists, sociologists, ecologists, climate scientists, conservation biologists, industrial project managers and taxon specialists, from agrostologists to zoophytologists. Benefits to society of the proposed mission would be profound, immediate and enduring, from detection of early responses of flora and fauna to climate change to opening access to evolutionary designs for solutions to countless practical problems. The impacts on the biodiversity, environmental and evolutionary sciences would be transformative, from ecosystem models calibrated in detail to comprehensive understanding of the origin and evolution of life over its 3.8 billion year history. 
    The resultant cyber-enabled taxonomy, or cybertaxonomy, would open access to biodiversity data to developing nations, assure access to reliable data about species, and change how scientists and citizens alike access, use and think about biological diversity information.
  • Article
    Pseudo-nitzschia physiological ecology, phylogeny, toxicity, monitoring and impacts on ecosystem health
    (Elsevier B.V., 2011-11-03) Trainer, Vera L. ; Bates, Stephen S. ; Lundholm, Nina ; Thessen, Anne E. ; Cochlan, William P. ; Adams, Nicolaus G. ; Trick, Charles G.
    Over the last decade, our understanding of the environmental controls on Pseudo-nitzschia blooms and domoic acid (DA) production has matured. Pseudo-nitzschia have been found along most of the world's coastlines, while the impacts of its toxin, DA, are most persistent and detrimental in upwelling systems. However, Pseudo-nitzschia and DA have recently been detected in the open ocean's high-nitrate, low-chlorophyll regions, in addition to fjords, gulfs and bays, showing their presence in diverse environments. The toxin has been measured in zooplankton, shellfish, crustaceans, echinoderms, worms, marine mammals and birds, as well as in sediments, demonstrating its stable transfer through the marine food web and abiotically to the benthos. The linkage of DA production to nitrogenous nutrient physiology, trace metal acquisition, and even salinity, suggests that the control of toxin production is complex and likely influenced by a suite of environmental factors that may be unique to a particular region. Advances in our knowledge of Pseudo-nitzschia sexual reproduction, also in field populations, illustrate its importance in bloom dynamics and toxicity. The combination of careful taxonomy and powerful new molecular methods now allows for the complete characterization of Pseudo-nitzschia populations and how they respond to environmental changes. Here we summarize research that represents our increased knowledge over the last decade of Pseudo-nitzschia and its production of DA, including changes in worldwide range, phylogeny, physiology, ecology, monitoring and public health impacts.
  • Article
    Data hosting infrastructure for primary biodiversity data
    (BioMed Central, 2011-12-15) Goddard, Anthony ; Wilson, Nathan ; Cryer, Phil ; Yamashita, Grant
    Today, an unprecedented volume of primary biodiversity data are being generated worldwide, yet significant amounts of these data have been and will continue to be lost after the conclusion of the projects tasked with collecting them. To get the most value out of these data it is imperative to seek a solution whereby these data are rescued, archived and made available to the biodiversity community. To this end, the biodiversity informatics community requires investment in processes and infrastructure to mitigate data loss and provide solutions for long-term hosting and sharing of biodiversity data. We review the current state of biodiversity data hosting and investigate the technological and sociological barriers to proper data management. We further explore the rescuing and re-hosting of legacy data, the state of existing toolsets and propose a future direction for the development of new discovery tools. We also explore the role of data standards and licensing in the context of data hosting and preservation. We provide five recommendations for the biodiversity community that will foster better data preservation and access: (1) encourage the community's use of data standards, (2) promote the public domain licensing of data, (3) establish a community of those involved in data hosting and archival, (4) establish hosting centers for biodiversity data, and (5) develop tools for data discovery. The community's adoption of standards and development of tools to enable data discovery is essential to sustainable data preservation. Furthermore, the increased adoption of open content licensing, the establishment of data hosting infrastructure and the creation of a data hosting and archiving community are all necessary steps towards the community ensuring that data archival policies become standardized.
  • Article
    Data issues in the life sciences
    (Pensoft Publishers, 2011-11-28) Thessen, Anne E. ; Patterson, David J.
    We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the “Big New Biology”. Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances with standards, bandwidth, distributed computing, exemplar successes, and a strong presence in the emerging world of Linked Open Data are sufficient to conclude that technical issues will be overcome in the foreseeable future. While motivated to have a shared open infrastructure and data pool, and pressured by funding agencies to move in this direction, the sociological issues determine progress. Major sociological issues include our lack of understanding of the heterogeneous data cultures within Life Sciences, and the impediments to progress include a lack of incentives to build appropriate infrastructures into projects and institutions or to encourage scientists to make data openly available.
  • Presentation
    SCOR/IODE/MBLWHOI Library collaboration on data publication [poster] 
    (2011-05-25) Raymond, Lisa ; Pikula, Linda ; Lowry, Roy ; Urban, Edward ; Moncoiffe, Gwenaelle ; Pissierssens, Peter ; Norton, Cathy N.
    This poster describes the development of international standards to publish oceanographic datasets. Research areas include the assignment of persistent identifiers, tracking provenance, linking datasets to publications, attributing credit to data providers, and best practices for the physical composition and semantic description of the content.
  • Preprint
    Identity of epibiotic bacteria on symbiontid euglenozoans in O2-depleted marine sediments : evidence for symbiont and host co-evolution
    (2010-06) Edgcomb, Virginia P. ; Breglia, S. A. ; Yubuki, Naoji ; Beaudoin, David J. ; Patterson, David J. ; Leander, Brian S. ; Bernhard, Joan M.
    A distinct subgroup of euglenozoans, referred to as the “Symbiontida,” has been described from oxygen-depleted and sulfidic marine environments. By definition, all members of this group carry epibionts that are intimately associated with underlying mitochondrion-derived organelles beneath the surface of the hosts. We have used molecular phylogenetic and ultrastructural evidence to identify the rod-shaped epibionts of two members of this group, Calkinsia aureus and Bihospites bacati, hand-picked from sediments from two separate oxygen-depleted, sulfidic environments. We identify their epibionts as closely related sulfur or sulfide oxidizing members of the Epsilon proteobacteria. The Epsilon proteobacteria generally play a significant role in deep-sea habitats as primary colonizers, primary producers, and/or in symbiotic associations. The epibionts likely fulfill a role in detoxifying the immediate surrounding environment for these two different hosts. The nearly identical rod-shaped epibionts on these two symbiontid hosts provide evidence for a co-evolutionary history between these two sets of partners. This hypothesis is supported by congruent tree topologies inferred from 18S and 16S rDNA from the hosts and bacterial epibionts, respectively. The eukaryotic hosts likely serve as a motile substrate that delivers the epibionts to the ideal locations with respect to the oxic/anoxic interface whereby their growth rates can be maximized, perhaps also allowing the host to cultivate a food source. Because symbiontid isolates and additional SSU rDNA gene sequences from this clade have now been recovered from many locations worldwide, the Symbiontida are likely more widespread and diverse than presently known.
  • Preprint
    Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life
    (2010-06-01) Parfrey, Laura Wegener ; Grant, Jessica ; Tekle, Yonas I. ; Lasek-Nesselquist, Erica ; Morrison, Hilary G. ; Sogin, Mitchell L. ; Patterson, David J. ; Katz, Laura A.
    An accurate reconstruction of the eukaryotic tree of life is essential to identify the innovations underlying the diversity of microbial and macroscopic (e.g. plants and animals) eukaryotes. Previous work has divided eukaryotic diversity into a small number of high-level ‘supergroups’, many of which receive strong support in phylogenomic analyses. However, the abundance of data in phylogenomic analyses can lead to highly supported but incorrect relationships due to systematic phylogenetic error. Further, the paucity of major eukaryotic lineages (19 or fewer) included in these genomic studies may exaggerate systematic error and reduce power to evaluate hypotheses. Here, we use a taxon-rich strategy to assess eukaryotic relationships. We show that analyses emphasizing broad taxonomic sampling (up to 451 taxa representing 72 major lineages) combined with a moderate number of genes yield a well-resolved eukaryotic tree of life. The consistency across analyses with varying numbers of taxa (88-451) and levels of missing data (17-69%) supports the accuracy of the resulting topologies. The resulting stable topology emerges without the removal of rapidly evolving genes or taxa, a practice common to phylogenomic analyses. Several major groups are stable and strongly supported in these analyses (e.g. SAR, Rhizaria, Excavata), while the proposed supergroup ‘Chromalveolata’ is rejected. Further, extensive instability among photosynthetic lineages suggests the presence of systematic biases including endosymbiotic gene transfer from symbiont (nucleus or plastid) to host. Our analyses demonstrate that stable topologies of ancient evolutionary relationships can be achieved with broad taxonomic sampling and a moderate number of genes.
    Finally, taxon-rich analyses such as presented here provide a method for testing the accuracy of relationships that receive high bootstrap support in phylogenomic analyses and enable placement of the multitude of lineages that lack genome scale data.
  • Preprint
    Names are key to the big new biology
    (2010-09-20) Patterson, David J. ; Cooper, J. ; Kirk, Paul M. ; Pyle, R. L. ; Remsen, David P.
    Those who seek answers to big, broad questions about biology, especially questions emphasizing the organism (taxonomy, evolution, ecology), will soon benefit from an emerging names-based infrastructure. It will draw on the almost universal association of organism names with biological information to index and interconnect information distributed across the Internet. The result will be a virtual data commons, expanding as further data are shared, allowing biology to become more of a “big science”. Informatics devices will exploit this ‘big new biology’, revitalizing comparative biology with a broad perspective to reveal previously inaccessible trends and discontinuities, so helping us to reveal unfamiliar biological truths. Here, we review the first components of this freely available, participatory, and semantic Global Names Architecture.
  • Preprint
    A model for Bioinformatics training : the Marine Biological Laboratory
    (2010-08-04) Yamashita, Grant ; Miller, Holly ; Goddard, Anthony ; Norton, Cathy N.
    Many areas of science such as biology, medicine, and oceanography are becoming increasingly data-rich and most programs that train scientists do not address informatics techniques or technologies that are necessary for managing and analyzing large amounts of data. Educational resources for scientists in informatics are scarce, yet scientists need the skills and knowledge to work with informaticians and manage graduate students and post-docs in informatics projects. The Marine Biological Laboratory houses a world-renowned library and is involved in a number of informatics projects in the sciences. The MBL has been home to the National Library of Medicine's BioMedical Informatics Course for nearly two decades and is committed to educating scientists and other scholars in informatics. In an innovative, immersive learning experience, Grant Yamashita, a biologist and post-doc at Arizona State University, visited the Science Informatics Group at MBL to learn first hand how informatics is done and how informatics teams work. Hands-on work with developers, systems administrators, librarians, and other scientists provided an invaluable education in informatics and is a model for future science informatics training.
  • Article
    LigerCat : using “MeSH clouds” from journal, article, or gene citations to facilitate the identification of relevant biomedical literature
    (American Medical Informatics Association, 2009-11-14) Sarkar, Indra Neil ; Schenk, Ryan ; Miller, Holly ; Norton, Cathy N.
    The identification of relevant literature from within large collections is often a challenging endeavor. In the context of indexed resources, such as MEDLINE, it has been shown that keywords from a controlled vocabulary (e.g., MeSH) can be used in combination to retrieve relevant search results. One effective strategy for identifying potential search terms is to examine a collection of documents for frequently occurring terms. In this way, “Tag clouds” are a popular mechanism for ascertaining terms associated with a collection of documents. Here, we present the Literature and Genomic Electronic Resource Catalogue (LigerCat) system for exploring biomedical literature through the selection of terms within a “MeSH cloud” that is generated based on an initial query using journal, article, or gene data. The resultant interface is encapsulated within a Web interface: http://ligercat.ubio.org. The system is also available for installation under an MIT license.
  • Article
    GenBank and PubMed : how connected are they?
    (BioMed Central, 2009-06-09) Miller, Holly ; Norton, Cathy N. ; Sarkar, Indra Neil
    GenBank® is a public repository of all publicly available molecular sequence data from a range of sources. In addition to relevant metadata (e.g., sequence description, source organism and taxonomy), publication information is recorded in the GenBank data file. The identification of literature associated with a given molecular sequence may be an essential first step in developing research hypotheses. Although many of the publications associated with GenBank records may not be linked into or part of complementary literature databases (e.g., PubMed), GenBank records associated with literature indexed in Medline are identifiable as they contain PubMed identifiers (PMIDs). Here we show that an analysis of 87,116,501 GenBank sequence files reveals that 42% are associated with a publication or patent. Of these, 71% are associated with PMIDs, and can therefore be linked to a citation record in the PubMed database. The remaining 29% of publication-associated GenBank entries either do not have PMIDs or cite a publication that is not currently indexed by PubMed. We also identify the journal titles that are linked through citations in the GenBank files to the largest number of sequences. Our analysis suggests that GenBank contains molecular sequences from a range of disciplines beyond biomedicine, the initial scope of PubMed. The findings thus suggest opportunities to develop mechanisms for integrating biological knowledge beyond the biomedical field.
  • Preprint
    Intra- and interspecies differences in growth and toxicity of Pseudo-nitzschia while using different nitrogen sources
    (2009-01) Thessen, Anne E. ; Bowers, Holly A. ; Stoecker, Diane K.
    Clonal cultures of plankton are widely used in laboratory experiments and have contributed greatly to knowledge of microbial systems. However, many physiological characteristics vary drastically between strains of the same species, calling into question our ability to make ecologically relevant inferences about populations based on studying one or a few strains. This study included nineteen non-axenic strains of three species of the diatom Pseudo-nitzschia isolated primarily from the mid-Atlantic coastal region of the United States. Toxin (domoic acid) production and growth rates were measured in cultures using different nitrogen sources (NH4+, NO3- and urea) and growth irradiances. The strains exhibited broad differences in growth rate and toxin content even between strains isolated from the same water sample. The influence of bacteria on toxin production was not investigated. Both P. multiseries clones produced toxin, yet preferentially used different nitrogen sources. Only two out of nine P. calliantha and two out of five P. fraudulenta isolates were toxic and domoic acid content varied by orders of magnitude. All three species had variable intraspecies growth rates on each nitrogen source, but P. fraudulenta strains had the broadest range. Light-limited growth rate and maximum growth rate in P. fraudulenta and P. multiseries varied with species. These findings show the importance of defining intra- and interspecies variability in ecophysiology and toxicity. Ecologically relevant functional diversity in the form of ecotypes or cryptic species appears to be present in the genus Pseudo-nitzschia.
  • Preprint
    CAOS software for use in character-based DNA barcoding
    (2008-04) Sarkar, Indra Neil ; Planet, Paul J. ; DeSalle, Rob
    The success of character-based DNA barcoding depends on the efficient identification of diagnostic character states from molecular sequences that have been organized hierarchically (e.g., according to phylogenetic methods). Similarly, the reliability of these identified diagnostic character states must be assessed according to their ability to diagnose new sequences. Here, a set of software tools is presented that implement the previously described Characteristic Attribute Organization System for both diagnostic identification and diagnostic-based classification. The software is publicly available from http://sarkarlab.mbl.edu/CAOS.