Miller, Holly

Search Results

  • Article
    LigerCat : using “MeSH clouds” from journal, article, or gene citations to facilitate the identification of relevant biomedical literature
    (American Medical Informatics Association, 2009-11-14) Sarkar, Indra Neil ; Schenk, Ryan ; Miller, Holly ; Norton, Cathy N.
    The identification of relevant literature from within large collections is often a challenging endeavor. In the context of indexed resources, such as MEDLINE, it has been shown that keywords from a controlled vocabulary (e.g., MeSH) can be used in combination to retrieve relevant search results. One effective strategy for identifying potential search terms is to examine a collection of documents for frequently occurring terms. In this way, “Tag clouds” are a popular mechanism for ascertaining terms associated with a collection of documents. Here, we present the Literature and Genomic Electronic Resource Catalogue (LigerCat) system for exploring biomedical literature through the selection of terms within a “MeSH cloud” that is generated based on an initial query using journal, article, or gene data. The resultant interface is encapsulated within a Web interface: The system is also available for installation under an MIT license.
  • Working Paper
    Envisioning the future of science libraries at academic research institutions : a discussion
    (2012-12-20) Feltes, Carol ; Gibson, Donna S. ; Miller, Holly ; Norton, Cathy N. ; Pollock, Ludmila
    A group of librarians, other information professionals, scientists and research administrators met to discuss the challenges that research libraries are currently facing. After the meeting a survey was conducted to obtain additional input from the group on several key challenges that arose from the discussions. The purpose of the meeting and survey was threefold: 1. Examine in detail, from a variety of perspectives, how the world of research is changing and the impact these changes have on the direction of research libraries. 2. Create an informed vision of how research libraries can be a vital partner to researchers. 3. Suggest a strategic approach for realizing this vision. The strategic approach presented in this white paper incorporates feedback from various sized research libraries, each with its own mission. The expectation is that individual libraries will use it as a guide in formulating strategies that are appropriate to their research communities, financial circumstances, and organizational reporting structure.
  • Preprint
    A model for Bioinformatics training : the Marine Biological Laboratory
    (2010-08-04) Yamashita, Grant ; Miller, Holly ; Goddard, Anthony ; Norton, Cathy N.
    Many areas of science such as biology, medicine, and oceanography are becoming increasingly data-rich and most programs that train scientists do not address informatics techniques or technologies that are necessary for managing and analyzing large amounts of data. Educational resources for scientists in informatics are scarce, yet scientists need the skills and knowledge to work with informaticians and manage graduate students and post-docs in informatics projects. The Marine Biological Laboratory houses a world-renowned library and is involved in a number of informatics projects in the sciences. The MBL has been home to the National Library of Medicine's BioMedical Informatics Course for nearly two decades and is committed to educating scientists and other scholars in informatics. In an innovative, immersive learning experience, Grant Yamashita, a biologist and post-doc at Arizona State University, visited the Science Informatics Group at MBL to learn first hand how informatics is done and how informatics teams work. Hands-on work with developers, systems administrators, librarians, and other scientists provided an invaluable education in informatics and is a model for future science informatics training.
  • Article
    GenBank and PubMed : how connected are they?
    (BioMed Central, 2009-06-09) Miller, Holly ; Norton, Cathy N. ; Sarkar, Indra Neil
    GenBank® is a public repository of all publicly available molecular sequence data from a range of sources. In addition to relevant metadata (e.g., sequence description, source organism and taxonomy), publication information is recorded in the GenBank data file. The identification of literature associated with a given molecular sequence may be an essential first step in developing research hypotheses. Although many of the publications associated with GenBank records may not be linked into or part of complementary literature databases (e.g., PubMed), GenBank records associated with literature indexed in MEDLINE are identifiable as they contain PubMed identifiers (PMIDs). Here we show that an analysis of 87,116,501 GenBank sequence files reveals that 42% are associated with a publication or patent. Of these, 71% are associated with PMIDs, and can therefore be linked to a citation record in the PubMed database. The remaining 29% of publication-associated GenBank entries either do not have PMIDs or cite a publication that is not currently indexed by PubMed. We also identify the journal titles that are linked through citations in the GenBank files to the largest number of sequences. Our analysis suggests that GenBank contains molecular sequences from a range of disciplines beyond biomedicine, the initial scope of PubMed. The findings thus suggest opportunities to develop mechanisms for integrating biological knowledge beyond the biomedical field.
  • Article
    NetiNeti : discovery of scientific names from text using machine learning methods
    (BioMed Central, 2012-08-22) Akella, Lakshmi Manohar ; Norton, Cathy N. ; Miller, Holly
    A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) compared to a popular dictionary based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central’s full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at
  • Article
    Mapping the biosphere : exploring species to understand the origin, organization and sustainability of biodiversity
    (Taylor & Francis, 2012-03-27) Wheeler, Q. D. ; Knapp, Sandra ; Stevenson, D. W. ; Stevenson, J. ; Blum, Stan D. ; Boom, B. M. ; Borisy, Gary G. ; Buizer, James L. ; De Carvalho, M. R. ; Cibrian, A. ; Donoghue, M. J. ; Doyle, V. ; Gerson, E. M. ; Graham, C. H. ; Graves, P. ; Graves, Sara J. ; Guralnick, Robert P. ; Hamilton, A. L. ; Hanken, J. ; Law, W. ; Lipscomb, D. L. ; Lovejoy, Thomas E. ; Miller, Holly ; Miller, J. S. ; Naeem, Shahid ; Novacek, M. J. ; Page, L. M. ; Platnick, N. I. ; Porter-Morgan, H. ; Raven, Peter H. ; Solis, M. A. ; Valdecasas, A. G. ; Van Der Leeuw, S. ; Vasco, A. ; Vermeulen, N. ; Vogel, J. ; Walls, R. L. ; Wilson, E. O. ; Woolley, J. B.
    The time is ripe for a comprehensive mission to explore and document Earth's species. This calls for a campaign to educate and inspire the next generation of professional and citizen species explorers, investments in cyber-infrastructure and collections to meet the unique needs of the producers and consumers of taxonomic information, and the formation and coordination of a multi-institutional, international, transdisciplinary community of researchers, scholars and engineers with the shared objective of creating a comprehensive inventory of species and detailed map of the biosphere. We conclude that an ambitious goal to describe 10 million species in less than 50 years is attainable based on the strength of 250 years of progress, worldwide collections, existing experts, technological innovation and collaborative teamwork. Existing digitization projects are overcoming obstacles of the past, facilitating collaboration and mobilizing literature, data, images and specimens through cyber technologies. Charting the biosphere is enormously complex, yet necessary expertise can be found through partnerships with engineers, information scientists, sociologists, ecologists, climate scientists, conservation biologists, industrial project managers and taxon specialists, from agrostologists to zoophytologists. Benefits to society of the proposed mission would be profound, immediate and enduring, from detection of early responses of flora and fauna to climate change to opening access to evolutionary designs for solutions to countless practical problems. The impacts on the biodiversity, environmental and evolutionary sciences would be transformative, from ecosystem models calibrated in detail to comprehensive understanding of the origin and evolution of life over its 3.8 billion year history. 
    The resultant cyber-enabled taxonomy, or cybertaxonomy, would open access to biodiversity data to developing nations, assure access to reliable data about species, and change how scientists and citizens alike access, use and think about biological diversity information.
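
As a rough reading aid for the "GenBank and PubMed : how connected are they?" abstract listed above, the sketch below converts the reported percentages (42% of 87,116,501 sequence files associated with a publication or patent, 71% of those carrying PMIDs) into approximate record counts. The percentages are rounded in the abstract, so the counts are estimates only and are not figures reported by the authors. The function name and dictionary keys are hypothetical, chosen for illustration.

```python
def estimate_counts(total_records, frac_with_publication, frac_with_pmid):
    """Turn the abstract's rounded percentages into approximate counts.

    All names here are illustrative assumptions; only the three input
    numbers come from the abstract.
    """
    # Records citing any publication or patent (42% in the abstract).
    with_publication = round(total_records * frac_with_publication)
    # Of those, records that carry a PubMed identifier (71% in the abstract).
    with_pmid = round(with_publication * frac_with_pmid)
    # The remainder (about 29%) cite something without a PMID.
    without_pmid = with_publication - with_pmid
    return {
        "with_publication": with_publication,
        "with_pmid": with_pmid,
        "without_pmid": without_pmid,
    }


counts = estimate_counts(87_116_501, 0.42, 0.71)
for label, value in counts.items():
    print(f"{label}: ~{value:,}")
```

Under these rounded inputs, roughly 36.6 million records cite a publication or patent, and roughly 26 million of those could be linked to a PubMed citation record.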