Norton, Cathy N.

Last Name
Norton
First Name
Cathy N.

Search Results

  • Article
    Taxonomic indexing—extending the role of taxonomy
    (Taylor & Francis, 2006-06) Patterson, David J. ; Remsen, David P. ; Marino, William A. ; Norton, Cathy N.
    Taxonomic indexing refers to a new array of taxonomically intelligent network services that use nomenclatural principles and elements of expert taxonomic knowledge to manage information about organisms. Taxonomic indexing was introduced to help manage the increasing amounts of digital information about biology. It has been designed to form a near-basal layer in a layered cyberinfrastructure that deals with biological information. Taxonomic indexing accommodates the special problems of using names of organisms to index biological material. It links alternative names for the same entity (reconciliation), distinguishes between uses of the same name for different entities (disambiguation), and places names within an indefinite number of hierarchical schemes. In order to access all information on all organisms, taxonomic indexing must be able to call on a registry of all names in all forms for all organisms. NameBank has been developed to meet that need. Taxonomic indexing is an area of informatics that overlaps with taxonomy, is dependent on the expert input of taxonomists, and reveals the relevance of the discipline to a wide audience.
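The reconciliation and disambiguation steps described in this abstract can be sketched with a small lookup table. This is only an illustration, not NameBank's data model: the `NAME_TO_ENTITIES` table, the entity labels, and both function names are assumptions made for the example.

```python
# Hypothetical registry: each name string maps to the entities it can denote.
NAME_TO_ENTITIES = {
    "Puma concolor": {"cougar"},
    "Felis concolor": {"cougar"},            # alternative name for the same entity
    "Ficus": {"fig-plants", "fig-snails"},   # one name used for different entities
}

def reconcile(name):
    """Return every registered name that shares at least one entity with `name`."""
    entities = NAME_TO_ENTITIES.get(name, set())
    return sorted(
        other for other, ents in NAME_TO_ENTITIES.items() if ents & entities
    )

def disambiguate(name):
    """Return the distinct entities that `name` may refer to."""
    return sorted(NAME_TO_ENTITIES.get(name, set()))

synonyms = reconcile("Felis concolor")
senses = disambiguate("Ficus")
```

In this sketch, reconciliation groups names by shared entity, while disambiguation reports when a single name has more than one possible referent.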
  • Article
    LigerCat : using “MeSH clouds” from journal, article, or gene citations to facilitate the identification of relevant biomedical literature
    (American Medical Informatics Association, 2009-11-14) Sarkar, Indra Neil ; Schenk, Ryan ; Miller, Holly ; Norton, Cathy N.
    The identification of relevant literature from within large collections is often a challenging endeavor. In the context of indexed resources, such as MEDLINE, it has been shown that keywords from a controlled vocabulary (e.g., MeSH) can be used in combination to retrieve relevant search results. One effective strategy for identifying potential search terms is to examine a collection of documents for frequently occurring terms. “Tag clouds” are a popular mechanism for ascertaining the terms associated with a collection of documents. Here, we present the Literature and Genomic Electronic Resource Catalogue (LigerCat) system for exploring biomedical literature through the selection of terms within a “MeSH cloud” that is generated based on an initial query using journal, article, or gene data. The resulting system is available through a Web interface: http://ligercat.ubio.org. The system is also available for installation under an MIT license.
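The idea of building a cloud from frequently occurring terms can be sketched as follows. This is a minimal illustration under stated assumptions, not LigerCat's implementation: the function name `mesh_cloud`, the linear font-size scaling, and the sample annotations are all hypothetical.

```python
from collections import Counter

def mesh_cloud(records, min_size=10, max_size=40):
    """Map each term to a font size scaled by how many records carry it."""
    # Count each term once per record, so a repeated term in one record
    # does not inflate its weight.
    counts = Counter(term for terms in records for term in set(terms))
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    cloud = {}
    for term, n in counts.items():
        # Linear scaling; every term gets max_size when all counts are equal.
        scale = (n - lo) / span if span else 1.0
        cloud[term] = round(min_size + scale * (max_size - min_size))
    return cloud

# Hypothetical MeSH annotations for three citations (illustrative only).
sample_records = [
    ["Biodiversity", "Databases, Factual", "Classification"],
    ["Biodiversity", "Classification"],
    ["Biodiversity", "Internet"],
]
cloud = mesh_cloud(sample_records)
```

Terms attached to more records receive larger sizes, which is what lets a reader pick likely search terms at a glance.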
  • Presentation
    Lessons learned from 104 years of mobile observatories [poster]
    (2007-12-10) Miller, Stephen P. ; Neiswender, Caryn ; Clark, Dru ; Raymond, Lisa ; Rioux, Margaret A. ; Norton, Cathy N. ; Detrick, Robert S. ; Helly, John ; Sutton, Don ; Weatherford, John
    As the oceanographic community ventures into a new era of integrated observatories, it may be helpful to look back on the era of "mobile observatories" to see what Cyberinfrastructure lessons might be learned. For example, SIO has been operating research vessels for 104 years, supporting a wide range of disciplines: marine geology and geophysics, physical oceanography, geochemistry, biology, seismology, ecology, fisheries, and acoustics. In the last 6 years progress has been made with diverse data types, formats and media, resulting in a fully-searchable online SIOExplorer Digital Library of more than 800 cruises (http://SIOExplorer.ucsd.edu). Public access to SIOExplorer is considerable, with 795,351 files (206 GB) downloaded last year. During the last 3 years the efforts have been extended to WHOI, with a "Multi-Institution Testbed for Scalable Digital Archiving" funded by the Library of Congress and NSF (IIS 0455998). The project has created a prototype digital library of data from both institutions, including cruises, Alvin submersible dives, and ROVs. In the process, the team encountered technical and cultural issues that will be facing the observatory community in the near future.
    Technological Lessons Learned: Shipboard data from multiple institutions are extraordinarily diverse, and provide a good training ground for observatories. Data are gathered from a wide range of authorities, laboratories, servers and media, with little documentation. Conflicting versions exist, generated by alternative processes. Domain- and institution-specific issues were addressed during initial staging. Data files were categorized and metadata harvested with automated procedures. With our second-generation approach to staging, we achieve higher levels of automation with greater use of controlled vocabularies. Database and XML-based procedures deal with the diversity of raw metadata values and map them to agreed-upon standard values, in collaboration with the Marine Metadata Interoperability (MMI) community. All objects are tagged with an expert level, thus serving an educational audience, as well as research users. After staging, publication into the digital library is completely automated. The technical challenges have been largely overcome, thanks to a scalable, federated digital library architecture from the San Diego Supercomputer Center, implemented at SIO, WHOI and other sites. The metadata design is flexible, supporting modular blocks of metadata tailored to the needs of instruments, samples, documents, derived products, cruises or dives, as appropriate. Controlled metadata vocabularies, with content and definitions negotiated by all parties, are critical. Metadata may be mapped to required external standards and formats, as needed.
    Cultural Lessons Learned: The cultural challenges have been more formidable than expected. They became most apparent during attempts to categorize and stage digital data objects across two institutions, each with their own naming conventions and practices, generally undocumented, and evolving across decades. Whether the questions concerned data ownership, collection techniques, data diversity or institutional practices, the solution involved a joint discussion with scientists, data managers, technicians and archivists, working together. Because metadata discussions go on endlessly, significant benefit comes from dictionaries with definitions of all community-authorized metadata values.
  • Working Paper
    Envisioning the future of science libraries at academic research institutions : a discussion
    (2012-12-20) Feltes, Carol ; Gibson, Donna S. ; Miller, Holly ; Norton, Cathy N. ; Pollock, Ludmila
    A group of librarians, other information professionals, scientists and research administrators met to discuss the challenges that research libraries are currently facing. After the meeting, a survey was conducted to obtain additional input from the group on several key challenges that arose from the discussions. The purpose of the meeting and survey was threefold: 1. Examine in detail, from a variety of perspectives, how the world of research is changing and the impact these changes have on the direction of research libraries. 2. Create an informed vision of how research libraries can be a vital partner to researchers. 3. Suggest a strategic approach for realizing this vision. The strategic approach presented in this white paper incorporates feedback from research libraries of various sizes, each with its own mission. The expectation is that individual libraries will use it as a guide in formulating strategies that are appropriate to their research communities, financial circumstances, and organizational reporting structure.
  • Article
    Honor trust and economics in the CD-ROM marketplace
    (IAMSLIC, 1991) Norton, Cathy N.
  • Article
    Journal use study.
    (IAMSLIC, 1985) Norton, Cathy N.
  • Preprint
    Participant perceptions of the influences of the NLM-sponsored Woods Hole Medical Informatics Course
    (2005-01-21) Patel, Vimla L. ; Branch, Timothy ; Cimino, Andria ; Norton, Cathy N. ; Cimino, James J.
    This paper provides an evaluation of the NLM-sponsored Woods Hole Medical Informatics course and the extent to which the objectives of the program are achieved. Two studies were conducted to examine the participants’ perceptions of both the short-term (Spring 2002) and the long-term (1993-2002) influences on knowledge, skills and behaviour. Data were collected through the use of questionnaires, semi-structured telephone interviews, and participant observation methods in order to provide both quantitative and qualitative assessment. The participants of the Spring 2002 course considered the course to be an excellent opportunity to increase their knowledge and understanding of the field of medical informatics, as well as to meet and interact with other professionals in the field to establish future collaborations. Past participants remained highly satisfied with their experience at Woods Hole and its influence on their professional careers and their involvement in a broad range of activities related to medical informatics. This group considered their knowledge and understanding of medical informatics to be of greater quality, had increased their networking with other professionals, and were more confident and motivated to work in the field. Many of the participants feel and show evidence of becoming effective agents of change in their institutions in the area of medical informatics, which is one of the objectives of the program.
  • Article
    The Encyclopedia of Life, Biodiversity Heritage Library, biodiversity informatics and beyond Web 2.0
    (Great Cities Initiative of the University of Illinois at Chicago Library, 2008-08-04) Norton, Cathy N.
    E.O. Wilson, the noted entomologist at Harvard, "wished" for an authoritative encyclopedia of life that would be freely available on the World Wide Web for the entire world. On 9 May 2007, the Encyclopedia of Life (EOL) was launched as a multi-institutional initiative whose mission is to create 1.8 million Web sites detailing the known attributes, history, and behavior of every known and described species, and portraying that information through video, audio, and literature via the Internet. A major contributor to the Encyclopedia is the Biodiversity Heritage Library, which is currently scanning all the core biodiversity literature.
  • Presentation
    SCOR/IODE/MBLWHOI Library collaboration on data publication [poster] 
    (2011-05-25) Raymond, Lisa ; Pikula, Linda ; Lowry, Roy ; Urban, Edward ; Moncoiffe, Gwenaelle ; Pissierssens, Peter ; Norton, Cathy N.
    This poster describes the development of international standards to publish oceanographic datasets. Research areas include the assignment of persistent identifiers, tracking provenance, linking datasets to publications, attributing credit to data providers, and best practices for the physical composition and semantic description of the content.
  • Article
    Engaging new audiences with specialized data
    (IAMSLIC, 2013) Rinaldo, Constance ; Norton, Cathy N.
  • Article
    IODE panel discussion on data citation
    (IAMSLIC, 2013) Norton, Cathy N.
  • Article
    Money: where is it?
    (IAMSLIC, 1990) Norton, Cathy N.
  • Preprint
    A model for Bioinformatics training : the Marine Biological Laboratory
    (2010-08-04) Yamashita, Grant ; Miller, Holly ; Goddard, Anthony ; Norton, Cathy N.
    Many areas of science, such as biology, medicine, and oceanography, are becoming increasingly data-rich, yet most programs that train scientists do not address the informatics techniques or technologies that are necessary for managing and analyzing large amounts of data. Educational resources for scientists in informatics are scarce, yet scientists need the skills and knowledge to work with informaticians and manage graduate students and post-docs in informatics projects. The Marine Biological Laboratory houses a world-renowned library and is involved in a number of informatics projects in the sciences. The MBL has been home to the National Library of Medicine's BioMedical Informatics Course for nearly two decades and is committed to educating scientists and other scholars in informatics. In an innovative, immersive learning experience, Grant Yamashita, a biologist and post-doc at Arizona State University, visited the Science Informatics Group at MBL to learn firsthand how informatics is done and how informatics teams work. Hands-on work with developers, systems administrators, librarians, and other scientists provided an invaluable education in informatics and is a model for future science informatics training.
  • Article
    GenBank and PubMed : how connected are they?
    (BioMed Central, 2009-06-09) Miller, Holly ; Norton, Cathy N. ; Sarkar, Indra Neil
    GenBank(R) is a public repository of all publicly available molecular sequence data from a range of sources. In addition to relevant metadata (e.g., sequence description, source organism and taxonomy), publication information is recorded in the GenBank data file. The identification of literature associated with a given molecular sequence may be an essential first step in developing research hypotheses. Although many of the publications associated with GenBank records may not be linked into or part of complementary literature databases (e.g., PubMed), GenBank records associated with literature indexed in MEDLINE are identifiable as they contain PubMed identifiers (PMIDs). Here we show that an analysis of 87,116,501 GenBank sequence files reveals that 42% are associated with a publication or patent. Of these, 71% are associated with PMIDs, and can therefore be linked to a citation record in the PubMed database. The remaining 29% of publication-associated GenBank entries either do not have PMIDs or cite a publication that is not currently indexed by PubMed. We also identify the journal titles that are linked through citations in the GenBank files to the largest number of sequences. Our analysis suggests that GenBank contains molecular sequences from a range of disciplines beyond biomedicine, the initial scope of PubMed. The findings thus suggest opportunities to develop mechanisms for integrating biological knowledge beyond the biomedical field.
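The percentages in this abstract nest inside one another, so it can help to work the arithmetic through. The sketch below uses only the three figures stated above; because those percentages are rounded, the derived counts are approximations rather than values reported by the authors, and the variable names are chosen for this example.

```python
TOTAL_FILES = 87_116_501   # GenBank sequence files analyzed (from the abstract)
PUB_SHARE = 0.42           # share associated with a publication or patent
PMID_SHARE = 0.71          # share of those that carry a PMID

# Approximate counts implied by the rounded percentages.
with_publication = round(TOTAL_FILES * PUB_SHARE)
with_pmid = round(with_publication * PMID_SHARE)
without_pmid = with_publication - with_pmid

# Share of ALL files that can be linked to a PubMed citation record.
pmid_share_of_all = with_pmid / TOTAL_FILES

print(f"publication-associated: ~{with_publication:,}")
print(f"linked to PubMed:       ~{with_pmid:,} ({pmid_share_of_all:.1%} of all files)")
print(f"no PMID:                ~{without_pmid:,}")
```

The point of the calculation is that 71% applies to the publication-associated subset, so roughly 30% of all analyzed files (0.42 × 0.71) are linkable to PubMed.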
  • Article
    uBioRSS : tracking taxonomic literature using RSS
    (Oxford University Press, 2007-03-28) Leary, Patrick R. ; Remsen, David P. ; Norton, Cathy N. ; Patterson, David J. ; Sarkar, Indra Neil
    Web content syndication through standard formats such as RSS and ATOM has become an increasingly popular mechanism for publishers, news sources, and blogs to disseminate regularly updated content. These standardized syndication formats deliver content directly to the subscriber, allowing them to locally aggregate content from a variety of sources instead of having to find the information on multiple websites. The uBioRSS application is a "taxonomically intelligent" service customized for the biological sciences. It aggregates syndicated content from academic publishers and science news feeds, then uses a taxonomic name entity recognition algorithm to identify and index taxonomic names within those data streams. The resulting name index is cross-referenced to current global taxonomic datasets to provide context for browsing the publications by taxonomic group. This process, called taxonomic indexing, draws upon services developed specifically for biological sciences, collectively referred to as "taxonomic intelligence." Such value-added enhancements can provide biologists with accelerated and improved access to current biological content.
  • Article
    NetiNeti : discovery of scientific names from text using machine learning methods
    (BioMed Central, 2012-08-22) Akella, Lakshmi Manohar ; Norton, Cathy N. ; Miller, Holly
    A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) compared to a popular dictionary based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central’s full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.
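One way to compare the precision and recall pairs quoted above is to combine each pair into an F1 score, the harmonic mean of the two. The abstract does not report F1, so the values computed here are derived for illustration only; the function name `f1_score` is an assumption of this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures quoted in the abstract for the 600-page biodiversity book.
netineti_f1 = f1_score(0.989, 0.705)
dictionary_f1 = f1_score(0.975, 0.543)

print(f"NetiNeti F1:   {netineti_f1:.3f}")
print(f"Dictionary F1: {dictionary_f1:.3f}")
```

Because the two systems have similar precision, the difference in F1 is driven almost entirely by recall.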
  • Article
    Taxonomic informatics tools for the electronic nomenclator zoologicus
    (Marine Biological Laboratory, 2006-02) Remsen, David P. ; Norton, Cathy N. ; Patterson, David J.
    Given the current trends, it seems inevitable that all biological documents will eventually exist in a digital format and be distributed across the internet. New network services and tools need to be developed to increase retrieval rates for documents and to refine data recovery. Biological data have traditionally been well managed using taxonomic principles. As part of a larger initiative to build an array of names-based network services that emulate taxonomic principles for managing biological information, we undertook the digitization of a major taxonomic reference text, Nomenclator Zoologicus. The process involved replicating the text to a high level of fidelity, parsing the content for inclusion within a database, developing tools to enable expert input into the product, and integrating the metadata and factual content within taxonomic network services. The result is a high-quality and freely available web application (http://uio.mbl.edu/NomenclatorZoologicus/) capable of being exploited in an array of biological informatics services.
  • Article
    The Biodiversity Heritage Library: an expanding international collaboration
    (IAMSLIC, 2010) Rinaldo, Constance ; Norton, Cathy N.
  • Article
    Exploring historical trends using taxonomic name metadata
    (BioMed Central, 2008-05-13) Sarkar, Indra Neil ; Schenk, Ryan ; Norton, Cathy N.
    Authority and year information have been attached to taxonomic names since Linnaean times. The systematic structure of taxonomic nomenclature facilitates the development of tools that can be used to explore historical trends that may be associated with taxonomy. From the over 10.7 million taxonomic names that are part of the uBio system, approximately 3 million names were identified as having taxonomic authority information from the years 1750 to 2004. A pipe-delimited file was then generated, organized according to a Linnaean hierarchy and by years from 1750 to 2004, and imported into an Excel workbook. A series of macros was developed to create an Excel-based tool and a complementary Web site to explore the taxonomic data. A cursory and speculative analysis of the data reveals observable trends that may be attributable to significant events that are of both taxonomic (e.g., publishing of key monographs) and societal importance (e.g., world wars). The findings also help quantify the number of taxonomic descriptions that may be made available through digitization initiatives. Temporal organization of taxonomic data can be used to identify interesting biological epochs relative to historically significant events and ongoing efforts. We have developed an Excel workbook and complementary Web site that enable one to explore taxonomic trends for Linnaean taxonomic groupings, from Kingdoms to Families.
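The tallying step behind such a trend analysis can be sketched without Excel. The abstract does not give the layout of the pipe-delimited file, so the four-column format assumed here (kingdom|family|name|year), the sample rows, and the function name `names_per_decade` are all hypothetical; the authors' own tool used Excel macros.

```python
import csv
import io
from collections import Counter

# Illustrative rows in an ASSUMED layout: kingdom|family|name|year.
SAMPLE = """Animalia|Felidae|Felis catus|1758
Animalia|Felidae|Panthera leo|1758
Plantae|Rosaceae|Rosa canina|1753
Animalia|Canidae|Vulpes zerda|1780
"""

def names_per_decade(text, first_year=1750, last_year=2004):
    """Count names per (kingdom, decade), skipping malformed rows."""
    counts = Counter()
    for row in csv.reader(io.StringIO(text), delimiter="|"):
        if len(row) != 4 or not row[3].strip().isdigit():
            continue
        year = int(row[3])
        if first_year <= year <= last_year:
            # Bucket the year into its decade, e.g. 1758 -> 1750.
            counts[(row[0], year // 10 * 10)] += 1
    return counts

decade_counts = names_per_decade(SAMPLE)
```

Grouping by a higher or lower rank only requires changing which column forms the key, which mirrors exploring trends from Kingdoms down to Families.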