GenBank and PubMed : how connected are they?

dc.contributor.author Miller, Holly
dc.contributor.author Norton, Cathy N.
dc.contributor.author Sarkar, Indra Neil
dc.date.accessioned 2009-07-14T13:19:47Z
dc.date.available 2009-07-14T13:19:47Z
dc.date.issued 2009-06-09
dc.description © 2009 Sarkar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License. The definitive version was published in BMC Research Notes 2 (2009): 101, doi:10.1186/1756-0500-2-101. en
dc.description.abstract GenBank® is a public repository of all publicly available molecular sequence data from a range of sources. In addition to relevant metadata (e.g., sequence description, source organism and taxonomy), publication information is recorded in the GenBank data file. The identification of literature associated with a given molecular sequence may be an essential first step in developing research hypotheses. Although many of the publications associated with GenBank records may not be linked to, or be part of, complementary literature databases (e.g., PubMed), GenBank records associated with literature indexed in Medline are identifiable because they contain PubMed identifiers (PMIDs). Here we show that an analysis of 87,116,501 GenBank sequence files reveals that 42% are associated with a publication or patent. Of these, 71% are associated with PMIDs and can therefore be linked to a citation record in the PubMed database. The remaining 29% of publication-associated GenBank entries either do not have PMIDs or cite a publication that is not currently indexed by PubMed. We also identify the journal titles that are linked through citations in the GenBank files to the largest number of sequences. Our analysis suggests that GenBank contains molecular sequences from a range of disciplines beyond biomedicine, the initial scope of PubMed. The findings thus suggest opportunities to develop mechanisms for integrating biological knowledge beyond the biomedical field. en
dc.description.sponsorship INS and HM are funded in part by a research grant from the Ellison Medical Foundation and National Library of Medicine award R01LM009725 to INS. en
dc.format.mimetype application/pdf
dc.identifier.citation BMC Research Notes 2 (2009): 101 en
dc.identifier.doi 10.1186/1756-0500-2-101
dc.identifier.uri https://hdl.handle.net/1912/2868
dc.language.iso en_US en
dc.publisher BioMed Central en
dc.relation.uri https://doi.org/10.1186/1756-0500-2-101
dc.rights Attribution 2.0 Generic *
dc.rights.uri http://creativecommons.org/licenses/by/2.0 *
dc.title GenBank and PubMed : how connected are they? en
dc.type Article en
dspace.entity.type Publication
relation.isAuthorOfPublication 0fb42fe3-4a25-4c1c-b91d-0b1b2ee5d14e
relation.isAuthorOfPublication a829e76c-a45a-4079-b477-f5079bf1e24c
relation.isAuthorOfPublication 7dbdbc4b-8bd1-4719-8fa6-890cb941fdfc
relation.isAuthorOfPublication.latestForDiscovery 0fb42fe3-4a25-4c1c-b91d-0b1b2ee5d14e
Files
Original bundle
Name: 1756-0500-2-101.pdf
Size: 482.87 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.97 KB
Format: Item-specific license agreed upon to submission