First Monday, Volume 13 Number 8 - 4 August 2008


The Encyclopedia of Life, Biodiversity Heritage Library, Biodiversity Informatics and Beyond Web 2.0 by Cathy Norton



Abstract
E.O. Wilson, the noted entomologist at Harvard, “wished” for an authoritative encyclopedia of life that would be freely available on the World Wide Web for the entire world. On 9 May 2007, the Encyclopedia of Life (EOL) was launched as a multi–institutional initiative whose mission is to create 1.8 million Web sites detailing all the known attributes, history, and behavior of every known and described species and presenting that information through video, audio, and literature via the Internet. A major contributor to the Encyclopedia is the Biodiversity Heritage Library, which is currently scanning the core biodiversity literature.

 


 

The Encyclopedia of Life is a collaboration between museums, universities and research institutions with five cornerstone institutions (the Museum of Comparative Zoology at Harvard, the Field Museum, the Marine Biological Laboratory, the Missouri Botanical Garden and the Smithsonian Institution) plus the Biodiversity Heritage Library (BHL), which in turn comprises ten major libraries. The Marine Biological Laboratory (MBL) is developing the Biodiversity Informatics tools that deliver the Web sites, Harvard is developing the Education and Outreach component of EOL, the Field Museum is the Synthesis Center, and the Smithsonian Institution serves as the administrative body and oversees the species page group for EOL. In February 2008, 30,000 pages were released on the www.eol.org site. This monumental effort is made possible by the many collaborators and data partners. The EOL relies on these partnerships and is creating a cyber–infrastructure that will deliver a single, dynamic/Web 2.0 source page with multiple resources for users. The Encyclopedia of Life plans to offer various levels of expertise, serving audiences from K–12, to the general public, to the citizen science enthusiast, and ultimately to the scientific expert who will validate and authenticate the information on the species pages.

Dr. James Edwards, EOL Executive Director, talks about the “anticipations of scientific chords you might hear once the 1.8 million notes are brought together through this instrument.” This scientific orchestration will have as its base all the known literature that these libraries can digitize in the next five years.

The Biodiversity Heritage Library (BHL) comprises ten major libraries: the American Museum of Natural History, the Field Museum, the Natural History Museum (London, U.K.), the Smithsonian Institution, the Missouri Botanical Garden, the New York Botanical Garden, the Royal Botanic Gardens, Kew (U.K.), the Botany Libraries and the Ernst Mayr Library of the Museum of Comparative Zoology at Harvard University, and the Marine Biological Laboratory/Woods Hole Oceanographic Institution (MBLWHOI) Library. The BHL’s mission, as part of EOL, is to provide open access to biodiversity literature. The BHL is digitizing the core published literature on biodiversity and making it available through the BHL portal (www.biodiversitylibrary.org), the Internet Archive (www.archive.org) and EOL (www.eol.org). The Internet Archive is our scanning partner, and with them we adhere to the Open Content Alliance principles, which are outlined at www.opencontentalliance.org/participate.html.

The biodiversity domain currently comprises over 5.4 million books dating back to 1469 and 40,000 journal titles, more than 50 percent of which were published before 1923 and are therefore out of copyright in the U.S. Out–of–copyright status is important because the scanned images can be made freely available. Also, taxonomic information has exceptional longevity and continues to be useful to taxonomists, who generally consult the literature for the first instance of a species description. This historic literature mainly describes species in underdeveloped countries, so this effort will in effect repatriate these works to those who need access. The cited half–life of publications in taxonomy is very long and the decay rate is slower than in most scientific disciplines, so these older scanned materials are still relevant to today’s scientific community.

All the institutions involved in this Project also have the “long tail” and have been collecting literature in biodiversity for many years. The MBL was founded in 1888 and Charles Otis Whitman (Whitman, 1888) [1], the first director, declared that Woods Hole was too far from the great libraries in Boston, New York, Chicago and Baltimore, so the MBL should collect everything in science and in all the languages. In 1888, you could do that! Now such a proposition is financially and physically impossible as an aspiration for any institution. All the BHL libraries have signed MOUs that emphasize our mission to provide, in this virtual library setting, open access to all the biodiversity literature. Our goal is to digitize the core published literature on biodiversity. BHL will scan all of the out–of–copyright literature and has approached publishers and rights holders for permission to scan their content and place it in our portal. We currently have 49 signed agreements with publishers to scan their materials.

Why are we doing it now? Scanning is inexpensive, about 10 cents a page in the U.S. and 20 cents a page in the U.K. Biodiversity is a very well–defined scientific domain, the participating institutions hold significant collections of books and journals in the domain, and, finally, taxonomic information has exceptional longevity. The benefit to taxonomists and scientists is that they will have global access to this information from EOL and from BHL. New crosswalks will be created with other disciplines such as geology, ecology and genomics, because this information can now be mined differently. The Ellison Medical Foundation has funded another project at the Marine Biological Laboratory that will help identify aging genes in humans (and in all species, for that matter) so that this information can be mined for geographical locations, longevity and diseases using EOL, GenBank, PubMed and other resources (http://boa.mbl.edu). Recently the gene that causes Hutchinson–Gilford progeria syndrome, a rapid–aging disorder in children, was discovered (De Sandre–Giovannoli, et al., 2003) [2]. The database we are developing will help examine some of the intersections between species, their longevity and where they live, which may help to shed light on the evolution of this genetic concern.

A core benefit of this project to libraries is the release of “stack” space that can then be made available for other informatics projects. Digitized books can be stored off campus or discarded. The creation of a multi–source, public domain virtual library with many partners will free librarians to work on metadata management and to improve the skill sets that will make their libraries transformational.

The Internet Archive (www.archive.org) is our scanning and hosting partner. We have scribes (scanning stations) installed in London, Boston, New York, and in Washington D.C. at the Smithsonian and the Library of Congress. We have created centers, containing up to 10 scribes each, which allow mass scanning and mass digitization for up to 16 hours a day. More than six million pages have already been scanned and are available (www.biodiversitylibrary.org). The BHL is taking responsibility for the long–term sustainability and management of these digital assets, and the content is integrated into the Encyclopedia of Life project through taxonomic intelligence linking.

Taxonomic information about species is tied to a scientific name, a name that links what has been learned in the past with what we know today. Information about names and groups of organisms extends back thousands of years. The names are in books, in journals, in surveys, in museums, and they are in all different languages. Digital libraries have now begun to unravel the enormous problem of linking all the name changes over time. Scientific names change at the rate of about one percent per year. If you add that up over the 200 to 300 years since Linnaeus began to name species, you have many name changes and variations. The current taxonomic literature often relies upon texts that are more than 100 years old. For example, bluefish has had many scientific names associated with it and triple that number of common names that local fishermen use to describe the fish. The naming issue gave rise to the creation of Name Bank and Classification Bank, which the MBLWHOI Library developed so that all species’ scientific names could be in one place and disambiguated to link back to each other. So now when you look for “bluefish” you will find all the references to it from when it was known as Temnodon saltator in 1766 to the current name Pomatomus saltator.

Name Bank is a library repository of species names. It’s like Switzerland: it makes absolutely no judgments. It puts all the names listed in the literature that relate to a concept in one place and coordinates with another system, Classification Bank, that carries the expert opinions of where species lie on the Tree of Life. Together these two “Banks” make up the Universal Biological Indexer and Organizer (uBio) (www.ubio.org). Having at your fingertips ALL the names associated with a species will increase your rate of retrieval 1000–fold. If you search in JSTOR with just the current name of a species you will miss all of the articles that were published under the older names. Creating these “banks” was the first step in our quest in biodiversity informatics, and their contributions are notable. The AIDS virus that kills human beings is not the same as the virus in monkeys, so the name was changed, with the result that a search on the new name omits the previous 20 years of AIDS literature. There are also serious challenges in the federated environment. If you use four different scientific names for the same animal to find geographical location information you will end up with four different maps.
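
The retrieval point above can be illustrated with a minimal Python sketch. It is not the uBio or Name Bank implementation: the dictionary, the function name expand_query and the OR query syntax are all assumptions made for illustration, and the only names used are the bluefish names mentioned in this article.

```python
from typing import Dict, List

# Hypothetical, hand-built stand-in for a name repository.
# This is NOT the Name Bank data model; it only groups the names from the text.
NAME_GROUPS: Dict[str, List[str]] = {
    "bluefish": ["Pomatomus saltator", "Temnodon saltator"],
}


def expand_query(term: str, groups: Dict[str, List[str]] = NAME_GROUPS) -> str:
    """Return an OR query covering every name recorded for the same organism."""
    key = term.strip().lower()
    for common_name, scientific_names in groups.items():
        all_names = [common_name] + scientific_names
        if key in [name.lower() for name in all_names]:
            # Quote each name so multi-word names are searched as phrases.
            return " OR ".join(f'"{name}"' for name in all_names)
    # Unknown term: search for it unchanged.
    return f'"{term}"'


print(expand_query("Temnodon saltator"))
```

Searching on an older name then produces a query that also covers the current name and the common name, so articles published under any of them are retrieved together.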

The MBLWHOI Library created a tool, Taxon Finder, that finds scientific names in older texts; it uses an algorithm to scan a given text and return the scientific names within it. These tools and names are used in the BHL and EOL projects to render even more information. These Web tools are available at www.ubio.org.
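
As a rough illustration of the idea of scanning text for scientific names, the following Python sketch looks for a capitalized word followed by a lowercase word and keeps the pair only when the first word appears in a list of known genera. This is an assumption about one simple way to do it, not the actual Taxon Finder algorithm; the names KNOWN_GENERA and find_names are hypothetical.

```python
import re
from typing import List, Set

# Hypothetical genus list; a real tool would draw on a large name repository.
KNOWN_GENERA: Set[str] = {"Pomatomus", "Temnodon"}

# A capitalized word followed by a lowercase word of three or more letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_names(text: str, genera: Set[str] = KNOWN_GENERA) -> List[str]:
    """Return candidate binomial names in order of first appearance."""
    found: List[str] = []
    for match in BINOMIAL.finditer(text):
        genus, epithet = match.group(1), match.group(2)
        # Keep the pair only if the first word is a known genus.
        if genus in genera:
            name = f"{genus} {epithet}"
            if name not in found:
                found.append(name)
    return found


sample = ("The bluefish, Temnodon saltator, is now called "
          "Pomatomus saltator. The fish is common.")
print(find_names(sample))
```

The genus check is what filters out ordinary phrases such as “The fish”, which match the letter pattern but are not names.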

Using the uBio data resources we have created tools that use RSS feeds to search for publications on all of the animals that our researchers use. There are more than 10 million names in Name Bank, but the local scientists here at MBL work on approximately 200 species. We tailor the search to use all the names associated with those species and bring back the latest publications, which are served to the scientists from the Library’s home page.
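
A tailored feed search of this kind might be sketched as follows. This is a minimal illustration under stated assumptions, not the Library’s actual tool: the feed content is invented, the function name matching_items is hypothetical, and the feed is assumed to be plain RSS 2.0 with title and description elements.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented sample feed, used only for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Feeding in Pomatomus saltator</title>
<description>New field data.</description></item>
<item><title>Coastal survey</title>
<description>Notes on Temnodon saltator catches.</description></item>
<item><title>Unrelated article</title>
<description>Nothing about the tracked species.</description></item>
</channel></rss>"""


def matching_items(rss_xml: str, names: List[str]) -> List[str]:
    """Return titles of feed items that mention any of the given names."""
    root = ET.fromstring(rss_xml)
    wanted = [name.lower() for name in names]
    titles: List[str] = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        haystack = f"{title} {description}".lower()
        # Case-insensitive substring match against every tracked name.
        if any(name in haystack for name in wanted):
            titles.append(title)
    return titles


print(matching_items(SAMPLE_FEED, ["Pomatomus saltator", "Temnodon saltator"]))
```

Passing every recorded name for a species, rather than only the current one, is what lets the second item be found even though it uses the older name.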

We now look at the library, as part of the EOL and BHL projects, not as information in a physical or virtual place but as an activity that is ever evolving! The Encyclopedia of Life, with all its component parts, will be a monumental gift to all biodiversity endeavors. For those species that go extinct in the next few hours, possibly the only place that you will be able to find information about them will be the great literature that is being scanned by these libraries.

Support for this project comes from a number of sources, with the initial funding coming from the Alfred P. Sloan Foundation and the John D. and Catherine T. MacArthur Foundation.

 

About the author

Cathy Norton is the Director of the MBLWHOI Library.
E–mail: cnorton [at] mbl [dot] edu

 

Notes

1. C.O. Whitman, 1888. “Biological Bulletin, Address to the Corporation,” Biological Bulletin, volume 1, number 1.

2. A. De Sandre–Giovannoli, R. Bernard, P. Cau, C. Navarro, J. Amiel, I. Boccaccio, S. Lyonnet, C. Stewart, A. Munnich, Ma. Le Merrer, and N. Lévy, 2003. “Lamin A Truncation in Hutchinson–Gilford Progeria,” Science, volume 300, number 5628, p. 2055.

 


Editorial history

Paper received 18 July 2008.


Creative Commons License
“The Encyclopedia of Life, Biodiversity Heritage Library, Biodiversity Informatics and Beyond Web 2.0” by Catherine Norton is licensed under a Creative Commons Attribution–Noncommercial–No Derivative Works 3.0 United States License.

http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2226/2013