Much of the research in the Library of Life, as at the BTR Center itself, will address the challenges raised by the completion of the Human Genome Project, in which the basic fingerprints of humans, their DNA sequences, have been recorded. In the coming decades researchers will attempt to unravel the biological functions encoded in those sequences and to discover medical benefits. The human genome, however, is only a small part of the information needed to understand life. Of the estimated 20 million species on Earth, only a tiny fraction have had their genomes sequenced. And genomes do not encode ecological relationships or complex environmental effects, which need to be recorded and modeled separately.
Cornell University is one of a handful of universities in the world making investments in excess of $500 million to modernize life sciences research and education programs. Through its New Life Sciences Initiative, Cornell is engaging several hundred researchers across its campuses in Ithaca and at the Weill Cornell Medical College in New York City in a broad program of education and investigation, integrating life sciences with physical, engineering and computational sciences.
Steven Tanksley, the Liberty Hyde Bailey professor of plant breeding at Cornell University, says that collecting data for the Library of Life is expected to take decades and to involve many research groups throughout the world. "Future advances in medicine, agriculture and environmental sciences will critically depend on the Library of Life," he says.
Tanksley says that collecting, cataloging and connecting data "will evolve into the new basis for creativity and discoveries about the origins, mechanisms and interconnectedness of life forms, and from that information we will embark on a new future on how we feed and clothe ourselves. This information will also expand and, in some ways, change how we view ourselves, as the human species, in the larger context of life and the universe."
The library's director, Ron Elber, professor of computer science at Cornell, says that the aim of the library is to assemble a digital catalog and living samples of all microbes, fungi, plants, insects, invertebrates and vertebrates in the region, creating a Library of the Desert. Because the desert environment is not rich in life forms, a comprehensive analysis of this specific environment might be feasible in a relatively short time, he says. "This is important since the Library of Life will need to show some tangible outcomes in a few years. Hence, besides the obvious economical and ecological benefits to the region, the Library of the Desert will provide a prototype for the Library of Life and will sketch the structure for libraries of other regions richer in alternative life forms and more challenging to handle."
The complex nature of the data, he says, will require the development of new software and new database systems. "We will need to handle new information at an unprecedented scale as well as to integrate many existing databases. This is a very major undertaking -- besides the obvious challenge of collecting the data."
Making the Library of Life's huge data set accessible over the Web will also require a number of technical breakthroughs. A new language will be created to integrate the classification schemes of different life science disciplines, making it easy to navigate between the biology of the small and of the large. "The ties between biology and the information sciences have always been deep; this project will generate many hard questions for computing and information science, and provide opportunities to apply our technology to meeting basic human needs," says Robert Constable, dean of the Faculty of Computing and Information Science at Cornell. "We will be challenged to find ways to integrate the many databases being created for the life sciences and to organize them to facilitate problem solving, discovery and education."
To enable rapid exploration of the data and comprehensive mathematical modeling of life on Earth, new data structures and query languages will be created. The work will be guided by a think tank of Cornell researchers in the biological, computer and physical sciences, which in time will include experts from around the world. For example, large-scale data integration will make it possible to examine computationally the effects of drug molecules on their environment and ecology.