Whatever happens with Google Books, HathiTrust will keep books online

While lawyers debate what Google can do with thousands of digitized books whose copyright status is in question, librarians want to make sure those digital books will always be around to read. One way, they have decided, is not to put all their eggs in Google's basket. As a result, Cornellians have a new way to search hard-to-find texts.

Over the past few years, Cornell University Library has scanned and digitized some 300,000 volumes, including many rare, out-of-print and deteriorating materials. Images of the pages are run through optical character recognition software to make the text fully searchable. Since 2007, much of this work has been supported by Google Books, which aims eventually to digitize about a half million of Cornell's 8 million volumes. Google has similar arrangements with the New York Public Library, 19 other universities and several European libraries.

Included are works covered by copyright and works in the public domain. Google makes all of the texts searchable, but viewers may see only small portions of copyrighted works, unless the copyright owner has agreed to make the full text available. But controversy surrounds so-called "orphan" works whose copyright owners cannot be found. Google set up a plan to pay into a fund that would remunerate copyright owners if they turned up, but the plan was rejected by a federal court ruling March 22. Further negotiations are under way.

Google's goal is to make everything accessible to everyone, but librarians take a longer view. "There is no guarantee that the books currently stored on Google's servers will be available five, 10 or 20 years from now," said Oya Rieger, associate university librarian for digital scholarship services. So academic institutions have formed their own repository, operated by a consortium known as HathiTrust, which has 52 member institutions, including Cornell.

The primary goal of HathiTrust is to ensure long-term preservation of digital materials, but it also expects to make the texts available for computational research and to allow access to in-copyright material to the extent the law allows. Copyright law allows access to such materials for persons with print disabilities and when the originals are damaged, deteriorating, lost, stolen or not available at a reasonable market price. HathiTrust also is working to identify orphan works that are actually in the public domain. So far some 23,000 books have been examined, and approximately 57 percent of those have been found to be in the public domain.

HathiTrust began with the books digitized by Google, but also is receiving materials from members' collections. It currently maintains more than 8.4 million volumes, including about 2.2 million in the public domain. What you can find there ranges from old Tom Swift and Tarzan novels to genealogy records from the Daughters of the American Revolution to the autobiography of Andrew Dickson White.

The HathiTrust website at http://www.hathiTrust.org is open to the public at large, but because Cornell is a member, the Cornell community will have access to additional features.

Hathi (pronounced "HAH-tee") is a Hindi word for elephant, chosen to symbolize memory and the sheer size of the project.
