Sept. 26, 2013

Cornell gets archived – digitally – twice a year

Every word on the Internet can be fleeting, which presents unique challenges for Cornell’s digital archivists, whose mission is to pin down information and preserve it so that researchers can use it in the future.

At the library, staff members from two departments – Library Technical Services and the University Archives in the Division of Rare and Manuscript Collections – are trying to capture the university’s intellectual output by archiving all of the websites in the cornell.edu domain.

The process began in 2011. Now, several collections are available through Archive-It, a paid service of the Internet Archive, which developed the Wayback Machine. Both the Internet Archive and Archive-It rely on the Wayback Machine for public access to archived websites.

The archive allows users to find older versions of Web pages and content that is no longer available online, such as the popular “Dear Uncle Ezra” column.

Because websites are not static entities, the group had to set up a process that can be repeated at regular intervals. Much of the cornell.edu domain has already been archived twice, with plans to redo it every January and June.

“We will continue to improve the archiving process each time we run it,” said Jason Kovari, metadata librarian for humanities and special collections. “Web archiving allows us to archive cornell.edu as it appears at determined moments in time, so that researchers can view the progression of change.”

The project doesn’t end with cornell.edu. The library is also working on capturing the websites of the Cornell Cooperative Extensions, about 300 student organizations outside the cornell.edu domain and dozens of organizations that use Cornell as an official repository.

Only public sites, available to anyone through cornell.edu, are preserved. And, through Archive-It, the captured sites are easily searchable and open to anyone who wants to look at them.

Although archiving websites may be new, collecting information and materials from the present day is part of a long tradition for the library. Andrew Dickson White, Cornell's co-founder and first president, was careful to preserve pamphlets and scrapbooks from the Civil War and other events of his own time – most of which were not considered “collection-worthy” but which have proven invaluable for researchers.

“This project documents Cornell’s history,” said Liz Muller, head of Archival Technical Services and curator of Digital and Media Collections. “It’s a continuation of what we’ve been collecting – and continue to collect – in paper form – the same mission the library has always had, just in a new digital format.”

Gwen Glazer is the staff writer/editor for Cornell University Library.