The entire Cornell Web space will be archived

The good news: Practically anything you might want to know is on the Web.

The bad news: A lot of what you might want to know is only on the Web, and it could go away forever.

So Cornell has enlisted the expertise of the Internet Archive's Archive-It service to create periodic snapshots of the entire Cornell Web space -- some 8 million files. The archives will be available to anyone accessing the Internet Archive online, and copies will be delivered to Cornell for local storage.

"By Cornell policy certain information must be preserved in the University Archives," said Dean Krafft, director of information technology and chief technology strategist for Cornell Library, who will oversee the project, "but a lot of that information is no longer printed on paper. There are cases where significant research money has gone into creating resources that we want to be sure are saved in the long run."

As examples, he cited the Program on Breast Cancer and Environmental Risk Factors, which closed in March 2010 and whose website is at risk, and the New York State Integrated Pest Management Program, which is about to lose its funding.

The agreement with the Internet Archive also provides for preserving other scholarly and historically important sites outside of Cornell. Examples include human rights websites that are in danger of being closed down and at-risk websites in areas such as Southeast Asia or the Middle East, according to John Saylor, associate university librarian for scholarly resources and special collections. Librarians designated as "selectors" for subject areas across the university, who decide what books, journals and other information resources to buy, will similarly choose websites to be preserved. Faculty may recommend sites by contacting the selector for their area.

The initial cost of the service, estimated at about $8,000, will be shared by Cornell Library and Cornell Information Technologies.

The Cornell Web space will be crawled once each semester and possibly a third time each year, Krafft said. Preliminary testing is going on now, he said, with the first complete archive to be created in the fall.

Media Contact

Blaine Friedlander