Sept. 27, 2011

The arXiv at 20: a global resource

As the arXiv e-print repository for scientific papers celebrates its 20th anniversary, what started as an effort to "level the playing field" for researchers has created a whole new playing field on which the white lines are still not clearly drawn.

Long before Paul Ginsparg, professor of physics and information science, created the arXiv, scientists commonly circulated a few paper preprints of their articles to friends before the articles appeared in scholarly journals. Ginsparg created a simple website -- at first housed on a single workstation on a cluttered desk -- to turn the preprints into "e-prints" made available electronically.

"I've heard a lot about how democratic the arXiv is," Ginsparg said Sept. 23 in a talk commemorating the anniversary. People have, for example, praised the fact that the arXiv makes scientific papers easily available to scientists in developing countries where subscriptions to journals are not always affordable. "But what I was trying to do was set up a system that eliminated the hierarchy in my field," he said. As a physicist at Los Alamos National Laboratory, "I was receiving preprints long before graduate students further down the food chain," Ginsparg said. "When we have success we like to think it was because we worked harder, not just because we happened to have access."

The idea caught on, and submissions multiplied. The subject matter expanded to include mathematics, astrophysics, computer science and, most recently, biology, and the arXiv has inspired online repositories in other fields. Ten years ago Ginsparg joined the Cornell faculty, bringing the arXiv with him. It is maintained by Cornell University Library.

As of September 2011, the system had accumulated more than 700,000 papers, with more than 6,000 new submissions arriving each month; in 2010, users downloaded 65 million full-text articles. The history of submissions reflects the history of public acceptance, Ginsparg noted. The number of submissions in high-energy physics increased year by year but finally leveled out, meaning, Ginsparg said, that the field had reached saturation. Despite all this traffic, Ginsparg remarked, the system is still using some of the software written 20 years ago.

One of the surprises, Ginsparg said, is that electronic publishing has not transformed the seemingly irrational scholarly publishing system, in which researchers give their work to publishing houses and their academic institutions then buy it back by subscribing to journals. Scholarly publishing is still in transition, Ginsparg said, because of unresolved questions about how to fund electronic publication and how to maintain quality control. The arXiv has no peer-review process, although it does restrict submissions to those with scientific credentials.

But the lines of communication are definitely blurring. Ginsparg reported that a recent paper posted on the arXiv by Alexander Gaeta, Cornell professor of applied and engineering physics, was picked up by bloggers and spread out from there. The paper is to be published in the journal Nature and is still under a press embargo, but an article about it has appeared in the journal Science.

Electronic publishing makes supporting data available alongside a paper and offers new ways to manage information, including searches, data mining and detection of plagiarism. "Concept searching," Ginsparg said, is a sought-after feature but is still in the works. And Facebook-style social interaction may become part of the system, he said.

But however wide the new field grows, it is still level. The arXiv is, Ginsparg concluded, a global resource "where everybody has access to the same information on the same system."