Cornell Theory Center and other founders of NYSGrid link to share and crunch huge amounts of data

The Cornell Theory Center (CTC) has joined other academic and research institutions in New York in founding the New York State Grid (NYSGrid) to share computer applications and data storage and run demanding applications on more than one computer at a time.

So far, NYSGrid includes 20 academic institutions and four supporting members. Linda Callahan, executive director of CTC, is a member of its governing board.

NYSGrid will create "middleware" to free users from the complexities of navigating among the grid's many computers and data sources. A researcher at the University of Rochester, for example, might have a computer model that simulates the propagation of pollutants in Lake Ontario. NYSGrid could collect data from, say, Buffalo, Rochester and Syracuse, arrange for processing time on a computer at Brookhaven National Laboratory in Upton, transfer the results to a graphics program at Rensselaer Polytechnic Institute in Troy, and return a video visualization.

"But the end user will be unaware of everything going on under the hood," explained NYSGrid executive director Russ Miller, professor of computer science and electrical engineering and director of the Center for Computational Research at the University at Buffalo. It will be "Go run it, send me the answer when it's done," he said. Eventually, he added, non-research institutions may be able to connect to and use the grid as a teaching resource.

The system also will enable "grid computing," in which large jobs are broken into pieces that run in parallel on several computers in different locations.

One of Cornell's first projects in using this capability will be an analysis of gene fragments expressed in plants, led by Jaroslaw ("Jarek") Pillardy, senior researcher in the Computational Biology Service Unit (CBSU) at CTC. The project will go through hundreds of thousands of nucleotide sequences and see where they occur in all existing databases of plant and animal genomes. It was originally set to run on CBSU's cluster computer with over 1,100 parallel processors, but running on the grid will provide a system at least 10 times as large, Pillardy says.
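For readers curious what "breaking a large job into pieces" looks like in practice, here is a minimal Python sketch of the idea. It is not NYSGrid's middleware or the CBSU pipeline: the function names (`split_into_chunks`, `search_chunk`, `grid_search`) are hypothetical, a short string stands in for a genome database, exact substring matching stands in for a real sequence-search tool, and a local thread pool stands in for computers at different sites.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Divide items into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def search_chunk(fragments, genome):
    """Return {fragment: [positions]} for every fragment found in genome.

    Exact matching is an assumption made for brevity; a real analysis
    would use an alignment tool instead.
    """
    hits = {}
    for frag in fragments:
        positions = []
        pos = genome.find(frag)
        while pos != -1:
            positions.append(pos)
            pos = genome.find(frag, pos + 1)
        if positions:
            hits[frag] = positions
    return hits


def grid_search(fragments, genome, n_workers=4):
    """Split the fragment list, search each piece in parallel, merge results.

    Each worker thread plays the role of one machine on the grid.
    """
    chunks = split_into_chunks(fragments, n_workers)
    merged = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(search_chunk, chunks, [genome] * len(chunks)):
            merged.update(partial)
    return merged


if __name__ == "__main__":
    toy_genome = "ACGTACGTTTGA"
    toy_fragments = ["ACG", "TTG", "CCC", "GT"]
    print(grid_search(toy_fragments, toy_genome))
```

Because each piece of the job is independent of the others, adding workers shortens the wait without changing the answer, which is the property that lets a job spread from one cluster to a much larger grid.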

Other Cornell projects that could benefit from grid computing include the Web Lab, a virtual laboratory for social science researchers that will collect about 40 billion Web pages, and the archiving and mining of large astronomical surveys from the Arecibo Observatory telescope.

In return, Cornell will contribute expertise to the grid. "Cornell has a number of leading-edge computing, data, visualization, networking and experimental facilities, as well as critical expertise in grid computing and cyberinfrastructure," said Anthony Ingraffea, acting director of CTC. "Our land-grant status makes us a natural wellspring for New York state's emergence as a major cyberinfrastructure player."

NYSGrid grew out of the New York State Workshop on Data-Driven Science and Cyberinfrastructure, convened at Cornell in July, and a follow-up workshop at Rensselaer in September, which led to creating a statewide cyberinfrastructure initiative.

Since funding agencies are offering more opportunities for projects in high-performance computing, organizers believe collaborators in NYSGrid will be better prepared to compete.

"With the emphasis on support for cyberinfrastructure among federal agencies and the White House, we believe it is an opportune time to assemble institutions of higher education in New York state to develop a collaborative plan for increasing [the state's] competitiveness," said Robert Richardson, Cornell vice provost for research, who spearheaded the July workshop.

Current participants in NYSGrid are Cornell, Alfred University, Brookhaven, Columbia University, Weill Cornell Medical College, Hauptman-Woodward Medical Research Institute, Marist College, Memorial Sloan-Kettering Cancer Center, New York University, Niagara University, Rochester Institute of Technology, University of Rochester, Rensselaer, Syracuse University and the State University of New York campuses at Albany, Binghamton, Buffalo, Geneseo and Stony Brook. The project is also supported by Internet2, National Lambda Rail, NYSERNet and the Open Science Grid.

A preliminary grid has been established connecting about half of the institutions. Another workshop in January will bring programmers and scientists together to develop a more sophisticated system.
