Cyberinfrastructure task force seeks input on data-driven science

Got data?

Got lots of data? The Cyberinfrastructure Task Force wants to hear from you.

This week the task force distributed a survey asking Cornell researchers what they need to do data-driven research. The underlying message is that to process the enormous amounts of data involved in research, Cornell needs to upgrade its computing facilities.

The Data-Driven Science Cyberinfrastructure Task Force was formed late last spring by the Cornell Theory Center (CTC) to come up with a plan of action to upgrade Cornell facilities and position the university competitively for cyberinfrastructure grants. It currently includes about 50 faculty members and information technology professionals, chaired by Tony Ingraffea, the Dwight C. Baum Professor of Engineering and acting director of CTC.

The Cornell Survey Research Institute is conducting the survey, which is initially polling about 80 principal investigators on research projects that use CTC facilities and services. It asks them what additional resources, including both hardware and expertise, they might want from CTC, and how these resources might improve their ability to obtain research funding.

The goal, Ingraffea says, is to help demonstrate the importance of cyberinfrastructure and identify potential areas for improvement or expansion.

Cyberinfrastructure is the stuff much of modern science needs to operate, including computer hardware and software, data storage, data communication networks and collaboration technologies. Computers can record millions of data points from ongoing experiments or collect historical information as fast as humans can scan it in. Ironically, the ability to process all that data is not always keeping up with the ability to collect it.

Among local examples: A sky survey by the Arecibo radio telescope to find new pulsars is generating a terabyte (a trillion bytes) of data a day. A study of the sociology of the Internet is transferring 300 to 500 gigabytes of data a day to Cornell over Internet2. Cornell's Institute for Social and Economic Research will be working with the 2010 census, which will include more questions and assemble more data than ever before.

The National Science Foundation (NSF) has declared cyberinfrastructure of critical importance to the future success of research universities and is soliciting proposals for research into improving the infrastructure and projects that take advantage of large-scale computing capacity.

Cornell has been a leader in providing high-performance computing resources to its researchers, but now it is slipping behind, Ingraffea says. About a year ago, CTC's Velocity-3 Cluster, rated at 2.1 teraflops (trillions of floating-point operations per second), was one of the top 300 supercomputers in the world. Because others have moved ahead, it is now not even in the top 500, he reports. Meanwhile, the NSF is soliciting proposals to build petaflop machines (quadrillions of flops).

Along with processing power, Ingraffea says, the university needs more data-storage capacity, wider "pipes" (fiber-optic communication lines) to move data around campus and more programmers with expertise to manage huge databases. Finally, he says, there is a need for space for all of the above. The machine room in Rhodes Hall, which houses CTC computers, is already so overcrowded that the heat generated by the computers is overpowering the air conditioning.

Faculty members who are not already on the CTC survey list but would like to participate should contact Linda Callahan at (607) 254-8610 or cal@tc.cornell.edu. New members are welcome to join the task force, which meets at 1:30 p.m. Wednesdays and 4 p.m. Mondays in alternate weeks. Contact Mary Yetsko at (607) 254-869 or yetsko@tc.cornell.edu for room location.
