Internet data streaming into Cornell will provide new insights into social networks

Hundreds of gigabytes of data now streaming to Cornell each day from the massive Internet Archive will give social and information scientists an unprecedented playing field for research into social networks, sociology department chair Michael Macy told a packed room in the Industrial and Labor Relations Conference Center April 26.

The archive's more than 40 billion Web pages, along with blogs, e-mail messages and newsgroups, will give researchers insight into social networks and help them develop advanced tools for further social science research.

"We have enormous stores of data about individuals and groups but relatively little data on the structure of social ties and what goes on inside the links," Macy said. "Online interactions leave a digital trace, creating an unprecedented opportunity to study social life at the relational level."

Macy's talk was an update on "Getting Connected: Social Sciences in the Age of Networks," a three-year theme project sponsored by the Institute for the Social Sciences at Cornell.

Macy leads the interdisciplinary project's 10-member research team and is principal investigator on a related Cybertools project, which was funded last fall by a $2 million National Science Foundation research grant.

The nonprofit Internet Archive features snapshots of all data on the World Wide Web, collected every two months over 10 years from 1996 to 2005. Data is now streaming from the archive's servers in San Francisco to a computer server in Cornell's Theory Center at a rate of 300-500 gigabytes per day. Researchers at Cornell hope to have a third of the archive transferred by the end of 2007.

One possibility for this mine of information is to reconfigure the data into a relational database, Macy said. "The relational format allows users to download small subsets of data based on their criteria," such as keywords or dates, he said.

"Whereas 'old school' network analysis focuses on finding positions of power and equivalence, the new science of networks emphasizes what goes on in the interactions, and how nodes make choices, including the choice of partners, as they influence one another in response to the influences they receive," Macy said.

Cornell sociologist David Strang is studying how corporations are connected and how people carry innovations with them when they change jobs. "One idea is to develop systematic data on the movement of executives across firms -- data that hasn't been available," he said.

The networks team also includes John Abowd, director of the Cornell Institute for Social and Economic Research; David Easley and Larry Blume, economics; Jeffrey Prince, applied economics and management; Jon Kleinberg, computer science; communication department chair Geri Gay; and Kathleen O'Connor and Dan Huttenlocher of the Johnson Graduate School of Management.

Other areas of research in the networks theme project include employment networks; data searches; the distribution of thresholds (or tipping points); innovations that succeeded and those that crashed; the diffusion of technology among early adopters; degrees of separation; how networks affect markets and prices; the small-world properties of networks; and how communities form, including those on MySpace, Facebook and LiveJournal.

Ultimately, the related Cybertools project is meant to serve as a demonstration that helps secure longer-term funding to make this data both useful and readily available to researchers.

"The immediate goal of Cybertools is to advance knowledge of networks by promoting collaborations across disciplines and institutions," Macy said. "The hidden agenda is that Cornell is poised to become a leading center for computational social science by linking the talents, tools and expertise of social, computer and information scientists. It isn't just about networks, it is a network in action."
