WWW2007: Program

Poster Papers

Track: Social Networks

Paper Title:
Parallel Crawling for Online Social Networks

Authors:

  • Duen Horng Chau (Carnegie Mellon University)
  • Shashank Pandit (Carnegie Mellon University)
  • Samuel Wang (Carnegie Mellon University)
  • Christos Faloutsos (Carnegie Mellon University)

Abstract:
Given a huge online social network, how do we retrieve information from it through crawling? Better yet, how can we improve crawling performance by using parallel crawlers that work independently of each other? In this paper, we present a framework for parallel crawlers for online social networks that uses a centralized queue. To show how this works in practice, we describe our implementation of the crawlers for an online auction website. The crawlers work independently, so the failure of one crawler does not affect the others. The framework ensures that no redundant crawling occurs. Using the crawlers we built, we visited a total of approximately 11 million auction users, about 66,000 of whom were completely crawled.
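The abstract describes the architecture only at a high level: independent crawlers that share a centralized queue, with no user crawled twice. The following is a minimal Python sketch of that idea under stated assumptions. It is not the authors' implementation. The crawlers are threads in a single process, the "social network" is a small hypothetical in-memory dictionary (`TOY_GRAPH`), and `fetch_neighbors`, `CentralQueue`, and `parallel_crawl` are illustrative names. A lock-protected `seen` set provides the no-redundant-crawling property, and a per-user `try`/`except` keeps one failed fetch from stopping the other crawlers.

```python
import queue
import threading

# Hypothetical toy network: user -> users they are linked to (e.g. trade partners).
# A real crawler would fetch and parse each user's profile page instead.
TOY_GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["dave", "erin"],
    "dave": ["erin", "ghost"],  # "ghost" has no entry, so fetching it fails
    "erin": ["alice"],
}


def fetch_neighbors(user):
    """Hypothetical fetch; raises KeyError for users whose page cannot be retrieved."""
    return TOY_GRAPH[user]


class CentralQueue:
    """Shared frontier that hands out each user at most once."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()
        self.pending = queue.Queue()

    def add(self, user):
        # Check-and-mark is atomic, so two crawlers never both enqueue the same user.
        with self._lock:
            if user in self._seen:
                return
            self._seen.add(user)
        self.pending.put(user)


def crawler(cq, crawled, failed, out_lock):
    """One independent crawler: take a user, fetch it, push newly found users back."""
    while True:
        user = cq.pending.get()
        if user is None:  # shutdown sentinel
            return
        try:
            neighbors = fetch_neighbors(user)
            for n in neighbors:
                cq.add(n)  # duplicates are filtered by the central queue
            with out_lock:
                crawled[user] = list(neighbors)
        except Exception:
            # A failed fetch is recorded and skipped; other crawlers keep running.
            with out_lock:
                failed.append(user)
        finally:
            cq.pending.task_done()


def parallel_crawl(seeds, num_crawlers=4):
    cq = CentralQueue()
    for s in seeds:
        cq.add(s)
    crawled, failed, out_lock = {}, [], threading.Lock()
    workers = [
        threading.Thread(target=crawler, args=(cq, crawled, failed, out_lock))
        for _ in range(num_crawlers)
    ]
    for w in workers:
        w.start()
    cq.pending.join()  # returns once every enqueued user has been processed
    for _ in workers:
        cq.pending.put(None)  # one sentinel per crawler
    for w in workers:
        w.join()
    return crawled, failed


if __name__ == "__main__":
    crawled, failed = parallel_crawl(["alice"])
    print(sorted(crawled), failed)
    # prints ['alice', 'bob', 'carol', 'dave', 'erin'] ['ghost']
```

Because the only shared state is the queue and its `seen` set, crawlers never coordinate with each other directly. Running crawlers on separate machines would require moving that state out of process memory into a shared service. This sketch only isolates per-fetch failures; it does not recover a user whose crawler thread dies outright.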
