SmartSeer: Using a DHT to Process Continuous Queries Over Peer-To-Peer Networks

Jayanthkumar Kannan^a
kjk@cs.berkeley.edu

Beverly Yang^b
byang@stanford.edu

Scott Shenker^c
shenker@icsi.berkeley.edu

Puneet Sharma^d
puneet@hpl.hp.com

Sujata Banerjee^d
sujata@hpl.hp.com

Sujoy Basu^d
basus@hpl.hp.com

Sung-Ju Lee^d
sjlee@hpl.hp.com

^aDepartment of Computer Science, University of California, Berkeley, CA
^bDepartment of Computer Science, Stanford University, Stanford, CA
^cUniversity of California at Berkeley and International Computer Science Institute, Berkeley, CA
^dMobile & Media Systems Lab, Hewlett-Packard Laboratories, Palo Alto, CA

Abstract

As the academic world moves away from physical journals and proceedings towards online document repositories, the ability to efficiently locate work of interest among the torrent of newly generated papers will become increasingly important. To aid in this endeavor, we designed SmartSeer, a system that allows users to register personalized continuous queries over the CiteSeer database of technical documents. Users are then alerted whenever papers that match their queries are put online. SmartSeer has two main design requirements. First, to allow effective information retrieval, it should support rich continuous queries (as opposed to simple keyword searches). Second, to make effective use of donated infrastructure, it should be capable of running on a loosely maintained group of unreliable machines spread across multiple organizations (as opposed to assuming a reliable and tightly coupled distributed system). Existing work on distributed continuous query systems fails to meet at least one of these requirements. Our design for SmartSeer is based on Distributed Hash Tables (DHTs) and thereby leverages previous work on DHT-based query systems. A prototype of SmartSeer has been implemented and evaluated on PlanetLab. Although we evaluate our design only for the SmartSeer application, we believe it also provides useful insights for other distributed systems that support rich continuous queries, such as web alerts and news alerts.
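The abstract says only that SmartSeer's design is based on DHTs. The sketch below is a general illustration, not SmartSeer's actual algorithm. It shows one common way to index continuous queries in a DHT. Each query is stored at the node that owns one of its keywords. Each newly published paper is then routed to the owner of every keyword it contains, and the queries stored there are checked against it.

Everything here is a hypothetical assumption made for illustration:

* the `ToyDHT` class and its method names;
* the SHA-1 identifier ring;
* the rule of indexing each query under its lexicographically smallest keyword;
* the restriction to conjunctive keyword queries, which are much simpler than the rich queries SmartSeer targets.

The "DHT" is simulated inside a single process.

```python
import bisect
import hashlib
from collections import defaultdict


def ring_id(key: str, bits: int = 32) -> int:
    """Map a string onto a 2**bits identifier ring (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """Hypothetical single-process stand-in for a DHT.

    Each key is owned by its successor node on the identifier ring.
    """

    def __init__(self, node_names):
        self.ring = sorted((ring_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        # Per-node store: index keyword -> list of (query_id, required keywords).
        self.store = {name: defaultdict(list) for name in node_names}

    def owner(self, key: str) -> str:
        # First node whose id is >= the key's id, wrapping around the ring.
        i = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[i][1]

    def register(self, query_id: str, keywords) -> None:
        """Store a conjunctive keyword query at the node owning one of its keywords."""
        required = frozenset(k.lower() for k in keywords)
        index_kw = min(required)  # hypothetical rule: index on the smallest keyword
        self.store[self.owner(index_kw)][index_kw].append((query_id, required))

    def publish(self, doc_keywords) -> set:
        """Route a new paper to the owner of each of its keywords; return matching query ids."""
        doc = {k.lower() for k in doc_keywords}
        matches = set()
        for kw in doc:
            # A query matches only if all of its required keywords occur in the paper.
            for query_id, required in self.store[self.owner(kw)].get(kw, []):
                if required <= doc:
                    matches.add(query_id)
        return matches


if __name__ == "__main__":
    dht = ToyDHT([f"node{i}" for i in range(8)])
    dht.register("q1", ["dht", "continuous", "query"])
    dht.register("q2", ["sensor", "network"])
    print(dht.publish(["DHT", "Query", "Continuous", "PlanetLab"]))  # {'q1'}
```

Every matching query is found. A query whose keywords all appear in a paper must contain its own index keyword in that paper, so the paper is routed to the node holding that query. Indexing each query under a single keyword means registration touches only one node. The cost is that a new paper fans out to one node per keyword. A deployment on unreliable, loosely maintained machines would also need replication and churn handling, which this sketch ignores.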
