Ph.D. Theses

Techniques and Data Structures for Efficient Information Access and Retrieval in Distributed Networks

By Ryan LaFortune
Advisor: Christopher Carothers
March 14, 2008

Networks were designed and continue to exist to allow for fast and convenient access to remote data. With data scattered across a large network, there exists a fundamental challenge to efficiently find any sought data. This challenge is further complicated when the data is periodically relocated in the network, as is the case with wireless sensor networks. Thus, a solution to the problem necessitates a data structure with the ability to update in response to object relocations. A trivial solution to the problem uses a centralized directory responsible for knowing the location of all data at all times, and directing all querying nodes to the location of the data they seek. Dependence on one node to provide directions results in a single point of failure, and may cause some queries to be unnecessarily long, especially when the sought data lies at a node topologically close to the querying node. A better solution to the problem uses a distributed directory, where all queries are answered quickly regardless of the whereabouts of the querying and storing nodes. In this thesis, we provide significant improvements to previous distributed directory solutions by creating innovative algorithms that improve the structural properties of sparse covers, the underlying data structure from which a directory is built. Specifically, we improve directory performance to Stretchfind = O(log n) and Stretchmove = O(log n) for H-minor free graphs (a savings of log n in each measure), and Stretchfind = O(1) and Stretchmove = O(log n) for planar graphs, unit disk graphs, and other graphs with a constant- stretch planar spanner (an additional savings of log n in Stretchfind).

Once data is located in the network, it must then be retrieved (delivered). In a simple world, this delivery would be between a single source and a single destination. The possibilities for delivery techniques increase greatly when there are many sources and destinations, like in peer-to-peer (P2P) networks. P2P networks have gained much attention in recent years due to their scalability and fault-tolerance, and also their potential to drastically reduce distributor transit costs. In order to study the dynamics and causal relationships between peer entities in these complex overlay networks, we have developed a flow-based discrete-event simulator and abstract Internet topology model that accurately and realistically model today's broadband service, at a scale larger than previous efforts. Specifically, our model can scale to hundreds of thousands of peers, where prior efforts peak at only a few thousand. Using detailed simulations, we have improved the efficiency of data dissemination and reduced distributor transit costs for both the time-insensitive mass- download scenario and the real-time streaming scenario.

Return to main PhD Theses page