Level: Introductory
Todd Sundsted (todd-p2p@etcee.com), Vice President, Focus, Etcee LLC
01 Nov 2001
To accomplish useful work, peers in a P2P application must be able to find and interact with one another. In his continuing examination of P2P computing, software developer Todd Sundsted describes several ways to accomplish this task -- called discovery -- and the relative strengths and weaknesses of each.
Peer-to-peer applications are massive yet fine-grained. Individual peers pop in and out of existence -- each intent on its own agenda. During their brief moments of wakefulness they attempt to accomplish the tasks set out for them. Most of these tasks involve interacting with other peers.

The governing architecture under which peers operate must provide a number of essential services to the peers that compose the gestalt P2P application. We've covered communication and security services (see Resources) in our journey in P2P computing; now it's time to examine the peer discovery service.

A peer discovery service enables peers in a P2P application to locate one another so that they can interact. There are many ways to implement a peer discovery service. We'll begin by looking at the simplest way to educate peers about each other's existence: explicit point-to-point configuration.

Explicit point-to-point configuration
Explicit point-to-point configuration isn't so much a discovery mechanism as a way to avoid having to implement one. Each peer comes into existence knowing the other peers that inhabit its P2P world. The term point-to-point means that every peer in a P2P application knows about and is wired to every peer it will ever interact with. The wiring doesn't have to be complete -- it may not be possible to wire every peer to every other peer -- however, failure to do so, whether intentional or not, will create network blind spots for some peers.

The simple P2P application I've built for this series of articles (see Resources) uses explicit point-to-point configuration. Each peer must be preconfigured with the address of every other peer. If you've downloaded and used the code, you've undoubtedly experienced frustration from living with this requirement -- configuration is tedious and error-prone, not to mention a general nuisance.

In general, explicit point-to-point configuration of nodes in a distributed application doesn't scale well to large networks of nodes. That's why distributed computing applications and technologies have always (with a few notable exceptions) included naming and locating functionality. It also explains why the Domain Name System (DNS), a distributed naming system, eventually replaced the hosts file mechanism for machine naming. Maintaining hosts files is tedious, error-prone, and generally unworkable in large network environments.

Explicit point-to-point configuration isn't all bad, however. The lack of flexibility that characterizes point-to-point addressing also brings with it a certain measure of security. Preconfiguring every peer in a network with the list of peers it knows about and is willing to talk to hardens the network against outside attack.
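To make the preconfigured peer list concrete, here is a minimal sketch of what reading one might look like. The class name `StaticPeerList`, the `host:port` list format, and the host names are assumptions made for illustration; this is not the configuration format used by the sample application for this series.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of explicit point-to-point configuration.
// The "host:port,host:port" format is an assumption, not the article's own.
final class StaticPeerList {
    private final List<InetSocketAddress> peers;

    private StaticPeerList(List<InetSocketAddress> peers) {
        this.peers = Collections.unmodifiableList(peers);
    }

    // Parses a comma-separated list such as "hostA:7000,hostB:7001".
    static StaticPeerList parse(String config) {
        List<InetSocketAddress> parsed = new ArrayList<>();
        for (String entry : config.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while the configuration is read.
            parsed.add(InetSocketAddress.createUnresolved(host, port));
        }
        return new StaticPeerList(parsed);
    }

    List<InetSocketAddress> peers() {
        return peers;
    }

    // A peer that is not on the list is simply unknown to this peer.
    boolean isKnown(String host, int port) {
        return peers.contains(InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        StaticPeerList list = StaticPeerList.parse("hostA:7000, hostB:7001");
        System.out.println("Configured peers: " + list.peers());
        System.out.println("hostC known? " + list.isKnown("hostC", 7002));
    }
}
```

Both the drawback and the benefit show up here: every new peer means editing the list on every existing peer, but a peer that isn't on the list never gets a conversation started.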
Dynamic discovery models
In stark contrast to the static nature of the explicit point-to-point configuration approach stands the dynamic nature of the directory services and network models. These models are often a better match for P2P applications, which tend toward the dynamic side of the spectrum. In the following sections we'll look at three different mechanisms by which peers can dynamically locate other peers and learn about the environment of which they are part.

The directory services model
In the directory services model, one or more special-purpose servers provide directory services to peers. To maximize scalability, applications are architected so that a small number of directories serve a far larger number of peers. Peers register information about themselves (their name, address, resources, and metadata) with the directory service, and use the directory service to locate other peers based on queries against the information in the directory.

Figure 1 illustrates a P2P architecture that uses directories to provide location and naming services to peers. Directories may themselves be peers (albeit fat ones), or they may serve no other function than acting as directories.
Figure 1. The directory services model
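
The register-then-query cycle can be sketched in a few lines. This is an in-memory stand-in only: the class `PeerDirectory`, its `Entry` fields, and its method names are hypothetical, and a real directory would sit behind a network protocol rather than a method call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory directory; all names here are illustrative assumptions.
final class PeerDirectory {
    // One registered peer: its name, where to reach it, and a resource it offers.
    static final class Entry {
        final String name;
        final String host;
        final int port;
        final String resource;

        Entry(String name, String host, int port, String resource) {
            this.name = name;
            this.host = host;
            this.port = port;
            this.resource = resource;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    // A peer joins by registering information about itself.
    void register(Entry entry) {
        entries.put(entry.name, entry);
    }

    // A peer that shuts down cleanly removes its own entry.
    void unregister(String name) {
        entries.remove(name);
    }

    // Lookup by name returns the entry, or empty if the peer is not registered.
    Optional<Entry> lookup(String name) {
        return Optional.ofNullable(entries.get(name));
    }

    // Query against registered metadata: which peers offer this resource?
    List<Entry> findByResource(String resource) {
        List<Entry> matches = new ArrayList<>();
        for (Entry entry : entries.values()) {
            if (entry.resource.equals(resource)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        PeerDirectory directory = new PeerDirectory();
        directory.register(new Entry("peerB", "hostB", 7001, "song.mp3"));
        directory.lookup("peerB").ifPresent(
            e -> System.out.println("peerB is at " + e.host + ":" + e.port));
    }
}
```

Each peer needs only the directory's location up front; everything else it learns by asking.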

Directories come in two flavors, loosely distinguished by the degree to which a directory is centrally managed and administered.

The best example of the directory model in the P2P realm is provided by Napster and its OpenNap clones. In the Napster model, directories enjoy centralized management and administration. While centrally managed directories have come under fire as being "non-P2P" in spirit, and have actually made it easier for Napster to be shut down, they do offer one important advantage: centralized management and administration make it easy to ensure that server hardware and configuration are sufficient to meet quality-of-service objectives.

If we step outside the P2P space for a moment, we can see that DNS is an excellent example of a decentralized directory. Like the Internet itself, DNS was designed to function even in the face of severe disruption of parts of its network. DNS directories are arranged hierarchically, with the root directory for a top-level domain (like "com") delegating responsibility for servicing sub-domain queries (for domains like "etcee.com") to DNS servers further down the hierarchy.

In either case, only the location of the directory has to be configured into each peer -- an important advantage over the point-to-point model. To join the P2P party, peers register themselves with the central directory.

Recall Figure 1. When peer A desires to interact with a peer whose location it doesn't already know, it sends a request to the directory. The directory in turn returns the location to the peer.

The network model
Figure 2 illustrates a different kind of P2P application. It consists of many peers, all of which are similar in functionality. There are no specialized directory servers. Peers must use the network of which they are part to locate other peers.

Figure 2. The network model

As its name suggests, a network model P2P application consists of a (usually dynamic) collection of peers. No single peer knows the structure of the entire network or the identity of every peer participating in the network. Instead, peers know only of the peers with which they are in direct communication -- they participate in the larger network vicariously.

Peers must cooperate to carry out tasks. In many cases this cooperation includes support for distributed queries, distributed messaging, and even authentication and authorization activities. Because of the sizes involved, bulky network operations like file transfers usually take place directly between peers -- not over the network of peers.

Consider the network in Figure 2 above. When peer A desires to know the location of another peer in the network, it formulates a query and passes the query to its neighbors. These neighbors try to satisfy the request. If they cannot satisfy the request entirely, they pass the request to their neighbors, and so on.

To take part in the network, one peer locates another peer in the network that's willing to accept it as a neighbor. But how does one peer locate another peer when it's not yet part of the network?

One possible solution is to provide the peer with a list of peers to check. The peer tries to contact the peers on the list until one or more peers accept it as a neighbor. This solution sounds a lot like the point-to-point model, doesn't it? As early users of the original Gnutella can attest, this solution is only moderately effective. Because P2P networks, and Gnutella in particular, are so dynamic, any static list is unlikely to be valid for very long.

It's interesting to examine further iterations on the solution to this problem in the case of Gnutella. Gnutella implementations first began by catching and persistently storing the locations of other peers as queries from those peers circulated through the network. When the client relaunched after being shut down, it attempted to connect to each of the previously identified peers until it found one or more still running. This approach, while largely automatic, is inefficient and fragile.
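
The neighbor-to-neighbor query passing described earlier in this section can be simulated in a single process. The sketch below is an illustration under stated assumptions, not Gnutella's wire protocol: the names `FloodingNetwork`, `Peer`, and `locate` are hypothetical, and the hop limit and per-query ID used to stop a query from circulating forever go beyond what this article describes. The simulation also hands the answer straight back to the caller, whereas a real network has to route it back to the originating peer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// In-process simulation of query flooding; all names are illustrative assumptions.
final class FloodingNetwork {
    static final class Peer {
        final String name;
        final String address;
        final List<Peer> neighbors = new ArrayList<>();
        final Set<String> seenQueryIds = new HashSet<>();

        Peer(String name, String address) {
            this.name = name;
            this.address = address;
        }
    }

    // One copy of the query in transit to a receiver, with its remaining hop budget.
    private static final class Hop {
        final Peer receiver;
        final int hopsLeft;

        Hop(Peer receiver, int hopsLeft) {
            this.receiver = receiver;
            this.hopsLeft = hopsLeft;
        }
    }

    // Neighbor links are symmetric: each peer knows only its direct neighbors.
    static void connect(Peer a, Peer b) {
        a.neighbors.add(b);
        b.neighbors.add(a);
    }

    // Floods a query from origin; returns the target's address if it is found
    // within maxHops hops, or empty if the query dies out first.
    static Optional<String> locate(Peer origin, String queryId, String targetName, int maxHops) {
        Deque<Hop> inFlight = new ArrayDeque<>();
        origin.seenQueryIds.add(queryId);
        if (maxHops > 0) {
            for (Peer neighbor : origin.neighbors) {
                inFlight.add(new Hop(neighbor, maxHops - 1));
            }
        }
        while (!inFlight.isEmpty()) {
            Hop hop = inFlight.poll();
            Peer peer = hop.receiver;
            if (!peer.seenQueryIds.add(queryId)) {
                continue; // this peer already handled the query; drop the duplicate
            }
            if (peer.name.equals(targetName)) {
                return Optional.of(peer.address);
            }
            if (hop.hopsLeft > 0) {
                for (Peer neighbor : peer.neighbors) {
                    inFlight.add(new Hop(neighbor, hop.hopsLeft - 1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Peer a = new Peer("A", "hostA:7000");
        Peer b = new Peer("B", "hostB:7001");
        Peer c = new Peer("C", "hostC:7002");
        connect(a, b);
        connect(b, c);
        System.out.println("A looks for C: " + locate(a, "query-1", "C", 3));
    }
}
```

Peer A never learns the shape of the network. It knows only that one of its neighbors, or a neighbor's neighbor, produced an answer.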
Later Gnutella clients improved on this stored-peer approach by adding support for downloading a list of active peers from a central cache (see Resources for examples).

An interesting aspect of this model is that the peers comprising the network take an active role in supporting the peer discovery process. As we shall soon see, active peer participation is not a requirement.

The multicast model
The multicast model is like the network model except the nodes in the network don't necessarily assist with the discovery. Instead, this model takes advantage of features offered by the network itself to locate and identify peers and resources. Implementations of this technology (Project Jxta from Sun Microsystems being an excellent example; for more information on Jxta, see Resources) use IP Multicast to effect the lookup.
Unlike unicast IP datagrams, which are sent from one host to at most one other host, multicast IP datagrams can be sent to multiple hosts simultaneously. More importantly, the sender doesn't need to know how many receivers exist or whether any exist at all. The sending host simply packs up a message and releases it into the network. All clients that are tuned to the proper channel (a combination of special IP address and port number) will receive a copy of the message.

Discovery using IP Multicast technology works by having peers periodically announce their existence using multicast. The message contains the TCP/IP hostname and port number of the peer. Interested peers detect this message, extract the host name and port number, and use this information to establish a regular TCP/IP connection with the new peer.

This is how multicast works on a single subnet. Routing multicast traffic across the subnets that make up a network is a completely different and very complicated subject. It is also the principal limitation of IP Multicast-based discovery. Without router support, IP Multicast-based discovery is limited to peers on the same subnet. Unfortunately, the Internet is not multicast friendly. Discovery over the Internet (or a large intranet) is typically facilitated by special peers that span the network boundaries and copy messages across.
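
As a rough sketch of the announce-and-listen cycle, the code below uses the standard `java.net.MulticastSocket` class. The group address `230.0.0.1`, port `4446`, the `PEER host port` message format, and the class name `MulticastDiscovery` are all assumptions chosen for illustration; none of them comes from Jxta or from the sample application. The `announce` and `awaitAnnouncement` methods need a multicast-capable network to do anything useful, so `main` exercises only the message format.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

// Hypothetical multicast discovery helper; group, port, and format are assumptions.
final class MulticastDiscovery {
    static final String GROUP = "230.0.0.1"; // placeholder multicast group address
    static final int GROUP_PORT = 4446;      // placeholder port

    // Builds the announcement: the peer's TCP/IP host name and port number.
    static String encode(String host, int port) {
        return "PEER " + host + " " + port;
    }

    // Extracts the host name and port number from an announcement.
    static InetSocketAddress decode(String message) {
        String[] parts = message.trim().split(" ");
        if (parts.length != 3 || !parts[0].equals("PEER")) {
            throw new IllegalArgumentException("Not a peer announcement: " + message);
        }
        return InetSocketAddress.createUnresolved(parts[1], Integer.parseInt(parts[2]));
    }

    // Sends one announcement to the group; the sender need not know who is listening.
    static void announce(String host, int port) throws IOException {
        byte[] payload = encode(host, port).getBytes(StandardCharsets.UTF_8);
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.send(new DatagramPacket(
                payload, payload.length, InetAddress.getByName(GROUP), GROUP_PORT));
        }
    }

    // Joins the group and blocks until one announcement arrives.
    static InetSocketAddress awaitAnnouncement() throws IOException {
        InetSocketAddress group =
            new InetSocketAddress(InetAddress.getByName(GROUP), GROUP_PORT);
        try (MulticastSocket socket = new MulticastSocket(GROUP_PORT)) {
            socket.joinGroup(group, null); // null defers to the default interface
            byte[] buffer = new byte[512];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet);
            socket.leaveGroup(group, null);
            String message = new String(
                packet.getData(), packet.getOffset(), packet.getLength(),
                StandardCharsets.UTF_8);
            return decode(message);
        }
    }

    public static void main(String[] args) {
        String message = encode("peer-a.example", 7000);
        System.out.println("Would announce: " + message);
        System.out.println("Decoded: " + decode(message));
    }
}
```

A peer would typically call `announce` on a timer and run `awaitAnnouncement` in a loop on its own thread, opening an ordinary TCP connection to each address it decodes.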
Conclusion
Peer discovery for P2P applications is an interesting topic. Next month we'll examine an implementation of peer discovery that uses IP Multicast to locate active peers. The addition of this code to the simple P2P application will eliminate one of its biggest problems -- point-to-point configuration of peers.
Downloads
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample code peer A | j-p2pdiscpeera.zip | 45 KB | HTTP |
| Sample code peer B | j-p2pdiscpeerb.zip | 45 KB | HTTP |
| P2P application framework source code | j-p2pdiscsrc.zip | 39 KB | HTTP |
Resources
About the author
Todd Sundsted has been writing software since computers became available in desktop models. His interests include security, distributed computing, and the dynamics and emergent behavior arising from massively fine-grained systems. In addition to writing, Todd codes. Contact Todd at todd-p2p@etcee.com.