The aim of the first part of this paper is to provide an overview of information retrieval in Peer–to–Peer (P2P) information systems in the file–sharing domain. Starting with a general overview of the concept of P2P information systems, the paper then focuses on five desktop–accessible P2P information systems: Napster with its clones OpenNap and eDonkey, and Gnutella and FastTrack (i.e., Kazaa). A detailed description is given of the attributes and properties of each P2P file–sharing information system, followed by an evaluation of the respective P2P file–sharing applications, taking each in turn and examining their respective strengths and weaknesses. This paper concludes with a critical comparative analysis and gives some suggestions for further investigation.

Contents

Introduction: P2P historical context
What is a Peer–to–Peer information system?
Order out of chaos: The Peer–to–Peer paradigm
Critical analysis: Tree or rhizome?
Conclusion: Ethics or technics?

Introduction: P2P historical context

The term Peer–to–Peer (P2P) appeared in the world media around the year 2001 with a music sharing system called Napster (Napster, 2003). What attracted the attention of the media was not so much the technical innovation of the system itself, but rather the "moral" transgressions that it so easily allowed: most search queries on the Napster P2P network concerned material covered by copyright law.

However, P2P is, strictly speaking, nothing new, but is more of an evolution than a revolution. P2P systems, in some essential ways, revert back to the original design of the Internet itself, where one of the important features was that each Internet host could FTP (file transfer protocol) any other Internet host directly. In subsequent years, with the growth of the Web, the Internet has become more and more restricted to a client/server–type modality.

What is a Peer–to–Peer information system?

Contrary to popular opinion, a clear–cut definition of P2P is not so easy. However, from the available literature, three main fundamental concepts or principles emerge which distinguish P2P from other Internet–enabled networks:

On a fundamental level, P2P means just what it says: any individual computer unit or "peer" (or "node") is "equal" and therefore directly communicates without intermediaries. In the case of P2P file–sharing applications, all peers are "equal," in the sense that they play the same role; no peer has "special responsibility" to control or supervise the P2P network behaviour.

P2P is a "self organising" network involving singular units of users (peers or nodes) hooked up together in caterpillar fashion. In the case of file–sharing P2P, each node forms a chain of interconnected "applications," making it easy for users to search millions of individual machines for any file in a short time frame. The topology of file sharing P2P is a decentralised, dynamic resource network, composed of ad hoc, heterogeneous peers, who join and leave the system at will, without restriction.

All P2P involves sharing of resources in one form or another. In the case of P2P file sharing (FS), this involves sharing media files of all sorts. Each peer "pays" or contributes its participation by allowing access to its computing resources.

All P2P networks involve "resource retrieval," this being the best term as it involves resources in the widest sense of the term and not only the content. Although P2P file applications are designed primarily with the intent of sharing files, they also share their processing power. P2P file–sharing applications involve all resources simultaneously. For instance, in the Gnutella P2P network, merely leaving the P2P application "open" allows traffic to be routed ("query routers"). Thus, the term "resource retrieval" is also used to denote "space/bandwidth." Sharing of computer resources, in one form or another, by direct and decentralised exchange, is a property of P2P technology in general [ 1].

One of the main features of P2P file sharing systems is that they offer an alternative to traditional Web client/server systems (C/S). Every peer acts both as a client and a server, termed as "servant." The file sharing P2P "servant" acts as either a client (a consumer of information) or a server (a supplier of information).

From a superficial analysis, the above definitions or principles could hold — but part of the difficulty in defining P2P is that, in practice, P2P file–sharing networks are a hybrid mix of pure P2P with a range of different and, in some cases, subtle C/S networking behaviour.

As a theoretical ideal P2P offers decentralisation and democratic vision. However, in practice, as in real life, some "peers" are more equal than others, as the following P2P examples will show.

Order out of chaos: The Peer–to–Peer paradigm

OpenNap (OpenNap, 2003), eDonkey, (eDonkey, 2003), Gnutella (there is no main governing body) and FastTrack/Kazaa (Kazaa, 2003), are the current major players in the P2P file–sharing application field. Each in turn has developed and emphasised new features.

The P2P FS family can be divided up into two main architectural designs: "centralised" and "decentralised" resource retrieval systems. However, this division is based on the ideal of "equality," referred to above, and hence the practical operation of P2P file–sharing needs further elaboration.

P2P is, strictly speaking, nothing new, but is more of an evolution than a revolution.

Thus, to be accurate and up–to–date with P2P design, these two spheres need to be divided further on a gradient from the extreme pole of the centralised model, represented by Napster, to the other extreme of the decentralised model of the early Gnutella.

Even the most decentralised systems such as Gnutella, in its current form, have elements of "centralisation." Degrees of control or structure (i.e. centralisation), in one form or another, are also present in the other "so–called" decentralised systems such as Kazaa.

On the other hand, eDonkey and OpenNap attempt to put a touch of "decentralisation" into their monolithic, Napster–inspired architecture design.

The Tree: Napster, OpenNap and EDonkey

Napster

While Napster’s architecture is no longer in operation, it is essential to analyse its structure as it is the starting point from which all the current P2P FS systems developed. Subsequent systems were designed in reaction to the technical and legal problems faced by Napster [ 2].

Napster has an inverted tree structure or, taking a cue from French philosophers Deleuze and Guattari (1987), an "arborescent" architecture. Napster has a central point or trunk, connected by a series of nodes (or peers) forming the branches. In other words, it has a distinct embedded C/S network architecture. It is centralised in the sense that, while the file exchange is decentralised (i.e., pure P2P), there is a central gateway or index, upon which all the "servants" (peers) rely.

The central Napster servers maintain directories of the shared files stored on the respective PCs of the peers logged in to the network. These directories/indexes are updated every time a user logs on or off the server cluster. Each time a user of Napster submits a query (in keyword form) for a particular file, the central servers create a list of the files matching the search request, by cross–checking the request with the server farm of files belonging to the peers who are currently connected to the network. The central server displays that list to the peer making the request. The peer can then select the desired file from the list and open a direct download link with the individual computer which currently possesses that file. Whereas the Napster servers route the queries, the download of the actual file takes place directly from one peer to another.

In Napster P2P FS protocol, when a peer logs on, metadata of all the files on its local directory is uploaded to the server and added to the index. The metadata includes file names, byte rate, IP address, port, reported bandwidth, etc. The Napster server farm maintains an index of the metadata of the files of each peer. Similarly, when the "servant" logs off, all the shared directory information is removed from the index. At any point in time, only the shared content of the active peers is present in the index.

Basically, Napster is a centralised information retrieval P2P FS, in the sense that it acts as a search and retrieval intermediary gateway through which the peers are obliged to pass. Napster’s P2P FS system coordinates the search and retrieval function of the network, by providing direct contact between query and location by means of its centralised index.

OpenNap and the eDonkey P2P networks

The architecture of both OpenNap and eDonkey is virtually identical to that of Napster, except for a fundamental difference: the server function is decentralised. This means that both networks consist of numerous servers. This design is called a "chained" server architecture: all the servers are linked up together but are independent (i.e., not centralised in the sense of one ownership/location as in Napster’s server cluster).

Both OpenNap and eDonkey networks consist of distributed servers with separate metadata indexes, but the resource retrieval systems operate in the same way as Napster. In effect, individual "servants" have to connect to a few main active servers but the networks are decentralised, in the sense that there are many more independent servers (spread throughout the world) via which the peers search and retrieve. The independent servers, which make up the respective networks are, nevertheless, linked to each other (inter–server communication). Anyone with sufficient processing power can run an OpenNap or eDonkey server (i.e., independent server communities clustered together create sub–networks within their respective networks).

In this chained architecture, the servers form a linear chain that is used for search and retrieval. On login, the peer’s metadata is added to the local server’s index; the indexes of the other servers remain unaffected. When a query is submitted, initially only the local server is searched. However, if the local server cannot find a sufficient number of results, it will forward the query to a remote server along the chain (i.e., network). The remote server will return the matching results back to the local server, which will then forward the results to the peer. Every server has a separate index of files.

To summarise, when a search request (by keywords) is sent to a server, firstly only the specific server that the peer is connected to is searched (local search). If there is no match, the other servers forming the OpenNap or eDonkey networks are searched until matches are found. Both networks support simultaneous downloads from multiple sources.

The Rhizome: Gnutella and FastTrack/Kazaa

Gnutella

Gnutella’s P2P FS architecture [ 3] has many superficial similarities with Napster and its clones, in the direct download and upload system described above. However, its major architectural difference is its "claim" to achieve a fully decentralised topology. Gnutella does not have an inverted tree or arborescent structure but, rather, has a "rhizome" architecture (like bulbs and tubers), to take another analogy from Deleuze and Guattari [4]:

"Let us summarise the principal characteristics of a rhizome: unlike trees or their roots, the rhizome connects any point to any other point."

Every point or peer links to any other peer, forming the overlay of the Gnutella network itself, without a centralised gateway. Some of Gnutella’s main ideologues claim it is the only "pure" P2P FS in existence.

Since there are no centralised or decentralised servers of any kind in the Gnutella network, the search and retrieval system is managed solely by the individual peers, who interact amongst themselves and, thereby, form a "self–organising" network (using the Internet as a connecting base). The peers are connected in an ad hoc manner and there is no restriction to the number of peers composing the network. The Gnutella network is entered using a Gnutella compatible servant (i.e., applications such as Bearshare or Limewire) that connects to several other servants on the network. The peers interact with each other by means of messages or "data packs."

Gnutella’s protocol adopts what is called a "flooding" mechanism to search and retrieve files of all types. Gnutella’s flooding mechanism for resource retrieval is a simple way of broadcasting search queries in a dynamically changing network, without requiring the use of routing/server intermediaries.

In more detail, a typical Gnutella decentralised search and retrieval model can be described as follows:

Servant G1 sends a search request/query which "floods" the network. The query is passed to each servent/peer in the Gnutella network. Each servant operates as a router for the query, which "hops" along the interlinked peers. If one of the servants at any point in the peer network, for example, servant G23, has a file which matches the request, it transmits the file metadata (name, size, IP address, port, etc.) back through the active peers in the pathway towards G1, the originator of the query. Gnutella’s protocol incorporates a mechanism which memorises the originator of the query as the match (i.e., the metadata) is passed down the chain. Hence, each query is assigned a unique ID at its source. A list of files matching the search request appears on G1’s servent display. G1 is then able to open a direct connection with servant G23, from which it downloads the file (or from others who possess the same file, as the current version of Gnutella’s protocol supports multiple source downloads).

To avoid overloading the network with queries, the Gnutella protocol assigns to each query a "Time–To–Live" limitation (TTL), i.e., the number of peers it can hop or pass through.

In design, the Gnutella network is a distributed file–sharing system, approaching as closely as possible the democratic ideal of P2P, with each peer connected to the network being considered as "equal." However, as stated earlier, this "democratic" perspective is slightly marred, when one looks beyond the "ideology" at how Gnutella’s current search and retrieval system actually works in practice. Saroiu, et al. [ 5] note that:

"... even though these systems were designed with a symmetry of responsibilities in mind, there is clear evidence of client–like or server–like behaviour in a significant fraction of systems’ populations."

A form of hierarchy emerges, despite every servant being considered equal.

For instance, faster Internet connections (T1 and broadband) are more suited for "server functions" than others. This has the implication that peers with lesser processing power can have undesired effects on the IR network as a whole (e.g., by acting only as "clients"). As stated earlier, Gnutella’s P2P FS has no central router and this results in peers with slower processing power creating "choke points" on the network.

At the time of writing, most versions of Gnutella compatible applications (i.e., servants), do in fact recognise these practical implications and have taken steps to exploit such inequality to benefit Gnutella’s search and retrieval network. A more hierarchical structure has been encouraged to shape the network. This is known as the "ultra–peer" system, which organises the network into a hierarchic two–tier structure of peers functioning either as "client–peers" or "ultra–peers," as described by Rohrs and Singla (2002).

A form of hierarchy emerges, despite every servant being considered equal.

One of the main benefits of the ultra–peer system, for the search and retrieval function, is to reduce the burden on the network by preventing client–peers with slow connections from acting as servers, thus allowing search queries to be routed more efficiently.

With the earlier Gnutella network, "low server" peers slowed queries to a crawl. Such peers stopped "serving" as effective nodes for the Gnutella networks and, instead, damaged the Gnutella search and retrieval function, by acting as dead–ends, through which network queries could no longer flow. This had the outcome of fragmenting the Gnutella network into disconnected components, thereby creating an incomplete search and retrieval system, with the search horizon becoming more limited.

Gnutella’s P2P FS has only recently begun to homogenise its different applications to use the ultra–peer system. Hence the precise function has not yet been stabilised. Ultra–peers are based on the Kazaa "super–node" architecture which, from the start, has recognised the hierachical approach, considered by Saroiu, et al. [ 6] to produce a highly efficient P2P FS search and retrieval system:

"If there is a high degree of heterogeneity amongst the peers, a well–designed system should pay careful attention to delegating routing and content–serving responsibilities, favoring server–like peers."

FastTrack/Kazaa

The FastTrack (used by the Kazaa servant software) search and retrieval network functions like Gnutella but exploits more fully the ultra–peer infrastructure, by using what it terms "super–nodes." Kazaa uses a proprietary (encrypted protocol) and thus the precise nature of the protocol is not entirely known. However, the literature outlines the search and retrieval structure described below.

Kazaa allocates network resources by designating those participating peers with the fastest processing connections to act as super–nodes. Those peers who have sufficient bandwidth and processing power are automatically allocated the function of super–node and, thereby, assume an added server function. The remaining peers, i.e., the "client peers," who are not attributed this server–like function, connect to one of the super–nodes assigned to them by the network. Thus, queries are forwarded to, and handled by, only a subset of special peers: the super–nodes. The client peer transmits an index of its metadata to the super–node to which it is connected.

As such, with the FastTrack network, the search and routing activities are carried out through super–nodes, which has the effect of placing the heaviest load on those most able to bear it. In essence, Kazaa has a two–level hierarchy, in which "high capacity" nodes shield "low capacity" nodes. While the super–node maintains an index of files and is responsible for routing the search and retrieval traffic of all the peers connected to it, the actual file is downloaded from peer to peer (of either level).

A search request is first handled locally by the super–node to which the client–peer is connected (i.e., by searching all the locally connected peers). If satisfactory matches are not found, the super–node sends the search request to the other super–nodes it is connected to, and so forth, until matches are found.

In summary, the FastTrack network, with its super–node resource retrieval mechanism, can be considered as a combination of the search and retrieval characteristics of both Napster and Gnutella. Queries are forwarded to the server–like super–nodes, much like a Napster client sent queries to the Napster server. However, instead of the super–nodes being centralised servers, they group themselves loosely together in distributed clusters, which act in a similar way to Gnutella’s decentralised search and retrieval system.

Critical analysis: Tree or rhizome?

The arborescent and rhizomatic resource retrieval architectures of the P2P FS systems, described above, define the limits of their respective search and retrieval processes.

Taking each architectural design in turn, two main areas can be identified as having the most impact on their P2P file–sharing resource retrieval environments.

Centralised architecture: The tree (unchained and chained)

Search horizon

The principle advantage of centralised architecture for resource retrieval is that having a central index provides a stable and homogeneous platform for quick and efficient search and download. All users have to be registered in order to be on the server’s network. As a result, search requests reach all connected peers, which ensures that the search horizon is complete to the extent possible [ 7]. Ideally, the central directory enables the constant update of the index of the files of all connected peers. Basically, in a "highly structured" centralised system, the P2P FS network topology is known and thus the location of files can be precisely determined.
However, the chained architectures of OpenNap and eDonkey can have a negative impact on search and retrieval performance, which can be inefficient if many servers are involved in answering a single query.

It could be argued that a potential disadvantage of these centralised models is that they risk to provide out–of–date information or broken links, if the central server’s index is refreshed only periodically. P2P file–sharing networks, of any category, experience peer changes every second and, consequently, the index may not be completely renewed with fresh data at any point in time. By definition, this problem of scaling is non–existent in decentralised P2P FS.

Fault tolerance: Technical and legal

Arborescent structures are by definition hierarchical and, hence, have single points of failure. Such centralised systems have low fault tolerance which renders them technically vulnerable. In effect, the network could collapse if one, or several, of the servers were to be incapacitated.

In addition, centralised systems are easily subject to legal attacks. In effect, it was the centralised design of the Napster network which enabled the recording industry to take Napster to court for copyright infringement.

Decentralised architecture: The rhizome

Search horizon

The primary virtue of the decentralised rhizome P2P FS systems is the flexibility of their resource retrieval mechanism and the fact that the lack of a centralised server enables any resource to be made instantly available to the whole network.

Although the flooding system of decentralised networks theoretically allows for infinite network expandability (and instantaneous data availability), with the implication that search and retrieval is varied and extensive, in practice this process is hampered by two key factors:

i) In the case of Gnutella and, no doubt, Kazaa, the Time–to–Live (TTL) mechanism, which controls the number of peers a search request can reach (in order not to overload the network), affects the completeness of the search and retrieval operation.

In effect, the TTL imposes a horizon on each user. In practical terms this means that, even though a certain resource is available on the network, it may not be visible to the seeker because it is too many peers away, i.e., beyond the horizon.

ii) The search and retrieval mechanism of a decentralized network is slower and more erratic than a centralised one. Gnutella’s flooding mechanism, by propagating all the queries across the network, lacks any kind of "intelligent search" facility, which could improve efficiency and speed by homogenising the ad hoc topology of its decentralised nature (for instance, by identifying which peer or route is more favorable to match the search request).

However, Gnutella’s recent addition of the ultra–peer system, and Kazaa’s super–node framework, have gone a long way to remedying this endomorphic phenomenon.

Fault tolerance: Technical and legal

The main advantage of decentralised P2P FS systems is their rhizome structure which prevents substantial failure of the network. If any point of the network is damaged, the search and retrieval process can still flow through a multitude of alternative lines of peer connections [ 8]:

"Due to its distributed nature, a network of servants that implements the Gnutella protocol is highly fault–tolerant, as operation of the network will not be interrupted if a subset of servants goes offline."

However, while this is essentially true, the later Gnutella’s "ultra–peer" and Kazaa’s "super–node" structures introduce a degree of centralisation. This renders the resource retrieval systems of both networks more efficient but, by the same process, sacrifices the high fault tolerance that would be the case if their decentralised rhizomatic nature was allowed to reign supreme. For instance, a substantial failure or removal of the "ultra/super" servants would result in an irrevocable fragmentation of the network.

P2P FS networks are also communities and therefore depend on the presence of a sufficient resource, in the sense of communal participation and cooperation, in order to function successfully.

Another feature of decentralised P2P FS networks is that they are more resistant to legal attacks. It is difficult for these networks to be closed due to legal action, as is evident from the fact that, at the time of writing, both Gnutella and Kazaa are still in operation, unlike Napster. Gnutella, for instance, is only a protocol and has no central administration. Indeed, the proponents of Gnutella consider its resistance to lawsuits to be one of its best features.

The same appears to be true for Kazaa:

"... stopping Napster, which indexed songs on its servers, was easy ... and a judge pulled the plug. With Kazaa, users trade files through thousands of anonymous ‘supernodes.’ There is no plug to pull." [ 9]

This may be true but, yet again, if legal attacks were to be targeted at certain Kazaa super–nodes, for instance those which upload the greatest quantity of files, the network would be severely damaged [ 10].

This consideration leads to another dimension, which is more of an "ethical" nature than a technical one.

++++++++++

Conclusion: Ethics or technics?

In addition to the technical aspects already discussed, another important consideration, which applies to all P2P FS networks, is the "ethical" dimension, which is also essential for an efficient resource retrieval function. The behaviour of P2P FS individual users affects the function of the overall network.

The phenomenon of "free–riding" is analysed by Adar and Huberman (2000) in their study of Gnutella. The "unethical" practice of free–riding, by which users download files but do not share them, applies to all P2P FS networks. Other examples of "unethical" behaviour, such as disconnecting frequently or deliberately misreporting connection speed and other system data, are also prevalent amongst users and, consequently, have a negative impact on the technical efficiency of the P2P FS search and retrieval mechanism.

P2P FS networks are also communities and therefore depend on the presence of a sufficient resource, in the sense of communal participation and cooperation, in order to function successfully. A heightened awareness of this fact, by both designers and users, would in itself have a far more beneficial effect than adopting a purely technical approach.

Future P2P FS resource retrieval information systems might benefit from shaping their technology to develop built–in incentives, in order to create an environment wherein sharing, and not just taking, resources becomes the norm and the original concept of "peers" being truly equal is allowed to flourish. End of article

About the author

Bryn Loban, Ba/Ma in Philosophy and Msc in Information Management (Sheffield University U.K.), currently works as a freelance content Web editor.
E–mail: uria32@libero.it

Notes

1. For instance, the SETI@home P2P (Search for Extra Terrestrial Intelligence Organisation, at http://setiathome.ssl.berkeley.edu) uses the latent power of desktop PCs to help analyze data. The SETI project clearly demonstrates the potential of harnessing the processor time and unused disk space of volunteer peers, as does the recent Freenet P2P resource retrieval system, which is based on the anonymous sharing of disk space.

2. Napster’s proprietary endeavour was eventually forced to shut down by the record industry (RIIA), due to its P2P FS system allegedly favouring copyright infringement. At the time of writing, Napster is still not in operation, despite the presence of its Web domain.

3. Gnutella has an "Open Source" licence which means that its research and development is as decentralised as its IS architecture.

4. Deleuze and Guattari, 1987, p. 35.

5. Saroiu, et al., 2002, p. 15.

6. Saroiu, et al., 2002, p. 6.

7. In the case of unchained architecture, there is an apparent contradiction in the literature as to whether Napster’s search and retrieval system was truly global (i.e., a cluster of approximately 160 servers) or merely local (the individual server to which the peer is connected). In addition, the literature is unclear about the issue of the precise refresh rate of the indexes of both types of centralised networks.

8. Gnutella, 2001, l.

9. Woody, 2003, p. 11.

10. Kazaa with its FastTrack network, unlike Gnutella, is a commercial/proprietary endeavour. The ongoing legal attacks directed at Kazaa have not yet succeeded, due to the difficulty in determining whether other forms of centralised control exist, over and above its super–node structure.

References

E. Adar and B.A. Huberman, 2000. "Free riding on Gnutella," First Monday, volume 5, number 10 (October), at http://firstmonday.org/issues/issue5_10/adar/, accessed 23 October 2003.

G. Deleuze and F. Guattari, 1987. A thousand plateaus: Capitalism and schizophrenia. Translated by B. Massumi. Minneapolis: University of Minnesota Press.

eDonkey, 2003. eDonkey, at http://www.edonkey.com, accessed 24 October April 2003.

Gnutella, 2001. "Gnutella Protocol Specification," version 0.4, document revision 1.2, at http://www.limewire.com/developer/gnutella_protocol_0.4.pdf, accessed 20 October 2003.

Kazaa, 2003. Kazaa, at http://www.kazaa.com, accessed 20 October April 2003.

Napster, 2003. Napster, at http://www.napster.com, accessed 20 October April 2003.

OpenNap, 2003. OpenNap, at http://opennap.sourceforge.net, accessed 20 October April 2003.

C. Rohrs and A. Singla, 2002. "Ultrapeers: Another step towards Gnutella scalability," at http://www.limewire.com/developer/ultrapeers.html, accessed 8 October 2003.

S. Saroiu, P.K. Gummadi, and S. Gribble, 2002. "A measurement study of peer–to–peer file sharing systems," at http://www.cs.washington.edu/homes/gummadi/papers/mmcn.ps, accessed 22 October 2003.

T. Woody, 2003. "The race to kill Kazaa," Wired, volume 11, number 2 (February), at http://www.wired.com/wired/archive/11.02/kazaa.html, accessed 26 September 2004.

Editorial history

Paper received 24 January 2004; accepted 10 September 2004.

Between rhizomes and trees: P2P information systems by Bryn Loban
First Monday, volume 9, number 10 (October 2004),
URL: http://firstmonday.org/issues/issue9_10/loban/index.html