Gnutella
From Wikipedia, the free encyclopedia.
Gnutella (pronounced with a silent "g") is a distributed software project to create a true peer-to-peer file sharing network, without a central server.
History
The first client was developed by Justin Frankel and Tom Pepper of Nullsoft, a division of AOL, in early 2000. On March 14, the program was made available for download on Nullsoft's servers. The event was prematurely announced on Slashdot, and thousands downloaded the program that day. The source code was to be released later, supposedly under the GNU General Public License (GPL).
The next day, AOL stopped the availability of the program over legal concerns and restrained Nullsoft from doing any further work on the project. This did not stop Gnutella; after a few days the protocol had been reverse engineered, and compatible open source clones started showing up. This parallel development of different clients by different groups remains the modus operandi of Gnutella development today.
The Gnutella network would be a fully distributed alternative to semi-centralized systems like FastTrack (KaZaA) or centralized systems like Napster. Initial popularity of the network was spurred on by Napster's threatened legal demise in early 2001. This growing surge in popularity revealed the limits of the initial protocol's scalability. In early 2001, variations of the protocol (implemented first in closed source clients) allowed scalability to improve somewhat. Instead of treating every user as both client and server, some users were now treated as "ultrapeers", routing search requests and responses for users connected to them.
This allowed the network to grow in popularity. In late 2001, the Gnutella client LimeWire, which had driven much of the protocol's development, was released as open source. In February 2002, Morpheus, a commercial file sharing group, abandoned its FastTrack-based peer-to-peer software and released a new client based on the open source Gnutella client Gnucleus.
Sometimes the word "Gnutella" refers not to a particular project or particular piece of software, but to the open protocol used by various clients. Since new clients are under development in various locations, and since a new protocol is apparently on the way too, it is hard to say what the word "Gnutella" will mostly stand for in the future.
The name is a play on GNU and Nutella. Supposedly, Frankel and Pepper ate a lot of Nutella while working on the original project, and they were going to release the finished program under the GNU GPL. Gnutella is not associated with the GNU project; see GNUnet for the GNU project's equivalent.
How it works
To envision how Gnutella works, imagine a large circle of users (called nodes), each running Gnutella client software. On first use, the client software of a new node (call it A) must bootstrap by finding at least one other node. Different methods have been used for this, including a pre-existing list of possibly working node addresses shipped with the software, Gwebcache sites on the web, and IRC. Chances are at least one node (call it B) will work. Once connected, node B sends node A its own list of working nodes. Node A tries to connect to the nodes it was shipped with, as well as nodes it learns of from other nodes, until it reaches a certain quota, usually user-specifiable. It connects to only that many nodes, but keeps the addresses it has not yet tried and discards those that it tried but that did not work.
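The bootstrap procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code of any actual client: the function name `bootstrap`, the `try_connect` callback, and the default quota of 5 are assumptions made for the example.

```python
from collections import deque

def bootstrap(seed_addresses, try_connect, quota=5):
    """Connect to up to `quota` nodes, learning new addresses along the way.

    try_connect(addr) is a hypothetical callback: it returns the list of
    node addresses shared by `addr` on success, or None if `addr` fails.
    """
    untried = deque(seed_addresses)   # candidates not yet attempted
    seen = set(seed_addresses)        # every address ever queued
    connected = []
    while untried and len(connected) < quota:
        addr = untried.popleft()
        shared = try_connect(addr)
        if shared is None:
            continue                  # nodes that did not work are discarded
        connected.append(addr)
        for peer in shared:
            if peer not in seen:      # queue only addresses not seen before
                seen.add(peer)
                untried.append(peer)
    return connected, list(untried)   # untried addresses are kept as backups
```

The second return value corresponds to the addresses the client keeps but has not yet tried.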
When user A wants to search, A's client sends the request to each node it is actively connected to. If some of those connections no longer work, the client tries to connect to the nodes it has saved as backups. The number of actively connected nodes for user A is usually quite small (around 5), so each node then forwards the request to all the nodes it is connected to, and they in turn forward the request, and so on. In theory, the request will eventually find its way to every user on the Gnutella network.
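The forwarding scheme can be simulated on a small graph. The sketch below assumes the network is given as a dictionary mapping each node to its neighbours, and that every node handles a given request only once. The optional `max_hops` limit is an addition for illustration (real clients bound forwarding with a time-to-live counter); the function name is hypothetical.

```python
from collections import deque

def flood_query(neighbors, origin, max_hops=None):
    """Return {node: hops} for every node a flooded request reaches.

    neighbors maps each node to the nodes it is connected to.
    max_hops, if given, stops forwarding after that many hops.
    """
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        hops = reached[node]
        if max_hops is not None and hops >= max_hops:
            continue                      # hop limit reached: do not forward
        for peer in neighbors.get(node, ()):
            if peer not in reached:       # each node handles a request once
                reached[peer] = hops + 1
                queue.append(peer)
    return reached
```

Without a hop limit the request reaches every node connected to the origin, which is the "in theory" case described above; with a limit, distant nodes are never asked.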
If a search request turns up a result, the node that had the result contacts the searcher (whose IP address was included with the search request) directly. They negotiate the file transfer and the transfer proceeds. If more than one copy of the same file is found, the searcher can perform a "swarm" download, fetching different pieces of the file from different nodes at the same time. This results in increased download rates.
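As a rough illustration of swarming, the sketch below splits a file into one contiguous byte range per source. This is a simplification: the function name `plan_swarm` is hypothetical, and actual clients are more likely to hand out smaller chunks dynamically as sources finish or drop out.

```python
def plan_swarm(file_size, sources):
    """Split bytes 0..file_size-1 into one contiguous range per source.

    Returns a list of (source, first_byte, last_byte), ends inclusive.
    """
    if file_size <= 0 or not sources:
        raise ValueError("need a non-empty file and at least one source")
    base, extra = divmod(file_size, len(sources))
    plan, start = [], 0
    for i, source in enumerate(sources):
        length = base + (1 if i < extra else 0)   # spread the remainder
        if length == 0:
            continue                              # more sources than bytes
        plan.append((source, start, start + length - 1))
        start += length
    return plan
```

Each tuple could then be turned into a separate ranged request to the corresponding node, with the pieces reassembled in order once all have arrived.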
Finally, when user A disconnects, the client software saves both the list of nodes it was actively connected to and the list it was keeping as backups, for use the next time it connects.
In practice, searching on the Gnutella network is often slow and unreliable. Each node is run by a regular computer user, and such users are constantly connecting and disconnecting, so the network is never completely stable. Since individual users' connections are likely to be slow, it can take a very long time for a search request to traverse the entire network (which averages around 100,000 nodes at any time).
The real benefit of having Gnutella so decentralized is to make it very difficult to shut the network down. Unlike Napster, where the entire network relied on the central server, Gnutella cannot be shut down by shutting down any one node. As long as there are at least two users, Gnutella will continue to exist.
Protocol features and extensions
Gnutella operates on a query flooding protocol. The outdated Gnutella version 0.4 network protocol employs five different packet types:
- ping: discover hosts on network
- pong: reply to ping
- query: search for a file
- query hit: reply to query
- push: download request (for firewalled servents)
These are mainly concerned with searching the Gnutella network. File transfers are handled using HTTP.
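In the 0.4 specification, each of these packets begins with a 23-byte header: a 16-byte message identifier, a one-byte type code, a time-to-live counter, a hop counter, and a four-byte little-endian payload length. The sketch below packs and unpacks that header. The function names and the default TTL of 7 are assumptions made for the example, and the field layout should be checked against the specification before being relied on.

```python
import struct
import uuid

# Type codes for the five packet types in the Gnutella 0.4 specification.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte ID, type, TTL, hops, payload length; "<" means little-endian, no padding.
HEADER = struct.Struct("<16sBBBI")

def pack_header(kind, payload_len, ttl=7, hops=0, descriptor_id=None):
    """Build a 23-byte header; a random ID is generated if none is given."""
    if descriptor_id is None:
        descriptor_id = uuid.uuid4().bytes
    return HEADER.pack(descriptor_id, kind, ttl, hops, payload_len)

def unpack_header(data):
    """Decode the first 23 bytes of a message into a dict of header fields."""
    descriptor_id, kind, ttl, hops, payload_len = HEADER.unpack(data[:HEADER.size])
    return {"id": descriptor_id, "kind": kind, "ttl": ttl,
            "hops": hops, "payload_len": payload_len}
```

A node forwarding a packet would decrement the TTL and increment the hop count before passing it on, dropping the packet once the TTL reaches zero.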
The development of the Gnutella protocol is currently led by the GDF (Gnutella Developer Forum). Many protocol extensions have been and are being developed by the software vendors and free Gnutella developers of the GDF. These extensions include intelligent query routing, SHA-1 checksums, query hit transmission via UDP, querying via UDP, dynamic queries via TCP, file transfers via UDP, XML metadata, source exchange (a.k.a. "the download mesh"), and parallel downloading in slices (swarming).
There are efforts to finalize these protocol extensions in the Gnutella 0.6 specification at the Gnutella protocol development website. The Gnutella 0.4 standard is outdated, although it remains the latest finalized protocol specification, since all extensions exist only as proposals so far. In fact, it is difficult or impossible to connect today using the 0.4 handshake.
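The difference between the two handshakes can be illustrated with a small sketch. It assumes the connection strings given in the 0.4 specification and the 0.6 draft: a bare "GNUTELLA CONNECT/0.4" line answered by "GNUTELLA OK" in 0.4, and an HTTP-like exchange with headers and a status code in 0.6. The function names and the example header value are hypothetical.

```python
def connect_request(version="0.6", headers=None):
    """Return the bytes a connecting node sends first."""
    if version == "0.4":
        return b"GNUTELLA CONNECT/0.4\n\n"     # 0.4 carries no headers
    lines = ["GNUTELLA CONNECT/0.6"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def accepted(reply):
    """True if the first line of the peer's reply signals acceptance."""
    first = reply.split(b"\n", 1)[0].strip()
    if first == b"GNUTELLA OK":                # 0.4-style reply
        return True
    parts = first.split()                      # 0.6-style status line
    return (len(parts) >= 2
            and parts[0].startswith(b"GNUTELLA/")
            and parts[1] == b"200")
```

In the 0.6 exchange the connecting node would also send its own "200 OK" response after the peer's reply; that third step is left out of this sketch.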
The Gnutella protocol remains under development. In spite of attempts to make a clean break with the complexity inherited from the old Gnutella 0.4 and to design a clean new message architecture (see Gnutella2), it is still the most successful openly developed file-sharing protocol to date.
Clients
Some popular Gnutella clients are:
- LimeWire (Cross-Platform in Java), GPL open-source code;
- gtk-gnutella (Linux, Unix);
- Gnucleus (Windows), open-source core in C/C++;
- Acquisitionx (Mac OS X), based on LimeWire open-sourced core;
- BearShare (Windows), closed-source;
- Shareaza (Windows), GPL open-source, also connects to Gnutella2, EDonkey2000 / EMule, and BitTorrent;
- Poisoned (Mac OS X), open-source, also connects to FastTrack and OpenFT;
- Mutella (Linux, Unix), terminal-mode Gnutella client, see http://mutella.sourceforge.net;
- Phex (Java), open-source Gnutella client, see http://phex.sourceforge.net/;
- Qtella (GNU/Linux), see http://www.qtella.net/;
- Gnotella (discontinued in December 2001).
See also
- Freenet, which focuses on anonymization and distributed storage,
- servent,
- WASTE,
- Bitzi, an open content file catalog integrated with some Gnutella clients.