BitTorrent - A Protocol To Distribute Large Files In Large Networks


Introduction

BitTorrent is a peer-to-peer protocol that makes the distribution of large files fast, easy and efficient.

Traditionally, when we download a file online, we request it from a single server. This approach becomes inefficient when many clients want the same file or when the file is large, because:

  • The server's bandwidth is shared among all connected clients, so it can be exhausted as more clients join, slowing everyone's downloads.

  • The overall transfer speed is limited by the server's upload capacity.
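To make the upload bottleneck concrete, here is a small back-of-the-envelope model. It is an assumption for illustration, not part of BitTorrent itself: it computes the standard lower bounds on the time to deliver one file to n clients, first from a single server and then when every client also uploads what it has. The function names and the example numbers (1 GB file, 100 Mbit/s server, 10 Mbit/s per client upload) are hypothetical.

```python
def client_server_time(n: int, file_bits: float, server_up: float,
                       min_down: float) -> float:
    """Lower bound (seconds) for one server to deliver the file to n clients."""
    # The server alone must upload n full copies, and the slowest
    # client must still download one full copy.
    return max(n * file_bits / server_up, file_bits / min_down)


def p2p_time(n: int, file_bits: float, server_up: float,
             min_down: float, peer_ups: list[float]) -> float:
    """Lower bound (seconds) when every peer also uploads what it has."""
    # The server must upload at least one copy, the slowest peer must
    # download one copy, and the n copies are uploaded by the server
    # and all peers together.
    total_up = server_up + sum(peer_ups)
    return max(file_bits / server_up, file_bits / min_down,
               n * file_bits / total_up)


if __name__ == "__main__":
    n = 1000
    file_bits = 8e9          # a 1 GB file
    server_up = 100e6        # 100 Mbit/s server upload
    min_down = 50e6          # 50 Mbit/s for the slowest client's download
    peer_ups = [10e6] * n    # every client uploads at 10 Mbit/s
    print(f"client-server: {client_server_time(n, file_bits, server_up, min_down):.0f} s")
    print(f"peer-to-peer:  {p2p_time(n, file_bits, server_up, min_down, peer_ups):.0f} s")
```

With these numbers the single server needs 80000 s, while the peer-to-peer bound drops to about 792 s, because total upload capacity grows with every client that joins.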

Clarifying what a P2P protocol means

[Image: What are peer-to-peer (P2P) networks? (source: DCX Learn)]

In a peer-to-peer (P2P) network or protocol, every node has the same capabilities and can initiate communication with any other member of the network. This makes such networks highly available: if one node fails, other nodes can take over its work, so there is no single point of failure of the kind a single central server introduces.

In some P2P networks, a central node takes on a coordinating role (a hybrid peer-to-peer network). It provides information to the other nodes, while the peers are still treated equally and communicate directly with each other.

How BitTorrent Works

BitTorrent uses a hybrid peer-to-peer network with a central entity known as a tracker, which helps peers find the other nodes that are sharing the same file. The file data itself is exchanged directly between the peers of the P2P network.

BitTorrent speeds up downloads by splitting a large file into pieces and spreading the work of serving those pieces across the peers of the network. Because different peers hold different pieces, a client can fetch many pieces at the same time from different peers instead of waiting on a single server, which boosts concurrency.

To summarize, when a user wants to download a file, their client asks the tracker (the coordinating node of the P2P network) for a list of peers and then downloads pieces of the file directly from those peers. Fetching pieces concurrently from many peers improves performance through concurrency and better resource utilization. The tracker never handles the file data: each piece carries an index, and the client uses it to put the pieces back in the right order.

In a torrent network, a peer that is still downloading the file is called a leecher, and a peer that already has the complete file and only uploads it is called a seeder. The BitTorrent network is therefore made up of leechers and seeders, plus the tracker in the centre, which keeps track of the peers in the network and helps them find each other. Messages between peers, such as requests for pieces, go directly from one peer to another.

Architecture of BitTorrent

At the beginning, whoever wants to share a file creates a .torrent file for it and becomes the first seeder. During this process, the original file is cut into equal-sized pieces (commonly 256 KB to 1 MB; only the last piece may be smaller). After this, the SHA-1 hash of every piece is added to the .torrent file under the 'pieces' attribute. Peers request pieces from each other by index, and when a piece arrives, the client compares its SHA-1 with the matching entry in 'pieces' to verify that the data is correct.
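The piece splitting and hashing described above can be sketched in a few lines. This sketch assumes a fixed 256 KiB piece length; the helper names are hypothetical. It follows the layout of the 'pieces' attribute, which is the 20-byte SHA-1 digests of all pieces concatenated in piece order, and shows how a downloaded piece is checked against its entry.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # 256 KiB, one of the common piece sizes


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Cut the file into equal-sized pieces; only the last may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def build_pieces_field(pieces: list[bytes]) -> bytes:
    """Concatenate the 20-byte SHA-1 digest of every piece, in order."""
    return b"".join(hashlib.sha1(piece).digest() for piece in pieces)


def verify_piece(index: int, data: bytes, pieces_field: bytes) -> bool:
    """Check a downloaded piece against its 20-byte entry in 'pieces'."""
    expected = pieces_field[20 * index:20 * (index + 1)]
    return hashlib.sha1(data).digest() == expected


if __name__ == "__main__":
    data = b"x" * (600 * 1024)  # a 600 KiB dummy file
    pieces = split_into_pieces(data)
    pieces_field = build_pieces_field(pieces)
    print(len(pieces), len(pieces_field))               # 3 60
    print(verify_piece(1, pieces[1], pieces_field))     # True
    print(verify_piece(1, b"corrupted", pieces_field))  # False
```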

After a peer finishes downloading and verifying a piece, it sends a message to the peers it is connected to, announcing that it now has that piece. To find peers in the first place, the client reads the 'announce' field of the .torrent file, which holds the URL of the tracker to contact. Every torrent is identified by a unique info hash (the SHA-1 of the 'info' section of the .torrent file), and .torrent files themselves are typically downloaded from regular HTTP servers.
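.torrent files are written in bencoding, and the info hash is the SHA-1 of the bencoded 'info' dictionary. The sketch below is a minimal bencoder plus an info-hash helper; the function names, the tracker URL `http://tracker.example.com/announce` and the file details are hypothetical, and the 'pieces' value is a placeholder rather than real hashes.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder covering int, str/bytes, list and dict."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings written in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(torrent: dict) -> bytes:
    """SHA-1 of the bencoded 'info' dictionary: the torrent's identifier."""
    return hashlib.sha1(bencode(torrent["info"])).digest()


if __name__ == "__main__":
    torrent = {
        "announce": "http://tracker.example.com/announce",  # hypothetical tracker URL
        "info": {
            "name": "example.iso",
            "length": 600 * 1024,
            "piece length": 256 * 1024,
            "pieces": b"\x00" * 60,  # placeholder for three 20-byte SHA-1 digests
        },
    }
    print(torrent["announce"])
    print(info_hash(torrent).hex())
```

When reading an existing .torrent file, clients generally hash the original bytes of its 'info' section rather than re-encoding a parsed copy, so that any formatting quirks cannot change the result.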

The central entity of the BitTorrent network is called a "Tracker". It keeps track of the peers that hold the file and the peers that are downloading it, so it knows who the leechers and seeders in the network are, and it helps peers find other peers to download content from. It holds metadata about the peers (such as their addresses and how much each has uploaded, downloaded and still has left), but it does not store the file or know which individual pieces each peer holds.

The tracker is a simple HTTP server. After obtaining a .torrent file, a client first contacts the tracker and announces that it would like to join the network. The tracker replies with a list of peers (typically 50), which the client adds to its "peer set", the subset of peers it connects to and exchanges data with. Each peer then reports its state to the tracker periodically, typically every 30 minutes, telling it how much it has uploaded and downloaded and how much it still has left. These regular reports keep the tracker's view of the network current, so outdated details of peers that have left do not linger.
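The announce itself is a plain HTTP GET with query parameters. The sketch below builds that URL using the parameter names from the BitTorrent specification (`info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left`, `event`). The function name, the tracker URL and the `-XX0001-` peer-ID prefix are hypothetical, and the sketch only builds the URL; it does not send the request.

```python
import os
from typing import Optional
from urllib.parse import urlencode


def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes,
                       port: int, uploaded: int, downloaded: int, left: int,
                       event: Optional[str] = None) -> str:
    """Build the HTTP GET URL a client uses to announce itself to a tracker."""
    params = {
        "info_hash": info_hash,    # 20-byte SHA-1 of the torrent's info dict
        "peer_id": peer_id,        # 20-byte ID this client picked for itself
        "port": port,              # port this client listens on for peers
        "uploaded": uploaded,      # bytes uploaded so far
        "downloaded": downloaded,  # bytes downloaded so far
        "left": left,              # bytes still missing (0 means seeder)
    }
    if event is not None:          # "started", "completed" or "stopped"
        params["event"] = event
    # urlencode percent-encodes the raw bytes of info_hash and peer_id.
    separator = "&" if "?" in announce else "?"
    return announce + separator + urlencode(params)


if __name__ == "__main__":
    peer_id = b"-XX0001-" + os.urandom(12)  # hypothetical client prefix
    url = build_announce_url(
        "http://tracker.example.com/announce",  # hypothetical tracker URL
        info_hash=bytes(20), peer_id=peer_id, port=6881,
        uploaded=0, downloaded=0, left=600 * 1024, event="started",
    )
    print(url)
```

The tracker answers with a bencoded dictionary whose 'interval' key tells the client how many seconds to wait before announcing again, and whose 'peers' key lists peers to connect to.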

If the peer set falls below 20 peers, the peer contacts the tracker again to get an updated peer list. The peer set is capped at 80 peers: a peer initiates at most 40 of these connections itself, and the remaining 40 are reserved for connections initiated by other peers. Splitting the slots this way keeps room for connections from other peers and improves the interconnection between the peers of the network.
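The peer-set limits can be expressed as a little bookkeeping class. This is a sketch assuming a cap of 80 connections split into 40 that the peer initiates and 40 reserved for connections from other peers, with a re-announce once fewer than 20 peers remain; `PeerSet` and its methods are hypothetical names, and real clients track much more per connection.

```python
MAX_PEER_SET = 80
MAX_OUTGOING = 40                           # connections we initiate
MAX_INCOMING = MAX_PEER_SET - MAX_OUTGOING  # reserved for other peers
MIN_PEER_SET = 20                           # below this, re-contact the tracker


class PeerSet:
    """Hypothetical bookkeeping for a peer's current connections."""

    def __init__(self) -> None:
        self.outgoing: set[str] = set()
        self.incoming: set[str] = set()

    def __len__(self) -> int:
        return len(self.outgoing) + len(self.incoming)

    def _known(self, addr: str) -> bool:
        return addr in self.outgoing or addr in self.incoming

    def connect(self, addr: str) -> bool:
        """Connect to a peer from the tracker's list, if an outgoing slot is free."""
        if self._known(addr) or len(self.outgoing) >= MAX_OUTGOING:
            return False
        self.outgoing.add(addr)
        return True

    def accept(self, addr: str) -> bool:
        """Accept a connection started by another peer, if an incoming slot is free."""
        if self._known(addr) or len(self.incoming) >= MAX_INCOMING:
            return False
        self.incoming.add(addr)
        return True

    def drop(self, addr: str) -> None:
        self.outgoing.discard(addr)
        self.incoming.discard(addr)

    def needs_more_peers(self) -> bool:
        return len(self) < MIN_PEER_SET


if __name__ == "__main__":
    peers = PeerSet()
    tracker_list = [f"10.0.0.{i}:6881" for i in range(50)]  # 50 peers from the tracker
    for addr in tracker_list:
        peers.connect(addr)
    print(len(peers), peers.needs_more_peers())  # 40 False
    for addr in tracker_list[:25]:
        peers.drop(addr)
    print(len(peers), peers.needs_more_peers())  # 15 True
```

Only 40 of the 50 tracker-supplied peers get outgoing connections here, because the other half of the peer set is held back for peers that connect to us.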

The peers know which pieces each member of their peer set has through a simple form of "gossiping": when two peers connect, each sends a bitfield listing the pieces it already holds, and whenever a peer finishes a piece, it sends a "have" message to its neighbours. These messages are the main way peers keep each other's status up to date.
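Here is a sketch of how those two messages can be encoded and used to track a neighbour's pieces. It follows the peer wire message layout (a 4-byte big-endian length prefix, a 1-byte message ID of 4 for "have" and 5 for "bitfield", then the payload, with piece 0 stored in the high bit of the first bitfield byte); the function names are hypothetical, and the sketch skips the handshake and all other message types.

```python
import struct

HAVE_ID = 4      # message IDs from the BitTorrent peer wire protocol
BITFIELD_ID = 5


def encode_bitfield(have: set[int], num_pieces: int) -> bytes:
    """Piece 0 is the high bit of the first byte; spare bits stay zero."""
    bits = bytearray((num_pieces + 7) // 8)
    for index in have:
        bits[index // 8] |= 0x80 >> (index % 8)
    return bytes(bits)


def decode_bitfield(payload: bytes, num_pieces: int) -> set[int]:
    return {i for i in range(num_pieces) if payload[i // 8] & (0x80 >> (i % 8))}


def bitfield_message(have: set[int], num_pieces: int) -> bytes:
    """<length prefix><id = 5><bitfield bytes>."""
    payload = encode_bitfield(have, num_pieces)
    return struct.pack(">IB", 1 + len(payload), BITFIELD_ID) + payload


def have_message(index: int) -> bytes:
    """<length prefix = 5><id = 4><piece index>, all big-endian."""
    return struct.pack(">IBI", 5, HAVE_ID, index)


def parse_have(message: bytes) -> int:
    length, msg_id, index = struct.unpack(">IBI", message)
    if length != 5 or msg_id != HAVE_ID:
        raise ValueError("not a have message")
    return index


if __name__ == "__main__":
    # What we know about one neighbour: start from its bitfield, then apply "have"s.
    neighbour_pieces = decode_bitfield(encode_bitfield({0, 9}, 10), 10)
    neighbour_pieces.add(parse_have(have_message(3)))
    print(sorted(neighbour_pieces))  # [0, 3, 9]
```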

Applications of BitTorrent

  1. Downloading Linux distributions, often faster than over FTP / HTTP.

  2. Distributing patches, such as security patches, to users.

  3. Deploying artefacts across servers; some companies use it to roll out large-scale applications quickly.

In conclusion, BitTorrent, created by Bram Cohen to enable faster downloads of large files through a peer-to-peer system, has proved pivotal for many applications today, and its design is a genuinely innovative piece of engineering.

Thanks for reading this far. I hope this article gave you some insight into what goes on inside complex systems like BitTorrent and how they are designed. I haven't covered .torrent files in depth or the other algorithms BitTorrent uses; these are described on Wikipedia and other websites if you are interested.


Support Akash GSS's Blog by becoming a sponsor. Any amount is appreciated!