Don’t download the Hitman movie with BitTorrent when Apple will give it to you for free on iTunes. Just hit “Browse” in the Quick Links on the right side of the main iTunes home page. Then select “Movies” and “Thriller.” You’ll see the title “Hitman” listed for free, just 1.1 GB away.
You might not want to watch it, though: Rotten Tomatoes gave it a 15%.
In a three-part rant about peer-to-peer technologies (1, 2, 3), Mark Cuban demands that peer-to-peer technologies “die a quick death” in order to “speed up [his own] internet connection.” He suggests that “Google Video is a far better solution for audio and video distribution than any P2P solution” and that cable companies “charge for upstream bandwidth usage.”
Guess what: I already get charged for all the bandwidth I use, either up or down. When Verizon strings a fiber-optic cable to my home, I’m getting a fixed amount of capacity into the internet at large. If I want to trade a little upstream capacity for greater downstream capacity, that’s my call! Have you ever noticed that downloading over HTTP is typically slow because there are hundreds of clients and one host? If I download the same information over BitTorrent, I can sustain 12 Mbps because everyone is a server, including me. Distributed protocols, such as the ones powering Amazon Dynamo or BitTorrent, are more efficient, cost-effective, and fault-tolerant than single-server models.
Reactions around the blogosphere indicate that Mark Cuban’s thoughts on P2P are nonsensical rubbish. Mashable calls him “a guy who does not understand how P2P works, and yet he wants it shut down.” Ars Technica notes that “if users who are currently saturating their connections with BitTorrent start saturating their connections with Google Video content, the end result is more or less the same.” And a slashdotter comments, “Just imagine how fast the internet would be if there were no content to view. After P2Ps gone, get rid of all these freeloading websites, emails, etc. and it will be blisteringly fast.”
My guess is that billionaire Mark Cuban has a slow, shared cable internet connection at home, the modern equivalent of a party line. This might lead him to confuse his own slow internet connection with a greater systemic problem. What he should be complaining about is why Verizon hasn’t strung fiber in his area yet.
You’ve heard that private file sharing networks exist, but you’ve probably never had a chance to explore one from the inside. These networks of software, music, television, and movie pirates are often run on the internal network infrastructure of private educational institutions. Because a university network has a fixed set of IP addresses, college pirates can run DC++ and write simple scripts that only allow users from the internal IP pool, or even just the residential dormitory pool. This prevents unwanted interference (RIAA, MPAA, police) by simply making the network invisible to the outside world. Also, most university networks are lightly saturated high-speed Ethernet, giving student pirates the bandwidth to share large files.
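To make the "internal IP pool" idea concrete, here is a minimal sketch of that kind of check in Python. It is not the script any real hub used: the function name `is_internal` and the two address ranges are hypothetical (the ranges are reserved documentation blocks standing in for a campus subnet), and it only shows the membership test, not how a hub would hook it into its login flow.

```python
import ipaddress

# Hypothetical stand-ins for a campus address pool. These are reserved
# documentation ranges, not any real university's subnets.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # pretend dormitory subnet
    ipaddress.ip_network("198.51.100.0/24"),  # pretend wider campus subnet
]


def is_internal(address: str) -> bool:
    """Return True if the address falls inside one of the allowed networks."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright.
        return False
    return any(ip in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("192.0.2.17", "203.0.113.5"):
        print(candidate, "allowed" if is_internal(candidate) else "rejected")
```

Restricting the list to a single dormitory subnet would be a matter of dropping the second entry.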
While I attended Cornell University, students there ran a large DC++ hub to share files. There were anywhere between 1,000 and 2,000 users of the DC++ hub, which provided access to terabytes of shared files. Before I left the University to work, I transferred a complete set of users’ file lists to my home computer for later analysis. With 1215 XML file lists from DC++, I wrote a few Perl scripts to calculate metrics on the 600 MB data set.
Interestingly, the DC++ hub appears to still be around at its old redirect address thchub.no-ip.com:3307. Apparently a student r253141224 is hosting the service on his dorm computer 126.96.36.199.
Data From 20,000 Feet
From the file lists I have, there were 2,456,462 unique files out of 5,424,446 total files, and 19.07 terabytes of unique data out of 75.55 terabytes in total. Here’s a histogram and data listing of the most popular file types:
mp3 1857432
jpg 828815
m4a 312173
png 264820
gif 224034
avi 203304
dll 133889
wma 116851
htm 82130
zip 79114
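For readers who want to reproduce this kind of tally, here is a rough Python sketch. The original analysis was done with Perl scripts that are not shown here, so this is a stand-in, not the author's code. It assumes each file list is an XML document with nested `Directory` elements containing `File` elements that carry `Name`, `Size`, and `TTH` attributes, and it treats two entries as the same file when their TTH hashes match (falling back to lowercase name plus size). How uniqueness was actually defined for the numbers above is not stated, so that rule is an assumption, as is the function name `summarize`.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter


def summarize(file_lists):
    """Tally totals, unique files, and extension counts over XML file lists.

    file_lists is an iterable of XML strings, one per user.
    """
    total_files = 0
    total_bytes = 0
    unique_sizes = {}        # dedup key -> size in bytes
    ext_counts = Counter()   # lowercase extension -> number of entries

    for xml_text in file_lists:
        root = ET.fromstring(xml_text)
        # iter() walks nested Directory elements, so depth does not matter.
        for entry in root.iter("File"):
            name = entry.get("Name", "")
            size = int(entry.get("Size", "0"))
            # Assumed uniqueness rule: same hash, else same name and size.
            key = entry.get("TTH") or (name.lower(), size)

            total_files += 1
            total_bytes += size
            unique_sizes.setdefault(key, size)

            extension = os.path.splitext(name)[1][1:].lower()
            ext_counts[extension] += 1

    return {
        "total_files": total_files,
        "unique_files": len(unique_sizes),
        "total_bytes": total_bytes,
        "unique_bytes": sum(unique_sizes.values()),
        "ext_counts": ext_counts,
    }
```

Calling `summarize(lists)["ext_counts"].most_common(10)` would give a top-ten listing in the same shape as the one above.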
The file types follow a classic long-tail distribution. The file lists also let us query the data in more interesting ways. For example, for avi movie files, what were the most popular file names? Here’s the top 20:
crash.avi 90
pulp fiction.avi 76
garden state.avi 74
office space.avi 74
good will hunting.avi 72
wedding crashers.avi 67
sin city.avi 66
lost - 2x05 - ...and found.avi 65
super troopers.avi 63
zoolander.avi 60
robin hood - men in tights.avi 59
lost - 2x09 - what kate did.avi 58
eternal sunshine of the spotless mind.avi 57
lost - 2x04 - everybody hates hugo.avi 57
memento.avi 57
american beauty.avi 55
batman begins.avi 55
mean girls.avi 55
lost - 2x07 - the other 48 days.avi 54
old school.avi 54
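A query like this one can be sketched in a few lines of Python. As before, this is an illustration under assumptions, not the original Perl: it expects XML file lists with `File` elements carrying a `Name` attribute, it lowercases names before comparing them, and it counts each name at most once per user, so the number is "how many users share a file with this name". Whether the counts above were computed per user or per entry is not stated. The function name `top_names` is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def top_names(file_lists, extension, n=20):
    """Return the n most common file names with the given extension.

    file_lists is an iterable of XML strings, one per user. Each name is
    counted once per list, so the count is the number of lists sharing it.
    """
    suffix = "." + extension.lower()
    counts = Counter()

    for xml_text in file_lists:
        root = ET.fromstring(xml_text)
        # Collect names into a set so duplicates within one list count once.
        names_in_list = set()
        for entry in root.iter("File"):
            name = entry.get("Name", "").lower()
            if name.endswith(suffix):
                names_in_list.add(name)
        counts.update(names_in_list)

    return counts.most_common(n)
```

Calling `top_names(lists, "avi")` on the full set of file lists would produce pairs of name and count in the same shape as the listing above.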
We can take advantage of common patterns in the data to try to find other patterns, but I’ll save that for another day, and another post in what will undoubtedly become a series.