BitTorrent Protocol (scraping)


ngiacomelli

Member #5,114

October 2004

I've got what I hope is a simple question about the BitTorrent protocol. I'm trying to 'scrape' the details of a torrent, but I'm unsure how I should provide my info_hash.

As an example, say I have the following info_hash (pulled randomly from the web):
F03B4D83B26CA57E543702FF81A7D491611F8114

I've noticed that Azureus will use that info_hash in a GET request to the scrape script like so:
%F0%3BM%83%B2l%A5%7ET7%02%FF%81%A7%D4%91a%1F%81%14

This may be a silly question, but what conversion is taking place here? I have found this, but it hasn't been much help to me in this matter.

ReyBrujo

Moderator

January 2001

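The hash is a 20-byte string, usually written as 40 hex digits. When contacting the tracker with the GET request, the client turns each pair of hex digits into one byte. Bytes that are safe in a URL, such as letters and digits, are sent as those characters; everything else is percent-encoded as %XX. So F0 is not a printable character and turns into %F0, and 3B (;) is printable but reserved in URLs, so it becomes %3B. However, 4D is M, so it is sent as-is.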
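A minimal PHP sketch of that conversion, assuming the hash arrives as 40 hex digits. The function name `encode_info_hash` is made up for illustration. It leaves only ASCII letters and digits unescaped, which reproduces the Azureus string above byte for byte. PHP's built-in `rawurlencode(hex2bin($hex))` yields the same bytes but leaves `~` unescaped; a tracker should decode both to the same 20-byte hash.

```php
<?php
// Turn a 40-digit hex info_hash into the percent-encoded form that goes
// into the tracker query string (the function name is hypothetical).
function encode_info_hash(string $hex): string
{
    $raw = hex2bin($hex);  // every two hex digits become one byte
    if ($raw === false || strlen($raw) !== 20) {
        throw new InvalidArgumentException('info_hash must be 40 hex digits');
    }

    $out = '';
    for ($i = 0; $i < 20; $i++) {
        $c = ord($raw[$i]);
        $isAlnum = ($c >= 0x30 && $c <= 0x39)   // 0-9
                || ($c >= 0x41 && $c <= 0x5A)   // A-Z
                || ($c >= 0x61 && $c <= 0x7A);  // a-z
        // Letters and digits are sent as-is; every other byte becomes %XX.
        $out .= $isAlnum ? $raw[$i] : sprintf('%%%02X', $c);
    }
    return $out;
}

echo encode_info_hash('F03B4D83B26CA57E543702FF81A7D491611F8114'), "\n";
// prints %F0%3BM%83%B2l%A5%7ET7%02%FF%81%A7%D4%91a%1F%81%14
```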

--
RB
Mitsuko: "I just... wanted to be on the side that takes, that's all."
Mitsuko's last words, Battle Royale

Thomas Fjellstrom

Member #476

June 2000

The interesting thing here is that some trackers accept the hash in its hex (ASCII) form, like the one you found, some want the binary form (URL-encoded, like the Azureus one), and some accept both. I had a lot of fun with that...
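When trackers differ like that, one hedged approach is to try the standard binary form first and fall back to the hex text. The sketch below assumes the scrape URL carries no query string of its own and takes the fetch function as a parameter (for example `file_get_contents`), so the logic can be exercised without a network. It treats a reply containing the bencoded `files` key, which a successful scrape reply carries, as success; that substring test is a rough heuristic. The names `scrape_with_fallback` and `tracker.example.com` are hypothetical.

```php
<?php
// Try the standard info_hash form (20 raw bytes, percent-encoded) and,
// if the tracker rejects it, the 40-digit hex text. Names are hypothetical.
function scrape_with_fallback(string $scrapeUrl, string $hex, callable $fetch): ?string
{
    $raw = hex2bin($hex);
    if ($raw === false || strlen($raw) !== 20) {
        throw new InvalidArgumentException('info_hash must be 40 hex digits');
    }

    foreach ([rawurlencode($raw), $hex] as $value) {
        $body = $fetch($scrapeUrl . '?info_hash=' . $value);
        // A successful scrape reply is a bencoded dictionary with a "files"
        // key; this substring check is only a rough heuristic.
        if (is_string($body) && strpos($body, '5:files') !== false) {
            return $body;
        }
    }
    return null;  // neither form was accepted
}

// Usage with a fake tracker that only understands the hex form:
$tried = [];
$fakeFetch = function (string $url) use (&$tried) {
    $tried[] = $url;
    return strpos($url, 'f03b4d83') !== false
        ? 'd5:filesdee'
        : 'd14:failure reason15:invalid requeste';
};
$result = scrape_with_fallback('http://tracker.example.com/scrape',
    'f03b4d83b26ca57e543702ff81a7d491611f8114', $fakeFetch);
echo $result, "\n";  // d5:filesdee
```

With a real tracker you could pass `'file_get_contents'` as `$fetch`; it returns false when the request fails, which the `is_string` check treats as a rejection.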

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

ngiacomelli

Member #5,114

October 2004

Thanks for the help; I feel I'm getting close. My PHP script now returns the following info:

Original hash: 70b2b50b8bb7ad3f9ad0415de13b6522bbe30cdb
Hex hash:+P¸»zÓù Þ¶R+¾0Í
Scrape address: http://tracker.prq.to/scrape?info_hash=%07%0B%2BP%B8%BBz%D3%F9%AD%04%15%DE%13%B6R%2B%BE0%CD%0B

'Scrape address' is the GET request URL. Yet no matter what I do (whether I use the original hex hash or the URL-encoded version), I still get an Invalid Request!

I've tried it on a number of trackers, too. I'm just using fopen to fetch the file, but I have also tried cURL. Nothing seems to work. Should I be sending any special headers with my request?
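Decoding the info_hash in that scrape address back to hex points at a likely cause: it is 21 bytes long, not 20, and reads 07 0b 2b 50 ... 30 cd 0b, which is the original hash with an extra 0 in front and its last digit padded to 0b. The hex-to-binary step seems to be off by one hex digit, so the tracker receives a malformed hash it can reasonably reject. A small PHP check using only the two strings quoted above, with `hex2bin()` (PHP 5.4+) shown as one way to do the conversion correctly:

```php
<?php
// Compare what was sent with the hash it was supposed to be.
$original = '70b2b50b8bb7ad3f9ad0415de13b6522bbe30cdb';
$sent     = '%07%0B%2BP%B8%BBz%D3%F9%AD%04%15%DE%13%B6R%2B%BE0%CD%0B';

$bytes = rawurldecode($sent);   // undo the %XX escapes
echo strlen($bytes), "\n";      // 21, one byte too many
echo bin2hex($bytes), "\n";     // 070b2b50b8bb7ad3f9ad0415de13b6522bbe30cd0b

// hex2bin() gives exactly 20 bytes that convert back to the original.
$correct = hex2bin($original);
echo strlen($correct), "\n";                                           // 20
echo bin2hex($correct) === $original ? "round-trips\n" : "mismatch\n";  // round-trips
echo rawurlencode($correct), "\n";  // the value to put after info_hash=
```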

Vanneto

Member #8,643

May 2007

Try this:

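// $port is the tracker's port number (an integer, e.g. 80 or 6969)
// and $hash the already URL-encoded info_hash
$link = fsockopen("tracker.prq.to", $port, $errno, $errstr, 10);
if(!$link) die("Error on connection: $errstr ($errno)");

// The request path needs its leading slash; HTTP/1.0 keeps the
// reply from being sent with chunked transfer encoding
fwrite($link, "GET /scrape?info_hash=$hash HTTP/1.0\r\n");
fwrite($link, "Host: tracker.prq.to\r\n");
fwrite($link, "Connection: close\r\n\r\n");

// Print the raw reply: status line, headers, then the bencoded body
while(!feof($link))
{
    echo fgets($link, 128);
}
fclose($link);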

In capitalist America bank robs you.

ReyBrujo

Moderator

January 2001

You may also need to spoof the User-Agent header.
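Sending a User-Agent is one more header line in the raw request. A minimal sketch that only builds the request text, assuming the tracker's scrape path is /scrape (as for the trackers in this thread). The function name, host, and User-Agent value are all hypothetical; which client strings a particular tracker accepts is not something this code can know.

```php
<?php
// Build a raw HTTP/1.0 scrape request, optionally with a User-Agent header.
// $infoHashRaw is the 20-byte binary hash (not the 40-digit hex text).
function build_scrape_request(string $host, string $infoHashRaw, ?string $userAgent = null): string
{
    $lines = [
        'GET /scrape?info_hash=' . rawurlencode($infoHashRaw) . ' HTTP/1.0',
        'Host: ' . $host,
    ];
    if ($userAgent !== null) {
        $lines[] = 'User-Agent: ' . $userAgent;
    }
    $lines[] = 'Connection: close';
    // The header block ends with an empty line (CRLF CRLF).
    return implode("\r\n", $lines) . "\r\n\r\n";
}

$request = build_scrape_request(
    'tracker.example.com',                                  // hypothetical host
    hex2bin('F03B4D83B26CA57E543702FF81A7D491611F8114'),
    'ExampleClient/0.1'                                     // hypothetical User-Agent
);
echo $request;
```

The returned string can be written to an fsockopen() handle with a single fwrite().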


ngiacomelli

Member #5,114

October 2004

Thanks to everyone for being so helpful so far. Sadly, I'm not quite there (puts on dunce cap). Using fsockopen always seems to result in a timeout.

Here's the info my PHP script spits out for debugging:

Original hash: 0fca732e5b16bfecf50213973d22d0d273cd30e6
Hex hash:ü§2å±kþÏP!9sÒ- '<Ó // = $newHash
torrent scrape URL: http://denis.stalker.h3q.com:6969/scrape
cut torrent URL:denis.stalker.h3q.com // = $newScrapeURL
scrape PORT:6969 // = $port

// And the GET request:
GET scrape?info_hash=%00%FC%A72%E5%B1k%FE%CFP%219s%D2-%0D%27%3C%D3%0E%06 HTTP/1.1 

And here's a quick snippet of the suggested code (modified slightly). I'm not sure exactly what I should set the User-Agent to (I've heard some scrape scripts refuse requests from browsers).

$link = fsockopen($newScrapeURL, $port);
if(!$link) die("Error on connection!");

fwrite($link, "GET scrape?info_hash=".urlencode($newHash)." HTTP/1.1\r\n");
//fwrite($link, "User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021204\r\n");
fwrite($link, "Host: ".$newScrapeURL."\r\n");
fwrite($link, "Connection: close\r\n\r\n");

while(!feof($link))
{
    echo fgets($link, 128);
}
fclose($link);

Any suggestions?
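Two things in the GET line above stand out: the path lacks its leading slash (`/scrape`, not `scrape`), and the info_hash decodes to 21 bytes (00 fc a7 32 ...), that is, a 0 followed by the original 0fca732e... hash, rather than the 20 bytes that `hex2bin()` would produce. A hypothetical PHP helper, `check_scrape_request_line`, that flags both before anything is sent. It only inspects the request line, so it cannot say why a connection times out.

```php
<?php
// Sanity-check a scrape request line: the path must start with "/" and
// info_hash must decode to exactly 20 bytes. (Hypothetical helper.)
function check_scrape_request_line(string $line): array
{
    $parts = explode(' ', trim($line));
    if (count($parts) !== 3 || $parts[0] !== 'GET') {
        return ['not a "GET <path> HTTP/x.y" request line'];
    }

    $problems = [];
    $target = $parts[1];
    if ($target === '' || $target[0] !== '/') {
        $problems[] = 'request path does not start with "/"';
    }

    // Pull info_hash out by hand; rawurldecode() leaves '+' alone,
    // so binary bytes are not turned into spaces.
    $qpos = strpos($target, '?');
    $query = $qpos === false ? '' : substr($target, $qpos + 1);
    $hash = null;
    foreach (explode('&', $query) as $pair) {
        if (strncmp($pair, 'info_hash=', 10) === 0) {
            $hash = rawurldecode(substr($pair, 10));
        }
    }

    if ($hash === null) {
        $problems[] = 'no info_hash parameter';
    } elseif (strlen($hash) !== 20) {
        $problems[] = 'info_hash decodes to ' . strlen($hash) . ' bytes, expected 20';
    }
    return $problems;
}

$line = 'GET scrape?info_hash=%00%FC%A72%E5%B1k%FE%CFP%219s%D2-%0D%27%3C%D3%0E%06 HTTP/1.1';
foreach (check_scrape_request_line($line) as $problem) {
    echo $problem, "\n";
}
```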