Surfing As GoogleBot – Their IP, Their User-Agent, Their Bot Characteristics
After reading this article and this article, which give frustratingly oversimplified accounts of user-agent spoofing as a way past cloaked websites, I figured I should write something on how to REALLY behave like Google. Cloaking often goes well beyond the user agent: sites combine IP delivery, user-agent cloaking, JavaScript and cookie detection, and referer detection, any of which can be used to determine that you are you and not a bot.
So, how do you beat all 5 major types of cloaking?
1. Beat IP Delivery: Use Google Translate as a proxy, translating from Spanish->English even though the site is already in English.
2. Beat User-Agent Cloaking: Use the Firefox User-Agent Switcher to spoof as GoogleBot.
3. Beat JavaScript Detection: Use the Firefox Web Developer Toolbar to turn off JavaScript.
4. Beat Cookie Detection: Use the Firefox Web Developer Toolbar to turn off cookies.
5. Beat Referer Detection: Use the Firefox RefControl Extension to prevent the referer from being sent.
Using these in conjunction can be extremely effective, even at pay-for-information sites.
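For anyone who would rather script this than click through toolbars, here is a minimal Python sketch of the request-level parts of the list above. It is only an illustration, not what the extensions do internally. The helper names are made up for this example, and the Google Translate URL format (`sl`, `tl`, `u` parameters) is an assumption that may have changed. Note that a header sent to the Translate URL goes to Google, so the spoofed User-Agent only reaches the target site on a direct request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Googlebot's published desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def translate_proxy_url(target_url, source_lang="es", target_lang="en"):
    """Build a Google Translate URL that fetches target_url on your behalf.

    Hypothetical helper: the endpoint and parameter names are assumptions.
    """
    query = urlencode({"sl": source_lang, "tl": target_lang, "u": target_url})
    return "http://translate.google.com/translate?" + query


def build_bot_like_request(target_url):
    """Build a direct request that looks more like a crawler than a browser.

    Only the User-Agent header is set. No Referer or Cookie header is added,
    and a plain urllib Request does not keep a cookie jar, which mirrors
    steps 2, 4 and 5. Nothing here runs JavaScript either (step 3).
    """
    return Request(target_url, headers={"User-Agent": GOOGLEBOT_UA})


if __name__ == "__main__":
    req = build_bot_like_request("http://example.com/")
    print(req.get_header("User-agent"))
    print(translate_proxy_url("http://example.com/"))
```

Passing the request to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can be read and run without touching any site.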
Doing this may be against the terms of service of the site you are visiting. There are plenty of popular sites out there that cloak content which is normally only available to paying members. While these techniques work on those sites too, be careful.
Good browsing!
21 Comments
Beat Referer Detection: Use the Firefox RefControl Extension to prevent referer from being sent… Didn’t know about that extension! Thanks for the heads up.
More obfuscation is ideal, but it’s handjob reviews that get you busted.
You don’t need RefControl because Web Developer has an option to disable Referers from being sent.
You also don’t need to install any extension to disable referers. Just go to about:config and type “referer” in the box. Change the value to 0.
Also, you can disable cookies and javascript without using extensions. Check your firefox preferences.
Google Translate no longer hides your IP.
Author Comment: Not true… I just ran a check myself to see what IP shows up in the REMOTE_ADDR field of PHP’s $_SERVER variables when a page is accessed through Google Translate. It showed the Google IP. Google Translate does send an extra X-Forwarded-For header containing your IP address (PHP exposes it as HTTP_X_FORWARDED_FOR), but the vast majority of sites do not check for it.
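To make the REMOTE_ADDR versus X-Forwarded-For distinction concrete, here is a rough Python sketch of the check a cloaking site would have to add to catch this. It assumes a CGI/WSGI-style `environ` dictionary, which uses the same key names as PHP’s $_SERVER; the function name and the sample addresses are made up for illustration.

```python
def apparent_and_forwarded_ip(environ):
    """Return (connecting_ip, forwarded_client_ip_or_None).

    REMOTE_ADDR is the address that opened the connection (the proxy).
    X-Forwarded-For, if present, may hold a comma-separated chain whose
    first entry is conventionally the original client.
    """
    remote = environ.get("REMOTE_ADDR")
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    original = forwarded.split(",")[0].strip() or None
    return remote, original


if __name__ == "__main__":
    # Documentation-range addresses, not real Google or visitor IPs.
    sample = {
        "REMOTE_ADDR": "192.0.2.10",
        "HTTP_X_FORWARDED_FOR": "198.51.100.7, 192.0.2.10",
    }
    print(apparent_and_forwarded_ip(sample))
```

A site that only looks at the first value sees the proxy; one that also looks at the second sees you. The header is set by whoever sends the request, so it can also be forged.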
The Web Developer toolbar can block referers as well.
Thanks for the expanded explanation, the other articles *were* over-simplified.
Omg, do you really need all these extensions?
Using Opera I can open a menu* and easily disable cookies, javascript, referer etc.
Changing user agent.. right now I don’t know, but I’m sure you can customize it somewhere, like almost everything else in Opera.
* press F12 or go to Tools > Quick Preferences
Yes, you can do the same in Firefox. But you need to open the menu to do it.
Alt-O (Windows) will open Options.
Under Content, Javascript can be disabled.
Cookies (for all sites or a specific site) can be disabled under Privacy.
Um… OK. I thought you needed an extension for referer, but you can just go to about:config and change the value of network.http.sendRefererHeader. It’s 2 by default. I don’t know what values mean what, but I’m sure I could figure it out.
http://www.spreadfirefox.com/node/5841 says 0 will turn it off.
about:config also has a bunch of general.useragent settings, but I’m not sure which one does what. For useragent, take a look here:
http://johnbokma.com/mexit/2004/04/24/changinguseragent.html
Yes, Firefox has it all built in.
But Firefox was /designed/ to be extended. It was /designed/ to have minimal extra features so people could choose exactly what features it has and what it doesn’t.
Here are few more tricks to access blocked sites at your office/school/university. Access Blocked Sites
Use the Google Web Accelerator for a much more usable method of beating IP delivery than Google Translate.
WebDev toolbar can do it all. Or you can just view a site’s cache.
You can achieve all of the 5 mentioned steps with a single extension: prefbar. It adds a toolbar that makes enabling/disabling cookies, javascript, referrer as simple as clicking a checkbox. User agent and proxy selections are handled through drop-down boxes. There are also dozens of other included customizable selectors, and you can make your own (I made one to disable redirects – another useful googlebot spoof).
http://prefbar.mozdev.org/
*sigh*
1. Beat IP Delivery: Use Google Translate as a Proxy, translating from spanish->english even though the site is already in English.
OR…just use a proxy. Any proxy.
Author Comment: Actually, the whole point is to use Google’s IP, so that if someone is cloaking using IP delivery, they will still assume you are Google. Other proxies would not fall into the same c-block range as Google’s, making you less likely to succeed.
3. Beat Javascript Detection: Use the Firefox Web Developer Toolbar to turn off javascript.
OR…simply block JS through FF settings, no need to download WD.
4. Beat Cookie Detection: Use the Firefox Web Developer Toolbar to turn off cookies.
OR….just use FF to turn off cookies.
Have you guys ever looked through Tools–>Options–>Content and Privacy?
#2 and #5: OK.
#1, 3, 4: Why not show them the easiest way first…
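On the “any proxy” question raised in this comment, the author’s c-block argument can be sketched in a few lines of Python. This is only a guess at how an IP-delivery script might compare addresses, treating a c-block as a /24 network; the function name and sample addresses are illustrative, not real Google ranges.

```python
import ipaddress


def same_c_block(ip_a, ip_b):
    """True if both IPv4 addresses fall in the same /24 ("c-block")."""
    # strict=False lets us derive the enclosing /24 from a host address.
    block = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in block


if __name__ == "__main__":
    # Documentation-range addresses stand in for a crawler and two visitors.
    print(same_c_block("192.0.2.10", "192.0.2.200"))
    print(same_c_block("192.0.2.10", "198.51.100.7"))
```

A random proxy would land in the second case, which is why it does not help against a site that whitelists by address range.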
“Google Translate no longer blocks your IP” (comment by Ares, above mine somewhere)…if that’s true, no point using it, correct? Also, as Isaac pointed out, with a few about:config changes you don’t need to use add-ons for this at all.
Uh…you got owned?
Google Translate does not hide your IP any more, so this trick no longer works!
xB Browser does all of this already. Just download it and set your useragent to Googlebot.
It’s weird, GoogleBot loves my site graydarkarea.blogspot.com but not webmarketingfaq.blogspot.com; does anyone know why?
Does Google Web Accelerator hide your IP or does it publish it as Translate currently does?
today i found http://keiths-proxy-server.appspot.com/. it appears to be hosted on google’s dime. which makes it interesting, no?
googlebot signs in. also check out billbot. thank me later.
other observations: refcontrol works almost as good as curl. webdeveloper toolbar is more important to me (almost) than firefox because of how hard it rocks. nytimes is a greased... uh, is a good place to cut one’s teeth. science journals are the holy grail. and the new webcache.googleusercontent dealio actually makes things easier...
love to know yr responses, o original-poster & those whose eyes fall across these words
but wait theres the secret part! or if you wont let the little would be geeklette’s puny code thru, it is here.
Or you could simply get a job at Google.