Surfing As GoogleBot – Their IP, Their User-Agent, Their Bot Characteristics

After reading this article and this article, both of which give frustratingly over-simplified advice on user-agent spoofing to get past cloaked websites, I figured I should write something on how to REALLY behave like Google. Cloaking often goes well beyond user-agent checks, using IP delivery, User-Agent cloaking, JavaScript and cookie detection, and referer detection – all of which can be used to determine that you are you and not a bot. So, how do you beat all 5 major types of cloaking? 1. Beat IP Delivery: Use Google Translate as a proxy, translating from Spanish->English even though the site is already in English. 2. Beat User-Agent Cloaking: Use the Firefox User-Agent Switcher to spoof as GoogleBot. 3. Beat JavaScript Detection: Use the Firefox Web Developer Toolbar to turn...
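The user-agent half of this is easy to sketch in code. A minimal example, assuming Python's standard `urllib` and the classic Googlebot User-Agent string (Google may change the exact value over time); note this only defeats User-Agent cloaking, since IP delivery will still see that the request did not come from Google's crawl range:

```python
import urllib.request

# Googlebot's classic User-Agent string (assumed; the real value can change)
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_as_googlebot(url: str) -> bytes:
    """Fetch a page while presenting Googlebot's User-Agent header.

    This only beats User-Agent cloaking; sites doing IP delivery will
    still detect that the request is not from Google's IP ranges.
    """
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```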

Is Your Site “On Theme”?

After years of hearing clients, SEO talking heads, and bloggers ask “is your site on theme?”, I decided it was about time someone created a programmatic solution to this problem. To date, there has been a huge amount of discussion and disagreement about the role of “themes” (whatever that actually means) in ranking web...

Duplicate Content Round-up: Diagnosis and Correction with Free Tools

Here is the Duplicate Content Tool if that is all you are looking for… “Duplicate content” has become a standard part of the SEO lexicon over the last year or so (2005-2006), and over that time a handful of common causes have been identified – the most common of which is poor URL handling. There is one cause, legitimately having duplicate or very similar pages throughout your site, that is not the result of poor URL handling, and I would recommend SEOJunkie’s tool for diagnosing that type of issue. Of note, there is some skepticism (really, skepticism in the SEO world?) as to the actual effect of duplicate content on a site’s ranking. I believe that the largest impact occurs through PR dispersion. This occurs when inbound and...
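The “poor URL handling” cause is easy to illustrate: the same page is often reachable at several URL variants (trailing slash, mixed case, session or tracking parameters), each of which a crawler treats as a distinct page. A rough sketch of the kind of canonicalization that collapses these variants, with hypothetical parameter names and the simplifying assumption that the whole URL is case-insensitive:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking/session parameter names; real sites vary
TRACKING_PARAMS = {"sessionid", "sid", "ref"}

def canonicalize(url: str) -> str:
    """Collapse common URL variants that serve identical content.

    Simplification: lowercases the entire URL, which is fine for the
    host but not always safe for paths and query values.
    """
    parts = urlsplit(url.lower())
    # Drop session/tracking parameters that don't change the content
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    # Treat /page and /page/ as the same resource
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))
```

All three of `http://Example.com/Page/`, `http://example.com/page?sid=abc`, and `http://example.com/page` reduce to the same canonical form, which is exactly why uncorrected URL handling multiplies “duplicate” pages in an index.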

Clean IP Tool

One of the worst things that can happen to a site is getting stuck on a server with a ton of spam and adult content. Many web hosts now allow their clients to set up numerous domains on the same shared hosting account, letting spammers create tens if not hundreds of sites on an account that may very well share an IP with your site. So, we threw together a simple tool that helps determine whether other sites on your server host adult or spam content. It even checks the top 100 URLs hosted on the same IP in Google to make sure they are not banned, and tells you which ones are! Clean IP Tool
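The first step of such a tool, checking whether two domains sit on the same server, can be sketched with a simple DNS lookup. A minimal example using Python's standard `socket` module (a full clean-IP check would then do a reverse-IP lookup to enumerate the neighbours and inspect each one):

```python
import socket

def share_ip(host_a: str, host_b: str) -> bool:
    """Return True if two hostnames resolve to the same IPv4 address.

    Sites that resolve to the same shared-hosting IP are neighbours;
    a full "clean IP" check would inspect each neighbour for spam or
    adult content and for search-engine bans.
    """
    return socket.gethostbyname(host_a) == socket.gethostbyname(host_b)
```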

Privacy and Accessibility: Responses to the #privacy initiative

There has been quite a bit of response to the #privacy (http://www.poundprivacy.org) initiative, and I am thankful to all who have supported it and taken the time to blog about it, link to it, or mention it to friends and colleagues. The response has been very supportive (SEOMoz, Google Inside, WebDevForums (15,000+ members), Syndk8 (6,500+ members), and many more). If this is to be successful, it will require as much support as possible. There are a handful of responses which I believe should be discussed out in the open, and I will try to answer the questions that have been raised about the standard as well as I can. (1) Doesn’t adding #privacy to a search flag it as “This is a really interesting search! Hot Stuff Here!”...

The New Standard for Search Privacy

There are really only two legal/ethical issues that are regularly lodged against search engines: copyright and privacy. The copyright debate has made its way to courts across the world and has resulted in several technologies which empower webmasters to protect their copyrighted information. No fewer than 4 separate standards – robots.txt, nocache, noindex, and nofollow – allow webmasters to opt out of indexing, caching, and other search engine activities which may violate or endanger their copyright. How many standards exist to protect privacy? Until today, none. I am proud to introduce #privacy (http://www.poundprivacy.org), the new search standard for privacy. The concept is fairly simple: searchers should be able to opt out of tracking mechanisms when they...
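For reference, the four webmaster-side opt-out standards mentioned above look like this in practice (the path and values here are placeholders; note that what the post calls “nocache” is usually expressed as the `noarchive` robots directive):

```
# robots.txt – opt out of crawling entirely
User-agent: *
Disallow: /private/

<!-- In a page's <head>: opt out of indexing, caching, and link-following -->
<meta name="robots" content="noindex, noarchive, nofollow">
```

All four protect the publisher's content; none of them says anything about the searcher's query data, which is the gap #privacy is aimed at.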