The Strongest Cloaking Yet – Cross-Domain Canonical Tag
For years, blackhat search engine optimizers have employed the most advanced forms of bot detection, IP delivery, JavaScript and Flash obfuscation, and the like to accomplish cloaking. Used successfully, these techniques allow a webmaster to pull the wool over the eyes of bots while feeding sales-heavy (or worse) content to end users.
Google has fought valiantly to stop these techniques and, by and large, has stamped out all but the most sophisticated of them. With the introduction of the new cross-domain canonical tag, however, Google has fallen on its own sword.
The Canonical Tag
The rel=canonical tag was a godsend for most webmasters. It allowed us to defeat duplicate content issues by placing a single line of code in the head of the HTML page, unbeknownst to visitors, telling the visiting browser or bot the intended URL for that piece of content. No matter how crafty, strange, or contrived the URL used to access the page was, the bot would ultimately know what it should be. Users needn't be jostled by redirects, and webmasters needn't rely on more complex server-side technologies to prevent duplicate content. All was well in the kingdom.
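For reference, the tag is a single link element in the page's head; the URL below is just a placeholder:

    <link rel="canonical" href="http://www.example.com/preferred-page/" />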
One important aspect of the canonical tag was that it only impacted same-domain content. I could place the tag on one page and point it only at another page on my own domain. This meant there was no way to sneak my canonical tag onto another webmaster's site and steal their PageRank.
This, however, has changed.
Cross-Domain Canonical Tag
The new cross-domain canonical tag allows a webmaster to place a single tag telling GoogleBot that the real page exists on another domain. Users do not see this tag unless they take the time to view the source. A cleverer webmaster could even cloak the page so that the only thing that changes when GoogleBot visits is that one tag. How, then, could a webmaster use this to cloak? Simple.
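To make the mechanics concrete, here is a minimal sketch of how that per-bot tag swap might be wired up. The use of Python and Flask, and every name and domain in it, are my own illustrative assumptions, not anything the tag itself requires:

    # Hypothetical sketch: emit a cross-domain canonical tag only when the
    # requesting user agent looks like GoogleBot. Domains are placeholders.
    from flask import Flask, request

    app = Flask(__name__)

    PAGE_BODY = "<h1>Gambling Addiction Resources</h1><p>Campus programs...</p>"

    @app.route("/resources")
    def resources():
        ua = request.headers.get("User-Agent", "")
        head = ""
        if "Googlebot" in ua:
            # Only the bot ever sees this line; human visitors get a bare head.
            head = '<link rel="canonical" href="http://poker-affiliate.example/" />'
        return "<html><head>%s</head><body>%s</body></html>" % (head, PAGE_BODY)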
Imagine creating two separate websites: one that discusses a kosher, if not academic, subject related to your target industry, and another that is just plain salesy as hell. For example, you could create a Poker and Gambling Addiction Awareness site that targets colleges and universities with programs and resources for fighting gambling addiction on campus. You could also create a poker referral site. Then, using the rel=canonical tag, you could convince GoogleBot that the Addiction Awareness site, which has no trouble getting great links from great sources, really belongs at the poker affiliate site. A talented webmaster could even make the content of the two pages nearly identical, just using different images and robots.txt-blocked JavaScript to modify the look and feel of the second site.
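The robots.txt trick is simply to keep GoogleBot from ever fetching the script that re-skins the page. A minimal sketch, assuming the styling script lives under a /js/ directory:

    User-agent: Googlebot
    Disallow: /js/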
All the PageRank, TrustRank, and potential rankings would be passed on to the target site.
A Solution in Search of a Problem
I don't understand the cross-domain canonical tag. Nearly all webmasters with multiple-domain issues have server-side access, allowing them to easily prevent duplicate content with mod_rewrite, ISAPI_Rewrite, and the like. Sites dealing with scrapers and content syndication will see no benefit from the cross-domain policy either, as it is highly unlikely that scraper sites will be willing to put the canonical tag in place and potentially lose any rankings they might have. Ultimately, Google has given blackhatters a strong tool for cloaking, one that will require vigilance on Google's part to detect and prevent.
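For reference, the server-side fix mentioned above takes only a couple of lines where access exists. A minimal sketch, assuming Apache with mod_rewrite enabled and placeholder domain names:

    # .htaccess: 301 the duplicate domain to the canonical one.
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^www\.duplicate-domain\.example$ [NC]
    RewriteRule ^(.*)$ http://www.canonical-domain.example/$1 [R=301,L]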
Good Luck.
Hmmm, very interesting. It will be interesting to see how this tag gets abused.
rel=canonical doesn't work that way (i.e., consolidating wildly different content) on the same domain… why would it work that way cross-domain?
In the past, though, blackhatters had to take the risky step of cloaking to decide whether to 301-redirect the user from the clean, kosher domain to the dirty, salesy one. Now that cloaking is no longer necessary; the hand-off happens behind the scenes, with Google's endorsement.
Google processes the canonical tag as a suggestion, not a directive: there's still an algorithm at work to determine the principal version of a page, of which the canonical tag is just one factor (if a very strong one).
I'd imagine that with the cross-domain canonical tag, Google recognises the potential of your hypothetical situation and would tone down the weight the tag has in determining the principal version of a piece of content.
If one URL continued to attract links, trust, and credibility, I'd imagine the cross-domain canonical wouldn't help in sending that credibility elsewhere unless the new location showed evidence of attracting links of similar value.
A thought on the author's response to Daniel's comment… a poker referral site would have a hard time making any money if it had the same textual content as a poker gambling addiction awareness site, wouldn't it?
Just as a thought, I wouldn’t be surprised if Google starts (or is already) sometimes crawling pages with a spoofed UA string to see if what it gets is the same as what the normal Googlebot UA string gets.
You are right about spoofing the UA. There is little to no question that Google has long used non-bot user agents to compare the content rendered to a user against that rendered to GoogleBot.
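As a rough illustration of the kind of check being described (nothing here reflects Google's actual implementation), a crawler could fetch the same URL under two user-agent strings and compare the results. A minimal sketch using Python's requests library:

    # Hypothetical cloaking check: fetch a page as GoogleBot and as a
    # regular browser, then compare the two responses.
    import requests

    GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                    "+http://www.google.com/bot.html)")
    BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

    def looks_cloaked(url):
        as_bot = requests.get(url, headers={"User-Agent": GOOGLEBOT_UA}).text
        as_user = requests.get(url, headers={"User-Agent": BROWSER_UA}).text
        # Naive comparison; a real checker would normalize ads, timestamps, etc.
        return as_bot != as_user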
Well, I ran the test… it did not work cross-domain. I have very branded, keyword-specific rankings (40 in the top ten) which I tried to canonical to a new URL. The original site was removed from the index and lost all rankings; the new site is in the index with no rankings. So the link metrics were not passed to the new domain (complete duplicate content).
Author Response: Very interesting. So it seems that Google is not honoring the cross-domain canonical tag the way they claimed they would, even when the tag is legitimate. Have you posted this in Matt Cutts's blog comments to get a response?
Just learned many things from the articles on your blog about SEO; very interesting.
Update: it does seem to work cross-domain, although a brand-new domain takes considerably longer to establish credibility than an aged domain name.
Hi there, thanks for this thought-provoking article. I must admit that I hadn't thought of these concerns, and I can see why you'd highlight the issue. However, I'm not sure I'd agree that most webmasters have the necessary server access. I have many clients who are not in a position to get that access across all the hosts of their content, so this is an important new tool in their armoury. The same often comes up in the arena of affiliates, where enforcing best practice on them can be a nightmare.
Well, this is essentially a cloaking issue, and the use of the cross-domain canonical tag here is secondary. As soon as a site can show, without being detected, a nice page to the bot and a spam page to visitors, the spammer has succeeded. If he can do that, he does not need the canonical tag. I discuss a similar issue here: http://civm.ca?store=queen