Digg.com Duplicate Content Issue…
While webmasters around the world remain frustrated that Google still cannot tell that www.site.com and site.com are, by and large, identical, Digg.com apparently never got the memo. Luckily, that gives us a clear picture of just how far out of hand the problem can get.
Right now, digg.com is showing about 1.5 million duplicate content pages because it has not set up a site-wide redirect from www.digg.com/{anything} to digg.com/{anything}.
Google cached for digg.com: 6.69 million
http://www.google.com/search?q=site%3Adigg.com
Google cached for www.digg.com: 1.42 million
http://www.google.com/search?q=site%3Awww.digg.com
Duplication Example:
Digg’s Privacy Page
http://www.google.com/search?q=site:digg.com+inurl:digg.com/privacy&hl=en&lr=&filter=0
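While I still hold that this is ultimately Google's problem, not Digg's, the fix is fairly simple with mod_rewrite and a .htaccess file:
# Turn on mod_rewrite for this directory
RewriteEngine On
RewriteBase /
# Match any requested host that does not start with "digg.com" (case-insensitive)
RewriteCond %{HTTP_HOST} !^digg\.com [NC]
# Send a permanent (301) redirect to the same path on digg.com and stop processing rules
RewriteRule ^(.*) http://digg.com/$1 [L,R=301]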
This sends every visitor, search engine bots included, a 301 (permanent) redirect to the correct, canonical form of the URL. Of course, you will need to replace digg.com in the example above with your own domain name for it to work on your site.
Unfortunately, until Google clears up this problem on its end, adding this type of mod_rewrite rule will remain a standard SEO (search engine optimization) practice for years to come.
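To make the rule's behavior concrete, here is a rough Python model of its RewriteCond and RewriteRule lines. It is only a sketch, not Apache itself: it assumes the rules live in a .htaccess file at the document root, so the path being matched has no leading slash, and it ignores query strings. The name apply_canonical_rule is hypothetical.

```python
import re
from typing import Optional, Tuple

# Mirrors: RewriteCond %{HTTP_HOST} !^digg\.com [NC]
# ([NC] makes the match case-insensitive.)
CANONICAL_HOST = re.compile(r"^digg\.com", re.IGNORECASE)

# Mirrors the pattern of: RewriteRule ^(.*) http://digg.com/$1 [L,R=301]
RULE_PATTERN = re.compile(r"^(.*)")


def apply_canonical_rule(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) when the rule redirects, or None when it does not.

    `path` is the per-directory path matched inside .htaccess, i.e. the URL
    path without its leading slash (an assumption of this model).
    """
    # The "!" negates the condition: only hosts that do NOT start with
    # "digg.com" fall through to the redirect.
    if CANONICAL_HOST.match(host):
        return None
    match = RULE_PATTERN.match(path)
    # $1 is the captured path; R=301 makes the result a permanent redirect.
    return 301, "http://digg.com/" + match.group(1)


print(apply_canonical_rule("www.digg.com", "privacy"))  # → (301, 'http://digg.com/privacy')
print(apply_canonical_rule("digg.com", "privacy"))      # → None
```

Because the condition is negated, this model redirects anything that is not digg.com, such as www.digg.com or a bare IP address, while requests already on digg.com pass through untouched.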
“www” has become so common on the web that it's almost invisible. No one calls it the World Wide Web anymore, not since 1994 anyway.
I prefer this .htaccess solution for removing the www:
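# Only requests for exactly www.digg.com are redirected
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^www\.digg\.com$
# 301 redirect to the bare domain; [L] stops any later rules from running
RewriteRule (.*) http://digg.com/$1 [R=301,L]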
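The two rules differ mainly in which hosts they catch. As a rough sketch, this Python snippet models only the two RewriteCond tests as regular expressions. It assumes the Host header is matched exactly as sent (including any case or port), and the function names are hypothetical.

```python
import re

# First rule's condition: redirect when the host does NOT start with "digg.com"
# (case-insensitive, because of its [NC] flag).
FIRST_RULE = re.compile(r"^digg\.com", re.IGNORECASE)

# Second rule's condition: redirect only when the host is exactly "www.digg.com"
# (case-sensitive, because no [NC] flag is given).
SECOND_RULE = re.compile(r"^www\.digg\.com$")


def first_rule_redirects(host: str) -> bool:
    # Negated condition: a failed match means the redirect fires.
    return FIRST_RULE.match(host) is None


def second_rule_redirects(host: str) -> bool:
    # Plain condition: a successful match means the redirect fires.
    return SECOND_RULE.match(host) is not None


for host in ["digg.com", "www.digg.com", "WWW.DIGG.COM", "www.digg.com:80", "192.0.2.10"]:
    print(host, first_rule_redirects(host), second_rule_redirects(host))
```

Under this model, the first rule redirects every host that isn't digg.com, including an uppercase WWW.DIGG.COM, a host with an explicit port, or a bare IP address, while the second rule redirects only the exact, lowercase www.digg.com. The narrower condition touches nothing but the www host; the broader one also consolidates other aliases onto the canonical domain.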
Whether or not Digg sets up a redirect isn't really the concern. The bigger problem, IMHO, is that there are that many pages of little to no information clogging up real search results. Digg stories give little to no real information about the topic, and the comments are generally uninspiring and rarely worth reading. What the search results should really be returning is the sources Digg points to, with the Digg pages buried back on page 100 of the SERP.
Thank you for sharing all this wonderful information 🙂