Google Responsible for Own Server Clog
As has been the case for over a year now, Google has a severe issue with duplicate caching. Google has yet to fully understand that www.domainname.com is identical to domainname.com and will readily index both.
Google will also index a /index.html and a / version of the same page. And, if the web.domainname.com subdomain is available, it will cache it as well.
I personally have cleaned up sites with three to six cached copies of each page in Google, adding up to 100,000+ duplicates in the index. If Google would fix this one problem, it would clear up a huge amount of space.
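Until Google handles this on its end, the usual cleanup is to pick one canonical hostname and 301-redirect everything else to it. A minimal sketch, assuming Apache with mod_rewrite enabled and `www.domainname.com` as a placeholder for your canonical host:

```apache
# Hypothetical .htaccess sketch. Redirects any non-canonical host
# (domainname.com, web.domainname.com, etc.) and /index.html requests
# to a single canonical URL, so crawlers only ever see one version.
RewriteEngine On

# Any host other than www.domainname.com -> www.domainname.com
RewriteCond %{HTTP_HOST} !^www\.domainname\.com$ [NC]
RewriteRule ^(.*)$ http://www.domainname.com/$1 [R=301,L]

# /index.html (in any directory) -> the bare directory URL
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]
```

The permanent (301) status matters: it tells the crawler the old URLs are gone for good, so the duplicate cache entries eventually drop out.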
I am new to SEO, and I really liked the topic where you discussed “A Modest Proposal”. In fact, I based the first post on my WordPress blog on that particular info. Short but straight, dude! All your comments are very informative to me as a neophyte in SEO. Thanks, I’ll surely continue to read all of your posts.
sam
Not to mention duplicates from crawled https. I think there needs to be an addition to the robots.txt protocol to help with that one.
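One stopgap that doesn’t require changing the protocol: robots.txt is fetched separately per protocol, so the server can hand crawlers a blocking file only over HTTPS. A hypothetical sketch, assuming Apache with mod_rewrite and a made-up `robots_ssl.txt` filename:

```apache
# Hypothetical sketch: serve a disallow-all robots file on HTTPS only,
# so crawlers skip the https:// duplicates of every page.
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
```

where `robots_ssl.txt` contains:

```
User-agent: *
Disallow: /
```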
I, like Sam, am new to SEO, but I can’t understand how Google can get away with making it so difficult for the normal Joe to get in, while all these “spam” sites get indexed without problems…
I suppose it can be done… if you know how!
Great site, thanks!
Sparks Flying