W3C HTML Validation and Search Engine Optimization
It has been a while since I have posted any of Virante’s research to the blog, and a good friend and former COO, Bob Misita, called me out on it. So I figured I would release some of the data from a recent study we did on the relationship between W3C HTML validation and web page rankings. Because validation is quite complex, we chose to take a macro-level look rather than use our traditional methodology of getting individual sites into the SERPs via sitemaps and then tweaking independent variables one at a time.
In particular, we looked at the W3C validation of approximately 100 separate keywords in Google, Yahoo, MSN Live and Ask. For each keyword, we extracted the top 10 ranking sites, measured the number of errors via a W3C validation check, and used multiple statistical models to determine whether the individual rankings of the sites could be associated with validation error numbers.
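The error-counting step can be sketched in a few lines. This is an illustrative reconstruction, not our actual tooling: the W3C Nu validator can return machine-readable results (e.g. `GET https://validator.w3.org/nu/?doc=<page-url>&out=json`), whose response contains a `messages` list in which entries with `"type": "error"` are validation errors. The `sample` response below is trimmed down and made up for illustration.

```python
def count_validation_errors(validator_response: dict) -> int:
    """Count messages the validator flagged as outright errors
    (ignoring warnings and informational messages)."""
    return sum(1 for m in validator_response.get("messages", [])
               if m.get("type") == "error")

# Example with a trimmed-down, hypothetical validator response:
sample = {
    "messages": [
        {"type": "error", "message": "Stray end tag div."},
        {"type": "info", "subType": "warning",
         "message": "Consider adding a lang attribute."},
        {"type": "error", "message": "Duplicate attribute class."},
    ]
}
print(count_validation_errors(sample))  # -> 2
```

In the study, this count was recorded for each of the top 10 results for each keyword, giving the (rank, error-count) pairs fed into the regressions below.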
The more rudimentary statistics were all we needed to fairly easily dismiss the assumption that validated content will perform better in the search engines – that is, in Google, Yahoo, MSN Live, or Ask.
The erratic nature of average # of validation errors compared to the ranking position is fairly evident from the graph above. But, rather than assume that the data from the averages of all 100 keyword searches was accurate, we decided to look at the least squares regression for each and every keyword on each engine (400 different result sets).
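The per-keyword fit is just an ordinary least-squares slope over ten (error count, ranking position) pairs. Here is a minimal sketch, using the per-position Google averages from the table further down as stand-in data for a single keyword (the study ran this over 400 individual result sets, not the averages):

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares fit of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

positions = list(range(1, 11))  # ranks 1..10
errors = [103, 74, 118, 190, 86, 127, 60, 180, 145, 146]  # Google row below

# Regress rank on error count, matching the slope units in the table
# (ranking positions gained or lost per validation error):
slope = least_squares_slope(errors, positions)
print(round(slope, 3))  # -> 0.025
```

A slope near zero, as seen across the engines, means error counts barely move with ranking position at all.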
| Engine | Avg. Errors | Avg. Slope |
|---|---|---|
| Google | 155 | 1.61 × 10⁻¹⁹ |
| Yahoo | 146 | 0.00326 |
| MSN Live | 111 | 0.00419 |
| Ask | 102 | 0.000714 |
As you can see, the slope of the least-squares regression line is barely positive for every engine – the largest being MSN Live’s at roughly 4 in 1,000, with Yahoo’s close behind at about 3 in 1,000. If the confidence levels were high, a slope of 3/1000 would imply that for every 333 validation errors removed from your page, your rankings would rise by one position. However, the confidence levels were not sufficient and, perhaps most glaring, fewer than 2% of the sites tested had more than 333 validation errors (meaning the vast majority of sites could not benefit from such a change).
Average validation errors by ranking position (1–10):

| Engine | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Google | 103 | 74 | 118 | 190 | 86 | 127 | 60 | 180 | 145 | 146 |
| Yahoo | 97 | 95 | 78 | 134 | 121 | 91 | 126 | 145 | 133 | 118 |
| MSN Live | 54 | 102 | 78 | 59 | 122 | 79 | 76 | 100 | 128 | 88 |
| Ask | 98 | 99 | 81 | 94 | 63 | 112 | 105 | 82 | 43 | 89 |
Even though the sites ranking in MSN Live and Ask average fewer validation errors than those in Google and Yahoo, the near-zero regression slopes above show this is not a ranking effect. It is possible that W3C validation plays a role in being indexed at all (although I think this is unlikely). Importantly, we saw similar variation in error counts among the sites all four engines allowed to rank – meaning there appears to be no validation threshold required to rank in any of these search engines.
So, there you have it. One less thing to worry about. While I still think HTML validation is a worthy cause in and of itself, one would be hard-pressed to prove that it is directly and positively correlated with, much less a cause of, one’s search rankings.
11 Comments
Good post. I agree: as much as I support the idea of web standards, when it comes to ranking in search engines I find it has little to no effect whatsoever. It is very useful for other things, but for search rankings I very much doubt it matters. Technically, search engines are good at working around markup mistakes, and instead concentrate on “social” factors to determine ranking (e.g. the number of quality inbound links).
I think when “web standards” were being pushed a few years ago, people kept saying they would help with search engine rankings. They *might* help with indexing, but I find they only help insofar as a page that is not broken is easier for a search engine to parse.
One only has to look at the 1,000+ validation errors that Amazon.com has and compare that to how much money they are making with what is, by any W3C validation standard, an absolute disaster of a site.
Yes, you have one more piece of evidence that there is little if any correlation between W3C validation and your search engine rankings. We have observed countless times that you need to do the basic on-page optimization, develop useful content, and build lots of quality inbound links to rank well in the major search engines.
Also interesting to consider is the possibility that sites with many validation errors tend to be more poorly developed overall, with less internal linking, poorer on-site SEO, and more violations of SEO guidelines such as pop-unders, hidden text, banned MX servers, malware, etc.
Another interesting test would be page load time and its effect on rankings. Clearly Google thinks this is a quality issue for web pages and has begun working this factor into AdWords landing-page scoring.
Validation might be a quantifiable indicator or “smell” of other aspects of code quality, such as clean markup.
While page rank is obviously an important factor, cleaner code can dramatically increase search engine relevance by increasing the search term to page weight ratio.
Nice to have you burst the bubble on all the often-arrogant, know-it-all, anal-retentive validation junkies.
I have said it before and I will say it again: validation matters most for compliance across different browsers (a human concern) rather than for SEO purposes. It never hurts to validate, of course, but it has very little to do with your crawlability.
Great post. I’ve watched this topic discussed, but no one has used any data to support their reasoning.

Thanks for putting some solid evidence out there on why W3C validation doesn’t affect rankings.
I disagree. Even though your evidence and plots support your case, you’re only taking a few web sites and plotting them on a graph for certain key terms. I recently had an issue with not getting indexed in MSN or Live.com. I contacted MSN directly, and they told me to validate and resubmit and that it would improve.
I like that this post, even with measurably relevant data, still leaves the discussion open, but more focused.
However, I consider that if a page looks bad to a visitor, then it will look bad to a bot, and vice versa. Mack asserts this point quite effectively with regard to browser compliance. If you have great rankings, but surfing your site in Opera means only using shortcut keys to navigate, the odds are that you’ll lose a sale or exposure. That’s just how it goes.
So, validation doesn’t matter, eh? Since HTML validation was not ‘perfect’ for these sites, I’m curious how the script and CSS of these same sites would perform against validation. Identical ratios would convince me that it’s not a *ranking* factor but a *user experience* factor (and aren’t the two equally relevant to organic search?).
Thanks for recharging the debate!
Chat Man