Possibility, Probability and Proof: On Case Studies, Correlation and Causation in SEO
I wanted to spend a bit of time today on a pretty big issue that I see regularly in our industry: bickering over the state of research in search engine optimization. Rarely does a week go by without someone responding to the mere sight of the word “correlation” with “correlation does not equal causation”. So today, I’m going to write about 3 types of SEO research we see in our industry quite regularly and what I believe the appropriate response to each should be…
Case Studies
This one is actually my pet peeve. We tend to give this one a pass nearly every time, despite the fact that it is likely the least meaningful SEO research presented. I was particularly drawn to this issue after a series of case studies coming out of Link Research Tools over the last several months regarding the Penguin update. First, full disclosure: Link Research Tools has a product called Link Detox, which is somewhat of a competitor to Virante’s Remove ‘Em. I don’t really see them as a competitor, because many of our users use Link Detox to build their remove list and then upload it to Remove ‘Em, but I felt it was worth saying.
Derek Devlin pointed out an example of an excellent case study he did with Link Research Tools that was careful with its language. In all fairness, I feel compelled to include it here as an example of how Open Penguin Data set that these factors do predict likelihood of being penalized, but even that level of study can’t say which of them caused the penalty in the first place.
This is not to say that case studies like the one above aren’t valuable. They are incredibly valuable! But case studies, once again, show us the Possibilities. Go take a look at IAcquire’s, Ayima’s or Distilled’s case studies and they will show the possibilities that can come out of working with an awesome team of experts. When you compare those possibilities with those of less seasoned experts, you can see why one might be better than the other.
Now, it is worth pointing out here that over time enough case studies can be aggregated into a meta-study that then enters into the realm of Probability, not just Possibility. If Link Research Tools were to aggregate their findings across all of their case studies, they could start to draw some conclusions about the probable culprits in a Penguin update, but until then they are dealing with anecdotal evidence one site at a time.
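To make the jump from Possibility to Probability concrete, here is a minimal sketch of what aggregating case studies might look like. The records are invented purely for illustration (they are not Link Research Tools data), and the “factor” is a hypothetical stand-in for anything a case study might flag, such as heavy use of exact-match anchor text.

```python
# Hypothetical aggregated case studies: (site_has_factor, site_was_penalized).
# These 20 records are made up for illustration only.
case_studies = (
    [(True, True)] * 7
    + [(True, False)] * 3
    + [(False, True)] * 2
    + [(False, False)] * 8
)


def penalty_rate(records, has_factor):
    """Share of sites that were penalized, among sites with or without the factor."""
    group = [penalized for factor, penalized in records if factor == has_factor]
    return sum(group) / len(group)


rate_with = penalty_rate(case_studies, True)
rate_without = penalty_rate(case_studies, False)
relative_risk = rate_with / rate_without

print(f"Penalty rate with factor:    {rate_with:.2f}")
print(f"Penalty rate without factor: {rate_without:.2f}")
print(f"Relative risk:               {relative_risk:.2f}")
```

One case study can only tell you that a penalized site had the factor. A pile of them, including sites that were not penalized, lets you say how much more often sites with the factor get hit, which is still not proof of cause.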
Correlation Studies
This one is another pet peeve of mine, but not the way most people think. The second anyone mentions a correlation study in our industry, the hivemind comes alive yelling “correlation does not imply causation”, a predictable, lamentable chorus of deniers who barely understand what the words mean in the first place. Look, we get it, correlation does not imply causation. Sometimes, authors of correlation studies get that wrong, or at least appear to imply that it matters. However, correlation studies are incredibly valuable for the purpose of helping us understand Probability.
Now, I want to be careful here. I do not mean to say that a correlation study would tell you that “Probably Facebook Likes Cause Improved Rankings”. That is not what I am saying at all. However, I am contending that the probability of one causing the other increases with the sophistication and control of the correlation study itself, especially as you subtract out confounding factors.
You see, there are 2 common problems with correlation studies. The first is what most of us know as “directional”, meaning that we say one causes the other when, in reality, the other causes the one. The classic example would be saying that Ice Cream consumption causes the Sun to come out. Clearly that is not the case, rather on hot sunny days more ice cream is consumed. The second is what we call confounding factors. We could say that the Sun causes Napkin usage when, in reality, Ice Cream consumption causes napkin usage, regardless of whether or not the Sun is out. It is this confounding factors part that good correlation studies can remove and, ultimately, give us some strong probabilities.
With the Facebook Likes example, if we control for traffic, we can actually get a good picture of whether or not they impact rankings. In fact, by controlling for traffic, we can control for the very mechanism responsible for the “Direction” issue as well.
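Here is a rough sketch of what “controlling for traffic” can mean in practice, using a partial correlation. Everything in it is an assumption for illustration: the data is synthetic, and I have deliberately built it so that traffic drives both Likes and rankings while Likes have no direct effect on rankings at all. Real data would not be this clean.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic pages: traffic is the confounder that feeds both other variables.
rng = random.Random(42)
traffic = [rng.gauss(0, 1) for _ in range(5000)]
likes = [t + rng.gauss(0, 0.5) for t in traffic]       # traffic -> likes
rank_score = [t + rng.gauss(0, 0.5) for t in traffic]  # traffic -> rankings

raw = pearson(likes, rank_score)
controlled = partial_corr(likes, rank_score, traffic)

print(f"Likes vs. rankings, raw:                     {raw:.3f}")
print(f"Likes vs. rankings, controlling for traffic: {controlled:.3f}")
```

In this made-up world the raw correlation looks impressive, and it all but disappears once traffic is accounted for. If a strong relationship survived that control in real data, the probability that Likes matter on their own would go up, though it still would not be proof.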
The machine-learned algorithm behind Penguin Analysis is a perfect example of this. We can’t prove which factors cause Penguin, but we can identify a number of risk factors which, in combination, tend to correlate with sites that ultimately get penalized. We can even give the probabilities of how each influences the algorithm. Correlation studies like this help us determine the probabilities without giving us the Proof.
Causation and Experimentation
This is the final type of SEO research and the rarest of all. This is likely because it is more difficult to create a truly controlled environment from which an experiment can be launched and studied. However, it gives us the clear directional answer to ranking factors. A perfect example of this would be the experiment Virante used to prove that nTopic content recommendations cause increased organic Google traffic.
In this experiment, we used nTopic to make keyword recommendations for content pages on a single domain. Some pages got the recommendations, some didn’t, and some got random word recommendations. We tracked their organic traffic over time and ran analyses to determine whether or not the changes were effective. The cohort modified by nTopic recommendations saw an average shift in traffic of 131.48% while the others saw stagnation or even drops in traffic.
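For readers wondering what “ran analyses” could look like for an experiment shaped like this, below is a minimal sketch of a permutation test comparing a treated cohort against a control cohort. This is not the analysis Virante actually ran, and the traffic changes are invented numbers, not the nTopic results. The function name and parameters are my own for illustration.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_iter=5000, seed=0):
    """One-sided permutation test on the difference in cohort means.

    Returns the observed difference and the estimated probability of seeing
    a difference at least that large if cohort labels were assigned at random.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign pages to cohorts
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed:
            hits += 1
    # Add-one smoothing so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


# Hypothetical fractional change in organic traffic per page (0.30 = +30%).
treated = [0.30, 0.45, 0.25, 0.50, 0.35, 0.40, 0.28, 0.55]
control = [0.02, -0.05, 0.10, 0.00, -0.08, 0.05, 0.03, -0.02]

observed, p_value = permutation_p_value(treated, control)
print(f"Observed difference in mean change: {observed:.3f}")
print(f"Permutation p-value: {p_value:.4f}")
```

The same comparison could be repeated against a third cohort, such as the pages that received random word recommendations. Because we decide which pages get the change, a small p-value here supports a causal claim in a way that a correlation study cannot.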
However, it is worth pointing out that while we had proof, it was quite narrow. We could now say this one sentence: “Insertion of nTopic recommended keywords can increase organic traffic from Google” and that is all. Case studies are the broadest: they give us a wide range of possibilities, but we can’t be fully certain of any of them. Correlation studies are in the middle: they give us a clearer idea of the possibilities, but we still can’t be certain. Experiments are the narrowest: they give us a direct answer about what is responsible, and it is one we can be certain of.
This is not to say that experiments are better; they are just different. They help answer 1 small question definitively, but they are by no means the only method of attaining valuable knowledge. If that were the case, the entire body of knowledge we know as history would be useless.
Conclusions
The takeaway is this. Knowledge can be brought about in our industry via a lot of different pathways. Be aware of what each pathway can offer you as you consume case studies, correlation studies, and experiments. Be aware of when authors escalate their study’s findings above what their research actually concludes, but don’t dismiss what those findings actually show because of a misstatement or two.
Good luck and happy learning.
The debate over correlation studies will never end but I think what will resonate better with marketers and critics (like me) is a slight change in language that reflects a more realistic expectation.
Instead of defending a correlation study with a sham nod to “correlation does not equal causation” (which is not entirely accurate — sometimes correlations confirm causes), what people should say is “these correlations may help us make better predictions in the future” without promising that predictions will always be right.
People tend to be naturally skeptical of predictions, whereas they don’t really grok “probabilities” and are too easily deceived by “look at this cool correlation”.
Besides which, a fairly static algorithm (like Penguin presumably is) is much easier to reverse-engineer or analyze than a dynamic “algorithm” like Google’s rankings system, which is constantly changing.
Russ,
You did a great job at publishing a counter-point without it becoming a rant. You are spot-on with your analysis of the industry and the proper way to intake research. I hope these views become more wide-spread and we can elevate the level of discourse on research.
Include a link to this article every time you publish a case-study, correlation research, or causation experiment.
Yes. Thank you. Let’s try to learn from all data sources, and treat them all with a healthy amount of skepticism.
Hi Russ,
All very good points you make here in terms of a study’s validity.
However, I would like to point out that not all Link Research Tools Case Studies took this approach. In my case study analysing Icelolly.com versus competitors in the “cheap holidays” vertical, I was very careful to conclude with a set of hypotheses, which as you will know are suppositions or proposed explanations made on the basis of limited evidence as a starting point for further investigation.
I thought it was important to highlight that my approach was just this, a starting point for further investigation, given that you have made a sweeping assertion that ALL LRT case studies overstated their claims.
Cheers
Derek.
Very good ability not to rant about this topic! It can get quite frustrating 🙂
Very interesting article and smart analysis 🙂
A really interesting article that makes some really valid points on causation and experimentation. SEO is not an easy to quantify science and you raise some thought provoking points for me.
good to see that the conversation continues to evolve