On Mathematics, Experimentation and Value

Jeff Ferguson of Amplitude Digital recently authored a piece entitled “Do We Have the Math to Truly Decode Google’s Algorithms?” on the venerable Search Engine Journal, with substantial assistance from Data Analytics Consultant Jennifer (Fields) Hood. Please read the article before continuing with mine. While the article has much to commend it, I believe it presents faulty logic, mischaracterizations, and ultimately misleading conclusions.

The answer to the question “Do we have the math to truly decode Google’s algorithms” is an emphatic No. However, the primary culprit behind this unfortunate reality is not the incompetence of industry practitioners chiseling away at the algorithm; rather, the algorithm itself now employs enough black-box machine learning to render it impossible for anyone without a priori knowledge of the inputs to “decode” it. Nevertheless, it is essential that we don’t forget the maxim that “perfect is the enemy of good” in our search for knowledge about how to help our clients’ sites perform better in the search engines.

The Central Thesis

The central thesis of Ferguson’s piece seems to go something like this…

Studies conducted by “Gentlemen Scientists” without “any form of testing or certification”, “with rare exceptions” are incapable of analyzing the “complex systems found in search engine algorithms and the information they organize.”

So, how does he proceed to defend this claim? The argument appears to rest on the following premises:

  1. Given the complexity of Google’s algorithm, studying it requires a certain degree of formal mathematics education.
  2. Specific criticisms of a handful of study models
  3. Weak correlations are not worth our consideration.
  4. The standard for publishing should be “proof”.
  5. Research which leads to the creation of a product or service is unethical.
  6. Sampling is biased.
  7. There is no peer review in our industry.
  8. Epistemic positivism regarding truth and knowledge.
  9. A false equivalency between certainty and usefulness.

Whew. Ferguson has certainly given us a lot to work with in his piece, so let’s go to work. Before we dive into each individual critique, let me start with a broader consideration.

Building a Dam

Ferguson begins his piece with an analogy. He recalls the story of William Mulholland, a self-taught civil engineer who, despite several innovations and successes, ended his career in tragedy when a dam he inspected and approved failed, causing the deaths of hundreds. While this is no doubt a tragedy, I think it is wholly disanalogous to the work of an SEO studying the search engines. To be clear, my concern with the analogy is not the risk involved (to which Ferguson alluded at the end of his post), but rather the projects themselves. The correct analogy would be if an untrained SEO were attempting to study the SERPs for the purpose of creating their own search engine, or if Mulholland were not attempting to build a dam, but rather to find a way over, under, or through that dam (consequences be damned, pun fully intended). An SEO does not need the requisite knowledge to build a search engine in order to poke holes in the algo, nor does a layman need a formal degree in civil engineering to poke holes in a dam. This analogy is one giant false equivalency.

But I think there is another issue at stake here. There is no evidence presented that Mulholland’s failure would have been prevented had he received a formal education. Mulholland consulted on several dams that still stand today, and several dams constructed by engineers with formal education have failed. In fact, Mulholland himself raised concerns about the “perilous nature of the face of schist on the eastern side of the canyon in his annual report to the Board of Public Works in 1911”, concerns that were dismissed by construction manager Stanley Dunham. I do not mean this as an indictment of formal education, but rather to point out that the failure of the St. Francis Dam isn’t concrete evidence (pun fully intended again) for Ferguson’s central thesis.

Addressing Critiques

I will now respond briefly to each of the critiques leveled by Ferguson in his writings. I do not intend, in any way, to hand-wave away concerns about the quality of studies presented by SEOs. We can improve our studies, and I have personally written on that subject and, with a group of technical SEOs, spearheaded a peer-reviewed research contest that has always included at least one statistician as a judge. My position is simply that we can gain valuable insights from studies of varying degrees of sophistication and confidence. With that being said, let me begin.

Claim 1: Studying Complexity Requires Formal Education

Aside from the obvious response that “having knowledge” is the key to success rather than how one attained it, I don’t think it is necessary for someone to have received formal statistical education, much less a degree in mathematics, to perform valuable research. Take, for example, the Designer Jewelry Case Study on Amplitude Digital’s site. They claim to have employed techniques like generating “high quality backlinks” at a “healthy pace” which “send powerful credibility signals that Google… can’t ignore.” These claims are not backed up by any experimental design at all, much less one that is scientifically rigorous. Yet Amplitude Digital makes truth claims not only about which factors matter in Google’s algorithm, but also that their use of those factors directly affected search traffic. Maybe it did, or maybe their competitors all screwed up their own sites, accounting for the organic growth of their client. I happen to think that Amplitude Digital’s use of case studies is completely justified, is valuable to potential clients and to the community as confirmatory evidence of search engine optimization, and in no way needs statistical validation to provide those modest contributions. I wonder if Ferguson feels the same way. Or will he say that “it wasn’t meant to be scientific”, a retort he finds eminently objectionable?

Claim 2: Specific Criticisms of a Handful of Study Models

Ferguson begins by taking aim at Rob Ousbey’s 2019 presentation, which included, among many other findings, a relationship between user engagement and rankings. He leans on the expertise of Jen Hood, who raises concerns about the correlation model used, stating: “the easy test would be: if you can rank on Page 1, especially the top of the page, without previously having any engagement, then the engagement is most likely driven by placement, not the other way around.”

Unfortunately, this would not suffice as an effective experimental model for determining whether engagement or links are more important. Imagine an algorithm made up of only three features: engagement, content, and links. Now, imagine that you attempt to rank a page only with links, only with content, and only with engagement. If you rank the page only with links, does that mean engagement isn’t also a factor? If you rank only with relevant content, does that mean links and engagement aren’t factors? What if you set up two sites, used links on one and engagement on the other, and the links won; would that mean links matter more? The answer to all of those questions is No. In fact, the relative weights of these factors are practically unknowable except in terms of independent metrics from which we could derive an equivalency (an X increase in links is equivalent to a Y increase in CTR).
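
To make this concrete, here is a minimal sketch in Python (the pages, factors, and weights are toy values of my own invention, nothing here is Google’s): the single observation “the links-only page outranked the engagement-only page” is compatible with a wide range of underlying weightings, including one where engagement is the heaviest factor.

```python
# Toy demonstration: one observed head-to-head outcome cannot identify factor weights.
from itertools import product

# Two hypothetical pages: one built up with links, one with engagement.
links_page = {"links": 0.9, "content": 0.2, "engagement": 0.1}
engagement_page = {"links": 0.1, "content": 0.2, "engagement": 0.6}

consistent = []
for w_links, w_content in product([i / 10 for i in range(11)], repeat=2):
    w_engagement = round(1.0 - w_links - w_content, 2)
    if w_engagement < 0:
        continue  # weights must sum to 1
    weights = {"links": w_links, "content": w_content, "engagement": w_engagement}
    s_links = sum(weights[f] * links_page[f] for f in weights)
    s_engage = sum(weights[f] * engagement_page[f] for f in weights)
    if s_links > s_engage:  # the outcome we "observed": the links page on top
        consistent.append(weights)

print(f"{len(consistent)} different weightings reproduce the observed outcome,")
print("including one where engagement is the heaviest factor:")
print(max(consistent, key=lambda w: w["engagement"]))
```

Many quite different weightings produce the exact same head-to-head result, which is why a single observation of this kind cannot recover the factors’ relative importance.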

Now, we could run experiments that would give us some knowledge. For example, we could start with a random set of keywords, choose a random result in the top 10 for each, mimic various engagement behaviors (click-throughs, pogo-sticking, etc.) on a portion of them, and measure ranking changes for the cohort that received the engagement versus those that did not.
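
For what it’s worth, here is a sketch of how the analysis of such a cohort test might look, using a simple permutation test as a stand-in for whatever statistical machinery one prefers; the rank-change numbers below are simulated placeholders, not real data.

```python
import random
from statistics import mean

random.seed(42)

# Placeholder data: rank change (negative = improvement) for 50 keywords whose
# chosen result received simulated engagement, and 50 untouched control keywords.
treated = [random.gauss(-1.0, 3.0) for _ in range(50)]
control = [random.gauss(0.0, 3.0) for _ in range(50)]

observed = mean(treated) - mean(control)

# Permutation test: how often does randomly relabeling keywords produce a gap
# at least as favorable to the "treated" group as the one we observed?
pooled = treated + control
hits = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[:50]) - mean(pooled[50:]) <= observed:
        hits += 1

print(f"Observed gap: {observed:.2f} positions; one-sided p ≈ {hits / trials:.3f}")
```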

Correlation studies are not and never have been intended to show causal relationships; rather, they give us hints about how the algorithm may work and thus spur further investigation. Often that investigation is not an individual study but the combined behavior of hundreds of SEO practitioners testing the interventions on their own sites.

Claim 3: Weak Correlations Are Not Worth Our Time

I find this objection particularly frustrating. We know that no single potential ranking factor is going to explain the majority of the algorithm. We could imagine a simple ranking formula that involves 10 metrics, each weighted the same. A correlation study would show each individual factor having a fairly low correlation coefficient. If we discovered all 10, we could potentially build a model that explains 100% of the rankings, despite each individual factor having a small coefficient. Weak correlations are going to be part of any complex system. Anyone can try this in Excel right now: create 10 columns x 10 rows and fill each cell with =RAND(), create an 11th column that is the sum of the previous 10, then choose any single column plus the 11th column and go to Data > Data Analysis > Correlation. I just ran this test and the Pearson correlation coefficient was .16.
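
The same exercise scripted in Python rather than Excel, with more rows so the estimate is less noisy, shows each factor individually correlating at only about 1/√10 ≈ 0.32 with the final score even though a model using all ten factors explains it completely.

```python
import numpy as np

rng = np.random.default_rng(0)
factors = rng.random((10_000, 10))   # ten equally weighted random "ranking factors"
total = factors.sum(axis=1)          # the algorithm's final score

# Any single factor is only a weak correlate of the total score.
r = np.corrcoef(factors[:, 0], total)[0, 1]
print(f"Pearson r of a single factor vs the total: {r:.2f}")   # typically ≈ 0.32

# Regressing the total on all ten factors recovers it exactly (R² = 1.0).
coef, *_ = np.linalg.lstsq(factors, total, rcond=None)
resid = total - factors @ coef
r_squared = 1 - (resid ** 2).sum() / ((total - total.mean()) ** 2).sum()
print(f"R² using all ten factors together: {r_squared:.2f}")
```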

Claim 4: The Standard for Publishing Should Be Proof

The words prove and proven are thrown around quite a bit. I think we need to be very careful here with our terminology. Inferential analysis by its very nature cannot produce “proof” or “certainty”; it can only produce likelihood, under the assumption that the future will behave the same as the past. Correlation studies in particular are not intended to prove anything, but rather to provide evidence toward some conclusion. Typically, SEOs implement changes on their own and their customers’ sites, building anecdotal evidence (case studies); look to correlation studies for validation or invalidation of their findings at scale; poll the community through interactions on social media, blogs, and forums; and, if it is very important, might go so far as to perform a controlled experimental study. Would it be nice to have a peer-reviewed journal and dedicated researchers with proper credentialing regularly submitting? Sure, many of us have discussed forming one and, for a while, SEMJ attempted this very thing. However, the question is whether such sophistication is required to produce valuable information. I think not.

Claim 5: Research Which Leads to the Creation of a Product or Service Is Unethical

Let me quote the exact text to which I am referring…

When I (Ferguson) mentioned to Jen Hood how many of the studies she reviewed have spawned new guiding metrics or entirely new products, she was surprised anyone takes those metrics or products seriously.

“Anyone claiming that they have a metric which mimics Google is asserting that they’ve established many cause-effect relationships that lead to a specific ranking on Google,” Jen wrote, referring to Moz’s Domain Authority.

No. It is difficult for me to take Ferguson seriously at this point (although I give Jen the benefit of the doubt, as she is not part of the industry and does not have key background information on Domain Authority).

Moz, with regard to Domain Authority, makes no causal claims. Humorously, the masthead on my Twitter account for two years was “DOMAIN AUTHORITY IS NOT A RANKING FACTOR” because so many people made incorrect assumptions about it. In fact, even though I now work for System1, my pinned tweet is still Google Does Not Use Moz’s Domain Authority as a Ranking Factor.

First, Moz doesn’t claim to have a metric which mimics Google. Moz claims to have a metric which predicts, with some degree of accuracy and based solely on domain-level link metrics, the likelihood that a site will rank. Domain Authority is a machine-learned metric trained on SERPs. We make no claim that there is a cause-effect relationship between increasing Domain Authority and increasing rankings, or between increasing any of the constituent features of Domain Authority and increasing rankings. Perhaps Jen would like to read up on some of my articles on Domain Authority.
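
For illustration only, here is the general shape of what “a machine-learned metric trained on SERPs” means; the features and data below are made up, and this is emphatically not Moz’s actual model or feature set. The model is fit to predict observed rank positions from domain-level link features, which is a descriptive exercise, not a claim that changing those features causes rankings to move.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical domain-level link features for 500 ranking URLs:
# [log linking root domains, log total links, share of followed links]
X = rng.normal(size=(500, 3))

# Simulated observed rank positions (1 = top). In reality these would come
# from scraped SERPs, not from a formula we get to see.
latent = X @ np.array([0.8, 0.3, 0.1]) + rng.normal(scale=0.5, size=500)
rank = np.clip(np.round(5.5 - 2.0 * latent), 1, 10)

# Fit a simple predictor of rank from the features (intercept + weights).
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, rank, rcond=None)

def authority_like_score(features: np.ndarray) -> float:
    """Predicted tendency to rank well. Descriptive, not causal."""
    return float(coef[0] + features @ coef[1:])

print(authority_like_score(np.array([1.0, 0.5, 0.2])))
```

Such a score summarizes what tends to rank; it says nothing about what would happen to rankings if you manipulated the inputs.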

So, what is the value of Domain Authority if it doesn’t purport to play a causal role? Well, I give a handful of scenarios in my piece “In Defense of Domain Authority”. Perhaps the most obvious usage is to compare your Domain Authority with your competitors’ in order to help determine whether ranking difficulties are more likely due to links or to poor content. Used as a rule of thumb, DA, like DR and CF/TF, can be very useful.

Perhaps what is most frustrating about this particular claim is that the data scientists, engineers and mathematicians who work on Domain Authority are eminently qualified – they include ex-Google engineers, a Statistics Professor with a PhD in Applied Mathematics, and an Artificial Intelligence expert with a BS and PhD in Applied Mathematics.  I was, by all accounts, the least qualified person in the room, but what I did bring was domain knowledge – an incredibly important part of being an effective data scientist which is curiously missing from Ferguson’s piece.

Claim 6: Sampling Is Biased

This depends on the study. Traditionally, I have approached sampling from a number of directions, recognizing that there is no perfect solution. At Moz, when studying link graphs, we created an approach to sampling URLs from the web based on a methodology originated by Google for a similar purpose. When sampling keywords, we would normally use a stratified sample of keywords based on search volume and CPC.
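
As a sketch of the keyword side (the column names, bucket counts, and sampling fraction here are illustrative assumptions, not the exact scheme we used), stratification simply means drawing proportionally from volume-and-CPC buckets rather than grabbing whichever keywords are easiest to find.

```python
import pandas as pd

# Hypothetical keyword export with search volume and CPC columns.
keywords = pd.read_csv("keywords.csv")  # assumed columns: keyword, volume, cpc

# Bucket keywords into quartiles of volume and CPC...
keywords["volume_bucket"] = pd.qcut(keywords["volume"], q=4, labels=False, duplicates="drop")
keywords["cpc_bucket"] = pd.qcut(keywords["cpc"], q=4, labels=False, duplicates="drop")

# ...then draw the same fraction from every stratum so the sample mirrors
# the keyword population rather than over-representing any one segment.
sample = keywords.groupby(["volume_bucket", "cpc_bucket"]).sample(frac=0.05, random_state=42)
```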

With regard to Jumpshot in particular, data was acquired from Avast and AVG (desktop and Android) users, representing a significant proportion of users in the United States. We were well aware of the biases in the data (no Mac users, for example).

It is important to point out that simply because we can identify a way in which a sample is imperfect does not mean that it necessarily produces inaccurate results. Take, for example, national polls for presidential campaigns. A relatively small number of respondents can give accurate predictions to within a few percentage points. Yet we know that sampling in polling is imperfect for a wide array of reasons (people who won’t answer calls from unknown numbers, people with unlisted phone numbers, over-representation of people who still have landlines, people who are only reachable at certain times of day, people who are unwilling to give political opinions over the phone).

If one wishes to be contrarian about the outcome of a particular study and believes the cause is poor sampling, then one needs to explain the causal chain that converts the sampling issue into a biased outcome. And even if that causal chain exists, the results can still be valuable as long as we are aware of the bias.

Claim 7: There Is No Peer Review in Our Industry

There is no formal peer review, but there is certainly scrutiny. This is not unique to our industry – in fact, it is such a big problem in academia that it has been dubbed the “Replication Crisis”. In a 2016 survey, 70% of scientists said they had failed to reproduce another scientist’s experiment. While I certainly encourage reproducing studies, if we set that as a necessary standard we pull the rug out from under far more than SEO… from all of modern science.

Claim 8: Epistemic Positivism Regarding Truth and Knowledge

Ok, so this is a little esoteric, but it is important to respond. The claim “there is no truth apart from experimental verification of that truth” is self-refuting. It undermines itself because you cannot run an experiment to prove that experiments are the only source of truth; doing so would presuppose that experiments are the source of truth and thus be circular reasoning. Where is the experiment which shows that Ferguson’s article is true?

We can get along making reasonable inferences to the best explanation by considering testimony, correlation, direct experience, and experiments. And we can give greater epistemic warrant to experiments over testimony, for example, but we have to be careful. Ferguson’s entire argument is based on the testimony of one analyst. And is he qualified to understand her? And how would he know if he was or was not?

This is all nonsense. This degree of skepticism devolves into meaninglessness.

Claim 9: A False Equivalency Between Certainty and Usefulness

I think this is the most important of all the critiques.  We don’t have to be certain that a tactic works or a ranking factor is real in order for it to be useful – we merely need to be right more often than our competitors. That’s it.

Concluding Thoughts

If the article followed the title, I would have no concerns with its contents. Of course we do not have the mathematics to decode Google’s algorithm. Such mathematics do not exist. But this article quickly moved away from the question of unraveling Google’s algorithm to a much broader question: do we have the mathematics to learn about the algorithm and optimize accordingly? To that question, the answer is an emphatic Yes. It is borne out every day by the successes of our fellow SEOs. I should hope that Jeff Ferguson believes that; otherwise, what is he selling?
