Should we move to an all HTTPS web? No.
Joost de Valk has started a great discussion about HTTPS everywhere over at his blog, and it is well worth the read. However, I believe he has come to the wrong conclusions. The discussion was spurred on by Bing’s apparent move to HTTPS, which would influence the passing of referrer and, subsequently, keyword data to webmasters from search queries. It is worth noting that, as of this writing, Bing’s HTTPS version is not working and Bing has made no announcement of a move.
Much of this discussion in the SEO community revolves around Section 15.1.3 of RFC2616 which indicates that…
Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.
Subsequently, as a user moves around the web the referrer is not passed when they transfer from an HTTPS to an HTTP website. Yoast makes the case that…
- For practical purposes, you should move to HTTPS so as not to lose referrer data
- For privacy purposes, you should move to HTTPS to protect your users
- In general, referrers should not be passed between websites, only within a website.
I will deal with each of these arguments separately…
Should Referrers be Passed Between Websites
Joost gives this real world example as an argument of why referrers in general should not be passed.
Let’s compare it with a real world case: say that you’re shopping in a mall. You leave store A, and they put a sticker on your back. You enter store B and the shopkeeper there takes the sticker from your back and can see what you looked for in store A. You would argue against that, wouldn’t you?
I find this argument problematic to say the least. There are numerous parts of this metaphor that make it far more insidious than what happens on the web, the most important of which is the level of anonymity. As web users, we generally enjoy human-level anonymity. While IP addresses may be exposed (they are under HTTPS as well), this does not give the shop-owner much information about our actual identity. The sticker on our back is particularly troubling because our person is being tracked, rather than a piece of software that we control and that can hardly be used to identify us. A better scenario would be shopping with a drone that flies from store to store. Would we be so concerned if it had a sticker on its back?
Contrary to Joost’s argument, that sticker is very important, and not just for unscrupulous marketers. While I have yet to see a reasoned argument as to how that referrer data might be used to harm the user, I can think of plenty of ways on plenty of websites for this to be of non-commercial value to many different parties, far too many to enumerate here…
- If the user is led to a non-existent page, the referrer allows a webmaster to reach out to the linking party to make a fix. (helps the traffic sending webmaster)
- If the referrer comes from a known malicious source, e.g. a blog post asking its readers to harass or harm the target site, it can be blocked (helps the traffic receiving webmaster)
- If the referrer indicates a particular keyword, it can be used to create a customized experience for the user, commercial or non-commercial. (helps the user)
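As a rough illustration of the second and third uses above, here is a minimal Python sketch of how a receiving site might read the referrer. It is not taken from any particular analytics package. The function names, the blocklist host, and the assumption that the search engine carries the query in a `q` parameter are all hypothetical choices made for the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical blocklist of referring hosts; the name is a placeholder.
BLOCKED_REFERRER_HOSTS = {"harassment-blog.example"}


def referrer_host(referer):
    """Return the host part of a Referer value, or None if it is absent."""
    if not referer:
        return None
    return urlparse(referer).hostname


def is_blocked(referer):
    """True when the request arrived from a host on the blocklist."""
    return referrer_host(referer) in BLOCKED_REFERRER_HOSTS


def search_keywords(referer, param="q"):
    """Return the search query carried in the referrer URL, or None.

    Assumes the search engine puts the query in the `param` query-string
    parameter ("q" here), which is an assumption of this sketch.
    """
    if not referer:
        return None
    values = parse_qs(urlparse(referer).query).get(param)
    return values[0] if values else None
```

A request arriving with no referrer at all (for example, after an HTTPS to HTTP hop) yields `None` from both helpers, which is exactly the loss of information discussed here.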
A counter example to Joost’s real-world description would be a shopkeeper who…
- Keeps having shoppers show up at the back door
- Keeps having shoppers show up to vandalize his store
- Keeps having shoppers show up who are interested in one particular product line
Unfortunately, the HTTP and HTTPS specs never indicate that a referrer SHOULD be passed anywhere, only that it SHOULDN’T be passed from HTTPS to HTTP. More websites moving to HTTPS will cause this chain of referrer data to sever more often, removing much value from the web.
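The single rule quoted from Section 15.1.3 can be sketched in a few lines of Python. This is only a model of the rule as quoted, not of any browser's real implementation; the function name is hypothetical, and returning True for every other case simply reflects that the spec does not forbid sending the header there.

```python
from urllib.parse import urlparse


def may_send_referer(referring_url, target_url):
    """Model the RFC 2616 15.1.3 rule: no Referer from a secure page
    to a non-secure request. Every other combination is allowed
    (allowed, not required) in this sketch.
    """
    from_scheme = urlparse(referring_url).scheme.lower()
    to_scheme = urlparse(target_url).scheme.lower()
    return not (from_scheme == "https" and to_scheme == "http")
```

Under this model, only the HTTPS to HTTP transition drops the header, which is why each additional HTTPS site linking out to HTTP sites severs the chain a little more often.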
But here is the most important part… if you believe that cross domain referrer passing is intrinsically unethical, you needn’t install HTTPS to accomplish your goal. Simply scrub your referrers.
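One way to scrub referrers on outbound links without touching HTTPS is to ask the browser not to send them. The sketch below is a minimal WSGI wrapper in Python that adds a `Referrer-Policy: no-referrer` response header. Both the choice of that header and the wrapper name `scrub_referrers` are assumptions of this example rather than anything prescribed above, and it only works in browsers that honor the header.

```python
def scrub_referrers(app):
    """Wrap a WSGI app so every response asks the browser to omit the
    Referer header on navigations and requests made from the page.
    """
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any policy the app already set, then add ours.
            headers = [(k, v) for (k, v) in headers
                       if k.lower() != "referrer-policy"]
            headers.append(("Referrer-Policy", "no-referrer"))
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped
```

Wrapping an existing application is then a one-liner, `application = scrub_referrers(application)`, where `application` stands for whatever WSGI callable the site already serves.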
Now, one could argue that if EVERYONE moved to HTTPS, this wouldn’t be a problem… but there are real limitations to this…
What Would Moving Everyone to HTTPS Look Like
This is a lot harder than one would think. First, Certificate Authorities generally request that a unique IP address be assigned to each cert. We know this is a real problem with IPv4. Costs aside, it would actually be impossible to provide every website with its own IP address until there is widespread adoption of IPv6. Luckily, IPv6 is on the way (it has been for years), but it actually has built-in security that invalidates much of the need for the kind of software-layer encryption which SSL offers.
Of course, a website might use Domain Validated (DV) certificates only, which is, to say the least, concerning. Users are then left with a presumption of security that is not necessarily there. To date, SSL certs have served the dual purpose of encrypting the communication with a website and verifying the owner of that website. DV certs, generally speaking, do not do the latter, but would be a necessary part of creating the HTTPS-everywhere kind of scheme that Joost envisions. Of course, if you believe the referrer should not be passed on some sort of weak ethical privacy argument, then this is of no concern to you. But, once again, I point out that you needn’t use HTTPS to accomplish your goal; just scrub the referrers.
Does HTTPS Offer True Privacy?
The answer here is a resounding no. HTTPS really only prevents sniffing, that is, the practice of some third party snooping into your connection. It does not prevent a government from subpoenaing the data the webmaster has on you. It does not prevent a government from subpoenaing your ISP to see where your traffic is sent. It does not prevent the government from using any number of techniques to access your software or the website’s software to gather otherwise private data.
This is not to say preventing sniffing of this nature isn’t valuable for protecting your data from general hacking concerns, but it is a very specific purpose. HTTPS should not be tossed around as a solution to concerns about referrers, a problem for which there are simpler solutions.
Concluding Thoughts
IPv6 will give us far greater implicit protection than what we have now. If you are truly concerned about referrers, just scrub them. Don’t worry about spending money on an SSL cert for a site that does not cover sensitive matters. Finally, and most importantly, focus on educating your users. They have the capacity within their software right now to make decisions about how their referrer is passed and how cookies are handled. Empower them to make choices; don’t strip them of those choices as Google does.
While I have yet to see a reasoned argument as to how that referrer data might be used to harm the user
Imagine you’ve recently found out that you’ve got a sensitive health issue such as HIV. You’re doing research about treatments for that health issue on Google using HTTP not logged into your Google Account (covering all bases for the sake of the example) and click through to a couple of sites.
Third party tracking scripts on the destination website also receive the referrer data from your Google query. Now those advertising networks have the ability to allow their clients to target you based on your search behaviour – not unlike how you target users in Google AdWords.
Someone else uses your computer, tablet or phone and notices that while browsing the web they keep seeing ads regarding HIV related issues – initially they pass it off as crazy advertising but then it keeps happening.
Soon they begin to wonder whether their friend, or someone else using that computer, might have HIV, which is why they are seeing so many HIV related ads while using that device.
Suddenly, the secret that the person desperately wanted to keep gets out without their knowledge, because query data leaked not only into the website they visited but also into the half dozen different tracking scripts/beacons that website has running.
Sadness follows.
I don’t dispute that Google play a massive role in this simply due to their size and penetration throughout the world.
However, I don’t believe that searching on Google in and of itself results in your search criteria leaving Google – that would be a violation of their privacy policy.
That being said, your behavioural data is available to advertisers to target you – but it isn’t down to a keyword level which is what the referrer header has contained for such a long time now.
I think this comment could use some fleshing out. Those third party trackers know the page you are currently on is relevant to HIV. Without the keyword they don’t know what the user was searching for, just that they viewed that page.
I appreciate they could glean useful information from the URL, but the tracker is going to fire for all traffic hitting that page, not just the traffic coming from Google. In due course, they may not even know the traffic came from Google if Google continues to dial up the removal of the referrer header.
Ultimately I guess my point was that sensitive information can leave the website you’re browsing and lead to you being targeted everywhere you go online by different businesses which could lead to a less than ideal outcome.
After writing the above, I’m not tempted to install a plugin to strip my referrer headers in all instances – just to see how I get targeted online as I move about.