josh.com

Why did Google declare war on HTTP?

Google is waging a war to force websites to only serve content over secure https connections by demoting the search ranking of websites that continue to use normal http connections. In Google’s own words: “…we’re also working to make the Internet safer more broadly. A big part of that is making sure that websites people access from Google are secure.”

At first take, this seems like a magnanimous move by the internet’s benevolent dictator. Security is a good thing, so by forcing lazy websites to finally go secure we are all better off… right?

Unfortunately, things are not so simple and Google’s motivations are likely not so benevolent…

The benefits of https

https encrypts the conversation between you and a website so that no one can electronically eavesdrop on or change any content. This is important for connections where secrets are exchanged. You do not want the person sitting next to you at Starbucks to be able to grab your account number and balance as you log into your bank account over the public Wifi. As long as your connection to the bank is using https, the most an eavesdropper can divine is what website you are looking at; they cannot see the actual data (or at least not easily).

The other benefit of https is that it guarantees that the pages you see are really coming from the website you see in your address bar. With just http, it is possible for someone who happens to control the network between you and the website (e.g. the person running the Wifi access point you are using, or your ISP) to spoof the website and send you their own content while making it look like it came from the web server you thought you were connected to.

These both sound like worthy benefits that everyone would always want, right?

https is not free

Unfortunately, https is not all benefit – there are significant costs to serve a site over https compared to http.

https costs to users

From the point of view of you the user, pulling up the exact same site over an https connection will be slower than it would have over an http link because…

A Google engineer sitting in his apartment in San Francisco browsing on his Mac Pro connected to a 150 megabyte per second dedicated fiber link might not notice the extra 25 milliseconds https cost him to connect to a website. A 12-year-old kid in Tanzania will definitely notice the extra 10 seconds for his OLPC laptop to pull up the same site over his village’s shared satellite down-link.

https costs to websites

From the point of view of a website, serving the exact same site over https will be more expensive than it would have been over http for all the same reasons it hurts users.

A given website will need more or faster servers and more bandwidth to serve the exact same traffic over https than it would have needed to serve it over http.

Google has more than 1 million cutting-edge, custom designed and built servers connected to their 1 petabit/sec of bandwidth distributed across data centers around the globe. They can well afford the extra overhead of making all connections https.

For smaller websites with modest hardware and network connections located in far away places, the costs of being forced to needlessly switch to https can have a huge impact.

Does the Wikipedia logo really need to be secured with https?

For some content, the extra costs of https are well worth it. I absolutely want to log into my bank account over an https connection even if it takes a bit longer. But what about the Wikipedia logo?

What benefit does anyone get by forcing this image (or the Wikipedia homepage) to go over https? None.

There is a huge amount of content on the internet that does not need the protections of https and does not justify the costs. But Google does not make any distinctions: it punishes any website that refuses to switch over to https, regardless of the nature of the content.

Why would Google do this?

Maybe they really are just trying to do the good and right thing for the world, but are blind to the negative impacts of their actions on people not like themselves.

Or maybe, just maybe…. (Warning: wacky conspiracy theory ahead)

It’s all about the ads

Google made $74 billion last year and 90% of that came from serving ads. Making sure that you see ads is very important to Google.

Blocking ads used to be something that was possible, but it really required some heavy tech skills to actually pull off and the results were spotty. Blocking (or at least trying to block) ads was more of a geeky act of protest than a useful practice.

Recently, ad-blocking technology has gotten better and easier. One way of blocking ads is to use ad-blocking software. In the past, Google has shown a willingness to use some surprisingly heavy-handed tactics to try to stop ad-blocking. But even with their wild popularity, the impact of these blockers is limited because they must be individually installed on each device. Users must take an active step to block the ads on their own device.

More troubling, I think, to Google is the other way of blocking ads – content-modifying proxy servers. These proxies are able to filter out ads wholesale for every downstream user, and they work without any user action required. They also work for all types of devices, from text-based Linux PCs to Apple watches – all automatically, without any software or changes on the device at all. The user might not even know that the ad filtering is happening; they just don’t see the ads in the webpages they pull up.

It is almost impossible to stop a filtering proxy from blocking ads over http connections, but conversely it turns out that it is almost impossible for a proxy to filter ads over https connections. Might this be a motivation for Google to attempt to effectively ban the http protocol?

A funny thing happened on the way to my server

I needed to debug a problem with a web app running on the computer sitting next to me, so I pulled up the webpage on my phone and then looked through the webserver’s logs to find my connection. It should have been very easy to find since my phone was on the same physical network as the web server I was debugging – but the request was nowhere to be found. How could this be possible? I knew the phone somehow got the page because it was on my screen, but how could it do so without making a log entry?

After much head scratching and trial and error, I finally found the log entry that corresponded to my phone’s request, but it was coming from Mountain View, CA! I was in NYC, my phone was in NYC, my server was in NYC, so how was this request getting routed through a server on the other side of the continent?

After even more head scratching and network sniffing and reverse DNSing, I discovered that…

By default, every Chrome http request from every Android on earth is redirected to google servers over an encrypted connection

Did you hear that? Let me say it again. If you buy a brand new Android phone, turn it on, connect it to your Wifi network, and then open a webpage from a server that still accepts http requests, that request is automatically and silently captured by the phone, encrypted, and rerouted to a Google proxy server. This is a real thing, people.

Hmmmm…..

UPDATE 1/25/2018

A web app that I wrote more than a decade ago, and that has been chugging along since then without a single hiccup… broke today. After an hour’s worth of work, the issue turned out to be that Flickr now literally hangs up the phone (sends a 0 byte response) on any incoming connection that does not support one of the newest versions of HTTPS. Because the server this app is running on is so old, it literally cannot work any more thanks to this policy that is supposed to be protecting me. Good thing no one can eavesdrop on me downloading lists of publicly visible photos from Flickr!

UPDATE 12/21/2020

Predictably, moving all traffic to https prevented caching, which hurt performance and was especially hard on websites that serve the same content over and over again to many different people on mobile (think about a NY Times article: you explicitly do not want a personalized version).

So Google came up with a plan to fix the problem they created, while further advancing their strategic goals. It was called AMP, and effectively what it did was make Google a cache for your web content. With AMP, your content is literally stored on and served from Google servers – it does not even have your URL any more. Seriously.

If you want to see the AMP version of this very webpage, it is *not* at josh.com – it is here…

https://wp-josh-com.cdn.ampproject.org/c/wp.josh.com/amp/2016/04/01/googles-war-on-http/

Click on it and you will see this web page, but the origin is explicitly ampproject.org, served over HTTPS from Google’s servers. Do you understand how messed up this is? Here is the Google narrative…

  1. We need to kill http to protect people from having their webpages intercepted
  2. …which predictably is devastating to the performance of highly cacheable content
  3. …so we introduce a new (complicated Google) system that intercepts webpages.

So some people started grumbling that Google was redirecting and intercepting large parts of the internet into their proprietary system, so Google “opened” AMP up and made a committee with some non-Google people on it to give the appearance that it was not just a Google thing. Well, one of those non-Google people just resigned from the AMP committee and said…

The stated goal of the AMP AC is to “make AMP a great web citizen.”

I am concerned that – despite the hard work of the AC – Google has limited interest in that goal.

I have resigned from the Google AMP Advisory Committee

…to the surprise of no one following this issue.

FAQ

Q: Come on, I’ve tested it and the extra latency for establishing an https connection is nominal.
A: You almost certainly do not live on a high latency link like huge parts of the world do. Even on an amazing first-world awesome high speed HughesNet Gen4 link, each round trip costs 500+ milliseconds. Now imagine how it feels to be on the other end of a multi-hop link in Asia or Africa where Google’s righteous we-know-what’s-best-for-you stand means you have to wait an extra 5 seconds every time you connect to a new https website.
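To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes that a full TLS 1.2 handshake adds 2 round trips and a TLS 1.3 handshake adds 1, on top of the TCP handshake that plain http pays too. The 500 ms figure is the satellite round trip mentioned above; the 10 ms "city fiber" figure and the function name are made up for illustration. Real connections can cost more (certificate checks, retransmits on lossy links), so treat this as a lower bound rather than a measurement.

```python
# Rough estimate of the extra connection-setup time https adds on a given link.
# Assumption: full TLS 1.2 handshake = 2 extra round trips, TLS 1.3 = 1 extra
# round trip, beyond the TCP handshake that http also needs.
EXTRA_ROUND_TRIPS = {"TLS 1.2": 2, "TLS 1.3": 1}


def extra_https_delay_ms(rtt_ms: float, tls_version: str = "TLS 1.2") -> float:
    """Extra milliseconds spent on handshaking before the page can be requested."""
    return EXTRA_ROUND_TRIPS[tls_version] * rtt_ms


# Hypothetical links: the 10 ms value is illustrative, 500 ms is the satellite case.
for label, rtt_ms in [("city fiber", 10), ("satellite", 500)]:
    for version in EXTRA_ROUND_TRIPS:
        delay = extra_https_delay_ms(rtt_ms, version)
        print(f"{label:12s} {version}: +{delay:.0f} ms per new connection")
```

The cost scales linearly with the round trip time, which is why the same handshake that is invisible on a short fiber hop becomes a noticeable pause on a satellite or multi-hop link.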

Q: I don’t want eavesdroppers to be able to see what websites I am visiting, so I want all connections to be https.
A: First off, while https does hide the url and content of the pages you are pulling up, it does not hide what websites you are visiting. An eavesdropper can still see that you visited facebook.com and then cupcakes.com even if those connections were completely over https because (1) the address and port you connect to are still in the clear, and (2) your DNS requests preceding the https opens are in the clear. Sorry.
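Point (2) is easy to see for yourself. The sketch below hand-builds a classic DNS query packet, assuming plain unencrypted DNS (the kind normally sent over UDP port 53) rather than any encrypted resolver setup. The function name and the query id are hypothetical; nothing is sent on the network.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal classic (unencrypted) DNS query for an A record."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, and a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


packet = build_dns_query("cupcakes.com")
print(packet)
# The site name sits in the packet as readable ASCII for anyone on the path
assert b"cupcakes" in packet
```

Nothing in that packet is encrypted, so anyone who can see it learns the name of the site before the https connection is even opened.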

Next off, if you want to hide what you are doing and you are willing to incur any extra overhead that might cause, that’s fine and it is your choice. You can either set your browser to use https by default or manually enter https when you go to a new (secret) website. But Google is forcing all websites to serve https to all visitors for all content. If I try to go to “http://wikipedia.com”, I am unilaterally redirected to the https site and there is nothing I can do to stop it. I am forced to take all the costs of https even though I would much rather go over http. I am all for giving people more choices and trusting them to make the right decisions for themselves. Google’s no-http policy takes options away from people.

Q: I am not worried about someone else seeing the same Wikipedia article that I am seeing, but I need https to ensure that the Wikipedia page I see has not been messed with.
A: This is a very valid thing to do, but https is a terrible way to do it. If this were really what Google was concerned about, they could have advocated having content providers sign their content. This would make it possible for browsers to verify that the content really came from the expected source and had not been altered. Signing is much more efficient than encrypting because (1) signed content can be locally cached and is still verifiable as authentic without any round trips to the remote server at all, and (2) while signing something does take some computation, you only need to sign any object once in an off-line process rather than having to do a computationally expensive re-encryption each and every time it is sent down an https connection.

Further, signed content is potentially even more secure than relying on https. An https server must both (1) be connected to the internet and (2) have access to private authentication keys. Since these machines are on the internet, they are vulnerable to attack, and once compromised they can be used to serve malicious content that will still be authenticated by the valid private key on the server. With content signing, all signing can be done before the content is loaded onto the web servers, so the private keys need never touch any internet-facing machine. An attacker who compromises a web server with signed content cannot change that content without invalidating the signature.
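Here is a minimal sketch of the sign-once, verify-anywhere idea. It uses a toy Lamport one-time signature, chosen only because it needs nothing beyond the Python standard library. This is an assumption for illustration, not a proposal: a real deployment would use an established signature scheme and would need some way to distribute public keys, and a Lamport key can safely sign only one message. All function names here are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    """Yield the 256 bits of the message's SHA-256 digest, most significant first."""
    for byte in _h(message):
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: the hashes of those secrets."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(private, content: bytes):
    """Done once, off-line: reveal one secret per bit of the content digest."""
    return [pair[bit] for pair, bit in zip(private, _bits(content))]


def verify(public, content: bytes, signature) -> bool:
    """Done by the reader, possibly on a cached copy, with no round trip to the publisher."""
    return all(
        _h(sig) == pair[bit]
        for sig, pair, bit in zip(signature, public, _bits(content))
    )


private_key, public_key = generate_keypair()  # private key stays off the web server
page = b"<html>The article text</html>"
signature = sign(private_key, page)

assert verify(public_key, page, signature)
# A compromised server or proxy that alters the page cannot produce a matching signature
assert not verify(public_key, page + b"<script>evil()</script>", signature)
```

Note that only `verify` and the public key are needed wherever the content is served or cached; the private key is used once, in `sign`, and never has to be on an internet-facing machine.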

Q: This is about privacy. Forcing all traffic over https blocks my government/isp/employer/parents from filtering my web traffic!
A: Anyone who is in a position to filter your http traffic is also in a position to filter your https traffic. With http, they have the option of only filtering specific pages or content while letting the rest of a site through. With https, their only option is to block the entire site, which in practice is typically what they do.
