
Automatically rewrite HTTP links to HTTPS for sites in the HSTS preload list
Closed, Resolved · Public

Description

Instead of having bots constantly updating links on wikis to swap from HTTP->HTTPS, we should have MediaWiki rewrite them. We can use the HSTS preload list to get a list of domains that are now HTTPS only.
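In outline the rewrite is simple. A minimal sketch of the core idea (not the extension's actual code): only the rendered HTML changes, never the stored wikitext. `isPreloaded()` here is a stand-in for a real lookup against the preload data; one possible shape of that lookup is sketched further down the thread.

```
<?php
// Sketch: if an external link uses plain http:// and its host is on the
// HSTS preload list, emit the link with https:// instead.

function rewriteToHttps( string $url ): string {
	$bits = parse_url( $url );
	if ( !$bits || ( $bits['scheme'] ?? '' ) !== 'http' || !isset( $bits['host'] ) ) {
		// Only plain http:// URLs with a host are candidates.
		return $url;
	}
	if ( isPreloaded( $bits['host'] ) ) {
		// Swap just the scheme; the rest of the URL stays untouched.
		return 'https://' . substr( $url, strlen( 'http://' ) );
	}
	return $url;
}
```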

Legoktm's analysis indicated that this would affect ~3% of all HTTP links on the English Wikipedia.

Some prior discussions/motivation:

Event Timeline

This is awesome! What a nice boon for the security of users on older or alternative browsers (especially the hundreds of millions of people using UC Browser). Let me know if you need any help re: the HSTS preload list; I know the guy who manages it.

Oh, and let's make sure to support preloaded TLDs too. There's currently 13, with more on the way.
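Suffix matching handles that case naturally: walk up the host's label chain and honour includeSubdomains entries, so a bare TLD entry like `dev` covers every host under it. A hedged sketch of such a lookup, where `$preload` maps each domain to its includeSubdomains flag:

```
<?php
// Hedged sketch of a preload lookup that also covers preloaded TLDs.
// $preload maps domain => includeSubdomains flag, so an entry like
// 'dev' => true upgrades every host under the .dev TLD.

function isPreloaded( string $host, array $preload ): bool {
	$host = strtolower( rtrim( $host, '.' ) );
	$labels = explode( '.', $host );
	$n = count( $labels );
	for ( $i = 0; $i < $n; $i++ ) {
		$suffix = implode( '.', array_slice( $labels, $i ) );
		// The exact host always matches; a parent domain (or TLD)
		// only matches if it was preloaded with includeSubdomains.
		if ( isset( $preload[$suffix] ) && ( $i === 0 || $preload[$suffix] ) ) {
			return true;
		}
	}
	return false;
}
```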


There is support for preloaded TLDs already; for example, https://github.com/wikimedia/mediawiki-extensions-SecureLinkFixer/blob/master/domains.php#L2762
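That generated file is, roughly, one big PHP array mapping each preloaded domain to its includeSubdomains flag. Illustrative entries only; the real file is machine-generated and far longer:

```
<?php
// Illustrative sketch of the shape of domains.php -- not its real contents.
return [
	'dev' => true,           // an entire preloaded TLD
	'wikipedia.org' => true, // preloaded with includeSubdomains
	'example.com' => false,  // hypothetical host-only entry
];
```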


For reference, the basic set of test cases I have right now are https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/SecureLinkFixer/+/master/tests/parser/parserTests.txt which includes one for the .dev TLD that is preloaded.
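From memory, a case in that file's format looks roughly like this (the exact attributes the real tests assert in the expected HTML may differ):

```
!! test
Bare http:// link to a host under the preloaded .dev TLD is upgraded
!! wikitext
http://example.dev/
!! html
<p><a rel="nofollow" class="external free" href="https://example.dev/">http://example.dev/</a>
</p>
!! end
```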

I see that you're using the Firefox HSTS preload list. Any particular reason for that? It's a copy of the master list maintained by the Chromium project, except with some additional lag time. So then there's lag time between Chromium and Firefox, and lag time between Firefox and MediaWiki.


Mostly stronger ideological alignment with Mozilla/Firefox, no real technical reason. We are pulling from master of mozilla-central, so it should just be the lag time from Chromium to Firefox (AIUI). Once the list is updated in the MediaWiki extension, it'll take a week (worst case) to get deployed to Wikimedia sites, and then 30ish days (worst case again) for parser caches to expire.
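For anyone curious how the pull from mozilla-central works, here is a hedged sketch of an updater. It assumes the `%%`-delimited `domain, flag` format that nsSTSPreloadList.inc has used in recent years; the extension's real update script is the authoritative version.

```
<?php
// Hedged sketch: regenerate domains.php from Firefox's preload list.

$raw = file_get_contents(
	'https://hg.mozilla.org/mozilla-central/raw-file/tip/' .
	'security/manager/ssl/nsSTSPreloadList.inc'
);
$inList = false;
$domains = [];
foreach ( explode( "\n", $raw ) as $line ) {
	$line = trim( $line );
	if ( $line === '%%' ) {
		// The entries live between two "%%" marker lines.
		$inList = !$inList;
		continue;
	}
	if ( $inList && strpos( $line, ',' ) !== false ) {
		[ $domain, $flag ] = array_map( 'trim', explode( ',', $line, 2 ) );
		$domains[$domain] = $flag === '1';
	}
}
file_put_contents(
	'domains.php',
	"<?php\n// Automatically generated; do not edit.\nreturn " .
	var_export( $domains, true ) . ";\n"
);
```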

Unfortunately the lag time between Chromium's and Firefox's preload list can be substantial, potentially up to one full Firefox release lifecycle (12 weeks). For instance, this batch of domains added to Chromium's list on 2018-05-22 still hasn't made it to Firefox's list as of today, 2018-07-31. So you're looking at up to 13 weeks of total potential lag time (plus up to 30 days for parser caches) between a site being added to the list and it being enforced by this extension.


Mozilla is supposed to automatically update their list every day; I pinged a Mozillian friend of mine and they're looking into it: https://bugzilla.mozilla.org/show_bug.cgi?id=1479918

Yeah, it currently seems to happen only once per release cycle. That might be all they need for their use case, though: they're building a hard-coded list into a compiled and tested executable that will be widely distributed, so they might not want the list changing during a release cycle, as that could make testing difficult. Your use case is different -- you're not building and testing a browser, you just want the most up-to-date version of the list possible.

Hi Cyde - I filed the bug over on the Mozilla side. Updating Firefox's HSTS preload list once per release cycle was never the plan - I'm not sure where that info came from. It's supposed to be updated twice weekly, but a recent infrastructure migration broke the task.

I'm glad to see that Mozilla has fixed what ended up being a larger HSTS preload list syncing bug. Thanks for filing that.

On another note, Legoktm, would it be possible to add a hard-coded/separate list of domains to upgrade links for, in addition to domains on the HSTS preload list? www.google.com isn't HSTS preloaded for complicated reasons related to legacy browser support, but those reasons don't apply to upgrading links originating from WMF sites. Given that www.google.com is the #1 domain on the list of all outgoing links, with over a million of them, this seems like a real boost in security for hopefully little effort.

Let me know if I can help with that in any way.
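If that feature were added, the configuration could look something like the snippet below. To be clear, `$wgSecureLinkFixerExtraDomains` is a hypothetical name for illustration, not an existing setting:

```
<?php
// Hypothetical LocalSettings.php snippet -- this setting does not exist
// in SecureLinkFixer; it sketches what a supplementary hard-coded list
// merged with the preload data might look like.
$wgSecureLinkFixerExtraDomains = [
	// domain => includeSubdomains, same shape as the preload array
	'www.google.com' => false,
];
```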

I'm curious: what's the external blocker?

This is live on the beta cluster now. You can test it on e.g. https://en.wikipedia.beta.wmflabs.org/wiki/Special:ExpandTemplates. Type in http://en.wikipedia.org/, and see that while the wikitext is unchanged, the raw HTML output will use https://.

> Legoktm's analysis indicated that this would affect ~3% of all HTTP links on the English Wikipedia.

I would be curious to see whether this number varies significantly on wikis where KolbertBot is not active. On the Italian Wikipedia, LauBot upgraded links to HTTPS on some 160k articles by using 155k positive rules and 3800 negative rules from HTTPS Everywhere, so far with no reported issues.


Pick a wiki, and I can re-run the analysis :) For reference, the code I used is at https://git.legoktm.com/legoktm/hsts-analysis
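The linked repo has the real analysis; the gist of it fits in a few lines. A sketch, assuming `$links` holds a wiki's external-link URLs (e.g. pulled from the externallinks table) and reusing the `isPreloaded()` lookup sketched above:

```
<?php
// Sketch of the measurement: what share of http:// external links point
// at hosts covered by the HSTS preload list?

$httpTotal = 0;
$upgradable = 0;
foreach ( $links as $url ) {
	$bits = parse_url( $url );
	if ( !$bits || ( $bits['scheme'] ?? '' ) !== 'http' || !isset( $bits['host'] ) ) {
		continue;
	}
	$httpTotal++;
	if ( isPreloaded( $bits['host'], $preload ) ) {
		$upgradable++;
	}
}
printf(
	"%.2f%% of %d HTTP links point to preloaded domains\n",
	$httpTotal ? 100 * $upgradable / $httpTotal : 0,
	$httpTotal
);
```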

Live on group0 wikis now (the left-most column on https://tools.wmflabs.org/versions/). A random page I found to test: https://www.mediawiki.org/wiki/User:Traviadan/Artikelschmiede has a bare link to http://mediawiki.org, which generates the HTML <a class="external free" href="https://mediawiki.org">http://mediawiki.org</a>. Woot.

> Pick a wiki, and I can re-run the analysis :)

German Wikipedia seems a good candidate: I see tons of HTTP links to very popular domains.

Nemo_bis triaged this task as Medium priority. (Jul 25 2019, 8:19 AM)


Posted on https://git.legoktm.com/legoktm/hsts-analysis:

On 2019-07-25, 2.66% of the German Wikipedia's external links using HTTP point to domains that are on the HSTS preload list (291,575 out of 10,965,705 HTTP links total).

For whatever reason, there are a lot of links to .invalid domains; once those are removed, we go up to 2.87% 🤔.

And 2.87% was the same number as English...
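For what it's worth, those figures are self-consistent. Assuming the 291,575 numerator is unchanged by the .invalid cleanup:

```
\frac{291\,575}{10\,965\,705} \approx 2.66\%
\qquad\text{and}\qquad
\frac{291\,575}{0.0287} \approx 10\,159\,800
```

so the denominator implied by 2.87% is about 10.16M links, meaning roughly 0.8M .invalid links were excluded.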

I put up the raw data at https://people.wikimedia.org/~legoktm/hsts-analysis-dewiki/

Legoktm claimed this task.

Deployed everywhere \o/