User Details
- User Since
- May 14 2016, 1:39 AM
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Cyde
Oct 1 2018
I'm curious, what's the external blocker?
Sep 24 2018
Let me know if I can help with that in any way.
Sep 20 2018
I'm glad to see that Mozilla has fixed what ended up being a larger HSTS preload list syncing bug. Thanks for filing that.
Aug 2 2018
Another potential optimization -- because of the above restrictions, the overwhelming majority of entries on the list have 2 labels. Using a dumb regex against the raw Chromium JSON file, I'm seeing 5.1K entries with 3 labels and 49K entries with 2 labels, almost 10X as many. Given that, it might make sense to consult the second-level domain first, then the third-level domain, and finally the TLD. I know the code won't be as clean, but I suspect it'll be faster because you'll be querying in order of highest hit rate.
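The lookup-order idea above could be sketched roughly like this (a hypothetical illustration, not code from the extension -- the tiny `PRELOADED` set stands in for the real ~54K-entry list, and it ignores include-subdomains semantics entirely):

```python
# Toy stand-in for the preload list; contents are illustrative only.
PRELOADED = {"wikipedia.org", "app", "dev", "sub.example.dev"}

def is_preloaded(hostname: str) -> bool:
    labels = hostname.lower().rstrip(".").split(".")
    # Query suffixes in order of highest hit rate: the 2-label suffix
    # first (~49K entries), then 3 labels (~5.1K), then the bare TLD.
    for n in (2, 3, 1):
        if n <= len(labels) and ".".join(labels[-n:]) in PRELOADED:
            return True
    return False

print(is_preloaded("en.wikipedia.org"))  # True: wikipedia.org matches
print(is_preloaded("blog.google"))       # False with this toy set
```

On an in-memory hash set all three probes are O(1) anyway, so the win here is only in how early the loop exits on a typical hit.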
In addition to checking access time, you should also check memory usage. How much bigger is the nested structure than the single level hashmap?
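One rough way to answer that memory question (a sketch with made-up sample data, not a measurement of the real list; `sys.getsizeof` is shallow, so container and element sizes are summed by hand, which is an estimate rather than an exact accounting):

```python
import sys

# Synthetic sample domains standing in for the real preload entries.
domains = [f"site{i}.example{i % 100}.org" for i in range(10000)]

# Single-level structure: one flat set of full suffix strings.
flat = set(domains)
flat_bytes = sys.getsizeof(flat) + sum(sys.getsizeof(d) for d in flat)

# Nested structure: a dict-of-dicts trie keyed label-by-label,
# from the TLD inward (org -> example0 -> site0 -> ...).
nested: dict = {}
for d in domains:
    node = nested
    for label in reversed(d.split(".")):
        node = node.setdefault(label, {})

def deep_size(obj) -> int:
    """Recursively estimate the size of a nested dict of strings."""
    if isinstance(obj, dict):
        return sys.getsizeof(obj) + sum(
            sys.getsizeof(k) + deep_size(v) for k, v in obj.items()
        )
    return sys.getsizeof(obj)

print(f"flat set:    ~{flat_bytes} bytes")
print(f"nested dict: ~{deep_size(nested)} bytes")
```

The per-dict overhead of all those small inner dicts tends to dominate, which is exactly why the comparison is worth running before committing to the nested layout.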
Yes, that's correct, you wouldn't also have to look up .org given that wikipedia.org is in the HSTS preload list. I was thinking from the perspective of some other domain that isn't preloaded individually but that is on a preloaded TLD, e.g. http://blog.google should always be rewritten to https://blog.google because .google is preloaded.
Yeah, it currently only seems to be once per release cycle. That might be all they need though based on their use case, seeing as how they're building a hard-coded list into a compiled and tested executable that will be widely distributed. Your use case is different -- you're not building and testing a browser, you just want the most up to date version of the list possible, whereas they might not want the list changing around during a release cycle as it could make testing difficult.
Jul 31 2018
Small correction to the description -- For that sample domain some.fake.domain.wikipedia.org, you'd also have to check whether org is preloaded, as TLDs themselves can be preloaded (e.g. .app, .dev, .google, and 10 more). Assuming that your 39K static array is kept in-memory and is implemented internally using a hashtable (I don't know enough about the internal workings of PHP), I'd think that'd be good enough. The only negative cache entries you'd be hitting more than some small fraction of the time would be those for common TLDs, at which point that lookup probably costs similar to the lookup into the array.
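The full check described above -- testing every suffix of the hostname down to the bare TLD -- could look something like this (a minimal sketch; the set contents are illustrative, and real entries also carry an include-subdomains flag that this ignores):

```python
# Illustrative stand-in for the ~39K-entry static array / hashtable.
PRELOADED = {"wikipedia.org", "app", "dev", "google"}

def preloaded_suffix(hostname: str):
    """Return the first preloaded suffix of hostname, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the full name down to the TLD: a.b.c -> a.b.c, b.c, c
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PRELOADED:
            return suffix
    return None

print(preloaded_suffix("some.fake.domain.wikipedia.org"))  # wikipedia.org
print(preloaded_suffix("blog.google"))                     # google
```

Each suffix is one hash lookup, so even the worst case (no match, every suffix probed including the TLD) costs only as many probes as the hostname has labels.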
Unfortunately the lag time between Chromium's and Firefox's preload lists can be substantial, potentially up to one full Firefox release lifecycle (12 weeks). For instance, this batch of domains added to Chromium's list on 2018-05-22 still hasn't made it to Firefox's list as of today, 2018-07-31. So you're looking at up to 13 weeks of total potential lag time (plus up to 30 days for parser caches) between a site being added to the list and it being enforced by this extension.
I see that you're using the Firefox HSTS preload list. Any particular reason for that? It's a copy of the master list maintained by the Chromium project, except with some additional lag time. So then there's lag time between Chromium and Firefox, and lag time between Firefox and MediaWiki.
Great. Thanks again!
This is awesome! What a nice boon for the security of users on older or alternative browsers (especially the hundreds of millions of people using UC Browser). Let me know if you need any help re: the HSTS preload list; I know the guy who manages it.
May 21 2016
Here is the existing script that runs on pywikibot-compat: