
Check all wikis for inclusions of http resources on https
Closed, Resolved, Public

Description

Per IRC, it was noted that numerous sites were including arbitrary HTTP resources in CSS, JS, and elsewhere, which isn't good: browsers flag such mixed content, and it undermines the point of serving the page over HTTPS.

If we can gather a list of offending wikis, we can look at trying to fix them up.

Visiting the main page on all wikis via HTTPS should be enough.

Details

Reference
bz34670

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:11 AM
bzimport set Reference to bz34670.

Chris, just a further note on this: it's not something we need to do on every upgrade; it's a one-off task following our proper HTTPS switchover in the latter half of last year.

Links don't matter so much, but inclusion of resources matters more.

<Nemo_bis> Reedy, hoo has switched about 3000 MediaWiki messages (with CSS, JS, whatever) to protocol-relative URLs and this should have fixed most of it on most wikis
<Nemo_bis> except evil JS which produces HTTP links with string manipulation
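For illustration, the fix Nemo describes replaces absolute, scheme-specific URLs with protocol-relative ones, so the browser reuses whatever protocol the page itself was loaded over. A minimal sketch (the helper name is mine, not from the actual cleanup scripts):

```javascript
// Turn an absolute http(s) URL into a protocol-relative one, e.g.
// "http://upload.wikimedia.org/x.png" -> "//upload.wikimedia.org/x.png".
// Over HTTPS the browser then fetches the resource via HTTPS automatically.
function toProtocolRelative(url) {
  return url.replace(/^https?:/i, '');
}
```

This is exactly why it can't help the "evil JS" case: a URL assembled at runtime via string concatenation never passes through a message that could be edited this way.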

If I understand this correctly: checking on Commons, given that Commons seems to be the most complex wiki to have gotten 1.19 so far, I find many hundreds of results for "http://" at https://commons.wikimedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=http%3A%2F%2F&fulltext=Search&ns8=1&profile=advanced

A quick check of the first few dozen results yields some suspicious links:

http://toolserver.org
http://stats.wikimedia.org
http://s23.org/wikistats/wmspecials_html.php
http://www.wikilovesmonuments.be

Reference to the cleanup mentioned above: [[m:Stewards'_noticeboard/Archives/2012-04#Fixing_HTTPS_on_Wikimedia_wikis]].

(In reply to comment #5)

A quick check of the first few dozen results yields some suspicious links:

http://toolserver.org
http://stats.wikimedia.org
http://s23.org/wikistats/wmspecials_html.php
http://www.wikilovesmonuments.be

What do you mean suspicious?
Hoo, in the link above, said that Commons had already taken care of the problem. Most URLs in the search results, from a cursory reading, seem to be simple external links which can't always be replaced (although interwikis can be used for many toolserver.org URLs and all stats.wikimedia.org resources).
I'm not sure, however, whether anyone went through all of them.

It's true that I already fixed several thousand MediaWiki: pages that included HTTP-only URIs, but there are some that can't be fixed that easily (e.g. where the scripts are obfuscated or use nested imports [scripts which include other scripts]).

Chris Steipp is working on that following an idea of mine, together with the 3rd party include problem. The idea behind that project is to scan the wikis using a real-user browser scenario, so that we find all inclusions, no matter how they're produced.

CCed and assigned to Chris... he'll post the results on Meta-Wiki as soon as he gets them.

(In reply to comment #7)

Chris Steipp is working on that following an idea of mine, together with the 3rd party include problem. The idea behind that project is to scan the wikis using a real-user browser scenario, so that we find all inclusions, no matter how they're produced.

This also needs enabling (or otherwise including) all gadgets.

(In reply to comment #7)

Chris Steipp is working on that following an idea of mine, together with the 3rd party include problem. The idea behind that project is to scan the wikis using a real-user browser scenario, so that we find all inclusions, no matter how they're produced.

Hoo, Chris, any update here, or way to help?

Yes, I can give you an update; this is work in progress:
Chris had it running on his personal laptop, but it broke after a change. As doing this on a laptop isn't too good an idea for such a large automated task, the plan is to migrate it to a Labs instance after the scripts have been fixed.

I (finally) got around to writing a PhantomJS script that loads Main_Page on all 878 wikis in all.dblist over HTTPS, and looks for any requests to http:// URLs.

Current result for this issue:

https://ce.wikipedia.org
loads http://upload.wikimedia.org/wikipedia/commons/1/10/Wikipedia-logo-v2-200px-transparent.png

https://sa.wikipedia.org
loads http://strategywiki.org/w/index.php?title=User:Najzere/edit_counter.js&action=raw&ctype=text/javascript

https://ve.wikimedia.org redirects to http://wikimedia.org.ve; no SSL on that domain.
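The PhantomJS script itself isn't attached to the task, but the core check it performs can be sketched as follows. This is a simplified, static-HTML version and an assumption on my part; the real script watched live network requests, which also catches URLs built by JavaScript at runtime:

```javascript
// Scan an HTML string for subresources loaded over plain http://.
// Plain links (<a href>) are deliberately ignored, matching the comment
// above that links matter less than embedded resources; only scripts,
// stylesheets, images, and frames count as mixed content.
function findInsecureResources(html) {
  const insecure = [];
  const re = /<(?:script|img|iframe|link)\b[^>]*?(?:src|href)\s*=\s*["'](http:\/\/[^"']+)["']/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    insecure.push(match[1]);
  }
  return insecure;
}
```

A real-browser scan is strictly stronger than this: it sees requests made by obfuscated or nested-import scripts, which no amount of static grepping will find.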

(In reply to comment #11)

https://ve.wikimedia.org redirects to http://wikimedia.org.ve, no ssl on that domain

Redirect works fine for me in my browser... no HTTPS on the target, but it doesn't try to go to HTTPS.

(In reply to comment #11)

I (finally) got around to writing a phantomjs script that loads Main_Page on all 878 wikis in all.dblist over https, and looks for any calls to http urls.

Does the script work as an unregistered user? The easy bulk of such problems (stuff loaded by default, "http" mentions in the MediaWiki namespace) was already resolved, though of course people keep reintroducing more; gadgets are more unpredictable.
Quentinv57 has a JavaScript for global preference changes; maybe it can be adapted to enable all gadgets on all wikis for the test account.

Next iteration will be to have it log in, and add every gadget available on every wiki... but I haven't figured that part out yet.
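The gadget-enabling step never got figured out in this thread. One plausible route (a sketch under assumptions — the helper name and example gadget id are mine, not from the task) is the standard MediaWiki action=options API, since gadget preferences are stored under gadget-<id> keys:

```javascript
// Build the query string for a MediaWiki action=options request that
// enables one gadget for the logged-in test account. The gadget id and
// endpoint usage here are illustrative, not taken from Chris's script.
function buildGadgetOptionRequest(gadgetId, csrfToken) {
  return new URLSearchParams({
    action: 'options',
    format: 'json',
    optionname: 'gadget-' + gadgetId,   // gadget prefs use "gadget-<id>" keys
    optionvalue: '1',                   // "1" = enabled
    token: csrfToken,
  }).toString();
}
```

Each such request would need a CSRF token (from action=query&meta=tokens) and would be POSTed once per gadget per wiki, so enumerating the available gadgets on each wiki is the remaining piece.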

Quentin, can you confirm your script can enable all gadgets on all wikis for a test account for Chris?

Does Quentin have a script for that? I haven't had the time to get one working yet, so it would be great to get that.

(In reply to comment #18)

Does Quentin have a script for that? I haven't had the time to get one working yet, so it would be great to get that.

Yes, [[m:User:Pathoschild/Scripts/Synchbot#Global_settings_change]].
I think this applies: "Due to the potential for misuse, this bot is not open-source". https://github.com/Pathoschild/Wikimedia-contrib#readme
Maybe Pathoschild can give you the source too.

Nemo was right. I ran a script to add all the gadgets to my user, then reran my script to check Main_Page, action=edit on my user page, and Special:RecentChanges. These all popped up:

aswiki
bnwikibooks
dewikiversity
dvwiktionary
elwiki
elwikinews
eswikiquote
(and a few more amethyst urls)
euwiki
orwiktionary
ruwikinews

Ruslik00 wrote:

I fixed all of the above except bnwikibooks (can't find anything), eswikiquote, and http://upload.wikimedia.org/wikipedia/commons/e/ea/Button_easy_cite.png on euwiki (can't find that either).

It was actually good news that so few of them turned up.

In fact, a bigger problem is 404 errors.

Chmarkine set Security to None.
Chmarkine added a subscriber: Chmarkine.

I have been going around removing http: references from Wikimedia-hosted wikis. Not sure what to do about the private wikis where I don't have an account or an admin flag.

Modern browsers can be instructed via an Upgrade-Insecure-Requests header or CSP directive to request any resource over HTTPS from a given domain, even if the URL has HTTP in it.

CSP can also be used to report insecure URLs via [[ http://www.w3.org/TR/upgrade-insecure-requests/#reporting-upgrades | default-src https: + report-uri ]].
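For concreteness, both mechanisms are plain HTTP response headers; the values below are illustrative only (the report-uri endpoint is made up, not a deployed Wikimedia one):

```
Content-Security-Policy: upgrade-insecure-requests
Content-Security-Policy-Report-Only: default-src https:; report-uri /csp-report
```

The first rewrites http:// subresource requests to https:// before they leave the browser; the second blocks nothing and only reports insecure loads to the given endpoint.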

Modern browsers can be instructed via an Upgrade-Insecure-Requests header or CSP directive to request any resource over HTTPS from a given domain, even if the URL has HTTP in it.

What benefit does this give over HSTS at this point? Can that be configured to apply only to Wikimedia domains? (I couldn't quickly tell from the examples.)

CSP can also be used to report insecure URLs via default-src https: + report-uri.

This certainly sounds useful, if it can be restricted to embedded content (rather than just links).

What benefit does this give over HSTS at this point?

None, I guess.

This sounds certainly useful, if it can be restricted to embedded content (rather than just links).

script-src only affects <script> tags. There are also style-src, object-src, and so on; they can be combined.
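Combined into a single report-only policy restricted to embedded content, as asked above, that might look like this (illustrative values, not a deployed Wikimedia header):

```
Content-Security-Policy-Report-Only: script-src https:; style-src https:; object-src https:; img-src https:; report-uri /csp-report
```

Because no default-src is set, plain <a href="http://..."> links are untouched; only the listed resource types are reported.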

Krinkle updated the task description. (Show Details)
Krinkle added a subscriber: Krinkle.

This is probably obsolete nowadays. But will let Security Team triage and keep, merge or close accordingly :)

Reedy added a subscriber: Legoktm.

This is probably obsolete nowadays. But will let Security Team triage and keep, merge or close accordingly :)

Ditto

In T249466#6030705, @Legoktm wrote:

And CSP stuff seems to cover this