
Detect tools.wmflabs.org tools which are HTTP-only
Closed, ResolvedPublic

Description

The first step for T102367 is to find insecure resources.
How do we do this? Options I see:

  1. naively grep all the projects' PHP and JavaScript code looking for hardcoded http:// URLs;
  2. make a list of tools.wmflabs.org URLs and test them all for insecure resources with a simple URL-fetching script (see the sketch after this list);
  3. set up some dark-magic site-wide JavaScript logging that sends all occurrences of insecure resources to Sentry or something;
  4. write a smart Ruby Mechanize crawler to test the whole domain recursively.
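
A minimal sketch of what option 2 could look like, assuming a file tools.txt with one tool name per line (the file name and the matching pattern are illustrative only):

  # Fetch each tool's landing page and flag hardcoded http:// subresources.
  while read -r tool; do
    curl -s --max-time 10 "https://tools.wmflabs.org/$tool/" \
      | grep -Eo '(src|href)="http://[^"]*"' \
      | sed "s|^|$tool: |"
  done < tools.txt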

Event Timeline

Nemo_bis created this task. Feb 29 2016, 8:50 PM
Restricted Application added projects: Cloud-Services, Traffic. Feb 29 2016, 8:50 PM
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Operations. Feb 29 2016, 8:59 PM
scfc added a subscriber: scfc. Feb 29 2016, 9:16 PM

I don't think it is feasible to detect those tools this way, because there would be way too many code paths that are only triggered under special circumstances. What would be easier:

  1. On the proxy, log http requests (we currently don't log the request's scheme AFAICS).
  2. Group by tool and advise the top ten's maintainers on how to amend their tools (or change incoming links on Wikipedia, etc.).
  3. Lather, rinse, repeat until the ratio of http to https is below a threshold.
  4. Announce a deadline for switching on HSTS.
  5. Switch on HSTS.
  6. Deal with tools that break as a result.
Dzahn added a subscriber: Dzahn.
Dzahn added a comment (edited). Feb 29 2016, 9:33 PM

Step 0: figure out which instance(s) are "on the proxy".

Step -1: what is the relevant role class? I think the "dynamicproxy" module, but no role? Is the "description="Kubernetes to dynamicproxy syncronizer"" one involved?

scfc added a comment. Feb 29 2016, 10:24 PM

The proxy consists of the instances tools-proxy-01 and tools-proxy-02. They use the role class role::labs::tools::proxy, which includes toollabs::proxy, which includes dynamicproxy, which uses modules/dynamicproxy/templates/urlproxy.conf for configuration. So the directives to log the scheme would need to be added to the latter.
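
The kind of nginx directives needed might look like this (a sketch only: the format string is an assumption; the actual change is Gerrit change 274161 below):

  # In urlproxy.conf: define a log format that records the request
  # scheme, and write it to a separate log file.
  log_format scheme '$scheme $host "$request" $status';
  access_log /var/log/nginx/access-scheme.log scheme;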

Change 274161 had a related patch set uploaded (by Dzahn):
dynamicproxy: custom log schema (http/https) for tools

https://gerrit.wikimedia.org/r/274161

@scfc thank you for the details. I uploaded a WIP patch. Can I use gzip like that? Is the log path OK? How long do we want to run it, hmm.. Also, comments on Gerrit are welcome.

Change 274161 merged by Dzahn:
dynamicproxy: custom log schema (http/https) for tools

https://gerrit.wikimedia.org/r/274161

Works now :)

root@tools-proxy-01:~# tail -f /var/log/nginx/access-scheme.log

shows the first results.

Dzahn added a comment (edited). Mar 2 2016, 1:40 AM

Here's a first list of tools using http with status 200:

add-information
admin
anagrimes
anomiebot
archaeo
autodesc
autolist
awb
bambots
bawolff
betacommand-dev
bibleversefinder
book2scroll
catfood
catnap
catscan2
catscan3
checkdictation-fa
checker
checkwiki
citationhunt
citations
cluebot
commonshelper
copyvios
croptool
crosswatch
denkmalliste
dewikinews-rss
de.wikipedia.org
dexbot
dibot
dispenser
dnbtools
dplbot
dupdet
dykfeed
dykstats
enwp10
eranbot
erwin85
expose-data
farhangestan
fengtools
filedupes
file-siblings
fist
flickr2commons
ftl
geocommons
geograph2commons
geohack
glamtools
guc
heritage
hewiki-tools
hoo
icalendar
icommons
ifttt
imagemapedit
intuition
ircredirect
isbn
isbn2wiki
isin
itwikinews-rss
jackbot
jira-bugimport
joanjoc
kasparbot
kmlexport
krdbot
languagetool
listeria
locator
magnustools
magnus-toolserver
magog
makeref
manypedia
matvaretabellen
meta
metamine
missingtopics
mix-n-match
most-wanted
multidesc
musikanimal
nagf
newwebtest
ninihil
not-in-the-other-language
nullzero
os
osm
osm4wiki
otrsreports
pagepile
pageviews
para
parliamentdiagram
paste
persondata
phabricator-bug-status
phetools
pltools
potd-feed
projektneuheiten-feed
ptwikis
pywikibot
quentinv57-tools
quick-intersection
random-featured
reasonator
refill
reftoolbar
sdbot
seealsology
sighting
sigma
skipenwp10
slumpartikel
spellcheck
static
stewardbots
stimmberechtigung
suggestbot
supercount
svgtranslate
templatecount
templatetiger
templatetransclusioncheck
templator
test-lighttpd-precise
test-lighttpd-trusty
test-webservice-generic
timescale
toolschecker
toolscript
tusc
typoscan
ukbot
url2commons
usersearch
usualsuspects
vcat
video2commons
videoconvert
w
watchr
whois
widar
wiki
wikidata-exports
wikidata-game
wikidata-primary-sources
wikidata-reconcile
wikidata-terminator
wikidata-todo
wikiedudashboard
wikihistory
wikisense
wikishootme
wikisoba
wikitest-rtl
wiki-todo
wikitrends
wikiviewstats
wikiviewstats2
wikivoyage
wikiwatchdog
wiwosm
wlm-de-utils
wlm-stats
wmcounter
wp-signpost
wp-world
wsexport
xtools
xtools-articleinfo
xtools-ec
yifeibot
ytcleaner
zoomable-images
zoomviewer
Magnus added a subscriber: Magnus. Mar 7 2016, 4:42 PM

@Dzahn: Does "using http" mean "available over http" or "does not work over https"?

scfc added a comment. Mar 7 2016, 5:29 PM

@Magnus: In this context, it means that a request came in via http and neither was redirected nor failed. This could be caused by someone typing the URL manually, a link from Wikipedia etc., or an intra-application link.

Yes, it means the request came in via http and returned a 200.

Luke081515 moved this task from Triage to Backlog on the Cloud-Services board. Mar 25 2016, 3:00 PM

This seems not particularly useful as a data point, since it is just counting:

  1. Tools that are doing HTTPS redirects themselves
  2. Users who have HTTPS Everywhere turned on.
Restricted Application added a project: Traffic. Apr 18 2016, 6:15 PM

By "this" I assume you mean the list above? I'd like more comments on the methods proposed in the description.

  • make a list of tools.wmflabs.org URLs and test them all for insecure resources with a simple URL-fetching script;
  • write a smart Ruby Mechanize crawler to test the whole domain recursively.

These could well kill Tool Labs or the replicas, as many tools are not scraping-friendly and happily send off long SQL queries from a GET request.

  • set up some dark-magic site-wide JavaScript logging that sends all occurrences of insecure resources to Sentry or something;

We can probably do this with CSP: T130748: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses
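
At the proxy that could be a report-only CSP header, which reports insecure subresource loads to an endpoint without blocking anything; a sketch, where the /csp-report endpoint and the exact policy are hypothetical:

  # Report, but do not block, subresources not loaded over https.
  add_header Content-Security-Policy-Report-Only
      "default-src https: 'unsafe-inline' 'unsafe-eval'; report-uri /csp-report" always;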

  • naively grep all the projects' PHP and JavaScript code looking for hardcoded http:// URLs;

Sounds like a good idea to me (but probably on the NFS server and not on a bastion ;-))
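
For illustration, the naive grep could be a one-liner along these lines (assuming the usual /data/project/<tool>/public_html layout; the pattern would need refinement):

  # Recursively search every tool's web directory for hardcoded http:// URLs.
  grep -rn --include='*.php' --include='*.js' 'http://' /data/project/*/public_html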

As a side note, I set up a VM for my PetScan tool:
http://petscan.wmflabs.org/

It does not require https but works with either, as I couldn't find a way to redirect http to https at the proxy level.

Do I need to do that within the server I run?

scfc added a comment. Apr 24 2016, 3:12 AM

@Magnus: https://petscan.wmflabs.org/ seems to work fine. Did you mean something else?

Wouldn't it be better for http to always redirect to https?

tom29739 added a subscriber: tom29739 (edited). Apr 24 2016, 12:22 PM

In Tools, a tool never 'sees' http because everything goes through the proxy, so this should be fairly simple to do.

I doubt this is possible. Several tools do not support https.

Webservices don't directly interact with HTTPS; it is the proxy that does.

So all tools should be compatible if http is redirected to https, as per @Magnus's suggestion.

Sounds like a good idea to me (but probably on the NFS server and not on a bastion ;-))

Bad idea on the NFS server as well :)

BBlack added a subscriber: BBlack. Apr 24 2016, 1:43 PM

So all tools should be compatible if http is redirected to https, as per @Magnus's suggestion.

The problem is on the client side. There are probably many prominent non-browser clients of these services that will fail to follow the HTTPS redirect on GET/HEAD (fail to follow a redirect in general, or lack proper HTTPS support).

For clients that mostly use POST the problem is worse, as there is no universally compatible secure way to redirect an HTTP POST to HTTPS, and even if there were, the client would already have leaked critical POST data in the initial request before the redirect happens. The best you can do in terms of enforcement is to 403 the insecure POST requests and break the clients, making it obvious they need to switch to HTTPS URLs. That's a big breaking change that needs an announcement and will make users hate you (something we're facing with production APIs, too).

What you can do to help modern browsers, though, without taking the redirect and/or POST-breakage risks, is to universally add a long-duration HSTS header, so that browsers lock onto HTTPS once they have ever visited via HTTPS.
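
In nginx terms that would be a one-line addition on the https side of the proxy (the max-age value here is just an example of a long duration):

  # Tell browsers to keep using https for this host for the next year.
  add_header Strict-Transport-Security "max-age=31536000" always;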

BBlack moved this task from Triage to Watching on the Traffic board. Oct 4 2016, 1:20 PM
BBlack moved this task from Watching to BadHerald on the Traffic board. Oct 4 2016, 1:27 PM
Glrx added a subscriber: Glrx. Nov 2 2016, 5:11 PM
scfc moved this task from Triage to Backlog on the Toolforge board. Dec 5 2016, 4:01 AM
Glrx removed a subscriber: Glrx. Dec 23 2016, 6:29 PM

In Tools the tool never 'sees' http because everything goes through the proxy:

They do get an X-Forwarded-Proto header, though, and that's how video2commons currently does the detection and redirect.
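
A sketch of that pattern in nginx terms (video2commons itself may implement it differently; checking the proxy-supplied header is the general idea):

  # The proxy terminates TLS, so the backend only learns the original
  # scheme from this header; redirect if the request arrived as http.
  if ($http_x_forwarded_proto = "http") {
      return 301 https://$host$request_uri;
  }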

For clients that mostly use POST the problem is worse, as there is no universally compatible secure way to redirect an HTTP POST to HTTPS [...]

I imagine most tools use relative URIs internally (absolute URLs take more time to type/code), so http: POSTs go to http: and https: POSTs go to https:; they should be okay.

AFAIK, the problems are:

  • Tools that use http URLs for POSTs internally -- should be easy to fix: just use relative URIs; after all, why not?
  • Tools that require resources from other Wikimedia domains -- should be easy to fix: just change http to https or make the URLs protocol-relative, since (almost) all Wikimedia domains support https.
  • Tools that require external resources (e.g. T102457) -- due to the ToU / privacy policy these should be a very small percentage, and most can easily be fixed by using cdnjs or hosting their own copy of the resources inside the tool.
bd808 closed this task as Resolved. Mar 26 2019, 12:56 AM
bd808 claimed this task.
bd808 added a subscriber: bd808.

We have been sending out Strict-Transport-Security: max-age=86400 headers following a 302 redirect for GET and HEAD requests for nearly 3 months now. There have been no widespread reports of "OMG! Tool <x> is totally broken", so I think we can actually call this resolved. There is still the matter of POST requests to deal with, but that is a matter of enforcement rather than detection.
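
A sketch of that behavior in nginx terms (the actual proxy configuration may differ):

  # On the plain-http side: redirect safe methods to https;
  # POST is left alone for now, since enforcing it would break clients.
  if ($request_method ~ ^(GET|HEAD)$) {
      return 302 https://$host$request_uri;
  }
  # On the https side: advertise HSTS for a day.
  add_header Strict-Transport-Security "max-age=86400" always;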