Page MenuHomePhabricator

Serve thumb.php requests with Thumbor
Closed, ResolvedPublic

Description

Right now we have a special case: https://github.com/wikimedia/puppet/blob/486fd8839ac87758dbc5bdb87cf467ac46f8df59/hieradata/role/common/cache/text.yaml#L74-L76 where thumb.php requests are served by image scalers.

I think it should be fairly easy to send those requests to Thumbor instead. Thumbor just needs to parse thumb.php URLs like it does upload.wikimedia.org URLs

Additionally, we need to make sure that Thumbor has the necessary rights to read originals and write thumbnails for private wiki Swift containers.

Revisions and Commits

rTHMBREXT Thumbor Plugins
Restricted Differential Revision

Event Timeline

Gilles triaged this task as Medium priority.Jun 28 2017, 9:45 PM

Assuming file storage of private wikis is just in a different tree within Swift, that means Thumbor won't need additional access credentials, right?

Still, we should be careful about exposing private files to more services and the increased risk of accidental leakage in the future if the service were to have a bug of some sort.

Given these requests will be on the wiki origin, they will carry authentication (cookies). Ideally we'd strip those in Varnish before sending to Thumbor. However, if I recall correctly, Thumbor queries the MediaWiki API for some things, right? If so, then it would need the cookies and forward them. That increases security risk from Swift expose to MediaWiki user authentication expose.

Thumbor doesn't query the MediaWiki API at all, we managed to avoid that dependency.

I did forget about user auth, I guess what we can do is still use Mediawiki as an intermediary for user auth, then acting as a proxy to Thumbor. That code could run on app servers.

This would allow Thumbor to stay dumb and only talk to Swift, with the private containers configured so that Thumbor can read and write to them.

Summary of a possible strategy:

  • MediaWiki handles thumb.php and checks the user's auth against the private wiki
  • MediaWiki optionally rewrites the URL into a desirable format.
  • MediaWiki proxies the request to Thumbor
  • Thumbor, when it sees an extra "this is private" prefix in the URL, which must be impossible to craft for the varnish/rewrite.py codepath, uses a different Swift user, which has access to private containers

Regarding the prefix, I think it would make sense for it to be a hardcoded "private/" and "public/" in MediaWiki and rewrite.py, respectively, which would be less prone to mistakes.

Since all those requests are internal, it might also be possible for Thumbor to identify if the "client" (rewrite.py or MediaWiki) is authorized to access private containers, based on their IP address. Maybe that's even simpler than a URL prefix, considering Thumbor should have access to cluster IP configuration information. This way we could have MediaWiki and rewrite.py send the same kind of requests to Thumbor, and the firewalling of private data is purely logical.

In fact in case of partial network compromise, it would ensure that Thumbor doesn't respond to requests that would spoof the private URL scheme, if the client IP isn't in the whitelist (app servers).

As for MediaWiki, it can't really get it wrong, since the request's full domain name is what determines the container.

I think the idea of having 2 different swift users is worth keeping. Depending on the client IP, use one or the other.

Or maybe we combine both approaches for redundancy. Special prefix that can't come from rewrite.py + IP address check. You only get to use the special swift user for private container access if both criteria are met.

re: authentication based on the IP I think is fragile, hard to say "these are only appservers IPs" and if the criteria is external vs internal IPs then also varnish boxes have internal IPs.

A more robust approach I think would be to keep authentication at http(s) layer, e.g. have mediawiki include a shared (among mw and thumbor) secret token that thumbor can check for legitimate requests for private wikis. Alternatively public key signing of an header or the url would also work, maybe not worth the complexity in this case. (mw signs with its private key, thumbor has mw's public key and check the signature)

@fgiunchedi the shared secret key sounds like the simplest thing to do. Do you think we should have a separate thumbor swift user for private wikis? Or just grant the existing user r/w access to those containers if it doesn't have it already?

@Gilles I think a single swift user should be fine. The reason being that we have to enforce access to private wikis containers at the thumbor level anyways (when the request contains the shared secret and comes over https)

Hmmm the more I think about the implementation, the more the shared key would be make-believe security.

The security gate is handled in thumb.php by Mediawiki ("are you authorized to view this?). Presumably the key would be sent after that point. But there's no guarantee that the security check logic is compromised by an update, and we'll keep sending the key anyway. There's no real difference between that and only contacting thumbor via its thumb.php endpoint from thumb.php after the gate checks. With or without the key, the gatekeeping code in thumb.php is unchanged.

The little amount of gatekeeping there would be in thumbor is that thumbor would only access private containers if the request comes from the thumb.php endpoint (which is different than varnish's for public thumbnails). No key required, the different URL handler is enough.

Hmmm the more I think about the implementation, the more the shared key would be make-believe security.

The security gate is handled in thumb.php by Mediawiki ("are you authorized to view this?). Presumably the key would be sent after that point. But there's no guarantee that the security check logic is compromised by an update, and we'll keep sending the key anyway. There's no real difference between that and only contacting thumbor via its thumb.php endpoint from thumb.php after the gate checks. With or without the key, the gatekeeping code in thumb.php is unchanged.

I disagree, any machine on the internal network can talk to thumbor, enforcing a secret between thumbor and mw reduces the unauthorized access surface significantly. If we also restrict access to thumbor from mw and varnish only, we would be back to the "access control based on the IP" problem mentioned earlier in the task.

Fair enough, I didn't have the case of privilege escalation in mind.

Gilles added a revision: Restricted Differential Revision.Nov 17 2017, 9:59 AM

Change 402579 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/core@master] Add ability to proxy thumbnail requests to a service

https://gerrit.wikimedia.org/r/402579

Change 402582 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/debs/python-thumbor-wikimedia@master] Upgrade to 1.9

https://gerrit.wikimedia.org/r/402582

Change 402582 merged by Filippo Giunchedi:
[operations/debs/python-thumbor-wikimedia@master] Upgrade to 1.9

https://gerrit.wikimedia.org/r/402582

This is now blocked on the Mediawiki core change to add the ability to proxy requests to Thumbor from thumb.php

Change 402579 merged by jenkins-bot:
[mediawiki/core@master] Add ability to proxy thumbnail requests to a service

https://gerrit.wikimedia.org/r/402579

Change 407608 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Give officewiki read access to Thumbor

https://gerrit.wikimedia.org/r/407608

Change 407611 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Proxy public wiki thumb.php wikis through Thumbor

https://gerrit.wikimedia.org/r/407611

Change 407612 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/vagrant@master] Thumbor private wiki support

https://gerrit.wikimedia.org/r/407612

Change 407612 merged by jenkins-bot:
[mediawiki/vagrant@master] Thumbor private wiki support

https://gerrit.wikimedia.org/r/407612

Change 407611 merged by jenkins-bot:
[operations/mediawiki-config@master] Proxy public wiki thumb.php requests through Thumbor

https://gerrit.wikimedia.org/r/407611

Mentioned in SAL (#wikimedia-operations) [2018-02-12T19:17:25Z] <niharika29@tin> Synchronized wmf-config/filebackend.php: Proxy public wiki thumb.php requests through Thumbor T169144 (duration: 00m 55s)

Deployment was successful on prod, seems to have broken thumb.php on Beta. I'll look into that tomorrow. Not high priority because thumb.php requests are highly unusual because they need to be manually crafted (and would thus be even more unusual on Beta).

Fixed on Beta, I had just modified the wrong PrivateSettings.php

TODO:

  • Use a separate Swift user for private containers access. This will allow avoiding access leak mistakes where a new private wiki is created and the whitelist in puppet not updated.
  • Check that the wiki creation maintenance script doesn't give access to the regular Thumbor user when it's a private wiki being created. Instead, give access to the private-specific thumbor user

Change 407608 merged by Filippo Giunchedi:
[operations/puppet@production] Give officewiki read access to Thumbor

https://gerrit.wikimedia.org/r/407608

Change 412928 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Add Thumbor/Mediawiki shared secret

https://gerrit.wikimedia.org/r/412928

For Beta I'm going to use Wikisource for testing: https://upload.beta.wmflabs.org/wikisource/en/thumb/6/62/Wind_in_the_Willows_%281913%29.djvu/page1-862px-Wind_in_the_Willows_%281913%29.djvu.jpg

The mediawiki-config change will be needed, though, to have the secret passed by Mediawiki.

Change 412934 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Avoid Puppet restarting Thumbor instances agressively

https://gerrit.wikimedia.org/r/412934

Change 412935 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Give officewiki read access to Thumbor

https://gerrit.wikimedia.org/r/412935

Change 412934 merged by Filippo Giunchedi:
[operations/puppet@production] Avoid Puppet restarting Thumbor instances agressively

https://gerrit.wikimedia.org/r/412934

Change 412935 merged by Filippo Giunchedi:
[operations/puppet@production] Give officewiki read access to Thumbor

https://gerrit.wikimedia.org/r/412935

Change 412952 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Serve officewiki thumbnails with Thumbor

https://gerrit.wikimedia.org/r/412952

Running this on Terbium should give mw:thumbor access to the containers of all private wikis:

foreachwikiindblist "% private.dblist" extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --backend=local-multiwrite --private

Change 412980 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Add all private wikis to swift::proxy::private_container_list

https://gerrit.wikimedia.org/r/412980

Change 413115 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Serve private wiki thumbnails with Thumbor

https://gerrit.wikimedia.org/r/413115

Change 412980 merged by Filippo Giunchedi:
[operations/puppet@production] Add all private wikis to swift::proxy::private_container_list

https://gerrit.wikimedia.org/r/412980

Change 412928 merged by jenkins-bot:
[operations/mediawiki-config@master] Add Thumbor/Mediawiki shared secret

https://gerrit.wikimedia.org/r/412928

Change 413168 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Add Thumbor/Mediawiki shared secret

https://gerrit.wikimedia.org/r/413168

Change 413168 merged by jenkins-bot:
[operations/mediawiki-config@master] Add Thumbor/Mediawiki shared secret

https://gerrit.wikimedia.org/r/413168

Mentioned in SAL (#wikimedia-operations) [2018-02-21T15:24:27Z] <gilles@tin> Synchronized wmf-config/filebackend.php: Thumbor private wiki support deployment: [[gerrit:413168|Add Thumbor/Mediawiki shared secret (T169144)]] (duration: 01m 12s)

Mentioned in SAL (#wikimedia-operations) [2018-02-21T15:27:05Z] <gilles@tin> Synchronized private/PrivateSettings.php.example: Thumbor private wiki support deployment: [[gerrit:413168|Add Thumbor/Mediawiki shared secret (T169144)]] (duration: 01m 11s)

Change 412952 merged by jenkins-bot:
[operations/mediawiki-config@master] Serve officewiki thumbnails with Thumbor

https://gerrit.wikimedia.org/r/412952

Mentioned in SAL (#wikimedia-operations) [2018-02-21T15:37:43Z] <gilles@tin> Synchronized wmf-config/filebackend.php: Thumbor private wiki support deployment: [[gerrit:413168|Serve officewiki thumbnails with Thumbor (T169144)]] (duration: 01m 11s)

Change 413115 merged by jenkins-bot:
[operations/mediawiki-config@master] Serve private wiki thumbnails with Thumbor

https://gerrit.wikimedia.org/r/413115

Mentioned in SAL (#wikimedia-operations) [2018-02-21T15:50:37Z] <gilles@tin> Synchronized wmf-config/filebackend.php: Thumbor private wiki support deployment: [[gerrit:413115|Serve private wiki thumbnails with Thumbor (T169144)]] (duration: 01m 12s)

This is confirmed to work now, verified on officewiki and internalwiki.

Change 773298 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] swift: Create a new read-only role on mw account for backup taking

https://gerrit.wikimedia.org/r/773298