Page MenuHomePhabricator

Get rid of remaining non-Thumbor MediaWiki image scaling in WMF production
Open, MediumPublic

Description

Thumbor is supposed to be WMF's image scaling system. However, the pre-Thumbor image scaling system in MediaWiki still thinks it is responsible for image scaling, is still maintained, and is still doing hundreds of image rendering and scaling operations per day. T260330 is a proposal to split out MediaWiki shell execution to a service. It would simplify that project if we used the pre-existing Thumbor service when MediaWiki requests immediate image scaling, or if we avoided those calls in the first place, rather than building a new Thumbor-like thing to replace those shell executions.

Code search indicates that File::transform() is called with the RENDER_NOW flag from the following places:

  • ThumbnailRenderJob (should be prevented by $wgUploadThumbnailRenderMethod)
  • SpecialUploadStash (should be prevented by $wgUploadStashScalerBaseUrl)
  • thumb.php (should be prevented by thumbProxyUrl)
  • PagedTiffHandler
  • TimedMediaThumbnail

Analysis of exec.log shows convert being called due to UploadStash, mostly via a job, e.g.

2020-08-06 12:39:28 [fa727d75-2be8-4ab4-831f-ac72a3aa1150] mw1387 idwiki 1.36.0-wmf.2 exec DEBUG: MediaWiki\Shell\Command::execute: /bin/bash '/srv/mediawiki/php-1.36.0-wmf.2/includes/shell/limit.sh' 'OMP_NUM_THREADS='\''1'\'' MAGICK_TMPDIR='\''/tmp/magick-tmp'\'' /usr/bin/firejail --quiet --profile=/srv/mediawiki/php-1.36.0-wmf.2/includes/shell/firejail.profile --blacklist=/srv/mediawiki/php-1.36.0-wmf.2/LocalSettings.php --noroot --seccomp --private-dev -- '\''/usr/local/bin/mediawiki-firejail-convert'\'' '\''-quality'\'' '\''80'\'' '\''-background'\'' '\''white'\'' '\''-define'\'' '\''jpeg:size=120x151'\'' '\''/tmp/localcopy_2031ebe257d3.jpg'\'' '\''-thumbnail'\'' '\''120x151!'\'' '\''-set'\'' '\''comment'\'' '\''File source: https://id.wikipedia.org/wiki/Istimewa:UploadStash/file/17o6wwd3amws.xc7gzv.1023578.jpg'\'' '\''+set'\'' '\''Thumb::URI'\'' '\''-depth'\'' '\''8'\'' '\''-sharpen'\'' '\''0x0.8'\'' '\''-rotate'\'' '\''-0'\'' '\''-sampling-factor'\'' '\''2x2,1x1,1x1'\'' '\''/tmp/transform_c48664777a79.jpg'\''' 'MW_INCLUDE_STDERR=1;MW_CPU_LIMIT=50; MW_CGROUP='\''/sys/fs/cgroup/memory/mediawiki/job'\''; MW_MEM_LIMIT=1048576; MW_FILE_SIZE_LIMIT=524288; MW_WALL_CLOCK_LIMIT=180; MW_USE_LOG_PIPE=yes'

Trying to filter out UploadStash mostly just gives me other translations of UploadStash. Other conversion types (vips, rsvg, etc.) are also present but don't have such useful debugging information in exec.log. Maybe once UploadStash is fixed, the log entries will go away and we can make transform(RENDER_NOW) throw an exception. Then we can uninstall the file transform utilities on the appservers.

Event Timeline

Should we work together with Structured Data Engineering on this, as a knowledge transfer opportunity? They own image scaling at least in theory, right?

My recollection of this UploadStash mechanism is that it only kicks in for older browsers that are incapable of doing client-side rescaling of images in order to display provisional thumbnails during the upload process. I think the grade C experience can live without those provisional thumbnails.

I think the problem is that $wgUploadStashScalerBaseUrl is only set if $wmgUseUploadWizard is true, which it isn't for most wikis.

I don't think there's anything that would prevent wikis that don't have UploadWizard enabled from using $wgUploadStashScalerBaseUrl

I did some command line tests in production, and investigated the code a bit more carefully. I'm pretty sure it will work. One surprising thing is that $wgUploadStashScalerBaseUrl could be set to "foo" and it would still work, because $repo->getThumbProxyUrl() takes precedence over $wgUploadStashScalerBaseUrl, and $repo->getThumbProxyUrl() is always set in production. I think a reasonable core change would be to activate remote scaling if $repo->getThumbProxyUrl() is set. Then $wgUploadStashScalerBaseUrl can be false in production and can probably be deprecated.

I reproduced an exec.log message for local scaling like the one in the task description, by triggering an upload warning on en.wikipedia.org. Then I scaled it via Thumbor using curl, with the URL that I would expect outputRemoteScaledThumb() to use.

Change 664086 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Better config for disabling all local image scaling

https://gerrit.wikimedia.org/r/664086

Something we will have to watch out for when this gets deployed is that some Swift containers for "temp" might be missing some privileges for the Thumbor Swift user. This happened before (containers that were supposed to have the rights applied didn't when they came in use) and would show up as errors in the Thumbor logs.

I looked at all the temp container ACLs with "swift stat". The following containers lack the correct permissions for Thumbor:

  • wikibooks-als-local-temp
  • wikimedia-uk-local-temp
  • wikipedia-mo-local-temp
  • wikiquote-als-local-temp
  • wiktionary-als-local-temp
  • wiktionary-mo-local-temp
  • wikimedia-ve-local-temp

The corresponding wikis are all deleted (in deleted.dblist). So it should be OK.

Change 664086 merged by jenkins-bot:
[mediawiki/core@master] Better config for disabling all local image scaling

https://gerrit.wikimedia.org/r/664086