Move wikitech images to swift
Closed, Resolved, Public

Description

If we want to have a proper LVS pair supporting wikitech (currently labweb1001 and 1002), we can't use local images like we do on silver.

  • create a swift container for wikitech images
  • import all images on silver (or better yet, just all images referenced on wiki pages) into swift/wikitech
  • change file references in wikitech database to refer to swift files rather than local files
  • figure out how to get a dump of images (either of the whole namespace, or of all referenced images) for syncing to wikitech-static

If that last step is difficult or impossible then this whole enterprise should probably be scrapped.

Event Timeline

Andrew triaged this task as Medium priority. Mar 5 2018, 2:05 PM
Andrew created this task.

It looks like rebuildImages.php may provide some of the 'crawl all images' logic that I'll need for this

create a swift container for wikitech images

AFAIK there should be no need to manually create a swift container; you should be able to run mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php labswiki --backend=local-multiwrite

import all images on silver (or better yet, just all images referenced on wiki pages) into swift/wikitech

Take a look at how I did https://phabricator.wikimedia.org/T64835#2450333

change file references in wikitech database to refer to swift files rather than local files

I don't think the DB keeps that sort of info; I think it should just be a case of changing the wiki config to have the correct file repo set up.

figure out how to get a dump of images (either of the whole namespace, or of all referenced images) for syncing to wikitech-static

Should be entirely possible.

Change 416598 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] role::mariadb::ferm: Allow db access to labweb

https://gerrit.wikimedia.org/r/416598

I've moved all the files we need into swift. To use them, though, we need to turn on wmgUseClusterFileBackend and as soon as I do that wikitech starts wanting to talk to production dbs (m4, I suspect.)

So... to move forward we may need to allow that access. I don't love this from a security standpoint, but right now use of swift appears to be tightly coupled to direct db access. Supporting swift on wikitech WITHOUT this looks to me like it will involve substantial rearranging within mediawiki code.

Alternatively, we can just not use swift, but that will mean no redundancy for wikitech (just like now with silver).

Change 416607 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/mediawiki-config@master] wikitech: use files from swift rather than local uploads.

https://gerrit.wikimedia.org/r/416607

Change 416625 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] labweb wikitech: update vhost

https://gerrit.wikimedia.org/r/416625

Change 416625 merged by Andrew Bogott:
[operations/puppet@production] labweb wikitech: update vhost

https://gerrit.wikimedia.org/r/416625

@Andrew, @Krenair, and I worked through a number of related issues over IRC and seem to have landed on a set of configuration changes which allow wikitech to use swift for image storage and still use the InstantCommons method of talking to commons via the API, rather than the direct database queries that are used from the core network.

I'd like to give most of the credit here to @Krenair for asking the right questions and then pointing to the things that needed changing once he heard the answers. It was a great IRC-based remote hands debugging session. :)

m4 is eventlogging databases and no one should have access to it except analytics.

As far as dumping the images back out, I'm dusting off some old scripts that used to be used for media tarballs back in the day, and repurposing the code. There might be something more elegant or ready-made that's been written in the meantime. If no one intervenes soon, I'll get a draft of a script together for testing.

This would dump the images to a filesystem using the standard directory layout (/a/af/sometitle.jpg) under wikiname/date, so it could easily be tarred up and made available for download. Images already current won't be downloaded again, just copied into the dir for the new date. Seem ok to you?
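For illustration, the layout and the copy-instead-of-download behaviour described above could be sketched roughly as follows. This is not the actual dump script: `hashed_relpath`, `dump_images`, and the `fetch` callback are hypothetical names, and treating any file already present in the previous dump as "current" is a simplifying assumption (a real script would presumably also compare timestamps or checksums).

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def hashed_relpath(filename: str) -> str:
    """Return a MediaWiki-style hashed path such as 'a/af/Sometitle.jpg'.

    The two directory levels are the first one and first two hex digits
    of the MD5 of the file name (spaces replaced by underscores).
    """
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{name}"


def dump_images(
    filenames: Iterable[str],
    dump_root: Path,
    wiki: str,
    date: str,
    previous_date: Optional[str],
    fetch: Callable[[str, Path], None],
) -> Dict[str, int]:
    """Populate dump_root/wiki/date, reusing files from the previous dump.

    ASSUMPTION: a file that exists in the previous dump is still current.
    `fetch(filename, target)` is a hypothetical callback that downloads
    one file from the live wiki and writes it to `target`.
    """
    new_dir = dump_root / wiki / date
    old_dir = dump_root / wiki / previous_date if previous_date else None
    counts = {"copied": 0, "fetched": 0}

    for filename in filenames:
        rel = hashed_relpath(filename)
        target = new_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)

        old = old_dir / rel if old_dir is not None else None
        if old is not None and old.is_file():
            # Already retrieved in an earlier run: copy locally, no download.
            shutil.copy2(old, target)
            counts["copied"] += 1
        else:
            fetch(filename, target)
            counts["fetched"] += 1

    return counts
```

Each dated directory ends up self-contained, so it can be tarred up on its own, at the cost of storing unchanged files once per date.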

Change 416598 abandoned by Andrew Bogott:
role::mariadb::ferm: Allow db access to labweb

https://gerrit.wikimedia.org/r/416598

As far as dumping the images back out, I'm dusting off some old scripts that used to be used for media tarballs back in the day, and repurposing the code. There might be something more elegant or ready-made that's been written in the meantime. If no one intervenes soon, I'll get a draft of a script together for testing.

This would dump the images to a filesystem using the standard directory layout (/a/af/sometitle.jpg) under wikiname/date, so it could easily be tarred up and made available for download. Images already current won't be downloaded again, just copied into the dir for the new date. Seem ok to you?

@ArielGlenn that would be very helpful -- worst case I could use it as a pattern to follow in a custom script. Ideally we'd only be dumping images that are actually used on the wiki vs. every file that's ever been uploaded. (That appears to be a difference of 10x files).
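As a rough sketch of what "only images that are actually used" could mean in practice: the set of referenced files can be derived by joining the `image` table against `imagelinks`. The example below uses an in-memory SQLite database with drastically simplified toy tables purely so that it is runnable; the real wiki runs on MariaDB with a much larger schema, and `referenced_images` and `make_toy_db` are hypothetical helper names, not part of any existing script.

```python
import sqlite3
from typing import List

# Files that exist in `image` AND are linked from at least one page.
# img_name and il_to are the MediaWiki column names; everything else
# about these tables is simplified for illustration.
REFERENCED_IMAGES_SQL = """
SELECT DISTINCT img_name
FROM image
JOIN imagelinks ON il_to = img_name
ORDER BY img_name
"""


def referenced_images(conn: sqlite3.Connection) -> List[str]:
    """Return the names of uploaded files that some page actually uses."""
    return [row[0] for row in conn.execute(REFERENCED_IMAGES_SQL)]


def make_toy_db() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the two tables (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE image (img_name TEXT PRIMARY KEY);
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
        INSERT INTO image VALUES
            ('Used.png'), ('Unused.png'), ('Used_twice.jpg');
        INSERT INTO imagelinks VALUES
            (1, 'Used.png'),
            (2, 'Used_twice.jpg'),
            (3, 'Used_twice.jpg'),
            (4, 'Missing.png');  -- linked, but never uploaded locally
        """
    )
    return conn


if __name__ == "__main__":
    print(referenced_images(make_toy_db()))
```

The join drops both files that nothing links to and links that point at files not present locally (for example, ones served from commons).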

Change 417009 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] cheap image dump script that might be ok for wikitech

https://gerrit.wikimedia.org/r/417009

Change 417017 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/mediawiki-config@master] preliminary steps for moving wikitech to swift and hhvm

https://gerrit.wikimedia.org/r/417017

Change 417017 merged by jenkins-bot:
[operations/mediawiki-config@master] preliminary steps for moving wikitech to swift and hhvm

https://gerrit.wikimedia.org/r/417017

Change 416607 merged by jenkins-bot:
[operations/mediawiki-config@master] wikitech: use files from swift rather than local uploads.

https://gerrit.wikimedia.org/r/416607

This seems to work -- I uploaded a file on wikitech.wikimedia.org and viewed it on newwikitech.wikimedia.org. Uploading from newwikitech.wikimedia.org seems to work as well, but I'll be more convinced once the new wikitech setup is actually called 'wikitech' and there isn't a storm of confusion between the two domain names.

Test image dump script seems to be OK on labweb1001:
ariel@labweb1001:~$ sudo -u www-data python ./get_images.py --verbose --configfile dump_images.conf.labweb1001 --wiki labswiki
ran to completion, producing a bunch of files in /mnt/dumpsdata/xmldatadumps/public/images/labswiki/20180309

Next up is to see about running it from the wikitech-static host to retrieve stuff from the wikitech live host; are there populated image and imagelinks tables on wikitech-static to try a test run?

@ArielGlenn There should be populated mw tables on wikitech-static that are updated daily. The images are probably a week or so out of sync since that process stopped working when I moved things over to swift.

Also, I'm going to close this task in favor of T188926

Change 417009 abandoned by ArielGlenn:
cheap image dump script that might be ok for wikitech

Reason:
Moved to other repo for use and merged, see https://gerrit.wikimedia.org/r/operations/wikitech-static

https://gerrit.wikimedia.org/r/417009