
New wikitech-static sync process for images
Closed, Resolved, Public

Description

Once images are moved to swift, we'll need a better way to copy images to wikitech-static.

This will probably look something like this:

  1. A script on labweb dumps the imagelinks table (possibly a maintenance script, possibly a raw SQL command)
  2. The imagelinks dump is copied to wikitech-static
  3. wikitech-static imports the imagelinks table, wgets each file, then modifies each imagelinks entry to refer to the new, local file
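
A minimal sketch of the download half of step 3, assuming the dump has already been reduced to a plain list of file titles. It relies on MediaWiki's default hashed upload layout, where a file lives under the first one and first two hex digits of the MD5 of its title (spaces replaced by underscores). The function names and the base URL are hypothetical and are not taken from the script discussed in this task.

```python
import hashlib
from urllib.parse import quote


def hashed_rel_path(title: str) -> str:
    """Return the relative path the default hashed layout uses for a file title."""
    name = title.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{name}"


def download_plan(titles, base_url):
    """Map each title to (source URL, relative local path)."""
    plan = {}
    for title in titles:
        rel = hashed_rel_path(title)
        # quote() leaves "/" alone by default, so the directory structure survives.
        plan[title] = (f"{base_url.rstrip('/')}/{quote(rel)}", rel)
    return plan
```

Each entry in the plan can then be fetched to the relative path under the local image directory, which keeps the layout the wiki expects.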

Event Timeline

The sample image dump script is in /root on wikitech-static, along with a config file that works ok.

I had to change two things in the config over on wikitech-static:

root@wikitech-static-ord:/srv/mediawiki/w# diff LocalSettings.php.orig LocalSettings.php
8c8
< require_once("../config/Settings.php");
---
> require_once(__DIR__ . "/../config/Settings.php");

root@wikitech-static-ord:/etc/php/7.0/cli# diff php.ini.orig php.ini
704a705
> include_path = ".:/usr/share/php:/srv/mediawiki/w"

You'll want these changes generally for running maintenance scripts from the command line. I live-hacked them in, not sure where you maintain these.

Images were dumped to /srv/mediawiki/imagedumps/wikitech/$date. If this isn't a good location, you can change the base directory (/srv/mediawiki/imagedumps) in the config file.
There's still plenty of cleanup to be done for this script, but there's your proof of concept.

> You'll want these changes generally for running maintenance scripts from the command line. I live-hacked them in, not sure where you maintain these.

The install on -static is just a raw MW install so not otherwise documented/stored anywhere else. There's probably some way or other that we could puppetize this but it would be kind of a mess.

> Images were dumped to /srv/mediawiki/imagedumps/wikitech/$date. If this isn't a good location, you can change the base directory (/srv/mediawiki/imagedumps) in the config file.

This looks promising! We'll want it to update a single directory on every run rather than fill a series of date-stamped directories. Is that easy? After that, I think all that's left is to run this from cron shortly after the db is updated.
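
One way to keep a single directory current is to copy only new or changed files into it on each run. The sketch below assumes a source tree (for example a freshly dumped directory) and a fixed destination; the function name is hypothetical, and it does not remove files that have disappeared from the source.

```python
import filecmp
import shutil
from pathlib import Path


def refresh_directory(src: Path, dest: Path) -> list:
    """Copy new or changed files from src into dest; return the relative paths copied."""
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dest / rel
        # shallow=False compares file contents, so unchanged files are left alone.
        if target.exists() and filecmp.cmp(path, target, shallow=False):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(str(rel))
    return copied
```

Because unchanged files are skipped, a second run right after the first copies nothing, which suits a cron job that fires after every db update.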

When I make this change:

#$wgUploadDirectory = '/srv/mediawiki/images';
$wgUploadDirectory = '/srv/mediawiki/imagedumps/wikitech/20180312';

some images seem to work, but quite a few fail on thumbnail generation. I haven't found a simple fix yet.

That's because the thumbs stuff isn't in that directory. You might want to symlink the 0-9a-f directories into the dated ones each time or something, assuming that the run is good (I need to mark that it is; it's in the TODOs at the top of the file).
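
A rough sketch of the symlink idea, under one reading of it: the upload directory keeps its own thumb and other subdirectories, and its 0-9a-f entries become symlinks pointing at the matching directories of the latest dated dump. The function name is hypothetical, and dump_dir should be an absolute path so the links resolve from anywhere.

```python
import os
from pathlib import Path

HASH_DIRS = "0123456789abcdef"


def link_hash_dirs(dump_dir: Path, upload_dir: Path) -> list:
    """Point upload_dir/0..f at the matching directories of dump_dir.

    Returns the names that were linked. Real (non-symlink) entries are left untouched.
    """
    linked = []
    for name in HASH_DIRS:
        source = dump_dir / name
        link = upload_dir / name
        if not source.is_dir():
            continue  # this dump has nothing hashed under this digit
        if link.exists() and not link.is_symlink():
            continue  # never replace a real directory
        tmp = upload_dir / f".{name}.tmp"
        if tmp.is_symlink():
            tmp.unlink()  # clear a leftover from an interrupted run
        os.symlink(source, tmp, target_is_directory=True)
        os.replace(tmp, link)  # rename swaps the link in one step on POSIX
        linked.append(name)
    return linked
```

Calling this only after a dump has been marked good means a failed run never changes what the wiki serves.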

@Andrew, I think the issue with thumbs is that you need to have the thumb directory wherever you store the images, and it needs to be writable by the web server. Try that and see how it goes.
If PHP error logging is enabled (I couldn't find where it was), you should be able to see some complaints that would point you to that, I think. There are probably similar requirements for other subdirectories in /srv/mediawiki/images, like temp, timeline, and maybe the favicon path too.
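
A small check along these lines can make the missing pieces visible without hunting for the PHP error log. The subdirectory names come from the comment above; the function name is hypothetical. Note that os.access tests the current user, so the result is only meaningful when this runs as the web server user.

```python
import os
from pathlib import Path

# Names taken from the comment above; adjust to the local install.
REQUIRED_SUBDIRS = ("thumb", "temp", "timeline")


def upload_dir_problems(upload_dir: Path, subdirs=REQUIRED_SUBDIRS) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for name in subdirs:
        path = upload_dir / name
        if not path.is_dir():
            problems.append(f"{path}: missing")
        elif not os.access(path, os.W_OK | os.X_OK):
            # Checked for the user running this script, not for another account.
            problems.append(f"{path}: not writable")
    return problems
```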

The first thumbnail issue seems to be fixed by chown -R www-data <imagedir>. There's another bug behind that one, which I'm working on.
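
If the sync script ends up setting ownership itself instead of relying on a manual chown, a sketch like the following could do it. The function name is hypothetical, the web server account is assumed to be www-data as on a stock Debian install, and changing to a different owner normally requires root.

```python
import os
import shutil


def chown_tree(root: str, user, group=None) -> int:
    """Set ownership on root and everything below it; return how many paths were processed."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Each directory shows up once as dirpath, so handle it together with its files.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            if os.path.islink(path):
                continue  # leave symlinks alone rather than chown their targets
            shutil.chown(path, user=user, group=group)
            count += 1
    return count
```

For example, chown_tree("/srv/mediawiki/images", "www-data") would be the rough equivalent of the command above for that directory.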

The other thumbnail issue was caused by an obsolete version of https://gerrit.wikimedia.org/r/#/c/196186/ on -static. Everything should work now presuming that whatever invokes Ariel's script (probably /usr/local/sbin/import-wikitech.sh) sets up ownership properly.

I've moved all of this code into a new repo (git clone https://gerrit.wikimedia.org/r/operations/wikitech-static); it lives in the wikitechsync subdir.

Things seem to mostly work, but updatefile() gives me binary nonsense rather than a proper XML file when downloading .svg files.

Change 420052 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/wikitech-static@master] get_images: decompress compressed image downloads

https://gerrit.wikimedia.org/r/420052

Change 420052 merged by Andrew Bogott:
[operations/wikitech-static@master] get_images: decompress compressed image downloads

https://gerrit.wikimedia.org/r/420052
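
The merged change itself is not reproduced here. As an illustration of the general idea only, the sketch below detects gzip-compressed bytes by their magic number and decompresses them; the function name is hypothetical, and the actual patch may instead inspect the HTTP response headers.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def maybe_decompress(payload: bytes) -> bytes:
    """Return payload with gzip compression removed if the gzip magic number is present."""
    if payload[:2] != GZIP_MAGIC:
        return payload
    try:
        return gzip.decompress(payload)
    except (OSError, EOFError, zlib.error):
        # The bytes only looked like gzip; keep them unchanged.
        return payload
```

One caveat with sniffing the content this way: a file that is meant to stay gzipped on disk (an .svgz, say) would be decompressed as well, so checking the response's Content-Encoding header is the more precise test where it is available.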

This all seems to work now, but I'm going to wait 24 hours and check some test images I just added to wikitech.

Andrew claimed this task.