Wikidata lexeme ttl dumps should be in a "predictable" folder
Closed, Resolved (Public)

Description

Wikidata dumps are done weekly. Usually we can expect the "all" ttl dumps to appear in a folder dated on a Monday; lexeme ttl dumps used to land on Saturdays but recently moved to Sundays.

When consuming dumps it's easier to expect a specific weekday for the data to arrive. Should the folder of the various wikidata dumps be named after the scheduled time? Currently it seems that the truthy and lexeme dump folders are named after the time they're launched, which depends on the time taken by the "all" dump.
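
To illustrate the difference, here is a minimal sketch of deriving the folder name from a scheduled weekday rather than from the launch date. The weekday mapping and the `folder_name` helper are hypothetical and only serve as an example; they are not taken from the dump scripts or from the actual schedule.

```python
from datetime import date, timedelta

# Hypothetical schedule (assumption for illustration only):
# weekday per dump flavour, Monday=0 ... Sunday=6.
SCHEDULED_WEEKDAY = {"all": 0, "truthy": 2, "lexemes": 4}


def folder_name(flavour: str, launched_on: date) -> str:
    """Return the YYYYMMDD folder of the most recent scheduled day
    on or before the actual launch date."""
    days_late = (launched_on.weekday() - SCHEDULED_WEEKDAY[flavour]) % 7
    return (launched_on - timedelta(days=days_late)).strftime("%Y%m%d")


# A lexeme dump launched on Saturday or on Sunday maps to the same folder.
print(folder_name("lexemes", date(2020, 9, 12)))
print(folder_name("lexemes", date(2020, 9, 13)))
```

With this kind of naming, a delay in the "all" dump would no longer shift the folder date that consumers look for.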

Event Timeline

Change 622342 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/puppet@production] Use dedicated schedules for the various wikidata ttl dumps

https://gerrit.wikimedia.org/r/622342

There are concerns about possible performance degradation because some dumps will now run concurrently for some time (some overlap between all and truthy, but also between truthy and lexemes).
@ArielGlenn do you think we should ping someone before moving forward on this or is it an option to see the impact in production for the next run?

I think we can just move this through and keep our eyes on it.

Change 622342 merged by Ryan Kemper:
[operations/puppet@production] Use dedicated schedules for the various wikidata ttl dumps

https://gerrit.wikimedia.org/r/622342

dumpsgen  58563  0.0  0.0   4276   700 ?        Ss   20:50   0:00 /bin/sh -c python3 /srv/deployment/dumps/dumps/xmldumps-backup/generatemiscdumps.py --configfile /etc/dumps/confs/addschanges.conf --dumptype incrdumps --quiet
dumpsgen  58565  0.0  0.0  57976 18128 ?        S    20:50   0:03 python3 /srv/deployment/dumps/dumps/xmldumps-backup/generatemiscdumps.py --configfile /etc/dumps/confs/addschanges.conf --dumptype incrdumps --quiet
dumpsgen  63246  3.9  0.0 456952 64728 ?        S    21:08   2:51 /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=commonswiki --stub=gzip:/mnt/dumpsdata/otherdumps/incr/commonswiki/20200912/commonswiki-20200912-stubs-meta-hist-incr.xml.gz --quiet --spawn=/usr/bin/php7.2 --output=bzip2:/mnt/dumpsdata/otherdumps/incr/commonswiki/20200912/commonswiki-20200912-pages-meta-hist-incr.xml.bz2
dumpsgen  63248  0.0  0.0   4276   740 ?        S    21:08   0:00 sh -c bzip2 > '/mnt/dumpsdata/otherdumps/incr/commonswiki/20200912/commonswiki-20200912-pages-meta-hist-incr.xml.bz2'
dumpsgen  63249  2.9  0.0  13604  8492 ?        S    21:08   2:08 bzip2
dumpsgen  63250  0.0  0.0   4276   736 ?        S    21:08   0:00 sh -c '/usr/bin/php7.2' '/srv/mediawiki/php-1.36.0-wmf.8/../multiversion/MWScript.php' 'fetchText.php' '--wiki' 'commonswiki'
dumpsgen  63251 18.5  0.0 453812 63484 ?        S    21:08  13:24 /usr/bin/php7.2 /srv/mediawiki/php-1.36.0-wmf.8/../multiversion/MWScript.php fetchText.php --wiki commonswiki

The above shows the dump processes running before merging. Nothing specific to wikidata is running, so I just merged it now.

We'll want to monitor across the next week to verify all looks good.

Gehel claimed this task.