
Two page content jobs for wikidatawiki are taking days to complete.
Closed, Resolved, Public

Description

The files in question are:

wikidatawiki-20230501-pages-meta-history11.xml-p17175753p17198090.bz2.inprog  
wikidatawiki-20230501-pages-meta-history11.xml-p17220668p17243316.bz2.inprog

One job appears to have finally completed today; the other is still running. Both have been running for at least 5 days, whereas such jobs should take no more than 8 hours.

Only these two jobs are affected and only on this wiki.

Event Timeline

ArielGlenn created this task.

In the meantime I have manually run jobs for the pmh (pages-meta-history) 25, 26, and 27 page ranges, starting yesterday, and they are nearly complete. I was able to kick off 23 and 24 today. All of these are running on the testbed/spare hosts snapshot1009, snapshot1014, and snapshot1015, out of a screen session under the user ariel, with the jobs themselves running as the dumpsgen user.

As these complete I will run earlier pmh jobs on these hosts, so that we can be sure the dump will complete on time.

Sample command (from /srv/deployment/dumps/dumps/xmldumps-backup/fixup_scripts):

bash ./do_dumptextpass_jobs.sh --date 20230501 --skiplock --jobinfo 24:60192699:65585258 --numjobs 25 --wiki wikidatawiki --config /etc/dumps/confs/wikidump.conf.dumps:wd

where the starting and ending page IDs in --jobinfo (here 60192699 and 65585258) are determined by doing an ls of the pages-articles output files for that wiki and run date with part number 24.
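The ls lookup can be scripted. The sketch below is not part of the dumps repo; it assumes the pages-articles files are named like the pages-meta-history files listed above (wiki-date-pages-articlesN.xml-pFIRSTpLAST.bz2), and the function names jobinfo_from_filename and jobinfo_for_part are hypothetical. If a part happens to be split across several files, it takes the lowest first page and the highest last page.

```shell
# Sketch: derive a --jobinfo value (part:firstpage:lastpage) from the
# pages-articles output files of one part. ASSUMPTION: filenames look like
#   <wiki>-<date>-pages-articles<N>.xml-p<first>p<last>.bz2
# (the same pattern as the pages-meta-history files in this task).

# Hypothetical helper: print "N:first:last" for one filename, or fail.
jobinfo_from_filename() {
    local fname
    fname=$(basename "$1")
    if [[ $fname =~ pages-articles([0-9]+)\.xml-p([0-9]+)p([0-9]+)\.bz2$ ]]; then
        echo "${BASH_REMATCH[1]}:${BASH_REMATCH[2]}:${BASH_REMATCH[3]}"
    else
        echo "unrecognized filename: $fname" >&2
        return 1
    fi
}

# Hypothetical helper: scan DIR for part PART of WIKI/DATE and print the
# combined range (lowest first page, highest last page).
jobinfo_for_part() {
    local dir=$1 wiki=$2 date=$3 part=$4
    local f info first last min="" max=""
    for f in "$dir/$wiki-$date-pages-articles$part".xml-p*p*.bz2; do
        [[ -e $f ]] || continue   # glob matched nothing
        info=$(jobinfo_from_filename "$f") || return 1
        IFS=: read -r _ first last <<< "$info"
        if [[ -z $min ]] || (( first < min )); then min=$first; fi
        if [[ -z $max ]] || (( last > max )); then max=$last; fi
    done
    if [[ -z $min ]]; then
        echo "no pages-articles$part files for $wiki $date in $dir" >&2
        return 1
    fi
    echo "$part:$min:$max"
}
```

With those functions sourced, the sample command could be written as:

bash ./do_dumptextpass_jobs.sh --date 20230501 --skiplock --jobinfo "$(jobinfo_for_part /mnt/dumpsdata/xmldatadumps/public/wikidatawiki/20230501 wikidatawiki 20230501 24)" --numjobs 25 --wiki wikidatawiki --config /etc/dumps/confs/wikidump.conf.dumps:wd

The glob requires the part number to be followed directly by ".xml", so pages-articles-multistream files and other part numbers (e.g. 2 vs 24) are not picked up.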

Done:

  • pmh bz2s: 1-26, 27 is finishing up today
  • pmh 7zs: 1-26, half of 27
  • checksums: for 1-14, 17, 18, 20-26 bz2s and 7zs, half of 27

Currently running:

  • snapshot1009: idle
  • snapshot1014: idle
  • snapshot1015: idle
  • snapshot1010: idle

This comment will be updated as parts complete.

Of note, the two files /mnt/dumpsdata/xmldatadumps/public/wikidatawiki/20230501/wikidatawiki-20230501-pages-meta-history11.xml-p17175753p17198090.bz2.inprog and /mnt/dumpsdata/xmldatadumps/public/wikidatawiki/20230501/wikidatawiki-20230501-pages-meta-history11.xml-p17220668p17243316.bz2.inprog were never moved to their final names by the script after the jobs completed. It's unclear why this happened; the files were moved manually.
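A sketch of that manual step, for next time. It assumes the writing job has already exited and treats a file as finished if it passes bzip2 -t; the dump scripts' own completion check may differ, and the function name promote_inprog is hypothetical. It refuses to overwrite an existing final file.

```shell
# Sketch: rename finished *.inprog dump files to their final names.
# ASSUMPTION: the job writing each file has already exited; "finished" here
# only means the file passes `bzip2 -t`, which may not match the dump
# scripts' own check.
promote_inprog() {
    local f final
    for f in "$@"; do
        final=${f%.inprog}
        if [[ $final == "$f" ]]; then
            echo "skip (no .inprog suffix): $f" >&2
            continue
        fi
        if [[ -e $final ]]; then
            echo "skip (target exists): $final" >&2
            continue
        fi
        # A truncated or partially written bz2 fails the integrity test.
        if ! bzip2 -t -- "$f" 2>/dev/null; then
            echo "skip (bzip2 -t failed): $f" >&2
            continue
        fi
        mv -n -- "$f" "$final" && echo "moved: $final"
    done
}
```

For example, once the jobs writing them have exited, the two files above could be passed as arguments: promote_inprog /mnt/dumpsdata/xmldatadumps/public/wikidatawiki/20230501/wikidatawiki-20230501-pages-meta-history11.xml-p17175753p17198090.bz2.inprog /mnt/dumpsdata/xmldatadumps/public/wikidatawiki/20230501/wikidatawiki-20230501-pages-meta-history11.xml-p17220668p17243316.bz2.inprog. Running it on a file that is still being written is unsafe, since the check only sees the file as it is at that moment.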

The wikidata dumps completed today, so all dumps are done for this run. Whew!

ArielGlenn claimed this task.

When I reran one of these jobs, it ran to completion in the usual amount of time. Next time we see this behavior, we can try shooting the job and letting it rerun, rather than waiting days for it to finish. Not exactly a resolution to whatever the underlying bug may have been, but it will have to do.
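Spotting such a job early could look like the sketch below, which lists PIDs of processes older than the 8-hour norm mentioned above. It assumes procps-ng ps (for the etimes column) and that the job's command line contains a recognizable substring; the default "pages-meta-history" and the function names filter_overdue and overdue_jobs are hypothetical. It only lists candidates; deciding to kill one is left to whoever is on the host.

```shell
# Sketch: list PIDs of dump jobs that have run longer than a threshold.
# ASSUMPTIONS: procps-ng ps (etimes = elapsed seconds), and that the job's
# command line contains PATTERN; the default pattern is a guess.

# Read "pid etimes args..." lines on stdin; print PIDs older than MAX
# seconds whose args contain PATTERN (a plain substring, not a regex).
filter_overdue() {
    local max=$1 pattern=$2
    awk -v max="$max" -v pat="$pattern" '
        {
            args = $0
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", args)  # drop pid, etimes
            if ($2 + 0 > max + 0 && index(args, pat) > 0) print $1
        }'
}

# Default threshold: 8 hours (28800 s), the normal upper bound for these jobs.
overdue_jobs() {
    local max=${1:-28800} pattern=${2:-pages-meta-history}
    ps -eo pid=,etimes=,args= | filter_overdue "$max" "$pattern"
}
```

For example, overdue_jobs 28800 pages-meta-history11 would list long-running processes for part 11. The awk process itself also carries the pattern in its arguments, but its elapsed time is near zero, so the threshold filters it out.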