
Dumps of small wikis have been hanging for four days
Closed, Resolved, Public

Description

Currently only wikidatawiki and enwiki seem to be 'active'.

Snippet of https://dumps.wikimedia.org/backup-index.html:

2021-08-01 10:48:50 cswikinews: Dump in progress

2021-08-01 10:48:49 in-progress Redirect list
    cswikinews-20210801-redirect.sql.gz

2021-08-01 10:48:49 azbwiki: Dump in progress

2021-08-01 10:48:48 in-progress Interwiki link tracking records
    azbwiki-20210801-iwlinks.sql.gz

2021-08-01 10:48:50 hywikibooks: Dump in progress

2021-08-01 10:48:49 in-progress Name/value pairs for pages.
    hywikibooks-20210801-page_props.sql.gz

2021-08-01 10:48:46 aywikibooks (closed): Dump in progress

2021-08-01 10:48:45 in-progress List of pages' geographical coordinates
    aywikibooks-20210801-geo_tags.sql.gz

2021-08-01 10:48:49 kkwiktionary: Dump in progress

2021-08-01 10:48:48 in-progress Redirect list
    kkwiktionary-20210801-redirect.sql.gz

Event Timeline

Thanks for catching this. It is surely related to T287989, and I think we have at last got it sorted. I'll be watching the run this morning to make sure it's back to normal once the retry kicks off in another few hours.

Change 710922 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[operations/dumps@master] If a run is interrupted weirdly, the runsettings file may be empty, handle this

https://gerrit.wikimedia.org/r/710922

Change 710922 merged by jenkins-bot:

[operations/dumps@master] If a run is interrupted weirdly, the runsettings file may be empty, handle this

https://gerrit.wikimedia.org/r/710922

Mentioned in SAL (#wikimedia-operations) [2021-08-09T08:03:43Z] <ariel@deploy1002> Started deploy [dumps/dumps@142e91c]: fix for T288192 runnerutils bug

Mentioned in SAL (#wikimedia-operations) [2021-08-09T08:03:51Z] <ariel@deploy1002> Finished deploy [dumps/dumps@142e91c]: fix for T288192 runnerutils bug (duration: 00m 03s)

The next run of the scheduler is in 15 minutes. I'll be watching to make sure things start up.

ArielGlenn claimed this task.

Jobs are running. I'm going to add a note to the dumpsdata switch docs to set permissions and ownership on all files after the switch, and then close this task.
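A minimal sketch of that post-switch step, assuming a Python helper is acceptable: walk the dumps tree and report (or fix) any entry whose owner, group, or mode differs from what the dump runners expect. The function name `fix_tree_ownership`, the target uid/gid, and the 0o644/0o755 modes are hypothetical placeholders, not values taken from the dumpsdata switch docs; shell tools such as `chown -R` can achieve the same result.

```python
import os
import stat


def fix_tree_ownership(root: str, uid: int, gid: int,
                       file_mode: int = 0o644, dir_mode: int = 0o755,
                       dry_run: bool = True) -> list:
    """Return paths under root whose uid, gid, or mode differ from the targets.

    Hypothetical helper: with dry_run=False it also chowns and chmods them.
    Symlinks are skipped so the walk never modifies what a link points to.
    """
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each real directory shows up once as dirpath; files come from filenames.
        entries = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in entries:
            info = os.lstat(path)
            if stat.S_ISLNK(info.st_mode):
                continue  # leave symlinks alone
            wanted_mode = dir_mode if stat.S_ISDIR(info.st_mode) else file_mode
            current = (info.st_uid, info.st_gid, stat.S_IMODE(info.st_mode))
            if current == (uid, gid, wanted_mode):
                continue  # already correct
            mismatched.append(path)
            if not dry_run:
                os.chown(path, uid, gid)
                os.chmod(path, wanted_mode)
    return mismatched
```

The default `dry_run=True` only lists mismatches, so the tree can be inspected before anything is changed.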

Are jobs still running? They seem to have stopped rather soon after they were started, or am I mistaken?

Change 711104 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[operations/dumps@master] Deal with other ways the run settings file may be corrupt

https://gerrit.wikimedia.org/r/711104

Change 711104 merged by jenkins-bot:

[operations/dumps@master] Deal with other ways the run settings file may be corrupt

https://gerrit.wikimedia.org/r/711104

Change 711109 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[operations/dumps@master] don't try to apply settings from a corrupt runsettings file

https://gerrit.wikimedia.org/r/711109

Change 711109 merged by jenkins-bot:

[operations/dumps@master] don't try to apply settings from a corrupt runsettings file

https://gerrit.wikimedia.org/r/711109
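The three changes above make the runner tolerate an empty or otherwise corrupt runsettings file and skip applying its contents. The real patches are in the linked Gerrit changes; the following is only an illustrative sketch of that defensive pattern. It assumes, hypothetically, that the file holds a JSON object, and the names `parse_run_settings`, `load_run_settings`, and `merge_run_settings` are invented for this example rather than taken from operations/dumps.

```python
import json
from typing import Optional


def parse_run_settings(text: str) -> Optional[dict]:
    """Return the parsed settings, or None if the text is empty or corrupt."""
    if not text.strip():
        # An interrupted run can leave a zero-length (or whitespace-only) file.
        return None
    try:
        settings = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Truncated or garbled content.
        return None
    if not isinstance(settings, dict):
        # Valid JSON, but not the object shape we expect.
        return None
    return settings


def load_run_settings(path: str) -> Optional[dict]:
    """Read and parse a runsettings file; unreadable files count as corrupt."""
    try:
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
    except (OSError, UnicodeDecodeError):
        return None
    return parse_run_settings(text)


def merge_run_settings(saved: Optional[dict], defaults: dict) -> dict:
    """Apply saved settings over the defaults, ignoring them if corrupt."""
    merged = dict(defaults)
    if saved is not None:
        merged.update(saved)
    return merged
```

Returning None instead of raising keeps the decision in one place: the caller falls back to defaults rather than crashing partway through applying half-read settings.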

> are jobs still running? they seem to have stopped rather soon after jobs were started, or am i mistaken?

They were hung on hewikisource, which had a corrupted status file of some kind. I manually cleaned up by running a no-op job and restarted the runners. I can see current page articles being dumped now, but because I trust nothing at this point, I'll be checking the run every hour today until I quit for the evening.

Looks like it's working this time.

The true test will be the 20210820 jobs.


The 20210820 jobs won't have an issue because there will be no unpacking of partially generated status file tarballs on the wrong host. They will start clean... but I won't close this until the 0801 run completes.

I get the following error when archiving the 0801 files I downloaded directly from dumps.wikimedia.org:

SECURITY WARNING! SUSPICIOUS FILE: ctime changed since archive of reference was done, while no other inode information changed

This repeats for every file in 0801, but not for previous dumps. I'm not sure if this is related.

The dar archive manager detected it, and it says this is indicative of (but not necessarily evidence of) a rootkit.

I just used wget to download the files.

Edit: it's possible that after I extracted the files and managed the md5 and sha1 sums, my dist upgrade changed the clock, and this could be why. Sorry for the alert.
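The dar message describes one specific condition: a file's ctime (the inode change time, which the kernel updates on metadata changes and which tools such as `touch` cannot set) differs from the value recorded in the reference archive, while size, mode, ownership, and mtime are unchanged. This sketch only expresses that comparison; the `InodeSnapshot` fields are a hypothetical simplification, not dar's actual catalogue format, and the check says nothing about why ctime moved.

```python
from typing import NamedTuple


class InodeSnapshot(NamedTuple):
    """Subset of stat() fields recorded at archive time (hypothetical layout)."""
    size: int
    mode: int
    uid: int
    gid: int
    mtime: float
    ctime: float


def ctime_only_changed(before: InodeSnapshot, after: InodeSnapshot) -> bool:
    """True when ctime differs but every other recorded field is identical."""
    # Blank out ctime on both sides, then compare the remaining fields.
    others_equal = before._replace(ctime=0.0) == after._replace(ctime=0.0)
    return others_equal and before.ctime != after.ctime
```

A file whose contents and mtime are untouched but whose metadata was rewritten (for example by a chmod or chown to the same values) would match this condition, which is why the warning alone does not indicate tampering.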

The files are fine, I would ignore the message.

Is the wikidata 0820 dump going to finish before the 0901 dump starts? What happens if it doesn't?

Edit: sorry, I just reloaded the page and it's finished, but it's something to think about, as wikidata is rapidly expanding.

I watch that all the time and have plans. Don't worry.