dumplastbz2block from mwbzutils 0.1.1 sometimes fails, causing page content files to be falsely considered bad
Closed, Resolved | Public | BUG REPORT

Description

The dumps for December 1st, 2020 seem to have problems recombining multiple bz2 streams.

Steps to Reproduce:
See https://dumps.wikimedia.org/frwiki/20201201/ for example

Actual Results:

2020-12-01 23:54:53 failed Recombine multiple bz2 streams
frwiki-20201201-pages-articles-multistream.xml.bz2
frwiki-20201201-pages-articles-multistream-index.txt.bz2
2020-12-01 22:15:48 failed Recombine articles, templates, media/file descriptions, and primary meta-pages.
frwiki-20201201-pages-articles.xml.bz2

Expected Results:
Success in recombining multiple files...

Event Timeline

NicoV triaged this task as Unbreak Now! priority. Dec 2 2020, 12:34 PM
NicoV created this task.

The issue was with the new version of dumplastbz2block; an older version was already deployed manually yesterday. The few remaining wikis with issues will be fixed up as those jobs get rerun later in the run. To verify that, I manually ran the articles recombine and the articles multistream recombine for frwiki, both of which succeeded. The files from those should be visible on the web server in an hour or two. Thank you for the prompt report, however!

NicoV lowered the priority of this task from Unbreak Now! to Medium. Dec 3 2020, 12:28 PM

Thanks @ArielGlenn, I was able to download the dump analysis for frWP.

You're welcome! I'll leave this task open and rename it to reflect the underlying issue with dumplastbz2block.

ArielGlenn renamed this task from "Dump fails to recombine multiple bz2 streams" to "dumplastbz2block from mwbzutils 0.1.1 sometimes fails, causing page content files to be falsely considered bad". Dec 3 2020, 1:31 PM

So far we have a heisenbug, breaking with the util under stretch but not the build on my local install. Still poking at it.

Change 645340 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps/mwbzutils@master] make sure bz2 header is read when reading blocks backwards

https://gerrit.wikimedia.org/r/645340
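
For context on what "the bz2 header" refers to: a bz2 stream begins with the three bytes `BZh` followed by one ASCII digit from `1` to `9` giving the block size in units of 100,000 bytes, and a decompressor needs that value before it can decode any block in the stream. The following is only a minimal Python sketch of reading that header; it is not the mwbzutils code or the patch itself, and the function name `bz2_block_size` is made up for illustration.

```python
import bz2


def bz2_block_size(header: bytes) -> int:
    """Return the block size in bytes declared by a bz2 stream header.

    `header` must hold at least the first 4 bytes of the stream.
    Hypothetical helper for illustration; not part of mwbzutils.
    """
    if len(header) < 4 or header[:3] != b"BZh":
        raise ValueError("not a bz2 stream header")
    level = header[3] - ord("0")  # ASCII digit '1'..'9'
    if not 1 <= level <= 9:
        raise ValueError("invalid bz2 block size digit")
    return level * 100_000


# Usage: inspect the header of a freshly compressed buffer.
sample = bz2.compress(b"example page content", compresslevel=9)
print(bz2_block_size(sample[:4]))
```

A tool that locates a block by seeking from the end of the file does not pass over these bytes on its way there, so it has to fetch them from the start of the stream separately.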

I've built a package based on the above patch, run a bunch of tests, and deployed it manually to snapshot1005 which is running enwiki. I'll be watching to make sure that as page content files are produced, they are verified properly. Once that's done I'll merge that and the package commit and deploy everywhere.
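
As a rough illustration of an independent check on a page content or multistream file, the sketch below walks every concatenated bz2 stream in a file with Python's standard `bz2` module and reports truncation. This is an assumption-laden stand-in, not how the dumps infrastructure verifies files (that relies on dumplastbz2block looking only at the end of the file, which is far cheaper than a full decompression); `count_bz2_streams` and `chunk_size` are hypothetical names.

```python
import bz2


def count_bz2_streams(path: str, chunk_size: int = 1 << 20) -> int:
    """Decompress every concatenated bz2 stream in `path` and count them.

    Raises OSError if the data is corrupt and ValueError if the file
    ends in the middle of a stream. Hypothetical helper, full-file scan.
    """
    streams = 0
    decomp = bz2.BZ2Decompressor()
    mid_stream = False  # True once the current stream has received data
    with open(path, "rb") as f:
        data = f.read(chunk_size)
        while data:
            decomp.decompress(data)  # output is discarded; errors propagate
            mid_stream = True
            if decomp.eof:
                streams += 1
                data = decomp.unused_data  # bytes belonging to the next stream
                decomp = bz2.BZ2Decompressor()
                mid_stream = False
                if data:
                    continue
            data = f.read(chunk_size)
    if mid_stream:
        raise ValueError("file ends in the middle of a bz2 stream")
    return streams
```

On a multi-gigabyte file such as frwiki-20201201-pages-articles-multistream.xml.bz2 this would take a long time, which is one reason a tail-only check is attractive in production.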

@ArielGlenn

At least for frWP, it seems the step "Recombine articles, templates, media/file descriptions, and primary meta-pages." hasn't started even though the step "Articles, templates, media/file descriptions, and primary meta-pages" has been done for more than 20 hours. See https://dumps.wikimedia.org/frwiki/20201220/

They seem done to me.

2020-12-21 01:41:19 done Recombine articles, templates, media/file descriptions, and primary meta-pages.

    frwiki-20201220-pages-articles.xml.bz2 4.6 GB

Note that the servers running everything else but enwiki have the old version of the utils, so they will work just as they always have.

The enwiki run is reusing those articles-pages bz2 files for multistream output, so the new package can go around everywhere at the end of this run.

The package files are already in a local dir on apt1001 waiting to be added to the repo, which will be done just before they are installed on the snapshot hosts.

Change 645340 merged by ArielGlenn:
[operations/dumps/mwbzutils@master] make sure bz2 header is read when reading blocks backwards

https://gerrit.wikimedia.org/r/645340

Hi @ArielGlenn
This is strange, but I see the same thing now. I checked on a mirror and only saw the separate pages, then I checked again ~15 hours later and it was the same. I checked the link on dumps.wikimedia.org, and the recombine was marked as Waiting. When you posted here, everything was OK, with the recombine step done and a timestamp well before the time I checked.

Huh, well maybe the rsync was lagging for some reason, though I don't see any reason for that. But all's well that ends well :-)

ArielGlenn claimed this task.

The new package is available in the apt repo and has been manually installed on all snapshot hosts, thus this task can be closed.