
zhwiki pages-meta-history bz2 dump hangs
Closed, ResolvedPublic

Description

File being produced:
/mnt/dumpsdata/xmldatadumps/public/zhwiki/20190101/zhwiki-20190101-pages-meta-history2.xml-p268800p330497.bz2.inprog
It has hung at the same point twice now, at size 254803968 bytes.

Command being run:
/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=zhwiki --stub=gzip:/mnt/dumpsdata/xmldatadumps/temp/z/zhwiki/zhwiki-20190101-stub-meta-history2.xml-p268800p330497.gz --prefetch=7zip:/mnt/dumpsdata/xmldatadumps/public/zhwiki/20181201/zhwiki-20181201-pages-meta-history2.xml-p206129p268800.7z;/mnt/dumpsdata/xmldatadumps/public/zhwiki/20181201/zhwiki-20181201-pages-meta-history2.xml-p268801p331240.7z --report=1000 --spawn=/usr/bin/php7.2 --output=bzip2:/mnt/dumpsdata/xmldatadumps/public/zhwiki/20190101/zhwiki-20190101-pages-meta-history2.xml-p268800p330497.bz2.inprog --full

Event Timeline

ArielGlenn created this task.

Right now I'm checking to see if compression/decompression of any of the files plays a role; I've also noted which revision of which page is the last to be written. The following revision, at least in the stub file, sure looks harmless enough.

If using uncompressed I/O for everything makes no difference, I'll check whether a shorter stub file of about 10 pages, including the one where the job hangs, still has the problem.

Uncompressed stubs and output file don't help; uncompressed prefetch files seem to make a difference. The output file is now about halfway complete and much farther on than the two attempts that hung. I'll go ahead and let this complete, compress the file and move it into place so that the regular run can continue on, while I continue looking into the cause of the problem.

The new bz2 file has been copied into place and the current zhwiki processes have been shot; the scheduler should pick it up later today or tomorrow and complete the run.

The run is complete. I am still trying to create smaller input files that reproduce the problem; no luck yet. In the meantime, the revision that breaks everything is this one: https://zh.wikipedia.org/w/index.php?title=Template:X2&oldid=35165097 (WARNING: this is a 12-megabyte text revision, so it may break your browser!)

We have another instance of the pages-meta-history hanging. Command:

/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=metawiki --stub=gzip:/mnt/dumpsdata/xmldatadumps/temp/m/metawiki/metawiki-20200701-stub-meta-history1.xml-p157254p321600.gz --prefetch=7zip:/mnt/dumpsdata/xmldatadumps/public/metawiki/20200601/metawiki-20200601-pages-meta-history1.xml-p76243p158540.7z;/mnt/dumpsdata/xmldatadumps/public/metawiki/20200601/metawiki-20200601-pages-meta-history1.xml-p158541p321600.7z --report=1000 --spawn=/usr/bin/php7.2 --output=lbzip2:/mnt/dumpsdata/xmldatadumps/public/metawiki/20200701/metawiki-20200701-pages-meta-history1.xml-p157254p321600.bz2.inprog --full

Running processes:

dumpsgen  3827  0.0  0.0   4276   740 ?        S    Jul07   0:00 sh -c 7za e -bd -so '/mnt/dumpsdata/xmldatadumps/public/metawiki/20200601/metawiki-20200601-pages-meta-history1.xml-p158541p321600.7z' 2>/dev/null
dumpsgen  3828  0.1  0.0  25092  8948 ?        S    Jul07   1:08 /usr/lib/p7zip/7za e -bd -so /mnt/dumpsdata/xmldatadumps/public/metawiki/20200601/metawiki-20200601-pages-meta-history1.xml-p158541p321600.7z
dumpsgen  3848  0.0  0.0   4276   744 ?        S    Jul07   0:00 sh -c '/usr/bin/php7.2' '/srv/mediawiki/php-1.35.0-wmf.39/../multiversion/MWScript.php' 'fetchText.php' '--wiki' 'metawiki'
dumpsgen  3849  0.0  0.0 451640 59684 ?        S    Jul07   0:03 /usr/bin/php7.2 /srv/mediawiki/php-1.35.0-wmf.39/../multiversion/MWScript.php fetchText.php --wiki metawiki
dumpsgen 61341  1.5  0.1 492844 98080 ?        S    Jul07  18:09 /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=metawiki --stub=gzip:/mnt/dumpsdata/xmldatadumps/temp/m/metawiki/metawiki-20200701-stub-meta-history1.xml-p157254p321600.gz --prefetch=7zip:/mnt/dumpsdata/xmldatadumps/public/metawiki/20200601/metawiki-20200601-pages-meta-history1.xml-p76243p158540.7z;/mnt/dumpsdata/xmldatadumps/public/metawiki/20200601/metawiki-20200601-pages-meta-history1.xml-p158541p321600.7z --report=1000 --spawn=/usr/bin/php7.2 --output=lbzip2:/mnt/dumpsdata/xmldatadumps/public/metawiki/20200701/metawiki-20200701-pages-meta-history1.xml-p157254p321600.bz2.inprog --full
dumpsgen 61395  0.0  0.0   4276   700 ?        S    Jul07   0:00 sh -c lbzip2 -n 1 > '/mnt/dumpsdata/xmldatadumps/public/metawiki/20200701/metawiki-20200701-pages-meta-history1.xml-p157254p321600.bz2.inprog'

Relevant strace output:

root@snapshot1007:~# strace -p 3828
strace: Process 3828 attached
write(1, "l:Contributions/135.180.106.208|"..., 3514368^Cstrace: Process 3828 detached
 <detached ...>
root@snapshot1007:~# strace -p 3849
strace: Process 3849 attached
read(17, ^Cstrace: Process 3849 detached
 <detached ...>
root@snapshot1007:~# strace -p 61341
strace: Process 61341 attached
wait4(3827, ^Cstrace: Process 61341 detached
 <detached ...>
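The strace output looks like a stalled pipeline: the 7za decompressor (3828) is blocked in write() with about 3.5 MB pending, the spawned fetchText.php (3849) is blocked in read(), and the main dumpTextPass.php process (61341) is parked in wait4() on the 7za subshell (3827), so nothing is draining the prefetch pipe. As a rough illustration of that pattern only (the command and sizes below are made up; this is not the dump code), a writer blocks as soon as its reader stops consuming:

<?php
// Illustration only: once the parent stops reading, the child fills the pipe
// buffer (~64 KB on Linux) and blocks in write(), which is the pattern the
// strace output above suggests for 7za.
$child = popen( 'head -c 10000000 /dev/zero', 'r' ); // child wants to write 10 MB

$chunk = fread( $child, 4096 );   // read a little...
echo "read " . strlen( $chunk ) . " bytes, now stopping\n";
sleep( 30 );                      // ...then stop; `head` is now stuck in write()

pclose( $child );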

I've saved all the files needed to rerun the command to dumpsdata1001:/data/temp/dumpsgen/hangbug.

The command was run with only the second prefetch file, and output moved into place after the current metawiki run was killed. A new run has already started, picking up where the old one was stopped.

@ArielGlenn This might be a stupid question. Since the template causing the issue is a sandbox for people to try out how templates work, people may put crazy huge things in there (like the one you mentioned above). Is it possible to skip all of these sandboxes in future dumps?

I don't know that such templates cause the issue for metawiki dumps. In addition it is possible to generate the output by tweaking the command line a little, and then the problem revision(s) are dumped as expected.

In the meantime we have not had a hang for the zhwiki dumps since this was reported last year, so I think it's fine to continue to dump everything.

I have managed to shrink some of the input files for the command that hangs, and I've tried changing a few things to see what makes it not hang. Here's the summary.

The problem child pageid is 219724 (User talk:Tegel) and the revision where it hangs is 20090511 with a size of 17553256 bytes, from metawiki. Text id is 20521344.

On any snapshot host (where /mnt/dumpsdata is available), the following hangs:

/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=metawiki  --stub=file:/mnt/dumpsdata/temp/dumpsgen/hangbug/metatesting-p219724p219740.txt   --prefetch='nbzip2:/mnt/dumpsdata/temp/dumpsgen/hangbug/missing.bz2;/mnt/dumpsdata/temp/dumpsgen/hangbug/meta-prefetch-p219700p219740.bz2' --report=1 --spawn=/usr/bin/php7.2 --output=file:/mnt/dumpsdata/temp/dumpsgen/hangbug/metawiki-20200701-pages-meta-history1.xml-p157254p321600.txt.testing --full

The p<start>p<end> numbers in the filenames indicate the first and last page numbers in each file.
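(As an aside, the range is easy to pull out of a filename programmatically; the helper below is purely illustrative and not part of the dump tooling.)

<?php
// Illustrative helper, not part of the dump scripts: extract the first and
// last page ids from a dump filename such as metatesting-p219724p219740.txt
function parsePageRange( string $filename ): ?array {
    if ( preg_match( '/p(\d+)p(\d+)/', $filename, $m ) ) {
        return [ 'first' => (int)$m[1], 'last' => (int)$m[2] ];
    }
    return null;
}

var_dump( parsePageRange( 'metatesting-p219724p219740.txt' ) );
// array(2) { ["first"]=> int(219724) ["last"]=> int(219740) }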

  • The stubs can be in gz or plain text format; it doesn't matter.
  • The first prefetch file can be a regular file with content or be missing altogether. But it must be listed; if only the second file is listed, things work.
  • The spawn option must be used; if we do not retrieve revisions via a spawned php process, things work.
  • The prefetch files can be in 7z format (as they usually are), or in bz2 format with a stream wrapper that looks just like SevenZipStream.php but invokes bzip2 instead (as in the above command); either way the command hangs. If they are in bzip2 format without this wrapper, i.e. bzip2:/mnt/dumpsdata/...blah.bz2, the command works. This indicates to me that the stream wrapper is somehow involved in the problem (a simplified sketch of such a wrapper follows this list).
  • I note that in SevenZipStream.php, the close method really ought to call pclose instead of fclose, from what I read in the docs, but even with that change made, the command still hangs.
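For anyone not familiar with these wrappers, here is a stripped-down sketch of what such a bzip2-decompressing stream wrapper looks like, modelled on SevenZipStream.php; the class name and details are illustrative rather than copied from the actual nbzip2 wrapper used above:

<?php
// Simplified sketch of a read-only decompressing stream wrapper, modelled on
// MediaWiki's SevenZipStream.php. Names and details are illustrative.
class BzipTwoStream {
    /** @var resource */
    protected $stream;

    public static function register() {
        stream_wrapper_register( 'mediawiki.compress.nbzip2', __CLASS__ );
    }

    public function stream_open( $path, $mode, $options, &$openedPath ) {
        if ( $mode !== 'r' && $mode !== 'rb' ) {
            return false; // read-only: we only ever decompress prefetch files
        }
        // Strip the scheme prefix to get the real filesystem path.
        $file = preg_replace( '!^mediawiki\.compress\.nbzip2://!', '', $path );
        $cmd = 'bzip2 -dc ' . escapeshellarg( $file ) . ' 2>/dev/null';
        $this->stream = popen( $cmd, 'r' );
        return $this->stream !== false;
    }

    public function stream_read( $count ) {
        return fread( $this->stream, $count );
    }

    public function stream_eof() {
        return feof( $this->stream );
    }

    public function stream_close() {
        // As noted in the last bullet, a popen()ed stream should be closed with
        // pclose(); calling fclose() is the suspect behaviour mentioned above.
        return pclose( $this->stream ) !== -1;
    }
}

BzipTwoStream::register();
// XMLReader (or fopen) can then read the decompressed XML via a path like
// mediawiki.compress.nbzip2:///mnt/dumpsdata/temp/dumpsgen/hangbug/meta-prefetch-p219700p219740.bz2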

I have not yet tried removing some revisions from the stubs to see if I can narrow down the input, and therefore the processing, even further. That's next.

If I remove "a few" revs from the stub file, the command still hangs. If I remove most of them, it doesn't. I'll try to get it down to the minimum possible.

I have removed some revisions from /mnt/dumpsdata/temp/dumpsgen/hangbug/metatesting-p219724p219740.txt; 1943 revisions are now processed before the hang. If I remove just one more revision, the command succeeds. This is the smallest I can get the files, and the least amount of processing output we're going to have, which isn't great.

I see

2020-07-13 10:30:09 [862c1ecd4a46aee776631ba2] snapshot1010 metawiki 1.35.0-wmf.40 error ERROR: [862c1ecd4a46aee776631ba2] [no req]   ErrorException from line 199 of /srv/mediawiki/php-1.35.0-wmf.40/includes/export/BaseDump.php: PHP Warning: XMLReader::read(): mediawiki.compress.nbzip2:///mnt/dumpsdata/temp/dumpsgen/hangbug/meta-prefetch-p219700p219740.bz2:1227921: parser error : xmlSAX2Characters: huge text node {"exception_id":"862c1ecd4a46aee776631ba2","exception_url":"[no req]","caught_by":"mwe_handler"}
[Exception ErrorException] (/srv/mediawiki/php-1.35.0-wmf.40/includes/export/BaseDump.php:199) PHP Warning: XMLReader::read(): mediawiki.compress.nbzip2:///mnt/dumpsdata/temp/dumpsgen/hangbug/meta-prefetch-p219700p219740.bz2:1227921: parser error : xmlSAX2Characters: huge text node
  #0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
  #1 /srv/mediawiki/php-1.35.0-wmf.40/includes/export/BaseDump.php(199): XMLReader->read()
  #2 /srv/mediawiki/php-1.35.0-wmf.40/includes/export/BaseDump.php(154): BaseDump->nodeContents()
  #3 /srv/mediawiki/php-1.35.0-wmf.40/includes/export/BaseDump.php(104): BaseDump->nextText()
  #4 /srv/mediawiki/php-1.35.0-wmf.40/maintenance/includes/TextPassDumper.php(612): BaseDump->prefetch(integer, integer, string)
  #5 /srv/mediawiki/php-1.35.0-wmf.40/maintenance/includes/TextPassDumper.php(935): TextPassDumper->getText(string, string, string, integer)
  #6 [internal function]: TextPassDumper->startElement(resource, string, array)
  #7 /srv/mediawiki/php-1.35.0-wmf.40/maintenance/includes/TextPassDumper.php(475): xml_parse(resource, string, boolean)
  #8 /srv/mediawiki/php-1.35.0-wmf.40/maintenance/includes/TextPassDumper.php(294): TextPassDumper->readDump(resource)
  #9 /srv/mediawiki/php-1.35.0-wmf.40/maintenance/includes/TextPassDumper.php(162): TextPassDumper->dump(boolean)
  #10 /srv/mediawiki/php-1.35.0-wmf.40/maintenance/doMaintenance.php(107): TextPassDumper->execute()
  #11 /srv/mediawiki/php-1.35.0-wmf.40/maintenance/dumpTextPass.php(29): require_once(string)
  #12 /srv/mediawiki/multiversion/MWScript.php(101): require_once(string)
  #13 {main}

in the logs; I wonder what happens when that error is caught. Something that doesn't always work, apparently.
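For context, the "huge text node" warning is a libxml2 limit: by default the parser rejects text nodes larger than about 10 MB unless it is opened with the PARSEHUGE option, and the 17 MB revision above is well over that. Whether and how the merged change below deals with it, I'll leave to the patch itself; the snippet below (with a made-up file name) only demonstrates the underlying libxml behaviour:

<?php
// big-revision.xml is a hypothetical document containing a ~17 MB <text> node.

// Default parse: libxml refuses text nodes over ~10 MB and emits
// "parser error : xmlSAX2Characters: huge text node".
$reader = new XMLReader();
$reader->open( 'big-revision.xml' );
while ( @$reader->read() ) {
    // read() returns false at the oversized node
}
$reader->close();

// With LIBXML_PARSEHUGE the same document parses cleanly.
$reader = new XMLReader();
$reader->open( 'big-revision.xml', null, LIBXML_PARSEHUGE );
while ( $reader->read() ) {
    // large text nodes are now returned intact
}
$reader->close();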

Change 612178 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[mediawiki/core@master] make prefetch for dumps work with large revisions for multiple files

https://gerrit.wikimedia.org/r/612178

Hindsight is 20/20.

The above was tested and works on the problem example, which now runs to completion.

Change 612178 merged by jenkins-bot:
[mediawiki/core@master] make prefetch for dumps work with large revisions for multiple files

https://gerrit.wikimedia.org/r/612178

Leaving this open until wmf.41 is everywhere; then it can be closed.

Nothing to review, the change was merged and it's everywhere so this can be closed. Doing so :-)