
PHP Warning: XMLReader::read(): Memory allocation failed : growing input buffer
Open, Needs Triage · Public · PRODUCTION ERROR

Description

Error
labels.normalized_message
[{reqId}] {exception_url}   PHP Warning: XMLReader::read(): Memory allocation failed : growing input buffer
error.stack_trace
from /srv/mediawiki/php-1.41.0-wmf.8/includes/export/BaseDump.php(217)
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.41.0-wmf.8/includes/export/BaseDump.php(217): XMLReader->read()
#2 /srv/mediawiki/php-1.41.0-wmf.8/includes/export/BaseDump.php(172): BaseDump->nodeContents()
#3 /srv/mediawiki/php-1.41.0-wmf.8/includes/export/BaseDump.php(115): BaseDump->nextText()
#4 /srv/mediawiki/php-1.41.0-wmf.8/maintenance/includes/TextPassDumper.php(634): BaseDump->prefetch(integer, integer, string)
#5 /srv/mediawiki/php-1.41.0-wmf.8/maintenance/includes/TextPassDumper.php(959): TextPassDumper->getText(string, string, string, integer)
#6 [internal function]: TextPassDumper->startElement(resource, string, array)
#7 /srv/mediawiki/php-1.41.0-wmf.8/maintenance/includes/TextPassDumper.php(497): xml_parse(resource, string, boolean)
#8 /srv/mediawiki/php-1.41.0-wmf.8/maintenance/includes/TextPassDumper.php(319): TextPassDumper->readDump(resource)
#9 /srv/mediawiki/php-1.41.0-wmf.8/maintenance/includes/TextPassDumper.php(187): TextPassDumper->dump(boolean)
#10 /srv/mediawiki/php-1.41.0-wmf.8/maintenance/includes/MaintenanceRunner.php(681): TextPassDumper->execute()
#11 /srv/mediawiki/php-1.41.0-wmf.8/maintenance/run.php(51): MediaWiki\Maintenance\MaintenanceRunner->run()
#12 /srv/mediawiki/multiversion/MWScript.php(140): require_once(string)
#13 {main}
Impact
Notes

Three of these were emitted from snapshot1009, at 11:54:49.468, 12:08:46.094, and 12:29:18.345, all for commonswiki.

Event Timeline

That's me testing; snapshot1009 is the testbed host. I may wind up opening a task for the underlying issue, still investigating though.

It turns out that there isn't some other error causing this one, as far as I can tell. Here's what I know right now:

  • This error only happened with one page range content job on one wiki. It is reproducible.
  • When I display the value of memory_limit just before the error, it is still -1 (no memory limit at all).
  • Grafana (https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&var-site=eqiad&var-cluster=dumps&var-instance=snapshot1009&var-datasource=thanos&from=now-15m&to=now for the testbed host where I ran this) shows no spike in memory usage, the ps listing shows no huge increase, and a display of memory_get_usage() (with both true and false passed) shows no huge increase either. Values range from a virtual size of about 2.3 GB (via ps) down to the 49 or 50 MB reported by memory_get_usage().
  • The page id where it happens is 110067251, revision id 591385496, whose content is quite short: less than 300 bytes. So are the previous revisions for that page. See https://commons.wikimedia.org/wiki/File:Vignec_(Hautes-Pyr%C3%A9n%C3%A9es)_1.jpg to check for yourself.
  • If I change the prefetch to use the bz2 content files from the previous run, as opposed to the 7z compressed files, the error does not occur. Note that both sets of files contain identical content (verified via md5sum). Bz2 input streams are handled by a compiled-in PHP module, whereas 7z input streams are handled by a short PHP module we wrote that wraps the 7za command (see the sketch after this list).
  • If I change the page range to one significantly smaller that still contains the specific page id, the error does not occur.

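For context on the 7z prefetch path above, here is a minimal sketch, assuming a stream-wrapper approach, of what a short PHP wrapper around the external 7za command can look like. None of the class, scheme, or variable names below come from the actual MediaWiki code; only the PHP stream wrapper protocol itself is standard.

<?php
// Hypothetical sketch of a read-only stream wrapper that decompresses a 7z
// archive by piping it through the external 7za binary.
class SevenZipReadStream {
    /** @var resource|false pipe opened with popen() */
    private $pipe;

    public function stream_open( $path, $mode, $options, &$opened_path ) {
        // Strip the custom scheme prefix, e.g. "sevenzread://" (made-up name).
        $file = preg_replace( '!^[a-z0-9.+-]+://!i', '', $path );
        // "e" extracts, "-so" sends output to stdout, "-bd" disables the progress bar.
        $cmd = '7za e -bd -so ' . escapeshellarg( $file ) . ' 2>/dev/null';
        $this->pipe = popen( $cmd, 'r' );
        return $this->pipe !== false;
    }

    public function stream_read( $count ) {
        return fread( $this->pipe, $count );
    }

    public function stream_eof() {
        return feof( $this->pipe );
    }

    public function stream_close() {
        pclose( $this->pipe );
    }
}

stream_wrapper_register( 'sevenzread', SevenZipReadStream::class );
// Example: XMLReader::open( 'sevenzread:///mnt/dumpsdata/temp/dumpsgen/somefile.7z' );
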
Reproduce the error by:

On snapshot1009, as the dumpsgen user, run:

/usr/bin/php7.4  /srv/mediawiki/multiversion/MWScript.php  dumpTextPass.php  --wiki=commonswiki --stub=gzip:/mnt/dumpsdata/temp/dumpsgen/commons-stubs-p110055359p110200000.gz --prefetch='7zip:/mnt/dumpsdata/temp/dumpsgen/commonswiki-20230401-pages-meta-history6.xml-p109898724p110227308.7z;/mnt/dumpsdata/temp/dumpsgen/commonswiki-20230401-pages-meta-history6.xml-p110227309p110592952.7z'  --dbgroupdefault=dump  --report=1000  --spawn=/usr/bin/php7.4   --output=file:/mnt/dumpsdata/temp/dumpsgen/commonswiki-pmh-bugs.txt

These files will not appear in the production dumps output tree, nor will they be cleaned up by any of our cleanup scripts. Snapshot1009 is our testbed host so this is a fine place to test. Please don't run more than one of these at a time, as we are running a backfill for another wiki there and want it to complete in a timely fashion.

The broken job hangs forever until shot; it is hanging in FormatJson::encode, which is invoked by the jsonSerializeException method of MWExceptionHandler after catching some sort of out of memory exception. Whatever MediaWiki does when it runs out of memory, it should not do that.
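
As an illustration of that last point only, here is a minimal sketch of one common defensive pattern, assuming an emergency-reserve approach; none of these names exist in MediaWiki. The idea is to pre-reserve a small buffer and release it before the error path does anything allocation-heavy, such as JSON-encoding a full stack trace.

<?php
// Hypothetical sketch: keep an emergency memory reserve so that an
// out-of-memory error handler has headroom to do a small amount of work.
class EmergencyMemoryReserve {
    /** @var string|null the reserved buffer */
    private static $reserve = null;

    public static function init( int $bytes = 1048576 ): void {
        // Hold roughly 1 MB so some memory is still free when OOM hits.
        self::$reserve = str_repeat( "\0", $bytes );
    }

    public static function release(): void {
        self::$reserve = null;
    }
}

EmergencyMemoryReserve::init();
set_error_handler( static function ( $errno, $errstr ) {
    if ( strpos( $errstr, 'Memory allocation failed' ) !== false ) {
        EmergencyMemoryReserve::release();
        // Emit a small, fixed-size message rather than serializing the whole
        // exception, which is the step where the real job appears to hang.
        error_log( 'OOM while reading XML; skipping heavy exception serialization' );
    }
    return false; // let the normal handler run as well
} );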

I have worked around the issue for the moment by manually running this job with the bz2 prefetch files.

Verified that with those same files from the above command the error is still present; whatever the underlying issue is, nothing in the MW codebase has changed it.

Running a manual version in a screen session as the dumpsgen user: 14381.pts-0.snapshot1009

/usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=commonswiki --stub=gzip:/mnt/dumpsdata/xmldatadumps/temp/c/commonswiki/commonswiki-20230901-stub-meta-history6.xml-p109905557p110218318.gz --prefetch='lbzip2:/mnt/dumpsdata/xmldatadumps/public/commonswiki/20230801/commonswiki-20230801-pages-meta-history6.xml-p109799563p110102281.bz2;/mnt/dumpsdata/xmldatadumps/public/commonswiki/20230801/commonswiki-20230801-pages-meta-history6.xml-p110102282p110455199.bz2' --dbgroupdefault=dump --report=1000 --spawn=/usr/bin/php7.4 --output=lbzip2:/mnt/dumpsdata/temp/dumpsgen/commonswiki-xml-unzip-reader-problem/commonswiki-20230901-pages-meta-history6.xml-p109905557p110218318.bz2 --full

To expand on this a bit more: we saw the same error and stack trace on a slightly different page range, but with identical symptoms (Logstash link: https://logstash.wikimedia.org/goto/62b164dd91e2763a0a402d02087be836). Running the job hangs at the same point every time, even if nothing else is happening on the host; there aren't a particularly large number of revisions for the problem page, and their size isn't very large either. As before, using bz2 prefetch files permits the job to run to completion.

@Milimetric I've left things as they are for now; get ahold of me and we'll do the file verification and cleanup together, just to wrap this up.

Quick recap of cleanup:

  • Verified the output from the command above with dumplastbz2block; all looked good, with a closing tag and a good header.
  • Killed the process that was stuck on snapshot1013 and removed the inprog file that it was writing, then replaced it with the output of the manual run above.

We could make sure that for commonswiki, the config setting "sevenzipprefetch" is 0. I'll need to check that this is one of the settings that can be overridden, and that the code will recognize 0 as a 'false' value. This should get done before next month's full run.
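
If the override turns out to be supported, the change would presumably be a small config stanza along these lines. This is a hypothetical sketch only: the real file name, section layout, and whether 0 is parsed as false all still need to be checked.

# hypothetical per-wiki override; exact file and section name to be confirmed
[commonswiki]
sevenzipprefetch=0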