
Flow (current and/or history) dumps failed on various wikis with php exhausting allowed memory
Closed, ResolvedPublic

Description

Wikis impacted so far: mediawikiwiki, wikidatawiki, arwiki, frwiki, zhwiki

Sample exceptions from log:

Fatal error: Allowed memory size of 698351616 bytes exhausted (tried to allocate 32768 bytes) in /srv/mediawiki/php-1.38.0-wmf.20/includes/Storage/SqlBlobStore.php on line 592
Error from command(s): /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php extensions/Flow/maintenance/dumpBackup.php --wiki=mediawikiwiki --current --report=1000 --output=bzip2:/mnt/dumpsdata/xmldatadumps/public/mediawikiwiki/20220201/mediawikiwiki-20220201-flow.xml.bz2.inprog
2022-02-02 16:53:32: mediawikiwiki *** exception! error dumping flow page files
2022-02-02 16:53:32: mediawikiwiki ['Traceback (most recent call last):\n', '  File "/srv/deployment/dumps/dumps-cache/revs/8820784abc20f1902490ec23bcf8543cf5954c83/xmldumps-backup/dumps/runner.py", line 454, in do_run_item\n    item.dump(self)\n', '  File "/srv/deployment/dumps/dumps-cache/revs/8820784abc20f1902490ec23bcf8543cf5954c83/xmldumps-backup/dumps/jobs.py", line 183, in dump\n    done = self.run(runner)\n', '  File "/srv/deployment/dumps/dumps-cache/revs/8820784abc20f1902490ec23bcf8543cf5954c83/xmldumps-backup/dumps/flowjob.py", line 68, in run\n    raise BackupError("error dumping flow page files")\n', 'dumps.exceptions.BackupError: error dumping flow page files\n']
Fatal error: Allowed memory size of 698351616 bytes exhausted (tried to allocate 9437184 bytes) in /srv/mediawiki/php-1.38.0-wmf.20/extensions/Flow/includes/RevisionActionPermissions.php on line 87
Error from command(s): /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php extensions/Flow/maintenance/dumpBackup.php --wiki=mediawikiwiki --current --report=1000 --output=bzip2:/mnt/dumpsdata/xmldatadumps/public/mediawikiwiki/20220201/mediawikiwiki-20220201-flowhistory.xml.bz2.inprog --full
2022-02-02 17:30:17: mediawikiwiki *** exception! error dumping flow page files
2022-02-02 17:30:17: mediawikiwiki ['Traceback (most recent call last):\n', '  File "/srv/deployment/dumps/dumps-cache/revs/8820784abc20f1902490ec23bcf8543cf5954c83/xmldumps-backup/dumps/runner.py", line 454, in do_run_item\n    item.dump(self)\n', '  File "/srv/deployment/dumps/dumps-cache/revs/8820784abc20f1902490ec23bcf8543cf5954c83/xmldumps-backup/dumps/jobs.py", line 183, in dump\n    done = self.run(runner)\n', '  File "/srv/deployment/dumps/dumps-cache/revs/8820784abc20f1902490ec23bcf8543cf5954c83/xmldumps-backup/dumps/flowjob.py", line 68, in run\n    raise BackupError("error dumping flow page files")\n', 'dumps.exceptions.BackupError: error dumping flow page files\n']

Event Timeline

ArielGlenn created this task.

Could be caused by T300667, depending on how many title objects the dumper creates.

I've been doing some testing on deployment-prep to see if I can reproduce the issue or find a workaround; I'll post an update as soon as there's a concrete result.

I could not dump ranges of boards because Flow's maintenance script, dumpBackup.php, wrongly casts the boardstart and boardend options to int, turning board ids into 0s. Removing those casts live on the testbed host makes dumping ranges possible, so now we are in business. First I'll see if we can get a dump out by dumping all the boards in smaller ranges and then adding a header and footer onto them. If I can make that work for current revs, I'll try it again for history and post them, so that the files are at least available. The underlying issue will still need to be addressed.
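For reference, this is what goes wrong with the int cast: Flow board ids are base-36 strings, and a C-style string-to-int cast in PHP stops at the first non-digit character, so ids that start with a letter collapse to 0. A minimal Python sketch (the ids below are made up for illustration) of treating the options as base-36 values instead:

```python
def to_int(board_id: str) -> int:
    """Interpret a base-36 Flow board id as an integer for range comparison."""
    return int(board_id, 36)

# PHP's (int) cast on a string like "s6kz3a5mbcfmvpbw" yields 0, so every
# --boardstart/--boardend range collapsed to (0, 0).  Comparing the ids as
# base-36 numbers keeps the range selection meaningful:
start = "s000000000000000"   # hypothetical range start
end   = "szzzzzzzzzzzzzzz"   # hypothetical range end
board = "s6kz3a5mbcfmvpbw"   # hypothetical board id
assert to_int(start) <= to_int(board) <= to_int(end)
```

For ids of equal length, plain lexicographic string comparison would also order them correctly, which is why passing the options through as strings is enough.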

Change 759754 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[mediawiki/extensions/Flow@master] boardstart/end options to dumpBackup will be base 36 strings, not ints

https://gerrit.wikimedia.org/r/759754

When the above is deployed, I'll be able to work around the issue properly; until then I must rely on manually editing the file in place so that I can dump ranges of workboards.

Wikidata flow history dumps have failed for the same reason.

I have added manually generated flow and flow-history dumps for mediawikiwiki to the internal dump run directory; I then forced a run manually on all remaining jobs, of which there was only one. The updated files should show up on the public-facing servers in a few hours.

I have similarly generated the flow history dump for wikidatawiki but have not put it into place; I'll do that when the rest of the run concludes, in some days. The file is in /mnt/dumpsdata/temp/dumpsgen/flow/wikidatawiki-20220201-flowhistory.xml.bz2 on dumpsdata1003 just for reference.

Could be caused by T300667, depending on how many title objects the dumper creates.

We've seen two wikis with this problem that were both fine in the previous month's flow-history run, and while it's possible that both of them just crossed some threshold in the number of Titles generated, I feel it's more likely that some change deployed since Jan 1, and more likely since Jan 20, is responsible. Off to hunt through the backlog...

ArielGlenn renamed this task from mediawikiwiki Flow (current and history) dumps failed with php exhausting allowed memory to Flow (current and/or history) dumps failed on various wikis with php exhausting allowed memory.Feb 10 2022, 6:52 AM
ArielGlenn updated the task description. (Show Details)

Change 761574 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[operations/puppet@production] fix up flow dumps config for deployment-prep cluster

https://gerrit.wikimedia.org/r/761574

Change 761574 merged by ArielGlenn:

[operations/puppet@production] fix up flow dumps config for deployment-prep cluster

https://gerrit.wikimedia.org/r/761574

I have a workaround script, which still needs testing. It relies on page ranges, which is rather inefficient because it increases the number of queries, but we can't do UUID ranges in an automated fashion. This still won't address the underlying cause, but it may allow future flow dumps to proceed [citation needed].
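The range-based approach can be sketched roughly like this (this is an illustration, not the actual script; the batch size and the command template are assumptions):

```python
# Hypothetical sketch of the page-range workaround: run Flow's dumpBackup.php
# over small page-id windows instead of the whole wiki at once, so that each
# PHP process touches fewer boards and stays under the memory limit.
def page_ranges(max_page_id: int, batch: int):
    """Yield (start, end) page-id windows covering 1..max_page_id."""
    start = 1
    while start <= max_page_id:
        yield start, min(start + batch - 1, max_page_id)
        start += batch

# One dumpBackup.php invocation per window; --end here is exclusive, matching
# the sample command later in this task (--start=8284195 --end 8284196 for
# a single page).
commands = [
    f"mwscript extensions/Flow/maintenance/dumpBackup.php --wiki=frwiki "
    f"--current --skip-header --skip-footer --start={s} --end={e + 1}"
    for s, e in page_ranges(100, 40)
]
```

Each invocation produces a headerless, footerless piece; the pieces then have to be stitched back together into one file.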

Change 762127 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[operations/dumps@master] do flow dumps in multiple pieces and concat them together

https://gerrit.wikimedia.org/r/762127
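A minimal sketch of the concatenation idea (not the actual operations/dumps code): pieces produced with --skip-header/--skip-footer are each a complete bz2 stream, and bz2 streams concatenated byte-wise form a valid multi-stream .bz2 that standard decompressors read as one file, so no recompression is needed.

```python
import bz2

def concat_bz2(header: bytes, pieces: list[bytes], footer: bytes) -> bytes:
    """Build one multi-stream .bz2 from a header, body pieces, and a footer."""
    out = bz2.compress(header)
    for piece in pieces:
        out += bz2.compress(piece)  # each piece becomes its own bz2 stream
    out += bz2.compress(footer)
    return out

# Decompressing the concatenation yields the full document in order.
blob = concat_bz2(b"<mediawiki>\n",
                  [b"<page>1</page>\n", b"<page>2</page>\n"],
                  b"</mediawiki>\n")
assert bz2.decompress(blob) == b"<mediawiki>\n<page>1</page>\n<page>2</page>\n</mediawiki>\n"
```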

The good news is that the above patch, clunky as it is, allowed me to rerun the flow dumps for arwiki and zhwiki with no problems; I'll copy those into place and run noops for those wikis soon.

The bad news is that frwiki flow history dumps fail on one single page, and there's no way to split that up into smaller units to dump, unfortunately. Here's the command that fails, and the output:

dumpsgen@snapshot1009:/srv/deployment/dumps/dumps/xmldumps-backup$ /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php extensions/Flow/maintenance/dumpBackup.php --wiki=frwiki --output=file:/mnt/dumpsdata/xmldatadumps/temp/f/frwiki/frwiki-20220201-flowhistory.xml.bz2.inprog_tmp --current --report=1000 --full --skip-header --start=8284195 --skip-footer --end 8284196
Fatal error: Allowed memory size of 698351616 bytes exhausted (tried to allocate 33554440 bytes) in /srv/mediawiki/php-1.38.0-wmf.21/extensions/Flow/includes/Repository/UserNameBatch.php on line 75

One single Flow board using more than 700M. Sigh. See
https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Forum_des_nouveaux
which has more than 38k posts in it.

OK so first off, I do not seem to be able to override the memory limit using the --memory-limit option to the maintenance script. I can lower the limit but I cannot increase it past the 700M that is given to $wmgMemoryLimit in InitialiseSettings.

But as I read maintenance/includes/Maintenance.php, and specifically finalSetup(), the adjustMemoryLimit() call in there should set the limit to 'max' (no limit) if I pass nothing (see https://gerrit.wikimedia.org/g/mediawiki/core/+/b1aac3dd9b95a04e227cb80855cce0a1f6a05ae8/maintenance/includes/Maintenance.php#764 ). Clearly this isn't happening; it's picking up the 700M setting instead, and I wonder why that is.

I have manually added the line

ini_set( 'memory_limit', '-1' );

to the execute() method of Flow's dumpBackup.php, and it now runs to completion, with a grand total of 55k posts (!) in that board. This is not wonderful for various reasons, but at least it lets me generate the flow history file for frwiki, which I'm going to do right now.

Change 759754 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] boardstart/end options to dumpBackup will be base 36 strings, not ints

https://gerrit.wikimedia.org/r/759754

On a local install with branches 17, 18, 19, 20, the correct memory settings are seen in Flow every time. In deployment-prep against php-master, we get the 700M cutoff, as verified by printing out the setting at the top of Flow's dumpBackup::execute(). I guess it's in the configs someplace, since my local install doesn't use the production wmfconfig.

After a couple more prints in deployment-prep, I see that adjustMemoryLimit() in the Maintenance class is called first (presumably in Setup()), then wfMemoryLimit() (also in Setup(), I suppose), and then the second call to adjustMemoryLimit() that should happen in finalSetup() apparently never happens. In fact, the call to Maintenance::finalSetup() appears never to happen at all.

And that leads me to this commit which I would guess is the culprit: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/757469

Change 762422 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[mediawiki/extensions/Flow@master] force removal of memory limit for dump maintenance script

https://gerrit.wikimedia.org/r/762422

Tagging @Pchelolo for the finalSetup() issue. The above one-line patch works around it for now; it should go in and be deployed before the 20th of the month, in time for the next dump run, or else the finalSetup() issue should be resolved and that fix deployed. I'm not sure what the right fix is there; can one just call the parent, or are there subtleties involving the SettingsBuilder that we have to account for?

@ArielGlenn could you please confirm if https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-deploy-2022.02.14?id=uhNt-H4Biq-bA-EUP2u_ is related to the errors described here? I see it's generated on a snapshot host, and looks similar, but has a different stack trace.

	from /srv/mediawiki/php-1.38.0-wmf.21/extensions/Flow/includes/Repository/UserNameBatch.php(75)
#0 [internal function]: MWExceptionHandler::handleFatalError()
#1 {main}

If it's not related, I can open a new task.

Same exact thing indeed. Thanks for checking though!

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/762448 merged by Pchelko (thanks!), looking good in deployment-prep for Flow and for all the other dump jobs too.

Change 762422 abandoned by ArielGlenn:

[mediawiki/extensions/Flow@master] force removal of memory limit for dump maintenance script

Reason:

superseded by I4295ef31909bd0512870c960704971002b5204c5

https://gerrit.wikimedia.org/r/762422

I have put in place the flow dumps files for frwiki and wikidatawiki and the noop jobs are running now. Everything should show up sometime tomorrow on the public servers, fingers crossed.

Change 762127 merged by jenkins-bot:

[operations/dumps@master] do flow dumps in multiple pieces and concat them together

https://gerrit.wikimedia.org/r/762127

Flow jobs are running properly this run, so I can close this out. At some point it would be nice to look at the actual memory issues in the Flow extension, but since no one owns that...