Page MenuHomePhabricator

PHP 7.2 garbage collector segfault
Open, Stalled, LowPublic

Description

We have a 100% reproducible segfault in php 7.2. This bug was fixed in php 7.3 and is apparently in an alpha for php 7.4, but does not appear to have been cherry-picked for php 7.2.

To reproduce:

$ git review -d 523715 # git hash 32a72da6710f570b3815ca572baf9709b8f4ae40
$ tools/gen-json-blacklist.js
$ git review -d 523330 # git hash 10e616658817d7915318369a06b4116def8125cc
$ php bin/parserTests.php

Notes:

  • steps 1 and 2 might be optional, i.e. blacklist may not be needed for the segv)
  • you may have to ensure that the environment variable MW_INSTALL_PATH is *not* set
  • reproducible as of PS 5 of gerrit 523330

segfault occurs on php 7.2.19, does not occur in 7.3.4-2.

git bisect narrows down the responsible patch to:
https://github.com/php/php-src/commit/3b5b64ce75b00a00a256f1a59655a0830d071036

github indicates this patch is present in php-7.4.0alpha3 (and subsequent 7.4 tags) and php-7.3.0RC1 (and subsequent 7.3 tags). But it is not present in PHP 7.2.

We should decide whether to ask PHP's maintainers to cherry-pick this back to the PHP 7.2 branch, or else bump to 7.3 in production. It's probably unwise to run PHP 7.2 with a known garbage collector bug and reproducible segfault in production.

Details

Related Gerrit Patches:

Event Timeline

cscott created this task.Jul 17 2019, 10:05 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 17 2019, 10:05 PM
cscott added a subscriber: Stas.Jul 17 2019, 10:06 PM

For the record, the segfault is:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555c6f07c in zend_mm_alloc_small (bin_num=6, size=56, 
    heap=0x7ffff4e00040)
    at /home/cananian/Projects/Wikimedia/php-src/Zend/zend_alloc.c:1275
1275                        heap->free_slot[bin_num] = p->next_free_slot;
ssastry updated the task description. (Show Details)Jul 17 2019, 10:09 PM
ssastry added a project: Parsoid-PHP.
ssastry moved this task from Backlog to Deployment on the Parsoid-PHP board.
ssastry added subscribers: Core Platform Team, Joe, tstarling.
ssastry removed a subscriber: Parsing-Team.

FYI: This bug doesn't show up with PS6 of gerrit 523715. So, you need PS5 of that patch.

We build our own PHP 7.2 packages already, so we can just cherry-pick that patch ourselves.

Legoktm edited subscribers, added: Smalyshev; removed: Stas.Jul 17 2019, 11:38 PM

Worth keeping it on the radar, and/or checking w/ upstream to see why they didn't cherry pick this to 7.2 (maybe there's a nonobvious conflict). Also probably worth some investigation on the PHP side, maybe they've already got a minimized test case which is smaller than running all of parsoid. Anyway, opened the phab task to hand this off to platform.

(And I included git hashes of the specific patches I used above; as subbu notes we're actively developing those patches still, so if/when you try to reproduce double-check the hashes.)

Do we have some small piece of code that we can demo this problem? Absent that, do I assume correctly that the checkouts in the bug description are from parsoid repo? Could somebody post the link to the repo so I could check it out?

Yes, sorry, from the Parsoid repo. Try:

git clone "https://gerrit.wikimedia.org/r/mediawiki/services/parsoid"
cd parsoid
git fetch origin 32a72da6710f570b3815ca572baf9709b8f4ae40
git checkout FETCH_HEAD
tools/gen-json-blacklist.js 
git fetch origin 10e616658817d7915318369a06b4116def8125cc
git checkout FETCH_HEAD 
composer install
MW_INSTALL_PATH= php bin/parserTests.php

Do we have some small piece of code that we can demo this problem?

Unfortunately not.

Absent that, do I assume correctly that the checkouts in the bug description are from parsoid repo? Could somebody post the link to the repo so I could check it out?

Yes. https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/523330/8 manifests the problem again (PS5 had it, PS6 and PS7 didn't, PS8 - the link I gave you - does).

We build our own PHP 7.2 packages already, so we can just cherry-pick that patch ourselves.

We in fact build custom PHP 7.2 packages (so we can fold in the patch combined with an update to 7.2.20), but it would still be good to work towards getting the patch into a 7.2.x release as a) every other use of 7.2 will benefit from that and b) it will reduce overhead the subsequent time we rebase to a new 7.2.x release.

Just to be clear - we verified that applying https://github.com/php/php-src/commit/3b5b64ce75b00a00a256f1a59655a0830d071036 to 7.2 tree fixes the issue on 7.2? Do we have any other patches in private 7.2 tree that might be influencing it? Because the patch author says it shouldn't do anything for 7.2 so I wonder if it may be something else going on?

cscott added a comment.Aug 1 2019, 5:05 PM

@Smalyshev No, I did not verify that this patch fixes 7.2. I verified that the bug exists in the most recent release of PHP 7.2, and in releases of PHP 7.3 before php-7.3.0RC1, and releases of PHP 7.4 before php-7.4.0alpha3. That is, this patch fixed the segfault in the 7.3 and 7.4 branches.

I don't have a known fix for PHP 7.2. (That said, I haven't tried cherry-picking this patch to 7.2 either.)

I was able to reproduce this locally. As nikic says on GitHub, the patch cannot be meaningfully applied to 7.2. I suspect it's a double-free or use-after-free, and the exact consequences of that vary from version to version. With USE_ZEND_ALLOC=0, it aborts on exit with "corrupted double-linked list", which is a pretty sure sign that the allocator is not the root cause. I'll try to isolate it further.

Running under valgrind with gdbserver shows a crash in json_encode() while serializing a corrupt array that came from the DataBag. The backtrace indicates DOMDataUtils::storeDataAttribs(). So the DataBag was probably accessed after free due to a bug in the DOM extension. I patched Parsoid to store DOM-associated data in an SplObjectStorage instead, and there was no crash.

cscott added a comment.Aug 2 2019, 9:46 PM

...or we could stop using the DOM extension, I suppose (T217867). I'm not a fan of the quality of that code....

Running under valgrind with gdbserver shows a crash in json_encode() while serializing a corrupt array that came from the DataBag. The backtrace indicates DOMDataUtils::storeDataAttribs(). So the DataBag was probably accessed after free due to a bug in the DOM extension. I patched Parsoid to store DOM-associated data in an SplObjectStorage instead, and there was no crash.

Can you say more about this? We added a live reference to the created document from Env.php to ensure the php-wrapper dom and nodes stay live after creation. So, as long as an env object is live, all associated doms should be live as well and there shouldn't be a use-after-free. But, maybe that is why you are saying this maybe a bug in the DOM extension.

I'm using AddressSanitizer now, instead of valgrind. To integrate AddressSanitizer with PHP, the environment variable USE_ZEND_ALLOC=0 must be set. Compile with CFLAGS='-fsanitize=address' LIBS='-lasan -ldl'. Garbage collection is part of the bug, and adding gc_collect_cycles() before and after the call to runTestInMode() helps to trigger it earlier.

I still haven't gotten to the bottom of it, but I've discovered that casting a compile-time constant array to an object is essential to the crash. DataBag has such a cast in its constructor. The usual crash involves freeing that constant array and then later using it again. If I make the array in DataBag not be a constant, it crashes instead after freeing the constant array created by PipelineUtils:wrapAccum(). In both cases, the array is freed by the garbage collector.

The way DOMDocument handles custom properties seems pretty standard and may not be part of the bug.

Change 527952 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/services/parsoid@master] Workaround for PHP bug involving constant arrays cast to objects

https://gerrit.wikimedia.org/r/527952

... adding gc_collect_cycles() before and after the call to runTestInMode() helps to trigger it earlier.

Hmm .. with PHP 7.2.19-0ubuntu0.18.04.1 (cli) (built: Jun 4 2019 14:48:12) ( NTS ), I cannot reproduce this on Parsoid master branch (parser tests on the master branch have stopped crashing for a while now) even with these calls. Not sure what that means.

I still haven't gotten to the bottom of it, but I've discovered that casting a compile-time constant array to an object is essential to the crash. DataBag has such a cast in its constructor. The usual crash involves freeing that constant array and then later using it again. If I make the array in DataBag not be a constant, it crashes instead after freeing the constant array created by PipelineUtils:wrapAccum(). In both cases, the array is freed by the garbage collector.
The way DOMDocument handles custom properties seems pretty standard and may not be part of the bug.

I see .. we probably have other instances besides the two you flagged. We should probably fix all those instances if we want to be safe.

I isolated it and made a small test case. Submitted https://bugs.php.net/bug.php?id=78379 .

Change 527952 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Workaround for PHP bug involving constant arrays cast to objects

https://gerrit.wikimedia.org/r/527952

Krinkle moved this task from Untriaged to Other on the PHP 7.2 support board.Sep 6 2019, 12:24 AM

7.2.22 has two commits referencing https://bugs.php.net/bug.php?id=78379

358379be22c4e20f4942737e0e90422977355c63 - Fixed bug #78379 (Cast to object confuses GC, causes crash)
6b1cc1252e73e51e53194c8c65e3d2302bc83dca - Fixed second part of the bug #78379 (Cast to object confuses GC, causes crash)

Given we've now upgraded to 7.2.22 in prod, is this now Resolved?

Given we've now upgraded to 7.2.22 in prod, is this now Resolved?

Parsoid/PHP is using a workaround to avoid this GC bug. Not sure how MediaWiki core is affected.

Given we've now upgraded to 7.2.22 in prod, is this now Resolved?

Parsoid/PHP is using a workaround to avoid this GC bug. Not sure how MediaWiki core is affected.

I suppose someone could check out that old gerrit patch that triggered this bug consistently and see if php 7.2.22 fixes that. My laptop PHP (via ubuntu distros) is still at 7.2.19 so perhaps this has to be done on some test server.

I am going to untag Parsoid-PHP since this is a general PHP7 issue and not specific to Parsoid. We have a workaround for this in Parsoid-PHP.

Then which code base is affected by the issue in this task? MediaWiki core?

cscott added a comment.Nov 5 2019, 7:23 PM

No one has audited core or extensions for this particular usage pattern. So we assume there is at least *some* code in core/extensions which would be affected.

Parsoid used to use the problematic pattern extensively, but we added a (pretty ugly) workaround.

@tstarling did fill https://bugs.php.net/bug.php?id=78379 and some commits are in PHP 7.2.22 (see my previous comment T228346#5491809).

The upstream bug mentions that it may be okay for php 7.2.

If Parsoid has a reliable way to trigger the fault, it would be nice to check whether our 7.2.22 is still affected. If not I guess we can close our task.

(Test against our 7.2.24, please; we're just upgrading.)

JFTR, the mwdebug* servers are running 7.2.24 and can be used for additional tests.

Joe added a comment.Nov 6 2019, 10:26 AM

JFTR, the mwdebug* servers are running 7.2.24 and can be used for additional tests.

I just ran @tstarling's repro script on mwdebug2001 (php 7.2.24) and mw1278 (php 7.2.22) and in both cases we get the expected output. So the problem was indeed fixed in 7.2.22

mwdebug2001:~$ php test_T228346.php
----------------
object(C)#2 (1) {
  ["p"]=>
  object(stdClass)#1 (1) {
    ["x"]=>
    array(0) {
    }
  }
}

Thank you @Joe. From CI containers perspective adding a var_dump( 'PHP_VERSION' ); to Tim script:

$ cat fail.php|docker run --entrypoint=php --rm -i docker-registry.wikimedia.org/releng/php72:0.2.2-s1
string(53) "7.2.16-1+0~20190307202415.17+stretch~1.gbpa7be82+wmf1"
object(C)#2 (1) {
  ["p"]=>
  object(stdClass)#1 (1) {
    ["x"]=>
    array(1) {
      ["p"]=>
      *RECURSION*
    }
  }
}
$ cat fail.php|docker run --entrypoint=php --rm -i docker-registry.wikimedia.org/releng/php72:0.2.2-s2
string(47) "7.2.22-1+0~20190902.26+debian9~1.gbpd64eb7+wmf1"
object(C)#2 (1) {
  ["p"]=>
  object(stdClass)#1 (1) {
    ["x"]=>
    array(0) {
    }
  }
}

@cscott @ssastry maybe you can get a follow up task to remember to cleanup the workaround you had? Though you probably want to avoid crashing on pre 7.2.22 releases and upstream said that 7.4 might still be affected.

Else I guess this task can be closed with the resolution being PHP 7.2.22.

Jdforrester-WMF closed this task as Resolved.Nov 6 2019, 1:56 PM
Jdforrester-WMF claimed this task.

Else I guess this task can be closed with the resolution being PHP 7.2.22.

In that case, should we bump the minimum version the MW requires to 7.2.22? Would be good to decide this before 1.34.0 final ships.

Jdforrester-WMF reopened this task as Open.Nov 6 2019, 1:57 PM

Bah, mis-click.

Reedy added a subscriber: Reedy.Nov 6 2019, 4:55 PM

Else I guess this task can be closed with the resolution being PHP 7.2.22.

In that case, should we bump the minimum version the MW requires to 7.2.22? Would be good to decide this before 1.34.0 final ships.

Indeed. 19.04 has 7.2.24, Debian stable has 7.3, and oldstable has 7.0 so we don't really care about it

Else I guess this task can be closed with the resolution being PHP 7.2.22.

In that case, should we bump the minimum version the MW requires to 7.2.22? Would be good to decide this before 1.34.0 final ships.

I don't think that is strictly needed. After all the bug has not emerged outside of some code path in Parsoid, ie none of the MediaWiki core code we run has been affected.

I guess we can stick the requirement to 7.2 and maybe emphasis that the latest of 7.2 patch series is recommended.

In that case we'll want to bump to 7.2.22 in 1.35 when Parsoid lands in core? Hope we remember. ;-)

Krinkle added a subscriber: Krinkle.

OK. Moving to the next milestone then. In the interim, Parsoid's own composer.json or extension.json or run-time could assert to warn third-party users of the PHP port.

Jdforrester-WMF changed the task status from Open to Stalled.Nov 15 2019, 4:05 PM
Reedy added a comment.Nov 15 2019, 8:31 PM

In that case we'll want to bump to 7.2.22 in 1.35 when Parsoid lands in core? Hope we remember. ;-)

Worth making another task blocked by that? ;)

Jdforrester-WMF triaged this task as Low priority.Wed, Jan 15, 6:29 PM
Krinkle removed a subscriber: Krinkle.Wed, Jan 15, 6:38 PM