HHVM segfault in memory cleanup
Closed, Resolved (Public)

Description

mw1261-mw1265 are running HHVM 3.18.2 now. They're mostly working fine, but mw1265 segfaulted with what looks like a race condition in memory cleanup. I've reported this upstream at https://github.com/facebook/hhvm/issues/7779

Not sure if it's new in 3.18. We might just as well have seen it with 3.12 already and never really diagnosed it.

Event Timeline

Status update: Reedy has narrowed this down to a single reproducer from the phpunit tests, and one of the HHVM developers said he'd look into a fix soon.

@Reedy's investigation at https://github.com/facebook/hhvm/issues/7779 shows an issue within XmlReader. That seems very close to T156923#2992912, which hit us with HHVM 3.12.11. By that I mean the stack traces look alike.

The 3.12.11 issue was fixed when you rebuilt the package and removed a patch:

I backed out the bzip2-segfault-sweep.patch introduced in 3.12.11+dfsg-1 and that fixes it. Will upload new packages to apt.wikimedia.org tomorrow.

That might be related, but I'm not fully convinced. The patch we dropped from 3.12.11 was a backport from trunk. It might be that this feature was broken to begin with and no one except us uses it or noticed, but let's wait to see what the HHVM developers have to say. Since we now have a reproducible test case, I hope that leads to a proper fix soon.

Mentioned in SAL (#wikimedia-operations) [2017-05-15T13:27:45Z] <moritzm> uploaded HHVM 3.18.2+dfsg-1+wmf3 to apt.wikimedia.org (addresses segfault in XML reader (T162586, T165074)

This is fixed in 3.18.2+dfsg-1+wmf3. So far this has only been reproduced with the test case from the test suite. I'll keep this bug open until the fix is fully rolled out to the systems currently using 3.18 (and confirmed to no longer crash in practice either).
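For checking the rollout, a minimal sketch of a version test might look like the following. It assumes the installed package version string has already been obtained somehow (the function name and constants are hypothetical, not part of any existing tooling) and it only recognises 3.18.2+dfsg-1 builds with a +wmfN suffix. It does not implement full Debian version ordering.

```python
import re

# Assumption: the fix first shipped in 3.18.2+dfsg-1+wmf3, as stated in this task.
FIXED_BASE = "3.18.2+dfsg-1"
FIXED_WMF_REVISION = 3


def has_xmlreader_fix(version: str) -> bool:
    """Return True if `version` is a 3.18.2+dfsg-1 build at +wmf3 or later.

    Hypothetical helper: any version string that does not match the
    expected "<base>+wmf<N>" shape is treated as not confirmed fixed.
    """
    match = re.fullmatch(r"(?P<base>.+)\+wmf(?P<rev>\d+)", version)
    if match is None:
        return False
    same_base = match.group("base") == FIXED_BASE
    return same_base and int(match.group("rev")) >= FIXED_WMF_REVISION
```

A host reporting 3.18.2+dfsg-1+wmf3 or +wmf4 would pass this check. Anything else, including the 3.12 packages, is reported as not confirmed and would need to be looked at by hand.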

Mentioned in SAL (#wikimedia-operations) [2017-05-23T12:09:26Z] <moritzm> uploaded hhvm 3.18.2+dfsg-1+wmf4 to apt.wikimedia.org (contains extended upstream fix for XML reader crash) (T162586)

I guess now we can switch CI from HHVM 3.12 to the latest 3.18 you have built? :-}

Yeah, that was already possible with +wmf3 (I ran the crashing test manually in Vagrant), but even more so with +wmf4.

This is resolved in 3.18.2+dfsg-1+wmf3 and 3.18.2+dfsg-1+wmf4. All the hosts migrated to 3.18 are running one of those versions, so closing.