
ExcimerProfiler with PHP 8.2 JIT causes occasional memory corruption
Open · Medium · Public · Bug Report

Description

Steps to replicate the issue (include links if applicable):

It seems that installing Excimer breaks PHP. I have no idea why or when exactly it happens (this error, along with other weird and more complex errors, is reported by Sentry), but uninstalling Excimer seems to fix the issues.

What happens?:

Occasionally, the following code

public function f()
{
    $first = 30;
    $gap = 20;

    $images = [];
    $template = '...';

    for ($i = 1 + $gap; $i <= $first * $gap; $i += $gap) {
        $images[] = $this->amazonUrlBuilder->get($this->createFilesystemPathFor($template, $i));
    }
}

private function createFilesystemPathFor(string $template, int $i): string
{
    return $template . sprintf('%03d', $i) . '.jpg';
}

fails with the following error:

Argument #2 ($i) must be of type int, null given

This seems insane: how can $i possibly be null here?

Other weird things that should never happen also occur.

This error is reported by Sentry; I have never experienced it while developing.

What should have happened instead?:

This error should not occur.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

PHP 8.2.22 (cli) (built: Aug 1 2024 22:11:28) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.2.22, Copyright (c) Zend Technologies
    with Zend OPcache v8.2.22, Copyright (c), by Zend Technologies

Other information (browser name/version, screenshots, etc.):

I suspect it could be caused by the JIT being enabled.
PHP runs as a CLI script inside Alpine Linux (Docker).
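To confirm whether the JIT is actually active in the CLI process inside the container, a small check along these lines could be run. It is a hypothetical helper (`describeJitState()` is not part of any library) built on `opcache_get_status()`; the `jit` sub-array with `on` and `buffer_size` keys is what PHP 8.x reports when built with JIT support, and the result should be treated as a sketch rather than a definitive diagnostic.

```php
<?php
// Hypothetical diagnostic: summarizes whether OPcache and its JIT are active
// in the current process (e.g. the long-running CLI worker).

/** Returns a short human-readable summary of the OPcache/JIT state. */
function describeJitState(): string
{
    // opcache_get_status() only exists when the OPcache extension is loaded.
    if (!function_exists('opcache_get_status')) {
        return 'OPcache extension not loaded';
    }

    // false: skip the per-script list, only the global status is needed.
    $status = opcache_get_status(false);
    if ($status === false || empty($status['opcache_enabled'])) {
        return 'OPcache loaded but not enabled here (check opcache.enable_cli)';
    }

    // The "jit" key is only present when PHP was built with JIT support.
    $jit = $status['jit'] ?? null;
    if (!is_array($jit)) {
        return 'OPcache enabled, no JIT information available';
    }

    return sprintf(
        'OPcache enabled, JIT %s (buffer size: %d bytes)',
        !empty($jit['on']) ? 'on' : 'off',
        (int) ($jit['buffer_size'] ?? 0)
    );
}

echo describeJitState(), "\n";
```

Running this with the same ini settings as the production worker matters: `opcache.enable_cli` is off by default, so a plain `php` invocation may report the JIT as inactive even if the worker uses it.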

Event Timeline

Restricted Application added a subscriber: Aklapper.
Warxcell renamed this task from "Excimer bugs php" to "Excimer bugs breaks PHP itself". Aug 14 2024, 6:49 PM
Warxcell renamed this task from "Excimer bugs breaks PHP itself" to "Excimer bugs PHP itself".
Warxcell updated the task description.

How are you using Excimer? This sort of thing is more likely to happen if it's actually called.

Possibly. Sentry uses it to trace and profile some of the requests (currently sampled at 5%). However, when I profiled requests manually, I didn't encounter this error.

@Warxcell OK thanks, I found the calling code.

Do you still have logs? Did you have a flood of errors in different requests like at T342304?

No, there were no errors like "timeout". These are the errors reported by Sentry (they disappeared after uninstalling Excimer):

Uncaught PHP Exception Twig\Error\RuntimeError: "The column filter only works with sequences/mappings or "Traversable", got "boolean" as first argument."

An exception has been thrown during the rendering of a template ("Illegal offset type").

MaxMind\Db\Reader\InvalidDatabaseException: The MaxMind DB file contains bad data

All of the errors are somehow related to a variable not having the expected type. They are not very frequent: about 1-2 errors per hour (roughly 2 errors per 10k requests).

By the way, I don't know if it's related, but PHP runs as a long-running process (it does NOT exit between requests).


I tried running MediaWiki on PHP git master (8.4-dev), compiled with AddressSanitizer, with Excimer profiling at a high rate (1 ms period). I quickly hit a dangling pointer bug which does seem to depend on Excimer, although Excimer never appears in the stack. I'm working on isolating it. Maybe it will turn out to be that simple.

A dangling pointer can cause PHP to randomly write to opcache shared memory, causing errors in subsequent requests, like literals with the wrong value, or garbled method names, or segfaults. Often you will see the same message over and over.

If the problem is limited to a single request, and mostly affects the arguments to functions, then it's more likely the VM stack was corrupted, like what we saw with PHP issue 11548.

I'm asking whether similar errors occur in subsequent requests because if there's opcache corruption, it's fairly difficult to isolate from the error messages, so the error messages don't really matter. But if the error messages are isolated to a single request then there is no opcache corruption and hence less distance between the cause and the error, so it is slightly more likely that the error messages will be useful. It's still not very likely.

The best chance of isolating the bug is if you can find a reproduction procedure.
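As a starting point for such a reproduction procedure, a stress harness could replay the loop from the report many times while ExcimerProfiler samples at a 1 ms period, mirroring the high-rate profiling used in the AddressSanitizer experiment. This is a hypothetical sketch: the `img-` template and the 10,000-round count are placeholders (the report elides the real template), and it only counts calls where `$i` arrives as a non-int and triggers a `TypeError`. Without the excimer extension it simply runs unprofiled as a baseline.

```php
<?php
// Hypothetical stress harness: replays the loop from the bug report while
// ExcimerProfiler interrupts it every 1 ms, counting TypeErrors like
// "Argument #2 ($i) must be of type int, null given".

/** Same path construction as createFilesystemPathFor() in the report. */
function createFilesystemPathFor(string $template, int $i): string
{
    return $template . sprintf('%03d', $i) . '.jpg';
}

/** Runs the reported loop $rounds times; returns the number of TypeErrors. */
function stressLoop(int $rounds): int
{
    $first = 30;
    $gap = 20;
    $template = 'img-'; // placeholder; the real template is elided in the report
    $anomalies = 0;

    for ($r = 0; $r < $rounds; $r++) {
        $paths = [];
        for ($i = 1 + $gap; $i <= $first * $gap; $i += $gap) {
            try {
                $paths[] = createFilesystemPathFor($template, $i);
            } catch (TypeError) {
                // Only reachable if $i was corrupted into a non-int value.
                $anomalies++;
            }
        }
    }

    return $anomalies;
}

$profiler = null;
if (extension_loaded('excimer')) {
    $profiler = new ExcimerProfiler();
    $profiler->setPeriod(0.001);           // sample every 1 ms
    $profiler->setEventType(EXCIMER_REAL); // wall-clock timer
    $profiler->start();
}

$anomalies = stressLoop(10000);

if ($profiler !== null) {
    $profiler->stop();
}

echo "anomalies: $anomalies\n";
```

To exercise the JIT as well, it could be run with something like `php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=64M -d opcache.jit=tracing harness.php` (the file name is arbitrary). A nonzero count would be a strong hint; zero proves little, given how rarely the error appears in production.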

PHP is big and changes often, and Excimer calls it in an odd way, so the most likely cause is a bug in PHP which is exposed by Excimer.

After spending a bit more than a day on this, I'm going to file a separate task for the bug I found, since it's affecting a specific line of code in MediaWiki and is relatively easy to reproduce.

Since the bug I found is not reproducible in 8.2 or 8.3, it's probably not the same as the one reported above. Further discussion on it will be on the PHP bug tracker at https://github.com/php/php-src/issues/15502

tstarling renamed this task from "Excimer bugs PHP itself" to "ExcimerProfiler with PHP 8.2 JIT causes occasional memory corruption". Aug 21 2024, 10:09 PM
tstarling triaged this task as Medium priority.

All Excimer segfault bugs are now suspected duplicates of T389734, i.e. glibc timer aliasing, glibc#32833.