
Spike of impossible "Cannot declare class" fatal errors (opcache)
Open, Needs Triage, Public, PRODUCTION ERROR

Description

Error message
Fatal error:
Cannot declare class Wikimedia\MWConfig\XWikimediaDebug, because the name is already in use
Stack Trace
#0 /srv/mediawiki/wmf-config/profiler.php(12): require_once()
#1 /srv/mediawiki/wmf-config/PhpAutoPrepend.php(24): require_once()
Notes

A spike of over 7000 fatals in the span of one hour, on the mw1379 server.

It lasted approximately from 2020-06-01 14:00 to 14:50 UTC.

From Logstash mediawiki-errors:

Screenshot 2020-06-02 at 02.00.58.png (598×1 px, 62 KB)

Impact

All web requests on this server were presumably aborted with a system error, regardless of wiki, user, title, or action.

Event Timeline

Krinkle reopened this task as Open. Edited Jun 30 2022, 8:27 PM
Krinkle added subscribers: thcipriani, Arlolra.
@Arlolra wrote at T311731:
Cannot declare class Wikimedia\MWConfig\Profiler, because the name is already in use in Profiler.php

#0 /srv/mediawiki/wmf-config/PhpAutoPrepend.php(30): require_once()

Notes:

  • The exception happens rarely: 22 times in the past 90 days
  • Always on parsoid servers (wtp*.eqiad.wmnet)
  • Only happened on wikipedia wikis and once on commons (over the past 90 days)
  • Always with the pagebundle endpoint; e.g., /w/rest.php/<wikipedia>/v3/page/pagebundle/<etc>
Krinkle renamed this task from Spike of fatal error "Cannot declare class Wikimedia\MWConfig" on mw1379 (2020-06-01) to Brief spikes of fatals "Cannot declare class". Jun 30 2022, 8:29 PM

It's not an opcache revalidation issue afaik, for two reasons:

  1. We've disabled opcache's live revalidation mode as of last week (finally!).
  2. The Grafana host graphs indicate there was no reset, memory-full condition, or other notable event around this time.

https://grafana.wikimedia.org/d/000000550/mediawiki-application-servers?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=parsoid&var-node=wtp1044&from=1656585961300&to=1656619500500

Screenshot 2022-06-30 at 13.12.53.png (1×1 px, 286 KB)
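
The two points above (revalidation disabled, no reset or memory-full event) can also be cross-checked from PHP on the host itself. The following is a minimal sketch, not something taken from wmf-config: `summarizeOpcache()` is a hypothetical helper name, and it only reads fields that `opcache_get_status()` and the `opcache.validate_timestamps` ini setting expose.

```php
<?php
// Hypothetical helper (not part of wmf-config): condense the opcache facts
// relevant to this task into one small array.
function summarizeOpcache( $status, $validateTimestamps ): array {
	// opcache_get_status() returns false when opcache is unavailable.
	if ( !is_array( $status ) ) {
		return [ 'enabled' => false ];
	}
	$stats = $status['opcache_statistics'] ?? [];
	return [
		'enabled' => (bool)( $status['opcache_enabled'] ?? false ),
		// Whether opcache re-checks file timestamps ("live revalidation").
		'revalidation' => (bool)$validateTimestamps,
		'cache_full' => (bool)( $status['cache_full'] ?? false ),
		// Restart counters: out-of-memory, hash table full, manual reset.
		'oom_restarts' => (int)( $stats['oom_restarts'] ?? 0 ),
		'hash_restarts' => (int)( $stats['hash_restarts'] ?? 0 ),
		'manual_restarts' => (int)( $stats['manual_restarts'] ?? 0 ),
		'last_restart_time' => (int)( $stats['last_restart_time'] ?? 0 ),
	];
}

// Pass false to skip the per-script listing; guard for builds without opcache.
$status = function_exists( 'opcache_get_status' ) ? opcache_get_status( false ) : false;
print_r( summarizeOpcache( $status, ini_get( 'opcache.validate_timestamps' ) ) );
```

Run under the web SAPI (opcache is usually off for the CLI), unchanged restart counters and a `last_restart_time` outside the incident window would agree with what the Grafana graphs show.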

Logstash entries for the affected server (wtp1044, Logstash query) show:

  • 12:19 RequestTimeoutException: The maximum execution time was exceeded
  • 13:42 RequestTimeoutException: The maximum execution time was exceeded
  • 14:23:51 RequestTimeoutException: The maximum execution time was exceeded
  • 14:23:55 PHP Fatal error: Cannot declare class Wikimedia\MWConfig\Profiler
  • 14:24:58 PHP Fatal error: Cannot declare class Wikimedia\MWConfig\Profiler
  • 14:24:01 PHP Fatal error: Cannot declare class Wikimedia\MWConfig\Profiler
  • 14:57 RequestTimeoutException: The maximum execution time was exceeded

Yet there are no relevant entries in the SAL around 14:20-14:30, and the graph above shows no change in opcache levels either.

This is similar to a number of other incidents we attributed to opcache over the past two years, in that they correspond neither to a scheduled opcache "anticipatory" reset by us, nor to opcache reaching a threshold and performing its own unattended reset. I believe we previously still assumed it was an opcache revalidation issue, and blamed it on a behaviour or storage model difference associated with the revalidation flag being enabled at all. But this is something we can now rule out.

On top of that, I should note that the reported prod error here is (much like T254209) of the most absurd category I can imagine.

The very first PHP file we execute is PhpAutoPrepend. There is nothing before it, and we do start cleanly, given we're not on php7.4 yet; even if we were, we don't use php-preload or other VM snapshots yet. The very first statement in PhpAutoPrepend is require_once Profiler.php. Putting aside the fact that require_once is something you can safely do from multiple places (it'll only do it "once"), we in fact only include this class from this file, and it's the first statement in the entire PHP process.

In Profiler.php, the first statement is to declare the Profiler class. And it's there that we immediately fatal. This is before mediawiki, before wmf-config, before multiversion. It's the first statement in the first file and it fatals with PHP claiming the class is already declared.
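
To make the expected behaviour concrete, here is a minimal sketch of why this fatal should not be reachable through require_once alone. It uses a throwaway file and a made-up `Demo\Profiler` class, not the real wmf-config code: the second require_once of the same file is skipped, so the class declaration only runs once.

```php
<?php
// Throwaway file whose first statement declares a class, loosely mirroring
// the shape of Profiler.php. The Demo\Profiler name is made up for this sketch.
$file = sys_get_temp_dir() . '/DemoProfiler_' . uniqid() . '.php';
file_put_contents( $file, "<?php\nnamespace Demo;\nclass Profiler {}\n" );

// First inclusion: the file is executed and the class gets declared.
$first = require_once $file;

// Second inclusion of the same path: PHP skips the file entirely,
// so the class declaration is not executed again.
$second = require_once $file;

// Check the class table directly, without triggering autoloading.
$declared = class_exists( 'Demo\Profiler', false );

unlink( $file );

var_dump( $first, $second, $declared );
```

The ordinary ways to hit "Cannot declare class" are a second plain `require` or `include` of the same file, or two different files declaring the same name. Neither applies to the first statement of the first file in a fresh process, which is what makes the reported error so puzzling.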

Krinkle renamed this task from Brief spikes of fatals "Cannot declare class" to Spike of impossible "Cannot declare class" fatal errors. Jun 30 2022, 8:30 PM
Krinkle renamed this task from Spike of impossible "Cannot declare class" fatal errors to Spike of impossible "Cannot declare class" fatal errors (opcache). Aug 25 2022, 3:55 PM