Page MenuHomePhabricator

[Skins] Uncaught Error: Call to a member function isLocal() on boolean
Closed, DeclinedPublic

Description

Today during chores, I noticed a few of these errors:

PHP Fatal error:  Uncaught Error: Call to a member function isLocal() on boolean in /srv/mediawiki/php-1.35.0-wmf.28/includes/page/WikiFilePage.php:125
Stack trace:
#0 /srv/mediawiki/php-1.35.0-wmf.28/includes/skins/SkinTemplate.php(995): WikiFilePage->isLocal()
#1 /srv/mediawiki/php-1.35.0-wmf.28/includes/skins/SkinTemplate.php(459): SkinTemplate->buildContentNavigationUrls()
#2 /srv/mediawiki/php-1.35.0-wmf.28/skins/MinervaNeue/includes/skins/SkinMinerva.php(143): SkinTemplate->prepareQuickTemplate()

https://logstash.wikimedia.org/goto/6ad426cb291c181870b6f992224fffda

logstash.png (1×2 px, 426 KB)

Acceptance Criteria

  • Error does not appear in Logstash

Developer notes

  • There are about 6 reports on the 20th April. Isolated incident?
  • Update: Several more reports on May 2

Event Timeline

@nray It looks like these were captured from the internal syslog channel which is lacking a lot of valueable information you'd need to determine how this is impacting users and how to reproduce it (e.g. which wikis, which URLs, HTTP method, referer, etc.).

The link to Logstash doesn't work unfortunately as it's not the sharable link but your personal session link. I do note from the screenshot that it appears to be based on a permalink to an outdated version of the Reading Web dashboard. The latest uses type:mediawiki instead which has all the added information.

Let me know if you'd like a refresher session with myself and/or CPT on Logstash. A lot has changed in recent years. Happy to help :)

Jdlrobson renamed this task from [Minerva] Uncaught Error: Call to a member function isLocal() on boolean to [Skins] Uncaught Error: Call to a member function isLocal() on boolean.May 1 2020, 3:11 PM
Jdlrobson lowered the priority of this task from Medium to Low.
Jdlrobson updated the task description. (Show Details)
Jdlrobson edited projects, added MediaWiki-User-Interface; removed MinervaNeue.
Jdlrobson updated the task description. (Show Details)
Jdlrobson subscribed.

The error occurred in buildContentNavigationUrls inside SkinTemplate so have dropped the Minerva tag.

I assumed it was Minerva-related because the message is only seen with traces that include SkinMinerva in the callstack, or with a mobile url from a post-send stack trace (after the SkinMinerva call is unwound and thus no longer visible). There are no matches with other skins in the call stack. (I've looked for the past 30 days and found none after excluding SkinMinerva and post-send).

The actual point where it blows up is indeed in the core SkinTemplate->buildContentNavigationUrls method, which suggests that either SkinMinerva has prepared the instance in a way that core doesn't expect, or some other extension code that only runs in combination with Minerva (e.g. MobileFrontend or one of its hook) has broken the Skin object indirectly.

In SkinTemplate $page = $this->canUseWikiPage() ? $this->getWikiPage() : false;

getWikiPage comes from ContextSource::getWikiPage this calls $this->getContext()->getWikiPage();. getContext returns a RequestContext (always) so that brings us to RequestContext:: getWikiPage

RequestContext::getWikiPage will always return a WikiPage::factory or throw an exception.

The factory will either return the result of the WikiPageFactory hook or an instance of WikiFilePage, WikiPage or WikiCategoryPage depending on the title so context matters here. WikiCategoryPage and WikiPage isLocal always returns true. WikiFilePage should always return some kind of WikiPage.

The error is occuring inside WikiFilePage so something is going wrong inside WikiFilePage:loadFile and mFile is not being defined.

Looking at the function body the method should account for mFile being defined using the following method:

$this->mFile = $services->getRepoGroup()->getLocalRepo()
				->newFile( $this->mTitle ); // always a File

However the comment there is wrong. It's not always a File.

According to https://github.com/wikimedia/mediawiki/blob/master/includes/filerepo/FileRepo.php#L413

newFile @return File|null A File, or null if passed an invalid Title

There is no accounting for the situation when this is null.

how null becomes a boolean I'm not sure. It's possible the function signature here is also wrong. call_user_func either calls UnregisteredLocalFile::class, 'newFromTitle'

This feels like it might relate to the recent changes to Title in T246284 so I'm gonna cc @DannyS712 and @Ppchelko @Pchelolo

There was a recent change in 21e2d71560c by Alangi Derick @D3r1ck01 (I9437494de003f40fbe591321da7b42d16bb732d6) which could also account for this.

how null becomes a boolean I'm not sure. It's possible the function signature here is also wrong.

$mFile has been initialized as a boolean.

When mFile is set via loadFile(), the call ends with $this->mRepo = $this->mFile->getRepo(); which would not pass if mFile weren't a File. So either mFile is set to false via setFile() (doesn't happen in core,m at least, but a type hint wouldn't hurt) or it's never initialized in the first place.

Change 595500 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Typehint WikiFilePage::setFile

https://gerrit.wikimedia.org/r/595500

... or it's never initialized in the first place.

But then isLocal does call loadFile so that can't happen either.

Change 595500 merged by jenkins-bot:
[mediawiki/core@master] Typehint WikiFilePage::setFile

https://gerrit.wikimedia.org/r/595500

It appears to be passed a boolean. The question is why...

I don't see anything relevant in the logs after May 2.

Jdlrobson claimed this task.

Huh. This seems logically impossible:

  • WikiFilePage->isLocal() calls loadFile() before trying to call isLocal() on mFile (which is apparently a boolean)
  • unless mFileLoaded is true, loadFile() will load the file and then call a method on it (so we'd get a different error if this were a loading failure)
  • mFileLoaded can only be set true by loadFile (which does load a file) and setFile (which is also guaranteed to load the file, we have a typehint for that now)
  • there is nothing that could unset mFile once it is loaded

It's a bit cheap to handwave at opcache corruption, but I don't have any other theory.

I can't speak to the impossible nature of the error, but in terms of opcache-iness, there's two signs that lead me to think it isn't opcache corruption:

  • We have seen opcache cause method calls on the wrong object, typically one "nearby" within the same function. However I've not yet seen it do that on a non-object. This would be a first, to my knowledge. See T245183 for details.
  • This error was seen on multiple servers in the same hour in Eqiad, and is now still seen on several servers in Codfw. The probability of the same exact corruption happening twice in the same hour on different servers seems extremely small (that is, so far I've yet to see any corruption happen on more than one server), much smaller for it to also happen in Codfw some weeks later, on multiple servers, spread out of over three separate days.

It seems like this error isn't happening anymore, but I wanted to take a look at this code as well. For the most part, I agree with what @Tgr wrote above, but there's one possibility that hasn't been mentioned so far: loadFile() sets mFileLoaded() before attempting the lookup, which is not going to work if anything after that throws an exception that is caught somewhere earlier in the call chain. Determining which methods in there can throw an exception is a chore given the lack of @throws annotations, and so much so because dynamic callables are involved, but for the sake of simplicity:

class WikiFilePage {
   protected function loadFile() {
      if ( $this->mFileLoaded ) {
         return true;
      }
      $this->mFileLoaded = true;
      throw new MWException;
}

function doMagic() {
   try {
      $wikiFilePage->loadFile();
   } catch ( MWException $lookImSwallowingThis ) {
   }
   $wikiFilePage->loadFile(); // Already loaded, yay!
   assert( $wikiFilePage->mFile instanceof File ); // If only.
}

I see two remediations for this:

  • mFileLoaded should be set after loading the file. This is the most important thing, and not doing so indicates a clear programming error.
  • As already pointed out above, loadFile is assuming that LocalRepo::newFile will always return a File, although this is simply false. The proper fix would be changing the code to do something in case of invalid titles, but it's not clear what this should be: throw an exception? Use a fallback? Something else? In the meanwhile, we can enforce the current assumptions by throwing a hard error if mFile is still false, which would cause a failure immediately, and not later on.

Change 669835 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/core@master] Guard against null file in WikiFilePage

https://gerrit.wikimedia.org/r/669835

Recent hit:

Call to a member function isLocal() on boolean
in /srv/mediawiki/php-1.36.0-wmf.22/includes/page/WikiFilePage.php:125
trace
#0 /srv/mediawiki/php-1.36.0-wmf.22/includes/skins/SkinTemplate.php(897): WikiFilePage->isLocal()
#1 /srv/mediawiki/php-1.36.0-wmf.22/includes/skins/SkinMustache.php(198): SkinTemplate->buildContentNavigationUrls()
#2 /srv/mediawiki/php-1.36.0-wmf.22/includes/skins/SkinMustache.php(190): SkinMustache->getPortletsTemplateData()
#3 /srv/mediawiki/php-1.36.0-wmf.22/skins/MinervaNeue/includes/Skins/SkinMinerva.php(73): SkinMustache->getTemplateData()
#4 /srv/mediawiki/php-1.36.0-wmf.22/includes/skins/SkinMustache.php(128): SkinMinerva->getTemplateData()
#5 /srv/mediawiki/php-1.36.0-wmf.22/includes/skins/SkinTemplate.php(138): SkinMustache->generateHTML()
#6 /srv/mediawiki/php-1.36.0-wmf.22/includes/OutputPage.php(2626): SkinTemplate->outputPage()
#7 /srv/mediawiki/php-1.36.0-wmf.22/includes/exception/MWExceptionRenderer.php(153): OutputPage->output()
#8 /srv/mediawiki/php-1.36.0-wmf.22/includes/exception/MWExceptionRenderer.php(65): MWExceptionRenderer::reportHTML(WMFTimeoutException)
#9 /srv/mediawiki/php-1.36.0-wmf.22/includes/exception/MWExceptionHandler.php(106): MWExceptionRenderer::output(WMFTimeoutException, integer)
#10 /srv/mediawiki/php-1.36.0-wmf.22/includes/exception/MWExceptionHandler.php(185): MWExceptionHandler::report(WMFTimeoutException)
#11 /srv/mediawiki/php-1.36.0-wmf.22/includes/MediaWiki.php(581): MWExceptionHandler::handleException(WMFTimeoutException, string)
#12 /srv/mediawiki/php-1.36.0-wmf.22/index.php(53): MediaWiki->run()
#13 /srv/mediawiki/php-1.36.0-wmf.22/index.php(46): wfIndexMain()
#14 /srv/mediawiki/w/index.php(3): require(string)
#15 {main}

The trace indicates that it happened in the midst of rendering a different fatal error first. The logs indeed show that this is WMFTimeoutException from line 41 of /srv/mediawiki/wmf-config/set-time-limit.php: the execution time limit of 60 seconds was exceeded, but the rendering of that (skinned, localised) error page produced this error instead, so it had to fallback to a plaintext HTTP 500 response.

The file in question, however, has not been changed for ten years, and indeed has a valid title.

Like the previous reports, it seems to only happen in Minerva. Still not clear on why it only happens there, or what it does differently to cause or expose this issue.

The trace indicates that it happened in the midst of rendering a different fatal error first. The logs indeed show that this is WMFTimeoutException from line 41 of /srv/mediawiki/wmf-config/set-time-limit.php: the execution time limit of 60 seconds was exceeded, but the rendering of that (skinned, localised) error page produced this error instead, so it had to fallback to a plaintext HTTP 500 response.

Perhaps things went like this:

  • Call loadFile() for showing the file, as usual
  • mFileLoaded is set to true
  • Retrieving the file took to long (somewhere in RepoGroup), which caused a WMFTimeoutException to bubble up
  • The skin code then retried showing the file
  • mFileLoaded is already true, so mFile stays false
  • The next call fails

This could happen for any file.

Like the previous reports, it seems to only happen in Minerva. Still not clear on why it only happens there, or what it does differently to cause or expose this issue.

Perhaps other skins don't have a code path leading to WikiFilePage::isLocal?

Change 669835 merged by jenkins-bot:

[mediawiki/core@master] WikiFilePage: Guard against null file in loadFile()

https://gerrit.wikimedia.org/r/669835