
Fatal from CirrusSearch\Job\LinksUpdate: "Call to a member function getLogVariables() on null"
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error

Request URL:
Request ID: INSERT_ID

message
Fatal Error:
Call to a member function getLogVariables() on null
trace
#0 /srv/mediawiki/php-1.34.0-wmf.13/extensions/CirrusSearch/includes/BuildDocument/RedirectsAndIncomingLinks.php(178): CirrusSearch\ElasticsearchIntermediary->failure(Elastica\Exception\RuntimeException)
#1 /srv/mediawiki/php-1.34.0-wmf.13/extensions/CirrusSearch/includes/BuildDocument/RedirectsAndIncomingLinks.php(95): CirrusSearch\BuildDocument\RedirectsAndIncomingLinks->realFinishBatch(array)
#2 /srv/mediawiki/php-1.34.0-wmf.13/includes/Hooks.php(174): CirrusSearch\BuildDocument\RedirectsAndIncomingLinks::finishBatch(array)
#3 /srv/mediawiki/php-1.34.0-wmf.13/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#4 /srv/mediawiki/php-1.34.0-wmf.13/extensions/CirrusSearch/includes/Updater.php(475): Hooks::run(string, array)
#5 /srv/mediawiki/php-1.34.0-wmf.13/extensions/CirrusSearch/includes/Updater.php(236): CirrusSearch\Updater->buildDocumentsForPages(array, integer)
#6 /srv/mediawiki/php-1.34.0-wmf.13/extensions/CirrusSearch/includes/Updater.php(114): CirrusSearch\Updater->updatePages(array, integer)
#7 /srv/mediawiki/php-1.34.0-wmf.13/extensions/CirrusSearch/includes/Job/LinksUpdate.php(51): CirrusSearch\Updater->updateFromTitle(Title)
#8 /srv/mediawiki/php-1.34.0-wmf.13/extensions/CirrusSearch/includes/Job/JobTraits.php(137): CirrusSearch\Job\LinksUpdate->doJob()
#9 /srv/mediawiki/php-1.34.0-wmf.13/extensions/EventBus/includes/JobExecutor.php(64): CirrusSearch\Job\CirrusTitleJob->run()
#10 /srv/mediawiki/rpc/RunSingleJob.php(76): JobExecutor->execute(array)

Impact

(To be determined.)

Presumably some pages are not having their search index updated and thus may be stale (with no known correction or retry).

Notes

From Logstash:

  • Started yesterday (2019-07-09 14:31 UTC).
  • Affecting both 1.34-wmf.11 and 1.34-wmf.13.
  • Only seen from PHP 7.2 processes.

Event Timeline

  • Only seen from PHP 7.2 processes.

Ping T219148, @Pchelolo, @jijiki.

Given "Affecting both 1.34-wmf.11 and 1.34-wmf.13." can we move forward with the train without an explicit fix for this? Or should we wait?

The error here will only block a class of secondary numerical data updates; the primary content updates still get pushed into Elasticsearch. I've not found the root cause for why this is null in 7.2, but will look into it more. I don't think this needs to block the train unless the log spam is excessive.

These messages are paired with "finishRequest called without staring a request" messages. At a minimum, the method is documented as possibly returning null, so the call site needs to handle it. More directly, we need to figure out why it thinks no request was started.
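To illustrate the null-handling point, here is a minimal sketch in Python rather than PHP. Every class and method name in it (`Intermediary`, `RequestLog`, `_finish_request`, and so on) is hypothetical and only models the shape of the problem; it is not the CirrusSearch code. It assumes, as the later patch title suggests, that a request can end up being finished twice (once as a success and once as a failure), so the second finish finds nothing to finish.

```python
from typing import List, Optional


class RequestLog:
    """Hypothetical stand-in for the per-request log object."""

    def __init__(self, description: str) -> None:
        self.description = description

    def get_log_variables(self) -> dict:
        return {"description": self.description}


class Intermediary:
    """Hypothetical model of a start/finish request tracker."""

    def __init__(self) -> None:
        self._current: Optional[RequestLog] = None
        self.warnings: List[str] = []

    def start(self, description: str) -> None:
        self._current = RequestLog(description)

    def _finish_request(self) -> Optional[RequestLog]:
        # May return None: nothing was started, or the request
        # has already been finished once.
        if self._current is None:
            self.warnings.append("finish called without starting a request")
            return None
        log, self._current = self._current, None
        return log

    def success(self) -> dict:
        log = self._finish_request()
        return log.get_log_variables() if log is not None else {}

    def failure(self, error: Exception) -> dict:
        log = self._finish_request()
        # Guard the None case instead of calling a method on it.
        context = log.get_log_variables() if log is not None else {}
        context["error"] = str(error)
        return context
```

In this sketch, calling `success()` and then `failure()` for the same request records a warning and still returns a usable context, instead of raising. The guard only hides the symptom, though; the caller should still report each request as either a success or a failure, not both.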

Change 521948 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] RedirectsAndIncomingLinks: succeede or fail, but not both

https://gerrit.wikimedia.org/r/521948

Change 521948 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] RedirectsAndIncomingLinks: succeede or fail, but not both

https://gerrit.wikimedia.org/r/521948

Change 521956 had a related patch set uploaded (by Jforrester; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@wmf/1.34.0-wmf.13] RedirectsAndIncomingLinks: succeede or fail, but not both

https://gerrit.wikimedia.org/r/521956

Change 521960 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Log failing RedirectsAndIncomingLinks searches

https://gerrit.wikimedia.org/r/521960

Change 521956 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@wmf/1.34.0-wmf.13] RedirectsAndIncomingLinks: succeede or fail, but not both

https://gerrit.wikimedia.org/r/521956

Mentioned in SAL (#wikimedia-operations) [2019-07-10T23:16:29Z] <jforrester@deploy1001> Synchronized php-1.34.0-wmf.13/extensions/CirrusSearch/includes: T227691 RedirectsAndIncomingLinks: succeede or fail, but not both (duration: 01m 02s)

Change 521960 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Log failing RedirectsAndIncomingLinks searches

https://gerrit.wikimedia.org/r/521960

I've not found the root cause for why this is null in 7.2, but will look into it more. I don't think this needs to block the train unless the log spam is excessive.

What's happening here is that HHVM uses its own curl pool implementation, and PHP 7 uses an external pool via an nginx proxy that runs on the MediaWiki servers. This server is giving a low volume of intermittent 504 Gateway Timeout errors. This has a connect timeout of 1s. Doing some testing, I sent 50k requests to https://search.svc.eqiad.wmnet:9243/ from mw1293.eqiad.wmnet. 2 requests took longer than 1s, at 1.013s and 3.831s.

There seems to be an underlying network issue to look into here: how can it take a full second for some packets to make it around the DC network?
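A test like the one described above could be sketched roughly as follows. This is an assumption-laden illustration, not the script that was actually run: the function names are hypothetical, and it times only the TCP connect, whereas the original test sent full HTTPS requests.

```python
import socket
import time
from typing import Iterable, List, Tuple


def time_connect(host: str, port: int, timeout: float = 5.0) -> float:
    """Return the seconds taken to open (and close) one TCP connection."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


def slow_samples(durations: Iterable[float], limit: float = 1.0) -> List[float]:
    """Return, sorted, the durations that exceed the limit (e.g. a 1s timeout)."""
    return sorted(d for d in durations if d > limit)


def summarize(durations: List[float], limit: float = 1.0) -> Tuple[int, int, float]:
    """Return (total samples, slow samples, slow fraction)."""
    slow = slow_samples(durations, limit)
    return len(durations), len(slow), len(slow) / len(durations)


# Example use against the endpoint named above (needs network access):
#   durations = [time_connect("search.svc.eqiad.wmnet", 9243) for _ in range(50_000)]
#   print(summarize(durations))
```

With the figures reported above, 2 slow requests out of 50,000 is a slow fraction of 0.004%, which is consistent with a low volume of intermittent timeouts.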

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:05 PM