Page MenuHomePhabricator

logspam: ReindexTask.php causing a bunch of Undefined index notices
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

MediaWiki version: 1.35.0-wmf.25

message
PHP Notice: Undefined index: requests_per_second


ErrorException from line 150 of /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Elastica/ReindexTask.php: PHP Notice: Undefined index: total

ErrorException from line 151 of /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Elastica/ReindexTask.php: PHP Notice: Undefined index: updated

...

ErrorException from line 158 of /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Elastica/ReindexTask.php: PHP Notice: Undefined index: retries

ErrorException from line 159 of /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Elastica/ReindexTask.php: PHP Notice: Undefined index: throttled_millis

Impact

Logspam in production

Notes

This is from a maintenance job so it's easy to filter out with kibana, nonetheless, it should be fixed.

Details

Request ID
fec6b9ec0e41045065ce4fbf
Request URL
n/a
Stack Trace
exception.trace
#0 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Elastica/ReindexTask.php(161): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Elastica/ReindexTask.php(116): CirrusSearch\Elastica\ReindexTask->mergeStatusWithChildren(array, array)
#2 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Maintenance/Reindexer.php(458): CirrusSearch\Elastica\ReindexTask->getStatus()
#3 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Maintenance/Reindexer.php(190): CirrusSearch\Maintenance\Reindexer->monitorReindexTask(CirrusSearch\Elastica\ReindexTask, Elastica\Type)
#4 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Maintenance/Validators/SpecificAliasValidator.php(94): CirrusSearch\Maintenance\Reindexer->reindex(NULL, integer, integer, double)
#5 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/includes/Maintenance/Validators/IndexAliasValidator.php(98): CirrusSearch\Maintenance\Validators\SpecificAliasValidator->updateIndices(array, array)
#6 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(458): CirrusSearch\Maintenance\Validators\IndexAliasValidator->validate()
#7 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(411): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->validateSpecificAlias()
#8 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(269): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->validateAlias()
#9 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(61): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->execute()
#10 /srv/mediawiki/php-1.35.0-wmf.25/maintenance/doMaintenance.php(99): CirrusSearch\Maintenance\UpdateSearchIndexConfig->execute()
#11 /srv/mediawiki/php-1.35.0-wmf.25/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(70): require_once(string)
#12 /srv/mediawiki/multiversion/MWScript.php(101): require_once(string)
#13 {main}

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

appears to be coming from this job running on mwmaint1002:

USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
mstyles pts/10 tmux(110183).%0 18Feb20 36days 15.57s 0.00s /bin/bash /usr/local/bin/mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki wikidatawiki --cluster cloudelastic

This is two thirds of all logspam in production right now; maybe raise this up from Low?

mmodell raised the priority of this task from Low to Medium.EditedMar 26 2020, 8:29 PM

The curious thing is that the job has apparently been running for a long time (idle 36 days?) and yet this logspam appear to be new phenomena.

Probably the logspam is caused by the new version of CirrusSearch?

Krinkle raised the priority of this task from Medium to High.Apr 8 2020, 3:23 PM
Krinkle moved this task from Untriaged to Mar2020/1.35-wmf.22 on the Wikimedia-production-error board.
Krinkle added a subscriber: Krinkle.

This should have blocked the train. An undefined variable can cause a huge amount of corruption due to silent failure.

Until we know otherwise we should assume that CirrusSearch is replacing all every page's search index with empty strings with every edit it is reacting to; until its maintainers declare otherwise. And looking at it briefly, that might actually be a real possibility for the subset of pages it is affecting and maybe it explains why some of the index is out of date by a week (T249435).

Change 587607 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[mediawiki/extensions/CirrusSearch@master] reindex: Skip incomplete statuses to avoid logspam

https://gerrit.wikimedia.org/r/587607

Change 587607 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] reindex: Skip incomplete statuses to avoid logspam

https://gerrit.wikimedia.org/r/587607

This will still send a lot of logs next time we do a reindex. We should be doing one in the near future. I can leave this open until the extra logs are removed.