extensions/CirrusSearch/includes/Sanity/Checker.php:369 Cannot fetch ids from index
Closed, Resolved · Public · PRODUCTION ERROR

Description

id
AWqeP542m2VjIW06Z09x
trace
#0 /srv/mediawiki/php-1.34.0-wmf.4/extensions/CirrusSearch/includes/Sanity/Checker.php(122): CirrusSearch\Sanity\Checker->loadPagesFromIndex(array)
#1 /srv/mediawiki/php-1.34.0-wmf.4/extensions/CirrusSearch/includes/Job/CheckerJob.php(217): CirrusSearch\Sanity\Checker->check(array)
#2 /srv/mediawiki/php-1.34.0-wmf.4/extensions/CirrusSearch/includes/Job/Job.php(100): CirrusSearch\Job\CheckerJob->doJob()
#3 /srv/mediawiki/php-1.34.0-wmf.4/extensions/EventBus/includes/JobExecutor.php(66): CirrusSearch\Job\Job->run()
#4 /srv/mediawiki/rpc/RunSingleJob.php(77): JobExecutor->execute(array)
#5 {main}

Impact

Over 12,000 failed JobQueue jobs per day, logged as "ERROR" severity in the production exception channel.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Today I have seen some alarms firing for mediawiki exceptions due to this error :)

Krinkle updated the task description.
Krinkle triaged this task as High priority. Edited May 19 2019, 9:42 AM
Krinkle subscribed.

Seen in Logstash since 1.34-wmf.1. Tentatively triaging as High priority because it is one of the top 5 most frequent production errors, which makes it hard to reliably detect new regressions that are less frequent than this one.

If these jobs are not required to succeed (e.g. they just try something and that's it, no further action to be taken), then the Job class should presumably catch and ignore all exceptions and still return true.

If the internal failure rate is of interest to the maintainers, one could consider a Statsd metric or INFO-severity message in its stead.
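
To illustrate the two suggestions above, here is a minimal, language-agnostic sketch in Python. It is not the CirrusSearch or MediaWiki code: `run_best_effort`, `failure_counts`, and the logger name are hypothetical names made up for this example, and the `Counter` merely stands in for a real Statsd client.

```python
import logging
from collections import Counter

# Hypothetical logger name; a real job would use its own log channel.
logger = logging.getLogger("checker_job")

# Stand-in for a Statsd counter (assumption: a real client would be used instead).
failure_counts = Counter()


def run_best_effort(job_name, work):
    """Run work() and always report success.

    Any exception is counted and logged at INFO severity instead of
    propagating, so the job never shows up as a failed job or as an
    ERROR in the exception channel.
    """
    try:
        work()
    except Exception as exc:
        failure_counts[job_name] += 1
        logger.info("%s skipped: %s", job_name, exc)
    return True
```

The trade-off in this sketch is that genuine bugs in the job body are swallowed along with outages, so in practice one would probably catch only the specific "index unavailable" error rather than every exception.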

Change 513585 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Don't spam the logs with errors from saneitizer jobs when elastic is down

https://gerrit.wikimedia.org/r/513585

Change 513585 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Don't spam the logs with errors from saneitizer jobs when elastic is down

https://gerrit.wikimedia.org/r/513585

Krinkle closed this task as Resolved. Edited Jun 17 2019, 6:33 PM

Confirmed fixed in prod on 1.34-wmf.8:

capture.png (856×2 px, 69 KB)

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM