Checker: Cannot fetch ids from index
Closed, Resolved | Public | PRODUCTION ERROR

Assigned To

Authored By

	Krinkle
	Mar 21 2019, 8:45 PM

Description

Error

Request ID: 3638258ddb473017d85b8a28

message

[{exception_id}] {exception_url}   Exception from line 369 of /srv/mediawiki/php-1.33.0-wmf.22/extensions/CirrusSearch/includes/Sanity/Checker.php: Cannot fetch ids from index

trace

#0 /srv/mediawiki/php-1.33.0-wmf.22/extensions/CirrusSearch/includes/Sanity/Checker.php(122): CirrusSearch\Sanity\Checker->loadPagesFromIndex(array)
#1 /srv/mediawiki/php-1.33.0-wmf.22/extensions/CirrusSearch/includes/Job/CheckerJob.php(214): CirrusSearch\Sanity\Checker->check(array)
#2 /srv/mediawiki/php-1.33.0-wmf.22/extensions/CirrusSearch/includes/Job/Job.php(100): CirrusSearch\Job\CheckerJob->doJob()
#3 /srv/mediawiki/php-1.33.0-wmf.22/extensions/EventBus/includes/JobExecutor.php(65): CirrusSearch\Job\Job->run()
#4 /srv/mediawiki/rpc/RunSingleJob.php(77): JobExecutor->execute(array)

Impact

Uncertain. I'm not familiar with what the CirrusSearch CheckerJob does, but I assume the fatal error means the job is skipped, which usually means the intended work is, in part or in whole, not being performed.

Notes

Seen for at least 30 days. Oldest currently available records show it on 1.33-wmf.18.

Details

	Subject	Repo	Branch	Lines +/-
	Allow retries for CheckerJob	mediawiki/extensions/CirrusSearch	master	+16 -11


Event Timeline

Krinkle created this task. Mar 21 2019, 8:45 PM

Restricted Application added a subscriber: Aklapper. Mar 21 2019, 8:45 PM

Krinkle added a project: CirrusSearch. Mar 21 2019, 8:45 PM

Restricted Application added a project: Discovery-Search. Mar 21 2019, 8:45 PM

debt triaged this task as Medium priority. Mar 28 2019, 5:12 PM

debt moved this task from needs triage to elastic / cirrus on the Discovery-Search board.

Not sure what to do here. The errors are legitimate, since Elasticsearch was unreachable. Should this be made more explicit in the error message?

Additionally this error is logged twice by the EventBus JobExecutor:

once when calling MWExceptionHandler::rollbackMasterChangesAndLog( $e );
and a second time with the "Exception executing job" prefix at https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/EventBus/+/master/includes/JobExecutor.php#95

This job had retries disabled. I'll switch allowRetries() to true to work around this scenario (even though losing this job is not a problem, since it's a verification process that is constantly restarted).
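
For readers unfamiliar with the retry flag, here is a minimal sketch of the idea in Python. It is an illustrative model only, not the CirrusSearch or EventBus code: the class CheckerJobSketch, the function execute, the exception IndexUnreachable, and the max_attempts limit are all hypothetical names and values chosen for the example. The only thing taken from the task is the concept that a job reports whether it may be retried.

```python
class IndexUnreachable(Exception):
    """Hypothetical stand-in for the 'Cannot fetch ids from index' failure."""


class CheckerJobSketch:
    """Hypothetical job model; not the real CirrusSearch CheckerJob."""

    def __init__(self, fail_times, allow_retries):
        self.fail_times = fail_times          # number of initial attempts that fail
        self._allow_retries = allow_retries   # models the retry flag
        self.attempts = 0

    def allow_retries(self):
        return self._allow_retries

    def run(self):
        self.attempts += 1
        if self.attempts <= self.fail_times:
            raise IndexUnreachable("Cannot fetch ids from index")
        return True


def execute(job, max_attempts=3):
    """Run a job once, or up to max_attempts times if the job allows retries."""
    limit = max_attempts if job.allow_retries() else 1
    for _ in range(limit):
        try:
            return job.run()
        except IndexUnreachable:
            continue  # transient failure: try again if attempts remain
    return False
```

With retries disabled, a single transient outage loses the run; with retries enabled, the same job succeeds once the index is reachable again.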

Change 500965 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Allow retries for CheckerJob

https://gerrit.wikimedia.org/r/500965

gerritbot added a project: Patch-For-Review. Apr 3 2019, 2:17 PM

dcausse moved this task from elastic / cirrus to Current work on the Discovery-Search board. Apr 3 2019, 2:20 PM

dcausse edited projects, added Discovery-Search (Current work); removed Discovery-Search.

@dcausse If this is something that can happen under normal operation and doesn't call for a code change to prevent or handle it, then it probably should not be a top-level uncaught exception.

It should probably instead be caught in the job, and the job marked as a success, possibly with an info/warning message logged in a CirrusSearch-specific channel if it is something you want to be able to find in Logstash. The current exception is meant to indicate a JobRunner or MediaWiki-level problem, and is used as such to inform automatic rollbacks during deployments and SRE pages about MediaWiki availability.

Awesome! I didn't realise the patch does exactly that… :)
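
The pattern suggested above (catch the expected failure inside the job, log it to a dedicated channel, and report success) can be sketched as follows. This is a hedged Python illustration, not the merged patch: the logger name, the functions load_pages_from_index and do_job, the reachable flag, and the (success, checked) return shape are assumptions made up for the example.

```python
import logging

# Assumed channel name, standing in for a CirrusSearch-specific log channel.
log = logging.getLogger("CirrusSearch")


class IndexUnreachable(Exception):
    """Hypothetical stand-in for the 'Cannot fetch ids from index' failure."""


def load_pages_from_index(page_ids, reachable):
    """Pretend index lookup; raises when the index cannot be reached."""
    if not reachable:
        raise IndexUnreachable("Cannot fetch ids from index")
    return {page_id: {"id": page_id} for page_id in page_ids}


def do_job(page_ids, reachable=True):
    """Return (success, pages_checked).

    An unreachable index is treated as an expected condition: it is logged
    as a warning and the job still reports success, so the failure does not
    surface as a top-level uncaught exception.
    """
    try:
        pages = load_pages_from_index(page_ids, reachable)
    except IndexUnreachable as e:
        log.warning("Sanity check skipped: %s", e)
        return True, 0
    return True, len(pages)
```

The design point is that the job runner only sees an exception for genuinely unexpected problems, while the expected outage remains searchable in the extension's own log channel.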

Change 500965 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Allow retries for CheckerJob

https://gerrit.wikimedia.org/r/500965

ReleaseTaggerBot added a project: MW-1.33-notes (1.33.0-wmf.25; 2019-04-09). Apr 8 2019, 4:02 PM

EBernhardson moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board. Apr 9 2019, 3:15 PM

debt closed this task as Resolved. Apr 15 2019, 6:00 PM

debt claimed this task.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM

Maintenance_bot removed a project: Patch-For-Review. Aug 28 2019, 11:38 PM
