Page MenuHomePhabricator

Exception executing job: EchoNotificationJob "No EchoEvent found with ID ..."
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

Request ID: W6K7VQrAEHAAAKn0538AAADW

message
Exception executing job: EchoNotificationJob ##### eventId=809751 requestId=W6K7VQrAEHAAAKn0538AAADW : MWException: No EchoEvent found with ID: 809751
stacktrace
0 /srv/mediawiki/php-1.32.0-wmf.22/extensions/Echo/includes/model/Event.php(339): EchoEventMapper->fetchById(integer, boolean)
#1 /srv/mediawiki/php-1.32.0-wmf.22/extensions/Echo/includes/model/Event.php(382): EchoEvent->loadFromID(integer)
#2 /srv/mediawiki/php-1.32.0-wmf.22/extensions/Echo/includes/jobs/NotificationJob.php(13): EchoEvent::newFromID(integer)
#3 /srv/mediawiki/php-1.32.0-wmf.22/extensions/EventBus/includes/JobExecutor.php(64): EchoNotificationJob->run()
#4 /srv/mediawiki/rpc/RunSingleJob.php(67): JobExecutor->execute(array)
exception
[{exception_id}] {exception_url}   MWException from line 55 of /srv/mediawiki/php-1.32.0-wmf.22/extensions/Echo/includes/mapper/EventMapper.php: No EchoEvent found with ID: 809751

Notes

Seems to only affect www.mediawiki.org. Filing retroactively given it has recorded from previous branches as well, going back to 1.32.0-wmf.18.

Related Objects

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

(Still seen on 1.32.0-wmf.24)

For reference, the code in the job goes like this:

MWEchoDbFactory::newFromDefault()->waitForSlaves();
$event = EchoEvent::newFromID( $this->eventId );
EchoNotificationController::notify( $event, false );

with the exception being thrown by the second line.

I have verified a handful of event_id from occurrences of this exception in logstash and they all exist in the database. Also, many of them have associated echo_notification rows so it look like the job eventually works but it spams the logs.

waitForSlaves() in the first line is clearly not working. I think LogBalancer->getMasterPos() is returning false since no connection is open at this point of the job execution.

EchoEventMapper->fetchById() has a "auto-retry to master" feature that doesn't seem to be kicking in (we would see re-entrance to the same function in the stack trace). It is controlled by EchoDbFactory->canRetryMaster(), which relies on LoadBalancer->hasOrMadeRecentMasterChanges().

My guess is that since a new LoadBalancer is instantiated in the job, it doesn't know about the recent insertion of the EchoEvent. While those operations are part of the same conceptual action, they are technically disconnected.

I think the way to fix this is to change the implementation of canRetryMaster() to be true when running from a job (it was previously changed to ensure it never kicks in from a GET request). I don't know if there's something like wfIsJob() or if wfIsCLI() would work?

We could also waitForSlaves in a deferred update before enqueuing the job. I don't know if there are undesirable consequences but it might work with minimal efforts.

Change 503404 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/Echo@master] NotificationJob: make sure we retry to load the event from master

https://gerrit.wikimedia.org/r/503404

Change 503404 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] NotificationJob: make sure we retry to load the event from master

https://gerrit.wikimedia.org/r/503404

Change 530701 had a related patch set uploaded (by Paladox; owner: Sbisson):
[mediawiki/extensions/Echo@REL1_33] NotificationJob: make sure we retry to load the event from master

https://gerrit.wikimedia.org/r/530701

Change 530701 merged by jenkins-bot:
[mediawiki/extensions/Echo@REL1_33] NotificationJob: make sure we retry to load the event from master

https://gerrit.wikimedia.org/r/530701

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:08 PM
brennen added a project: User-brennen.
brennen subscribed.

Just noticed 3 of these in 1.35.0-wmf.38 - all for ID 1173394:

message
No EchoEvent found with ID: 1173394
stacktrace
#0 /srv/mediawiki/php-1.35.0-wmf.38/extensions/Echo/includes/jobs/NotificationJob.php(11): EchoEventMapper->fetchById(integer, boolean)
#1 /srv/mediawiki/php-1.35.0-wmf.38/extensions/EventBus/includes/JobExecutor.php(79): EchoNotificationJob->run()
#2 /srv/mediawiki/rpc/RunSingleJob.php(76): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#3 {main}
exception
[{exception_id}] {exception_url}   MWException from line 51 of /srv/mediawiki/php-1.35.0-wmf.38/extensions/Echo/includes/mapper/EventMapper.php: No EchoEvent found with ID: 1173394
Etonkovidova subscribed.

I did not see such errors in logstash for wmf.39/40, but I cannot be 100% sure that I was queuing correctly for the error. Since @brennen has moved the task to [[ https://phabricator.wikimedia.org/project/board/4463/ | Logs/Train ]]to monitor, I'm moving it to Growth team External.

Did various additional substring queries based on the exception defined in the Echo code and there are no matches for at least 30 days. LGTM.