Page MenuHomePhabricator

[BETA] Two push notification request jobs are executed for a single event
Closed, ResolvedPublic

Description

On the Beta Cluster, two EchoPushNotificationRequest jobs are executed to handle a single notification. This does not happen locally, so it may be a Beta Cluster configuration issue rather than a bug in the code.

2020-06-18 13:52:23 [XutxlqwQBHcAAGHNyrIAAAAI] deployment-jobrunner03 enwiki 1.35.0-alpha JobExecutor DEBUG: Beginning job execution {"job":"EchoPushNotificationRequest Special: centralId=28524 eventId=870366 eventType=edit-user-talk agent=29683 requestId=XutxlqwQBHcAAGHNyrIAAAAI","job_type":"EchoPushNotificationRequest"} 
2020-06-18 13:52:24 [XutxlqwQBHcAAGHNyrIAAAAI] deployment-jobrunner03 enwiki 1.35.0-alpha JobExecutor DEBUG: Beginning job execution {"job":"EchoPushNotificationRequest Special: centralId=28524 eventId=870366 eventType=edit-user-talk agent=29683 requestId=XutxlqwQBHcAAGHNyrIAAAAI","job_type":"EchoPushNotificationRequest"} 
2020-06-18 13:52:24 [XutxlqwQBHcAAGHNyrIAAAAI] deployment-jobrunner03 enwiki 1.35.0-alpha JobExecutor INFO: Finished job execution {"job":"EchoPushNotificationRequest Special: centralId=28524 eventId=870366 eventType=edit-user-talk agent=29683 requestId=XutxlqwQBHcAAGHNyrIAAAAI","job_type":"EchoPushNotificationRequest","job_status":true,"job_duration":0.4562199115753174} 
2020-06-18 13:52:24 [XutxlqwQBHcAAGHNyrIAAAAI] deployment-jobrunner03 enwiki 1.35.0-alpha JobExecutor INFO: Finished job execution {"job":"EchoPushNotificationRequest Special: centralId=28524 eventId=870366 eventType=edit-user-talk agent=29683 requestId=XutxlqwQBHcAAGHNyrIAAAAI","job_type":"EchoPushNotificationRequest","job_status":true,"job_duration":0.3405311107635498}
mholloway-shell@deployment-mwmaint01:~$ mwscript sql.php enwiki
> select * from echo_notification where notification_event = 870366;
stdClass Object
(
    [notification_event] => 870366
    [notification_user] => 14656
    [notification_timestamp] => 20200618135223
    [notification_read_timestamp] => 
    [notification_bundle_hash] => d3a30db6d760342e83adfcf854c3848e
)
>

Event Timeline

Mholloway renamed this task from [BETA] Push notification request jobs are enqueued twice for a single event to [BETA] Two push notification request jobs are enqueued for a single event.Jun 10 2020, 7:51 PM
Mholloway renamed this task from [BETA] Two push notification request jobs are enqueued for a single event to [BETA] Two push notification request jobs are executed for a single event.Jun 10 2020, 7:54 PM
Mholloway updated the task description. (Show Details)

FWIW, looking at JobExecutor.log on deployment-flourine02, this behavior appears to be quite common across all job types.

As best I can tell, this is NOT an issue in production. I'll leave it open with low priority for now.

What indicates that these are the "same" job?

If we know they are the same, do we know if one failed and the other is a re-try from ChangeProp?

Change 606251 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/Echo@master] Push: Add additional job params for logging

https://gerrit.wikimedia.org/r/606251

Change 606251 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Push: Add additional job params for logging

https://gerrit.wikimedia.org/r/606251

I've updated the description with additional evidence that the same notification is being handled twice. I don't think this is a failure and retry, because both requests succeed and result in the client application receiving a message.

@Pchelolo I notice that there are two cpjobqueue instances in the Beta Cluster (deployment-cpjobqueue and deployment-docker-cpjobqueue01) and the service appears to be active in both. Are both of them reading and processing every job event? That could explain what's going on here.

This is an artifact of the transition to k8s. @hnowlan should remove the old one.

I've shut off the older instances of change-propagation and cpjobqueue which were most likely causing this. Please let me know if it has stopped occurring

Mholloway claimed this task.

Yep! Thanks, @Pchelolo and @hnowlan!