Page MenuHomePhabricator

Make EchoNotification job JSON-serializable
Closed, ResolvedPublic

Description

The EchoNotificationJob contains non-scalar value in it's parameters, the event property is an instance of EchoEvent, thus it's serialization relies on the redis-base JobQueue ability to use PHP serialization.

The kafka-based queue deliberately doesn't support PHP serialization and requires all the jobs to be JSON-serializable per T161647, so the EchoNotificationJob job should be changed and made JSON-serializable.

Event Timeline

Pchelolo created this task.
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
fdans raised the priority of this task from Medium to Needs Triage.Apr 26 2018, 4:33 PM
fdans moved this task from Incoming to Radar on the Analytics board.
Milimetric triaged this task as Medium priority.Apr 30 2018, 4:50 PM
Milimetric subscribed.

sorry - reverting accidental change of priority

jmatazzoni renamed this task from Make EchoNotification job JSON-serialiable to Make EchoNotification job JSON-serializable .May 1 2018, 6:09 PM

Change 431831 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/Echo@master] Make NotificationJob json-serializable

https://gerrit.wikimedia.org/r/431831

Change 431831 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Make NotificationJob json-serializable

https://gerrit.wikimedia.org/r/431831

Change 432574 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/Echo@master] Remove NotificationJob backward compatibility

https://gerrit.wikimedia.org/r/432574

Let's keep it open until this hits production. @SBisson could you SWAT the change to speed up the process?

Change 433441 had a related patch set uploaded (by 20after4; owner: 20after4):
[mediawiki/extensions/Echo@wmf/1.32.0-wmf.4] Revert "Make NotificationJob json-serializable"

https://gerrit.wikimedia.org/r/433441

Change 433441 merged by jenkins-bot:
[mediawiki/extensions/Echo@wmf/1.32.0-wmf.4] Revert "Make NotificationJob json-serializable"

https://gerrit.wikimedia.org/r/433441

Change 433670 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/Echo@master] Use static newFromID instead of loadFromID

https://gerrit.wikimedia.org/r/433670

Change 433670 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Use static newFromID instead of loadFromID

https://gerrit.wikimedia.org/r/433670

This still didn't quite help. Although the event property is not there anymore, the job is still not serializable, because masterPos property contains an instance of Wikimedia\Rdbms\MySQLMasterPos class.

Pchelolo raised the priority of this task from Medium to High.Jun 13 2018, 12:23 PM

I've raised the priority to HIGH since this is the only job left on the old queue and it's blocking a bunch of work related to decommissioning Redis instances, stopping the jobrunner services etc.

After a while of digging through the Echo code, here's what I've found: the notification job includes an instance of DBMasterPos for both main and echo DB, and then before execution, it waits for replication to the master position at the time of submitting the job. This is done to avoid cases where the JobQueue executed the event so quickly that replicas still couldn't catch up, so the job would fail.

The only implementation of the DBMasterPos we have right now in core, MySQLDBMasterPos is quite easily serializable into JSON, but we obviously can not assume that the DBMasterPos is MySQL at the time of deserialization, so making it properly serialize/deserialize to/from JSON requires quite a lot of changes to really high-level MediaWiki interface, for example IDatabase, which is not a great idea for such a small use-case.

Alternatively, we probably could wait for master position at the time of executing the event, but that's not quite nice since we would need to open a possibly unnecessary connection to the master, and might wait a bit longer.

Another possibility is to remove this logic altogether and rely on job queue retry semantics if a particular job fails because of the replication lag. Although, that would add some delays in case the failure happens.

Any ideas what the best course of action would be here? I lack product knowledge for Echo, how bad would a slight delay in notification delivery be?

[...]
Any ideas what the best course of action would be here? I lack product knowledge for Echo, how bad would a slight delay in notification delivery be?

I think you've described accurately what's happening and why.

Echo events can be delivered immediately or using the job queue (this is configurable per event type). Adding a slight delay in the latter case shouldn't cause any problem since there is no expectation that they will arrive quickly to start with.

Alternatively, we probably could wait for master position at the time of executing the event, but that's not quite nice since we would need to open a possibly unnecessary connection to the master, and might wait a bit longer.

I think this is the less bad option. I'd be willing to try it and see if anything goes wrong in the DB metrics (I assume we have those). It's fairly easy to try for a few days and re-evaluate but I'll let @Catrope make the call.

Change 440894 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/extensions/Echo@master] Remove masterPos from the job specification.

https://gerrit.wikimedia.org/r/440894

Change 441818 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/extensions/Echo@wmf/1.32.0-wmf.8] Remove masterPos from the job specification.

https://gerrit.wikimedia.org/r/441818

Change 440894 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Remove masterPos from the job specification.

https://gerrit.wikimedia.org/r/440894

Change 441818 merged by jenkins-bot:
[mediawiki/extensions/Echo@wmf/1.32.0-wmf.8] Remove masterPos from the job specification.

https://gerrit.wikimedia.org/r/441818

Mentioned in SAL (#wikimedia-operations) [2018-06-25T13:34:41Z] <hashar@deploy1001> Synchronized php-1.32.0-wmf.8/extensions/Echo: Remove masterPos from the job specification - T192945 (duration: 00m 58s)

Change 441868 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/extensions/Echo@wmf/1.32.0-wmf.999] Remove masterPos from the job specification.

https://gerrit.wikimedia.org/r/441868

Change 441868 merged by jenkins-bot:
[mediawiki/extensions/Echo@wmf/1.32.0-wmf.999] Remove masterPos from the job specification.

https://gerrit.wikimedia.org/r/441868

Mentioned in SAL (#wikimedia-operations) [2018-06-25T14:17:13Z] <hashar@deploy1001> Synchronized php-1.32.0-wmf.999/extensions/Echo: Remove masterPos from the job specification - T192945 (duration: 00m 59s)

Vvjjkkii renamed this task from Make EchoNotification job JSON-serializable to qbeaaaaaaa.Jul 1 2018, 1:14 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed SBisson as the assignee of this task.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii edited subscribers, added: SBisson; removed: gerritbot, Aklapper.
Mainframe98 renamed this task from qbeaaaaaaa to Make EchoNotification job JSON-serializable.Jul 1 2018, 7:54 AM
Mainframe98 closed this task as Resolved.
Mainframe98 assigned this task to SBisson.
Mainframe98 edited subscribers, added: gerritbot, Aklapper; removed: SBisson.

Change 432574 abandoned by Sbisson:
Remove NotificationJob backward compatibility

https://gerrit.wikimedia.org/r/432574

CommunityTechBot renamed this task from Make EchoNotification job JSON-serializable to Make EchoNotification job JSON-serializable .Jul 3 2018, 3:28 AM