Page MenuHomePhabricator

Dynamic bundle: non-bundle_base notifications need a read timestamp
Closed, ResolvedPublic

Description

Notifications that were part of a bundle that was read need a read timestamp to be set. Otherwise, they are all going to appear as unread.

Details

Related Gerrit Patches:
mediawiki/extensions/Echo : wmf/1.28.0-wmf.6Allow the primary link to set all bundled notifications as read
mediawiki/extensions/Echo : masterAllow the primary link to set all bundled notifications as read
mediawiki/extensions/Echo : masterbackfillReadBundles.php: Use sub-select to target rows to update
mediawiki/extensions/Echo : masterUpdate only required rows
mediawiki/core : masterAdd column alias support in BatchRowIterator
mediawiki/extensions/Echo : masterMark bundled notifications as read
mediawiki/extensions/Echo : masterMaintenance: Backfill read timestamp on bundled notifications

Event Timeline

SBisson created this task.May 26 2016, 10:34 PM
Restricted Application added subscribers: Zppix, Aklapper. · View Herald TranscriptMay 26 2016, 10:34 PM

Here's one way to do it:

  1. Change the current code to set the read timestamp on all elements of a bundle when marking as read (so we stop creating data that needs to be migrated)
  2. Set read_timestamp = bogus date on all bundle_base = 0 (because it's quick)
  3. (here the new dynamic bundling code can be merged and deployed)
  4. Loop through all read bundle bases, set their bundled notification's read timestamp to the base read timestamp (slow but it's the right data)

Change 292359 had a related patch set uploaded (by Sbisson):
Mark bundled notifications as read

https://gerrit.wikimedia.org/r/292359

Change 292365 had a related patch set uploaded (by Sbisson):
Maintenance: Backfill read timestamp on bundled notifications

https://gerrit.wikimedia.org/r/292365

  1. Change the current code to set the read timestamp on all elements of a bundle when marking as read (so we stop creating data that needs to be migrated)

https://gerrit.wikimedia.org/r/292359
This would need to be deployed to all wikis before moving on to the next step.

  1. Set read_timestamp = bogus date on all bundle_base = 0 (because it's quick)
  2. (here the new dynamic bundling code can be merged and deployed)

Not sure if this can be as quick as I was expecting. We need to only mark the notifications for which the bundle base is read. That's similar to the script in https://gerrit.wikimedia.org/r/292365. WE may be able to do steps 1 and 4 only.

  1. Loop through all read bundle bases, set their bundled notification's read timestamp to the base read timestamp (slow but it's the right data)

https://gerrit.wikimedia.org/r/292365

Change 292365 merged by jenkins-bot:
Maintenance: Backfill read timestamp on bundled notifications

https://gerrit.wikimedia.org/r/292365

Change 292359 merged by jenkins-bot:
Mark bundled notifications as read

https://gerrit.wikimedia.org/r/292359

As discussed on IRC, the maintenance script needs some changes to work more efficiently. As written, it issues many UPDATE queries that will change zero rows (because most base=1 rows do not have any base=0 rows associated with them), and each of those UPDATE queries involves a full table scan. What the script should do instead is query/compute a list of (user, event_id) pairs that need updates, because that combination uniquely identifies a notification row and UPDATE queries based on that will execute quickly.

Change 292550 had a related patch set uploaded (by Sbisson):
Add column alias support in BatchRowIterator

https://gerrit.wikimedia.org/r/292550

Change 292551 had a related patch set uploaded (by Sbisson):
Update only required rows

https://gerrit.wikimedia.org/r/292551

SBisson claimed this task.Jun 3 2016, 5:30 PM

Change 292550 merged by jenkins-bot:
Add column alias support in BatchRowIterator

https://gerrit.wikimedia.org/r/292550

Change 293100 had a related patch set uploaded (by Sbisson):
[WIP] Use usb-select to target rows to update

https://gerrit.wikimedia.org/r/293100

Change 292551 abandoned by Catrope:
Update only required rows

Reason:
Superseded by https://gerrit.wikimedia.org/r/#/c/293100/

https://gerrit.wikimedia.org/r/292551

Change 293100 merged by jenkins-bot:
backfillReadBundles.php: Use sub-select to target rows to update

https://gerrit.wikimedia.org/r/293100

SBisson reassigned this task from SBisson to Catrope.Jun 7 2016, 2:39 PM

Both patches have been merged.

Next step is to run the backfill script.

testwiki (14K notfiication rows) took <10s to update 2K rows. test2wiki (62K notification rows) took 14s to update 50K. mediawikiwiki (2.5M notification rows) took 2m50s to update 200K.

enwiki is not that much bigger than mediawikiwiki in this respect, it has about 14M rows of which about 65K need updating. So it sounds like it shouldn't take longer than half an hour to run this on enwiki, and no more than a few hours to run on all wikis.

The bad news is that the verification query (P3215#15008) still returns a nonzero number of rows on testwiki and test2wiki (I didn't check mediawikiwiki because it's big). But rerunning the script quickly prints "Updated 0 notifications".

The bad news is that the verification query (P3215#15008) still returns a nonzero number of rows on testwiki and test2wiki (I didn't check mediawikiwiki because it's big). But rerunning the script quickly prints "Updated 0 notifications".

This seems to be a false positive: inspecting the results does show what appear to be multiple different bundles (belonging to different users) with the same hash and display hash.

inspecting the results does show what appear to be multiple different bundles (belonging to different users) with the same hash and display hash.

I'm guessing this could happen when multiple users are notified of the same series of events.

I ran this on enwiki on Friday and it took about 8 hours. Going to start running it on the rest of group2 now.

Change 294120 had a related patch set uploaded (by Sbisson):
Allow the primary link to set all bundled notifications as read

https://gerrit.wikimedia.org/r/294120

Change 294120 merged by jenkins-bot:
Allow the primary link to set all bundled notifications as read

https://gerrit.wikimedia.org/r/294120

Change 294339 had a related patch set uploaded (by Catrope):
Allow the primary link to set all bundled notifications as read

https://gerrit.wikimedia.org/r/294339

Change 294339 merged by jenkins-bot:
Allow the primary link to set all bundled notifications as read

https://gerrit.wikimedia.org/r/294339

Mentioned in SAL [2016-06-14T23:15:22Z] <ori@tin> Synchronized php-1.28.0-wmf.6/extensions/Echo: If07369cb1: Allow the primary link to set all bundled notifications as read (T136368) (duration: 00m 34s)

Mentioned in SAL [2016-06-21T12:49:35Z] <RoanKattouw> Running extensions/Echo/maintenance/backfillReadBundles.php on all Echo-enabled wikis for T136368

Catrope closed this task as Resolved.EditedJun 21 2016, 1:09 PM

Mentioned in SAL [2016-06-21T12:49:35Z] <RoanKattouw> Running extensions/Echo/maintenance/backfillReadBundles.php on all Echo-enabled wikis for T136368

Done. Only took 16 minutes :) (to be fair, it was a rerun, we first ran it on June 13)