Multiple "You made your <milestone> edit!" notifications
Closed, ResolvedPublic

Description

I'm editing Wikidata using the Wikdata-game (via Widar) and when I visit Wikidata page, I was met by multiple notification entries of "You just made your hundredth edit". Attached is my Special:Notifications on wikidata. I also tried editing manually on Wikidata itself and still met with new entry of "You just made your hundredth edit".

There are a very large number of changes, so older changes are hidden. Show Older Changes
Restricted Application added a project: Collaboration-Team-Triage. · View Herald TranscriptFeb 27 2016, 5:34 AM
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald Transcript

....wow.

Most likely, this had the same root cause as T123843: Making a null edit triggers a new message notification although null edit is not recorded in page history: when an article is saved, we look at your edit count, and if it's 99 (when we see your edit count, the new edit hasn't been added yet) we generate a hundredth-edit notification. If you make a series of null edits when your edit count is 99, you would probably get multiple hundredth-edit notifications.

T123843: Making a null edit triggers a new message notification although null edit is not recorded in page history is fixed in wmf15, which will be deployed to Wikidata on March 2nd, so let's wait until then and see if that fixes it.

Is this still happening, i.e. are new hundredth-edit notifications still being generated? The database tells me that User:Kenrick95 on wikidatawiki has an editcount of 121, so in theory we shouldn't be creating new notifications any more.

Kenrick95 added a comment.EditedFeb 27 2016, 7:29 AM

Yup, I tried editing another item, and the notification is still being triggered.

Luke081515 triaged this task as Unbreak Now! priority.Feb 27 2016, 1:43 PM

This is very annoying for users, so triaging this as UBN.

Luke081515 renamed this task from Editing Wikidata triggered multiple notification of hundredth edit to Multible "You made your <milestone> edit!" notifications.
Luke081515 added a subscriber: Elitre.
Elitre renamed this task from Multible "You made your <milestone> edit!" notifications to Multiple "You made your <milestone> edit!" notifications.Feb 27 2016, 7:13 PM

I'm baffled. Both the database and eval.php agree that User:Kenrick95's edit count is 238. Yet they have lots of thank-you-edit notifications whose timestamps correspond to their most recent edits on Feb 27. It doesn't seem to have stopped, either, because the timestamp of the most recent thank-you-edit notification is the same as the timestamp of the most recent edit.

No other user seems to be affected this badly. I looked at which users on wikidatawiki have had more than 3 thank-you-edit notifications in their lifetimes and found:

  • Kenrick95 with 134 notifications and edit count 238
  • Someone with 25 notifications and an edit count just over 100. This person got 10th-edit notification for most (but not all!) of their actions for about two hours, and then it stopped. An hour later they got their 100th-edit notification (only once).
  • Someone with 15 notifications and an edit count between 10 and 100. They got a whole bunch of 1st-edit notifications, and they got their 10th-edit notification twice, but in between those two 10th-edit notifications was a 1st-edit notification!
  • Two people whose edit count is below 10 who got 9 and 6 notifs respectively. These were all 1st-edit notifications
  • Two people with 4 notifications whose edit counts are between 10 and 100. One of them got four 1st-edit notifications, the other got four 10th-edit notifications.

Does Wikidata do weird things around computation of edit counts? I read the code over a few times and it seems to me that the only way this could happen is if User::getEditCount() 1) keeps returning the same value even after the user makes a bunch of edits and 2) behaves in a spectacularly non-monotonic manner in some cases. Also, are there perhaps cases where multiple page saves (ArticleSaveComplete hook invocations) happen in the same request?

Change 273836 had a related patch set uploaded (by Catrope):
Temporarily disable thank-you-edit notifications

https://gerrit.wikimedia.org/r/273836

I've also found users on enwiki who got many 1st-edit notifications (though not as many as Kendrick95, the highest numbers I saw were between 15 and 20 notifications), so this is not a Wikidata-specific issue. One of them was using The Wikipedia Adventure, so I thought it might be related to automated edits, but then I also found a user who didn't make any automated edits and still got a whole bunch of repeat 1st-edit notifications for most but not all of their edits.

This is a very strange issue and I have no idea what's going on. At the first opportunity (Monday Feb 29 at 16:00 UTC) we'll deploy a config change that disables these Nth-edit notifications so we stop spamming people while we investigate what's going on. I'm also going to go in and remove duplicate thank-you-edit notifications from the database so people like @Kenrick95 won't have a useless notifications list any more, but that's delicate work that I don't want to do on a Sunday with nobody around, so I'll do that tomorrow a few hours after the config change.

[14:11:39] <legoktm> RoanKattouw: there's been a bunch of changes to the caching model of User objects in the past 3-4 days, so I wouldn't be surprised if User::getEditCount() is broken in some way

[14:11:39] <legoktm> RoanKattouw: there's been a bunch of changes to the caching model of User objects in the past 3-4 days, so I wouldn't be surprised if User::getEditCount() is broken in some way

Oh that makes a ton of sense actually.

Change 273849 had a related patch set uploaded (by Legoktm):
Add some debug logging to thank-you-edit notifications

https://gerrit.wikimedia.org/r/273849

@Catrope You can check my account at dewikivoyage to, this affected my too: T128271. I get double notifications at vagrant too.

Change 273836 merged by jenkins-bot:
Temporarily disable thank-you-edit notifications

https://gerrit.wikimedia.org/r/273836

Change 273849 merged by jenkins-bot:
Add some debug logging to thank-you-edit notifications

https://gerrit.wikimedia.org/r/273849

Change 273981 had a related patch set uploaded (by Catrope):
Add some debug logging to thank-you-edit notifications

https://gerrit.wikimedia.org/r/273981

Change 273981 merged by jenkins-bot:
Add some debug logging to thank-you-edit notifications

https://gerrit.wikimedia.org/r/273981

I have cleaned up all duplicate thank-you-edit notifications on Wikidata, so @Kenrick95 should now have a usable notifications feed again :) . Going to clean this up on the other wikis next.

Thank you very much 😃

Catrope added a comment.EditedMar 1 2016, 1:29 AM

Cleanup finished for the other wikis as well. I messed around with temporary tables, and that approach turned out to be very slow and lock the Echo tables, resulting in lock wait timeouts and dropped events. I ended up just running the following query against all wikis, which only took half a second on enwiki (with the temporary table approach, building the tmp table took >10 mins, even though it would only contain ~960 rows and the SELECT to get those rows only took 300ms). This query is generally applicable for deleting duplicate rows that have the same type and event_extra blob.

DELETE FROM echo_event, echo_notification
USING echo_event
JOIN echo_notification ON notification_user=event_agent_id AND notification_event=event_id
LEFT JOIN (
	SELECT MIN(event_id) AS m
	FROM echo_event
	WHERE event_type='thank-you-edit'
	GROUP BY event_agent_id, event_extra
) AS t ON t.m=event_id
WHERE m IS NULL
AND event_type='thank-you-edit';
Legoktm lowered the priority of this task from Unbreak Now! to High.Mar 3 2016, 3:21 AM

Now that all wikis are on wmf15, I looked at the post-deploy logs and saw that this is still happening, but only in one special circumstance. All occurrences involved 2 or 3 repeats, at the same timestamp (i.e. in the same second) on different machines. For example:

2016-03-03 21:26:06 mw1116 commonswiki 1.27.0-wmf.15 Echo DEBUG: Thanking Norbertobobbio (id: 1677951) for their 1 edit {"user":"Norbertobobbio","id":1677951,"count":1} 
2016-03-03 21:26:06 mw1223 commonswiki 1.27.0-wmf.15 Echo DEBUG: Thanking Norbertobobbio (id: 1677951) for their 1 edit {"user":"Norbertobobbio","id":1677951,"count":1} 
2016-03-03 21:26:06 mw1203 commonswiki 1.27.0-wmf.15 Echo DEBUG: Thanking Norbertobobbio (id: 1677951) for their 1 edit {"user":"Norbertobobbio","id":1677951,"count":1}

This makes me think these are innocent race conditions, as opposed to weird scary bugs related to User object caching. Additionally, all of these were on commonswiki (except one on idwiki), which makes me suspect something specific is going on. Spot checks of the affected users on Commons show that their first contributions were uploads of multiple files using UploadWizard. The one case on idwiki involved Global Replace because of a file rename on Commons.

It seems straightforward that if you create a new account and then send two parallel API requests to make an edit, both of those requests will see an edit count of 0 and trigger a thank-you-edit notification. Right now I have no brilliant ideas about how we could deduplicate these.

It seems straightforward that if you create a new account and then send two parallel API requests to make an edit, both of those requests will see an edit count of 0 and trigger a thank-you-edit notification. Right now I have no brilliant ideas about how we could deduplicate these.

User::incEditCount() only commits once the transaction is finished, so if we *think* the user is ready for a notification at post-save time (getEditCount() + 1), we queue a deferred update which I think always run after the main transaction commits, and in the update do User::newFromId(...)->getEditCount() and check that it is equal to the value we expected, and then trigger the notification. Downside is that there might be one request in between the user seeing the notification after their save, it won't be instant anymore.

Catrope added a comment.EditedMar 4 2016, 2:36 AM

(off-topic) While writing the above, I visited Special:Contributions on idwiki to look at the one duplicate notification there (the one that turned out to be a pair of Global Rename edits), so my account got autocreated there. I then got an email about how a welcome template had been posted to my user talk page by a bot. That welcome bot is operated by ... @Kenrick95 :D

It seems straightforward that if you create a new account and then send two parallel API requests to make an edit, both of those requests will see an edit count of 0 and trigger a thank-you-edit notification. Right now I have no brilliant ideas about how we could deduplicate these.

User::incEditCount() only commits once the transaction is finished, so if we *think* the user is ready for a notification at post-save time (getEditCount() + 1), we queue a deferred update which I think always run after the main transaction commits, and in the update do User::newFromId(...)->getEditCount() and check that it is equal to the value we expected, and then trigger the notification. Downside is that there might be one request in between the user seeing the notification after their save, it won't be instant anymore.

We already defer sending the notification, see Hooks.php line 472. We could add if ( $user->getEditCount() === $editCount ) there, but aside from the question of whether that even works (like, is the return value of getEditCount() cached in an in-process cache?), I think it would still be vulnerable to a race condition where both requests' main transactions commit before the first deferred update runs, both see an edit count of 2 and both decide not to notify.

Change 274904 had a related patch set uploaded (by Legoktm):
Try and avoid race conditions with thank-you-edit notifications

https://gerrit.wikimedia.org/r/274904

We already defer sending the notification, see Hooks.php line 472. We could add if ( $user->getEditCount() === $editCount ) there, but aside from the question of whether that even works (like, is the return value of getEditCount() cached in an in-process cache?),

It is in process cached AFAIS, but creating a new User object should work around that.

I think it would still be vulnerable to a race condition where both requests' main transactions commit before the first deferred update runs, both see an edit count of 2 and both decide not to notify.

Right...but having false negatives is better than false positives in this case right?

I think it would still be vulnerable to a race condition where both requests' main transactions commit before the first deferred update runs, both see an edit count of 2 and both decide not to notify.

Right...but having false negatives is better than false positives in this case right?

Yeah, I suppose it's not worth the effort trying to eliminate false negatives...

Change 274904 merged by jenkins-bot:
Try and avoid race conditions with thank-you-edit notifications

https://gerrit.wikimedia.org/r/274904

Change 274908 had a related patch set uploaded (by Catrope):
Try and avoid race conditions with thank-you-edit notifications

https://gerrit.wikimedia.org/r/274908

Change 274908 merged by jenkins-bot:
Try and avoid race conditions with thank-you-edit notifications

https://gerrit.wikimedia.org/r/274908

Change 274908 merged by jenkins-bot:
Try and avoid race conditions with thank-you-edit notifications

https://gerrit.wikimedia.org/r/274908

This was deployed about 20 hours ago. I looked at the logs and it looks like this change prevents all thank-you-edit notifications completely :/ . I'm going to add logging for cases where the race condition detection trips, because that appears to be every case.

Change 275898 had a related patch set uploaded (by Catrope):
Add debug logging for when the thank-you-edit code detects a race condition

https://gerrit.wikimedia.org/r/275898

Change 275898 merged by jenkins-bot:
Add debug logging for when the thank-you-edit code detects a race condition

https://gerrit.wikimedia.org/r/275898

Change 275909 had a related patch set uploaded (by Catrope):
Add debug logging for when the thank-you-edit code detects a race condition

https://gerrit.wikimedia.org/r/275909

Change 275910 had a related patch set uploaded (by Catrope):
Add debug logging for when the thank-you-edit code detects a race condition

https://gerrit.wikimedia.org/r/275910

Change 275909 merged by jenkins-bot:
Add debug logging for when the thank-you-edit code detects a race condition

https://gerrit.wikimedia.org/r/275909

Change 275910 merged by jenkins-bot:
Add debug logging for when the thank-you-edit code detects a race condition

https://gerrit.wikimedia.org/r/275910

(Update from March 8th) The plot thickens. Echo.log is full of lines like:

2016-03-10 20:18:51 mw1181 enwiki 1.27.0-wmf.15 Echo DEBUG: thank-you-edit race condition detected: REDACTED (id: NNNNNNN) should have had 10 edits but has 10 {"user":"REDACTED","id":NNNNNNN,"expectedCount":10,"actualCount":10}

This seems impossible: the logs are claiming that 10 != 10. The only thing I could think of is that the current code calls getEditCount() twice and it might return different values; to exclude that, Kunal wrote https://gerrit.wikimedia.org/r/#/c/276080/ which I'm SWATting out later today.

After deploying this change, all thank-you-edit events still get flagged as race conditions, but now the errors make sense. Everything is now shown as "should have had 1 but has 0" or "should have had 10 but has 9".

I'm really confused as to why calling getEditCount() twice mattered, given that its implementation uses an in-process cache. I can see how the edit count could be laggy though, given that it's fetched from a slave DB.

Maybe shadowing is broken in PHP and the fact that we use $user both inside and outside the closure has something to do with it? It's confusing in any case, so I think we should rename that variable.

Alternatively, is there a way to schedule a deferred update for "when my writes are readable from a slave" rather than "after my writes are done"?


00:43 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.16/extensions/Echo/Hooks.php: Try fixing the thank-you-edit bug again (duration: 00m 26s)

2016-03-11 00:42:29 mw1129 frwiki 1.27.0-wmf.16 Echo DEBUG: thank-you-edit race condition detected: REDACTED (id: NNNNNNN should have had 10 edits but has 10 {"user":
"REDACTED","id":NNNNNNN,"expectedCount":10,"actualCount":10} 
2016-03-11 00:45:11 mw1222 enwiki 1.27.0-wmf.16 Echo DEBUG: thank-you-edit race condition detected: REDACTED (id: NNNNNNN) should have had 1 edits but has 0 {"user":
"REDACTED","id":NNNNNNN,"expectedCount":1,"actualCount":0} 
2016-03-11 00:45:17 mw1213 eswiki 1.27.0-wmf.16 Echo DEBUG: thank-you-edit race condition detected: REDACTED (id: NNNNNNN) should have had 10 edits but has 9 {"user":
"REDACTED","id":NNNNNNN,"expectedCount":10,"actualCount":9}

Change 302722 had a related patch set uploaded (by Sbisson):
Keep $user->mEditCount up to date

https://gerrit.wikimedia.org/r/302722

Change 302759 had a related patch set uploaded (by Sbisson):
Prevent duplicate thank-you-edit notifications

https://gerrit.wikimedia.org/r/302759

Change 302722 merged by jenkins-bot:
Keep $user->mEditCount up to date

https://gerrit.wikimedia.org/r/302722

Change 302759 merged by jenkins-bot:
Prevent duplicate thank-you-edit notifications

https://gerrit.wikimedia.org/r/302759

Change 303957 had a related patch set uploaded (by Sbisson):
Enabling thank-you-edit on beta for testing.

https://gerrit.wikimedia.org/r/303957

Change 303957 merged by jenkins-bot:
Enabling thank-you-edit on beta for testing.

https://gerrit.wikimedia.org/r/303957

Checked in betalabs - works as expected quite limited set of 'thank-you-edit' notifications).

  • no weird cases when user_editcount ># of notifications
  • first, 10th, and 100th 'thank-you-edit' duly come in proper order and displayed in UI.
  • Echo.log shows success for all 'thank-you-edit' notifications.
2016-08-10 20:43:57 [V6uSCwpEEaoAAG49i68AAAAF] deployment-mediawiki01 enwiki 1.28.0-alpha Echo DEBUG: Thanking ET95 (id: 11836) for their 1 edit {"user":"ET95","id":11836,"count":1} 
2016-08-10 21:20:22 [V6ualApEEaoAABGRbhwAAAAL] deployment-mediawiki01 enwiki 1.28.0-alpha Echo DEBUG: Thanking ET50 (id: 7365) for their 10 edit {"user":"ET50","id":7365,"count":10} 
2016-08-10 21:43:10 [V6uf7ApEEaoAACNF30QAAAAJ] deployment-mediawiki01 enwiki 1.28.0-alpha Echo DEBUG: Thanking ET10 (id: 6503) for their 100 edit {"user":"ET10","id":6503,"count":100}

In production, all wmf.13 entries in Echo.log are race condition warnings, and all wmf.14 entries are successes. So I think we can turn this feature back on once wmf.14 finishes rolling out :)

Change 304267 had a related patch set uploaded (by Catrope):
Re-enable thank-you-edit notifications

https://gerrit.wikimedia.org/r/304267

Change 304267 merged by jenkins-bot:
Re-enable thank-you-edit notifications

https://gerrit.wikimedia.org/r/304267

Mentioned in SAL [2016-08-11T23:24:33Z] <dereckson@tin> Synchronized wmf-config/CommonSettings.php: Re-enable thank-you-edit notifications (T128249) (duration: 00m 58s)

jmatazzoni closed this task as Resolved.Aug 12 2016, 12:34 AM