Page MenuHomePhabricator

Investigate drop in revision_create events / apparent high rate of null edits
Closed, ResolvedPublic

Description

We have recently switched the Event-Platform extension to emit revision_create events and dropped page_edit ones. Judging by the documentation of the associated hooks, the production rate should have stayed the same, or even slightly in favour of revision_create (since it covers cases which page_edit doesn't), but the opposite happened - we are witnessing a drop in the event creation rate: the extension was producing around 20 page_edit events per second prior to the switch, against the current rate of 5 revision_create events per second. We need to investigate whether we are correctly emitting all the needed events.

Event Timeline

I've dig in MW code a bit today. The PageContentSaveComplete hook is called from two places: in doCreate and in doEdit In both places if the revision was changed the call is preceded with $revision->insertOn, which in turn calls the RevisionInsertionComplete hook. There's nothing in these functions what could possible interrupt the execution earlier than hooks are called, so judging by the code, the only difference between RevisionInsertComplete and PageContentSaveComplete is that the former is not called if the page content was not changed.

However, I'm not 100% comfortable with stating that null edits are responsible for the difference in event rate. The difference is ~10/s, and it's surprising that there's so many null edits. On the other hand, we have ~25 serenaders with unchanged HTML in RESTBase, which kind of concerted with 10 null edits per second.

I've also played with vagrant today for a while and I didn't find any circumstance when the revision_create event is not emitted.

We looked into this a bit further last Friday. The documentation for PageContentSaveComplete mentions that the $revision parameter *can* be null if the content was not changed. There is an early return in the WikimediaEvents extension that is intended to trigger in that case. However, looking at the MediaWiki side, it looks like [the $revision parameter actually ends up being defined even if the content has not changed](https://github.com/wikimedia/mediawiki/blob/996b7350f3355f24abb55394ced108c0c9b8ef3a/includes/page/WikiPage.php#L1869).

So, it looks like a part of the discrepancy can be explained by one hook counting null-edits, while the other does not. The large percentage of null edits is quite surprising, so we should verify that those are indeed the source of this issue, and if so where they are coming from.

If this theory is substantiated, then we should go through PageContentSaveComplete subscribers to see if they actually intend to include null edits, and fix those that do not. This could potentially make our save metrics more accurate, by decoupling them from the percentage of null edits.

GWicke renamed this task from Investigate drop in revision_create events to Investigate drop in revision_create events / apparent high rate of null edits.Mar 7 2016, 9:00 PM
GWicke triaged this task as Medium priority.

For RESTBase update purposes, we should look into the reasons for those null edits. In the past, automated null edits have been used by the community to work around change propagation bugs in MediaWiki. We should establish whether

a) there are still concrete issues those null edits are working around, and
b) see if those would affect RESTBase as well.

If the answer to both a) and b) is 'yes', then we could consider representing those null edits as purge requests, in line with action=purge.

@Ottomata made a query today and got 100000 edit events generated by EventLogging from a PageContentSaveComplete hook. These events contained a revisionId, userAgent and timestamp. I've filtered out events for wikipedia sites, leaving me with 64000 events. Then, for each revision I've found it's timestamp from MW API. If the timestamp of a revision was significantly different from the timestamp in the event, it's a null edit. Filtering only null edits left me with 24453 events, which is ~40% and is consistent with what we see in production. So, the experiment confirmed that the drop in event creation rate is because of the null edits.

Then I've sorted null edit user agents by number of edits, here're top 6:

  • Peachy MediaWiki Bot API Version 2.0 (alpha 8) 19536
  • Mozilla/5.0 MaboMwFramework/1.3 (w:de:MerlBot) 816
  • DotNetWikiBot/3.15 (Unix 3.2.0.75; Mono 2.10.8.1; .NET CLR 2.0.50727.1433) 789
  • members (wikipedia:en; User:Reports_bot) Pywikibot/2.0b3 (-1 (unknown)) requests/2.9.1 Python/3.4.2.final.0 604
  • php wikibot classes 190
  • Java Bot (https://github/vlsergey/Secretary; vlsergey@gmail.com) BasedOnJWBF 180

Vast majority of edits is done with Peachy MediaWiki Bot API, which is just a library, so it doesn't tell us anything about the purpose of these edits

By looking at the titles of the null-edit articles I found the reason of such a high rate of them. Null-edits of these pages are ~90% of all the null edits and ~30% of all wikipedia edits:

3446 User:Cyberbot_I/Status
3445 User:Cyberbot_II/Status
1665 Template:Cratstats
575 صارف:Cyberbot_I/Run/Adminstats
575 User:Cyberbot_I/Run/Adminstats
575 Defnyddiwr:Cyberbot_I/Run/Adminstats
574 User:Cyberbot_I/Run/Datefixer
507 User:Cyberbot_I/adminrights-admins.js
204 User:Cyberpower678/Tally

Although we still don't know the purpose of other 10% of null edits, at list the mystery with it's rate is resolved now.
We can generate a resource_change event for null edits and blacklist rerendering for those cyberbot pages.

Change 279282 abandoned by Ppchelko:
Temprorarily sample logs for null edit.

Reason:
The data was gathered in a different way.

https://gerrit.wikimedia.org/r/279282

Change 279676 had a related patch set uploaded (by Ppchelko):
Emit resource_change events.

https://gerrit.wikimedia.org/r/279676

Change 279681 had a related patch set uploaded (by GWicke):
Exclude null edits from edit metrics

https://gerrit.wikimedia.org/r/279681

Wow, 30% of all Wikipedia edits are null edits by bots updating their own
pages? Amazing. Nice work!

The mystery was solved, bots making all the null edits blocked in RESTBase, null edits addressed in the Event-Platform extension.

Change 279676 merged by jenkins-bot:
Emit resource_change events.

https://gerrit.wikimedia.org/r/279676

Has anyone reached out to the operator of Cyberbot_I to get him or her to stop?

Edit: done in https://en.wikipedia.org/wiki/User_talk:Cyberpower678#You_are_responsible_for_30.25_of_all_edit_activity_on_Wikipedia

Has anyone reached out to the operator of Cyberbot_I to get him or her to stop?

Edit: done in https://en.wikipedia.org/wiki/User_talk:Cyberpower678#You_are_responsible_for_30.25_of_all_edit_activity_on_Wikipedia

Based on investigation done by @ori, I have boldly proposed a pull request for the Cyberbot_I scripts: https://github.com/cyberpower678/Cyberbot_I/pull/1

Has anyone reached out to the operator of Cyberbot_I to get him or her to stop?

Edit: done in https://en.wikipedia.org/wiki/User_talk:Cyberpower678#You_are_responsible_for_30.25_of_all_edit_activity_on_Wikipedia

Based on investigation done by @ori, I have boldly proposed a pull request for the Cyberbot_I scripts: https://github.com/cyberpower678/Cyberbot_I/pull/1

I put in an even better patch which should reduce null edits down to 1 per script. That means the only null edits are from script startup and should be no more after that.

I deployed this yesterday so it should be visible in the logs that the usage went down and the 30% should be gone.

As a reach out, has the stress caused by Cyberbot disappeared?

Change 279681 abandoned by Krinkle:
Exclude null edits from edit metrics

Reason:
Superseded by Ic74a9d5601b8cd1801f66b462099460f823ecde4

https://gerrit.wikimedia.org/r/279681