We have recently switched the Event-Platform extension to emit revision_create events and dropped page_edit ones. Judging by the documentation of the associated hooks, the production rate should have stayed the same, or even slightly in favour of revision_create (since it covers cases which page_edit doesn't), but the opposite happened - we are witnessing a drop in the event creation rate: the extension was producing around 20 page_edit events per second prior to the switch, against the current rate of 5 revision_create events per second. We need to investigate whether we are correctly emitting all the needed events.
I've dig in MW code a bit today. The PageContentSaveComplete hook is called from two places: in doCreate and in doEdit In both places if the revision was changed the call is preceded with $revision->insertOn, which in turn calls the RevisionInsertionComplete hook. There's nothing in these functions what could possible interrupt the execution earlier than hooks are called, so judging by the code, the only difference between RevisionInsertComplete and PageContentSaveComplete is that the former is not called if the page content was not changed.
However, I'm not 100% comfortable with stating that null edits are responsible for the difference in event rate. The difference is ~10/s, and it's surprising that there's so many null edits. On the other hand, we have ~25 serenaders with unchanged HTML in RESTBase, which kind of concerted with 10 null edits per second.
I've also played with vagrant today for a while and I didn't find any circumstance when the revision_create event is not emitted.
We looked into this a bit further last Friday. The documentation for PageContentSaveComplete mentions that the $revision parameter *can* be null if the content was not changed. There is an early return in the WikimediaEvents extension that is intended to trigger in that case. However, looking at the MediaWiki side, it looks like [the $revision parameter actually ends up being defined even if the content has not changed](https://github.com/wikimedia/mediawiki/blob/996b7350f3355f24abb55394ced108c0c9b8ef3a/includes/page/WikiPage.php#L1869).
So, it looks like a part of the discrepancy can be explained by one hook counting null-edits, while the other does not. The large percentage of null edits is quite surprising, so we should verify that those are indeed the source of this issue, and if so where they are coming from.
If this theory is substantiated, then we should go through PageContentSaveComplete subscribers to see if they actually intend to include null edits, and fix those that do not. This could potentially make our save metrics more accurate, by decoupling them from the percentage of null edits.
For RESTBase update purposes, we should look into the reasons for those null edits. In the past, automated null edits have been used by the community to work around change propagation bugs in MediaWiki. We should establish whether
a) there are still concrete issues those null edits are working around, and
b) see if those would affect RESTBase as well.
If the answer to both a) and b) is 'yes', then we could consider representing those null edits as purge requests, in line with action=purge.
@Ottomata made a query today and got 100000 edit events generated by EventLogging from a PageContentSaveComplete hook. These events contained a revisionId, userAgent and timestamp. I've filtered out events for wikipedia sites, leaving me with 64000 events. Then, for each revision I've found it's timestamp from MW API. If the timestamp of a revision was significantly different from the timestamp in the event, it's a null edit. Filtering only null edits left me with 24453 events, which is ~40% and is consistent with what we see in production. So, the experiment confirmed that the drop in event creation rate is because of the null edits.
Then I've sorted null edit user agents by number of edits, here're top 6:
- Peachy MediaWiki Bot API Version 2.0 (alpha 8) 19536
- Mozilla/5.0 MaboMwFramework/1.3 (w:de:MerlBot) 816
- DotNetWikiBot/3.15 (Unix 18.104.22.168; Mono 22.214.171.124; .NET CLR 2.0.50727.1433) 789
- members (wikipedia:en; User:Reports_bot) Pywikibot/2.0b3 (-1 (unknown)) requests/2.9.1 Python/3.4.2.final.0 604
- php wikibot classes 190
- Java Bot (https://github/vlsergey/Secretary; firstname.lastname@example.org) BasedOnJWBF 180
Vast majority of edits is done with Peachy MediaWiki Bot API, which is just a library, so it doesn't tell us anything about the purpose of these edits
By looking at the titles of the null-edit articles I found the reason of such a high rate of them. Null-edits of these pages are ~90% of all the null edits and ~30% of all wikipedia edits:
Although we still don't know the purpose of other 10% of null edits, at list the mystery with it's rate is resolved now.
We can generate a resource_change event for null edits and blacklist rerendering for those cyberbot pages.
Has anyone reached out to the operator of Cyberbot_I to get him or her to stop?
I put in an even better patch which should reduce null edits down to 1 per script. That means the only null edits are from script startup and should be no more after that.
I deployed this yesterday so it should be visible in the logs that the usage went down and the 30% should be gone.