For technical and legal reasons, we should calculate the edit count bucket in frontend EventLogging. Otherwise, we can't preserve any information about the edit count beyond 90 days because the fine-grained information is often uniquely identifying. The bucketed count is safe to keep beyond the data retention window.
While updating these schemas, let's also change our bucketing to match the edit count intervals used by other extensions: {0, 1-4, 5-99, 100-999, 1000+}.
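To make the interval boundaries concrete, here is a minimal sketch of the bucketing in Python. It is illustrative only: the real front-end code should re-use the core bucketing code rather than reimplement it. The function name `edit_count_bucket` is hypothetical, and the labels other than "1-4 edits" and "1000+ edits" are assumed to follow the same pattern.

```python
from typing import Optional

# Lower bound of each bucket, highest first, paired with its label.
# Only "1-4 edits" and "1000+ edits" are quoted in this task; the
# other labels are assumed to follow the same pattern.
_BUCKETS = [
    (1000, "1000+ edits"),
    (100, "100-999 edits"),
    (5, "5-99 edits"),
    (1, "1-4 edits"),
    (0, "0 edits"),
]


def edit_count_bucket(edit_count: Optional[int]) -> Optional[str]:
    """Map an exact edit count to its coarse bucket label.

    Returns None when no edit count is available (anonymous users).
    """
    if edit_count is None:
        return None
    if edit_count < 0:
        raise ValueError("edit count cannot be negative")
    for lower_bound, label in _BUCKETS:
        if edit_count >= lower_bound:
            return label
    return None  # unreachable: 0 is the lowest bound
```

Only the returned label would be attached to the event; the exact count never leaves the client, which is what makes the value safe to retain.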
Be cautious about the migrations required:
* EventLogging schemas must stay backwards-compatible: an old event should still validate under the new schema (for silly reasons).
* Stale aggregations and Graphite metrics will conflict with the new enums, so they must be removed.
* Old events must not break aggregation, but should be skipped. This can be accomplished by checking the schema revision level.
* Each patch must be robust against a rollback of the others.
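The revision check mentioned above could look roughly like the following sketch. It assumes each event is a dict carrying an integer `revision` field; `MIN_SCHEMA_REVISION` and its value are placeholders, not the real revision ID of the new schema.

```python
from typing import Iterable, List

# Placeholder: replace with the revision ID of the first schema
# version that uses the new edit count buckets.
MIN_SCHEMA_REVISION = 2


def is_current_schema(event: dict) -> bool:
    """True if the event was logged under the new schema or a later one."""
    revision = event.get("revision")
    return isinstance(revision, int) and revision >= MIN_SCHEMA_REVISION


def skip_old_events(events: Iterable[dict]) -> List[dict]:
    """Temporary migration code: silently drop events with old revisions.

    Old events are skipped rather than treated as errors, so a rollback
    of the logging patches does not break aggregation.
    """
    return [event for event in events if is_current_schema(event)]
```

Comparing against a minimum revision, instead of matching one exact revision, is a design choice made here so that later schema updates keep passing the filter; whether revision IDs are ordered that way in practice would need to be confirmed.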
Table of components affected:
TBD.
Acceptance criteria:
[] Edit count bucket should always be sent along with front-end events. Try to re-use core bucketing code.
[] Front-end events use the MediaWiki-core bucket labels ("1-4 edits", "1000+ edits", etc.)
[] Some of the aggregations should be segmented by edit count bucket (TBD: document which)
[] Front-end should send a `null` bucketed edit count for anonymous users.
[] Aggregations should treat anonymous users as their own edit count bucket.
[] Aggregations skip events with old schemas (migration code should be marked as temporary). Old events may still be encountered, for example if logging patches are reverted.
[] Cached aggregations and Graphite metrics should be purged, and the reporting start date pushed forward to correspond to new schema deployment.
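
As a rough illustration of the two aggregation criteria about segmentation and anonymous users, the sketch below counts events per edit count bucket and maps a `null` bucket to its own segment. The field name `editCountBucket` and the segment label `anonymous` are hypothetical, and the input is assumed to contain only events logged under the new schema.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical segment label for events with a null edit count bucket.
ANONYMOUS_BUCKET = "anonymous"


def count_by_bucket(
    events: Iterable[dict], field: str = "editCountBucket"
) -> Dict[str, int]:
    """Count events per edit count bucket.

    A bucket of None (sent for anonymous users) is counted under
    ANONYMOUS_BUCKET. A missing field is treated the same way, so
    old-schema events must be filtered out before calling this.
    """
    counts: Counter = Counter()
    for event in events:
        bucket = event.get(field)
        counts[ANONYMOUS_BUCKET if bucket is None else bucket] += 1
    return dict(counts)
```

For example, two events with bucket "1-4 edits" and one with a `null` bucket would yield `{"1-4 edits": 2, "anonymous": 1}`.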