
Support including edits to deleted pages in editing metrics
Open, Low, Public

Description

I recently learned that editing metrics in Wikistats actually exclude edits to deleted pages, for consistency with the old Wikistats.

For many years now, researchers have generally agreed that it is preferable to include these edits in editing metrics, to avoid the confusion that occurs when page deletions cause historical metrics to change long after the fact (this is called "deletion drift").

For this reason, all the metrics produced by Product Analytics (including the key product metrics and wiki comparison tool) include these edits. This adds another layer of confusion; for example, I spent a couple of hours investigating T293660 before realizing that the differing treatment of deleted edits had caused the discrepancy.

To help minimize this confusion, Wikistats should have the option to switch between with-deleted and without-deleted versions of editing metrics. I strongly recommend that with-deleted be the default, but if there's disagreement about that, at least providing the option would be very helpful.
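To make the difference concrete, here is a minimal sketch of how a with-deleted/without-deleted switch changes a monthly edit count. The records, the `page_is_deleted` flag, and the `monthly_edits` function are all hypothetical illustrations, not the actual Wikistats or AQS schema.

```python
from collections import Counter

# Hypothetical edit records: (month, page_title, page_is_deleted).
# The field layout and data are illustrative only, not the Wikistats schema.
EDITS = [
    ("2021-01", "Page A", False),
    ("2021-01", "Page B", True),   # Page B was deleted some time after these edits
    ("2021-01", "Page B", True),
    ("2021-02", "Page A", False),
]

def monthly_edits(edits, include_deleted=True):
    """Count edits per month, optionally dropping edits to deleted pages."""
    counts = Counter()
    for month, _title, page_is_deleted in edits:
        if include_deleted or not page_is_deleted:
            counts[month] += 1
    return dict(counts)

with_deleted = monthly_edits(EDITS, include_deleted=True)
without_deleted = monthly_edits(EDITS, include_deleted=False)
print(with_deleted)     # {'2021-01': 3, '2021-02': 1}
print(without_deleted)  # {'2021-01': 1, '2021-02': 1}
```

In the without-deleted version, the January figure drops from 3 to 1 once Page B is deleted, even though nothing about January itself changed. That is the deletion drift described above.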

Event Timeline

Thank you for bringing this up @nshahquinn-wmf :)
This topic was discussed when we built wikistats2, and it was agreed at the time that metrics with "deletion drift" were less precise than adding a "deleted" column as a way to filter/split by.
From the wikimedia-analytics community perspective, however, the ask was to replicate what wikistats was doing, and statistics with deletion drift were all they ever had (due to stats being compiled out of dumps).

My recollection of that period is that we decided not to add the "deleted" dimension, and instead to keep the metrics as close as possible to what the original wikistats provided, to reduce friction with data changes for the community.

Now, from the feasibility perspective of adding the dimension: this is quite some work!
It would require updates to every layer of the stack (serving backend, API, and front-end). And to speak about the side I know best, the backend update is tied to a larger problem that we have postponed for now, about transforming the way we precompute some data for AQS to serve edits data (see https://phabricator.wikimedia.org/T181703 for instance, but there is more to it).