
Monthly editors in wiki comparison tool do not match Wikistats
Closed, Resolved (Public)

Description

@KCVelaga_WMF noticed that the "monthly editors" metric in the wiki comparison tool is considerably higher than the equivalent number in Wikistats, in some cases by as much as 20%.

We should figure out what's going on here.

| period | wiki | Wikistats | Wiki comparison |
| --- | --- | --- | --- |
| Jan–Dec 2020 | English Wikipedia | 131,860 | 139,341 (+5.7%) |
| Jan–Dec 2020 | Spanish Wikipedia | 17,157 | 19,042 (+11.0%) |
| Jan–Dec 2020 | Wikimedia Commons | 34,702 | 41,917 (+20.8%) |
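
For reference, the percentages in the table above are the relative difference of the wiki comparison figure from the Wikistats figure. A minimal sketch of that arithmetic follows; the function name `percent_difference` is just an illustrative choice, and the numbers are copied from the table.

```python
# Figures copied from the table above: (wiki, Wikistats, wiki comparison tool)
rows = [
    ("English Wikipedia", 131_860, 139_341),
    ("Spanish Wikipedia", 17_157, 19_042),
    ("Wikimedia Commons", 34_702, 41_917),
]


def percent_difference(baseline, other):
    """Relative difference of `other` from `baseline`, in percent."""
    return (other - baseline) / baseline * 100


for wiki, wikistats, comparison in rows:
    # Wikistats is treated as the baseline, matching the table.
    print(f"{wiki}: {percent_difference(wikistats, comparison):+.1f}%")
```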

To investigate:

Event Timeline

This is an important question, but we're juggling some other high priority requests this week. We'll review again and triage next week.

It turns out this discrepancy exists because Wikistats excludes edits to deleted pages from its calculation for consistency with the old Wikistats, as @kzimmerman noted. I hadn't realized that this was the case; including deleted pages is the preferred way to construct metrics these days, to prevent numbers from shifting long after the fact.
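
To illustrate the effect, here is a toy sketch of how the same edits yield different editor counts depending on whether edits to deleted pages are included. The `Edit` record and its field names are hypothetical, not the actual schema, and the real metric definitions may apply further filters that are ignored here.

```python
from collections import namedtuple

# Hypothetical edit record; field names are illustrative, not a real schema.
Edit = namedtuple("Edit", ["user", "month", "page_is_deleted"])

edits = [
    Edit("Alice", "2020-01", False),
    Edit("Bob", "2020-01", True),    # Bob only edited a page that was later deleted
    Edit("Alice", "2020-01", True),
    Edit("Carol", "2020-02", False),
]


def monthly_editors(edits, include_deleted):
    """Count distinct users per month, optionally skipping edits to deleted pages."""
    editors = {}
    for e in edits:
        if e.page_is_deleted and not include_deleted:
            continue
        editors.setdefault(e.month, set()).add(e.user)
    return {month: len(users) for month, users in editors.items()}


print(monthly_editors(edits, include_deleted=True))   # {'2020-01': 2, '2020-02': 1}
print(monthly_editors(edits, include_deleted=False))  # {'2020-01': 1, '2020-02': 1}
```

In this toy data, Bob disappears from the January count once deleted pages are excluded. Because a page can be deleted at any later date, the without-deleted count for a past month can keep changing, which is the shifting described above.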

My investigation is in this notebook.

The main thing that would prevent confusion like this is having the option to see both with-deleted and without-deleted versions of editing metrics in Wikistats. Just having the option will increase awareness of the importance of that parameter and give users the capability to quickly determine when it's the source of a discrepancy like this. I've filed T295212 for that.

Reopening while I make some documentation updates to the wiki comparison tool and possibly our data dictionary as well.

I've added notes about this to the "metric definitions" tab of the Wiki Comparison Tool and to mw:Wikimedia Product/Data glossary.