Drop of editor numbers for earlier months
Closed, ResolvedPublic

Assigned To
Authored By
Samat
Jun 23 2019, 9:19 PM
Referenced Files
F29633559: image.png
Jun 24 2019, 8:25 PM
F29633622: editors.csv
Jun 24 2019, 8:25 PM
F29633706: editors_apr.csv
Jun 24 2019, 8:25 PM
F29633924: diff.ods
Jun 24 2019, 8:25 PM

Description

I am preparing statistics, and I downloaded the csv file a month ago from here:
https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|editor_type~user|monthly
and I downloaded it again today.

I expected that there would be only one extra line (Apr 2019) and that the other values would be unchanged. Instead, I found a significant difference in the editor numbers for the previous months (the new numbers are 5-10% smaller).

[removed dead links]

Could you explain this? Did you recently change how you calculate the number of users?

Event Timeline

Samat renamed this task from Drops of editor numbers for earlier months to Drop of editor numbers for earlier months. Jun 23 2019, 9:45 PM

Numbers are recalculated every month. I know, probably not what you would expect. My guess is that some users are now identified as "bots". Pinging @JAllemandou, who can add more detail.

I know that the numbers are recalculated, and a small difference (a few editors) would not be a surprise. But the old and new curves are roughly parallel, with a 5-10% difference. I believe that there is a methodological change behind this.

(Sorry, the files are apparently expired since yesterday evening, and not available. I will upload them again later today.)

Hi @Samat, thanks for reaching out.
It would be interesting if you could upload the files again, and also possibly confirm the URL you downloaded data from, as my tests/checks don't show differences that big.
I have checked the number of user-only editors for huwiki over 4 years, looking for differences in our last 3 snapshots (we call monthly recomputations snapshots). While there is a very small deletion-drift (a difference due to pages being deleted, as they are excluded from statistics computation), it is really not a 5%/10% change, more like -0.05% to -0.10%, and only for the 3 or 4 months before the last month.

mforns triaged this task as Medium priority. Jun 24 2019, 3:57 PM
mforns moved this task from Incoming to Ops Week on the Analytics board.

Thank you @Nuria and @JAllemandou. I tried to answer your questions below.

The source file (csv) downloaded on 7th May:


from the following link: https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|editor_type~user
(The link does not work anymore, because the URL now requires an additional parameter, monthly or daily.)

The source file (csv) downloaded on 23rd June:


from the following link: https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|editor_type~user|monthly

The two series in one ods file:


The figure below depicts the two graphs:
image.png (915×1 px, 87 KB)

The minimum difference is 0%, the maximum difference is 20%, the mean difference is 4.6%, and the median of the differences is 4.5%.
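Summary figures like these can be reproduced from the two CSV series with a few lines of Python. The sketch below is only an illustration: it assumes each series has already been loaded into a dictionary mapping month to editor count, it assumes the difference is measured relative to the older value, and the function names and sample numbers are made up (they are not the huwiki data).

```python
from statistics import mean, median


def relative_drops(old, new):
    """Return {month: percent drop from old to new} for months present in both series."""
    drops = {}
    for month, old_count in old.items():
        # Skip months missing from the new series and avoid dividing by zero.
        if month in new and old_count > 0:
            drops[month] = 100.0 * (old_count - new[month]) / old_count
    return drops


def summarize(drops):
    """Return min, max, mean and median of the percent drops."""
    values = list(drops.values())
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


# Made-up illustrative numbers, NOT the real editor counts.
old_series = {"2019-01": 1000, "2019-02": 1200, "2019-03": 800}
new_series = {"2019-01": 950, "2019-02": 1200, "2019-03": 640, "2019-04": 900}

summary = summarize(relative_drops(old_series, new_series))
print(summary)
```

Months that appear in only one download (such as the newest month) are ignored, so the comparison covers only the overlapping period.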

(I didn't know it was so easy to add files here.)

Samat updated the task description.

Thanks a lot @Samat for the details.
Indeed, you were right: the difference is due to a methodological change. I'm sorry not to have noticed right away.
From the month 2019-05 onward, we changed the way editors are computed by removing the edits on deleted pages.
We did this to be more consistent, as other metrics (edits and edited-pages, for instance) were already computed with deleted-edits removal.
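To illustrate why removing edits on deleted pages lowers the editor counts for past months, here is a minimal Python sketch. It is not the actual Wikistats computation: the record layout (`month`, `user`, `page_deleted`) and the function name are hypothetical, chosen only to show the effect of the two definitions on the same edits.

```python
from collections import defaultdict


def editors_per_month(edits, exclude_deleted_pages):
    """Count distinct users per month, optionally ignoring edits on deleted pages."""
    users = defaultdict(set)
    for edit in edits:
        if exclude_deleted_pages and edit["page_deleted"]:
            continue  # this edit no longer counts under the new definition
        users[edit["month"]].add(edit["user"])
    return {month: len(names) for month, names in users.items()}


# Hypothetical edit records, for illustration only.
edits = [
    {"month": "2019-03", "user": "A", "page_deleted": False},
    {"month": "2019-03", "user": "B", "page_deleted": True},
    {"month": "2019-03", "user": "B", "page_deleted": True},
    {"month": "2019-03", "user": "C", "page_deleted": False},
    {"month": "2019-03", "user": "C", "page_deleted": True},
]

old_definition = editors_per_month(edits, exclude_deleted_pages=False)
new_definition = editors_per_month(edits, exclude_deleted_pages=True)
print(old_definition, new_definition)
```

In this toy data, user B edited only pages that were later deleted, so B is an editor under the old definition but not under the new one, while user C still counts because of an edit on a surviving page.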

I've been looking for an announcement of that change, as I thought we had made one, but couldn't find any, so maybe I imagined it :(

I hope this change doesn't prevent you from running your analysis.

@JAllemandou let's please document this here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats2/Metrics/FAQ and make sure that the metric page on Wikistats is accurate (I think it is): https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Editors. It would be good to record the date of the snapshot where the definition changed.

JAllemandou raised the priority of this task from Medium to High. Jun 27 2019, 7:41 AM
JAllemandou moved this task from In Progress to Done on the Analytics-Kanban board.