
Drop of editor numbers for earlier months
Closed, Resolved, Public

Description

I am preparing statistics, and a month ago I downloaded the CSV file from here:
https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|editor_type~user|monthly
and I downloaded it again today.

I expected to find only one extra line (Apr 2019) with all other values unchanged. Instead, I noticed a significant difference in the editor numbers for the previous months (the new numbers are 5-10% smaller).

[removed dead links]

Could you explain this? Did you recently change how you calculate the number of users?

Event Timeline

Samat created this task. Jun 23 2019, 9:19 PM
Restricted Application added a subscriber: Aklapper. Jun 23 2019, 9:19 PM
Samat renamed this task from Drops of editor numbers for earlier months to Drop of editor numbers for earlier months. Jun 23 2019, 9:45 PM

Numbers are recalculated every month. I know this is probably not what you would expect. My guess is that some users are now identified as "bots". Pinging @JAllemandou, who can add more detail.

Samat added a comment. Jun 24 2019, 7:28 AM

I know that the numbers are recalculated, and a small difference (a few editors) would not be a surprise. But the old and new curves run roughly parallel, with a 5-10% gap between them. I believe there is a methodological change behind this.

(Sorry, the files apparently expired yesterday evening and are no longer available. I will upload them again later today.)

Hi @Samat, thanks for reaching out.
It would help if you could upload the files again, and also confirm the URL you downloaded the data from, as my checks don't show differences that large.
I checked the number of editors (user type only) for huwiki over 4 years, looking for differences across our last 3 snapshots (we call the monthly recomputations "snapshots"). There is a very small deletion drift (differences caused by pages being deleted, since deleted pages are excluded from the statistics computation), but it is nowhere near a 5-10% change: it is more like -0.05% to -0.10%, and only for the 3 or 4 months before last month.

mforns triaged this task as Medium priority. Jun 24 2019, 3:57 PM
mforns moved this task from Incoming to Ops Week on the Analytics board.
Samat added a comment. Jun 24 2019, 8:25 PM

Thank you @Nuria and @JAllemandou. I tried to answer your questions below.

The source file (csv) downloaded on 7th May:


from the following link: https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|editor_type~user
(The link no longer works, because there is a new required parameter: monthly or daily.)

The source file (csv) downloaded on 23rd June:


from the following link: https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|editor_type~user|monthly

The two series in one ods file:


The figure below depicts the two graphs:

The minimum difference is 0%, the maximum difference is 20%, the mean difference is 4.6%, and the median of the differences is 4.5%.
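Summary figures like these can be computed from the two CSV downloads along the following lines. This is a minimal sketch, not the method actually used above: it assumes each file has already been reduced to a mapping from month (e.g. "2019-03") to editor count, and it measures each month's drop relative to the older (May) download. The function name `compare_snapshots` and the sample numbers are hypothetical and are not the real huwiki figures.

```python
from statistics import mean, median


def compare_snapshots(old: dict[str, int], new: dict[str, int]) -> dict[str, float]:
    """Summarize per-month percentage drops from an older to a newer snapshot.

    Only months present in both snapshots are compared, so a month that
    appears only in the newer file (like Apr 2019) is ignored. The drop is
    measured relative to the older value, an assumption of this sketch.
    """
    common = sorted(old.keys() & new.keys())
    diffs = [100.0 * (old[m] - new[m]) / old[m] for m in common if old[m] > 0]
    if not diffs:
        raise ValueError("no overlapping months with non-zero counts")
    return {
        "min": min(diffs),
        "max": max(diffs),
        "mean": mean(diffs),
        "median": median(diffs),
    }


# Hypothetical example values, NOT the real huwiki data.
old_snapshot = {"2019-01": 1000, "2019-02": 900, "2019-03": 800}
new_snapshot = {"2019-01": 950, "2019-02": 855, "2019-03": 800, "2019-04": 780}
print(compare_snapshots(old_snapshot, new_snapshot))
```

Measuring against the older snapshot matches the way the drop is described in this task ("the new numbers are 5-10% smaller"); measuring against the newer one would give slightly larger percentages.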

(I didn't know it was so easy to add files here.)

Samat updated the task description. Jun 24 2019, 8:26 PM
Samat updated the task description.

Thanks a lot @Samat for the details.
You were indeed right: the difference is due to a methodological change. I'm sorry I didn't notice it right away.
From the month 2019-05 onward, we changed the way editors are computed: edits on deleted pages are now removed.
We did this for consistency, as other metrics (edits and edited pages, for instance) were already computed with edits on deleted pages removed.
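To make the change concrete, here is a minimal sketch of counting monthly editors with and without edits on deleted pages. It is not the actual Wikistats 2 implementation: the `Edit` record, its fields, and `count_editors` are hypothetical. The sketch also assumes that an editor is any user with at least one counted edit in the month, and that bots have already been filtered out (as in the editor_type~user view).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record; field names are illustrative only."""
    user: str      # username of the (non-bot) editor
    page_id: int   # page the edit was made on
    month: str     # e.g. "2019-03"


def count_editors(edits, deleted_pages=frozenset(), exclude_deleted=True):
    """Return {month: number of distinct users with at least one counted edit}."""
    editors_by_month = defaultdict(set)
    for e in edits:
        # New behaviour: skip edits on pages that are deleted at snapshot time.
        if exclude_deleted and e.page_id in deleted_pages:
            continue
        editors_by_month[e.month].add(e.user)
    return {month: len(users) for month, users in sorted(editors_by_month.items())}


edits = [
    Edit("alice", 1, "2019-03"),
    Edit("bob", 2, "2019-03"),    # bob only edited page 2 ...
    Edit("carol", 1, "2019-04"),
]
deleted = {2}                     # ... which was later deleted

print(count_editors(edits, deleted, exclude_deleted=False))  # {'2019-03': 2, '2019-04': 1}
print(count_editors(edits, deleted))                         # {'2019-03': 1, '2019-04': 1}
```

In this sketch, a user whose only edits were on pages that were later deleted disappears from past months. This is consistent with the whole curve shifting down after the change, and with later deletions producing the small deletion drift mentioned above.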

I looked for an announcement of that change, as I thought we had made one, but couldn't find any, so maybe I only imagined it :(

I hope this change doesn't prevent you from running your analysis.

Nuria added a comment. Jun 25 2019, 3:06 PM

@JAllemandou, let's please document this here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats2/Metrics/FAQ and make sure that the metric page on Wikistats is accurate (I think it is): https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Editors. It would be good to record the date of the snapshot in which the definition changed.

Nuria assigned this task to JAllemandou. Jun 25 2019, 4:01 PM
Nuria added a project: Analytics-Kanban.
Nuria moved this task from Next Up to In Progress on the Analytics-Kanban board.
Samat awarded a token. Jun 25 2019, 5:27 PM
Nuria added a comment. Jun 26 2019, 3:26 PM

Ping about docs

Done! Sorry for the delay.

JAllemandou raised the priority of this task from Medium to High. Jun 27 2019, 7:41 AM
JAllemandou moved this task from In Progress to Done on the Analytics-Kanban board.
Nuria closed this task as Resolved. Jun 27 2019, 3:10 PM