Research is working on core metrics guidelines for Communications, so this is a good opportunity to nail down all the details of this metric.
|Open||ezachte||T117221 [Epic] Update official Wikimedia press kit with accurate numbers|
|Open||leila||T144639 Propose metrics along with qualifiers for the press kit|
|Resolved||Neil_P._Quinn_WMF||T151507 Refine definition of active editors metric|
- I will document our consensus on Meta.
- I will change my calculations of active editors to match the refined definition
- @ezachte will look at the incidence of bot flags on multiple wikis, to help us decide how many flags an account should need to qualify as a bot globally.
- where feasible, @ezachte will change Wikistats to match the refined definition (it will not be feasible to account for edits to deleted pages, but counting edits to redirects is likely to have the largest impact here).
- @leila will inform Communications about the decision.
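For the bot-flag item above, here is a rough sketch of how the incidence of flags across wikis could be tabulated and turned into a global bot list. The input format (pairs of account name and wiki), the function names, and the `min_wikis` threshold are all assumptions for illustration; the actual threshold is still to be decided.

```python
from collections import Counter

def flag_incidence(flags):
    """Return {number_of_wikis_with_bot_flag: number_of_accounts}.

    `flags` is an iterable of (account, wiki) pairs, one per wiki where the
    account holds a bot flag (hypothetical input format).
    """
    per_account = Counter(account for account, _wiki in set(flags))
    return Counter(per_account.values())

def global_bots(flags, min_wikis=2):
    """Return accounts flagged as a bot on at least `min_wikis` wikis.

    `min_wikis` is a placeholder; no threshold has been agreed on yet.
    """
    per_account = Counter(account for account, _wiki in set(flags))
    return {account for account, n in per_account.items() if n >= min_wikis}

# Made-up example data, with one duplicate row to show deduplication.
flags = [
    ("ExampleBot", "enwiki"),
    ("ExampleBot", "dewiki"),
    ("ExampleBot", "enwiki"),
    ("LocalBot", "frwiki"),
]
incidence = flag_incidence(flags)
bots = global_bots(flags, min_wikis=2)
```

Looking at the incidence table first should make it easier to see how many accounts each candidate threshold would add or drop.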
Remaining action items:
- Decide how the metric should respond when a wiki changes its selection of content namespaces (I sent an email about this). I'll follow up.
- I'm not sure if @DarTar has given his thoughts on the definition.
- @ezachte will update Wikistats's active editor numbers to match the new definition (I've spun that off into T153702).
Other than that, I believe my work here is done.
Now that we've developed a consensus definition for active editors
Really? Was there a discussion somewhere on analytics or wiki-research-l or wikimedia-l or some other relevant discussion venue?
I've not had the opportunity to comment before, so I left some comments now: https://meta.wikimedia.org/wiki/Research_talk:Active_editor
I expect other people may have comments too, and it would be better to have a discussion now rather than later (e.g. when they are caught by surprise by Wikistats changes). So, again, please notify or open a discussion at least on the main mailing lists.
@Nemo_bis I responded to your comments on the talk page.
On the larger question of having a broader discussion, this project (as I said just now on the talk page) was only about agreeing on minor technical details that had previously only been decided implicitly by the implementers of these metrics (such as me and @Erik_Zachte). It was not about making any major changes.
There are major changes I would like to make, like including non-content namespaces in this calculation or even moving from an edit-count-based metric to a session-time-based one. However, those would be much more disruptive, so I would absolutely propose those for a broader discussion first on the relevant mailing lists (analytics-l, wiki-research, wikitech, and maybe wikimedia-l as well).
Anyway, if you think I'm totally wrong about the significance of these changes, you're welcome to start a mailing list discussion and see if significant numbers of other people agree with you. If that turns out to be the case, we will definitely adjust.
I also responded on the talk page, and actually reconsidered while doing so. It had seemed that adding extensive code to detect internationalized redirects wasn't worth the trouble just to filter a few edits. But then I realized we need that detection anyway to filter redirect pages from article counts (or else English Wikipedia would have 13 million 'articles'). Hmm
Regrettably, other urgent work got in the way this week, and I won't be able to give detailed feedback on this until I am back on January 10. Thanks for leading this effort so far, y'all.
That's true, but consider that there is no good historical source of data about redirects other than manually going over the text of old revisions via the dumps or the API. You can't do it in the (MariaDB) application databases, because they don't contain the text of old revisions (that's in External Storage). You also can't do it even for present data in the Data Lake, because it does not have info on redirects or internal links.
So eliminating the redirect requirement from active editors still dramatically increases the number of ways it can be calculated. It doesn't help us with calculating historical article counts, but that's a separate issue.
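To give a sense of what going over the text of old revisions would involve, here is a minimal sketch of a redirect check on raw wikitext. It is only an approximation of how MediaWiki recognizes redirects, not Wikistats's actual code: the `is_redirect` helper is hypothetical, and the alias tuple is a small illustrative sample, whereas a real implementation would need each wiki's full list of localized redirect keywords.

```python
import re

# Illustrative sample of redirect keywords (English, German, French).
# Assumption: a real run would load the complete per-wiki alias list.
REDIRECT_ALIASES = ("#REDIRECT", "#WEITERLEITUNG", "#REDIRECTION")

def is_redirect(wikitext, aliases=REDIRECT_ALIASES):
    """Return True if the revision text starts with a redirect keyword
    followed by a wikilink, ignoring case and leading whitespace."""
    keywords = "|".join(re.escape(alias) for alias in aliases)
    pattern = r"\s*(?:" + keywords + r")\s*:?\s*\[\["
    return re.match(pattern, wikitext, flags=re.IGNORECASE) is not None

# Made-up revision texts.
samples = [
    "#REDIRECT [[Target page]]",
    "#weiterleitung [[Zielseite]]",
    "'''Foo''' is described at #REDIRECT [[Bar]].",
    "'''Foo''' is an ordinary article.",
]
results = [is_redirect(text) for text in samples]
```

The check itself is cheap; the expensive part is fetching every historical revision's text from the dumps or the API in order to run it.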