
Refine definition of active editors metric
Closed, Resolved · Public

Description

Research are working on core metrics guidelines for Communications, so this is a good opportunity to nail down all the details of this metric.

Details at meta:Research:Defining monthly active editors, 2016.

Event Timeline

leila raised the priority of this task from Medium to Needs Triage. Nov 24 2016, 12:18 AM
leila updated the task description.

@leila, @ezachte, and I just met on this, and we agreed on how to resolve most of the inconsistencies.

Action items:

  • I will document our consensus on Meta.
  • I will change my calculations of active editors to match the refined definition.
  • @ezachte will look at the incidence of bot flags on multiple wikis, to help us decide how many flags an account should need to qualify as a bot globally.
  • Where feasible, @ezachte will change Wikistats to match the refined definition (it will not be feasible to account for edits to deleted pages, but counting edits to redirects is likely to have the largest impact here).
  • @leila will inform Communications about the decision.
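
As a rough illustration of the bot-flag item above, the incidence could be tabulated along these lines. This is only a sketch: the `(user, wiki)` input format, the function names, and the `min_flags` threshold are all assumptions for illustration, and the actual threshold is still to be decided.

```python
from collections import Counter


def flag_counts(bot_flags):
    """Map each account to the number of distinct wikis where it has a bot flag.

    `bot_flags` is a hypothetical iterable of (user, wiki) pairs.
    """
    wikis_per_user = {}
    for user, wiki in bot_flags:
        wikis_per_user.setdefault(user, set()).add(wiki)
    return {user: len(wikis) for user, wikis in wikis_per_user.items()}


def incidence(bot_flags):
    """Histogram: number of flagged wikis -> number of accounts with that many."""
    return Counter(flag_counts(bot_flags).values())


def global_bots(bot_flags, min_flags):
    """Accounts flagged on at least `min_flags` wikis (threshold is an assumption)."""
    return {user for user, n in flag_counts(bot_flags).items() if n >= min_flags}


# Made-up sample data, not real accounts.
sample = [
    ("ExampleBot", "enwiki"),
    ("ExampleBot", "dewiki"),
    ("ExampleBot", "enwiki"),  # duplicate rows are counted once
    ("OtherBot", "frwiki"),
]
```

Looking at the histogram from `incidence` for a few candidate values of `min_flags` would show how many accounts each threshold would reclassify.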

@Neil_P._Quinn_WMF thanks for documenting it and your help. Just a note that we agreed to run the final proposal re the updated metric by Dario before sending it out. I'll take care of that as well.

nshahquinn-wmf added a subscriber: DarTar.

I've documented the new definition at meta:Research:Active editor and updated our metric calculations to match (results available at mw:Wikimedia Product#Editing).

Remaining action items:

  • Decide how the metric should respond when a wiki changes its selection of content namespaces (I sent an email about this). I'll follow up.
  • I'm not sure if @DarTar has given his thoughts on the definition.
  • @ezachte will update Wikistats's active editor numbers to match the new definition (I've spun that off into T153702).

Other than that, I believe my work here is done.

@Neil_P._Quinn_WMF I haven't yet, thanks for flagging this. I'll set aside time for reviewing it this week.

Now that we've developed a consensus definition for active editors

Really? Was there a discussion somewhere on analytics or wiki-research-l or wikimedia-l or some other relevant discussion venue?

I've not had the opportunity to comment before, so I left some comments now: https://meta.wikimedia.org/wiki/Research_talk:Active_editor

I expect other people may have comments too, and it would be better to have a discussion now rather than later (e.g. when they are caught by surprise due to WikiStats changes). So, again, please notify/open a discussion at least on the main mailing lists.

@Nemo_bis I responded to your comments on the talk page.

On the larger question of having a broader discussion, this project (as I said just now on the talk page) was only about agreeing on minor technical details that had previously only been decided implicitly by the implementers of these metrics (such as me and @Erik_Zachte). It was not about making any major changes.

There are major changes I would like to make, like including non-content namespaces in this calculation or even moving from an edit-count-based metric to a session-time-based one. However, those would be much more disruptive, so I would absolutely propose those for a broader discussion first on the relevant mailing lists (analytics-l, wiki-research, wikitech, and maybe wikimedia-l as well).

Anyway, if you think I'm totally wrong about the significance of these changes, you're welcome to start a mailing list discussion and see if significant numbers of other people agree with you. If that turns out to be the case, we will definitely adjust.

I also responded on the talk page. And actually reconsidered while doing so. It seemed adding extensive code to detect internationalized redirects wasn't worth the trouble for filtering a few edits. But then I realized we need that detection anyway to filter redirect pages from article counts (or else English Wikipedia would have 13 million 'articles'). Hmm

Regrettably, other urgent work got in the way this week and I won't be able to give detailed feedback on this until I am back on January 10. Thanks for leading this effort so far, y'all.

I also responded on the talk page. And actually reconsidered while doing so. It seemed adding extensive code to detect internationalized redirects wasn't worth the trouble for filtering a few edits. But then I realized we need that detection anyway to filter redirect pages from article counts (or else English Wikipedia would have 13 million 'articles'). Hmm

That's true, but consider that there is no good historical source of data about redirects other than manually going over the text of old revisions via the dumps or the API. You can't do it in the (MariaDB) application databases, because they don't contain the text of old revisions (that's in External Storage). You also can't do it even for present data in the Data Lake, because it does not have info on redirects or internal links.

So eliminating the redirect requirement from active editors still dramatically increases the number of ways it can be calculated. It doesn't help us with calculating historical article counts, but that's a separate issue.
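
To make the cost of the text-based approach concrete, checking one old revision for a redirect might look roughly like the sketch below. It assumes the revision wikitext has already been fetched from a dump or the API. The alias list is a small illustrative sample, not a complete or authoritative set, and the names `REDIRECT_ALIASES` and `is_redirect` are hypothetical.

```python
import re

# Illustrative, incomplete sample of redirect keywords (English, German,
# Spanish, French). A real list would have to come from each wiki's
# localization settings.
REDIRECT_ALIASES = ["#REDIRECTION", "#REDIRECCIÓN", "#WEITERLEITUNG", "#REDIRECT"]

# A redirect page starts with one of the keywords followed by a wikilink.
_REDIRECT_RE = re.compile(
    r"\s*(?:" + "|".join(re.escape(alias) for alias in REDIRECT_ALIASES) + r")\s*:?\s*\[\[",
    re.IGNORECASE,
)


def is_redirect(wikitext: str) -> bool:
    """Return True if the revision text begins with a known redirect keyword and a link."""
    return _REDIRECT_RE.match(wikitext) is not None
```

Every historical revision would need to pass through a check like this, which is why dropping the redirect requirement makes the metric so much easier to compute.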

@DarTar, @leila, @ezachte Can I close this? I don't think there's anything left to do except for Dario's review and it doesn't look like that'll happen :)

@Neil_P._Quinn_WMF from my point of view, we have converged here. If you and Erik agree, let's close it. Thank you for all your work on it. :)

OK, I'll activate the patch in Wikistats to conform to this, and close once done.

nshahquinn-wmf claimed this task.

@ezachte, we have T153702 for that :)

nshahquinn-wmf raised the priority of this task from Medium to Needs Triage. Mar 29 2018, 9:06 AM
nshahquinn-wmf moved this task from Blocked to Done on the Contributors-Analysis board.