track editor numbers split by namespace
Closed, Resolved | Public | 8 Estimated Story Points

Description

Problem:
We are tracking the number of editors who made at least 1/5/100 edits in the past 30 days at https://grafana.wikimedia.org/d/000000162/wikidata-site-stats?orgId=1. This covers all edits across all namespaces. It would be useful to also track this statistic split by namespace. The main, Property and Lexeme namespaces are of special interest, but we will want to look at the others as well.

Acceptance criteria:

Notes:

  • We count only the edits an editor makes in a specific namespace: e.g. if a user makes 10 edits in Lexemes and 100 edits in the main namespace, then they should be counted as active (5 or more edits) for the Lexeme namespace and very active (100 or more edits) for the main namespace.
  • Local testing does not work out of the box
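
The counting rule in the first note can be sketched as follows. This is only an illustration of the rule, not the code from the scripts repo; the function name `editors_by_namespace` and the input shape (one `(editor, namespace)` pair per edit) are assumptions made for the example. Namespace IDs 0 (main) and 146 (Lexeme) are the ones used on Wikidata.

```python
from collections import Counter

THRESHOLDS = (1, 5, 100)

def editors_by_namespace(edits, thresholds=THRESHOLDS):
    """Count editors per namespace who reach each edit threshold.

    `edits` is an iterable of (editor, namespace) pairs, one pair per edit
    (a hypothetical input shape chosen for this sketch).
    """
    per_editor = Counter(edits)  # (editor, namespace) -> number of edits
    result = {}
    for (editor, namespace), count in per_editor.items():
        buckets = result.setdefault(namespace, {t: 0 for t in thresholds})
        for t in thresholds:
            # Only edits made in this namespace count towards its thresholds.
            if count >= t:
                buckets[t] += 1
    return result

# Example from the note: 10 Lexeme edits (146) and 100 main-namespace edits (0).
edits = [("alice", 146)] * 10 + [("alice", 0)] * 100
stats = editors_by_namespace(edits)
print(stats)
```

Here the same editor reaches the 1 and 5 thresholds for the Lexeme namespace but the 100 threshold only for the main namespace.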

Links that may be useful

Event Timeline

Adding Lucas's comments from the Mattermost discussion, as a database query:

MariaDB [wikidatawiki]> SELECT rc_namespace, COUNT(DISTINCT rc_actor) AS editors FROM recentchanges WHERE rc_namespace >= 0 GROUP BY rc_namespace ORDER BY editors DESC;

With registered users only:

MariaDB [wikidatawiki]> SELECT rc_namespace, COUNT(DISTINCT rc_actor) AS editors FROM recentchanges JOIN actor ON rc_actor = actor_id WHERE rc_namespace >= 0 AND actor_user > 0 GROUP BY rc_namespace ORDER BY editors DESC;
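
The two queries above count every editor with at least one entry per namespace. One possible way to extend the idea to the 5 and 100 thresholds is a per-actor subquery with `HAVING`, sketched here against an in-memory SQLite table so it can be run without database access. The toy table has only the two columns the query needs, and the query itself is an assumption for illustration, not the one from the merged patch.

```python
import sqlite3

# Count, per namespace, the actors with at least N rows in recentchanges.
QUERY = """
SELECT rc_namespace, COUNT(*) AS editors
FROM (
    SELECT rc_namespace, rc_actor
    FROM recentchanges
    WHERE rc_namespace >= 0
    GROUP BY rc_namespace, rc_actor
    HAVING COUNT(*) >= ?
) AS per_actor
GROUP BY rc_namespace
ORDER BY editors DESC, rc_namespace
"""

def editors_with_at_least(conn, min_changes):
    """Return (namespace, editor count) rows for the given threshold."""
    return conn.execute(QUERY, (min_changes,)).fetchall()

# Toy stand-in for the real table: actor 1 makes 100 changes in namespace 0
# and 10 in namespace 146; actor 2 makes 3 changes in namespace 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recentchanges (rc_namespace INTEGER, rc_actor INTEGER)")
rows = [(0, 1)] * 100 + [(146, 1)] * 10 + [(0, 2)] * 3
conn.executemany("INSERT INTO recentchanges VALUES (?, ?)", rows)

for threshold in (1, 5, 100):
    print(threshold, editors_with_at_least(conn, threshold))
```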

Change 671195 had a related patch set uploaded (by Silvan Heintze; owner: Silvan Heintze):
[analytics/wmde/scripts@master] Track editor numbers split by namespace

https://gerrit.wikimedia.org/r/671195

Change 671195 merged by jenkins-bot:
[analytics/wmde/scripts@master] Track editor numbers split by namespace

https://gerrit.wikimedia.org/r/671195

This will need to also land in the "production" branch in order to be deployed.
See https://wikitech.wikimedia.org/wiki/WMDE/Analytics#analytics/wmde/scripts_repo

Change 672823 had a related patch set uploaded (by Silvan Heintze; owner: Silvan Heintze):
[analytics/wmde/scripts@production] Track editor numbers split by namespace

https://gerrit.wikimedia.org/r/672823

Change 672823 merged by jenkins-bot:
[analytics/wmde/scripts@production] Track editor numbers split by namespace

https://gerrit.wikimedia.org/r/672823

Waiting for some data to show up in Graphite before the Grafana panel can be added.

Moving this back to Doing because there should already be enough data in graphite.

PHP error message in production, according to people in #wikimedia-analytics:

PHP Fatal error:  Uncaught TypeError: Argument 1 passed to WikidataActiveUsersByNamespace::collectNamespaces() must be of the type array, object given, called in /srv/analytics-wmde/graphite/src/scripts/src/wikidata/site_stats/active_users_by_namespace.php on line 26 and defined in /srv/analytics-wmde/graphite/src/scripts/src/wikidata/site_stats/active_users_by_namespace.php:55

Change 677871 had a related patch set uploaded (by Silvan Heintze; author: Silvan Heintze):

[analytics/wmde/scripts@master] Fix PHP Fatal error

https://gerrit.wikimedia.org/r/677871

Change 677722 had a related patch set uploaded (by Silvan Heintze; author: Silvan Heintze):

[analytics/wmde/scripts@production] Fix PHP Fatal error

https://gerrit.wikimedia.org/r/677722

Change 677871 merged by jenkins-bot:

[analytics/wmde/scripts@master] Fix PHP Fatal error

https://gerrit.wikimedia.org/r/677871

Change 677722 abandoned by Silvan Heintze:

[analytics/wmde/scripts@production] Fix PHP Fatal error

Reason:

more errors to fix, before this can be deployed

https://gerrit.wikimedia.org/r/677722

Change 677935 had a related patch set uploaded (by Silvan Heintze; author: Silvan Heintze):

[analytics/wmde/scripts@master] Fix SQL query field name

https://gerrit.wikimedia.org/r/677935

Change 677935 merged by jenkins-bot:

[analytics/wmde/scripts@master] Fix SQL query field name

https://gerrit.wikimedia.org/r/677935

Change 677975 had a related patch set uploaded (by Silvan Heintze; author: Silvan Heintze):

[analytics/wmde/scripts@master] Final fixes to get editors split by namespace

https://gerrit.wikimedia.org/r/677975

Change 677975 merged by jenkins-bot:

[analytics/wmde/scripts@master] Final fixes to get editors split by namespace

https://gerrit.wikimedia.org/r/677975

Change 677722 restored by Addshore:

[analytics/wmde/scripts@production] Fix PHP Fatal error

https://gerrit.wikimedia.org/r/677722

Change 677950 had a related patch set uploaded (by Addshore; author: Silvan Heintze):

[analytics/wmde/scripts@production] Fix SQL query field name

https://gerrit.wikimedia.org/r/677950

Change 677951 had a related patch set uploaded (by Addshore; author: Silvan Heintze):

[analytics/wmde/scripts@production] Final fixes to get editors split by namespace

https://gerrit.wikimedia.org/r/677951

Change 677722 merged by jenkins-bot:

[analytics/wmde/scripts@production] Fix PHP Fatal error

https://gerrit.wikimedia.org/r/677722

Change 677950 merged by jenkins-bot:

[analytics/wmde/scripts@production] Fix SQL query field name

https://gerrit.wikimedia.org/r/677950

Change 677951 merged by jenkins-bot:

[analytics/wmde/scripts@production] Final fixes to get editors split by namespace

https://gerrit.wikimedia.org/r/677951

Added three panels (1/5/100) for editor numbers split by namespace (T275999) to the "Wikidata Site Stats" dashboard on Grafana.

Side note: the way user activity is fetched from the database includes not only edits but also other activities with log entries, such as patrolling or deleting. We assume this is expected, so maybe the dashboard description should be changed, as it currently says:

Active Users - The number of users that have made more than 1, 5 or 100 edits in the past 30 days.

Added three panels (1/5/100) for editor numbers split by namespace (T275999) to the "Wikidata Site Stats" dashboard on Grafana.

Yay!
Any chance we can add human-readable names for the namespaces? Having to look up which ID goes with which namespace is a bit cumbersome.

Side note: the way user activity is fetched from the database includes not only edits but also other activities with log entries, such as patrolling or deleting. We assume this is expected, so maybe the dashboard description should be changed, as it currently says:

Active Users - The number of users that have made more than 1, 5 or 100 edits in the past 30 days.

That's also the case for the existing data that we have for all namespaces, right? In that case I'd leave it as is, because by convention those are then all treated as edits. If it differs, then we should change it, yeah.

Any chance we can add human-readable names for the namespaces? Having to look up which ID goes with which namespace is a bit cumbersome.

aka

sortByName(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasSub(aliasByNode(daily.wikidata.site_stats.active_users_by_namespace.*.1, 4), '^-1$', '-1: Special'), '^0$', '0: Main'), '^1$', '1: Talk'), '^2$', '2: User'), '^3$', '3: User Talk'), '^4$', '4: Wikidata'), '^5$', '5: Wikidata Talk'), '^6$', '6: File'), '^7$', '7: File Talk'), '^8$', '8: MediaWiki'), '^9$', '9: MediaWiki Talk'), '^10$', '10: Template'), '^11$', '11: Template Talk'), '^12$', '12: Help'), '^13$', '13: Help Talk'), '^14$', '14: Category'), '^15$', '15: Category Talk'), '^120$', '120: Property'), '^121$', '121: Property Talk'), '^122$', '122: Query'), '^123$', '123: Query Talk'), '^146$', '146: Lexeme'), '^147$', '147: Lexeme Talk'), '^640$', '640: EntitySchema'), '^641$', '641: EntitySchema Talk'), '^828$', '828: Module'), '^829$', '829: Module Talk'), '^1198$', '1198: Translations'), '^1199$', '1199: Translations Talk'), '^2600$', '2600: Topic'), true)

(for lack of a better location to document and version-control this)
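
Since the expression above is hard to maintain by hand, one option is to generate it from a namespace table. The sketch below builds the same nesting of `aliasByNode`, `aliasSub` and `sortByName`, using the IDs and names taken from the expression; the helper name `build_target` is hypothetical, and the assumption that the 5 and 100 series live under the same path with a different last node has not been confirmed in this task.

```python
# Namespace IDs and display names, copied from the Graphite expression above.
NAMESPACES = {
    -1: "Special", 0: "Main", 1: "Talk", 2: "User", 3: "User Talk",
    4: "Wikidata", 5: "Wikidata Talk", 6: "File", 7: "File Talk",
    8: "MediaWiki", 9: "MediaWiki Talk", 10: "Template", 11: "Template Talk",
    12: "Help", 13: "Help Talk", 14: "Category", 15: "Category Talk",
    120: "Property", 121: "Property Talk", 122: "Query", 123: "Query Talk",
    146: "Lexeme", 147: "Lexeme Talk", 640: "EntitySchema",
    641: "EntitySchema Talk", 828: "Module", 829: "Module Talk",
    1198: "Translations", 1199: "Translations Talk", 2600: "Topic",
}

def build_target(threshold, names=NAMESPACES):
    """Build a Graphite target that labels each namespace series by name.

    `threshold` is the last node of the metric path (1 in the expression
    above; 5 and 100 are assumed to follow the same pattern).
    """
    metric = f"daily.wikidata.site_stats.active_users_by_namespace.*.{threshold}"
    # Node 4 of the metric path is the namespace ID.
    expr = f"aliasByNode({metric}, 4)"
    for ns_id, name in names.items():
        # Wrap one aliasSub per namespace: '^0$' -> '0: Main', and so on.
        expr = f"aliasSub({expr}, '^{ns_id}$', '{ns_id}: {name}')"
    return f"sortByName({expr}, true)"

print(build_target(1))
```

Keeping the mapping in a small script like this would also give the query a place to be version-controlled.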