
Track and monitor items per editor ratio for Wikidata
Closed, Resolved · Public · 3 Estimated Story Points

Description

Context:
Wikidata Analytics

User story:
As Wikidata PMs, we want to get a better understanding of the relationship between Items and editors over time to inform our product strategy.

Problem:
We already monitor the total number of Items and the number of active editors per day (internally). We currently do not have a Grafana board, and we do not monitor the number of active Items. The number of active Items is used as an indicator of potentially vandalized Items here.

Acceptance criteria:

  • Track the number of active Items (Items that were edited at least once in the last 30 days) per day
  • Create a Grafana dashboard illustrating the following:
    • the number of Items (tracked already)
    • the number of active Items
    • the number of active editors (tracked already)
    • the ratio of Items to active editors (for each of the three active-editor definitions: at least 1, 5, and 100 edits per editor)
    • the ratio of active Items to active editors (for each of the three active-editor definitions: at least 1, 5, and 100 edits per editor)
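The two ratio panels boil down to dividing an item count by an editor count, once per active-editor threshold. The sketch below only illustrates that arithmetic; the function name `ratios` and all input numbers are hypothetical and do not come from the dashboard.

```python
from typing import Dict


def ratios(total_items: int, active_items: int,
           active_editors: Dict[int, int]) -> Dict[int, Dict[str, float]]:
    """Return Items/editor and active Items/editor for each editor threshold.

    `active_editors` maps a minimum edit count (e.g. 1, 5 or 100) to the
    number of editors who reached it. This is a hypothetical helper.
    """
    result: Dict[int, Dict[str, float]] = {}
    for threshold, editors in sorted(active_editors.items()):
        if editors == 0:
            # A ratio is undefined when no editor reaches the threshold.
            raise ValueError(f"no active editors at threshold {threshold}")
        result[threshold] = {
            "items_per_editor": total_items / editors,
            "active_items_per_editor": active_items / editors,
        }
    return result


if __name__ == "__main__":
    # Toy values, chosen only so the results are easy to check by hand.
    out = ratios(total_items=1_000_000, active_items=100_000,
                 active_editors={1: 10_000, 5: 2_000, 100: 100})
    for threshold, r in out.items():
        print(threshold, r["items_per_editor"], r["active_items_per_editor"])
```

Because the editor count shrinks sharply as the threshold rises, the 100-edit line will sit far above the 1-edit line; whether a rising ratio is "good" depends on which of the two counts is driving the change.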

Open questions:

  • Could we easily track this for segments of Wikidata?
    • Yes, but would be inefficient, so let's not do this for now.
  • Where should we put this?

Origin:
Community request

Tech note:
Take a look at the Gerrit repo https://gerrit.wikimedia.org/r/admin/repos/analytics/wmde/scripts (also on GitHub).
Documentation is also available at https://wikitech.wikimedia.org/wiki/WMDE/Analytics#analytics/wmde/scripts_repo

Event Timeline

Manuel renamed this task from Track and monitor active items on Wikidata to Track and monitor editors per item ratio for Wikidata. (Jul 19 2021, 12:20 PM)
Manuel updated the task description.
Manuel updated the task description.
Lydia_Pintscher renamed this task from Track and monitor editors per item ratio for Wikidata to Track and monitor active items on Wikidata. (Jul 19 2021, 12:20 PM)
Lydia_Pintscher updated the task description.
Lydia_Pintscher renamed this task from Track and monitor active items on Wikidata to Track and monitor editors per item ratio for Wikidata. (Jul 19 2021, 12:24 PM)
Lydia_Pintscher updated the task description.
Lydia_Pintscher updated the task description.
Manuel updated the task description.

That is a good question. Wikidata:Main_Page/Popular might be the one thing that is somewhat close, though it mostly highlights bicycle racing.

Manuel renamed this task from Track and monitor editors per item ratio for Wikidata to Track and monitor items per editor ratio for Wikidata. (Aug 5 2021, 7:54 AM)
Manuel added a project: Wikidata-Campsite.

Should be a straightforward query:

MariaDB [wikidatawiki]> SELECT COUNT(DISTINCT rc_title) FROM recentchanges WHERE rc_namespace = 0;
+--------------------------+
| COUNT(DISTINCT rc_title) |
+--------------------------+
|                  9493417 |
+--------------------------+
1 row in set (27.620 sec)
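The query has no timestamp condition, so it presumably relies on the recentchanges table only retaining roughly the last 30 days of changes on Wikimedia wikis. The sketch below runs the same query shape against a tiny in-memory SQLite table (a stand-in for MariaDB) to show which rows count: only namespace 0 (Items), each title counted once. The table contents are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of recentchanges: only the two columns the query uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recentchanges (rc_namespace INTEGER, rc_title TEXT)")
rows = [
    (0, "Q42"), (0, "Q42"), (0, "Q64"),  # two edits to Q42, one to Q64
    (1, "Q42"),                          # talk page edit (namespace 1), ignored
    (120, "P31"),                        # Property namespace (120), ignored
]
conn.executemany("INSERT INTO recentchanges VALUES (?, ?)", rows)

# Same shape as the MariaDB query: distinct Item titles in the main namespace.
(active_items,) = conn.execute(
    "SELECT COUNT(DISTINCT rc_title) FROM recentchanges WHERE rc_namespace = 0"
).fetchone()
print(active_items)  # 2
```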
Manuel updated the task description.
Addshore updated the task description.
Addshore set the point value for this task to 3.

I tried out queries counting active items for different edit thresholds (1, 5, 100), and the results are interesting:

MariaDB [wikidatawiki]> SELECT COUNT(*) FROM (SELECT rc_title FROM recentchanges WHERE rc_namespace = 0 GROUP BY rc_title HAVING COUNT(*) >= 1) x;
+----------+
| COUNT(*) |
+----------+
|  9673250 |
+----------+
1 row in set (15.703 sec)

MariaDB [wikidatawiki]> SELECT COUNT(*) FROM (SELECT rc_title FROM recentchanges WHERE rc_namespace = 0 GROUP BY rc_title HAVING COUNT(*) >= 5) x;
+----------+
| COUNT(*) |
+----------+
|   438827 |
+----------+
1 row in set (9.089 sec)

MariaDB [wikidatawiki]> SELECT COUNT(*) FROM (SELECT rc_title FROM recentchanges WHERE rc_namespace = 0 GROUP BY rc_title HAVING COUNT(*) >= 100) x;
+----------+
| COUNT(*) |
+----------+
|      416 |
+----------+
1 row in set (9.182 sec)
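The three threshold queries differ only in the `HAVING` bound, so they can be parameterized. A minimal sketch, again using SQLite in place of MariaDB and made-up rows; `active_items_by_threshold` is a hypothetical helper, not the function used in the scripts repo. Note that `HAVING COUNT(*) >= 1` holds for every group, so on the same snapshot it must equal `COUNT(DISTINCT rc_title)`; the two different totals above presumably come from running the queries at different times.

```python
import sqlite3


def active_items_by_threshold(conn, thresholds=(1, 5, 100)):
    """Count Items with at least t main-namespace changes, for each t."""
    counts = {}
    for t in thresholds:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM ("
            " SELECT rc_title FROM recentchanges WHERE rc_namespace = 0"
            " GROUP BY rc_title HAVING COUNT(*) >= ?) x",
            (t,),
        ).fetchone()
        counts[t] = n
    return counts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recentchanges (rc_namespace INTEGER, rc_title TEXT)")
# Hypothetical data: Q1 edited 120 times, Q2 6 times, Q3 once.
conn.executemany(
    "INSERT INTO recentchanges VALUES (0, ?)",
    [("Q1",)] * 120 + [("Q2",)] * 6 + [("Q3",)],
)
print(active_items_by_threshold(conn))  # {1: 3, 5: 2, 100: 1}
```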

Some 9.7 million distinct items are edited in 30 days, as we saw above; but only 439k of them see 5 or more edits, and only 416 of them 100 or more edits. We actually have more highly active users than highly active items, by that measure.
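Expressed as shares of all active items, the figures from the three queries above work out to roughly 4.5% of active items reaching 5 edits and about 0.004% reaching 100 edits. The short calculation below uses only those reported counts; `shares` is a hypothetical helper name.

```python
# Figures copied from the three threshold queries above.
COUNTS = {1: 9_673_250, 5: 438_827, 100: 416}


def shares(counts):
    """Fraction of all active items (threshold 1) that reach each threshold."""
    base = counts[1]
    return {t: n / base for t, n in counts.items()}


for threshold, share in shares(COUNTS).items():
    print(f">= {threshold:>3} edits: {COUNTS[threshold]:>9,} items ({share:.4%})")
```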

Is that worth tracking, or should we limit it to just the active items regardless of their number of edits?

Change 713464 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@master] Track number of active items

https://gerrit.wikimedia.org/r/713464

@Lucas_Werkmeister_WMDE thank you for pointing this out! Could you please provide me with the list of Items in the ">= 100" group? This would be helpful to better understand what kind of content this is.

Change 713464 merged by jenkins-bot:

[analytics/wmde/scripts@master] Track number of active items

https://gerrit.wikimedia.org/r/713464

Sure, I put them at P17044 with their English labels. Almost half are COVID-related.

Change 713850 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@production] Track number of active items

https://gerrit.wikimedia.org/r/713850

Change 713897 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@master] Add comment to active_items.sql

https://gerrit.wikimedia.org/r/713897

Change 713850 merged by jenkins-bot:

[analytics/wmde/scripts@production] Track number of active items

https://gerrit.wikimedia.org/r/713850

Change 713897 merged by jenkins-bot:

[analytics/wmde/scripts@master] Add comment to active_items.sql

https://gerrit.wikimedia.org/r/713897

Change 714153 had a related patch set uploaded (by Ladsgroup; author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@production] Add comment to active_items.sql

https://gerrit.wikimedia.org/r/714153

Change 714153 merged by jenkins-bot:

[analytics/wmde/scripts@production] Add comment to active_items.sql

https://gerrit.wikimedia.org/r/714153

Change 714803 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@master] Add active_items.php to daily.03.sh

https://gerrit.wikimedia.org/r/714803

Change 714680 had a related patch set uploaded (by Ladsgroup; author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@production] Add active_items.php to daily.03.sh

https://gerrit.wikimedia.org/r/714680

Change 714803 merged by jenkins-bot:

[analytics/wmde/scripts@master] Add active_items.php to daily.03.sh

https://gerrit.wikimedia.org/r/714803

Change 714680 merged by jenkins-bot:

[analytics/wmde/scripts@production] Add active_items.php to daily.03.sh

https://gerrit.wikimedia.org/r/714680

Change 715013 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@master] Make active_items.php executable

https://gerrit.wikimedia.org/r/715013

Change 715013 merged by jenkins-bot:

[analytics/wmde/scripts@master] Make active_items.php executable

https://gerrit.wikimedia.org/r/715013

Change 714856 had a related patch set uploaded (by Ladsgroup; author: Lucas Werkmeister (WMDE)):

[analytics/wmde/scripts@production] Make active_items.php executable

https://gerrit.wikimedia.org/r/714856

Change 714856 merged by jenkins-bot:

[analytics/wmde/scripts@production] Make active_items.php executable

https://gerrit.wikimedia.org/r/714856

Alright, I’ve added three new panels with the requested metrics to the Wikidata Site Stats dashboard. Feel free to reorganize them (e.g. into a separate dashboard).

(Side note: in Items / Active Editors, I’m counting all items. We could also change this to only the non-redirect items.)
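Restricting the count to non-redirect Items would typically mean filtering on MediaWiki's `page.page_is_redirect` column (merged Wikidata Items become redirects). This is only a sketch under the assumption that the Item total is read from the `page` table; the task does not say how the existing count is computed. SQLite stands in for MariaDB, and `item_count` and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of MediaWiki's page table (real column names, far fewer columns).
conn.execute(
    "CREATE TABLE page (page_namespace INTEGER, page_title TEXT,"
    " page_is_redirect INTEGER)"
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [
        (0, "Q1", 0),
        (0, "Q2", 0),
        (0, "Q3", 1),     # e.g. merged into another Item, now a redirect
        (120, "P31", 0),  # a Property, not an Item
    ],
)


def item_count(conn, include_redirects=True):
    """Count main-namespace pages, optionally excluding redirects."""
    sql = "SELECT COUNT(*) FROM page WHERE page_namespace = 0"
    if not include_redirects:
        sql += " AND page_is_redirect = 0"
    (n,) = conn.execute(sql).fetchone()
    return n


print(item_count(conn), item_count(conn, include_redirects=False))  # 3 2
```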

@Lucas_Werkmeister_WMDE
Hi, I fail to understand the descriptions on 2 of the new dashboards.

Items / Active Editors:
Could this be better explained? What is the 1, 5, 100 metric? Users? Edits? Is rising better than falling?

Regarding: Active Items / Active Editors
What is the purpose of this metric? Is rising better than falling? Could you explain in the i-text what it means?