
track number of edits per namespace over time
Closed, Resolved · Public · 5 Estimated Story Points

Description

Problem:
We already track the number of editors who made at least 1/5/100 edits in the past 30 days, split by namespace, at https://grafana.wikimedia.org/d/000000162/wikidata-site-stats?orgId=1.
We now also want a graph showing the number of edits per namespace over time to better understand how many edits are happening across the namespaces. This graph can be added to https://grafana.wikimedia.org/d/000000162/wikidata-site-stats?orgId=1 as well.

Acceptance criteria:

Notes:

  • See the total edits panel on the same board for similar tracking. The code is linked from the small info icon (i) in the corner of that panel.

Event Timeline

Lydia_Pintscher renamed this task from track editor numbers split by 'edits' to track edits per namespace over time. Apr 28 2021, 7:08 PM
Lydia_Pintscher renamed this task from track edits per namespace over time to track number of edits per namespace over time.
Lydia_Pintscher triaged this task as Medium priority.
Lydia_Pintscher updated the task description. (Show Details)

Change 692259 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[analytics/wmde/scripts@master] Introduce edit_count_by_namespace metric

https://gerrit.wikimedia.org/r/692259

Now that I'm trying to review the code, I got confused: How is this any different from what was added in T218901: Track number of Wikidata edits by namespace?

I didn't know that existed...

Here's a comparison of values:

image.png (542×356 px, 52 KB)

Notable differences I noticed:

  • The values for content namespaces don't differ much; the gap may simply be due to the queries running at different times
  • The values of talk namespaces differ a lot. The old script says only 20 edits happened on talk pages for that day, while the new script says 1K edits happened.
  • The old code is limited to just a set of namespaces and it's pretty small. Maybe we can just remove that?

I assume these questions need to be answered by @Lydia_Pintscher.

[...]

Notable differences I noticed:

  • [...]
  • The values of talk namespaces differ a lot. The old script says only 20 edits happened on talk pages for that day, while the new script says 1K edits happened.
  • [...]

The existing SQL of the old script includes this line in its WHERE clause:

AND rc_source = 'mw.edit'

I'm not perfectly sure what it does, but maybe that explains the lower numbers?

Edit: rc_source records the type of the change. So it would seem to make sense to, for example, exclude log actions. Though that documentation is somewhat vague on the set of possible values and whether we maybe want to record more than only mw.edit?
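
To make the effect of that WHERE clause concrete, here is a minimal sketch using an in-memory SQLite table as a stand-in for MediaWiki's recentchanges table. The rows are invented for illustration only, and the helper name count_by_namespace is hypothetical; this is not the query from either script. It only shows how restricting to rc_source = 'mw.edit' can lower a namespace's count when other change types (such as log entries) are present.

```python
import sqlite3

# Toy stand-in for MediaWiki's recentchanges table. The rows are invented
# purely for illustration and are not real Wikidata numbers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recentchanges (rc_namespace INTEGER, rc_source TEXT)"
)
rows = [
    (0, "mw.edit"), (0, "mw.edit"), (0, "mw.new"),
    (1, "mw.edit"),
    (1, "mw.log"), (1, "mw.log"), (1, "mw.log"),
]
conn.executemany("INSERT INTO recentchanges VALUES (?, ?)", rows)


def count_by_namespace(only_edits):
    """Return {namespace: row count}, optionally keeping only 'mw.edit' rows."""
    sql = "SELECT rc_namespace, COUNT(*) FROM recentchanges"
    if only_edits:
        sql += " WHERE rc_source = 'mw.edit'"
    sql += " GROUP BY rc_namespace ORDER BY rc_namespace"
    return dict(conn.execute(sql).fetchall())


unfiltered = count_by_namespace(only_edits=False)
filtered = count_by_namespace(only_edits=True)
print(unfiltered)  # {0: 3, 1: 4}
print(filtered)    # {0: 2, 1: 1}
```

In this toy data the filter removes page creations and log rows, so namespace 1 drops from 4 to 1. Whether the real talk-namespace gap has the same cause would need checking against the actual table.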

Sorry. I thought we had added all of this to this ticket during story time... We discussed this in story time because I wasn't aware of that board initially. We said the existing one doesn't quite do what we want, mainly due to it not covering all namespaces. We said we'd remove the existing one from https://grafana.wikimedia.org/d/000000170/wikidata-edits?orgId=1&refresh=1m and add a new one covering all namespaces to https://grafana.wikimedia.org/d/000000162/wikidata-site-stats?orgId=1.

Hmmm I'd say we don't want to count log actions as edits for the purpose of the stats required here.

Thank you for your explanation, I didn't get that from the acceptance criteria above.

[...]
We said the existing one doesn't quite do what we want, mainly due to it not covering all namespaces.
[...]

Out of curiosity: which namespace do you want to see that you are missing in the existing one?

/me goes back to the patch and gives proper feedback.

So the existing one only tracks the entity namespaces. Things like the project/Wikidata and Help namespaces are also pretty useful to understand.

Change 692259 merged by jenkins-bot:

[analytics/wmde/scripts@master] Make recent_changes_by_namespace track all namespaces

https://gerrit.wikimedia.org/r/692259

The patch looks good and is merged. Two minor issues remain:

  • the third AC "[ ] the graph is deleted from the board it comes from" isn't met yet
  • sortByMaxima() is inconsistent with the three panels above, which also list all namespaces but use sortByName(true)
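
As a rough illustration of why the two orderings look different in a panel legend, the sketch below re-implements the general idea of both sorts in plain Python. It is not Graphite's code: the series names and values are invented, and reading the `true` argument of sortByName as "natural sorting" is an assumption based on Graphite's documented function signature.

```python
import re

# Hypothetical per-namespace series: name -> data points (invented values).
series = {
    "ns10": [5, 90, 6],
    "ns2": [40, 35, 38],
    "ns1": [12, 9, 15],
}


def sort_by_maxima(data):
    """Order series names by their largest value, highest first."""
    return sorted(data, key=lambda name: max(data[name]), reverse=True)


def natural_key(name):
    """Split digits from text so that 'ns2' sorts before 'ns10'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def sort_by_name_natural(data):
    """Order series names using natural (number-aware) sorting."""
    return sorted(data, key=natural_key)


by_maxima = sort_by_maxima(series)
by_name = sort_by_name_natural(series)
print(by_maxima)  # ['ns10', 'ns2', 'ns1']
print(by_name)    # ['ns1', 'ns2', 'ns10']
```

Sorting by maxima can reorder the legend whenever activity shifts between namespaces, while sorting by name keeps the order stable and comparable with neighbouring panels.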

Change 695040 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[analytics/wmde/scripts@production] Make recent_changes_by_namespace track all namespaces

https://gerrit.wikimedia.org/r/695040

Change 695040 merged by jenkins-bot:

[analytics/wmde/scripts@production] Make recent_changes_by_namespace track all namespaces

https://gerrit.wikimedia.org/r/695040

Both are done. Please take a look.