Track number of Wikidata edits by namespace
Open, Needs Triage · Public

Description

This will be useful not only for Schemas, but also for other namespaces. Can probably be done as a daily cron job: count the number of revisions in the namespaces 0 (item), 120 (property), 146 (lexeme), tbd (schema), and the corresponding talk namespaces over the previous 24 hours and write that to Graphite.

It should be added as a new panel to the Wikidata Edits dashboard (which currently distinguishes between user/bot/anonymous/etc., but not between item/property/etc.).
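The daily job described above could be sketched roughly as follows. This is only an illustration in Python, not the script that was eventually written: the metric prefix `daily.wikidata.edits.namespace`, the metric names, and both function names are assumptions, and the schema namespace is left out because its ID is still "tbd". The talk namespace IDs rely on the MediaWiki convention that a talk namespace is the subject namespace ID plus one. The output lines follow Graphite's plaintext format (`<path> <value> <unix timestamp>`); actually sending them is omitted.

```python
from collections import Counter

# Namespace IDs from the task description; talk namespaces are subject ID + 1.
# The metric names on the right are hypothetical.
NAMESPACES = {
    0: "item",
    1: "item_talk",
    120: "property",
    121: "property_talk",
    146: "lexeme",
    147: "lexeme_talk",
}


def count_by_namespace(revision_namespaces):
    """Count revisions per tracked namespace; untracked namespaces are ignored."""
    counts = Counter(ns for ns in revision_namespaces if ns in NAMESPACES)
    # Report zero explicitly so every metric gets a data point each day.
    return {name: counts.get(ns, 0) for ns, name in NAMESPACES.items()}


def graphite_lines(counts, timestamp, prefix="daily.wikidata.edits.namespace"):
    """Format counts as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    return [
        f"{prefix}.{name} {value} {timestamp}"
        for name, value in sorted(counts.items())
    ]


if __name__ == "__main__":
    # Made-up sample: namespace IDs of six revisions, one of them untracked.
    sample = [0, 0, 120, 146, 1, 999]
    for line in graphite_lines(count_by_namespace(sample), 1553126400):
        print(line)
```

Reporting zeros for quiet namespaces is a design choice here, so that a dashboard panel shows a flat line rather than a gap.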

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 21 2019, 2:46 PM

That script uses the API with action=query&list=recentchanges, which doesn’t directly give us the namespace. We could try to parse it from the title; get the page IDs and ask for their namespaces in a separate query; or perhaps switch to SQL.

Is there a reason why that script doesn’t use SQL in the first place, by the way?
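To illustrate the SQL option: MediaWiki's `recentchanges` table has an `rc_namespace` column, so a grouped count yields the per-namespace numbers directly. The sketch below uses an in-memory SQLite table as a stand-in for the real database; the connection, the sample rows, and the function name are assumptions, and the query counts every row in the window without filtering by change type.

```python
import sqlite3


def count_edits_by_namespace(conn, start, end):
    """Count recentchanges rows per namespace with start <= rc_timestamp < end.

    Timestamps are fixed-width YYYYMMDDHHMMSS strings, as MediaWiki stores
    them, so plain string comparison orders them correctly.
    """
    rows = conn.execute(
        "SELECT rc_namespace, COUNT(*) FROM recentchanges "
        "WHERE rc_timestamp >= ? AND rc_timestamp < ? "
        "GROUP BY rc_namespace ORDER BY rc_namespace",
        (start, end),
    )
    return dict(rows.fetchall())


# Stand-in table with only the two columns the query needs (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recentchanges (rc_namespace INTEGER, rc_timestamp TEXT)")
conn.executemany(
    "INSERT INTO recentchanges VALUES (?, ?)",
    [
        (0, "20190319235959"),    # before the window
        (0, "20190320000000"),    # first second of the window
        (0, "20190320235959"),    # last second of the window
        (120, "20190320120000"),
        (146, "20190321000000"),  # end bound is exclusive
    ],
)

result = count_edits_by_namespace(conn, "20190320000000", "20190321000000")
print(result)
```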

Change 500752 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[analytics/wmde/scripts@master] WIP: count number of Wikidata edits by namespace

https://gerrit.wikimedia.org/r/500752

Reading about this: would delayed data be interesting? This information is accessible in Hadoop :)

Change 500752 merged by jenkins-bot:
[analytics/wmde/scripts@master] Count number of Wikidata edits by namespace

https://gerrit.wikimedia.org/r/500752

Change 502169 had a related patch set uploaded (by Hoo man; owner: Lucas Werkmeister (WMDE)):
[analytics/wmde/scripts@production] Count number of Wikidata edits by namespace

https://gerrit.wikimedia.org/r/502169

Well, we currently run the cron job at 3AM each day and have it check the recent changes from midnight yesterday to midnight today (all UTC), so it’s already delayed by three hours. Would using Hadoop be advantageous to us? I’m not sure if any of the existing scripts use it.
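For reference, the "midnight yesterday to midnight today (all UTC)" window can be computed as in the sketch below. It is written in Python purely for illustration; the function names are assumptions, and the 14-digit output is the `YYYYMMDDHHMMSS` timestamp format MediaWiki uses.

```python
from datetime import datetime, timedelta, timezone


def previous_utc_day(now):
    """Return (start, end) for the full UTC day that ended before `now`.

    `now` should be timezone-aware; a naive datetime would be interpreted
    as local time by astimezone().
    """
    today_midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return today_midnight - timedelta(days=1), today_midnight


def mw_timestamp(dt):
    """Format a datetime as a 14-digit MediaWiki timestamp."""
    return dt.strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    # A run at 03:00 UTC on 21 March 2019 covers all of 20 March.
    start, end = previous_utc_day(datetime(2019, 3, 21, 3, 0, tzinfo=timezone.utc))
    print(mw_timestamp(start), mw_timestamp(end))
```

Treating the end bound as exclusive avoids counting an edit made exactly at midnight in two consecutive days.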

Some queries for Wikidata are already computed using Hadoop (see https://github.com/wikimedia/analytics-refinery/tree/master/oozie/wikidata). If SQL over recent changes works for you, that's great :)

Where can I see the result? :)

I don’t think it’s deployed yet (see https://gerrit.wikimedia.org/r/502169 above). And then we’ll need to add the new metrics to some Grafana board.

Should this go back to some “Doing” column until the deployment is done? Though I guess Shape Expressions Sprint 5 would no longer be the appropriate project.

Moving to stalled until the deployment is done.

Change 502169 merged by jenkins-bot:
[analytics/wmde/scripts@production] Count number of Wikidata edits by namespace

https://gerrit.wikimedia.org/r/502169