
Basic dashboards for Reference Previews tracking
Open, Needs Triage · Public · 5 Story Points

Description

Once we're tracking usage metrics, create dashboards to make the numbers useful.

Bridge our EventLogging numbers (TBD: however we implement the metrics) into summary statistics accessible from Grafana.

Acceptance criteria:

  • Graph comparing reference indicator clicks for people with ReferencePreviews enabled vs. baseline.
  • Graph comparing reference outbound link clicks for people with the feature vs. without.
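The criteria only ask for graphs, but a single summary number per comparison can make them easier to read. Below is a minimal sketch of one option: a rate ratio (clicks per pageview with the feature vs. baseline) with an approximate 95% confidence interval. It uses the standard log-ratio approximation for Poisson counts. The function name and the example counts are hypothetical.

```python
import math
from typing import Tuple


def rate_ratio_ci(
    clicks_a: int, pageviews_a: int,
    clicks_b: int, pageviews_b: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Ratio of clicks-per-pageview in group A (e.g. ReferencePreviews enabled)
    to group B (e.g. baseline), with an approximate confidence interval.

    Treats click counts as Poisson, so SE(log ratio) = sqrt(1/a + 1/b).
    """
    if min(clicks_a, clicks_b) <= 0 or min(pageviews_a, pageviews_b) <= 0:
        raise ValueError("counts must be positive")
    ratio = (clicks_a / pageviews_a) / (clicks_b / pageviews_b)
    se_log = math.sqrt(1 / clicks_a + 1 / clicks_b)
    low = ratio * math.exp(-z * se_log)
    high = ratio * math.exp(z * se_log)
    return ratio, low, high


# Hypothetical counts, for illustration only.
ratio, low, high = rate_ratio_ci(200, 10_000, 100, 10_000)
print(f"ratio={ratio:.2f} (95% CI {low:.2f}..{high:.2f})")
```

The interval assumes independent Poisson counts. If individual readers contribute many clicks each, the real uncertainty is larger.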

Related research:
https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Citation_Usage

Dashboard:
https://grafana.wikimedia.org/d/aHQQ_20Wz/reference-previews-usage?orgId=1

Details

Related Gerrit Patches:
[analytics/reportupdater-queries@master] Fix another nonexistent field
[operations/puppet@production] report updater job: produce Reference Previews metrics
[analytics/reportupdater-queries@master] New reports for Reference Previews

Event Timeline

awight created this task. Sep 17 2019, 12:55 PM
Restricted Application added a subscriber: Aklapper. Sep 17 2019, 12:55 PM
awight set the point value for this task to 8. Sep 17 2019, 1:06 PM
awight moved this task from Backlog to Ready for pickup on the Reference Previews board.

I confirmed with WMF Analytics that we can package these queries as a directory in reportupdater-queries, which will update TSV reports with daily granularity.
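For readers unfamiliar with the output format, here is a minimal sketch of turning raw event timestamps into a report with daily granularity. The layout (a `date` column followed by one metric column) and the function name are assumptions for illustration. This is not the actual reportupdater query from the patch.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def daily_counts_tsv(timestamps: Iterable[int], metric: str) -> str:
    """Bucket Unix timestamps by UTC day and emit a TSV with one row per day."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        for ts in timestamps
    )
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", metric])
    for day in sorted(counts):  # ISO dates sort chronologically
        writer.writerow([day, counts[day]])
    return out.getvalue()


# Hypothetical events: two on the first day, one on the next.
print(daily_counts_tsv([0, 3600, 86400], "clicks"), end="")
```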

Note that I'm taking the phrase "summary statistics" from the task description and running with it. I'm avoiding the Grafana dashboard because I think we can get what we need with far less complexity.

Change 542419 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/reportupdater-queries@master] New report for Reference Previews

https://gerrit.wikimedia.org/r/542419

awight updated the task description. Oct 22 2019, 8:43 AM

I learned that nobody else exports from Reportupdater to Graphite; they visualize using Dashiki instead (see below). Consider implementing our graphs there.

15:09 < nuria> awight: the other way would be dashiki  for things like: https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os
15:10 < nuria> awight:  reportupdater is mostly visualized with dashiki:  
               https://language-reportcard.wmflabs.org/#projects=ptwiki,idwiki,eswiki,viwiki,ukwiki,dewiki,trwiki,ruwiki,frwiki/metrics=Content%20Translation
15:10 < nuria> awight: for public dashboards

Change 542419 merged by Mforns:
[analytics/reportupdater-queries@master] New reports for Reference Previews

https://gerrit.wikimedia.org/r/542419

Change 547715 had a related patch set uploaded (by Awight; owner: Awight):
[operations/puppet@production] Install a cron job to produce Reference Previews metrics

https://gerrit.wikimedia.org/r/547715

Change 547715 merged by Elukey:
[operations/puppet@production] report updater job: produce Reference Previews metrics

https://gerrit.wikimedia.org/r/547715

awight claimed this task. Nov 6 2019, 12:48 PM
awight moved this task from Sprint Backlog to Doing on the WMDE-QWERTY-Sprint-2019-11-06 board.
awight changed the point value for this task from 8 to 5. Nov 6 2019, 1:13 PM
awight added a comment. Nov 7 2019, 9:27 AM

I'm debugging why baseline data isn't landing in Grafana yet. We're averaging c. 1.5 events/second on the baseline topic, which should come out to roughly 130k events per day (1.5 × 86,400 seconds). That was what we were seeing until Nov 1st, at which point the counts dropped to a few dozen.
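The expected-volume check can be written down explicitly. Below is a small sketch that computes the expected daily total from the average event rate and flags days that fall well short of it. It assumes daily counts keyed by ISO date. The counts and the 50% threshold are hypothetical.

```python
from typing import Dict, List

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def expected_daily_events(events_per_second: float) -> float:
    """Expected events per day at a constant average rate."""
    return events_per_second * SECONDS_PER_DAY


def days_below_expected(
    daily_counts: Dict[str, int],
    events_per_second: float,
    threshold: float = 0.5,  # hypothetical cutoff: flag days under 50% of expected
) -> List[str]:
    """Return dates whose count is below `threshold` times the expected volume."""
    expected = expected_daily_events(events_per_second)
    return sorted(day for day, count in daily_counts.items() if count < threshold * expected)


# Hypothetical counts shaped like the drop described above.
counts = {"2019-10-31": 131_000, "2019-11-01": 40, "2019-11-02": 35}
print(expected_daily_events(1.5))
print(days_below_expected(counts, 1.5))
```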

awight added a comment. Nov 7 2019, 9:34 AM

Oh. It must be something simple in my query, because I see plenty of data in Hadoop.

Change 549437 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/reportupdater-queries@master] Fix another nonexistent field

https://gerrit.wikimedia.org/r/549437

Change 549437 merged by Mforns:
[analytics/reportupdater-queries@master] Fix another nonexistent field

https://gerrit.wikimedia.org/r/549437

The dashboard is ready for review. One thing I feel uneasy about is my decision to calculate the reference views for Reference Previews (RP) like this:

reference views per pageview = reference popups rendered per pageview + footnote clicks per pageview.

The two terms on the right-hand side are rates calculated from different sources, each with its own pageview denominator. Adding them together widens and complicates the error margins and has other mathematical implications that I don't fully understand.
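To make the error-margin concern concrete, here is a minimal sketch. It assumes each term is a Poisson count divided by its own stream's pageview count, and that the two streams are independent. The independence assumption is doubtful, since the same readers appear in both streams. Under those assumptions the variances simply add. The example numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Rate:
    events: int     # e.g. reference popups rendered, or footnote clicks
    pageviews: int  # pageviews recorded in the same stream as `events`

    @property
    def value(self) -> float:
        return self.events / self.pageviews

    @property
    def stderr(self) -> float:
        # Poisson approximation: Var(events) ~= events, so SE = sqrt(events) / pageviews.
        return math.sqrt(self.events) / self.pageviews


def summed_rate(a: Rate, b: Rate) -> Tuple[float, float]:
    """Sum of two per-pageview rates and its standard error.

    Assumes the two streams are independent, so the covariance term is dropped.
    """
    value = a.value + b.value
    stderr = math.sqrt(a.stderr ** 2 + b.stderr ** 2)
    return value, stderr


# Hypothetical counts: popups from one stream, footnote clicks from another.
popups = Rate(events=400, pageviews=10_000)
footnote_clicks = Rate(events=100, pageviews=4_000)
value, stderr = summed_rate(popups, footnote_clicks)
print(f"{value:.4f} +/- {stderr:.4f}")
```

Because the denominators differ, the sum is not a rate over any single set of pageviews. If the streams are positively correlated, the true error margin is larger than this estimate.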

Another thing that bothers me is that the pageview count differs dramatically between the Cite and Popups metrics streams: across various wikis, Popups records about 2.5 times as many pageviews as Cite. This was unexpected. I would have thought the ratio would be close to 1:1 on wikis with Popups enabled, and that Popups would be lower overall because many wikis don't have it enabled. My assumptions were wrong, and I haven't been able to explain the difference. Maybe the Popups tracking code is executed multiple times per pageview, in which case our "per pageview" stats are much lower than they should be.

This is concerning enough that I need to reopen the child task.
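One way to test the multiple-execution hypothesis is to compare the raw Popups pageview events with the number of distinct pageviews they represent. The sketch below assumes each event carries some per-pageview identifier. The `pageview_token` field name is hypothetical and not taken from the actual schema.

```python
from typing import Iterable, Mapping


def duplication_factor(
    events: Iterable[Mapping[str, str]],
    token_field: str = "pageview_token",  # hypothetical field name
) -> float:
    """Average number of logged events per distinct pageview."""
    event_list = list(events)
    distinct = {event[token_field] for event in event_list}
    if not distinct:
        raise ValueError("no events")
    return len(event_list) / len(distinct)


def corrected_rate(measured_rate: float, factor: float) -> float:
    """Undo an inflated denominator: if each real pageview was logged
    `factor` times, the measured per-pageview rate is too low by that factor."""
    return measured_rate * factor


# Hypothetical sample: 5 events covering 2 distinct pageviews.
sample = [{"pageview_token": t} for t in ["a", "a", "b", "b", "b"]]
factor = duplication_factor(sample)
print(factor)
print(corrected_rate(0.02, factor))
```

A factor near 2.5 would support the duplicate-execution explanation. A factor near 1 would rule it out.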