Understand current referencing behavior as baseline for ReferencePreviews
Closed, Resolved · Public · 5 Estimated Story Points

Description

Motivation
This ticket is for creating a baseline we can compare ReferencePreviews data against. Maybe this data already exists :)

Acceptance Criteria

How often do people click on footnote indicators relative to the pages being opened? E.g. On average, there were 0.03 clicks on a footnote indicator per page opened where Reference Previews was NOT deployed.
How often do people click on a link in the reference if there is no reference previews enabled? (We should compare that number with the sum of clicks on links in the references pop up and clicks on links in the references section with beta feature enabled)

Preliminary outcome

For the first day of data, with N=13,570 we have:

0.006 footnote clicks per pageview.
0.003 reference content clicks per pageview

Details

	Subject	Repo	Branch	Lines +/-
	New reports for Reference Previews	analytics/reportupdater-queries	master	+206 -0
	Baseline reference interaction tracking	mediawiki/extensions/Cite	master	+65 -4

Related Objects

Status	Assigned	Task
Open	None	T233108 Basic dashboards for Reference Previews tracking
Resolved	thiemowmde	T234605 Investigate if Cite ResourceLoader module for logging should be merged
Resolved	thiemowmde	T231529 Understand current referencing behavior as baseline for ReferencePreviews

Event Timeline

Lea_WMDE created this task. Aug 29 2019, 9:31 AM

Restricted Application added a subscriber: Aklapper. Aug 29 2019, 9:31 AM

Lea_WMDE mentioned this in T214493: Track interaction with ReferencePreviews. Aug 29 2019, 9:31 AM

Lea_WMDE moved this task from Backlog to In preparation on the Reference Previews board.

Should probably use EventLogging as the backend. Sampling might be necessary to keep the volume manageable.

Tracking outbound links will slow down the user's navigation, because we have to wait until our metrics callback completes before the page can unload. Sampling matters especially there.

Look for precedents. Who do we ask? WMF Reading? Analytics?

We also need to track outbound clicks in T214493, so perhaps we split that feature out.

We're probably going to implement this in the Cite extension.

awight moved this task from In preparation to Ready for pickup on the Reference Previews board. Sep 17 2019, 12:44 PM

awight added a parent task: T233108: Basic dashboards for Reference Previews tracking. Sep 17 2019, 12:56 PM

awight added a project: WMDE-QWERTY-Sprint-2019-09-10. Sep 18 2019, 7:46 AM

In T231529#5499042, @awight wrote:

Since tracking outbound links will slow down the user's navigation, especially sample there. We have to wait until our metrics callback completes.

It turns out there is an industry-standard thing for this: https://developer.mozilla.org/en-US/docs/Web/API/Beacon_API

EventLogging supports beacons and Popups provides a nice wrapper. To illustrate:

Popups/src/getPageviewTracker.js:  const url = evLog.makeBeaconUrl( payload );
Popups/src/getPageviewTracker.js:  sendBeacon( url );

The trade-off is that our payload is delivered in the URL itself rather than in a POST body, so the schema fields and their values will have to fit into fewer than 2,000 characters.
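
One rough way to check this size constraint before settling on a schema is to serialize a representative event and measure the resulting URL. The sketch below is written in Python rather than the extension's JavaScript. It assumes the payload travels as percent-encoded JSON in the query string. The base URL, the capsule layout, and the sample field values are all placeholders, not the actual EventLogging wire format.

```python
import json
from urllib.parse import quote

MAX_URL_CHARS = 2000  # limit mentioned in the comment above
BEACON_BASE_URL = "https://example.org/beacon/event"  # hypothetical endpoint


def beacon_url(capsule: dict) -> str:
    """Serialize the event as compact JSON and append it as the query string."""
    compact_json = json.dumps(capsule, separators=(",", ":"))
    return f"{BEACON_BASE_URL}?{quote(compact_json, safe='')}"


def fits_in_url(capsule: dict) -> bool:
    """True if the encoded event stays under the URL length budget."""
    return len(beacon_url(capsule)) < MAX_URL_CHARS


# Hypothetical event; the field names are illustrative only.
sample_event = {
    "schema": "ReferencePreviewsBaseline",
    "event": {"action": "clickedFootnote", "referencePreviewsEnabled": False},
}

if __name__ == "__main__":
    print(len(beacon_url(sample_event)), fits_in_url(sample_event))
```

Percent-encoding inflates every quote, brace, and colon to three characters, so the budget is tighter than the raw JSON length suggests. That favours short enum-like values over free text.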

Thiemo found this relevant deprecation task for the Popups instrumentation: T193051: Remove all page previews instrumentation code

There's a place in Schema:Popups for "reference" events, but it's unused. Here's a breakdown of the actual values:

select event_previewType, count(*) from Popups_16364296 group by event_previewType;
+-------------------+-----------+
| event_previewType | count(*)  |
+-------------------+-----------+
| NULL              | 291380619 |
| generic           |     60583 |
| page              |  19836700 |
+-------------------+-----------+

awight mentioned this in T193051: Remove all page previews instrumentation code. Sep 20 2019, 9:29 AM

EventLogging doesn't support a guaranteed-beacon mode, i.e. browsers without beacon support will execute the <img> fallback logic, which does block the page unload when clicking on an outbound link. I would prefer to skip non-beacon browsers; we should discuss this as a team.

Meanwhile, I ran a query to estimate how many non-beacon browsers we see in eventlogging data:

select
    count(*),
    http_method
from webrequest
where
    uri_path = '/beacon/event'
    and webrequest_source = 'text'
    and year = 2019
    and month = 9
    and day = 19
group by
    http_method;

682     HEAD
8154329 GET
109671810       POST

8154329 / (8154329 + 109671810)
= .069

Crudely conflating eventlogging request counts with unique users, this would mean that c. 7% of our users don't have beacon support. This is a big number: we need to either run the fallback code for them, or create a new metric to compensate, for example by logging "no beacon" on page load.
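
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch. As in the comment, it assumes that GET requests to /beacon/event come from the <img> fallback and POST requests come from beacon-capable browsers, and it ignores the handful of HEAD requests.

```python
# Request counts for uri_path = '/beacon/event' on 2019-09-19,
# copied from the query output above.
requests_by_method = {"HEAD": 682, "GET": 8154329, "POST": 109671810}


def non_beacon_share(counts: dict) -> float:
    """Share of GET among GET + POST requests.

    Assumption: GET approximates the <img> fallback and POST approximates
    sendBeacon, so this is a proxy for requests, not for unique users.
    """
    get, post = counts["GET"], counts["POST"]
    return get / (get + post)


share = non_beacon_share(requests_by_method)

if __name__ == "__main__":
    print(f"{share:.3f}")  # 0.069
```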

Here's a draft schema which should cover our needs for both the baseline and ReferencePreviews metrics:
https://meta.wikimedia.org/wiki/Schema:ReferencePreviews

I'll try a naive implementation in Cite.

Change 538261 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/Cite@master] [WIP] Baseline reference interaction tracking

https://gerrit.wikimedia.org/r/538261

gerritbot added a project: Patch-For-Review. Sep 20 2019, 1:08 PM

awight claimed this task. Sep 23 2019, 10:10 AM

awight moved this task from Doing to Review on the WMDE-QWERTY-Sprint-2019-09-10 board. Sep 23 2019, 2:23 PM

awight moved this task from Ready for pickup to Doing on the Reference Previews board.

awight updated the task description. Sep 24 2019, 1:57 PM

awight added a project: WMDE-QWERTY-Sprint-2019-09-25. Sep 25 2019, 12:59 PM

thiemowmde moved this task from Sprint Backlog to Review on the WMDE-QWERTY-Sprint-2019-09-25 board. Sep 25 2019, 1:00 PM

awight removed awight as the assignee of this task. Sep 25 2019, 3:13 PM

Waiting on code review, although we seem to be stuck in an "impossible problems of antiquity" loop. I'm really not sure how to respond to recent reviews.
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Cite/+/538261/

Change 538261 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] Baseline reference interaction tracking

https://gerrit.wikimedia.org/r/538261

awight mentioned this in rECITc12150082c4f: Baseline reference interaction tracking. Oct 4 2019, 9:45 AM

thiemowmde mentioned this in T234605: Investigate if Cite ResourceLoader module for logging should be merged. Oct 4 2019, 9:48 AM

thiemowmde added a parent task: T234605: Investigate if Cite ResourceLoader module for logging should be merged.

ReleaseTaggerBot added a project: MW-1.35-notes (1.35.0-wmf.1; 2019-10-08). Oct 4 2019, 10:00 AM

Maintenance_bot removed a project: Patch-For-Review. Oct 4 2019, 10:10 AM

awight moved this task from Review to Done on the WMDE-QWERTY-Sprint-2019-09-25 board. Oct 7 2019, 8:40 AM

awight moved this task from Done to Watching on the WMDE-QWERTY-Sprint-2019-09-25 board. Oct 7 2019, 10:17 AM

awight claimed this task. Oct 7 2019, 10:22 AM

This will be deployed with T214493 in this week's train, so we should monitor client metrics and check initial data for health.

thiemowmde added a project: WMDE-QWERTY-Sprint-2019-10-09. Oct 9 2019, 2:10 PM

thiemowmde moved this task from Sprint Backlog to Watching on the WMDE-QWERTY-Sprint-2019-10-09 board. Oct 9 2019, 2:10 PM

We have a few hours of ReferencePreviewsBaseline in the hadoop event store, shaped like this:

select
  event.action as action,
  count(*)
from referencepreviewsbaseline
where
  year=2019
  and month=10
group by event.action;

action  _c1
pageview        194

That represents 194,000 pageviews, with few enough (roughly speaking, < 0.5%) reference interactions that none have been logged yet. All had referencePreviewsEnabled = false, unsurprisingly.

Everything looks good, let's wait for group2 deployment before increasing the sampling.

Looks like this sampling rate will be fine, we have enough data to start guessing:

clickedFootnote 83
clickedReferenceContentLink     44
pageview        13571

We even caught one person with reference previews enabled:

false   13697
true    1

I don't know how to estimate error margins, so I'll just present the naïve math with our small sample. I've removed the one sample with referencePreviewsEnabled = true (it's just a pageview), which leaves 13,570 pageviews.

83 / 13 570 = 0.006 footnote clicks per pageview.
44 / 13 570 = 0.003 reference content clicks per pageview
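
For a first idea of the error margins on these ratios, one common choice is a Wilson score interval for a binomial proportion. The sketch below is not part of the original analysis. It assumes each sampled pageview is an independent trial that either did or did not produce a click, which is only an approximation because one pageview can produce several clicks. The function name and the 95% level (z = 1.96) are choices made here.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    )
    return centre - half_width, centre + half_width


PAGEVIEWS = 13570  # pageviews with Reference Previews disabled, from above
clicks = {"clickedFootnote": 83, "clickedReferenceContentLink": 44}

rates = {action: n / PAGEVIEWS for action, n in clicks.items()}
intervals = {action: wilson_interval(n, PAGEVIEWS) for action, n in clicks.items()}

if __name__ == "__main__":
    for action in clicks:
        low, high = intervals[action]
        print(f"{action}: {rates[action]:.4f} (95% CI {low:.4f} to {high:.4f})")
```

With counts this small the intervals are wide relative to the point estimates, so any comparison against Reference Previews data should use the intervals rather than the rounded rates.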

awight updated the task description. Oct 11 2019, 8:39 AM

awight moved this task from Watching to Done on the WMDE-QWERTY-Sprint-2019-10-09 board. Oct 11 2019, 8:55 AM

Change 542419 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/reportupdater-queries@master] New report for Reference Previews

https://gerrit.wikimedia.org/r/542419

gerritbot added a project: Patch-For-Review. Oct 12 2019, 9:57 PM

We should slice this data by wiki, since reference usage probably varies between projects. If this is the case, then we would have to normalize or otherwise consider the site-specific baseline when measuring the impact of Reference Previews.

Slicing using reportupdater's explode_by does what we want, but doesn't come for free: the query has to be run for every wiki. My first reaction is that we should choose a small number of wikis to include in our analysis, for now.

In T231529#5570375, @awight wrote:

Slicing using reportupdater's explode_by does what we want, but doesn't come for free: the query has to be run for every wiki.

I'm moping about this... We could also tack a "group by" onto the query, but I don't think the results can be easily mapped for reportupdater.

Well. There is indeed a huge variation between wikis, at least one order of magnitude. I won't paste the summary yet, this feels like something that should be treated more carefully and the math should be right before publishing.

For our goals, what this means is that we do need to analyze the impact of Reference Previews on each wiki independently.
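
A per-wiki baseline could be computed from rows shaped like the output of a query grouped by wiki and action. The sketch below only illustrates the normalization step. The (wiki, action, count) row layout is an assumption, and the two wikis and their counts are made up for the example. They are not results from this task.

```python
from collections import defaultdict


def per_wiki_rates(rows):
    """Turn (wiki, action, count) rows into {wiki: {action: clicks per pageview}}."""
    counts = defaultdict(dict)
    for wiki, action, count in rows:
        counts[wiki][action] = counts[wiki].get(action, 0) + count

    rates = {}
    for wiki, by_action in counts.items():
        pageviews = by_action.get("pageview", 0)
        if pageviews == 0:
            continue  # no denominator for this wiki, so skip it
        rates[wiki] = {
            action: n / pageviews
            for action, n in by_action.items()
            if action != "pageview"
        }
    return rates


# Hypothetical counts for two made-up wikis, NOT real data.
example_rows = [
    ("wiki_a", "pageview", 1000),
    ("wiki_a", "clickedFootnote", 10),
    ("wiki_b", "pageview", 2000),
    ("wiki_b", "clickedFootnote", 2),
]

example_rates = per_wiki_rates(example_rows)

if __name__ == "__main__":
    for wiki, by_action in sorted(example_rates.items()):
        print(wiki, by_action)
```

Keeping the rates keyed by wiki makes it straightforward to compare each wiki's Reference Previews numbers against its own baseline rather than against a global average.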

Change 542419 merged by Mforns:
[analytics/reportupdater-queries@master] New reports for Reference Previews

https://gerrit.wikimedia.org/r/542419

awight mentioned this in rARPQ6c94f07eb6cd: New reports for Reference Previews. Oct 31 2019, 3:58 PM

Maintenance_bot removed a project: Patch-For-Review. Oct 31 2019, 4:10 PM

thiemowmde added a project: WMDE-TechWish-Maintenance. Jan 19 2022, 11:23 AM

thiemowmde moved this task from Incoming to Analytics on the WMDE-TechWish-Maintenance board. Jan 20 2022, 3:33 PM

Removing task assignee due to inactivity, as this open task has been assigned for more than two years. See the email sent to the task assignee on February 06th 2022 (and T295729).

Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.

If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".

Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.

thiemowmde closed this task as Resolved. Mar 14 2022, 9:49 AM

thiemowmde claimed this task.
