
Most common Wikipedia Preview clickthrough path on computers not instrumented
Closed, Resolved · Public

Description

Background

I just revisited T313273 because I kept suspecting that the problem I found might not fully explain the weird data.

It turns out we still see the weirdly imbalanced data even when we exclude diff.wikimedia.org and wikimediafoundation.org, the only sites that would be significantly affected by the bug in tracking non-mainspace previews. The implied clickthrough rates are much more plausible than before, but it's still strange that they differ so much between touch and non-touch devices.

device_type   previews   pageviews   implied_clickthrough_rate
touch              741         120   16%
non-touch        12792         102   0.80%

(Numbers are from the past 4 weeks, filtering out the Wikimedia sites and known test sites.)

This effect persists even when you look at individual sites:

device_type ➡   non-touch   non-touch   touch      touch
website ⬇       previews    pageviews   previews   pageviews
framablog.org10587316
lumion.pl65204624
stehn-online.de2071383
swa.co.id2254142
xpressenglish.com6874556

Problem

I'm pretty sure I know why this is happening. When you encounter a Wikipedia Preview–enabled link on mobile, you essentially have only one option for clicking through to Wikipedia: clicking on the "read more on Wikipedia" link in the preview, which results in a pageview properly tagged with our wprov tag.

However, when you encounter a Wikipedia Preview–enabled link on desktop, you have two options for clicking through: clicking the link itself, or hovering to open the preview and clicking its "read more on Wikipedia" link. The first is much easier, even if you've opened and read the preview, because your cursor is already over that link. But that route takes you to a URL without the wprov tag!

So the source of the weird data is probably that we are failing to record most of the pageviews on non-touch devices.
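To put a rough number on "most", one can ask what share of non-touch clickthroughs would be missing from the data if desktop readers actually clicked through at some assumed rate. This is a purely illustrative sketch: the function name is hypothetical, the two scenarios (same rate as touch, half the touch rate) are assumptions rather than findings, and only the preview and pageview counts come from the 4-week table.

```typescript
// Illustrative only: estimates the share of non-touch clickthroughs that
// carry no wprov tag, *if* the true clickthrough rate were assumedTrueRate.
function untrackedShare(observedRate: number, assumedTrueRate: number): number {
  if (assumedTrueRate <= 0) {
    throw new RangeError("assumedTrueRate must be positive");
  }
  // Clamp at 0: an observed rate above the assumed one implies nothing is missing.
  return Math.max(0, 1 - observedRate / assumedTrueRate);
}

// Counts from the 4-week table: tagged pageviews / previews.
const touchRate = 120 / 741;          // ≈ 0.162
const nonTouchObserved = 102 / 12792; // ≈ 0.0080

// Scenario A (assumption): desktop readers click through as often as touch readers.
const missingIfEqual = untrackedShare(nonTouchObserved, touchRate);
// Scenario B (assumption): desktop readers click through half as often.
const missingIfHalf = untrackedShare(nonTouchObserved, touchRate / 2);

console.log(missingIfEqual.toFixed(2)); // 0.95
console.log(missingIfHalf.toFixed(2));  // 0.90
```

Under either assumption, roughly 90 to 95% of non-touch clickthroughs would be untagged, which is consistent with "most"; a lower assumed desktop rate would shrink that share.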

Solution

The best solution is probably for the library to add the wprov tag to the in-page link once a pop-up opens. Waiting for the pop-up ensures that we don't count clicks on Wikipedia links that don't have Wikipedia Preview enabled, or clicks made before the pop-up has opened (although these are probably minor issues).
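A minimal sketch of the proposed fix, assuming the library can run a callback when a pop-up opens. The names `withWprov`, `onPreviewOpen`, and `PreviewLink` are hypothetical and do not come from the Wikipedia Preview library, and `wppw2` is only the example tag version suggested in this task. The link's `href` is rewritten only when its pop-up opens, so links without previews, and clicks made before the pop-up appears, keep their original untagged URL.

```typescript
// Hypothetical sketch, not the Wikipedia Preview library's real API.
// Any object with a writable `href` works; in a browser an
// HTMLAnchorElement has this shape.
interface PreviewLink {
  href: string;
}

// Return `href` with the `wprov` query parameter set to `tag`.
// Existing query parameters and any #fragment are preserved, and an
// existing wprov value is overwritten rather than duplicated.
function withWprov(href: string, tag: string): string {
  const url = new URL(href); // requires an absolute URL
  url.searchParams.set("wprov", tag);
  return url.toString();
}

// Hypothetical hook: call it only once the pop-up has actually opened.
function onPreviewOpen(link: PreviewLink, tag: string): void {
  link.href = withWprov(link.href, tag);
}

// Usage (the tag value is illustrative):
const link: PreviewLink = { href: "https://en.wikipedia.org/wiki/Cat#History" };
onPreviewOpen(link, "wppw2");
console.log(link.href); // https://en.wikipedia.org/wiki/Cat?wprov=wppw2#History
```

Using `URL.searchParams.set` instead of string concatenation keeps any existing query string or fragment intact and replaces a stale `wprov` value instead of appending a second one. In a browser, an anchor element's `href` property is already absolute, which `new URL` needs.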

Alongside this, I recommend we create a new "version" of the tag (e.g. wppw2) so we can track which sites have rolled out this and previous instrumentation fixes. Note, however, that this means updating the ETL job to capture the instrumentation version, which takes an annoying amount of work.

Event Timeline

@nshahquinn-wmf do you think the solution proposed above may need legal approval?

@SBisson I'm confident it doesn't need approval. It's just a minor tweak to our instrumentation and doesn't change the scope of our data collection.

SBisson triaged this task as Medium priority. Jan 26 2023, 6:25 PM
SBisson moved this task from Definition to Ready for Dev on the Inuka-Team (Kanban) board.


@nshahquinn-wmf are you suggesting that we completely replace wppw1 and wppw1t with wppw2 and wppw2t everywhere, or that we use wppw2 only for the special case where we alter the URL of the link on the page?


The first one: just change the 1 to a 2 in all the wprov values. That way, we can extract an "instrumentation version" field and know whether a particular set of stats is affected by this issue or not.
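If every wprov value carries its version number, an instrumentation-version field can be derived with a small parser. This is a hypothetical TypeScript sketch, not the actual ETL job. It assumes values follow the pattern `wppw<version>` with an optional trailing `t`, and that the `t` marks touch devices, which this task implies but does not state.

```typescript
// Hypothetical parser for Wikipedia Preview wprov values, e.g. "wppw1",
// "wppw1t", "wppw2", "wppw2t". Assumption: a trailing "t" marks touch devices.
interface WprovInfo {
  version: number; // instrumentation version, e.g. 1 or 2
  touch: boolean;
}

const WPROV_PATTERN = /^wppw(\d+)(t?)$/;

function parseWprov(value: string): WprovInfo | null {
  const match = WPROV_PATTERN.exec(value);
  if (match === null) {
    return null; // not a Wikipedia Preview tag
  }
  // An absent "t" matches the empty string, so touch is false.
  return { version: Number(match[1]), touch: match[2] === "t" };
}

// Usage:
console.log(parseWprov("wppw2t")); // { version: 2, touch: true }
console.log(parseWprov("wppw1"));  // { version: 1, touch: false }
console.log(parseWprov("other-tag")); // null
```

Anchoring the pattern at both ends keeps unrelated wprov values from being misread as Wikipedia Preview traffic; stats can then be grouped by `version` to separate data collected before and after the fix.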

hueitan subscribed.

✅ add the wprov tag to the in-page link once a pop-up opens
✅ new "version" of the tag (e.g. wppw2)