Page MenuHomePhabricator

ReadingDepth sometimes initialises before PageIssues leading to incorrect or missing ReadingDepth events
Closed, ResolvedPublic5 Estimated Story Points

Description

In T191532#4575856 @Ryasmeen noticed that on iOS Safari, a ReadingDepth event can be sent without issues-a_sample or issues-b_sample set.

A sample event is:

{
"event": {
"pageTitle": "Misandry",
"namespaceId": 0,
"skin": "minerva",
"isAnon": false,
"pageToken": "b3533106e2ca571cfe0e",
"sessionToken": "9a570f564eb6d4932fc6",
"action": "pageLoaded",
"domInteractiveTime": 2711,
"default_sample": true
},
"revision": 18201205,
"schema": "ReadingDepth",
"webHost": "reading-web-staging.wmflabs.org",
"wiki": "wiki"
}

Replication steps

Set the following

$wgWMEReadingDepthSamplingRate = 1; $wgMinervaABSamplingRate = 1;

Load a page with issues.
Inspect the ReadingDepth event and check that 2 sample fields are set (default and a or b)

Replication steps (QA steps)

Make sure configured like so:

$wgWMEReadingDepthSamplingRate = 1;
$wgMinervaABSamplingRate = 1;

Repeat with the following config (ask a developer to do this!):

$wgWMEReadingDepthSamplingRate = 1;
$wgMinervaABSamplingRate = 0;

Expected: ReadingDepth events will be sent with 1 sample field
Actual: NO ReadingDepth events are sent

Developer notes

This is a race condition. It appears possible if ReadingDepth is enabled in WikimediaEvents before the Minerva Page issues test.

It can be mitigated by delaying the pageLoaded event, however it's impossible to fix this without introducing a hard dependency on WikimediaEvents to Minerva's module or vice versa.

Maybe loading it during document.ready could help?

The user doesn't get opted into the ReadingDepth schema (e.g. the skin code needs to load before ext.wikimediaEvents). I think this is the problem.

The only way to remedy this is to make sure on Minerva, the Minerva entry point skins.minerva.scripts is loaded before initialising ReadingDepth (mw.trackSubscribe calls before a mw.track event will always be honored). We have to be careful that this does not impact Vector.

I've written a patch to deal with this situation.

Acceptance criteria / testing

  • ReadingDepth is working on the Vector skin
  • When ReadingDepth is disabled I see ReadingDepth events where bucket sample field is set.
  • When both ReadingDepth and wgMinervaABSamplingRate is enabled to 100% of users I see ReadingDepth events where bucket and default sample fields are set

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Jdlrobson moved this task from Incoming to Needs Prioritization on the Web-Team-Backlog board.

Sorry. Creating subtasks copies across the projects by default.

Jdlrobson renamed this task from In iOS Safari ReadingDepth for events in the default sample, it is not specified whether the event is also in the A/B test sample to ReadingDepth sometimes initialises before PageIssues leading to incorrect or missing ReadingDepth events.Sep 13 2018, 9:38 PM

Change 460444 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/WikimediaEvents@master] Ensure Minerva has initialised before loading and executing ReadingDepth

https://gerrit.wikimedia.org/r/460444

Change 461245 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Minerva is responsible for turning on WikimediaEvents itself

https://gerrit.wikimedia.org/r/461245

Change 461246 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/WikimediaEvents@master] Minerva will load WikimediaEvents itself

https://gerrit.wikimedia.org/r/461246

Change 460444 abandoned by Jdlrobson:
Ensure Minerva has initialised before loading and executing ReadingDepth

Reason:
We will take the other approach

https://gerrit.wikimedia.org/r/460444

Change 461246 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Minerva will load WikimediaEvents itself

https://gerrit.wikimedia.org/r/461246

Change 461245 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Minerva is responsible for turning on WikimediaEvents itself

https://gerrit.wikimedia.org/r/461245

Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)
Jdlrobson edited projects, added Product-QA; removed Patch-For-Review.

The code is live on staging
http://reading-web-staging.wmflabs.org/wiki/Pharmacovigilance?debug=true
Config is set to

$wgWMEReadingDepthSamplingRate = 1;
$wgMinervaABSamplingRate = 1;

@Jdlrobson: This issue is still happening for me on staging.

Ohhh whoops i didnt pull the wikimedia events patch to staging... fixing!!

Alright! Checked with multiple articles on staging with iOS Safari. Both issues-a_sample or issues-b_sample are correctly being set now.

[mediawiki/extensions/WikimediaEvents@master] Minerva will load WikimediaEvents itself
https://gerrit.wikimedia.org/r/461246

This seems rather messy to me and constitutes technical debt that will likely lead to bugs later on. Note that I am not disagreeing with this (temporary) direction. Perhaps it is "worth it" in the short term. The problems are:

  • This is a radical change to a component that many teams collaborate on. It is not owned by Reading. I expect that when someone considers making such a radical change, they perhaps discuss it first with others, given this is also incurring costs on them.
  • Where are the notes describing the benefit? In other words, do we know why X was too expensive to get Y quickly done?
  • Where is expected consequence and cost ("interest") of this debt, and is there a timeline for paying it off?

A few examples to think about:

  • Confusion/Debugging: The loading and executing of these campaigns now works differently.
  • Performance: Additional http requests with (probably) less effective compression.
  • Compatibility: Campaigns from other products and teams that used to load early during a page view well before the load event, now load as the last thing on the page, 2-3 seconds after the load event fires. I don't see an indication that compatibility here was considered. Nor whether it is expected to impact other team's ability to capture data from page views as effective as before.

Perhaps everything was considered and worked out fine, and it's just missing communication, in which case I'd welcome some clarification :)

Thanks!

zeljkofilipin raised the priority of this task from High to Unbreak Now!.Sep 24 2018, 11:05 AM
zeljkofilipin subscribed.

Changing priority to UBN since it's blocking the train this week.

Confused: why is this blocking this week's train? There is a (suboptimal/temporary/something) fix in place and merged. Is it just that this fix needs to make the train (it will if it's merged before tomorrow morning SF time)? Or does this need a backport to last week's deployed version?

Perhaps everything was considered and worked out fine, and it's just missing communication, in which case I'd welcome some clarification :)

I can't comment on the bigger picture but I did want to apologize for any confusion this patch caused.

From my perspective, we ended up with two approaches. The first patch hardcoded a Minerva ResourceLoader module and skin name into WikimediaEvents, and actually loaded Minerva from WikimediaEvents, and was +1'd by you. The second patch still hardcoded the skin name but not the module and conditionalized whether WikimediaEvents itself was loaded so I merged it. I apologize that this happened with insufficient discussion.

The key point here is the technical debt delivered to other teams, and the very signifiant performance and compatibility impact on code. The impact is actually worse than I feared. Below arrows point to the previous point (before this patch) and current for when WikimediaEvents JS executes.

export.png (1×2 px, 358 KB)

This can't go to production (hence blocking the train).

I'd recommend to revert first, and then address the below. It can always be backported if done before next week.

From my perspective, we ended up with two approaches. The first patch hardcoded a Minerva ResourceLoader module and skin name into WikimediaEvents, and actually loaded Minerva from WikimediaEvents, and was +1'd by you.

Yep, this first patch made no significant modification to the WikimediaEvents code, nor the loading of its code. The change just adds a few lines of code that conditionally lazy-loads some module defined in Minerva. LGTM.

I +1'ed it because I had been doing a performance review on the initial version of this and wanted to express my support. This patch wouldn't have needed my approval as it doesn't affect code besides your own, and doesn't significantly alter the performance characteristics of the extension, either.

The second patch still hardcoded the skin name but not the module and conditionalized whether WikimediaEvents itself was loaded so I merged it.

From a technical perspective, this alternate patch has no relation to the first. The change disables the load procedure of the WikimediaEvents extension as a whole, with another patch in Minerva to lazy-load it at run-time. That is a significant change, with the impact shown in the above screen capture.

I don't currently understand the problem outlined in this task, but the presumably intended outcome of changing ReadingDepth.js to execute after Minerva scripts can be solved in any number of ways. If you'd like help in that area, let me know.

Change 462536 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/WikimediaEvents@master] Revert "Minerva will load WikimediaEvents itself"

https://gerrit.wikimedia.org/r/462536

We can certainly revert it and talk about it some more.

The problem we are trying to solve is load order of ext.wikimedia.events and skins.minerva.scripts (the latter needs to load first). We want to do so in a way that doesn't add a hard dependency from Minerva to ext.wikimedia.events.

From my understanding, the original patch had the exact same problem - it made a mw.loader.using call to ensure Minerva's code loaded first, however it was scoped to just the ReadingDepth schema - possibly where the second approach is more controversial.

Change 462536 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Revert "Minerva will load WikimediaEvents itself"

https://gerrit.wikimedia.org/r/462536

Jdlrobson lowered the priority of this task from Unbreak Now! to Medium.Sep 24 2018, 9:52 PM

Okay, so I've merged the revert to stop this being a deployment blocker. I now want to focus on getting this problem fixed.
Let's start with a quick recap of the two approaches that were proposed here:

Approach 1: Ensure Minerva has initialised before loading and executing ReadingDepth (https://gerrit.wikimedia.org/r/460444 )
This original proposal changed ext.wikimediaEvents.readingDepth.js (a file inside ext.wikimediaEvents.events) to ensure skins.minerva.scripts loaded BEFORE ReadingDepth was enabled and the pageLoaded event sent.

Given both skins.minerva.scripts and ext.wikimediaEvents.events were already loaded in the page , this only changed how ReadingDepth was initialised. The code was already queued to load, it just switched the order.

The downsides of this approach was added complexity to ext.wikimediaEvents.readingDepth.js - the team when I pitched them this idea, did not like the way loading differed between skins.

A plus side of this approach was that is only impacting ReadingDepth

Approach 2: Minerva will load WikimediaEvents itself ( https://gerrit.wikimedia.org/r/462536 )
This was the merged approach.
This also ensured skins.minerva.scripts loaded BEFORE ReadingDepth was enabled and the pageLoaded event sent, however it did so with the added cost of delaying the running of this code to inside the first available mw.loader loop.

The team felt this was cleaner as they were worried that changes in Minerva (e.g. a rename of skins.minerva.scripts) might inadvertently/unexpectedly break this.

The difference with this approach was that ext.wikimediaEvents.events was not added to the page directly with addModules so will always be pulled down separately after the rest of the code/page has loaded. This change delays all event handling code inside ext.wikimediaEvents.events from running which may have unexpected effects. On hindsight, we might have limited the deferred to the ReadingDepth schema to minimise this performance impact.

next steps

I'm happy to restore approach 1 if that's non-controversial and now better accepted.
A modified approach of 2 which only touches ReadingDepth also appears feasible, if we are happy with there being an additional ResourceLoader module and delaying the ReadingDepth events from being
@Krinkle your expertise here would be appreciated. This is the only thing blocking us from running an A/B test so we're keen to find a solution.

Change 460444 restored by Jdlrobson:
Ensure Minerva has initialised before loading and executing ReadingDepth

https://gerrit.wikimedia.org/r/460444

(Thanks for the further explanation on blocking status, @Krinkle (and @Jdlrobson). Godspeed to you all on resolving! :) )

@Jdlrobson Thanks, makes sense to me. I'd prefer not to prescribe a solution, so I'll share some thoughts instead.

Approach 1 seems fine. I agree with you that it'd be nice to avoid having the same code (ReadingDepth.js) execute in one of two different ways (based on the current skin). But, I think Approach 2 would raise essentially the same concern (for WikimediaEvents as a whole). Ultimately, given the requirement "ReadingDepth.js should execute after a specific module in Minerva, but directly in all other skins", this might not be avoidable.

As for how execution differs in Approach 1, I'm not actually seeing the difference but maybe the patch changed? The current revision defers execution via mw.loader.using in all skins, both before/after. The only difference is that the list of modules to fetch and/or await, would now conditionally include one additional module to wait for ("minerva.scripts"), whereas the other modules (EL core and Schema) would be fetched still.

If you'd prefer to avoid referring to the Minerva module by name inside WikimediaEvents code, I can think of some alternatives:

  • The WikimediaEvents file could expose a public function that is called directly in non-minerva, but called by "minerva.scripts" otherwise. This would require to instead to refer WikimediaEvents function name from Minerva (similar to your Approach 2). To avoid a hard dependency, the Minerva code would check based on mw.loader.getState() whether WikimediaEvent is installed, and if present, await it via using() and then call the exported ReadingDepth function.
  • Or (similar to the above), WikimediaEvents/ReadingDepth could expose a public function, and Minerva would add a dependency between "minerva.scripts" and WikimediaEvents if WikimediaEvents is installed. This avoids the hard dependency, and keeps Minerva's JavaScript code simpler by not needing to use mw.loader. Instead it would only check if the ReadingDepth function is defined, and call it if so.

or how execution differs in Approach 1, I'm not actually seeing the difference but maybe the patch changed?

No you're right. The schema still needs to be loaded, so there is still JS to be loaded so no change there.

The WikimediaEvents file could expose a public function

This would still require some checking inside the client against Minerva, though wouldn't it, at which point it feels like approach 1 is cleaner ?
Say the public function is enableReadingDepth, we'd want to call that inside WikimediaEvents for all other skins.

Inside WikimediaEvents in a php hook somewhere

if ( $skin !== 'minerva' ) { $out->addModule('ext.wikimedia.events.readingDepthInit' ); }

Inside Minerva

mw.loader.using('ext.wikimedia.events.readingDepthInit' ).then( function () {
  mw.wikimediaEvents.initReadingDepth();
} );

I believe my first pass at this - https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaEvents/+/460444/ - although imperfect - is still the best solution to this problem.

Change 460444 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Ensure Minerva has initialised before loading and executing ReadingDepth

https://gerrit.wikimedia.org/r/460444

Change 463139 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/WikimediaEvents@wmf/1.32.0-wmf.23] Ensure Minerva has initialised before loading and executing ReadingDepth

https://gerrit.wikimedia.org/r/463139

Change 463139 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.32.0-wmf.23] Ensure Minerva has initialised before loading and executing ReadingDepth

https://gerrit.wikimedia.org/r/463139

Mentioned in SAL (#wikimedia-operations) [2018-09-26T23:23:44Z] <thcipriani@deploy1001> Synchronized php-1.32.0-wmf.23/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.readingDepth.js: SWAT: [[gerrit:463139|Ensure Minerva has initialised before loading and executing ReadingDepth]] T204144 (duration: 00m 57s)

I've swatted this change clearing out all the blockers for the A/B test. Nothing unusual with the rate of events but it's only live on mediawiki.org right now.

Note, it's possible that the number of events for ReadingDepth on Latvian will increase when the deploy happens tomorrow as anyone inside the PageIssues A/B test who has sendBeacon support is now guaranteed to also be in the ReadingDepth sample group.

Let me know if you see anything unusual with the deploy tomorrow.

In kickoff we determined that this could be closed based on the existing (code-related) AC, without an additional data analysis about the effects of this change.

No impact on ReadingDepth related to this.