EventLogging subscriber module in ready state but not sending tracked events
Closed, ResolvedPublic2 Story Points

Description

Despite being in a ready state, the ext.eventLogging.subscriber module isn't being executed and so events logged via the subscriber protocol aren't being logged.

Background

tl;dr: https://www.youtube.com/watch?v=KkgUhV4gvno

The Page Previews (PP) A/B test on enwiki defines three buckets. The on bucket contains 3% of all anonymous users. Those users receive the PP experience with its instrumentation turned on. While pairing on T175377: [Spike 4hrs] Verify EventLogging instrumentation/bucketing for the enwiki/dewiki A/B test, @Tbayer and I noticed that PP wasn't logging events on every pageview when I manipulated my way into the on bucket. There didn't seem to be any obvious differences between those pages where PP did and did not log events.

For those pages that did log events, we know that:

  • The ext.eventLogging.subscriber is in the ready state and loaded from Local Storage.
  • PP is trying to send events using the subscriber protocol.
    • We verified this by executing
mw.trackSubscribe( 'event.', ( topic, data ) => console.log( topic, JSON.stringify( data, null, 2 ) ) );

Browsers Affected

I've tested and confirmed this in:

  • Chrome (60.0.3112.113) on macOS Sierra (10.12.6)
  • Firefox (54.0.1) on macOS Sierra (10.12.6)

Hypotheses

  1. The ext.eventLogging.subscriber module script is being executed some time (a long time) after the PP module script.

@phuedx: mw.track stores messages in an in-memory data structure to guarantee that late subscribers are delivered all messages that have been sent by one or more producers.

QA Notes

NOTE: This task requires technical QA.
  1. Navigate to https://en.wikipedia.beta.wmflabs.org/wiki/Special:AllPages.
  2. Log out, if necessary.
  3. Ensure that you're in the on/control Page Previews A/B test bucket. Use the below handy snippet:
(function () {
	var experimentGroupSize = mw.config.get( 'wgPopupsAnonsExperimentalGroupSize' ),
		control = experimentGroupSize / 2;

	console.log(mw.experiments.getBucket( {
		name: 'ext.Popups.visibility',
		enabled: true,
		buckets: {
			off: 1 - experimentGroupSize,
			control: control,
			on: control
		}
	}, mw.user.sessionId() ));
}());
  1. Observe that a pageLoaded Popups EL event is logged.
  2. Interact with a link
  3. Observe that another Popups EL event is logged.
  4. Navigate to another page.
  5. Observe that a pageLoaded Popups EL event is logged.
  6. Interact with a link.
  7. Observe that another Popups EL event is logged.

Repeat steps 7-10 a couple of times to prove that the subscriber is always being subscribed.

phuedx created this task.Sep 14 2017, 1:32 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 14 2017, 1:32 PM
Restricted Application added a project: Analytics. · View Herald TranscriptSep 14 2017, 1:32 PM
phuedx renamed this task from EventLogging subscriber in ready state but not sending tracked events to EventLogging subscriber module in ready state but not sending tracked events.Sep 14 2017, 1:33 PM
phuedx updated the task description. (Show Details)Sep 14 2017, 3:29 PM
phuedx updated the task description. (Show Details)Sep 14 2017, 4:01 PM

Just a quick note that I have been able to reproduce this in Chrome/Chromium 60 on Ubuntu Linux:

  1. Viewing https://de.wikipedia.org/wiki/Malaysia-Airlines-Flug_370 , events were being sent, including this one
  2. After clicking through to https://de.wikipedia.org/wiki/Golf_von_Thailand , no events are being sent from that page
  3. After following the link there to https://de.wikipedia.org/wiki/Nakhon_Si_Thammarat_(Provinz) , events were being sent again, e.g. this one
ovasileva triaged this task as High priority.Sep 19 2017, 4:34 PM
ovasileva moved this task from To Triage to Needs Analysis on the Readers-Web-Backlog board.
mpopov added a subscriber: mpopov.Sep 19 2017, 8:10 PM
phuedx claimed this task.Sep 21 2017, 12:13 PM

I'm going to take the time to pin down the root cause…

I can reproduce this in my local development environment very easily.

I suspect that this has been missed in development environments because developers have been end-to-end testing their EL implementation:

  • With ?debug=true set in the URL, which forces the ResourceLoader (RL) to fetch all modules from the server.
  • Whilst logged in.
  • With $wgResourceLoaderStorageEnabled = false; in LocalSettings.php.
    • Note well I can't reproduce this issue with the above in LocalSettings.php.

Additional notes:

  1. I can reproduce this issue with both jQuery@1.11.3 and jQuery@3.2.1.
phuedx added a subscriber: Krinkle.EditedSep 21 2017, 1:31 PM

I believe this is an instance of the late subscriber problem. If all of the page's resources have already been cached by the RL, then window.onload is fired before the subscriber in the ext.eventLogging.subscriber module is wired up. Since window.onload is an event rather than a state notification, the handler is never executed. This is backed up by [0].

Given the design of the module – that the logging of events is low priority and shouldn't impact the loading of resources – I would recommend changing from $( window ).one( "load", ... ) to mw.requestIdleCallback. Note well that this also avoids the Firefox-only issue of multiple load events being triggered by the BFCache (see T170018).

@Krinkle: Do you see any issue with this approach?


[0]
load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:140 JQMIGRATE: jQuery(window).on('load'...) called after load event occurred
migrateWarn @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:140
jQuery.event.add @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:147
(anonymous) @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:65
each @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:4
each @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:2
on @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:65
one @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:74
(anonymous) @ VM396:38
(anonymous) @ VM396:38
runScript @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:162
checkCssHandles @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:162
execute @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:163
implement @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:169
(anonymous) @ VM396:38
DOMEval @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:1
globalEval @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:4
(anonymous) @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:166
requestIdleCallback (async)
asyncEval @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:166
work @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:167
enqueue @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:164
load @ load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=0xrpoun:171
(anonymous) @ European_mink:10
startUp @ load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector:26
script.onload.script.onreadystatechange @ load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector:27

Change 379532 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/EventLogging@master] subscriber: Wait until browser idle before logging

https://gerrit.wikimedia.org/r/379532

@ovasileva: FYI I believe that I've figured out the problem here and have submitted a suggested fix (see the above). I'd appreciate a review from a member of the team but would also like to wait until we've received feedback from @Krinkle (or someone who understands the ResourceLoader module loading process better than I do! 🙂 ).

phuedx set the point value for this task to 2.

Large 1 or a small 2, mostly because of the risk involved in changing the way in which the subscriber in ext.eventLogging.subscriber is subscribed.

fdans moved this task from Incoming to Radar on the Analytics board.Sep 21 2017, 4:23 PM
fdans edited projects, added Analytics-Kanban; removed Analytics.

Change 378804 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/EventLogging@master] ext.eventLogging.subscriber: Don't wait for "load" if it already happened

https://gerrit.wikimedia.org/r/378804

Nuria moved this task from Next Up to Done on the Analytics-Kanban board.Sep 21 2017, 10:17 PM

Change 379532 abandoned by Phuedx:
subscriber: Wait until browser idle before logging

Reason:
A more performant fix is here: I774ea688

https://gerrit.wikimedia.org/r/379532

Change 378804 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] ext.eventLogging.subscriber: Don't wait for "load" if it already happened

https://gerrit.wikimedia.org/r/378804

phuedx updated the task description. (Show Details)Sep 22 2017, 9:29 AM

@bmansurov @pmiazga: This task requires technical QA /cc @ovasileva.

@bmansurov @pmiazga: This task requires technical QA /cc @ovasileva.

I'm onto it.

bmansurov updated the task description. (Show Details)Sep 22 2017, 4:43 PM
bmansurov updated the task description. (Show Details)

Macro votecat:  Baha©®™ approves
(Shame that meme wasn't available in a smaller size)

phuedx added a comment.EditedSep 22 2017, 5:08 PM

Wow… You might want to make that image a little smaller, @bmansurov!

Let's leave this open until Thursday, 28th so that we can verify this in production.

phuedx closed this task as Resolved.Sep 29 2017, 9:40 AM

🎉🎉🎉

Since the Page Previews A/B test has been disabled, I tested this with the ReadingDepth instrumentation. When I'd bucketed myself so that the instrumentation was enabled, I observed both pageLoaded and pageUnloaded ReadingDepth events being logged on all pages that I navigated to.

Tbayer added a comment.Oct 4 2017, 5:35 PM

Great work! Can someone summarize which schemas were affected (as far as known)?

Also, can we spell out in more detail how (if at all) the fix changes the logging method for those pageviews in a session that had not been affected by the bug?

Can someone summarize which schemas were affected (as far as known)?

At the very least, all of the instrumentation that Readers Web has implemented. Note well that this only affected the client-side instrumentation, so, for example, disabled Popups events for logged in users weren't affected. We can do an audit of other instrumentation implemented in the WikimediaEvents, if you'd like?

Also, can we spell out in more detail how (if at all) the fix changes the logging method for those pageviews in a session that had not been affected by the bug?

The fix will have no effect on the logging method for those pageviews unaffected by the bug.

Tbayer added a comment.Oct 6 2017, 7:21 PM

Can someone summarize which schemas were affected (as far as known)?

At the very least, all of the instrumentation that Readers Web has implemented. Note well that this only affected the client-side instrumentation, so, for example, disabled Popups events for logged in users weren't affected. We can do an audit of other instrumentation implemented in the WikimediaEvents, if you'd like?

That sounds very useful, yes. (We could check schemas appearing in these search results as well on whether they used WikimediaEvents.). It might also be valuable to post a heads-up on Analytics-l. - For our current work in Readers Web, I think the possible impact on Schema:Print and Schema:RelatedArticles is particularly relevant. I know @ovasileva and others are already aware of this, but I'll leave a note on the talk pages of these schemas too, for the record.

It might also be valuable to post a heads-up on Analytics-l.

Done.