Page MenuHomePhabricator

Events being lost in Chrome when navigating to an external URL
Closed, ResolvedPublic

Description

Chrome seems to be dropping events triggered on external links. This only seems to be a problem in Chrome. IT does not occur in Firefox.

replication steps

  1. Visit https://en.wikipedia.org in Chrome
  1. Run the following code in the console to add a link to the top of the page that triggers an event:
$('<a>').attr('href', 'https://wikimedia.org').on( 'click', function () { mw.eventLog.logEvent({ name: 'test' }) }).text('Click me!').prependTo('#bodyContent')
  1. Make sure the network tab preserves events.
  2. Click the link.

Expected: At some point the event an event should show up in the network tab.
Actual: No event ever shows

Note: If I change the URL to be relative this problem does not occur (in fact usually it sends instantly):

$('<a>').attr('href', '/test').on( 'click', function () { mw.eventLog.logEvent({ name: 'test' }) }).text('Click me!').prependTo('#bodyContent')

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Hm, I see the event when I prevent default, as in:

$('<a>').attr('href', 'https://wikimedia.org').on( 'click', function (e) { mw.eventLog.logEvent({ name: 'test' }); e.preventDefault(); }).text('Click me with prevent!').prependTo('#bodyContent')

As well as when I open the link in a new tab, as in:

$('<a>').attr('href', 'https://wikimedia.org').attr('target', '_new').on( 'click', function () { mw.eventLog.logEvent({ name: 'test' }); }).text('Click me new tab!').prependTo('#bodyContent')

But, as you report, in Chrome, if I let the navigation happen, the event is never fired. In Firefox, it is. Clearly something is not quite right here:

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/EventLogging/+/404ee772b8d5035f293e27cf73c3f661a068b5f7/modules/ext.eventLogging/BackgroundQueue.js#60

This is tricky to debug because breakpoints don't fire in neither Firefox nor Chrome. But you can clearly see the event on the network tab in Firefox. I remember a longer discussion about this when we were working on the queue task, I'll have to get that context again.

But, as you report, in Chrome, if I let the navigation happen, the event is never fired. In Firefox, it is. Clearly something is not quite right here:
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/EventLogging/+/404ee772b8d5035f293e27cf73c3f661a068b5f7/modules/ext.eventLogging/BackgroundQueue.js#60

This part seems ok actually, it doesn't seem like this is opting Chrome out of anything.

I tested it in Vagrant real quick and it looks like when reproducing the bug, Chrome is not firing visiblitychange or pagehide or unload, but it will fire beforeunload if you modify the EventLogging code (change this line to use beforeunload and see), and will send the event as desired. Not that this is a good fix necessarily -- something weird is going on here for sure.

According to the Chrome Page Lifecycle API, beforeunload should be intermediate between visibilitychange and pagehide -- at least a visibilitychange event should have been fired, but it isn't.

By contrast, Firefox is flushing the queue on the pagehide event, and after that fires visibilitychange.

I know in the past this has been an issue between vendors -- "does a page become hidden before unloading?" I thought from Chrome's documentation that their stance was yes, it does become hidden (and hence there should be a visibilitychange event), but it looks like it isn't doing that.

The other possibility could be that it's somehow not allowing execution to complete on any event besides beforeunload. They do treat beforeunload with special care compared to the other events, but at the same time their usual line is that visibilitychange is the last reliable time to execute code.

Yet another thing to keep in mind is that the very act of binding a handler to unload or beforeunload makes small changes in the way the page lifecycle functions with regards to caching for back and forward navigation. I don't know if this has anything to do with what's going on, but removing all *unload events didn't make Chrome magically work or anything.

Could it be that both browsers might be sending the request , it is just only visible in FF? Given that request is sent when page transition is a relative url ii think this might be an option

I tried:

$('<a>').attr('href', 'https://wikimedia.org').on( 'click', function () {navigator.sendBeacon('http://localhost:8888')}).text('Click me!').prependTo('#bodyContent')

And definitely chrome sends request to port

I think it will be of use to test that before digging further, can @Jdlrobson override the endpoint for eventlogging and send requests to some localhost (something like nc -l localhost 8888 on mac)

@Nuria that example works fine in both Chrome and Firefox - the issue is inside mw.eventLog.logEvent not sendBeacon which suggests the issue is within the EventLogging handler per @jlinehan theory.

that example works fine in both Chrome and Firefox

Right, i know.

I was trying to suggest that you do what you are doing to now while overriding the sendbeacon endpoint for EL such we can rule out the case in which the request might be sent but not displayed on the network tab. Makes sense?

I tested it in Vagrant real quick and it looks like when reproducing the bug, Chrome is not firing visiblitychange or pagehide or unload

Only when moving to a different domain? cause relative (full page) links seem to work. Seems super odd, but of course, nothing is impossible

Ok, i did tested with netcat in vagrant and I see the event being sent on this case:

$('<a>').attr('href', 'https://wikimedia.org').on( 'click', function () { mw.eventLog.logEvent({ name: 'test' }) }).text('Click me!').prependTo('#bodyContent')

but the events (despite being sent) are *not logged* on the network tab on the "cross-domain" case. So this is an issue with the logging to the network tab I think, rather than with firing of events. Now, probably someone should also confirm my findings to make sure I am not totally off.

What i did:

[nuriaruiz@nurieta][/workplace/vagrant/mediawiki/extensions/EventLogging]$ git diff . modules/ext.eventLogging/core.js
diff --git a/modules/ext.eventLogging/core.js b/modules/ext.eventLogging/core.js
index e79a3c7..56aee62 100644
--- a/modules/ext.eventLogging/core.js
+++ b/modules/ext.eventLogging/core.js
@@ -140,7 +140,7 @@
                 */
                makeBeaconUrl: function ( data ) {
                        var queryString = encodeURIComponent( JSON.stringify( data ) );
-                   return config.baseUrl + '?' + queryString + ';';
+                 return 'http://localhost:5000' + '?' + queryString + ';';
                },

and

nc -k  -l localhost 5000

Now, probably someone should also confirm my findings to make sure I am not totally off.

Yes, good, good, was able to reproduce this. On closer inspection with the performance tools in Chrome, I was able to recover the pagehide, visibilitychange, and unload events firing, but they fire after the response for the new page being navigated to has been received, but before that new page is parsed or loaded. In the logging UI at least, it seems that these events occur after the context has switched to the new domain. That is, pagehide etc for page A are happening after the context has switched to page B.

The handler and HTTP request in beforeunload are happening as a child of the click event that causes the navigation, as has probably been coded as a special case in Chrome. beforeunload takes place even prior to the request for the navigation.

ic1.png (1×1 px, 438 KB)

The handlers from pagehide or visibilitychange are executed later, after the navigation request has received a response. Depending on how Chrome is wired, their handlers may be executing in the context of that new domain or in some intermediate state.

ic2.png (1×1 px, 404 KB)

It seems possible as @Nuria says, that they at least become the victims of a bug or gap in the Chrome network logging code. Clearly they must be firing at least for the localhost case, because I can see the event with netcat as @Nuria points out, and I believe they are firing even in the base case (see UPDATE, below).

UPDATE:
Logging into vagrant with vagrant ssh and inspecting the meagre apache2 logs sudo tail -f /vagrant/logs/apache2.log, it is possible to see a proxy error being generated at around the time that the event should be firing. No such log entry is generated when directing sendBeacon to POST to localhost. So it seems it is in fact able to hit the wire and at least a bug exists in Chrome's logging?

Jul 22 23:28:17 vagrant apache2[11852]: [proxy:error] [pid 11852] (111)Connection refused: AH00957: HTTP: attempt to connect to 127.0.0.1:8100 (127.0.0.1) failed
Jul 22 23:28:17 vagrant apache2[11852]: [proxy_http:error] [pid 11852] [client 10.0.2.2:42308] AH01114: HTTP: failed to make connection to backend: 127.0.0.1, referer: http://dev.wiki.local.wmftest.net:8080/wiki/Main_Page

@Jdlrobson Please look at comment above: {T258513#6324815}. The event is being sent, but is is not logged in chrome network tab * in the case of the cross domain navigation*. Closing this ticket. You can verify this in your environment by changing the EL code (see changes on comment above) to post to a local server on the machine where you are testing.

That's reassuring to know. My understanding was @Mayakp.wiki was saying that she wasn't seeing events in the database itself that should have been sent. I've asked her to follow up today as I'm out sick.

@Jdlrobson Please do test overriding url and seeing whether event gets sent to a local port and let us know what you find either here or in irc (once you are well, of course)

Yeh I can confirm this works fine. It doesn't explain the dropped events @Mayakp.wiki was seeing though. I'm hoping however these were due to a miscommunication/misunderstanding.

It's a good learning opportunity and I'm going to advise my team not to rely on the network tab for testing these sorts of things in future and to instead focus on the database. If I learn more about the problem Maya was reporting I'll report back.

I'm going to advise my team not to rely on the network tab for testing these sorts of things in future and to instead focus on the database

invalid events will not make it to the DB. If things do not appear on network tab a local server is best to asses they have been sent. netcat is normally always installed.

On July 23 @Edtadros helped fire many events to external links on Testwiki to check on this issue and I can confirm that all events have made it to the database.
Posted initial results here
We will be checking on the test wikis listed here T258058 in the next 2 weeks.