Page MenuHomePhabricator

Page Previews could load less JS on pageload
Closed, ResolvedPublic3 Estimated Story Points

Description

Note that the following only affected loads where the modules aren't in cache yet, but that's a pretty common scenario.

Page Previews introduces an additional 18kb of compressed JS on pageload (a 6% increase). Upon inspection, it seems like the feature is loading all it needs on pageload, including a shared dependency for which it happens to be the only feature requiring them on pageload (mediawiki.template.mustache). Details can be found in T175916

There is an existing best practice found in Media Viewer and VisualEditor, where the only code loaded during the pageview is the minimum amount of logic necessary to capture the user interaction with the feature. Clicking triggers the loading of the JS required for the actual functionality. The same can be done with Page Previews for hovering.

Now, the counter-argument to the above thinking is that Page Previews are used a lot. I've looked at the actual numbers and it seems like 70% of measured sessions result in a link hover interaction (including abandoned hovers, but excluding people who have turned the feature off). Which is why our position is that the content could still benefit from being preloaded, albeit after everything else happening on pageload, to avoid delaying the page's visual progress.

We suggest that:

  • JS/CSS of Page Previews should be broken up in 2 parts: one as small as possible loaded on pageload, which captures link interaction

Then either/and:

  • On first link hover detected by the new "bootstrap" module, load the needed JS/CSS. Here the "floor" of hovercard delay can be leveraged to both do the JS/CSS on-demand loading as well as loading the hovercard content right after, once the logic for that is loaded.
  • Schedule a preload of the remaining Page Previews JS/CSS (the bulk of it, really) 1-2 seconds after pageload.

If both are implemented, the load-on-hover merely acts as a fallback if the really-low-priority preloading of the JS/CSS hasn't completed by the time the user hovers a link.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Jdlrobson lowered the priority of this task from Medium to Low.Oct 11 2017, 4:36 PM

We need more data here to reach a decision.
@Gilles you mention "extra payload affects visual completion/SpeedIndex noticeably" however I'm not aware of a regression when we launched this on all but 2 wikis.
Although I appreciate the theory here, the work involved here to schedule a preload is pretty high risk, so this is a low priority task for us without knowing how this feature negatively impacts these metrics.

Did you bother to look at this link I provided in my last comment?
https://phabricator.wikimedia.org/T175916#3608031 How can you be "unaware of a regression related to the launch" after looking at this?

Sorry, I probably should have thought a bit more about my communication there, rather than firing off a comment during a conversation with my team.

Yes. We expected an increase in fully loaded time and saw this when we launched page previews and we felt this was justified given the anticipated usage. It's clear visual completion/SpeedIndex also increased from your comment but these numbers also seem low.

What's missing is guidelines on what is considered a regression from the performance perspective. To me the increase appears very low - please correct me if I'm wrong - SpeedIndex is 89ms which is tiny; and more importantly (at least seemingly to me) RUM was not impacted in any way, so I'm struggling to see a pressing need for us to do something here.

If we are to do this work we need to know what is being aimed for as acceptable from your point of view and weigh that against what is acceptable from a UX perspective.

Even if we implement the changes suggested there is a trade off either in discoverability of the feature (what if a user hovers over a link and we've introduced a 2 seconds delay in the load?) or performance. Regardless implementing these changes is going to be risky - it requires changing the architecture of code loading for something already in production.

Hope that's more helpful, just trying to dig into this request as much as possible.

We're talking about a 6.3% increase in SpeedIndex, that's nothing to sneeze at. Which these arguments it takes 3 teams who each consider their feature to be a special snowflake to get close to a 20% regression. The problem isn't the amount per se, but the fact that we can do something about it. In logical terms the bulk of that module doesn't need to be loaded and executed at that point in time.

There should be information in the data you collect about how long it takes for people to hover their first link, which should inform how much we can afford to delay the preloading to get out of the way of the initial page render. We might not need to delay it that much to get out of the module parsing/execution dogpiling happening on initial page render.

If done well, this should be inconsequential for most, if not all users. Remember that you have a floor delay for triggering the hovercard, which can be extra buffer time to load the JS/CSS. I.e. if by the time you hit the main logic that was just downloaded, you realize that the preload happened on hover, you can deduct the time the preload took from the initial delay you would have otherwise applied. Of course that delay is already crowded with the data load for people with a slow internet connections, but the cross-section of users with very slow internet connections, modules out of cache, pointing to a link very fast is probably small.

In addition to being able to make a good guess right now with the existing data, we can also verify very tightly how many users would get an extra delay on the first link they hover once the changes are deployed and check if our assumptions are incorrect. Rolling back to the status quo would be very simple, you'd just load the preload module along with the bootstrap one and skip the new preloading logic.

It's likely that the SpeedIndex issue is caused not so much by the network load of the content, but by the parsing/execution on the main thread at the same time as all other modules. You can wait for T176262: Consider adding mw.loader.preload / OutputPage::preloadModules() to be implemented, which will aim to provide a new preloading option, where you can load a module but not execute it right away. But you will still need to split your content into 2 modules, one that will capture the user action (hovering a link) and another that has the bulk of the logic. They'd both get loaded over the network as early as possible, but only the link hover detection will be executed on load (i.e. the JS is run, so that the hover events are captured). Only on the first hover will you "unpack" the big module and execute its JS on the main thread, loading its code. This technique should satisfy everyone in this situation. The data would be sent over the wire at the same time it is now, the SpeedIndex regression should go away, and in terms of UX the extra delay should be in the milliseconds (executing the JS of the big module), which if you really want to be thorough could be absorbed in the floor delay currently set up for the hovercard data.

@Jdlrobson To clarify, we're not asking to change popups to load (most) JS on-demand on the first use. We're recommending to not load it during the critical rendering path. You can still fire off an mw.loader.load() by default, and (given your data about how likely users will eventually hover a link), I would in fact recommend that you do indeed preload the module by default - just not by using addModules(), but by loading it client-side from your to-be-created lightweight init module.

So instead of one large module that does everything and it being loaded in the critical path (including various dependencies and CSS with embedded image binaries which get synchronously parsed), instead have a separate dependency-free init module (like VE, MMV and others have) that just handles the hover event and the preloading. E.g. basically just on hover => loader.using(rest).then =>popups.handle(event), and also unconditionally preload the rest of the code with loader.load() from a dom-ready/mw.requestIdleCallback-delayed callback. Thus making the loader.using() in the hover handler just a no-op in the common case. You could even replace the hover handler once loaded to remove the no-op using() call if you're worried about promises in event handlers.

Change 384600 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/Popups@master] Remove popups from critical rendering path

https://gerrit.wikimedia.org/r/384600

including a shared dependency for which it happens to be the only feature requiring them on pageload (mediawiki.template.mustache)

Just a note, this is being loaded by RelatedArticles, albeit not on pageload. Loading it on page load should actually improves the performance of this feature.

We're recommending to not load it during the critical rendering path

To make sure we are talking about the same thing I wrote a POC:
https://gerrit.wikimedia.org/r/384600

A few concerns with this:

  • If localStorage is not available or exhausted, will this not introduce an additional HTTP request?
  • I'm not convinced if it is necessary to have it also handle the hover event. If that happens it suggests that the user is hovering over a link on a slow connection. Maybe in this situation it's not better to overwhelm bandwidth with another HTTP request. What's your thinking there?
  • Anything we need to be wary of in such a rename to avoid JS errors in production code?
  • We need to check what kind of delay this can introduce to the showing of a card (worst case/average case).

Alternatives:

  • I had a look at the code and we could shave about 10kb (uncompressed) if we defer the loading of the settings panel which most users will not make use of (hopefully). I'm not sure if that is worth pursuing, but it seems to account for around 10% of the code.

We should have good interaction data to know how quickly users interact with links from the event logging no?

Maybe we can do some tests with the POC?

@Krinkle @Gilles is https://gerrit.wikimedia.org/r/384600 what you had in mind? if so we can take it from here and do some investigations on the user impact.

Niedzielski changed the task status from Open to Stalled.Oct 31 2017, 5:59 PM

I'm marking this stalled until the spike is resolved. Please unstall if you disagree.

Change 406837 had a related patch set uploaded (by Niedzielski; owner: Sniedzielski):
[mediawiki/extensions/Popups@master] Chore: add bundlesize test

https://gerrit.wikimedia.org/r/406837

Change 406837 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: add bundlesize test

https://gerrit.wikimedia.org/r/406837

ovasileva changed the task status from Stalled to Open.Feb 21 2018, 6:18 PM

My understanding of T186016#3990080 is that the majority of hovers are occurring within 0.5-2 seconds. If I'm not mistaken @Gilles and @Krinkle the proposal would introduce a similar timed delay to load the code. We can however avoid this by adding complexity and queuing events for later. Does this data change your opinions on how we should deal with this task?

My understanding of T186016#3990080 is that the majority of hovers are occurring within 0.5-2 seconds.

That's not correct - perhaps the cumulative version of the chart makes it clearer: T186016#4002923

That's not correct - perhaps the cumulative version of the chart makes it clearer: T186016#4002923

cumulative version:

Time to first link interaction - cumulative frequency (previews enabled, enwiki+dewiki, Dec 21 2017-Feb 7, 2018).png (554×744 px, 53 KB)

If I read and eyeball this chart correctly, it means that:

  • 0.4 ratio (40th percentile) of first Page Preview interaction: 5s. – 40% of views have a hover within 5s, 60% hover after 5s.
  • 20th percentile: 2.3s. – 20% hover within 2.3s, 80% hover after 2.3s.
  • 10th percentile: 1.5s. – 10% hover within 1.5s, 90% hover after 1.5s.
  • 3rd percentile: 1.0s. – 3% hover within 1.0s, 97% hover after 1.0s.

In other words, for 97% of page views, there was at least 1 second between the page being fully loaded (html, css, js), and the user first hovering a link. That should be plenty of time to lazy-load the popups code, which could actually start before the page is fully loaded, giving it actually more than a second to load.

For the <3% of page views where a hover happens within a second of pageloaded, there is still a good chance the popups code will arrive in full before the first hover, given that 1) the code fetch can start and end before pageloaded, and 2) on repeat views, it'll come from local storage without a network roundtrip.

For the unknown percentage of page views where a hover happens, and the module wasn't in cache, and took more than (firstHover - moduleFetchStart) to load, the preview will still work. The only difference will be that it will take slightly longer to appear (time to preview), given it'll first wait for the already-underway module to arrive.

To ensure a roll-out with good telemetry, one could add a (temporary) metric to the Popups schema that flags views that happened at a point where the module was still loading. E.g. by having the early-init module that handles the hover determine whether the preload/lazy-load promise has settled yet.

The proposal above is essentially https://gerrit.wikimedia.org/r/#/c/384600/ with this added:

To ensure a roll-out with good telemetry, one could add a (temporary) metric to the Popups schema that flags views that happened at a point where the module was still loading. E.g. by having the early-init module that handles the hover determine whether the preload/lazy-load promise has settled yet.

This would require a fair bit of refactoring our side.

ovasileva raised the priority of this task from Medium to High.Mar 7 2018, 11:50 AM

@Jhernandez was right. The browser test RL module name also needed updating. However.. it's still complaining and now I'm not sure why and unable to replicate.

I've fixed it with some "pauses". The requestIdleCallback seemed to be causing havoc. Adding some delays seems to accommodate it. I've spent too much time trying to work this out (PS22 was the only green one) and I'm seeing diminishing returns with trying to make sense and document what's going on.

The latest patchset is green.
I have no idea why. If anyone wants to dig more be my guest, otherwise I say we move on! ;-)

Change 384600 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Remove popups from critical rendering path

https://gerrit.wikimedia.org/r/384600

Jdlrobson added a subscriber: ABorbaWMF.

@ABorbaWMF I've updating staging with the latest code and set it up so that it uses real world content
http://reading-web-staging.wmflabs.org/w/index.php?title=Spain

Can you please do a generic QA of page previews to verify this task hasn't broken anything? Thank you!

Tested along with T168392. Looks good to me on Staging.

Screen Shot 2018-03-15 at 12.36.00 PM.png (1×2 px, 1 MB)

Screen Shot 2018-03-15 at 1.03.50 PM.png (1×2 px, 1 MB)

Screen Shot 2018-03-15 at 12.47.17 PM.png (1×2 px, 1 MB)

Screen Shot 2018-03-15 at 12.39.34 PM.png (1×2 px, 1 MB)

image.png (768×1 px, 239 KB)

image.png (768×1 px, 135 KB)

image.png (768×1 px, 179 KB)

image.png (768×1 px, 199 KB)

image.png (768×1 px, 144 KB)

looks good to me! @Jdlrobson - do we also need technical signoff or are we good to resolve?

As long as @Gilles is happy we can close..!

I've discovered a pre-existing bug, that this change makes worse: T190037: Page preview doesn't trigger when link already hovered at page load time

And aside from that particular bug, this implementation didn't follow our initial recommendation, which was:

JS/CSS of Page Previews should be broken up in 2 parts: one as small as possible loaded on pageload, which captures link interaction

As it stands, your implementation delays the point in time where hovers are captured, which isn't what we were proposing.

The link hover capture mechanism would be best placed in ext.popups instead of ext.popups.main, along with the fix for T190037. When a link is hovered, or if one already is when that code loads, ext.popups.main would be loaded immediately as a pre-requirement for the business logic needed to treat a link hover.

Essentially you want to keep the code in ext.popups to as little as you can on top of hooking up the event listeners. But you do want to listen to hovers as early as possible.

Would an acceptable solution be to process the active hovered link when we bootstrap like you suggest in T190037: Page preview doesn't trigger when link already hovered at page load time? (With $('a:hover'))

Here, something along the lines of:

$('a:hover').trigger('mouseover')

If the selector works as expected, this would give us the most with the least complexity of having to split the code into two parts making it more complex.

@Gilles Can you confirm this patch fixes the speedindex performance concerns we had above? We can look into fixing the bug for T190037: Page preview doesn't trigger when link already hovered at page load time additionally. @ovasileva can look into it and prioritize it.

We'll know for sure when it hits production regarding the impact on SpeedIndex. I don't know how "agressive" requestIdleCallback called at that point will be in practice and whether it will truly take things out of the critical path (versus, waiting X ms before calling it, for example).

I'm not sure what you're suggesting with triggering the mouseover event manually. Right now the listener is being set up way too late, in the lazy-loaded section. Any event happening before that, real or simulated, is just lost. This means that any link interaction between the minimal module (ext.popups) loading and the lazy-loaded bulk of the code (ext.popups.main) is just lost, where it would have been picked up before this change.

This was the situation before:

Page start loading [delay] ext.popups + ext.popups.main load [delay] hover link interactions are captured

This is the situation now:

Page start loading [delay] ext.popups loads [delay] ext.popups.main load [delay] hover link interactions are captured

This is what should happen:

Page start loading [delay] ext.popups loads, hover link interactions are captured [optional delay] ext.popups.main load

The merged change increases the likelihood that people hover links early and nothing happens, while the low-priority final ext.popups.main module hasn't been loaded and executed yet. And this extra delay is lazy by design, relying on requestIdleCallback, happening at a point in time where the DOM is very likely to be completely rendered already, particularly on a slow connection.

I know that separating the event listening and the business logic it calls is complicated to do after the fact, but it's absolutely necessary here, to have the performance optimization be consequence-free in terms of how early popups can show for someone who starts hovering things early. This early capture of the hovering is what will allow you to fast-track the loading of ext.popups.main, to start loading as soon as a link is hovered, making it happens much quicker than just waiting for requestIdleCallback to happen. Basically being functionally identical to before the change in a situation where someone hovers early, and better (lazy-loading, our of the critical path) when the user doesn't hover early.

It's possible our wires got crossed on our analysis of T186016: Analyze time to first link interaction which suggested early hovers wouldn't/shouldn't be a concern. Maybe a 30 minute sync might clear this up?

If you decided that they're ok to be ignored, that's fine by me. But given that initially hovered links at pageload never worked, one might wonder if the instrumentation was correct and we weren't just missing to measure the ones that either happened on pageload, or prior to the event listener being added. In which case (some) early hovers were missing in the data that led to this decision.

A decision was made, yes, but I think it would be prudent to revisit this decision in T190037.
In terms of this task, let's verify this addresses the speed index issue before resolving. Sound good?

I'm not sure what you're suggesting with triggering the mouseover event manually. Right now the listener is being set up way too late, in the lazy-loaded section. Any event happening before that, real or simulated, is just lost. This means that any link interaction between the minimal module (ext.popups) loading and the lazy-loaded bulk of the code (ext.popups.main) is just lost, where it would have been picked up before this change.

Oh, I understood that $( 'a:hover' ) would give us the currently hovered link (if any) when ext.popups.main loads so that we could manually trigger that popup interaction.

If all events are lost then this solution doesn't work.

This is what should happen:

Page start loading [delay] ext.popups loads, hover link interactions are captured [optional delay] ext.popups.main load

I disagree, there is more than meets the eye if we do this. There is state to initialize and event logging events to send on page load, not only hover link interactions. Also design has expressed a strong concern over variable delays on showing popups. They have asserted that consistency is really important to meet user expectations. If we optionally load ext.popups.main with the delay depending on how it is done first links could have longer delays than the rest of the links on the page, which breaks that concern. If we do it all parallelized, module loading, http requests, and timing delay, then there is more logic that needs to go to to ext.popups and be loaded upfront. The complexity starts to sky rocket as all interactions are very asynchronous and everything is very instrumented in different ways (EL, statsv, and page interactions now).

If the concern is payload size (11.8 kB on master right now), we could look into splitting the codebase in different ways that keep the code sane to work with, horizontally instead of vertically. Maybe we could lazy load the renderer code if it is really worth it, and that would be a good enough tradeoff. I don't know.

We can't have perfect be the enemy of good enough. The numbers are so small that I'm not sure if we're in micro-optimization land. Performance is really important and we have to do a great job at it, but it needs to be married with user experience, analytics, and maintainability. It is useless to have an extremely performance adapted codebase if it is unmaintainable, produces rare bugs, the metrics are not sound, and the user experience is inconsistent. There is a fine balance to strike here.

We can continue more on T190037: Page preview doesn't trigger when link already hovered at page load time, I think it is a great idea to focus on specific problems and solve them rather than ideal technical changes.

Let's solve the issue at hand here and continue with the other one separately like Jon said.

I was just playing devil's advocate here, seeing this issue had not been discussed on this task and the implementation differed from the original plan. The current way things are means as much of the payload is low priority as possible, it's already the best option for performance.