Page MenuHomePhabricator

Spike: Determine how to load CentralNotice RL modules and when to execute campaign and banner selection logic
Closed, ResolvedPublic2 Estimated Story Points

Description

CentralNotice runs on every pageview, on (nearly?) every WMF wiki. This makes it a prime target for performance improvements. It also makes any CN performance issues especially important.

As part of the refactoring of campaign and banner selection code (T100686) we should determine exactly where the RL modules that provide this code should load (top or bottom) and when selection logic should execute (before or after the DOM is ready). See this patch for the most recent version the refactored code.

Previous (currently deployed) code is top-loaded but only executes after the DOM is ready. This seems likely not to be the best option.

We should seek the best overall performance, balancing the gains and losses we'd get in different scenarios. The scenarios to consider are the feasible combinations of following criteria:

  • RL modules fetched from the server or from the LocalStoarge module cache.
  • Possible campaigns available or not in ChoiceData.
  • If there are possible campaigns available, campaign selected or not from ChoiceData.
  • If a campaign is selected, banner selected and fetched or not. (Even if a campaign is selected, a banner might still not be selected for several reasons, including a hide cookie or a flag set by a campaign mixin.)

Here are some options 'n' constraints:

  • No matter what, we should only load campaign and banner selection code if there are possible campaigns.
  • No matter what, GeoIP code is ready to be separated out (see T102848), so it may be loaded in a different way and run at a different time.
  • Selection code must run after GeoIP code.
  • Selection code may be loaded as a dynamic dependency of ChoiceData, or via a call to mw.loader.load() in a script.
  • Selection code may be in top- or bottom-loaded modules.
  • Selection code may run right away, or only after the DOM is ready (called from a $( function ).)
  • The request to fetch the banner content might happen right away, or only after the DOM is ready (called from a $( function ).)
  • No matter what, the banner will only be inserted after the DOM is ready.

Suggestions on how to benchmark would be much appreciated, BTW.

More notes:

  • For the time, being we aren't considering the option of requiring banners to overlay page content. (That option that has been suggested, but so far, it hasn't received enough discussion do make it viable. Anyone interested in pursuing it: perhaps suggest a specific route for doing so?) So, we are concerned about possible page bumps when a banner is displayed (T107967).
  • The solution we adopt may consider the possibility that in the future, GeoIP, device category and/or logged-in status may be made available in PHP (load.php endpoint), which would make client-side selection code smaller and require less users to run it. (See T103695.)
  • Once implemented, we'll test real life performance changes, as per T107503.

See also:

Proposal 1 (abandonded)

This proposal doesn't make sense, following RL queue changes in core. I'm just leaving it here for comparison purposes.

  • Keep the GeoIP module top-loaded and continue to execute GeoIP processing right away.
  • Make campaign and banner selection code bottom-loaded, but have it start processing as soon as it's loaded. (However, during a transition phase, it will be bottom-loaded but will wait for the DOM before executing most logic.)
  • When a middle RL queue becomes available, move campaign and banner selection there.
  • We may use animations and other means to minimize page bumps when banners are injected.

Proposal 2 (current)

  • Keep all code async-top-loaded, and have all of it run right away (except stuff that needs the DOM to be ready). The code currently in master works this way.
  • We may use animations and other means to minimize page bumps when banners are injected. Not implemented, requires further investigation.
  • Investigate performance implications and possible needs for future changes, as per T107503.

See also the detailed rollout plan.

Event Timeline

AndyRussG raised the priority of this task from to Needs Triage.
AndyRussG updated the task description. (Show Details)
AndyRussG subscribed.

Some notes from yesterday's meeting:

  • Page rendering is indeed blocked until top-loaded RL modules are fetched (either from the server or LocalStorage) and executed.
  • We definitely don't want to keep the current setup of top-loaded modules that don't do most of their stuff until the DOM is ready!
  • For now, we'll move processing to bottom-queue modules. Once a middle queue (earlier loading and execution than the bottom queue, but asynchronous) is implemented, processing will move there.
  • For now, GeoIP processing will move together with the CN modules. window.Geo will stay available for legacy access, but we'll have a new mw.geoIP.getGeo() function (or something like that). Location info will be available later than it now is, since that's the one thing that bannerController currently processes right away. Nonetheless, we'll try this and see how it goes!

Unless I'm mistaken, moving a module to a different queue only will take effect when the Varnish cache clears, so it'll take a full month to complete any switch.

If there are no objections to this approach, maybe let's make a detailed RL-module-adding-and-switching roadmap that includes the addition of new modules and the removal of GeoIP from CN?

Another note from the meeting: while we don't expect to require banners to overlay content or animate, we may provide facilities to help mitigate page bumps. Here's a possible approach:

  • Use a timer to see check how long it takes for the banner to be ready to insert.
  • Whenever the banner is ready, wait for the browser finish the current execution queue (setTimeout(fn, 0)), and check if page is scrolled such that the banner will be visible.
  • If the banner was ready quickly and the page is scrolled such that the banner will be visible, just insert it.
  • If the banner took a long time to be ready and the page is scrolled such that the banner will be visible, insert it using an animation that gradually moves the rest of the page down.
  • If the page is scrolled such that banner won't be visible, insert it and adjust the scroll position to prevent the page from jumping.

Do we have any idea how much this is likely to delay banner display on average? 'Page bump' has been one of the most consistent community complaints about CentralNotice, and there's been a lot of previous work to reduce it. It may be a worthwhile tradeoff though if a small delay means improving performance for everyone else (which I think is the aim?)

Also worth bearing in mind: fundraising banners use javascript to insert themselves right at the top of the screen, unlike most other banners.

Do we have any idea how much this is likely to delay banner display on average? 'Page bump' has been one of the most consistent community complaints about CentralNotice, and there's been a lot of previous work to reduce it.

Hi! Since the whole campaign and banner selection system currently only runs after the DOM is ready, I don't think we'll actually increase banner display time much by moving the CN RL modules to the bottom queue. Also, this will be a temporary measure, while we wait for the "middle queue" to be implemented and deployed (expected this quarter).

It may be a worthwhile tradeoff though if a small delay means improving performance for everyone else (which I think is the aim?)

Yes! Currently, the time taken to load CN RL modules and perform initial processing (i.e., the bits of logic that are not delayed until the DOM is ready) actually blocks page rendering for everyone, all the time. The only caveat is that for most returning users, that's still quite fast, since the modules can be loaded from a LocalStorage cache rather than from the server.

Also worth bearing in mind: fundraising banners use javascript to insert themselves right at the top of the screen, unlike most other banners.

Great point, thanks! Yeah, that complicates a bit how we should implement any trickery to detect and, if necessary, mitigate page bumps, though I think the trickery will still be possible.

Proposal:

  • Keep GeoIP top-loaded.
  • Only move CN display logic to before the DOM is ready as a final rollout step, to keep the changes as incremental as possible.

Detailed rollout proposal here.

This looks great, from what little I understand :D Maybe add a few words about how we plan to test for performance regression, and give current benchmarks?

Added a proposal to the task description...

+2, I think it's a good balance. We're certainly not going to make page load any slower with this refactor, and we're in a great position to experiment with dependencies and stuff.