Page MenuHomePhabricator

Wikipedia triggers multiple 20ms+ browser hangs for restyles for separate "addEmbeddedCSS" calls
Closed, ResolvedPublic

Description

(This is the first wikimedia bug I've filed; apologies if I do anything wrong. I'm a Mozilla Firefox developer reporting a performance issue aka opportunity-for-optimization.)

If you capture a Firefox or Chrome performance profile[1] of the https://en.wikipedia.org/wiki/Barack_Obama pageload, you'll see multiple 20-70ms restyles happening, as a result of separate calls to "addEmbeddedCSS".

SUGGESTION: This could be optimized quite a bit these addEmbeddedCSS calls all happened in the same tick (rather than spread out over time) -- then, they'd just produce a single restyle, which would likely still be about as long (or maybe slightly longer) than the duration of just one of these restyles we're already performing currently.

DETAILS/NOTES:
Here's a performance profile from Firefox Nightly, showing a portion of pageload:
https://perfht.ml/2jqZj5J

All of the "red bar followed by dark blue bar" patterns in the graphical waterfall there are from wikipedia's JS calling addEmbeddedCSS() to append a <style> element, and then styles are recalculated in the next refresh tick after the JS has yielded, in response to that new <style> element. If you hover those blue bars, you can see that each restyle takes on the order of ~40ms, and that's much longer than the frame budget for buttery-smooth 60fps performance. It's likely that wikipedia loading would be noticeably less janky if these many addEmbeddedCSS calls could be batched together.

I captured a Chrome profile as well, and noticed similar durations for post-addEmbeddedCSS restyles, as noted in https://bugzilla.mozilla.org/show_bug.cgi?id=1382311#c9 , but they don't have a web UI so I can't easily share that profile. But anyway, that indicates that this isn't Firefox-specific -- there are slow restyles in response to these calls in Chrome as well. So: my suggestion here would likely benefit load performance & responsiveness in all browsers.

[1] In Firefox, you can capture a profile using the extension from https://perf-html.io/ . In Chrome, you can use the built-in devtools.

Event Timeline

Here's a screenshot of the Firefox profiler UI, BTW, with me hovering the first of the blue restyle bars. As you can see, it shows "42ms Styles", and there are at least 5 such restyles shown in the profile.

The shorter red bar to the left of the blue bar is for script (which is where the addEmbeddedCSS call happened).

profiler-screenshot.png (338×335 px, 16 KB)

Here's a screenshot of Chrome's profiler UI for this pageload (showing only "rendering" samples).

Their first (51.5ms long) restyle is for the initial page rendering, I believe, but I think all the other long-ish restyles (33.5ms, 11.1ms, 16.8ms, 23.8ms, 86.2ms) are for style updates after addEmbeddedCSS calls. As in Firefox, I suspect a lot of the work in these restyles is ultimately redundant and could be collapsed/coalesced together if the addEmbeddedCSS calls happened all at once.

chrome-profile-screenshot.png (748×420 px, 62 KB)

Krinkle triaged this task as High priority.
Krinkle removed a project: Browser-Support-Firefox.

Thanks @dholbert

I reproduced the issue and notice two things:

  1. Cases where addEmbeddedCSS() is called twice within the same tick.
  2. Cases where addEmbeddedCSS() is called multiple times between navStart and loadEventEnd. 2.1. Some of these are very close to each other (but not the same tick). 2.2. Some of these are far from each other (close to, or after, loadEventEnd).

Point 1 is definitely a bug and shouldn't happen. I'm already working on a fix :)

Point 2 is harder to avoid due to dependency resolution and load order. A module's styles must be inserted before their JS is allowed to execute. As such, while we do fetch multiple modules at once over the network, their evaluation is deferred and sometimes this results in some styles not being added to the buffer until after the leaf nodes have been fully css-inserted/js-executed first.

Given a batch of:

  • Module A.
  • Module B. (Dependency: Module C)
  • Module C.

Firs round: A and C will execute given they have no unresolved dependencies. "Execute", here, means any styles it has will be inserted (an async process given we try to buffer/batch multiple stylesheet additions). And once style insertion is done, its script will be run, and the module marked as "ready". Then we do another round.
Second round: B will now execute, and thus queue its styles.

I suppose while a module must have its own (and dependency's) styles applied before its script may run, it should be safe to apply the later (unrelated) styles earlier as well. Right now this doesn't happen because the styles aren't discovered until the module is safe to execute in its entirety. I'll look into making this a bit smarter so that styles for modules with pending dependencies may be inserted ahead of time (whilst still holding back the module's JS resource).

I'll do another audit, but in general this shouldn't cause (visible) reflows given we strongly discourage use of JS to change rendering above the fold at page load time. All such logic should be in the HTML, possible with any needed styles guarded with .client-js in the selector (which is set in a <head> script).

Point 2.2. represents lazy-loaded modules that are intentionally separated to ensure more important scripts run to completion first.

I suppose [...] it should be safe to apply the later (unrelated) styles earlier as well. [...] I'll look into making this a bit smarter

Nice! Yeah, if it's possible to batch some style-appends together (whether that means deferring one or doing the other one "prematurely"), I think that should help here.

Thanks for looking into this!

Local debugging showed me that.

  • Batching is working as expected. There are only multiple calls to addEmbeddedCSS() if they were logically separated by a layer of unwrapping the dependency tree. In other words, addEmbeddedCSS() batch-inserts all styles that are ready, and more styles can only come available if this insertion first takes place – which in turn unblocks more modules and their styles.
  • The case of lazy-loaded modules inserting additional styles is rare.
  • The most common case of successive addEmbeddedCSS calls is when styles for modules have arrived, but aren't inserted into the buffer yet because they belong to a module that has unresolved dependencies.
 implement received styles for jquery.suggestions
 implement received styles for mediawiki.searchSuggest
 implement received styles for mediawiki.notification
 implement received styles for mediawiki.action.view.postEdit
 implement received styles for ext.visualEditor.desktopArticleTarget.init
 implement received styles for ext.uls.common
 implement received styles for ext.uls.interface
cssBuffer flush Set(3) {"jquery.suggestions", "mediawiki.notification", "ext.uls.common"}
.. execute js that is now unblocked
.. mark modules as ready
.. find pending modules now unblocked ..
cssBuffer flush Set(3) {"mediawiki.searchSuggest", "mediawiki.action.view.postEdit", "ext.uls.interface"}
.. execute js, resolve tree..
cssBuffer flush Set(1) {"ext.visualEditor.desktopArticleTarget.init"}
.. execute js ..

Where:

  • mediawiki.searchSuggest depends on jquery.suggestions.
  • ext.visualEditor.desktopArticleTarget.init depends on mediawiki.action.view.postEdit, depends on mediawiki.notification

I considered (and tried, in a local proof of concept) to have the execute() pipeline start immediate on arrival (instead of waiting for dependencies-allReady). The setting of messages and templates earlier is uncontroversial (no conflicts supported there). The CSS was loaded together and sooner (as expected). JS execution stayed the same (requiring dependencies to be resolved first).

Ran into two problems:

  • It changed the order of stylesheets, making it FIFO instead of based on dependency tree. This broke cascading expectations (makes sense, solvable problem).
  • Modules that result in failure can no longer propagate their error in a preventive manner, because the depending module's styles were already inserted and shown to the user.

For messages/template this is fine (simply remain unused). But for styles it seems like a problem to have the page adapt to a feature, but then not have its JS run.

This can already happen between with module's own styles and broken JS, given we always insert styles before JS execution (design requirement), but applying it to dependencies might make things worse.

When feature has its own JS broken, styles will have been applied already. But, currently, when a common JS utility has a problem (not relating to styles), the feature would have neither its styles nor its JS applied. Loading styles in separate dependency tree enables them to make a false start that is hard to recover from.

Not sure how common or how important this is, but worth thinking about.

Krinkle lowered the priority of this task from High to Medium.Oct 11 2017, 4:56 PM

Ran into two problems:

  1. It changed the order of stylesheets, making it FIFO instead of based on dependency tree. This broke cascading expectations (makes sense, solvable problem).
  2. Modules that result in failure can no longer propagate their error in a preventive manner, because the depending module's styles were already inserted and shown to the user.

I've decided problem 2 doesn't matter. We already set no guarantees that state is undone in case of an uncaught exception in a module. And it really shouldn't matter given that we already always do (and have to) apply the styles before script execution, so even in the current case, a broken module will have its styles still applied. The only thing that would change is that a separate dependant module currently doesn't have its styles inserted, which we would. I think that's a fair compromise and we should solve the first problem, which I suspect is solvable.

The main thing required to make this possible is to have the stylesheet buffer perform dependency resolution instead of assuming that everything in the buffer has already been resolved, so that the order is correct.

Rather than hacking in a second implementation of a dependency tree in mw.loader, I plan to first refactor the current jobs list in mw.loader a bit to reduce its overhead and (hopefully) make handlePending() obsolete (e.g. trigger jobs based on callback, by keying jobs by module name instead of as array; so that we can trigger specific jobs directly on state changes, instead of iterating/polling after each state change). We can then incorporate a new state within state=execute that indicates early resources (styles, messages, templates) have or may be inserted, and allow subsequent state progression based on that for dependant modules.

Rough sketch:

jobs : {
  <name of module being subscribed to>: {
    <state>: [ callback, .. ],
    <state>: [ callback, .. ]
  }, ..
}

Idea being that script execution and dependency resolution is queued as a callback for jobs[module].ready. But style insertion can be unlocked by jobs[module].execute, which means that as soon as a module has started to execute (read: synchronously queue up messages/templates/styles), other modules dependant can also start to enter that state.

For individual modules, their script execution is dependant on style insertion, and their state=ready progression dependant on script execution succeeding (already implemented). And dependant module's script execution is dependant on state=ready (already implemented).

Krinkle claimed this task.

Closing in favour of T202703. It's been reduced a fair bit, but the majority of it remains and seems mostly intentional, I'm not sure how it could be avoided.

That is, we are dynamically loading modules with stylesheets, which need to be inserted (expensive), and as part of resolving the dependency tree, some modules will need to be executed before others.

To some extent the synchronous overhead can be reduced by creating smaller batches, on the other hand, the overall overhead could be reduced by creating bigger batches including to consider inserting styles of yet-un-resolved modules of the dependency tree so long as they are linearly in the correct order.

That is something to consider but has the drawback of failing for lazy-loaded dependencies or dependencies still in-transit. And while overall time spent is important, I think it is more important to reduce the amount of time spent per cycle, so as to avoid freezing the browser. That is, I'd prefer 3x 40ms vs 1x 100, even though that means in total we spent 120ms, that seems worth it.