Page MenuHomePhabricator

Move some page- and user-specific mw.config data out of HTML head, to end of body
Open, LowPublicFeature

Description

mw.Title already exists though it needs more work and could use some caching.

mw.User doesn't exist yet as a constructor, but preliminary work has already been done with a lazy-loading system (with the introduction of mw.user.getGroups and mw.user.getRights - similar to in PHP where those methods call ->init() which does nothing if already loaded and loads it if not).

Each of these would be constructed (without being initialized) by default in their lowercase equivalents (mw.title, mw.user, mw.page).

I suggest we write an abstract class that implements:

  • Caching by unique identifier
  • Methods to clear cache
  • Methods to populate cache
  • etc.

Eventually this will also allows us to clean up stuff that doesn't belong in mw.config (wgUserName, wgUserGroups, wgCategories, wgCurRevisionId, wgTitle).

And if those vars provide sufficient information to fully populate a cache, we can even optimize for the current context instances and prepopulate them.

Or do it the other way around (populate mw.user/title/page/revision from the page output, and put a backwards compatible thing in mw.config that accesses those from mw.user/title/page/revision)


Version: 1.20.x
Severity: enhancement

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 12:51 AM
bzimport set Reference to bz39813.
Krinkle lowered the priority of this task from Medium to Low.Nov 24 2014, 2:57 PM
Krinkle set Security to None.
Phabricator_maintenance renamed this task from Implement immutable object constructors with caching for mw.User, mw.Title, mw.Page (tracking) to Implement immutable object constructors with caching for mw.User, mw.Title, mw.Page.Aug 13 2016, 11:29 PM

I'm reviving this and rephrasing it with the explicit goal of moving wgUserGroups and wgCategories values in mw.config from the <head> to the end of </body>, and (prior to that) give them documented and well-supported async JS methods to reliably access the data (asking users to manually wait for dom-ready and then access mw.config would be a fragile pattern to explain and support).

Prior art:

  • 2012: Rejected new addition of wgUserRights to mw.config data in page head. – T40151
  • 2012: Shipped mw.user with async methods .getRights() and .getGroups(). – T40151
  • 2016: Re-implement .getGroups() to make use of wgUserGroups. – https://gerrit.wikimedia.org/r/280753

I'm proposing the next steps:

  1. Deprecate access to wgUserGroups via mw.config, in favour of mw.user.getGroups (an async method).

I think this is rare enough to not cause widespread panic or take many years to migrate; and yet sufficiently common that it should get us through diverse and wide range of situations where this might be the first time a module dependency or async method is used, thus requiring us to address any unsolved challenges an extension, skin, or gadget might have.

After this first pilot, we can then look at further moving of data towards more async and anticipated workflows. This is loosely related to T183720 and T140664, in so far that they are also about improving page load performance, about letting data flow in one direction, about the server knowing what clients need and sending it to them for the client to wait for and react to in real-time; Instead of the traditional way (where we still are mostly) where as much as possible that might be needed is sent upfront and blocking rendering, then everything renders, and then JS starts looking combing through this stuff and making use of a few things that it was sent, which the user needlessly paid for several seconds ago but weren't used until now.

  1. Remove wgUserGroups from mw.config in OutputPage, and instead send it through an internal mw.hook() that mw.user.getGroups listens for. This would either be something specific like internal.user.groups, or something more general like internal.deferredConfig, which we could expose through something like mw.getDeferredConfig(). The latter might make sense so that extensions don't have to reinvent as much and can instead leverage more of the same infrastructure for their own use cases.
  1. Do the same for wgCategories.
Krinkle renamed this task from Implement immutable object constructors with caching for mw.User, mw.Title, mw.Page to Move some page- and user-specific mw.config data out of HTML head, to end of body.Apr 25 2021, 7:34 PM
Aklapper changed the subtype of this task from "Task" to "Feature Request".Feb 4 2022, 11:14 AM
Aklapper removed a subscriber: TrevorParscal.

Moved out of future goals. We're underr the 14KB mark now, so there's no longer an active need from my POV. As long as we don't add new things to this mechanism, what's there is accepted within the budget. Keeping it open as maintenance task or possibly as part of a higher-level goal or perf budget trade off where we need this space for something else.

In discussion of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/822669 it seems like an alternate version of ParserOutput::setJsConfigVars, called something like ParserOutput::setLateJsConfigVar() would be useful, which works just like setJsConfigVars but adds the data to the bottom of the page instead of the top. As long as your extension runs on onDocumentReady hook, this should be a "safe" change that would require minimal changes to the extension but effectively shift more stuff out of the <head>.

The debugi toolbar could use ::setLateJsConfigVar(), probably some maps and graph extension stuff as well? And of course DiscussionTools, although we don't actually expect the discussion metadata to be that large.

I've been thinking about this and I think that instead of a separate method like setLateJsConfigVars(), we should use setJsConfigVars() as before and have a static-ish list of variable names that go at the bottom instead of the top.

Reasoning:

  • Most users of ParserOutput don't actually care about this distinction. It only matters for rendering the MediaWiki view action pages, because this is the only place where browsers can handle the output in a "streaming" manner, so the 14 KB mark matters (right?). This is probably the biggest user by request volume, but it's just one place in the code.
  • Most of the code dealing with ParserOutputs handles processing them in various APIs and other server-side processing, where JS config vars are ignored or just passed along unchanged. All of them would have to handle the new kind of JS config vars, and make the decision to either pass the new one along, or merge them together. This seems error-prone.
  • Having two methods introduces new kinds of error conditions that would have to be handled – e.g. what if setLateJsConfigVars() and setJsConfigVars() are called with the same name on the same parser output? what if we try to merge two parser outputs where one used setLateJsConfigVars() and the other setJsConfigVars() with the same name?
  • There is a lot of code for setting/adding/merging JS config vars in just ParserOutput and OutputPage, much of which would have to be duplicated for "late" JS config vars (some of it is thankfully deprecated, though).

Change 849187 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Allow placing mw.config JS config vars late, at end of body

https://gerrit.wikimedia.org/r/849187

I am open to being convinced away from this approach, but I think it's really neat.

In discussion of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/822669 it seems like an alternate version of ParserOutput::setJsConfigVars, called something like ParserOutput::setLateJsConfigVar() would be useful, which works just like setJsConfigVars but adds the data to the bottom of the page instead of the top.

I agree with @matmarex that this specific implementation might be suboptimal. I think it involves higher complexity to understand for developers, and would be more costly to adopt and maintain for various layers that need to be aware of this.

Late mw.config

If we mechanism we end up building for this problem is to utilize mw.config in a deferred manner, then a declarative approach seems preferred, in which no additional transport layers or APIs are required, no changes to API consumers — except for the one change that is naturally needed. That is, based on whether keys are in the declared list, they will internally be sent from either the head or the body accordingly.

As long as your extension runs on onDocumentReady hook, this should be a "safe" change […]

As long as that's the case, yes. I think that's significant enough of quirk that I would not consider it safe. I'm also thinking about the long-term performance impact that would likely rise from entire code paths being deferred high up just because it does (or once did) involved a call to mw.config.get() and so in order for that synchronous call to work the entire initialisation logic must be deferred.

Of course in theory, for cases "where it really matters" one could create a utility function that wraps mw.config.get() in $.ready and not defer the rest of the calling code. But in practice people will likely just modify a reasonably high-level entrypoint and defer the entire thing. It's what we'd optimise for, make easy, and thus incentivize. I also see this leading to lack of confidence in the platform because people will inevitably forget and start wrapping things "just in case", either becuase they anticipate a mw.config.get call to be added during development, or because it was once there but is no longer. People tend to be very hesitant to remove a dom-ready deferral once it's there because it requires a very deep dive to know whether it is safe to remove, as anything indirectly could in some way depend on it down the stack.

Aside from just day to day code reviews and gadgets where I see this regular, two major past initiatives come to mind:

  • T107399: Make top queue fully asynchronous — we removed a ton of dom-ready closures as part of this.
  • mw.util.$content — an API I deeply regret that is not dissimilar, which requires upto 2 $() dom-ready closures and a using statement to be reliably readable.

My recommendation would be to add a 5-line wrapper function like mw.util.getLateConfig that would be the supported and recommended way to access this data. It would be an async function (Deferred/Promise) that returns $() -> mw.config.get().

Other approaches

This old task is being revived to help achieve a current performance improvement in DiscussionTools. Bartosz approached me for code review on DiscussionTools change 822669, where two weeks ago I left a number of suggestions for one could transport talkpage metadata from server to the client.

While there hasn't been a reply there, my understanding is these suggestions are not yet considered, as we're first exploring for a quick win by leveraging mw.config.

From chatting with @matmarex on IRC, I believe the reasons the other approaches are not yet considered is because it is understood those would likely require changes to other features that also perform edits or previews (WikiEditor, VE, MobileFrontend). Bartosz explained that we'd need new hooks and callbacks for DT to know when/where to read the data has been transported to.

I explained why such changes would not be needed if we approach it a certain way. The conversation helped me understand something else: DT intends to support other software that will change/re-render talk pages so long as they use the Parse API and store any new jsconfigvars data in mw.config. The assumption being that DT can statically read it from there, and will be notified by other means such as wikipage.content.

I consider the proposal of "late mw.config" a workaround, because it creates a stack that seems suboptimal in terms of stability and ease of getting it right and maintaining it. Let me summarise:

  • DT would store data in ParserOutput as jsconfigvars, but a declaration would have these go somewhere else than usual. This introduces a long-term source of mistakes/confusion/doubt.
  • Consumers of the config data must know that mw.util.getLateConfig exists and remember to use it for these keys. They are not naturally discoverable because using plain mw.config.get will likely work fine during development and for fast clients in production.
  • DT would either have to read from mw.config every time or be notified somehow to know to re-evaluate mw.config data.
  • As per first part of this comment, I believe this is likely to bring back unintentional patterns that up deferring and delaying large amounts of user interface code "just in case", which would worsen performance.
  • mw.config has unclear stable support policy toward gadgets and other extensions. It tends to be treated as public and stable, which is not great. It also feels prone to clashes long-term due to unclear ownership/namespacing.

At glance, mw.config seems an akward fit for data that will change over the lifetime of a page. It has no mechanism to inform calling code when a value has changed. Years ago I introduced mw.hook to solve exactly this and we've gradually moved toward that. mw.hook allows for asynchronisity, memorises the last emission, and naturally lets clients recompute their data after an edit to the data.

But, so long as the data is specific to ParserOutput and arrives via an API request handler that knows to also emit the exisiting mw.hook('wikipage.content') afterward, then you basically get the best of both. Tthis is essentially what we do already for page-specific data consumed by JS features, such as gallery/slideshow, tablesorter, collapsible, Maps/Kartographer, etc. You would effectively change "only" the first render, needing a new access pattern, for which my concerns could be addressed with an async method like mw.util.getLateConfig.

My bias againt this was on a feeling that DT happens to be at an intersection that I feel is rare: Where the data is both originating from the Parser, and too large for HTML head, and relating to a feature that supports ajax previews/edits. I originally filed this task for a very differnet purpose, for more generic seeding of anticipated API requests. For example how we sometimes use an inline script to call mw.loader.implement() ahead of a would-be RL request. This is inspired by the "Facebook BigPipe" model. This would help long-term after T183720 to e.g. preload certain API requests.

But... I realize this intersection is a lot bigger than I thought. And when we ge to these other kinds of data, we can always come up with a different delivery mechanism.

Change 851680 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] WIP: Add ParserOutput::{set,append}JsLateConfigVar() and mw.util.lateConfig

https://gerrit.wikimedia.org/r/851680

Change 849187 merged by jenkins-bot:

[mediawiki/core@master] OutputPage: Support loading some mw.config vars late, at end of body

https://gerrit.wikimedia.org/r/849187

Change 851680 abandoned by C. Scott Ananian:

[mediawiki/core@master] WIP: Add ParserOutput::{set,append}JsLateConfigVar() and mw.util.lateConfig

Reason:

The other way was better. :)

https://gerrit.wikimedia.org/r/851680

Change 889258 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] ResourceLoader: Permit partial failure in OutputPage::getBottomScripts

https://gerrit.wikimedia.org/r/889258

Change 889258 merged by jenkins-bot:

[mediawiki/core@master] ResourceLoader: Permit partial failure in OutputPage::getBottomScripts

https://gerrit.wikimedia.org/r/889258