Remove wgCategories from mw.config
Open, Medium, Public

Description

This task proposes the removal of wgCategories from mw.config.

MediaWiki [[ https://gerrit.wikimedia.org/g/mediawiki/core/+/f340c7271c020090e19e7dd5be1c68a2c02dcc33/includes/OutputPage.php#3205 | adds wgCategories ]], a complete array of the page's categories, to the JavaScript config delivered with each page. For the Barack Obama article, that alone is about 3,400 characters in an inlined head script. This data seems better suited to being rendered as static HTML towards the bottom of the page, or loaded asynchronously as needed. The mobile site already strips it, and there seem to be few consumers besides BlueSpiceInsertCategory and possibly Minerva (and Minerva's use is probably a bug in itself).

Event Timeline

Restricted Application changed the subtype of this task from "Deadline" to "Task". Oct 4 2018, 6:26 PM
Restricted Application added a subscriber: Aklapper.
Anomie added a subscriber: Anomie.

This seems to have nothing to do with MediaWiki-Action-API; removing the tag.

> This seems like it would be more appropriate to be rendered as static HTML towards the bottom of the page or loaded asynchronously as needed.

+1 to that. Categories can be loaded from the action API with a query like action=query&prop=categories&titles=Barack%20Obama&cllimit=max.
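A minimal sketch of what a client-side replacement for reading wgCategories could look like, built on the query above. It assumes `formatversion=2` JSON output and an example endpoint such as `https://en.wikipedia.org/w/api.php`; the function names are hypothetical. It follows API continuation in case a page exceeds the `cllimit=max` cap, and strips the `Category:` prefix so the names have the same format as wgCategories (though not necessarily the same order). On-wiki, `mw.Api` would be the more idiomatic transport; plain `fetch` keeps the sketch standalone.

```javascript
// Sketch: read a page's categories from the MediaWiki action API.
// Assumes formatversion=2 JSON; function names are hypothetical.

// Build an action=query&prop=categories URL, merging any continuation params.
function buildCategoriesUrl(apiBase, title, continueParams = {}) {
  const params = new URLSearchParams({
    action: "query",
    prop: "categories",
    titles: title,
    cllimit: "max",
    format: "json",
    formatversion: "2",
    ...continueParams,
  });
  return `${apiBase}?${params}`;
}

// "Category:Living people" -> "Living people" (the name format wgCategories uses).
// Namespace names contain no colon, so only the first prefix is removed.
function stripNamespace(title) {
  const i = title.indexOf(":");
  return i === -1 ? title : title.slice(i + 1);
}

// Collect category names from one formatversion=2 response.
function extractCategories(data) {
  const pages = (data.query && data.query.pages) || [];
  return pages.flatMap((page) =>
    (page.categories || []).map((cat) => stripNamespace(cat.title))
  );
}

// Fetch all categories, following API continuation until it is exhausted.
async function fetchCategories(apiBase, title) {
  const names = [];
  let cont = {};
  for (;;) {
    const res = await fetch(buildCategoriesUrl(apiBase, title, cont));
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data = await res.json();
    names.push(...extractCategories(data));
    if (!data.continue) return names;
    cont = data.continue;
  }
}
```

Note that this returns the categories of the page's latest revision, which is not necessarily the revision being viewed.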

> The mobile site already strips this data and there don't seem to be many consumers except BlueSpiceInsertCategory and possibly Minerva but that's probably a bug in itself.

Also consider on-wiki scripts and gadgets:

anomie@mwmaint2001:~$ mwgrep --max-results 0 wgCategories
(total: 128, shown: 0)
anomie@mwmaint2001:~$ mwgrep --max-results 0 --user wgCategories
(total: 278, shown: 0)

Legoktm added a subscriber: Legoktm.

HISTORY says:

* New wgCategories JavaScript global variable for userscripts.

So it would be worth mwgrep'ing to see which user scripts use it. I do think removing it and replacing it with some mw.api.categories helper is a good idea (after a deprecation period, of course). AFAIK, though, all the relevant information is already exposed via the API, so I'm removing that project tag.

> Categories can be loaded from the action API with a query like action=query&prop=categories&titles=Barack%20Obama&cllimit=max.

The problem with this is that, unlike user state (such as wgUserRights), this data is expected to be in sync with the currently viewed revision, which I don't think the API currently supports. We'll want to avoid repeating the mistakes from the citation/references cache.

While I've repeatedly expressed a desire to get rid of as many config-vars as possible in favour of dedicated interfaces, I'm also aware that many of them are commonly used in critical scripts and would make sense to preload as part of the HTML either way, as otherwise we'd end up with a worse overall user experience and a late time-to-interactive.

I suggest changing only the access format (mw.config vs. a dedicated getter) and the load position (top vs. bottom), and following the example of wgUserGroups by initially wrapping the new interface around the old export format for backward compatibility. This way, gadget maintainers who pay attention to Tech-News have a fair chance to migrate their scripts before we remove the old interface. The cost is practically zero: both formats would contribute equally to the HTML payload (the new format isn't cheaper), and we never need to emit both at the same time.
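A sketch of the "new getter wrapping the old export format" idea, roughly in the spirit of how `mw.user.getGroups()` reads wgUserGroups. `createCategoryGetter` and `getCategories` are hypothetical names, not existing MediaWiki APIs; the only thing assumed about `mw.config` is its `get(key)` method. A small stub stands in for `mw.config` so the example runs outside a wiki.

```javascript
// Hypothetical getter wrapping the existing wgCategories export.
// Not an existing MediaWiki API; names are illustrative only.

// Minimal stand-in for mw.config (which exposes get(key)) so this runs standalone.
class StubConfig {
  constructor(values) {
    this.values = values;
  }
  get(key) {
    return this.values[key];
  }
}

// Returns a getter that reads the old export format. Gadgets call the getter,
// so the data source can later change (e.g. emitted at the bottom of the page)
// without touching gadget code.
function createCategoryGetter(config) {
  return function getCategories() {
    const value = config.get("wgCategories");
    // Copy so callers cannot mutate the shared config array.
    return Array.isArray(value) ? value.slice() : [];
  };
}

// Usage (on a wiki this would presumably be createCategoryGetter(mw.config)).
const getCategories = createCategoryGetter(
  new StubConfig({ wgCategories: ["Living people", "1961 births"] })
);
console.log(getCategories()); // ["Living people", "1961 births"]
```

Because the getter only reads the existing key, the HTML payload stays the same during the migration period, and the old key can be dropped once scripts have moved to the getter.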

> The problem with this is, contrary to user state (as with wgUserRights), this is expected to be in sync with the currently viewed revision - which I don't think the API currently supports.

Only by reparsing the page with something like action=parse&oldid=862438435&prop=categories, and even that won't help if you're also worried about a mismatch due to templates that have been updated since the revision was originally rendered.

OTOH, you could likely use action=query&prop=categories|info&titles=Barack%20Obama&cllimit=max and check that the lastrevid matches what you expect, only falling back to reparsing if it doesn't.

Of course, that also assumes the use case actually cares about categories for other than the current revision. That would have to be looked at on a case-by-case basis.
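A sketch of the "check lastrevid, reparse only on mismatch" approach described above. It assumes `formatversion=2` response shapes, an example endpoint such as `https://en.wikipedia.org/w/api.php`, and that the caller knows the numeric ID of the revision being viewed (on-wiki, presumably from wgRevisionId). All function names are hypothetical, and continuation is ignored for brevity. As noted above, even the reparse can differ from what the reader saw if templates have changed since the page was rendered.

```javascript
// Sketch: use cheap query data when it matches the viewed revision,
// otherwise reparse that exact revision. Assumes formatversion=2 shapes.

function buildQueryUrl(apiBase, title) {
  const params = new URLSearchParams({
    action: "query",
    prop: "categories|info",
    titles: title,
    cllimit: "max",
    format: "json",
    formatversion: "2",
  });
  return `${apiBase}?${params}`;
}

function buildParseUrl(apiBase, revId) {
  const params = new URLSearchParams({
    action: "parse",
    oldid: String(revId),
    prop: "categories",
    format: "json",
    formatversion: "2",
  });
  return `${apiBase}?${params}`;
}

// Category names from the query response, or null when the page's latest
// revision (lastrevid) is not the revision being viewed. revId must be a number.
function categoriesIfCurrent(data, revId) {
  const page = data.query && data.query.pages && data.query.pages[0];
  if (!page || page.lastrevid !== revId) return null;
  return (page.categories || []).map((c) => c.title.slice(c.title.indexOf(":") + 1));
}

// Category names from an action=parse response; these appear to arrive as
// DB keys (underscores), so convert them to the space-separated form.
function categoriesFromParse(data) {
  const cats = (data.parse && data.parse.categories) || [];
  return cats.map((c) => c.category.replace(/_/g, " "));
}

async function getJson(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

async function categoriesForRevision(apiBase, title, revId) {
  const fast = categoriesIfCurrent(await getJson(buildQueryUrl(apiBase, title)), revId);
  if (fast !== null) return fast;
  // Mismatch: reparse the viewed revision (more expensive, so only as a fallback).
  return categoriesFromParse(await getJson(buildParseUrl(apiBase, revId)));
}
```

The common case costs one cheap query; the expensive reparse only happens when the page has been edited since the viewed revision.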

Jdlrobson renamed this task from "Load categories asynchronously" to "Remove wgCategories from mw.config". Oct 5 2018, 3:39 PM
> The problem with this is, contrary to user state (as with wgUserRights), this is expected to be in sync with the currently viewed revision - which I don't think the API currently supports. We'll want to avoid repeating the mistakes from the citation/references cache.

I imagine it's not just the current revision but the current parsed view: e.g., if the categories in the article vary with the user language, wgCategories presumably holds the categories for the current view, not those from the canonical parser options.

I'm less worried about a user explicitly viewing an old revision, although we should definitely not provide invalid data (either return valid data, or refuse).

My main concern is about the delay between viewing a page (which is based on viewed ParserOutput, and cached in Varnish) and the fetch for categories (which can happen many minutes, hours, or days later).

Anyhow, it's fairly complicated to "get right" a solution that works asynchronously for this in a way that is easy to use, scalable for us, and performant for the user. As such, I'm not sure effort toward that is justified right now.

The performance rationale (fewer bytes in the HTML stream before rendering can happen) is good, but doesn't mandate asynchronicity. As @Niedzielski already mentioned, moving it to the end of the HTML would work just as well to reduce rendering time. We can confirm this with a few static speed-test trials to see how it would perform in production.

> HISTORY says:
>
> * New wgCategories JavaScript global variable for userscripts.
>
> So would be worth mwgrep'ing to see what user scripts are using it. I do think removing it and replacing it with some mw.api.categories helper is a good idea (after some deprecation period of course). AFAIK though all the relevant information is already exposed via the API so removing that project.

Global Search finds ~500 usages, including in the MediaWiki:Common.js of 58 different wikis (enwiki among them).