Decide how to implement code splitting in Codex, and how to integrate it in ResourceLoader
Closed, Resolved · Public

Authored By

	Catrope
	Aug 16 2023, 10:41 PM

Description

Please post feedback on this task by September 4th.

This is a proposal for how to approach code splitting in Codex, focused mainly on the impact that would have on the developer experience of using Codex in MediaWiki.

Current situation

Most features using Codex are encouraged to use the @wikimedia/codex ResourceLoader module. This module contains the entire Codex library, which is fairly large: 156 KB of JavaScript and CSS (transmitted over the network as 32.2 KB of compressed data); and this number will only grow as more components are added to Codex. Most features use only a subset of Codex components, so a substantial portion of this code is unused.

Some features use CSS-only components, and only load the codex-styles module, which contains the CSS without the JavaScript (68.8 KB of CSS, compressed to 9.6 KB). This module doesn't contain any JS, but it does contain the styles for all components in the library, including components that the feature might not use, and including styles that are only needed for the JS version of the components.

For the search feature in Vector, the Web team was very concerned about limiting the size of the code that is loaded, since the search feature appears on every page. To support this, the Design Systems Team created a special build of Codex, and made it available as the @wikimedia/codex-search and codex-search-styles modules in ResourceLoader. These modules contain only the TypeaheadSearch component and its dependencies. Together they are about half the size of the full library: the styles module is loaded at page load time and is 29.4 KB of CSS (4.5 KB compressed); the JS module is loaded when the user interacts with the feature, and is 36.7 KB of JS (12.6 KB compressed).

These search-specific modules ensure that no unused code is loaded for users who use the search feature. However, unused styles are still loaded for users who don't interact with the feature (because the codex-search-styles module contains styles for components that only appear after the user types something). This is also a one-off way of addressing the problem that requires special configuration in the Codex library and publishing a separate NPM package, which doesn't scale well if we want to provide this treatment for multiple features.

Another problem with these search-specific modules is that they duplicate part of the full Codex library. If both the search feature and another feature load on the same page, causing both @wikimedia/codex and @wikimedia/codex-search to be loaded, the search-specific components are loaded twice. Our current system is not smart enough to deduplicate this double-loading of components.

Proposal

Features that use Codex would list the components they need in their ResourceLoader module definition. ResourceLoader would then embed the JS for these components (and the components they depend on) in the contents of that module as a packageFile, and add the CSS for these components to the module's styles. This ensures that each feature loads exactly the components it needs, and no more.

See also T344386#9132451 for an alternative approach

Simple example

In extension.json (or Resources.php), use the CodexModule class for the RL module that uses Codex, and list the Codex components the module uses:

"ResourceModules": {
    "ext.foo.myfeature": {
        "class": "MediaWiki\\ResourceLoader\\CodexModule",
        "packageFiles": [
            "myfeature/init.js",
            "myfeature/App.vue"
        ],
        "codexComponents": [
            "CdxButton",
            "CdxCheckbox",
            "CdxField",
            "CdxIcon",
            "CdxLabel",
            "CdxRadio",
            "CdxToggleSwitch"
        ],
        "dependencies": [
            "vue"
        ]
    }
}

In App.vue, get the components from ./codex-subset.js instead of from @wikimedia/codex, but otherwise use Codex normally:

<template>
    <cdx-field is-fieldset>
        <cdx-checkbox v-model="accepted"></cdx-checkbox>
        <template #label>
            I accept the terms and conditions
        </template>
    </cdx-field>
    <cdx-button action="progressive" weight="primary" :disabled="!accepted">
        Continue
    </cdx-button>
</template>
<script>
const { defineComponent } = require( 'vue' );
// Get Codex components from './codex-subset.js' instead of '@wikimedia/codex'
const { CdxButton, CdxCheckbox, CdxField } = require( './codex-subset.js' );

// @vue/component
module.exports = defineComponent( {
    components: {
        CdxButton,
        CdxCheckbox,
        CdxField
    },
    data: () => ( {
        accepted: false
    } )
} );
</script>

See also this merge request in CodexExample for another usage example.

Deduplication

This approach doesn't address deduplication: if two features that are constructed this way load on the same page, any Codex components that are used by both features would be double-loaded. We propose solving this problem in a targeted way rather than a general way. We expect that most features that use Codex will fall in one of two categories: they're either used on a very limited number of pages (e.g. the UI on a special page, or the contents of a Wikifunctions page), or they're used on almost all pages (e.g. the Vector search bar, or a future Codex implementation of UniversalLanguageSelector or Echo). It should be rare for two features from the former category to be loaded on the same page, because their scopes are generally non-overlapping. If two features using Codex are loaded on the same page, it's safe to assume at least one of them is something that appears on (almost) every page. For this reason, we focus on addressing duplicate loading of the Codex components that are used by features that appear on every page.

We propose manually curating a list of core components that are likely to overlap between every-page features and limited-scope features, and creating a ResourceLoader module that embeds these core components. ResourceLoader modules that use Codex would then depend on this core components module. For ease of use for the developer, these modules would still request embedding of all the components they use, and use them the same way in JavaScript as they would non-core components, so that consumer code doesn't have to be updated if the list of core components changes. But internally, ResourceLoader would get these components from the core components module, rather than embed them.

Deduplication example

In Resources.php, we might do something like this:

use MediaWiki\ResourceLoader\CodexModule;

return [
     // ...
     'codex-core' => [
        'class' => CodexModule::class,
        'codexComponents' => [
            'CdxButton',
            'CdxIcon',
            // ...etc other components that are used a lot...
        ]
     ]
];

A feature that uses Codex would then define a ResourceLoader module like this:

"ResourceModules": {
    "ext.foo.myfeature": {
        "class": "MediaWiki\\ResourceLoader\\CodexModule",
        "packageFiles": [
            "myfeature/init.js",
            "myfeature/App.vue"
        ],
        "codexComponents": [
            "CdxButton",
            "CdxCard"
        ],
        "dependencies": [
            "vue",
            "codex-core"
        ]
    }
}

In this example, ext.foo.myfeature would embed the Card component (which is not in the core components module), but would not embed the Button component (it would instead get it from the core components module). It would also embed Thumbnail (which is needed by Card and is not a core component), but it would not embed Icon (also needed by Card, but it's in the core components module).
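
The selection rule in this example can be sketched in a few lines of JavaScript. This is only an illustration of the logic, not the CodexModule implementation (which would be PHP in MediaWiki core); the `dependsOn` graph and the helper names are hypothetical, and only the Card relationships are taken from the example above.

```javascript
// Hypothetical dependency graph. Only the Card entry reflects the example
// above; the other entries are illustrative.
const dependsOn = {
    CdxButton: [],
    CdxCard: [ 'CdxIcon', 'CdxThumbnail' ],
    CdxIcon: [],
    CdxThumbnail: []
};

// Expand a list of components to include everything they depend on.
function withDependencies( components ) {
    const result = new Set();
    const queue = [ ...components ];
    while ( queue.length ) {
        const name = queue.pop();
        if ( !result.has( name ) ) {
            result.add( name );
            queue.push( ...( dependsOn[ name ] || [] ) );
        }
    }
    return result;
}

// Components to embed = everything the feature needs, minus everything the
// core components module already provides.
function componentsToEmbed( requested, core ) {
    const provided = withDependencies( core );
    return [ ...withDependencies( requested ) ]
        .filter( ( name ) => !provided.has( name ) )
        .sort();
}

console.log( componentsToEmbed( [ 'CdxButton', 'CdxCard' ], [ 'CdxButton', 'CdxIcon' ] ) );
// [ 'CdxCard', 'CdxThumbnail' ]
```

Button and Icon drop out because the core module covers them, leaving Card and Thumbnail to be embedded in the feature module.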

CSS-only modules

A feature that uses Codex CSS-only components could set "codexStyleOnly": true, like this:

"ResourceModules": {
    "ext.foo.cssonlyfeature": {
        "class": "MediaWiki\\ResourceLoader\\CodexModule",
        "styles": [
            "cssonlyfeature.less"
        ],
        "codexComponents": [
            "CdxCard",
            "CdxMessage"
        ],
        "codexStyleOnly": true
    }
}

This would embed only the CSS for the Card and Message components (and the components they depend on).

A feature that uses CSS-only components initially, but then replaces them with Vue components when JS loads, could create a style-only module and a JS module, like this:

"ResourceModules": {
    "ext.foo.enhancedfeature": {
        "class": "MediaWiki\\ResourceLoader\\CodexModule",
        "packageFiles": [
            "enhancedfeature/init.js",
            "enhancedfeature/App.vue"
        ],
        "codexComponents": [
            "CdxTypeaheadSearch"
        ],
        "dependencies": [
            "ext.foo.enhancedfeature.styles"
        ]
    },
    "ext.foo.enhancedfeature.styles": {
        "class": "MediaWiki\\ResourceLoader\\CodexModule",
        "styles": [
            "enhancedfeature-cssonly.less"
        ],
        "codexComponents": [
            "CdxTypeaheadSearch"
        ],
        "codexStyleOnly": true
    }
}

The JS module would embed the JS of the TypeaheadSearch component, but not its CSS, because it would detect that that is already provided by the style-only module.
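
As a rough sketch of that JS/CSS split (in JavaScript for illustration only): `planAssets` and the `stylesProvided` option are hypothetical names, and here the list of already-provided styles is passed in explicitly, whereas the proposal has CodexModule detect it.

```javascript
// Decide which asset types a module embeds for each requested component.
// `styleOnly` mirrors the proposed "codexStyleOnly" key; `stylesProvided` is a
// hypothetical list of components whose CSS a dependency already ships.
function planAssets( components, { styleOnly = false, stylesProvided = [] } = {} ) {
    const covered = new Set( stylesProvided );
    return components.map( ( name ) => ( {
        name,
        js: !styleOnly,
        css: !covered.has( name )
    } ) );
}

// ext.foo.enhancedfeature.styles: CSS only.
const stylesModule = planAssets( [ 'CdxTypeaheadSearch' ], { styleOnly: true } );

// ext.foo.enhancedfeature: JS only, because the styles module covers the CSS.
const jsModule = planAssets( [ 'CdxTypeaheadSearch' ], {
    stylesProvided: stylesModule.map( ( c ) => c.name )
} );

console.log( stylesModule, jsModule );
```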

Proof of concept implementation

The Design Systems Team has written the following proof of concept patches. These are not full implementations, but just serve to demonstrate the concept:

In Codex: a patch that makes the build system output the library as many small JS files that require() each other (rather than one large JS file), as well as a manifest.json file describing the dependency graph between these files.
In MediaWiki core: a patch that implements part of the CodexModule functionality described above, by reading the manifest file from Codex and embedding the appropriate files. This only implements the simple example, not the dependency smartness or style-only handling.
Examples of how to use CodexModule in VueTest and CodexExample
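
To make the manifest-driven embedding more concrete, here is a minimal sketch in JavaScript of how a dependency graph could be walked to find the files to embed. The manifest shape, the file names and the `resolveFiles` helper are assumptions for illustration; the manifest.json that Codex actually emits may look different, and the proof of concept does this work in PHP.

```javascript
// Hypothetical manifest: each entry maps a component (or shared chunk) to its
// file and to the entries it require()s. The real format may differ.
const manifest = {
    CdxButton: { file: 'CdxButton.js', imports: [] },
    CdxCard: { file: 'CdxCard.js', imports: [ 'CdxIcon', 'CdxThumbnail' ] },
    CdxIcon: { file: 'CdxIcon.js', imports: [] },
    CdxThumbnail: { file: 'CdxThumbnail.js', imports: [ 'CdxIcon' ] }
};

// Collect the files needed for the requested components, following the
// dependency graph so that each file is listed exactly once.
function resolveFiles( requested, graph ) {
    const seen = new Set();
    const visit = ( name ) => {
        if ( seen.has( name ) ) {
            return;
        }
        if ( !graph[ name ] ) {
            throw new Error( 'Unknown Codex component: ' + name );
        }
        seen.add( name );
        graph[ name ].imports.forEach( visit );
    };
    requested.forEach( visit );
    return [ ...seen ].map( ( name ) => graph[ name ].file ).sort();
}

console.log( resolveFiles( [ 'CdxCard' ], manifest ) );
// [ 'CdxCard.js', 'CdxIcon.js', 'CdxThumbnail.js' ]
```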

Open questions / issues

Style-only modules

Style-only modules can't have dependencies, see T191652 (in particular T191652#4117599 explaining why this restriction exists). This causes a problem for the deduplication strategy: we would like to create a codex-core-styles module and tell CSS-only feature modules to depend on it, but they can't. This means that CodexModule can't deduplicate them (unless we instruct it to do so in a different way), and that developers loading these modules have to manually remember to load both their module and codex-core-styles. Working around this is probably doable, but the developer experience wouldn't be great. If we ever had multiple layers of style dependencies (e.g. because we have multiple modules with shared components that depend on each other), this would become a much bigger problem.

Naming

This proposal introduces the following new names, but we're not very attached to these names and welcome ideas for better ones:

CodexModule: The subclass of ResourceLoader\Module that is used by modules that embed Codex components. This class already exists, but currently serves a different purpose (it's used for the codex-styles and codex-search-styles modules, and houses the getIcons function)
codexComponents: The key in the module definition that lists the components used in the module. We could rename this to reflect the fact that things that are not components (composables and utility functions) can also be listed here; unfortunately we don't yet have a good generic term that covers "component, composable or utility function".
codexStyleOnly: The key in the module definition that indicates that this is a style-only module, and only the CSS of the requested Codex components should be embedded.
codex-subset.js: The name of the virtual file generated by CodexModule that contains the requested Codex components (in practice, this is a wrapper file that require()s the requested components from other files).
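
For illustration, the generated wrapper file could look roughly like the output of the sketch below. The `buildSubsetSource` helper and the './codex/<Name>.js' file layout are assumptions, not the chunk naming that Codex or CodexModule actually uses.

```javascript
// Build the source text of a hypothetical codex-subset.js wrapper that
// re-exports each requested component from its own file.
function buildSubsetSource( components ) {
    const lines = components.map(
        ( name ) => `    ${ name }: require( './codex/${ name }.js' )`
    );
    return 'module.exports = {\n' + lines.join( ',\n' ) + '\n};\n';
}

console.log( buildSubsetSource( [ 'CdxButton', 'CdxCard' ] ) );
// module.exports = {
//     CdxButton: require( './codex/CdxButton.js' ),
//     CdxCard: require( './codex/CdxCard.js' )
// };
```

Consumer code would then destructure what it needs from require( './codex-subset.js' ), as in the App.vue example.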

Migration

Once this feature is introduced, we should deprecate and then remove the current @wikimedia/codex-search and codex-search-styles modules. But should we also deprecate and remove the main @wikimedia/codex and codex-styles modules, and force all uses of Codex in MediaWiki to use this system?

Magic behavior

Does it make sense for the require() call in these modules to be require( './codex-subset.js' )? Or would it make more sense to use require( '@wikimedia/codex' )? We chose the former because it seemed confusing to require from @wikimedia/codex when there is already an RL module by that name (and it would have required subverting some RL internals).

Does it make sense for the codex-subset.js file to magically appear, without being listed in packageFiles? Should it always appear in the root directory of the module? Or should we automatically detect the right path for it, by making it a sibling of the entry point file? Or should we allow (or require?) the developer to specify the name/path of this file?

Should modules using CodexModule have to explicitly specify a dependency on 'vue', or should this be added automatically, since the embedded Codex code already depends on Vue?

Rejected alternatives

One module per component

If we made every Codex component its own ResourceLoader module, everything would be a lot simpler: features could just use ResourceLoader module dependencies to pull in exactly those components they want, and ResourceLoader's module loading system would ensure reuse and prevent duplicate loading. However, there are currently 29 components in Codex, 7 composables, and 4 other chunks of code that are shared between components, so we would need to create 40 modules. To support CSS-only use of Codex, each component would need to be split into two modules, a style-only module and a JS module that depends on it; this would increase the total number of modules to 69.
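
The module count works out as follows, written as a small JavaScript calculation. It assumes that only components (not composables or shared chunks) need a style-only counterpart, which is what the figure of 69 implies.

```javascript
// Counts taken from the paragraph above.
const components = 29;
const composables = 7;
const sharedChunks = 4;

// One JS module per component, composable and shared chunk.
const jsModules = components + composables + sharedChunks;

// Each component additionally gets a style-only module.
const totalModules = jsModules + components;

console.log( jsModules, totalModules ); // 40 69
```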

ResourceLoader is not designed to be used this way: there is a performance impact associated with creating this many modules, and much work has gone into reducing the number of modules. For this reason, we didn't think that creating 69 new modules (and more over time, as more Codex components are created) would be acceptable. The style-only modules would also have complex dependency relationships between each other, which ResourceLoader does not support.

Fundamentally, code splitting presents an iron triangle-style trade-off (a triple constraint). There are three desirable properties: tree-shaking (not loading unused code), deduplication (not loading code twice), and a low module count. Any two of these can be satisfied perfectly, but only by completely discarding the third. The "one module per component" approach achieves perfect tree-shaking and perfect deduplication, but requires the highest number of RL modules. Embedding components in the module that uses them achieves perfect tree-shaking and requires zero additional modules, but does not achieve deduplication at all. The status quo of every feature loading the entire Codex library achieves perfect deduplication and requires only two modules, but does not achieve tree-shaking at all. The proposed solution attempts to find a middle ground where all three properties are mostly but imperfectly satisfied: tree-shaking is achieved mostly but not perfectly (some features may load a core component but not use it), deduplication is achieved mostly but not perfectly (if two features appear on the same page and share a non-core component, that component will be loaded twice), and the number of additional RL modules required is low but not zero. We propose this solution because we think the theoretical imperfections will rarely come up in practice, and are an acceptable price to pay for significantly reducing the number of modules required.

More feature-specific builds within Codex

The current @wikimedia/codex-search module is built by Codex's build system, and published as a separate NPM package. It's designed to serve a particular use case where Codex appears on every page. We could expand Codex's build system to build more packages like this, with various subsets of the library needed for various use cases. We don't propose this because it scales poorly; because MediaWiki-specific usage details should not be embedded in Codex; and because these builds would duplicate parts of each other and of the full library.

The duplication issue could be addressed by making Codex build subsets of the library that require() each other for deduplication, but this would substantially increase the number of ResourceLoader modules required, even for a relatively small number of subsets. This is because each subset needs 2 modules (one for JS, one for CSS-only), and because deduplicating chunks of code that are shared between subsets would require additional modules to be created.

Build step in MediaWiki

We could do tree-shaking of the Codex library in MediaWiki itself (and/or in extensions that use Codex), using a build tool like Rollup or Vite. But this is equivalent to the "embed components in the modules that use them" approach, with the same lack of deduplication; to avoid duplication, some sort of coordination between extensions that use Codex has to take place. Introducing a build step in MediaWiki has also run into other problems and objections when proposed in the past.

Related efforts

Once the Vue 3 migration is completed and we can switch from the migration build of Vue to the regular build, this will reduce the size of Vue from ~57 KB compressed to ~50 KB compressed.

If we were able to use a build step or some other mechanism to compile Vue templates to JavaScript (or at least do so in performance-sensitive places), we could load the runtime-only build of Vue. This would reduce the size of Vue further, from ~50 KB compressed to ~33.5 KB compressed.

Related Objects

Status	Assigned	Task
Open	None	T186850 Wikimedia-deployed extensions/skins with no PHPUnit coverage
Stalled	None	T315792 RelatedArticles has no PHPUnit tests
Resolved	Jdlrobson	T286835 Port RelatedArticles to Codex
Resolved	CCiufo-WMF	T335317 [EPIC] Determine how to support code-splitting when using Codex inside MediaWiki
Resolved	Catrope	T344386 Decide how to implement code splitting in Codex, and how to integrate it in ResourceLoader

Event Timeline

Catrope created this task. Aug 16 2023, 10:41 PM

Restricted Application added a project: Design-System-Team. Aug 16 2023, 10:41 PM

Restricted Application added a subscriber: Aklapper.

egardner subscribed. Aug 16 2023, 10:45 PM

CCiufo-WMF awarded a token. Aug 17 2023, 2:52 AM

CCiufo-WMF subscribed.

AnneT subscribed. Aug 17 2023, 1:47 PM

CCiufo-WMF moved this task from Inbox to Needs Refinement on the Design-System-Team board. Aug 17 2023, 1:51 PM

egardner awarded a token. Aug 17 2023, 4:48 PM

ovasileva subscribed. Aug 17 2023, 6:23 PM

Catrope mentioned this in T343144: [Spike] Explore integrating Vite (Codex's build tool) into ResourceLoader. Aug 17 2023, 9:12 PM

Catrope mentioned this in T343141: [Spike] Code Splitting: Explore "package for every component" approach.

Catrope added a project: MediaWiki-ResourceLoader. Aug 21 2023, 4:12 PM

Restricted Application added a project: MediaWiki-Platform-Team. Aug 21 2023, 4:12 PM

CCiufo-WMF moved this task from Needs Refinement to Backlog on the Design-System-Team board. Aug 21 2023, 5:12 PM

eamedina subscribed. Aug 21 2023, 5:23 PM

CCiufo-WMF added a parent task: T335317: [EPIC] Determine how to support code-splitting when using Codex inside MediaWiki. Aug 21 2023, 7:57 PM

CCiufo-WMF mentioned this in T335792: Prototype a code-splitting solution against a known use case. Aug 21 2023, 8:20 PM

Proposal

Features that use Codex would list the components they need in their ResourceLoader module definition. ResourceLoader would then embed the JS for these components (and the components they depend on) in the contents of that module as a packageFile, and add the CSS for these components to the module's styles. This ensures that each feature loads exactly the components it needs, and no more.

Simple example

In extension.json (or Resources.php), use the CodexModule class for the RL module that uses Codex, and list the Codex components the module uses:
"ResourceModules": {
    "ext.foo.myfeature": {
        "class": "MediaWiki\\ResourceLoader\\CodexModule",
        "packageFiles": [
            "myfeature/init.js",
            "myfeature/App.vue"
        ],
        "codexComponents": [
            "CdxButton",
            "CdxCheckbox",
            "CdxField",
            "CdxIcon",
            "CdxLabel",
            "CdxRadio",
            "CdxToggleSwitch"
        ],
        "dependencies": [
            "vue"
        ]
    }
}
(…)

Magic behavior

Does it make sense for the require() call in these modules to be require( './codex-subset.js' )? Or would it make more sense to use require( '@wikimedia/codex' )? We chose the former because it seemed confusing to require from @wikimedia/codex when there is already an RL module by that name (and it would have required subverting some RL internals).

Does it make sense for the codex-subset.js file to magically appear, without being listed in packageFiles? Should it always appear in the root directory of the module? Or should we automatically detect the right path for it, by making it a sibling of the entry point file? Or should we allow (or require?) the developer to specify the name/path of this file?

I'd prefer something less magical.

I think it would be more understandable if you put the list of required components in the definition of the "package file", rather than the module, like this:

"ResourceModules": {
    "ext.foo.myfeature": {
        "packageFiles": [
            "myfeature/init.js",
            "myfeature/App.vue",
            {
                "name": "myfeature/codex-subset.js",
                "callback": "MediaWiki\\ResourceLoader\\CodexModule::getCodexComponents",
                "callbackParam": [
                    "CdxButton",
                    "CdxCard"
                ]
            }
        ],
        "dependencies": [
            "vue",
            "codex-core"
        ]
    }
}

This way you have a natural place to specify the path and name, and it's obvious how the magic file is generated. You also avoid having to specify "class": "MediaWiki\\ResourceLoader\\CodexModule", allowing you to use other FileModule subclasses that provide specific features (e.g. LessVarFileModule). Basically this would be composition instead of inheritance :)

On the other hand, this makes it more difficult to handle the deduplication problems with style-only modules and dependencies, since the callback function can't examine the rest of the module definition (unless you add some new way to allow it…). On the other other hand, we may not need that, depending on how you decide to solve (or not) those problems. (described below)

Deduplication

(…)

In this example, ext.foo.myfeature would embed the Card component (which is not in the core components module), but would not embed the Button component (it would instead get it from the core components module). It would also embed Thumbnail (which is needed by Card and is not a core component), but it would not embed Icon (also needed by Card, but it's in the core components module).

(…)

A feature that uses CSS-only components initially, but then replaces them with Vue components when JS loads, could create a style-only module and a JS module, like this:

(…)

The JS module would embed the JS of the TypeaheadSearch component, but not its CSS, because it would detect that that is already provided by the style-only module.

I think this needs more attention, since it seems more complicated to me than you describe. How does the module figure out which components it doesn't need to embed? Does it walk the entire dependency tree of itself? This would make its definition depend on the definitions of all the other modules, which may be okay, but we haven't done that before (and it would also affect how cache invalidation works). Or does it use a constant list defined somewhere centrally? This would limit the deduplication when building some reusable components on top of other components, which may also be okay (but we used that pattern with OOUI / mediawiki.widgets, for widgets that depend on MediaWiki localisation or configuration, so it seems worth considering how you'd do that with Codex).

Migration

(…) But should we also deprecate and remove the main @wikimedia/codex and codex-styles modules, and force all uses of Codex in MediaWiki to use this system?

No, because you couldn't use the new system from gadgets and user scripts. :)

Rejected alternatives

One module per component

(…)

Fundamentally, code splitting presents an iron triangle-style trade-off (a triple constraint). There are three desirable properties: tree-shaking (not loading unused code), deduplication (not loading code twice), and a low module count. Any two of these can be satisfied perfectly, but only by completely discarding the third. (…)

There was once an experiment that would allow having one module per component, without adding more RL modules, by trading off the fourth corner of the triangle – cache invalidation: change 347442: ResourceLoader: Add wildcard modules. To be precise, all of the Codex submodules would share a single version hash, and changing any of them would invalidate the cache for all of them. I'm not really proposing this entirely seriously, since it'd be a big change, but if folks dislike the current proposal, this could also be considered.

In T344386#9107665, @matmarex wrote:
Does it make sense for the codex-subset.js file to magically appear, without being listed in packageFiles? Should it always appear in the root directory of the module? Or should we automatically detect the right path for it, by making it a sibling of the entry point file? Or should we allow (or require?) the developer to specify the name/path of this file?

I'd prefer something less magical.

I think it would be more understandable if you put the list of required components in the definition of the "package file", rather than the module, like this:
"ResourceModules": {
    "ext.foo.myfeature": {
        "packageFiles": [
            "myfeature/init.js",
            "myfeature/App.vue",
            {
                "name": "myfeature/codex-subset.js",
                "callback": "MediaWiki\\ResourceLoader\\CodexModule::getCodexComponents",
                "callbackParam": [
                    "CdxButton",
                    "CdxCard"
                ]
            }
        ],
        "dependencies": [
            "vue",
            "codex-core"
        ]
    }
}
This way you have a natural place to specify the path and name, and it's obvious how the magic file is generated.

This does appeal to me, and I considered it, but...

You also avoid having to specify "class": "MediaWiki\\ResourceLoader\\CodexModule", allowing you to use other FileModule subclasses that provide specific features (e.g. LessVarFileModule). Basically this would be composition instead of inheritance :)

I don't think we could do this part. The (current draft) implementation adds additional files to packageFiles (because we receive a bunch of chunk files from Codex that expect to be able to require() each other by relative file path), and you can't do that from a file contents callback. I guess we could do it by locally faking / monkey-patching the require() function within the generated contents of that one virtual file, but that feels ugly. However, I do think we could implement the API you suggest (or something like it, where the codex-subset file is a virtual file in packageFiles) as long as we also set "class": "CodexModule".

On the other hand, this makes it more difficult to handle the deduplication problems with style-only modules and dependencies, since the callback function can't examine the rest of the module definition (unless you add some new way to allow it…). On the other other hand, we may not need that, depending on how you decide to solve (or not) those problems. (described below)

You're right that examining the rest of the module definition is needed for the deduplication stuff (which has to look at the dependencies), and that's probably a bigger problem, but I think needing to cram all this code into one virtual file rather than being able to create many virtual files is also significant.

Deduplication

(…)

In this example, ext.foo.myfeature would embed the Card component (which is not in the core components module), but would not embed the Button component (it would instead get it from the core components module). It would also embed Thumbnail (which is needed by Card and is not a core component), but it would not embed Icon (also needed by Card, but it's in the core components module).

(…)

A feature that uses CSS-only components initially, but then replaces them with Vue components when JS loads, could create a style-only module and a JS module, like this:

(…)

The JS module would embed the JS of the TypeaheadSearch component, but not its CSS, because it would detect that that is already provided by the style-only module.

I think this needs more attention, since it seems more complicated to me than you describe. How does the module figure out which components it doesn't need to embed? Does it walk the entire dependency tree of itself? This would make its definition depend on the definitions of all the other modules, which may be okay, but we haven't done that before (and it would also affect how cache invalidation works). Or does it use a constant list defined somewhere centrally? This would limit the deduplication when building some reusable components on top of other components, which may also be okay (but we used that pattern with OOUI / mediawiki.widgets, for widgets that depend on MediaWiki localisation or configuration, so it seems worth considering how you'd do that with Codex).

You're right that I hand-waved a bit here, and we haven't tried implementing a proof of concept yet (maybe it would be useful for us to try that). I was thinking that yes, we would have the module walk the dependency tree of itself, and ask the ResourceLoader object for the Module objects corresponding to each of those dependencies. If it found one that was instanceof CodexModule, it could then call a CodexModule-specific method on it to ask it which components it is already taking care of. If we wanted to make this simpler (making it less versatile but probably also less likely to break), we could hard-code a list of core components in CodexModule; or perhaps hard-code the list of modules that can be deduplicated from (probably just codex-core), and only make deduplication work for direct dependencies, to avoid having to walk the dependency tree.
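
A rough sketch of the dependency walk described above, in JavaScript for illustration (the real thing would be PHP, asking the ResourceLoader object for Module objects). The `registry` object, the module names other than codex-core, and the `directOnly` option are all hypothetical; a module counts as a "CodexModule" here simply by having a `codexComponents` list.

```javascript
// Toy stand-in for ResourceLoader's module registry.
const registry = {
    vue: { dependencies: [] },
    'codex-core': { dependencies: [ 'vue' ], codexComponents: [ 'CdxButton', 'CdxIcon' ] },
    'ext.foo.shared': { dependencies: [ 'codex-core' ], codexComponents: [ 'CdxCard' ] },
    'ext.foo.myfeature': {
        dependencies: [ 'vue', 'ext.foo.shared' ],
        codexComponents: [ 'CdxButton', 'CdxCard', 'CdxMessage' ]
    }
};

// Collect the components already provided by a module's dependencies.
// With `directOnly`, only direct dependencies are inspected (the simpler,
// less versatile variant).
function providedByDependencies( moduleName, { directOnly = false } = {} ) {
    const provided = new Set();
    const visited = new Set();
    const walk = ( name, depth ) => {
        if ( visited.has( name ) ) {
            return;
        }
        visited.add( name );
        const mod = registry[ name ];
        if ( depth > 0 && mod.codexComponents ) {
            mod.codexComponents.forEach( ( c ) => provided.add( c ) );
        }
        if ( depth === 0 || !directOnly ) {
            mod.dependencies.forEach( ( dep ) => walk( dep, depth + 1 ) );
        }
    };
    walk( moduleName, 0 );
    return [ ...provided ].sort();
}

console.log( providedByDependencies( 'ext.foo.myfeature' ) );
// [ 'CdxButton', 'CdxCard', 'CdxIcon' ]
console.log( providedByDependencies( 'ext.foo.myfeature', { directOnly: true } ) );
// [ 'CdxCard' ]
```

In this toy setup the direct-only variant misses codex-core because it is only reachable through ext.foo.shared, which shows the versatility that the simpler approach gives up.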

Migration

(…) But should we also deprecate and remove the main @wikimedia/codex and codex-styles modules, and force all uses of Codex in MediaWiki to use this system?

No, because you couldn't use the new system from gadgets and user scripts. :)

...yes, good point. Someone else brought this up but I forgot to remove this open question before I posted it. Use of Codex (or really, use of Vue) isn't well supported in gadgets and user scripts right now, and we'd probably have to remove some barriers before it would be feasible. But it makes sense not to add an additional barrier that we'd then have to work around later.

Rejected alternatives

One module per component

(…)

Fundamentally, code splitting presents an iron triangle-style trade-off (a triple constraint). There are three desirable properties: tree-shaking (not loading unused code), deduplication (not loading code twice), and a low module count. Any two of these can be satisfied perfectly, but only by completely discarding the third. (…)

There was once an experiment that would allow having one module per component, without adding more RL modules, by trading off the fourth corner of the triangle – cache invalidation: change 347442: ResourceLoader: Add wildcard modules. To be precise, all of the Codex submodules would share a single version hash, and changing any of them would invalidate the cache for all of them. I'm not really proposing this entirely seriously, since it'd be a big change, but if folks dislike the current proposal, this could also be considered.

Oooh, that's an interesting possibility! I had considered the private modules experiment, but that one sacrificed deduplication. In my mind I thought of wildcard modules as just another kind of private modules, but you're right that it's different because the version hash is shared, and because all "private" modules' names are described by the wildcard. The cache invalidation thing shouldn't be an issue at all I think: MediaWiki treats Codex as an external library managed by foreign-resources.yaml, and only updates it when a Codex release is published (currently this happens every 2 weeks). So we would expect all Codex submodules to change at the same time anyway. (In theory this could be wasteful if a release didn't modify some of the submodules, but in practice that isn't common. The wastefulness is minimal anyway, and it's already present in the current system where all of Codex is in one module.)

I know you weren't fully serious, and maybe the implementation of this wouldn't be wildcard modules exactly, but maybe I'll try to code up a basic proof of concept for this approach. If it really would let us have all three corners of the triangle while only having a minor impact on cache invalidation (which wouldn't be worse than the status quo anyway), I think that may well be preferable to the original proposal.

larissagaulia moved this task from Inbox, needs triage to Backlog: non-prioritized on the MediaWiki-Platform-Team board. Aug 22 2023, 9:58 AM

Catrope updated the task description. Aug 22 2023, 4:48 PM

Catrope updated the task description. Aug 22 2023, 5:17 PM

ngkountas subscribed. Aug 23 2023, 8:23 AM

Michael subscribed. Aug 23 2023, 3:06 PM

Sgs subscribed. Aug 23 2023, 3:06 PM

Mooeypoo subscribed. Aug 24 2023, 2:46 PM

Posting the Web Team's feedback and questions here (from @Jdrewniak, @bwang and me). Thanks!

The one module per component option would be ideal if it's actually possible with wildcard modules. It would reduce a lot of complexity and would be simpler to use.

Regarding the codex core idea, it’s tricky to know when to pull in codex-core vs. when to use individual components. For example, if you’re using a button, then you only load the button, but when someone else also loads the button, you’re both better off loading the shared codex-core. Usage seems to depend on the context of a feature: if other features on that page also need similar components, it makes sense to use codex-core.

For the example of a “feature that uses CSS-only components initially, but then replaces them with Vue components when JS loads”, will that also work with codex-core?

Moving to Following given the current action is just about requesting and responding to feedback.

kostajh subscribed. Aug 30 2023, 7:17 AM

I looked into the wildcard modules suggestion, and ultimately I don't think it solves this problem. It avoids registering one module per component, but the module names for each component (things like @wikimedia/codex.CdxButton) still appear in the list of dependencies of modules that use Codex. This doesn't solve the fundamental issue behind registering lots of modules, which is bloating the manifest in the startup module with all these module names (normally dependencies are encoded as numbers instead of strings, but that only works for referring to modules that are already in the manifest).

However, this inspired me to prototype a different approach that I think would satisfy all four corners of the triangle:

Modules would be able to provide submodules. These are subsets of the module's packageFiles. Submodules can express dependencies on each other, and can require() each other's files.
Specifically, the @wikimedia/codex module would define a submodule for each of its components and chunks
Other modules can depend either on the entire module, or only on some of its submodules (a partial dependency). Modules using Codex would use this to depend on only the submodules they need.
If module A (e.g. a feature) has a partial dependency on module B (e.g. Codex), then when loading A, the client-side loader would not request A+B from the server, but only A.
The server would respond with the contents of A, plus the contents of the submodules of B that A needs
If module C is loaded later and it also has a partial dependency on B, the client-side loader would not request C+B, but it would only request C, and it would also tell the server that it has already loaded A.
The server would then respond with the contents of C, and the submodules of B that are needed by C but have not already been loaded because of A
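The request flow in the list above can be sketched as a small simulation. This is an illustration under stated assumptions, not the proof-of-concept code: the submodule dependency table, the partialDeps map and the respond() function are hypothetical names, and the component relationships shown are made up for the example.

```javascript
// Hypothetical submodules of B (Codex) and their dependencies on each other.
const submoduleDeps = {
    CdxButton: [ 'CdxIcon' ],
    CdxCard: [ 'CdxIcon', 'CdxThumbnail' ],
    CdxIcon: [],
    CdxThumbnail: []
};

// Partial dependencies of feature modules A and C on B.
const partialDeps = {
    A: [ 'CdxButton' ],
    C: [ 'CdxCard' ]
};

// Expand a list of submodules to include everything they depend on.
function expand( names ) {
    const result = new Set();
    const stack = [ ...names ];
    while ( stack.length ) {
        const n = stack.pop();
        if ( !result.has( n ) ) {
            result.add( n );
            stack.push( ...submoduleDeps[ n ] );
        }
    }
    return result;
}

// Server side: return the requested module plus only those submodules of B
// that the already-loaded modules have not brought in yet.
function respond( requested, alreadyLoaded ) {
    const have = new Set();
    alreadyLoaded.forEach( ( m ) => {
        expand( partialDeps[ m ] ).forEach( ( s ) => have.add( s ) );
    } );
    const needed = [ ...expand( partialDeps[ requested ] ) ].filter( ( s ) => !have.has( s ) );
    return { module: requested, submodules: needed.sort() };
}

console.log( respond( 'A', [] ) );      // submodules: [ 'CdxButton', 'CdxIcon' ]
console.log( respond( 'C', [ 'A' ] ) ); // submodules: [ 'CdxCard', 'CdxThumbnail' ]
```

In the second response CdxIcon is omitted because it was already delivered together with A, which is the deduplication the client's "already loaded" hint is meant to enable.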

This is more or less a variant of T225842: Allow ResourceLoader modules to be "private", bundle them with their dependent modules, except that this operates on subsets of modules that can require each other by file name. I've uploaded a (very hacky, but functional) proof of concept of this approach at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/953358

Under this approach, modules using a subset of Codex would put something like this in their module definition:

"dependencies": [
    "@wikimedia/codex"
],
"dependenciesWithSubmodules": {
    "@wikimedia/codex": [
        "CdxButton",
        "CdxCard"
    ]
}

They would then access Codex components the normal way:

const { CdxButton, CdxCard } = require( '@wikimedia/codex' );

Compared with the original proposal, I see the following pros and cons:

Pro: (Slightly) nicer developer experience, because you can use require( '@wikimedia/codex' ) instead of require( './codex-subset.js' ). But this is something we could potentially address in the original proposal too
Pro: No need for a codex-core module or anything like that
Pro: Perfect deduplication
Con: More complex to implement
Con: Requires a lot of changes to ResourceLoader internals

Catrope updated the task description. Aug 30 2023, 11:11 PM

Note any solution here should ideally be portable to outside MediaWiki - for example code written in MediaWiki should work in a Node.js environment. The Nearby extension packages up a Nearby App (https://wikipedia-nearby.netlify.app/) so it's important to me that we keep this working. This is also npm start in the Nearby extension if you need to test it.

In T344386#9150868, @Jdlrobson wrote:

Note any solution here should ideally be portable to outside MediaWiki - for example code written in MediaWiki should work in a Node.js environment. The Nearby extension packages up a Nearby App (https://wikipedia-nearby.netlify.app/) so it's important to me that we keep this working. This is also npm start in the Nearby extension if you need to test it.

That's a good point, and it's an argument for why we should not make people use require( './codex-subset.js' ) to get Codex modules, but instead continue to support require( '@wikimedia/codex' ). Even if that only returns a subset of Codex (if the module is configured to only load certain components), that's probably less confusing than having code that isn't portable.

The submodules approach would support this out of the box; for the original proposal, we could support this if we added support for per-module aliases to ResourceLoader, so that @wikimedia/codex could be aliased to ./codex-subset.js for modules that use this feature.
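To illustrate what per-module aliases could mean for require(), here is a rough sketch. It assumes a simplified module store and a hypothetical makeRequire() helper; ResourceLoader has no such alias feature today, and the placeholder export values are invented for the example.

```javascript
// Build a require()-like function for one module, given what is available
// to it and that module's own alias map.
function makeRequire( available, aliases ) {
    return function aliasedRequire( name ) {
        const hasAlias = Object.prototype.hasOwnProperty.call( aliases, name );
        const target = hasAlias ? aliases[ name ] : name;
        if ( !Object.prototype.hasOwnProperty.call( available, target ) ) {
            throw new Error( 'Unknown module: ' + target );
        }
        return available[ target ];
    };
}

// Hypothetical contents: the full library and a generated subset file.
const available = {
    '@wikimedia/codex': { CdxButton: 'full-button', CdxCard: 'full-card', CdxTabs: 'full-tabs' },
    './codex-subset.js': { CdxButton: 'subset-button', CdxCard: 'subset-card' }
};

// A module that opted into a subset gets the alias; other modules do not.
const requireInFeature = makeRequire( available, { '@wikimedia/codex': './codex-subset.js' } );
const requireElsewhere = makeRequire( available, {} );

console.log( Object.keys( requireInFeature( '@wikimedia/codex' ) ) ); // [ 'CdxButton', 'CdxCard' ]
console.log( Object.keys( requireElsewhere( '@wikimedia/codex' ) ) ); // [ 'CdxButton', 'CdxCard', 'CdxTabs' ]
```

Because the alias is scoped to one module, the same source line works unchanged in a Node.js environment, where @wikimedia/codex resolves to the NPM package.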

Having thought a bit more about the submodules approach proposed in T344386#9132451, I think it might be better to instead build something equivalent on top of the previously proposed private modules system (T225842), with a (private) module for each component. Private modules are a more general concept than submodules, which other things besides Codex could also benefit from, and there's a previously discussed and agreed-on implementation strategy for it, documented on the task. For Codex, I think we could combine private modules and per-module aliases to achieve code splitting for Codex, as follows:

Implement private modules per T225842, but add the "already loaded modules" feature for deduplication
Implement some sort of per-module alias support: aliases: { "foo": "bar" } would mean that require( 'foo' ) would not return the foo module, but would instead return the bar module
Register a private module for each Codex component, with names like @wikimedia/codex/CdxButton, @wikimedia/codex/CdxTextInput, etc., with the appropriate dependency relationships between them. For these to be able to require() each other, they'd need to have an alias config that looks something like "aliases": { "./CdxButton.js": "@wikimedia/codex/CdxButton" }
A module using Codex would list the components it needs in its module definition like this:

"ext.myModule": {
    "class": "MediaWiki\\ResourceLoader\\CodexModule",
    "packageFiles": [
        "index.js",
        "App.vue"
    ],
    "codexComponents": [
        "CdxButton",
        "CdxTextInput"
    ]
}

and use them like this:

// In App.vue
const { CdxButton, CdxTextInput } = require( '@wikimedia/codex' );

To make everything work, the CodexModule class would then transform this module definition to something like this:

"ext.myModule": {
    "packageFiles": [
        "index.js",
        "App.vue",
        { "name": "_codex_subset.js", "contents": "module.exports = { CdxButton: require( '@wikimedia/codex/CdxButton' ), CdxTextInput: require( '@wikimedia/codex/CdxTextInput' ) };" }
    ],
    "dependencies": [
        "@wikimedia/codex/CdxButton",
        "@wikimedia/codex/CdxTextInput"
    ],
    "aliases": {
        "@wikimedia/codex": "./_codex_subset.js"
    }
}
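The transformation between the two definitions can be sketched as a plain function. This is a JavaScript illustration of the idea only; the actual CodexModule class would be PHP, and expandCodexDefinition() is a hypothetical name introduced for this example.

```javascript
// Turn a definition with "codexComponents" into the expanded form:
// a generated subset file, dependencies on per-component private modules,
// and an alias so that require( '@wikimedia/codex' ) returns the subset.
function expandCodexDefinition( definition ) {
    const components = definition.codexComponents || [];
    const moduleNames = components.map( ( c ) => '@wikimedia/codex/' + c );
    const exportsBody = components
        .map( ( c ) => c + ": require( '@wikimedia/codex/" + c + "' )" )
        .join( ', ' );
    return {
        packageFiles: [
            ...( definition.packageFiles || [] ),
            { name: '_codex_subset.js', contents: 'module.exports = { ' + exportsBody + ' };' }
        ],
        dependencies: [ ...( definition.dependencies || [] ), ...moduleNames ],
        aliases: { '@wikimedia/codex': './_codex_subset.js' }
    };
}

const expanded = expandCodexDefinition( {
    class: 'MediaWiki\\ResourceLoader\\CodexModule',
    packageFiles: [ 'index.js', 'App.vue' ],
    codexComponents: [ 'CdxButton', 'CdxTextInput' ]
} );

console.log( JSON.stringify( expanded, null, 4 ) );
```

Any dependencies the module already declares are kept and the per-component modules are appended after them.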

The @wikimedia/codex module would contain the full library, using the same features above (maybe something like "codexComponents": [ "*" ]).

Catrope mentioned this in T345687: Make Codex ESM build tree-shakeable. Sep 11 2023, 5:19 PM

From the Task description:

Magic behavior

Does it make sense for the require() call in these modules to be require( './codex-subset.js' )?

I agree with Bartosz regarding the discoverability and "surprise" factor of the added file. Having it explicitly listed in packageFiles, even if internally there's more to it, would imho be preferred.

If I understand correctly, the broader direction here (to create a bundled file and expose it to the developer as such), is based on the assumption that using @wikimedia/codex as-is is not feasible. Is that true? It seems at a glance that the module could exist and dynamically expose whatever has been promised by other modules loaded so far. I suspect you did consider it, but the cons are not listed here. Writing that down for future reference would help.

Style-only modules can't have dependencies, see T191652 (in particular T191652#4117599 explaining why this restriction exists).
[…]. This means that CodexModule can't deduplicate them […], and that developers loading these modules have to manually remember to load both their module and codex-core-styles.

I assume the reason a special page or extension would want to queue these styles themselves is that they output a chunk of HTML that needs those styles. Is that right? We currently have two examples (HTMLForm, and OOUI) that manage something similar by letting each component take responsibility for queueing its own (style) modules. Would something like that work here?

For cases where HTML is effectively inlined based on the Codex CSS-only API (i.e. not OOP), then another approach we could consider is Skin::getDefaultModules which scans for various HTML-based capabilities like mw-ui, sortable, and collapsible. Perhaps that could take care of queueing Codex CSS?

In T344386#9132451, @Catrope wrote:
[…] this inspired me to prototype a different approach that I think would satisfy all four corners of the triangle:
[…]
I've uploaded a (very hacky, but functional) proof of concept of this approach at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/953358

Under this approach, modules using a subset of Codex would put something like this in their module definition:
"dependencies": [
    "@wikimedia/codex"
],
"dependenciesWithSubmodules": {
    "@wikimedia/codex": [
        "CdxButton",
        "CdxCard"
    ]
}
[…]

Pro: (Slightly) nicer developer experience, because you can use require( '@wikimedia/codex' ) instead of require( './codex-subset.js' ). But this is something we could potentially address in the original proposal too

[…]

Pro: Perfect deduplication

Con: Requires a lot of changes to ResourceLoader internals

This sounds similar to the CodexModule approach where a property codexComponents controls which components to bundle. Apart from "Perfect deduplication" (as gained by passing client loadedSubmodules-state), are there other differences between these approaches?

Could CodexModule ship its bundled components and expose them via @wikimedia/codex, without changes to RL internals? To me it seems like we could. The one sticking point would be how gadgets load the full module. A workaround for that might be a module like @wikimedia/codex-complete that would effectively be a CodexModule that ships all components.

From the Task description:

Deduplication

We propose manually curating a list of core components that are likely to overlap […]

This approach seems most pragmatic to me. It's easy to reason about and seems like a good balance between usability and performance, and yet with fairly low complexity. I assume this is part of a different proposal than the last two described above, as this would not require any bundling of components within the consuming feature, right?

There would internally be effectively two modules: "codex-common" and "codex-complete" where the latter depends on the former + adds the other components. To abstract this from developers, we need to let them indicate which components they need (CodexModule and codexComponents property) which then translates that to a dependency on one vs the other.

If we combine that with a way to expose them consistently via require('@wikimedia/codex'), I think that ticks all our boxes. Is that achievable without RL changes? Would it work if one of the two modules adopted that name itself? Would it work if we had a third module to manage the interface? If not, what would be a minimal RL change to facilitate this?

In T344386#9162178, @Krinkle wrote:

From the Task description:

Magic behavior

Does it make sense for the require() call in these modules to be require( './codex-subset.js' )?

I agree with Bartosz regarding the discoverability and "surprise" factor of the added file. Having it explicitly listed in packageFiles, even if internally there's more to it, would imho be preferred.

If I understand correctly, the broader direction here (to create a bundled file and expose it to the developer as such), is based on the assumption that using @wikimedia/codex as-is is not feasible. Is that true? It seems at a glance that the module could exist and dynamically expose whatever has been promised by other modules loaded so far. I suspect you did consider it, but the cons are not listed here. Writing that down for future reference would help.

I couldn't really figure out how to do this, at least not without either implementing a submodules-like system, or violating the assumption that when a module lists @wikimedia/codex as a dependency without indicating that it wants a subset, it expects to get the full library. Maybe we could make @wikimedia/codex itself a placeholder module that just exposes what other modules have loaded so far, and give the module that contains the full library a different name, like codex or codex-all?

Style-only modules can't have dependencies, see T191652 (in particular T191652#4117599 explaining why this restriction exists).
[…]. This means that CodexModule can't deduplicate them […], and that developers loading these modules have to manually remember to load both their module and codex-core-styles.

I assume the reason a special page or extension would want to queue these styles themselves is that they output a chunk of HTML that needs those styles. Is that right?

Yes

We currently have two examples (HTMLForm, and OOUI) that manage something similar by letting each component take responsibility for queueing its own (style) modules. Would something like that work here?

It could work if we had a separate styles module for every Codex component, but that causes other issues: there would be too many of them so they would have to be private, but then they can't be depended on by script modules, leading to duplicate loading. This is really only an issue for deduplication across modules loaded on the same page though. We wouldn't have this problem at all if we just embedded each component's styles in the modules that used them, or in a submodules-like approach; and if we created a codex-core module, the problem would be fairly manageable (we'd have to ensure that code that calls addModuleStyles( 'ext.something.that.uses.codex' ) also calls addModuleStyles( 'codex-core' )).

For cases where HTML is effectively inlined based on the Codex CSS-only API (i.e. not OOP), then another approach we could consider is Skin::getDefaultModules which scans for various HTML-based capabilities like mw-ui, sortable, and collapsible. Perhaps that could take care of queueing Codex CSS?

That is an interesting idea, but I think that does require private per-component style-only modules, so that the CSS for the right components can be assembled dynamically. I would prefer having style-only modules for each feature that does this where the components being used are listed explicitly.

In T344386#9132451, @Catrope wrote:

I've uploaded a (very hacky, but functional) proof of concept of this approach at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/953358
[...]

This sounds similar to the CodexModule approach where a property codexComponents controls which components to bundle. Apart from "Perfect deduplication" (as gained by passing client loadedSubmodules-state), are there other differences between these approaches?

They are very similar. The two main differences are the loadedSubmodules stuff you mention (which requires a lot of modifications to RL internals), and the fact that it's easier to make require( '@wikimedia/codex' ) work for partial usage in this approach. You previously mentioned the idea of having @wikimedia/codex being a placeholder/collector module that just exposes what other modules have loaded for it; this approach would basically do that, but it would be implemented as a core RL feature rather than in userland (the code of the module itself).

Could CodexModule ship its bundled components and expose them via @wikimedia/codex, without changes to RL internals? To me it seems like we could. The one sticking point would be how gadgets load the full module. A workaround for that might be a module like @wikimedia/codex-complete that would effectively be a CodexModule that ships all components.

Right, exactly, we'd lose the ability to use the @wikimedia/codex name to mean "load the full library", and we'd have to designate a different name for that. But I think that would be an acceptable trade-off. It would also make the CodexModule approach forwards compatible with any future submodule-like or private-modules-based approach, because we'd be able to keep using require( '@wikimedia/codex' ) everywhere.

(This would also be a breaking change for modules that currently depend on @wikimedia/codex, so we'd have to find and fix them all, or come up with some other way that allows the old pattern of "depend on @wikimedia/codex and you'll get the full library" to still work.)

From the Task description:

Deduplication

We propose manually curating a list of core components that are likely to overlap […]

This approach seems most pragmatic to me. It's easy to reason about and seems like a good balance between usability and performance, and yet with fairly low complexity. I assume this is part of a different proposal than the last two described above, as this would not require any bundling of components within the consuming feature, right?

I intended it as complementary to the first proposal (CodexModule). The second proposal (submodules) already deduplicates perfectly, so it doesn't need this, and a codex-core module would hurt rather than help. The first proposal (CodexModule) doesn't do deduplication at all, and we could mitigate that by adding a codex-core or codex-common module that deduplicates the most commonly used components (at the cost of potentially loading some components that won't be used, but we're assuming that core/common components are so common that that's not much of a concern). This module of core/common components could be added on top of the CodexModule approach, and could be added later as an optimization once we have figured out which components are most commonly used in practice.

There would internally be effectively two modules: "codex-common" and "codex-complete" where the latter depends on the former + adds the other components. To abstract this from developers, we need to let them indicate which components they need (CodexModule and codexComponents property) which then translates that to a dependency on one vs the other.

We could do this, but I don't think it would be very beneficial, because it would still load the whole library in many cases. I'd prefer to instead start with the CodexModule embedding approach, and then refine it later by adding a codex-common module.

Summary of how I propose changing this proposal in response to Krinkle's feedback above:

Implement the CodexModule approach
- but get rid of ./codex-subset.js
- instead make @wikimedia/codex a collector module that exposes components that have been loaded
- so that components can be imported with const { CdxWhatever } = require( '@wikimedia/codex' );
- and provide a different way of loading the entire library (maybe under a different module name like codex-all)
Some time after that, create a codex-core or codex-common module to deduplicate components that are commonly loaded twice on the same page
- This can happen later, it doesn't have to be part of the initial CodexModule implementation
- This would require some changes to the CodexModule code to allow it to change its behavior in response to the module's dependencies
If and when private modules are implemented at some time in the future, consider re-implementing CodexModule on top of the private modules system instead
- We'd keep the CodexModule class and the @wikimedia/codex collector module, so the API for Codex-using modules would not change
- We'd (dynamically) register a private module for every component (or use a wildcard module if/when that functionality exists)
- The CodexModule implementation would change to add dependencies pointing to those private modules, instead of embedding component implementations

I think this is all very doable and doesn't require changes to ResourceLoader itself. The only thing I'm not super happy with yet is breaking backwards compatibility for modules that depend on @wikimedia/codex directly; maybe there's a creative way that we can mitigate that.
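A minimal sketch of the collector idea from the summary above, assuming a hypothetical registerCodexComponents() hook that each CodexModule-based module would call when it executes. Neither the hook nor the object shape is taken from an existing implementation, and the component objects are placeholders.

```javascript
// Stand-in for the exports object of the '@wikimedia/codex' collector module.
const codexCollector = {};

// Hypothetical hook: a module that embeds some Codex components hands them
// to the collector so that other code can import them by name.
function registerCodexComponents( components ) {
    Object.assign( codexCollector, components );
}

// Two feature modules load, each embedding the components it listed.
registerCodexComponents( { CdxButton: { name: 'CdxButton' } } );
registerCodexComponents( { CdxCard: { name: 'CdxCard' }, CdxButton: { name: 'CdxButton' } } );

// Equivalent of: const { ... } = require( '@wikimedia/codex' );
const { CdxButton, CdxCard, CdxTabs } = codexCollector;

console.log( CdxButton.name, CdxCard.name, CdxTabs ); // CdxButton CdxCard undefined
```

A component that no loaded module has declared (CdxTabs here) comes back as undefined, which is the failure mode developers would see if they forgot to list it in codexComponents.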

One concrete use case I have that is blocking the RelatedArticles port in T286835 is that I need a way to load just the Card component styles on the page via JavaScript (see https://gerrit.wikimedia.org/r/c/mediawiki/extensions/RelatedArticles/+/933201). I understand in future I'll be able to load a private module with just the model (maybe @wikimedia/codex/Card) but could you explain to me what that would look like in the first version (would it need to load all of the @wikimedia/codex styles, e.g. codex-styles)?

In T344386#9170940, @Jdlrobson wrote:

One concrete use case I have that is blocking the RelatedArticles port in T286835 is that I need a way to load just the Card component styles on the page via JavaScript (see https://gerrit.wikimedia.org/r/c/mediawiki/extensions/RelatedArticles/+/933201). I understand in future I'll be able to load a private module with just the model (maybe @wikimedia/codex/Card) but could you explain to me what that would look like in the first version (would it need to load all of the @wikimedia/codex styles, e.g. codex-styles)?

For the first version, you would be able to do something like this and get *only* the Codex card styles:

"ResourceModules": {
    "ext.foo.cssonlyfeature": {
        "class": "MediaWiki\\ResourceLoader\\CodexModule",
        "styles": [
            "cssonlyfeature.less"
        ],
        "codexComponents": [
            "CdxCard"
        ],
        "codexStyleOnly": true
    }
}

If there was a "hydrated" version of this feature which contained live Codex components for clients with JS enabled, you could create an additional module which depends on this first one:

"ResourceModules": {
    "ext.foo.hydratedfeature": {
        "class": "MediaWiki\\ResourceLoader\\CodexModule",
        "codexComponents": [
            "CdxCard",
            "CdxMessage",
            "CdxButton"
        ],
        "dependencies": [
            "ext.foo.cssonlyfeature"
        ]
    }
}

Jdlrobson awarded a token. Sep 15 2023, 10:50 PM

In T344386#9168661, @Catrope wrote:

I think this is all very doable and doesn't require changes to ResourceLoader itself. The only thing I'm not super happy with yet is breaking backwards compatibility for modules that depend on @wikimedia/codex directly; maybe there's a creative way that we can mitigate that.

Could you default to providing the entire library if codexComponents isn't specified?

In T344386#9175360, @AnneT wrote:

In T344386#9168661, @Catrope wrote:

I think this is all very doable and doesn't require changes to ResourceLoader itself. The only thing I'm not super happy with yet is breaking backwards compatibility for modules that depend on @wikimedia/codex directly; maybe there's a creative way that we can mitigate that.

Could you default to providing the entire library if codexComponents isn't specified?

Yes, but it would be tricky to make that work without "class": "CodexModule" being specified. (We could put a hack for this in RL somewhere though, to automatically add the CodexModule class to modules that depend on @wikimedia/codex and don't set codexComponents.)
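The fallback being discussed could look roughly like this. It is a sketch only: ALL_COMPONENTS is a made-up, shortened list and resolveCodexComponents() is a hypothetical helper, not part of ResourceLoader.

```javascript
// Hypothetical (shortened) list of every component in the library.
const ALL_COMPONENTS = [ 'CdxButton', 'CdxCard', 'CdxTextInput' ];

// Decide which components a module definition should receive.
function resolveCodexComponents( definition ) {
    // An explicit list always wins.
    if ( Array.isArray( definition.codexComponents ) ) {
        return definition.codexComponents;
    }
    // Old pattern: depending on @wikimedia/codex without a list
    // means "give me the full library".
    const deps = definition.dependencies || [];
    return deps.includes( '@wikimedia/codex' ) ? ALL_COMPONENTS.slice() : [];
}

console.log( resolveCodexComponents( { dependencies: [ '@wikimedia/codex' ] } ) );
console.log( resolveCodexComponents( { codexComponents: [ 'CdxCard' ] } ) );
```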

Krinkle moved this task from Inbox to Accepted Enhancement on the MediaWiki-ResourceLoader board. Sep 25 2023, 3:32 PM

Jdlrobson added a parent task: T286835: Port RelatedArticles to Codex. Oct 3 2023, 12:21 AM

Jdlrobson mentioned this in T286835: Port RelatedArticles to Codex.

Lens0021 subscribed. Oct 3 2023, 1:13 AM

Catrope mentioned this in T349423: [EPIC] Implement Codex code splitting in ResourceLoader\CodexModule. Oct 21 2023, 12:36 AM

CCiufo-WMF mentioned this in T349541: [Spike] Outline next steps for code splitting. Oct 23 2023, 5:00 PM

Thanks for the discussion everyone! Since the discussion is done and we have a plan, I'm going to close this task. I've opened T349423 for the implementation of this plan (the first stage at least; I'll file another task for the core components module part of this plan once we get closer to implementing it).

CCiufo-WMF mentioned this in T350040: Review Codex module implementation for ResourceLoader. Oct 30 2023, 2:39 PM

Catrope mentioned this in T350056: Code splitting: dependency deduplication of CodexModule. Feb 15 2024, 7:56 PM

Catrope mentioned this in T357836: [Spike] Explore one-module-per-component approach to code splitting. Feb 16 2024, 11:09 PM
