
Consider including a JS runtime as part of MediaWiki
Closed, Declined · Public · 5 Estimated Story Points

Description

Background

MediaWiki has existed as a classic LAMP-stack, PHP-based application for roughly 20 years at this point. One long-standing goal in the development of MW has been to ensure that the core software can run on a shared-hosting environment, where the site administrator may not have the ability to install additional packages. Practically speaking, this means that we tend to do as much as possible in PHP.

Limitations of our current approach

Ensuring that MediaWiki remains viable on shared hosting platforms is an important goal, but our decision to rely exclusively on PHP for core functionality also has consequences that we must work around, most notably in the area of front-end development. Here are some examples:

  • WMF maintains its own PHP port of the LESS CSS pre-processor which is now several major versions behind the upstream project (T288498)
  • We also maintain PHP-based tools for JS minification / validation which did not support ES6 for several years
  • Front-end developers cannot use any build tools in MW features unless they build them in an external repository and commit the resulting assets to source. This also prevents use of TypeScript, CSS processors aside from LESS, and certain performance optimizations (bundling & tree-shaking code, JS transpilation, pre-compilation of Vue files, etc.). See: T279108.
  • Without some kind of JS runtime, performing server-side rendering of Vue components is not going to be possible; this means that developers will need to continue writing two versions of most front-end features (one in PHP and one in JS). See: T321356.

External services

Historically, when a WMF-specific feature has needed some kind of non-PHP runtime, that feature gets developed in a dedicated extension (as opposed to being included in MW core), and the dependency is managed separately as some kind of external service. For example, the CirrusSearch extension depends on ElasticSearch, which is written in Java. A dedicated ElasticSearch service is deployed to the WMF server cluster and tools like Docker or Vagrant are used in local development.

This approach keeps MW core simple and deployable to shared hosting, but the trade-off is that WMF projects must often coordinate several different services in order to run. Local development for WMF projects has gotten increasingly complex as a result.

It would probably be possible to set up a JS runtime as an external service in order to provide some basic SSR functionality (see: T286963). But adding yet another service that may or may not be available at any given time (and needs to be provisioned, deployed, maintained, etc) seems like a less-than-ideal solution.

Questions

Should we consider introducing some kind of JS runtime into MW core itself, as opposed to deploying an external service that will only be usable for WMF projects?

Potential benefits:

  • We could solve several of the problems listed above instead of just one of them at a time
  • We could provide better integration of Vue or JS tools into MW
  • Avoid the added complexity of yet another service to maintain
  • We could align our front-end development practices to be more in line with the standard ways of doing things in 2023

Potential drawbacks:

  • We may be burdening downstream users of MW with additional requirements
  • JS runtimes can be quite large – several dozen to ~100MB or so
  • Some attention would need to be given to securing this runtime (we probably don't want to just allow things to run npm install for example)

Other Considerations

  • The PHPv8 project exposes bindings for V8 (Google's open-source JS engine, written in C++ and designed to be embedded in other applications) for use in PHP applications. Could we use PHPv8 not just for SSR but for other things too (like a front-end build step, TypeScript compilation, etc.)?
  • There is a new generation of alternative JavaScript runtimes, such as Bun and Deno, that focus on performance and (especially in Deno's case) security, including for SSR use cases. These may be worth looking into and may avoid some of the performance problems that would result from naively shelling out to a Node.js executable during a web request. esbuild, a bundler written in Go rather than a full JS runtime, may also be worth looking at.

Next Steps

TBD. Figuring out what we want to do here will have a lot of implications for the other initiatives around SSR and front-end build step; we may want to try to solve this problem first.

Related Objects

Mentioned In
T343144: [Spike] Explore integrating Vite (Codex's build tool) into ResourceLoader
T343049: [Spike/Prototype] Create an "intermediate library" where Codex components can be packaged into non-overlapping bundles
T337320: [Spike] Investigate A/B test results of Growth Experiments impact module
T335317: [EPIC] Determine how to support code-splitting when using Codex inside MediaWiki
T334986: [Proposal] Create a new "codex-mw" package for shared MediaWiki-specific components
T334438: Create a basic "Hello world" example of how to use Codex in a user script
T75714: Update JavaScript syntax checker for gadgets and user-scripts for ES6 and later
T279108: Introduce a Front-end Build Step for MediaWiki Skins and Extensions
T330336: Draft proposal for front-end modernization initiative
T328125: Spike: Experiment with creating GrowthExperiments components in external library
T288498: Update less.php port to support Less.js 3.13 behaviours
Mentioned Here
T75714: Update JavaScript syntax checker for gadgets and user-scripts for ES6 and later
T335317: [EPIC] Determine how to support code-splitting when using Codex inside MediaWiki
T199004: RFC: Add a frontend build step to skins/extensions to our deploy process
T279108: Introduce a Front-end Build Step for MediaWiki Skins and Extensions
T286963: Prototype a Vue SSR implementation using a Node service
T288498: Update less.php port to support Less.js 3.13 behaviours
T321356: [Technical Epic] Modern user interfaces for all users (discovery phase)

Event Timeline

This should go through the TDMP / TDF process, but that has its issues around representation for non-staff concerns which may undermine people's faith in the outcome. Not sure what to advise.

Copying comments from an ancestor discussion to simplify continuation of discourse.

Would it make more sense for us to introduce a Node.js runtime[1] rather than continuing to develop and maintain our own PHP versions of various front-end tools?

This is in general the $1,000,000 USD question that has been left unanswered since Node.js first leaked into the WMF production environment as part of the "experimental" Parsoid parser. The major sticking point is not WMF usage, but rather adding a new runtime requirement for so-called 3rd party MediaWiki users, which would largely make running MediaWiki on classic shared hosting environments like https://www.ionos.com/hosting/web-hosting and https://www.godaddy.com/hosting/web-hosting impossible. Today it is still very possible to run MediaWiki from a host where you can only upload files and provide htaccess-style web server configuration. As soon as other non-PHP language runtimes become required, this will no longer be possible. There was some hope circa 2017 that the WMF would invest in a full-time Product Manager for MediaWiki as a whole, and that that person would begin to chip away at the hearsay in most discussions of shared hosting usage, requirements, and value to the MediaWiki ecosystem. Unfortunately for this question, however, the person hired for the role was taken in other directions fairly quickly and the general idea sort of fell on the floor.

I suspect most environments that have PHP running have Node running there as well, and PHP can check whether this is the case.

Ionos, where I have an account and have in the past hosted several small wikis, does not offer any server-side JavaScript runtime in its shared web hosting product. A quick search of the docs for GoDaddy's similar product does not indicate any server-side JS support either.

For completeness, the point about requiring a Node.js server-side component should, IMO, be qualified with the note that not having a Node.js SSR service wouldn't necessarily mean that no MediaWiki wiki could run on the example hosting. I'd think they could still run, but rendering of the frontend would then be shifted fully to the client side, i.e. it would require people browsing/using the wiki to have JS enabled in the browser. I am not saying it is not a big deal; just that the point quoted above seems not to be fully accurate.

We have a pattern for a few components in MediaWiki that are enhanced by external applications:

  • If ElasticSearch is available, you can use that for the search index instead of the DB
  • If Memcache/Redis is available, you can use that for caching instead of the DB
  • If Memcache/Redis is available, you can use that for the job queue instead of the DB

Could we architect things such that if Node.js is available, we're able to use it, but if not, MW is still able to function?

> Avoid the added complexity of yet another service to maintain

I am not a Kubernetes expert, but I don't think we'd be running containers with both PHP and Node.js processes in production, so we would need to talk to a separate service, or use Shellbox to do that.

> consider introducing some kind of JS runtime into MW core itself,

Could you please briefly (a few sentences, or some drawing) sketch out what that might look like, including an example request workflow for displaying some article page, because I am having trouble picturing what is being proposed.

> For completeness, the point about requiring a Node.js server-side component should, IMO, be qualified with the note that not having a Node.js SSR service wouldn't necessarily mean that no MediaWiki wiki could run on the example hosting. I'd think they could still run, but rendering of the frontend would then be shifted fully to the client side, i.e. it would require people browsing/using the wiki to have JS enabled in the browser. I am not saying it is not a big deal; just that the point quoted above seems not to be fully accurate.
>
> We have a pattern for a few components in MediaWiki that are enhanced by external applications:
>
>   • If ElasticSearch is available, you can use that for the search index instead of the DB
>   • If Memcache/Redis is available, you can use that for caching instead of the DB
>   • If Memcache/Redis is available, you can use that for the job queue instead of the DB
>
> Could we architect things such that if Node.js is available, we're able to use it, but if not, MW is still able to function?

I think this might be a good approach. Maybe we could add some capability to the way ResourceLoader handles Vue files, to server-render them if possible and deliver to the client for rendering otherwise.
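A minimal sketch of that idea, assuming a hypothetical `SsrRenderer` interface that only exists when a JS runtime has been configured. The names `renderVueModule` and `SsrRenderer` and the `data-mw-vue-*` attributes are illustrative, not existing ResourceLoader APIs. The point is the shape of the fallback: try the server render, and on any failure emit an empty mount point that the client-side code fills in.

```typescript
// Hypothetical interface: only available when a JS runtime (service or
// executable) is configured for this wiki.
interface SsrRenderer {
  render(component: string, props: Record<string, unknown>): string;
}

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render a Vue-based module on the server if possible; otherwise return an
// empty mount point so the browser renders it instead.
function renderVueModule(
  component: string,
  props: Record<string, unknown>,
  renderer: SsrRenderer | null
): string {
  // The same props are embedded either way, so the client can hydrate the
  // server output or mount from scratch.
  const attrs =
    `data-mw-vue-component="${escapeAttr(component)}" ` +
    `data-mw-vue-props="${escapeAttr(JSON.stringify(props))}"`;
  if (renderer !== null) {
    try {
      const html = renderer.render(component, props);
      return `<div ${attrs} data-mw-vue-ssr="1">${html}</div>`;
    } catch {
      // Fall through: a failed server render must never break the page.
    }
  }
  // No runtime, or the render failed: client-side rendering takes over.
  return `<div ${attrs}></div>`;
}
```

In a PHP-only install `renderer` would simply be `null`, which matches today's client-side-only behaviour, so SSR becomes an optional enhancement in the same way as the ElasticSearch and Memcache/Redis examples.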

> Could you please briefly (a few sentences, or some drawing) sketch out what that might look like, including an example request workflow for displaying some article page, because I am having trouble picturing what is being proposed.

I can add a use-cases section to the task to make things a little more concrete. It may be that we cannot support all use-cases with the same tools, or that some use-cases are more important than others. Briefly I'd say the main use-cases would be:

  • Server-side rendering of Vue-based features at request time (invoked from the main PHP process)
  • Compilation of code in Core or Extensions (TypeScript, non-LESS CSS pre-processors, Vue SFC compilation) – could be done ahead of time
  • Bundling/tree-shaking of code in Extensions especially (load only a single Codex component instead of the whole library, etc) – could be done ahead of time

For the "ahead of time" work, I keep thinking of how the "asset pipeline" feature worked in the classic days of Ruby on Rails. It was common for Rails developers back in the day to write front-end code using SASS and CoffeeScript. You'd use a library that gave you Ruby bindings to something like V8, and that would give you the ability to run things like asset compilation scripts (in Ruby) as part of the deployment process: rake assets:precompile, etc.

It's not hard to imagine something similar for MediaWiki run through maintenance.php – compile TypeScript and Vue SFCs, build extension code with something like Rollup, etc. This would *not* give you SSR, but in my opinion it would improve the developer tooling situation without rocking the boat too much.

Maybe we really need two different things here – an optional SSR service with graceful fallback behavior, and a maintenance script that can use some limited Node-based build tools.

Additional considerations that come to mind:

  • We should consider providing a shell script wrapper for (relatively) simple installs, and an HTTP service wrapper for production installs. I imagine an HTTP service is how we'd want to deploy such a thing in WMF production, as opposed to shelling out, to reduce overhead. But for less performance-sensitive setups and for local development, shelling out would be easier to manage.
  • Bun and Deno look promising and we should explore them, but if possible we should aim to build something that will also work in Node.js. That way the alternative runtime can be a performance enhancement, but people could install MediaWiki even if they only have Node.js and can't/won't install something more obscure.
  • We should consider how to (or whether to) support environments where even shelling out to (or installing) Node.js is not possible
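The selection logic these points describe could look roughly like the following, in the spirit of Ruby's ExecJS: prefer an HTTP service when one is configured, otherwise shell out to the best locally available executable (Bun or Deno first, plain Node.js last), and otherwise report that no runtime exists so callers can fall back. The `JsRuntimeConfig` shape and the idea that executable detection has already happened are assumptions for illustration.

```typescript
type Transport =
  | { kind: "http"; url: string }
  | { kind: "shell"; executable: string }
  | { kind: "none" };

// Hypothetical configuration: an optional service URL (production) and the
// executables found on this host (detection itself is not shown).
interface JsRuntimeConfig {
  serviceUrl?: string;
  availableExecutables: ReadonlySet<string>;
}

// Alternative runtimes first, as optional performance enhancements; plain
// Node.js last so that a stock Node.js install is always sufficient.
const EXECUTABLE_PREFERENCE = ["bun", "deno", "node"];

function chooseTransport(config: JsRuntimeConfig): Transport {
  if (config.serviceUrl) {
    // Production: talk to a long-running service, avoiding per-call startup.
    return { kind: "http", url: config.serviceUrl };
  }
  for (const exe of EXECUTABLE_PREFERENCE) {
    if (config.availableExecutables.has(exe)) {
      // Simple installs and local development: shell out.
      return { kind: "shell", executable: exe };
    }
  }
  // PHP-only environment: callers must use a fallback strategy.
  return { kind: "none" };
}
```

Because each runtime is invoked slightly differently, scripts run this way would probably need to stay within the APIs that all supported runtimes share.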

I see these possible strategies for different JS runtime use cases in environments where we can't run a JS runtime:

  • Gracefully fall back to nothing: Some things are nice to have for performance reasons, but you can run MediaWiki without them and the user experience wouldn't be meaningfully different. Image scaling works this way currently. An example of a JS runtime feature could be JS/CSS minification and validation.
  • Gracefully fall back to a PHP implementation: Current examples include diff generation: it's recommended that you shell out, but there's a PHP implementation of last resort with worse but acceptable-ish performance. To some extent, JS/CSS minification/validation and Less compilation could work this way, but we're running into limitations here (JS minification doesn't support ES2016+, JS validation doesn't support ES6, Less compilation doesn't support modern Less features) that might make us want to use different strategies for these.
  • Ship precomputed outputs: For some of these use cases, the output doesn't change unless the code changes, so we could run a Node.js-based build command at release time and ship the output with the release tarball. This would not work for development or for changing the code, but it would allow running an unmodified version of MW in a PHP-only environment. We could use this strategy for TypeScript, tree shaking, Less compilation, to some extent for JS/CSS minification (on-wiki JS/CSS wouldn't benefit from minification, and on-wiki styles would not be able to use Less), and probably other related build-step-like use cases. Even in production, we may want to precompute or cache some of these things.
  • Degraded functionality: For some things we have fallbacks that work, but not as well. For example, without CirrusSearch you'll still have search, but it won't be as good. JS validation could work this way: without validation Gadgets and user scripts would still work, but they wouldn't be protected against syntax errors.
  • Missing functionality: Some things are just not supported at all without installing a service/executable, typically wrapped in an extension. Scribunto currently works this way: you can't install it if you can't shell out to an executable, and without Scribunto you can't use Lua modules.

For most proposed use cases for a JS runtime we can probably use the precomputation strategy. But for server-side rendering that won't work, and we'd need to explore different fallback strategies.
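One way to keep these strategies explicit is to record, per feature, which fallback applies when no JS runtime is present, so that no code path silently assumes a runtime exists. The feature names and their assignments below are illustrative, taken from the examples in the list above; they are not an existing MediaWiki configuration, and SSR is deliberately left out because its fallback is still an open question.

```typescript
type Fallback =
  | "skip"         // fall back to nothing (e.g. minification)
  | "php"          // fall back to a PHP implementation
  | "precomputed"  // use output shipped in the release tarball
  | "degraded"     // keep working with reduced functionality
  | "unavailable"; // feature cannot be used at all

// Illustrative assignments following the examples above.
const FALLBACKS = new Map<string, Fallback>([
  ["js-minification", "skip"],
  ["less-compilation", "php"],
  ["typescript", "precomputed"],
  ["tree-shaking", "precomputed"],
  ["js-validation", "degraded"],
]);

type Plan = "run-js-runtime" | Fallback;

function planFeature(feature: string, runtimeAvailable: boolean): Plan {
  if (runtimeAvailable) {
    return "run-js-runtime";
  }
  // Features without a declared fallback are treated as unavailable.
  return FALLBACKS.get(feature) ?? "unavailable";
}
```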

> Missing functionality: Some things are just not supported at all without installing a service/executable, typically wrapped in an extension. Scribunto currently works this way: you can't install it if you can't shell out to an executable, and without Scribunto you can't use Lua modules.

It’s worth mentioning that Scribunto includes Lua binaries for several platforms (32/64-bit Windows/Linux and generic Mac), so as long as you can shell out, it works out of the box (i.e. you don’t have to install Lua separately). This means we have some precedent for including a language runtime as part of a MediaWiki extension.

However, those Lua binaries have only been updated very occasionally over the past decade:

```
$ GIT_PAGER=cat git log --pretty=format:'%h %aI %s' -- includes/Engines/LuaStandalone/binaries/ includes/engines/LuaStandalone/binaries/ engines/LuaStandalone/binaries/
1eecdac6de 2022-07-30T23:31:00+01:00 Capitalise Engines folder
237d059ea1 2019-01-08T21:33:47-08:00 Add lua5.1 patch for CVE-2014-5461
1fad4da137 2018-04-08T18:56:50-07:00 Move classes into includes/
00ed2a567b 2016-02-22T09:09:18-08:00 Update lua binaries to patch CVE-2014-5461
40b8bd2caa 2014-07-07T14:46:59-04:00 Add comments and remove trailing whitespace
5a9b7cc5a6 2013-08-06T10:27:35-04:00 More-compatible Linux standalone binaries
32831ec56e 2012-06-03T13:57:19+02:00 Add a Mac OS X (Lion) lua binary. Compiled for 32 and 64bit. Used automatically on Darwin systems
b0f00103e2 2012-04-16T14:41:08+10:00 Added tests and fixed bugs
54cedd69b8 2012-04-13T20:38:12+10:00 Introduced standalone interpreter, implemented module isolation
```

If we were to include Node.js, we’d probably have to update it much more frequently, significantly bloating the Git repository. (This could at least be mitigated by putting the runtime into a separate extension, to be included in the release tarball like other default extensions, rather than mediawiki/core.git.) I don’t know how often the other mentioned runtimes need security updates.
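Following the Scribunto precedent, a bundled runtime mostly amounts to a lookup from the host platform to a shipped binary, with nothing returned when the platform isn't covered. The directory layout and file names below are hypothetical (for a hypothetical bundled Node.js), and the platform/architecture strings follow Node's `process.platform` / `process.arch` naming; Scribunto itself makes the equivalent choice in PHP.

```typescript
// Hypothetical layout of runtime binaries shipped with an extension,
// keyed by "<platform>-<arch>".
const BUNDLED_BINARIES = new Map<string, string>([
  ["linux-x64", "binaries/linux-x64/node"],
  ["linux-arm64", "binaries/linux-arm64/node"],
  ["darwin-arm64", "binaries/darwin-arm64/node"],
  ["win32-x64", "binaries/win32-x64/node.exe"],
]);

// Return the bundled binary for this host, or null if none is shipped, in
// which case the admin would have to point MediaWiki at their own runtime.
function bundledBinaryFor(platform: string, arch: string): string | null {
  return BUNDLED_BINARIES.get(`${platform}-${arch}`) ?? null;
}
```

Every entry in such a table is another binary to rebuild and commit for each security release, which is where the repository bloat would come from.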

Restricted Application triaged this task as High priority. (Feb 6 2023, 5:22 PM)
egardner changed the task status from Open to In Progress. (Feb 6 2023, 5:22 PM)
egardner set the point value for this task to 5. (Feb 7 2023, 9:47 PM)

One concern I had when I last thought about this was i18n support. Localization messages can currently be customized locally via MediaWiki:* pages, use a MediaWiki-specific plural/gender rule syntax, and may even support wikitext (if the caller so desires). Language fallbacks and conversion are also something configured and managed in MediaWiki. Outside of the most basic of logic-less templates that can afford taking in prerendered i18n messages as parameters, it seems likely to me that an SSR service would end up needing to render i18n messages dynamically - which opens up a whole new can of worms with regards to supporting the above that'd either require additional crosstalk with MediaWiki or extra service-side logic.

I'd like to try and make this discussion a little bit more concrete by proposing a real-world use-case of having access to a Node.js runtime. Different use-cases will have different requirements, and may require different approaches. This first scenario covers "build step" needs but not SSR.

Enhanced asset management in ResourceLoader

By introducing some way to access a JS runtime on the server, we could add some new capabilities to MediaWiki's ResourceLoader system. Developers could use tools that do not have a PHP equivalent: things like the Vue or TypeScript compilers, newer versions of LESS (or other CSS tools like PostCSS), Rollup, etc. To mitigate some concerns around performance and security, we could limit the use of this tool so that it's only available for pre-compilation (invoked by a maintenance script perhaps) and possibly also for local development. This feature would not require a JS runtime to be active on the server responding to user requests in production.

What benefits would this provide?
  • Front-end features in MediaWiki core or extensions could be written using TypeScript
  • Developers could create optimized bundles (using tree-shaking) for individual Resource Modules by using Rollup. This would mean that a MW extension could import a single Codex component and bundle it, rather than having to import the entire library (or a pre-defined subset of the library that is maintained upstream, which we are currently doing...)
  • Developers can pre-compile Vue files so that we don't need to ship the template compiler in production
  • We can start to move away from MediaWiki-specific ways of writing front-end code in favor of standard approaches. Write normal .vue files, write JS using import / export and ES6 syntax everywhere, etc.
  • We'd be able to rely on alternatives to PHP-based tools for LESS compilation, JS minification/validation, etc. if we wanted.
Technical Considerations
  • Individual Resource Modules could opt-in to this new feature (meaning old projects could be left alone)
  • Security considerations: we never want to run npm install; the only NPM packages that this tool should be able to access should be things that have already been checked in as foreign resources. This means that developers would have access to a limited set of features and couldn't just arbitrarily add tools for their own projects. Similarly, these libraries would exist in MW core or maybe in some new extension created for this purpose; configuration and available libraries would be determined at the global level, not per-extension or per-skin. Everyone would get the same set of tools.
  • Some libraries we might want to include for usage here:
    • Vue/Vuex/Vue-router, etc
    • Typescript
    • Rollup (or maybe esbuild?)
    • Terser for minification
    • Babel is probably going to be necessary
    • LESS, SASS, PostCSS
    • others?
  • If we limit Node.js-based asset processing to pre-compilation usage, there would need to be some way to distinguish between compile-time dependencies (handled in Node) and runtime dependencies (supplied by ResourceLoader in PHP).
  • Ideally this tool could rely on a few different types of JS runtimes, using the best available environment similar to how Ruby's ExecJS gem works
  • This tool should provide some kind of PHP maintenance script to precompile assets for all participating Resource Modules
  • If this functionality lived in its own extension, then other extensions could explicitly depend on it; if desired this extension could be included in the regular MW release tarballs
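The security rule (only pre-approved, checked-in packages) and the compile-time/runtime split could both be enforced by classifying every import of an opted-in module before bundling. This sketch assumes two hypothetical global lists: `BUNDLEABLE` stands for packages checked in as foreign resources, and `RUNTIME_PROVIDED` for packages that ResourceLoader already serves as modules, which would stay external rather than being bundled. The package names are only examples.

```typescript
type ImportKind = "local" | "bundle" | "external" | "rejected";

// Hypothetical global lists, configured once for every extension and skin.
// BUNDLEABLE: packages checked in as foreign resources, safe to bundle.
const BUNDLEABLE: ReadonlySet<string> = new Set(["@wikimedia/codex"]);
// RUNTIME_PROVIDED: packages ResourceLoader already delivers at runtime.
const RUNTIME_PROVIDED: ReadonlySet<string> = new Set(["vue"]);

// "@scope/pkg/sub/path" -> "@scope/pkg"; "pkg/sub" -> "pkg".
function packageName(specifier: string): string {
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

function classifyImport(specifier: string): ImportKind {
  if (specifier.startsWith("./") || specifier.startsWith("../")) {
    return "local"; // the module's own source files
  }
  const pkg = packageName(specifier);
  if (RUNTIME_PROVIDED.has(pkg)) {
    return "external"; // runtime dependency, loaded by ResourceLoader
  }
  if (BUNDLEABLE.has(pkg)) {
    return "bundle"; // compile-time dependency, tree-shaken into the output
  }
  return "rejected"; // not checked in: never fetched, never installed
}

// Fail the build if any import is not permitted.
function checkImports(specifiers: string[]): Map<string, ImportKind> {
  const result = new Map<string, ImportKind>();
  for (const s of specifiers) {
    const kind = classifyImport(s);
    if (kind === "rejected") {
      throw new Error(`Import "${s}" is not on the allow-list`);
    }
    result.set(s, kind);
  }
  return result;
}
```

If Rollup were used, the "external" entries would correspond to what its `external` option expects, while "bundle" entries would be resolved from the checked-in copies.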
Basic Usage Example
  • Individual Resource Modules would opt-in to this asset handling behavior and provide some kind of entry point (probably a JS file). Any white-listed dependencies could be import-ed here. No project-level configuration would be supported; instead, all modules would be processed through a globally-defined Rollup configuration. This config would include things like support for TS, Vue, LESS, etc. Maybe a MediaWiki-specific Rollup plugin could be created for this purpose.
  • If we didn't want to support a built-in development mode, an individual team could still use something like Vite in their own codebase as a devDependency, as long as they used compatible configuration options. Otherwise, just run the maintenance script before viewing your page in local development.
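The maintenance-script step could then be little more than this: for each opted-in module, hash its sources together with the global build configuration, and rebuild only when that hash changes. This also fits the property that makes shipping precomputed output possible, namely that the output only changes when the inputs do. Everything here is hypothetical: the `ModuleDef` shape, the `buildEntry` field, and the use of FNV-1a (a small non-cryptographic hash, applied here to UTF-16 code units) as a stand-in for a real content hash.

```typescript
// Hypothetical Resource Module definition; a module opts in by declaring
// a build entry point.
interface ModuleDef {
  name: string;
  buildEntry?: string;             // e.g. "resources/ext.foo/index.ts"
  sources: Record<string, string>; // file path -> file contents
}

// 32-bit FNV-1a over UTF-16 code units; a cheap stand-in for a real hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Names and new hashes of opted-in modules whose inputs changed since the
// last run. The global config is part of the key, so changing it rebuilds
// every participating module.
function modulesToBuild(
  modules: ModuleDef[],
  globalConfig: string,
  previous: Map<string, string>
): { name: string; hash: string }[] {
  const out: { name: string; hash: string }[] = [];
  for (const mod of modules) {
    if (mod.buildEntry === undefined) continue; // not opted in: left alone
    const files = Object.keys(mod.sources).sort();
    const input = [
      globalConfig,
      mod.buildEntry,
      ...files.map((f) => `${f}\n${mod.sources[f]}`),
    ].join("\0");
    const hash = fnv1a(input);
    if (previous.get(mod.name) !== hash) out.push({ name: mod.name, hash });
  }
  return out;
}
```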

> By introducing some way to access a JS runtime on the server, we could add some new capabilities to MediaWiki's ResourceLoader system. Developers could use tools that do not have a PHP equivalent: things like the Vue or TypeScript compilers, newer versions of LESS (or other CSS tools like PostCSS), Rollup, etc. To mitigate some concerns around performance and security, we could limit the use of this tool so that it's only available for pre-compilation (invoked by a maintenance script perhaps) and possibly also for local development. This feature would not require a JS runtime to be active on the server responding to user requests in production.

That part sounds good to me, and seems like it would mitigate concerns raised in T199004: RFC: Add a frontend build step to skins/extensions to our deploy process / T279108: Introduce a Front-end Build Step for MediaWiki Skins and Extensions.

> What benefits would this provide?
>   • Front-end features in MediaWiki core or extensions could be written using TypeScript

I have no opinion on this, other than that if we are going to allow core/extensions to use TypeScript, we should also make a recommendation and provide resources and training to encourage adoption, so we don't end up with small islands of TypeScript adoption that become difficult to navigate for those who aren't familiar with it.

> Technical Considerations
>   • Individual Resource Modules could opt-in to this new feature (meaning old projects could be left alone)
>   • Security considerations: we never want to run npm install; the only NPM packages that this tool should be able to access should be things that have already been checked in as foreign resources. [...] Everyone would get the same set of tools.

<3


Should we split out this part of T328699 into a separate task, as it seems more immediately actionable and less controversial?

> One concern I had when I last thought about this was i18n support. Localization messages can currently be customized locally via MediaWiki:* pages, use a MediaWiki-specific plural/gender rule syntax, and may even support wikitext (if the caller so desires). Language fallbacks and conversion are also something configured and managed in MediaWiki. Outside of the most basic of logic-less templates that can afford taking in prerendered i18n messages as parameters, it seems likely to me that an SSR service would end up needing to render i18n messages dynamically - which opens up a whole new can of worms with regards to supporting the above that'd either require additional crosstalk with MediaWiki or extra service-side logic.

We already implement most of this in client-side JavaScript. The wikitext support could use improvement, but generally we're able to substitute parameters and expand plural rules etc in JavaScript, as long as the raw message contents are packaged with the JS code (meaning the message keys used and the language need to be known in advance). At minimum, SSR could use this existing functionality. It's also important that SSRed content can be hydrated on the client, meaning that client-side JavaScript needs to be able to run the same code and arrive at the same output as the server-rendered version; so for that reason I also think we should use the same i18n message rendering code both for SSR and for client-side JS.
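To illustrate the kind of shared code meant here: a formatter that substitutes `$1`-style parameters and expands `{{PLURAL:...}}` can run unchanged in a server-side runtime and in the browser, so hydration sees identical output. This is a simplified sketch, not `mw.message`: it uses the standard `Intl.PluralRules` API to pick a plural category, approximates MediaWiki's form ordering with the canonical CLDR category order, ignores explicit-number forms, gender and wikitext, and assumes the raw message text has already been packaged with the code.

```typescript
// CLDR plural categories in canonical order. Approximation: MediaWiki's
// {{PLURAL:}} forms are assumed to follow this order, restricted to the
// categories the language actually uses.
const CLDR_ORDER: string[] = ["zero", "one", "two", "few", "many", "other"];

function pluralForm(count: number, forms: string[], lang: string): string {
  const rules = new Intl.PluralRules(lang);
  const present: string[] = rules.resolvedOptions().pluralCategories;
  const ordered = CLDR_ORDER.filter((c) => present.includes(c));
  const index = Math.max(0, ordered.indexOf(rules.select(count)));
  // If fewer forms are given than the language has categories, use the last.
  return forms[Math.min(index, forms.length - 1)] ?? "";
}

// Simplified formatter: substitutes $1, $2, ... and then expands
// {{PLURAL:<number>|form|form...}}. Unknown parameters are left as-is.
function formatMessage(raw: string, params: string[], lang: string): string {
  const substituted = raw.replace(/\$(\d+)/g, (match: string, n: string) => {
    const value = params[Number(n) - 1];
    return value === undefined ? match : value;
  });
  return substituted.replace(
    /\{\{PLURAL:\s*([^|}]+)\|([^}]*)\}\}/gi,
    (_match: string, num: string, body: string) =>
      pluralForm(Number(num.trim()), body.split("|"), lang)
  );
}
```

For example, `formatMessage("$1 {{PLURAL:$1|page|pages}} found", ["3"], "en")` yields "3 pages found" whether it runs during SSR or in the browser.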

> By introducing some way to access a JS runtime on the server, we could add some new capabilities to MediaWiki's ResourceLoader system. Developers could use tools that do not have a PHP equivalent: things like the Vue or TypeScript compilers, newer versions of LESS (or other CSS tools like PostCSS), Rollup, etc. To mitigate some concerns around performance and security, we could limit the use of this tool so that it's only available for pre-compilation (invoked by a maintenance script perhaps) and possibly also for local development. This feature would not require a JS runtime to be active on the server responding to user requests in production.

> Should we split out this part of T328699 into a separate task, as it seems more immediately actionable and less controversial?

I have tried to do this in T279108#8704751

I'm closing this task as declined, as no work in this area is currently planned and I no longer consider this to be the best approach for us.

However, I will take this opportunity to list some encouraging work that is currently happening to improve support for modern front-end development within our existing PHP-based infrastructure:

  • Code-splitting: DST and the MediaWiki platform team are working on upgrading MW's ResourceLoader to allow for loading only specific components from the Codex library (either the full Vue version or just the CSS styles) – see T335317 for details
  • Better ES6 support in gadgets and userscripts by use of the new Peast library (PHP-based JS syntax parser that understands modern ES6+) – see T75714