RFC: Add a frontend build step to skins/extensions to our deploy process
Open, Needs Triage · Public

Assigned To
None
Authored By
Jdlrobson, Jul 6 2018

Description

Problem

Currently when deploying code we create a directory stripped of all the git metadata and blindly rsync it to the application servers. There is no way for individual extensions and skins to run additional install steps that may be necessary or important as part of their deployment. The absence of a build step means skins and extensions must work around this and ensure that any code that needs deploying lives inside the repo.

Many of the tasks developers work on are monotonous and can be automated. However, when tasks are automated and there is no way to replicate those processes during deploys, the code they output must live inside the code repository at all times. This is confusing, prone to human error and demoralising for developers trying to build code, as it often leads to regular merge conflict resolution (sometimes on every single commit).

While we do not have a build step in our deploy process, we are severely limited in our choice of tooling and in our ability to move fast.

Background

Ten years ago, build steps were rare, particularly in frontend development. Frontend development involved managing a small number of assets providing modest enhancements to a page. Over time, JavaScript has become a more essential part of websites, and MediaWiki itself makes heavy use of JavaScript.

The most common use case for a build step ten years ago was to concatenate files, one of the requirements that the original ResourceLoader was built for and does very well. However, as time has passed, build steps have come to be used for module loading (see http://requirejs.org/) and, more recently, performance optimisations.

Build processes are now commonly used for all sorts of tasks, such as optimising/minifying SVG assets, handling support for older browsers, making use of foreign dependencies via a package management system, dead code elimination and minification [1].

Every frontend engineer is expected to understand and make use of build processes; in fact, it is considered a red flag if one doesn’t.

Frontend development inside MediaWiki proves far more difficult and can be a frustrating affair. It typically takes a well-experienced frontend developer three months to get up and running in our ecosystem, given the unfamiliarity of our tooling and code choices.

Since the introduction of composer in MediaWiki, our code has felt much more manageable and shareable with the outside open source world. Many build steps have been rendered unnecessary (e.g. for Wikidata), but no frontend equivalent exists.

Since we generalised our CI to run npm test, we have empowered developers with more flexibility in what and how we test.

Similarly, by adding a build job to deploys and giving developers control over the way code gets deployed, I believe we will see similar innovation.

[1] Although ResourceLoader provides minification, the JS implementations of minification tend to result in greater byte savings than our PHP implementation.

Why not having this is a problem

The lack of support for a build step in MediaWiki has led to many imperfect workarounds and clearly broken experiences across our codebase that are in dire need of fixing.

Some examples:

  • The Popups extension makes use of webpack to build and test complex JavaScript code. It has a test script that forces the developer to run a build step locally and commit the output. This leads to merge conflicts on every single patch in the repository, causing engineers to make tedious and mechanical corrections.
    • Despite this pain point, this build process was considered one of the main reasons we successfully shipped this software, as it provided us space to move confidently and quickly.
    • MobileFrontend and the Minerva skin (which power our mobile site) will soon follow the example of Popups and hit the exact same problems.
    • Challenges with Popups are documented in this phabricator task: https://phabricator.wikimedia.org/T158980
  • The Kartographer extension maintains a local copy of Leaflet, which was incorrectly edited by engineers unfamiliar with the code layout and then overwritten when the source files were rebuilt via the build script. This has recently been corrected via a grunt test script run on every commit.
  • Flow makes use of a build script to compile server-side templates. Developers modifying Handlebars templates must commit the compiled templates. It is common for these not to be committed and for invalid code to be deployed to our cluster.
  • The Wikimedia portal uses a build step. Currently, this is taken care of by a bot which deploys built assets prior to deployment. In between these runs, the deploy repo lives in an outdated state.
  • Various repos make use of SVGO to compress SVG assets. See https://phabricator.wikimedia.org/T185596. This is enforced via an npm job, but could easily be handled by a build step, avoiding the need for a human to run npm install and run it themselves.
  • In MediaWiki core, external libraries are copied and pasted into the resources/lib folder. This is done manually (https://github.com/wikimedia/mediawiki/commit/065b21b4bdd49ffca494ff905de29bfc500ed535). These files are not actively updated, but if they were, a build step could help them live elsewhere.
  • The Wikidata Query Service GUI has a build process that compiles dependencies needed to display complex graphs and maps.

From what I can see there are various other (undocumented?) build processes in the ecosystem. If you are familiar with these and how a build process might help you, feel free to update these and move them into the list above.

  • Wikibase had (has?) a complicated build process, since moved to composer, that I am unable to understand.
  • VisualEditor uses a build step.
  • OOjs UI uses a build step.
  • LinkedWiki has a build step and doesn't check in its assets, meaning it is not compatible with CI (see T198919).
  • Wikistats2 has a build process and must build and release static assets in additional commits to master (example 1, example 2).

Workarounds

Right now our workaround appears to be “rebuild it in PHP” or “force developers to run build steps locally and enforce that with Jenkins”. In the case of managing SVG assets, we created the ResourceLoaderImage module, and in the case of LESS compiling we introduced a PHP LESS compiler. In the case of compiling templates we use LightnCandy (but on the client, due to the lack of a better solution and the lack of a build step, we offload compiling unnecessarily to our users).

While this approach is fine when libraries already exist, it is not always practical. In the case of LESS, we use an outdated compiler which has already needed to be replaced once, and there is a growing risk that such projects become unmaintained, given that many of them are moving to the Node.js ecosystem.

Enforcing with Jenkins, from experience, appears to be difficult to do at scale. In the case of SVG minification, various solutions exist across different repositories; standardisation and generalisation have been tricky and have become blocked because many of the frontend engineers building these tools are inexperienced with our CI infrastructure.

All of these workarounds create unnecessary slowdown and take focus away from actually building.

Motivation

Having a build step that is language agnostic would allow us to:
  • Offload optimisations from the developer to automation
  • Simplify software development by empowering use of JavaScript tooling such as rollup, webpack, optimising scripts (such as svg minifiers, UglifyJS or babel minify), and up to date tools like the canonical LESS compiler.
  • Remove various “check” scripts in place in a variety of extensions, which would become unnecessary
  • Avoid the need to commit assets into extensions which are built from other components within the extension, making our code more readable. It is often difficult to distinguish built assets from manually written assets, and README files are only somewhat effective.
  • Make it easier for newcomers to understand the code in our projects and participate in development
  • Where templates are used in our projects, compile templates as part of a build step, rather than offloading this task to the user
  • Empower developers with tools to create more efficient and better frontend interfaces by giving them the same kind of tooling and flexibility that the PHP backend leverages

Requirements

  • Build step should allow the running of an npm command prior to deploy.
  • Build step should be discoverable and consistent in every repo:

For instance, I should be able to expect a build process (if available) to live inside npm run build, npm run deploy, or npm run postinstall.
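A minimal, purely hypothetical package.json sketch of that convention (the script names and the webpack invocation here are examples only, not a decided standard):

```
{
  "scripts": {
    "build": "webpack --mode=production",
    "postinstall": "npm run build"
  }
}
```

Whatever runs the deploy (CI, scap, vagrant) would then only need to know to call npm install followed by npm run build, regardless of what the repo does internally.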

  • Build steps need to be run locally for every single extension and skin that has registered one.
  • Build step must run:
    • On any CI job: currently the Selenium and QUnit jobs will not run any build steps and instead use whatever code is committed in the repo. While the latter can be worked around using a post-update-cmd inside composer, this is not run by the former.
    • Before a deployment, to ensure the correct code is deployed to production.
    • Before a SWAT deploy on repos that have a build process, as content may have changed and the build step would need to be re-run.
    • Before bundling a release, to ensure that 3rd parties are not required to install dependencies and run build processes.
  • Vagrant instances would need to be aware of any install steps as part of vagrant git-update.
  • Any errors within a build step should be made visible to whatever runs it. Build steps are expected to use error codes.

Note: For sysadmins supporting 3rd party wikis using git, given that users can be expected to run php maintenance/update.php as part of many installs, it seems reasonable to expect they can also run npm install && npm run build.

FAQ

I’m a sysadmin not a coder. I don’t want a build step as this makes development harder.
If you are using MediaWiki’s tar releases, you will not notice anything different. When releases are bundled, we’d do the build step for you.

Won’t https://phabricator.wikimedia.org/T133462 fix this?
It will be extremely useful and provide a more standard way to load files, but it will not go any further than that. It will give developers greater freedom in how they organise code, but it will not enable things like transpiling TypeScript or transpiling modern JavaScript, which require the greater freedom of a build step.

Can’t we just recreate the tooling you are using in PHP?
For some of it, yes. In fact, we added some extremely powerful and useful PHP tooling for SVG handling inside MediaWiki ResourceLoader. This is, however, costly, as it requires a frontend developer to convince a backend developer to implement said tooling, and there are usually options readily available and better maintained in the npm ecosystem, e.g. https://www.npmjs.com/package/svg-transform-loader. Building this out took time.

While possible, something like transpiling ES6 JavaScript to ES5 would be a little crazy to implement in PHP but is available “off the shelf” in the npm ecosystem.

Introducing a build step introduces more that can go wrong in our pipeline. What if NPM is down?

Sure, there will definitely be things to work out here and they will be taken care of on a case by case basis. We should be reassured that npm is more resilient these days, but this is not a problem unique to our ecosystem. A build step could be written to fetch a recent stable release of code from another repository if necessary.

I don't want to introduce foreign code from 3rd parties via npm
This is happening already in skins/extensions. Having a central build step won't change this.

What if build scripts pull down malicious packages from npm?
We'll review https://eslint.org/blog/2018/07/postmortem-for-malicious-package-publishes to begin with (particularly with regard to package-lock, T179229). We will need some way to manage this while still getting the benefit of automation; for instance, we could limit outgoing HTTP requests and run our own private mirror of the package managers we need.

I don't want to run npm jobs on every git checkout!
It's unlikely you will need to. If vagrant is being used and we had a standard approach, vagrant would be automated to take care of this for you. Built assets, although maybe outdated, are likely to be usable unless incompatible changes have been made in PHP. If you are working on JavaScript code that uses a build step, then you would be using that same build step for development.
One possible compromise to mitigate the pain here could be limiting build steps to skins and extensions (i.e. not core), so that people who want to avoid a build step altogether are not forced to use one.

Jdlrobson added subscribers: Mooeypoo, Krinkle, Catrope and 3 others.

I'm adding these tags to get more eyeballs and hear concerns. Note, I'll be traveling for some of July (like many others) so it may be best to delay any conversations until August. Looking forward to some async/sync conversation!

tstarling moved this task from Inbox to Backlog on the TechCom-RFC board.Jul 18 2018, 8:46 PM
Milimetric added a subscriber: Milimetric. (Edited) Jul 18 2018, 9:11 PM

Here's a quick summary of what we do with webpack for Wikistats 2. I realize it may not be relevant, but maybe some of our experience is applicable or at least useful:

  • we develop on master, and have webpack set up to watch and build into /dist-dev as we code (and test with karma), we don't commit /dist-dev
  • we build into /dist for production, and commit that to the repo
  • webpack sticks the git sha1 into the names of the built files, like main-aef9966.js or main-aef9966.css, so that client-side caching isn't a problem (see the config sketch after this list)
  • we have jenkins build and run our karma tests
  • we git merge -X theirs --squash master from master to the release branch
  • we have puppet ensure => latest from the release branch to the production server
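To make the file-name fingerprinting concrete, here is a minimal webpack output sketch; the paths are placeholders, and it uses webpack's built-in [chunkhash] placeholder, whereas our actual config injects the git sha1 into the names.

```
// webpack.config.js (sketch): emit fingerprinted bundles into dist/
// so that each release gets unique file names and busts client-side caches.
const path = require('path');

module.exports = {
    entry: './src/main.js',
    output: {
        path: path.resolve(__dirname, 'dist'),
        // webpack replaces [chunkhash] with a hash of the chunk contents,
        // producing names like main-aef9966.js.
        filename: 'main-[chunkhash].js'
    }
};
```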

Our docs are here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats_2#Releasing_a_new_version_to_production

Anomie added a subscriber: Anomie. Jul 23 2018, 3:16 PM

As a developer, I find myself a bit skeptical of this proposal.

I find I seldom have to run update.php. Schema changes are a big deal, so we don't have many of them.

I also find I seldom have to run composer update. Our policy of version-pinning and the need to update mediawiki/vendor in tandem seems to keep down the number of breaking changes in there too.

But this proposal seems like it'll need to be run after every git checkout. Maybe such repos should use git hooks of some sort to enforce that, with the side effect of urging developers of such extensions to keep the build step fast.

I also can't say I've had much good experience with npm stuff in general, from crazy-deep dependency trees where it takes forever just to check for updates, to oojs-ui (at one point anyway) seeming like it was downloading an entire headless build of chromium for who-knows what purpose in order to run its build, to the fact that https://bugs.debian.org/794890 was open for a long time and only now seems to be being fixed.

I’m a sysadmin not a coder. I don’t want a build step as this makes development harder.
If you are using MediaWiki’s tar releases, you will not notice anything different. When releases are bundled, we’d do the build step for you.

That seems a bit shortsighted, assuming that everyone who's a coder doesn't mind installing build tools for every extension that needs testing and that everyone who's not a coder is happy to use release tarballs.


I think it would be easier to discuss this if we dropped the language agnostic part of this and just dealt with a frontend build step, since that's really the main motivation behind this (AIUI) and where most of the problems lie. Technically with composer there are post-install hooks (I think npm might have these too?) so we already run stuff like ComposerVendorHtaccessCreator.php as you run "composer install". But anyways, I don't see any concrete usecase for making this language agnostic - let's just discuss frontend.

My most recent deep experience with npm was as part of T107561: MediaWiki support for Composer equivalent for JavaScript packages, and I was left extremely disappointed. Out of the 5-6 libraries that we use, one of them had uploaded something different to npm than what was in git, another was missing git tags, and then another was not distributing any licensing information at all. I think including more npm libraries is going to be more work to make sure we can check off the necessary items before bringing in new external libraries than it is for composer libraries.

So the main concerns I can think of right now:

  • Auditability: how are we going to audit the code we deploy? In PHP we do this by reviewing patches to mediawiki/vendor (an imperfect solution to be clear).
  • Availability of tooling: npm is not available in Debian stable - one of the operating systems we give first class support to. How are we going to deal with that?
  • Extension support: We had to develop the composer-merge-plugin to get extension dependencies to work. How are we going to handle this for npm libraries?
  • Patching: What happens if we need to patch something (potentially a security patch) immediately, without having the upstream maintainer do a new release?

@Legoktm Good points. Also keep in mind that an important factor here is transformation of original code as generated using npm packages, not the inclusion of npm packages themselves. E.g. transpilation with Babel and bundling with Webpack.

That is the main reason why I haven't suggested a more straight-forward approach using mediawiki-vendor (or something like it). The transformation of original code is a more difficult problem to solve and would also need to consider a number of principles that we currently uphold with Git but would lose if not carefully preserved:

  • Deterministic deployment. Be able to have the same code run locally, in CI, beta, prod, and even after reverting/re-deploying in prod.
  • Fast deployment. – Processing happens ahead of time, or at run-time, not during deployment.
  • Secure deployment. – All run-time code can be inspected in Git and uniquely identified by a commit hash.

I suspect the only feasible way forward that upholds those values is to make this dependent on the on-going cross-departmental project to deploy MediaWiki using immutable container images that contain all relevant source code within (e.g. using Docker containers, or similar). Those images would need to be securely built from a continuous integration/delivery pipeline, inspectable, testable, and stored somewhere. See T170453 (RelEng/SRE goal) and its many sub tasks.

It might be possible to achieve without CI/CD, but that would require a significant amount of dedicated resources from many teams that I don't think we can spare any time soon – given the number of projects we've already committed to that are prerequisites to other future projects.

I talked to @Samwilson last week and he is also running into similar problems in that they are committing built assets into a repo. However since that code is hosted on github there are a few Travis CI solutions there which don't exist for code deployed via train.

The Wikimedia portal uses a build step. Currently, this is taken care of by a bot which deploys built assets prior to deployment. In between these runs, the deploy repo lives in an outdated state.

@Jdrewniak could you provide some notes (via edits to the description) on how the portal repo is setup? I think this would be useful.

I think it would be easier to discuss this if we dropped the language agnostic part of this and just dealt with a frontend build step, since that's really the main motivation behind this (AIUI) and where most of the problems lie. Technically with composer there are post-install hooks (I think npm might have these too?) so we already run stuff like ComposerVendorHtaccessCreator.php as you run "composer install". But anyways, I don't see any concrete usecase for making this language agnostic - let's just discuss frontend.

I'm fine with limiting the discussion to frontend.

My most recent deep experience with npm was as part of T107561: MediaWiki support for Composer equivalent for JavaScript packages, and I was left extremely disappointed. Out of the 5-6 libraries that we use, one of them had uploaded something different to npm than what was in git, another was missing git tags, and then another was not distributing any licensing information at all. I think including more npm libraries is going to be more work to make sure we can check off the necessary items before bringing in new external libraries than it is for composer libraries.

I'm not sure I follow here. If we were using npm, we'd have more confidence in the code we're pushing and it would be clearer what version number we are on.
Right now we have to rely on the header of ~/git/core/resources/lib/jquery/jquery.js to know the version, but I have no confidence that is right as there's a chance of human error. If package.json stated "jquery": "3.3.1", that goes away.

That said, while this would be empowered by a build step, it's not necessary for this RFC. We could run our own npm proxy if necessary, or commit the node_modules folder if we really wanted to. A build step might be a Node.js script that simply compresses SVGs, for instance, which would be a good step in the right direction. A script like https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/master/dev-scripts/svg_check.sh for instance could be turned into an SVG minifier that runs on build/deploy rather than an SVG checker that runs on every CI job.
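As a rough illustration only (the script name, the target directory and the choice of the SVGO package are mine, and SVGO's API differs between major versions), such a build script might look something like:

```
// Hypothetical dev-scripts/minify-svgs.js, run via "npm run build".
// Assumes SVGO 2+, which exposes a synchronous optimize() function.
const fs = require('fs');
const path = require('path');
const { optimize } = require('svgo');

// Walk a directory tree and overwrite every SVG with its minified form.
function minifyDir(dir) {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
        const full = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            minifyDir(full);
        } else if (entry.name.endsWith('.svg')) {
            const original = fs.readFileSync(full, 'utf8');
            const result = optimize(original, { path: full, multipass: true });
            fs.writeFileSync(full, result.data);
        }
    }
}

minifyDir('resources');
```

The existing svg_check.sh would then no longer need to fail CI; the minification would simply happen as part of the build.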

So the main concerns I can think of right now:

  • Auditability: how are we going to audit the code we deploy? In PHP we do this by reviewing patches to mediawiki/vendor (an imperfect solution to be clear).

With node 8 we can upload package-lock.json and make use of npm audit. It's my understanding we have no protection right now, so this IMO is a huge improvement.

  • Availability of tooling: npm is not available in Debian stable - one of the operating systems we give first class support to. How are we going to deal with that?

I don't know enough about this problem to comment, but surely other people are encountering this issue and we're not the first.

  • Extension support: We had to develop the composer-merge-plugin to get extension dependencies to work. How are we going to handle this for npm libraries?
  • Patching: What happens if we need to patch something (potentially a security patch) immediately, without having the upstream maintainer do a new release?

We fork the library and patch it. Currently I have no idea what we do for external libraries inside the core/resources/lib - do we add patches on top of that or edit the code itself? I doubt we have any protection there, so this doesn't seem like a new problem to me.

How can we move this conversation forward? Can I request that TechCom schedules a discussion on this RFC with at least the following attendees: @Legoktm @Jdrewniak @Krinkle @Jhernandez @Milimetric @Niedzielski @Samwilson ?

Joe added a subscriber: Joe. Thu, Jan 10, 11:51 AM

Please note that the statement

Currently when deploying code we sync a git repo.

is actually incorrect. We create a directory stripped of all the git metadata and we blindly rsync it to the application servers.

The git repositories are synced to the deployment hosts, but we don't use git in any way for our deployment process (and I would say we should).

Joe added a comment. Thu, Jan 10, 12:34 PM

I suspect the only feasible way forward that upholds those values is to make this dependent on the on-going cross-departmental project to deploy MediaWiki using immutable container images that contain all relevant source code within (e.g. using Docker containers, or similar). Those images would need to be securely built from a continuous integration/delivery pipeline, inspectable, testable, and stored somewhere. See T170453 (RelEng/SRE goal) and its many sub tasks.

I don't think there is a direct dependency between the two; I think the efforts might go along in parallel if we want to add a build step to deployments. You can build the artifacts in a container and export them, for instance, and that's well supported by our CI infrastructure AIUI. The problem I see is that implementing a pipeline that upholds the values you cited will require a lot of effort anyways.

Some considerations:

  1. I agree with @Legoktm - it's easier to get to a consensus if we narrow the scope of the proposal to the frontend javascript code
  2. A model similar to what @Milimetric was mentioning might work, but I am missing too many details to provide feedback. I still think the RfC should be a little less light on the technical details of the proposal once we've agreed on what we want to achieve.
  3. Reproducibility and auditability are to be guaranteed by a CI pipeline that generates binary artifacts, that will then be fetched (via git) either for deployments via scap or inside containers (in the future). This is a tall goal to achieve with the npm ecosystem, and will probably involve multiple teams.
  4. It might make sense to have a single build pipeline, or to have a separate one per extension and just standardize how we build the artifacts. I would tend to think that the former approach is more manageable.
  5. Managing reverts could be an issue that needs to be better understood
  6. The binary artifacts build should only happen when needed, or we'd need to change the way we deploy code before we can do it (think of pooling/depooling of databases, or SWAT, when multiple small patches get deployed sequentially and sometimes fast-reverted)

It might be possible to achieve without CI/CD, but that would require a significant amount of dedicated resources from many teams that I don't think we can spare any time soon – given the number of projects we've already committed to that are prerequisites to other future projects.

What I listed above can only happen if a proper CI pipeline is set up, and I suspect this would require a non-negligible amount of time from a few teams, most notably Release Engineering and in general the infrastructure teams (SRE, Core Platform, Performance). So resourcing seems to be a real concern before this proposal can be considered feasible.

Jdlrobson renamed this task from "RFC: Add a language agnostic build step to skins/extensions to our deploy process" to "RFC: Add a frontend build step to skins/extensions to our deploy process". Thu, Jan 10, 4:09 PM

I just want to reiterate our simpler way of handling build steps in Wikistats 2. In light of the problems described here, I think it deserves a second look. It's pretty standard on other nodejs projects I've looked at. We basically factor out most of our common build config and use it in two kinds of builds:

  1. npm run dev: builds source files and any dependencies. Sets up webpack-devserver to watch for file changes and rebuild. Outputs to a directory that's ignored by git and used during local development.
  2. npm run build: builds source files and any dependencies, including production-oriented transformations like minifying. Outputs to a dist directory that runs in production, and is committed to git. When we do this, we:
    • build, modifying only the dist directory
    • test the new build locally
    • push the new dist to gerrit, by itself with no other source modifications, where CI tests it
    • merge and deploy

With the dev build, we develop in peace without worrying about the build system. When we want to release we go through the process above. The release commits are separate from any other commits and so can be reverted easily. Of course this wasn't meant for the mediawiki ecosystem, but I've sat here thinking about most of the problems that you all raise above and I think solutions exist for each of them. The trickiest bit in my mind is de-duplicating dependencies across multiple extensions and core, but I've got a few ideas and both npm and webpack have dealt with this problem. The only other thing that stood out was npm not existing on Debian 9, which I am at a loss about. Like... why...
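One example of what webpack offers for that de-duplication problem is externals: an extension bundle can declare shared libraries as externals, so they resolve to whatever core (or ResourceLoader) already ships instead of being bundled again. A rough sketch only, with module names and global mappings that are illustrative rather than agreed on:

```
// webpack.config.js (sketch): don't bundle libraries that the page already
// provides; map the imported module names to the existing globals instead.
module.exports = {
    entry: './src/index.js',
    output: {
        filename: 'index.js',
        path: __dirname + '/dist'
    },
    externals: {
        // "import $ from 'jquery'" resolves to the window.jQuery that
        // MediaWiki core already loads, so jQuery isn't duplicated per extension.
        jquery: 'jQuery',
        // "import OO from 'oojs'" resolves to the existing window.OO global.
        oojs: 'OO'
    }
};
```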

With the dev build, we develop in peace without worrying about the build system. When we want to release we go through the process above. The release commits are separate from any other commits and so can be reverted easily. Of course this wasn't meant for the mediawiki ecosystem, but I've sat here thinking about most of the problems that you all raise above and I think solutions exist for each of them.

The problem with this workflow is that then your acceptance/browser tests are running against the last deployed frontend assets. Which is fine, but then once you build you may get browser tests that fail and it is hard to pinpoint which commit broke them.

Another issue is when server code needs the updated frontend code/assets. You risk not running the dist scripts etc. at the right time and deploying broken code in subtle ways.

To avoid these, you would make your integration/acceptance jobs run the dist job locally before running the tests, and at that point we are talking again about having a standard way to run the dist job on CI jobs. Not on deploy, which maybe makes things easier RFC wise, but sort of the same concept.

The trickiest bit in my mind is de-duplicating dependencies across multiple extensions and core, but I've got a few ideas and both npm and webpack have dealt with this problem. The only other thing that stood out was npm not existing on Debian 9, which I am at a loss about. Like... why...

Tricky indeed, but we have the same problem right now without any solution. We can move forward with build systems using the same solution we use now: manual inspection and care from developers. And maybe in the future we can streamline this some other way. Maybe a linter that checks for shared dependencies across extensions, or something low cost that would help surface duplicate dependencies. Maybe even streamlining frontend build-steps into mediawiki core itself!

The problem with this workflow is that then your acceptance/browser tests are running against the last deployed frontend assets. Which is fine, but then once you build you may get browser tests that fail and it is hard to pinpoint which commit broke them.

You can make both your local tests and CI tests run against the dist-dev version of your build, the one you use locally for development. And you could have a separate CI job testing the dist version for when that's updated on release commits. The CI job running against dist-dev would just run the local build, or even a third build that's very similar but meant for headless testing (so, for example, no webpack-devserver).

Another issue is when server code needs the updated frontend code/assets. You risk not running the dist scripts etc. at the right time and deploying broken code in subtle ways.

You never deploy intermediate commits, only release commits. This can be accomplished by segregating release commits onto a separate branch if needed. So if you are mixing server and frontend code in the same repository, you build both. At the very least you should test both as I was brainstorming above.

To avoid these, you would make your integration/acceptance jobs run the dist job locally before running the tests, and at that point we are talking again about having a standard way to run the dist job on CI jobs. Not on deploy, which maybe makes things easier RFC wise, but sort of the same concept.

I agree, but don't see problems with doing any of the above as part of CI jobs.

Tricky indeed, but we have the same problem right now without any solution. We can move forward with build systems using the same solution we use now: manual inspection and care from developers. And maybe in the future we can streamline this some other way. Maybe a linter that checks for shared dependencies across extensions, or something low cost that would help surface duplicate dependencies. Maybe even streamlining frontend build-steps into mediawiki core itself!

Hm, I thought ResourceLoader helped a bit with it. But if we have a similar issue, then yes, definitely, let's standardize on webpack or similar. I was just googling around and found two different solutions that are much better than manual inspection.