= Problem
Currently, when deploying code, we create a directory stripped of all git metadata and blindly rsync it to the application servers. Individual extensions and skins have no way to run additional install steps that may be necessary as part of their deployment. The absence of a build step means skins and extensions must work around this and ensure that any code that needs deploying is inside the repo.
Many tasks developers work on are monotonous and can be automated. However, when tasks are automated and there is no way to replicate these processes during deploys, the code output by these processes must live inside the code repository at all times. This is confusing, prone to human error, and demoralising for developers trying to build code, as it often leads to regular merge conflict resolution (sometimes on every single commit).
As long as we do not have a build step in our deploy process, we are severely limited in our choice of tooling and our ability to move fast.
= Background
Ten years ago, build steps were rare, particularly in frontend development. Frontend development involved managing a small number of assets providing modest enhancements to a page. Over time, JavaScript has become a more essential part of websites, and MediaWiki itself makes much use of JavaScript.
Ten years ago, the most common use case for a build step was to concatenate files, one of the requirements that the original ResourceLoader [[ https://www.mediawiki.org/wiki/ResourceLoader/Requirements/Tim_Starling | was built for ]] and handles very well. However, as time has passed, build steps have been used for module loading (see http://requirejs.org/) and more recently for performance optimisations.
Build processes are now commonly used for all sorts of tasks, such as optimising/minifying SVG assets; handling support for older browsers; making use of foreign dependencies via a package management system; dead code elimination; and minification [1].
Every frontend engineer is expected to understand and make use of build processes and in fact it is considered a red flag if one doesn’t.
By comparison, frontend development inside MediaWiki proves far more difficult and can be a frustrating affair. The typical learning curve is 3 months for an experienced frontend developer to get up and running in our ecosystem, given the unfamiliarity of our tooling and code choices.
Since the introduction of composer in MediaWiki, our code has felt much more manageable and shareable with the outside open source world. Many build steps have been rendered unnecessary, e.g. [[ https://phabricator.wikimedia.org/T173818 | Wikidata ]], but no frontend equivalent exists.
Since we generalised our CI to run `npm test`, we have given developers more flexibility in what and how we test.
Similarly, by adding a job to deploys and giving developers the power to control the way code gets deployed, I believe we will see similar innovation.
[1] Although ResourceLoader provides minification, the JS implementations of minification tend to result in greater byte savings than our PHP implementation.
= Why not having this is a problem
The lack of support for a build step in MediaWiki has led to many imperfect workarounds and clearly broken experiences across our codebase that are in dire need of fixing.
Some examples:
* The Popups extension makes use of webpack to build and test complex JavaScript code. It has a test script that forces the developer to run a build step locally and commit the output. This leads to merge conflicts on every single patch in the repository, forcing engineers to make tedious and mechanical corrections.
** Despite this pain point, this build process was considered one of the main reasons we successfully shipped this software, as it gave us space to move confidently and quickly.
** MobileFrontend and the Minerva skin (which power our mobile site) will soon follow the example of Popups and hit the exact same problems.
** Challenges with Popups are documented in this phabricator task: https://phabricator.wikimedia.org/T158980
* The Kartographer extension maintains a local copy of Leaflet, which was incorrectly edited by engineers unfamiliar with the code layout. Those edits were then overwritten when source files were built via the [[ https://github.com/wikimedia/mediawiki-extensions-Kartographer/blob/master/bin/build.sh | build script ]]. This has recently been corrected via a [[ https://github.com/wikimedia/mediawiki-extensions-Kartographer/blob/master/Gruntfile.js#L92 | grunt test script ]] run on every commit.
* Flow makes use of a [[ https://github.com/wikimedia/mediawiki-extensions-Flow/blob/master/Makefile#L96 | build script to build server side templates ]]. Developers modifying Handlebars templates must commit [[ https://github.com/wikimedia/mediawiki-extensions-Flow/tree/master/handlebars/compiled | compiled templates ]]. It is common for these not to be committed and for invalid code to be deployed to our cluster.
* The Wikimedia portal uses a build step. Currently, this is taken care of by a bot which deploys built assets prior to deployment. In between these runs, the deploy repo lives in an outdated state.
* Various repos make use of SVGO to compress SVG assets. See https://phabricator.wikimedia.org/T185596. This is enforced via an npm job, but could easily be handled by a build step, avoiding the need for a human to run `npm install` and run it themselves.
* In MediaWiki core, external libraries are copied and pasted into the [[ https://github.com/wikimedia/mediawiki/tree/master/resources/lib | resources/lib ]] folder. This is done manually (see https://github.com/wikimedia/mediawiki/commit/065b21b4bdd49ffca494ff905de29bfc500ed535). These files do not actively get updated, but if they did, a build step could help them live elsewhere.
* The Wikidata Query Service GUI has a build process that [[ https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/gui/+/refs/heads/master/Gruntfile.js | compiles dependencies ]] needed to display complex graphs and maps.
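The "commit the built output" checks described for Popups and Kartographer above boil down to comparing what a fresh build produces with what is checked in. The following is only a rough sketch of that idea, not the code either extension actually uses; `findStaleAssets` and the file names are hypothetical, and file contents are passed in as plain strings to keep the example self-contained.

```javascript
// Sketch only: findStaleAssets is a hypothetical helper, not taken from any extension.
// `committed` and `built` map relative file paths to file contents (strings).
function findStaleAssets(committed, built) {
  const stale = [];
  for (const [file, content] of Object.entries(built)) {
    const isCommitted = Object.prototype.hasOwnProperty.call(committed, file);
    // A built file is stale if it was never committed or its content differs.
    if (!isCommitted || committed[file] !== content) {
      stale.push(file);
    }
  }
  return stale.sort();
}

// Example: one built file is out of date and one was never committed.
const staleFiles = findStaleAssets(
  { 'dist/index.js': 'v1', 'dist/index.css': 'a' },
  { 'dist/index.js': 'v2', 'dist/index.css': 'a', 'dist/index.js.map': 'm' }
);
console.log(staleFiles);
```

A CI check of this kind fails the patch whenever the list is non-empty, which is what pushes every contributor to rebuild and recommit. With a build step at deploy time, the check and the committed output would both become unnecessary.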
From what I can see there are various other (undocumented?) build processes in the ecosystem. If you are familiar with these and how a build process might help you, feel free to update these and move them into the list above.
* Wikibase has (or had) a complicated build process, since moved to composer, that I am unable to understand.
* VisualEditor uses a build step.
* OOjs UI uses a build step.
* LinkedWiki has a build step and doesn't check in its assets, meaning it is not compatible with CI (see T198919).
* Wikistats2 has a build process and must build and release static assets in additional commits to master [[ https://gerrit.wikimedia.org/r/442334 | example 1 ]] [[ https://gerrit.wikimedia.org/r/445217 | example 2 ]].
= Workarounds
Right now our workarounds appear to be “rebuild in PHP” or “force developers to run build steps locally and enforce with Jenkins”. In the case of managing SVG assets, we created [[ https://www.mediawiki.org/wiki/ResourceLoader/Images | the ResourceLoaderImage module ]], and in the case of LESS compiling we introduced a [[ https://www.mediawiki.org/wiki/Requests_for_comment/LESS | PHP LESS compiler ]]. In the case of compiling templates we [[ https://github.com/zordius/lightncandy | use LightnCandy ]] on the server, but on the client, for lack of a better solution and of a build step, we unnecessarily offload compiling to our users.
While this approach is fine when libraries already exist, it is not always practical. In the case of the LESS compiler, we use an outdated LESS compiler [[ https://www.mediawiki.org/wiki/Requests_for_comment/Change_LESS_compilation_library | which has already needed to be replaced ]], and there is a growing risk that these projects become unmaintained, given that many of them are moving to the Node.js ecosystem.
From experience, enforcing with Jenkins appears to be difficult to do at scale. In the case of SVG minification, various solutions exist across different repositories, and standardisation and generalisation have been tricky and have [[ https://phabricator.wikimedia.org/T179361 | become blocked ]] because many of the frontend engineers building these tools are inexperienced with our CI infrastructure.
All workarounds create unnecessary slowdown and take focus away from actually building.
= Motivation
Having a build step that is language agnostic would allow us to:
* Offload optimisations from the developer to automation
* Simplify software development by empowering use of JavaScript tooling such as rollup, webpack, optimising scripts (such as svg minifiers, UglifyJS or babel minify), and up to date tools like the canonical LESS compiler.
* Remove various “check” scripts in place in a variety of extensions which would be unnecessary
* Avoid the need for committing assets into extensions which are built from other components within the extension making our code more readable. It is often difficult to distinguish built assets from manually written assets and README files are only somewhat effective.
* Make it easier for newcomers to understand the code in our projects and participate in development
* Where templates are used in our projects, compile templates as part of a build step, rather than offloading this task to the user
* Empower developers with tools to create more efficient and better frontend interfaces by giving them the same kind of tooling and flexibility that the PHP backend leverages
= Requirements
[] Build step should allow the running of an npm command prior to deploy.
[] Build step should be discoverable and consistent in every repo:
For instance, I should be able to expect a build process (if available) to live inside one of `npm run build`, `npm run deploy` or `npm run postinstall`.
[] Build steps need to be run locally for every single extension and skin that has registered one.
[] Build step must run in the following situations:
* In any CI job. Currently the Selenium and QUnit jobs do not run any build steps and fall back to whatever code is checked into the repo. While the latter can be worked around using a `post-update-cmd` inside composer, this is not run by the former.
* Before a deployment, to ensure the correct code is deployed to production.
* Before a SWAT deploy on repos that have a build process, as content may have changed.
* Before bundling a release, to ensure that 3rd parties are not required to install dependencies and run build processes.
* As part of `vagrant git-update`, so that Vagrant instances are aware of any install steps.
[] Any errors within a build step should be made visible to whatever runs it. Build steps are expected to signal failure via error codes (a non-zero exit code).
Note: For sysadmins supporting 3rd party wikis using git, given users can be expected to run `php maintenance/update.php` as part of many installs, it seems reasonable to expect they can also run `npm install && npm run build`.
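To make the requirements above more concrete, here is a minimal sketch of how a deploy or CI job might discover and run registered build steps. Everything in it is an assumption for illustration: the function names, the lookup order (`build`, then `deploy`, then `postinstall`) and the stop-at-first-failure behaviour are not an agreed design.

```javascript
// Sketch only: all names and the lookup order are assumptions, not an agreed convention.
const fs = require('fs');
const path = require('path');
const { spawnSync } = require('child_process');

// Candidate script names, taken from the examples in the requirements above.
const CANDIDATE_SCRIPTS = ['build', 'deploy', 'postinstall'];

// Return the first candidate script declared in a parsed package.json, or null.
function findBuildScript(pkg) {
  const scripts = (pkg && pkg.scripts) || {};
  return CANDIDATE_SCRIPTS.find((name) => typeof scripts[name] === 'string') || null;
}

// Read and parse <dir>/package.json; return null if the directory has none.
function readPackageJson(dir) {
  const file = path.join(dir, 'package.json');
  return fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : null;
}

// Default runner: execute `npm run <script>` in dir and return its exit code.
function npmRun(dir, script) {
  const result = spawnSync('npm', ['run', script], { cwd: dir, stdio: 'inherit' });
  // status is null if npm could not be started or was killed by a signal.
  return result.status === null ? 1 : result.status;
}

// Run the build step of every directory that registers one.
// Stops at the first non-zero exit code so the caller can surface the error.
function runBuildSteps(dirs, readPkg = readPackageJson, run = npmRun) {
  for (const dir of dirs) {
    const script = findBuildScript(readPkg(dir));
    if (script === null) {
      continue; // no build step registered for this extension or skin
    }
    const code = run(dir, script);
    if (code !== 0) {
      return { ok: false, dir, script, code };
    }
  }
  return { ok: true };
}
```

A job could call `runBuildSteps()` with the list of extension and skin directories and exit with the returned `code` when `ok` is false, which keeps failures visible to whatever invoked it. The `readPkg` and `run` parameters exist only so the logic can be exercised without touching the file system or npm.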
= FAQ
**I’m a sysadmin not a coder. I don’t want a build step as this makes development harder.**
If you are using MediaWiki’s tar releases, you will not notice anything different. When releases are bundled, we’d do the build step for you.
**Won’t https://phabricator.wikimedia.org/T133462 fix this?**
This will be extremely useful and provide a more standard way to load files, but it will not allow anything further than that. It will provide greater freedom in how users build code, but it will not allow things like transpiling TypeScript or modern JavaScript, which the greater freedom of a build step would enable.
**Can’t we just recreate the tooling you are using in PHP?**
For some of it, yes. In fact, we added some extremely powerful and useful PHP tooling for SVG handling inside MediaWiki's ResourceLoader, but building this out took time. This approach is costly, as it requires a frontend developer to convince a backend developer to implement said tooling, and there are usually better-maintained options readily available in the npm ecosystem, e.g. https://www.npmjs.com/package/svg-transform-loader.
While possible, something like transpiling ES6 JavaScript to ES5 would be a little crazy to implement in PHP but is available “off the shelf” in the npm ecosystem.
**Introducing a build step introduces more that can go wrong in our pipeline. What if NPM is down?**
Sure, there will definitely be things to work out here and they will be taken care of on a case by case basis. We should be reassured that npm is more resilient these days, but this is not a problem unique to our ecosystem. A build step could be written to fetch a recent stable release of code from another repository if necessary.
**I don't want to introduce foreign code from 3rd parties via npm**
This is happening already in skins/extensions. Having a central build step won't change this.
**What if build scripts pull down malicious packages from npm?**
To begin with, we'll review https://eslint.org/blog/2018/07/postmortem-for-malicious-package-publishes, particularly with regard to package-lock (T179229). We will need some way to manage this while still getting the benefit of automation. For instance, we could limit outgoing HTTP requests and run our own private mirror of the package managers we need.
**I don't want to run npm jobs on every git checkout!**
It's unlikely you will need to. If vagrant is being used and we had a standard approach, vagrant would be automated to take care of this for you. Built assets, although possibly outdated, are likely to remain usable unless incompatible changes have been made in PHP. If you are working on JavaScript code that uses a build step, then you would be using that same build step for development.
One possible compromise to mitigate the pain here could be limiting build steps to skins and extensions (i.e. not core), so that people who want to avoid a build step altogether are not forced to run one.