The following intertwined discussions were descoped from this ticket:
# {T257072} - to systematically audit every version of every dependency (end result: [[ http://libraryupgrader2.wmflabs.org/library | libraryupgrader2 ]])
# {T257061} - the repository storing the audited package versions
* Related: {T107561}
# {T257068} - The package manager distributing packages from the repository to CI and developer nodes
= Problem
Currently, when deploying MediaWiki code at WMF, Scap creates a single directory with the contents of MW core and all installed skins/extensions, stripped of all the git metadata, and we blindly rsync it to the application servers.
There is currently no process anywhere between approving a commit in Gerrit and it being live on WMF application servers that would allow an extension or skin to build derivative files or inject additional files from third-party sources (e.g. npm packages).
The absence of such a build step means skins and extensions must work around this by ensuring all code needed in production is committed to the Git repository.
Many tasks developers perform are monotonous and can be automated. However, these automation tasks currently cannot happen as part of the deployment process. Instead, they are run locally or by bots, with their output always being committed in the same repository. This can be confusing to contributors, prone to human error and demoralising for developers, as it regularly leads to merge conflicts (sometimes on every single commit).
As long as we do not have a build step in our deploy process, we are:
* discouraged from making use of existing tooling without modification that could lead to potential optimisation in our code but would change the resulting code on every commit
** e.g. better minification, dead code elimination, Node.js templating libraries, transpilers, SVG minification (see workarounds)
* exposing ourselves to avoidable errors at all times (see examples below)
* unable to move fast
** forced to commit assets to master and experience merge conflicts with every patchset
** forced to run scripts manually that are slow and could be delayed till deploy-time (e.g. minification of image assets)
* wasting time reinventing the wheel in PHP/build processes (see ResourceLoaderImageModule and Discovery, Gruntfile tasks, deploy-bot for examples - see workarounds)
In summary, these build and deploy steps are already occurring in a de facto, informal manner on individual developers' local machines. This is not ideal from the security or developer performance perspectives. This RFC is about formalizing these methods and automating the process where possible.
= Background
Ten years ago, build steps were rare, in particular in frontend development. Frontend development involved managing a small number of assets providing modest enhancements to a page. Over time, JavaScript has become a more essential part of websites, and MediaWiki itself makes much use of JavaScript.
The most common use case for a build step ten years ago was to concatenate files, one of the requirements that the original ResourceLoader [[ https://www.mediawiki.org/wiki/ResourceLoader/Requirements/Tim_Starling | was built for ]] and handles very well. However, as time has passed, build steps have been used for module loading (see http://requirejs.org/) and more recently performance optimisations.
Build processes are now commonly used to do all sorts of tasks, such as optimising/minifying SVG assets; handling support for older browsers; making use of foreign dependencies via a package management system; dead code elimination and minification [1].
Every frontend engineer is expected to understand and make use of build processes and in fact it is considered a red flag if one doesn’t.
Frontend development inside MediaWiki proves far more difficult and can be a frustrating affair. The typical learning curve is 3 months for a well-experienced frontend developer to get up and running in our ecosystem given the unfamiliarity of our tooling and code choices.
Since the introduction of composer in MediaWiki, our code has felt much more manageable and shareable with the outside open source world. Many build steps have been rendered unnecessary, e.g. [[ https://phabricator.wikimedia.org/T173818 | Wikidata ]], but no frontend equivalent exists.
Since we generalised our CI to run `npm test`, we have empowered developers to have more flexibility in what and how we test.
Similarly, by adding a job to deploys and giving developers the power to control the way code gets deployed, I believe we will see similar innovation.
[1] Although ResourceLoader provides minification, the JS implementations of minification tend to result in greater byte savings than our PHP implementation.
= Why not having this is a problem
The lack of support for a build step in MediaWiki has led to many imperfect workarounds and clearly broken experiences across our codebase that are in dire need of fixing.
Some examples:
* The Popups extension makes use of webpack to build and test complex JavaScript code. It has a test script that forces the developer to run a build step locally and commit the output. This leads to merge conflicts on every single patch in the repository, causing engineers to make tedious and mechanical corrections.
** Despite this pain point, this build process was considered one of the main reasons we successfully shipped this software, by providing us space to move confidently and quickly.
** MobileFrontend and the Minerva skin (which power our mobile site) will soon follow the example of Popups and hit the exact same problems.
** Challenges with Popups are documented in this phabricator task: https://phabricator.wikimedia.org/T158980
* The Kartographer extension maintains a local copy of Leaflet, which was incorrectly edited by engineers unfamiliar with the code layout, and then overwritten when source files were built via the [[ https://github.com/wikimedia/mediawiki-extensions-Kartographer/blob/master/bin/build.sh | build script ]]. This has recently been corrected via a [[ https://github.com/wikimedia/mediawiki-extensions-Kartographer/blob/master/Gruntfile.js#L92 | grunt test script ]] run on every commit.
* Flow makes use of a [[ https://github.com/wikimedia/mediawiki-extensions-Flow/blob/master/Makefile#L96 | build script to build server side templates ]]. Developers modifying Handlebars templates must commit [[ https://github.com/wikimedia/mediawiki-extensions-Flow/tree/master/handlebars/compiled | compiled templates ]]. It is common for these not to be committed and for invalid code to be deployed to our cluster.
* The Wikimedia portal uses a build step. Currently, this is taken care of by a bot which deploys built assets prior to deployment. In between these runs, the deploy repo lives in an outdated state.
* Various repos make use of SVGO to compress SVG assets. See https://phabricator.wikimedia.org/T185596. This is enforced via an npm job, but could easily be handled by a build step, avoiding the need for a human to run `npm install` and run it themselves.
* In MediaWiki core, external libraries are copied and pasted into the [[ https://github.com/wikimedia/mediawiki/tree/master/resources/lib | resources/lib ]] folder. This is done manually (https://github.com/wikimedia/mediawiki/commit/065b21b4bdd49ffca494ff905de29bfc500ed535). These files do not actively get updated, but if they did, a build step could help them live elsewhere.
* The Wikidata Query Service GUI has a build process that [[ https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/gui/+/refs/heads/master/Gruntfile.js | compiles dependencies ]] needed to display complex graphs and maps.
From what I can see there are various other (undocumented?) build processes in the ecosystem. If you are familiar with these and how a build process might help you, feel free to update these and move them into the list above.
* Wikibase has (or had) a complicated build process, since moved to composer, that I am unable to understand.
* VisualEditor uses a build step.
* OOjs UI uses a build step.
* LinkedWiki has a build step and doesn't check in its assets, meaning it is not compatible with CI (see T198919).
* Wikistats2 has a build process and must build and release static assets in additional commits to master [[ https://gerrit.wikimedia.org/r/442334 | example 1 ]] [[ https://gerrit.wikimedia.org/r/445217 | example 2 ]].
= Workarounds
Right now our workaround appears to be “rebuild in PHP” or “force developers to run build steps locally and enforce with Jenkins”. In the case of managing SVG assets, we created [[ https://www.mediawiki.org/wiki/ResourceLoader/Images | the ResourceLoaderImage module ]] and in the case of LESS compiling we introduced a [[ https://www.mediawiki.org/wiki/Requests_for_comment/LESS | PHP LESS compiler ]]. In the case of compiling templates we [[ https://github.com/zordius/lightncandy | use LightnCandy ]] (but for the client, due to the lack of a better solution and of a build step, we unnecessarily offload compiling to our users).
While this approach is fine when libraries already exist, it is not always practical. In the case of the LESS compiler, we use an outdated LESS compiler [[ https://www.mediawiki.org/wiki/Requests_for_comment/Change_LESS_compilation_library | which has already needed to be replaced ]] and there is more risk that these projects become unmaintained, given many of these projects are moving to the Node.js ecosystem.
From experience, enforcing with Jenkins appears to be difficult to do at scale. In the case of SVG minification, various solutions exist across different repositories, and standardisation and generalisation has been tricky and has [[ https://phabricator.wikimedia.org/T179361 | become blocked ]] because many of the frontend engineers building these tools are inexperienced with our CI infrastructure. The approach with SVG minification involves adding a check for whether SVGs are minified and committed, rather than simply running the optimisation on existing code in the repo.
All workarounds create unnecessary slowdown and take focus away from actually building.
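The “enforce with Jenkins” workaround boils down to a check like the sketch below: CI rebuilds the assets into a scratch directory and fails the patch when the committed copy differs. This is only an illustration; the function names and the two-directory layout are hypothetical and not taken from any existing job.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def stale_outputs(committed: Path, fresh: Path) -> List[str]:
    """List built files that differ between the committed and freshly built trees.

    A file counts as stale if it is missing on either side or its content differs.
    """
    old, new = digest_tree(committed), digest_tree(fresh)
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

Note that such a check can only detect the problem: a human still has to rebuild and commit, which is exactly the manual step a build step at deploy time would remove.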
= Motivation
Having a build step for Node.js would allow us to:
* Offload optimisations from the developer to automation
* Simplify software development by empowering use of JavaScript tooling such as rollup, webpack, optimising scripts (such as SVG minifiers, UglifyJS or babel minify), and up-to-date tools like the canonical LESS compiler.
* Remove various “check” scripts in place in a variety of extensions which would be unnecessary
* Avoid the need for committing assets into extensions which are built from other components within the extension making our code more readable. It is often difficult to distinguish built assets from manually written assets and README files are only somewhat effective.
* Make it easier for newcomers to understand the code in our projects and participate in development
* Where templates are used in our projects, compile templates as part of a build step, rather than offloading this task to the user
* Empower developers with tools to create more efficient and better frontend interfaces by giving them the same kind of tooling and flexibility that the PHP backend leverages
= Requirements
[] Build step should allow the running of an npm command prior to deploy.
[] Build step should be discoverable and consistent in every repo:
For instance, I should be able to expect a build process (if available) to live inside `npm run build`, `npm run deploy` or `npm run postinstall`.
[] Build steps need to be run locally for every single extension and skin that has registered one.
[] Build step must run on:
* Any CI job - currently Selenium and QUnit jobs will not run any build steps and fall back to whatever code is committed in the repo. While the latter can be worked around using a `post-update-cmd` inside composer, this is not run by the former.
* Before a deployment we would need to run this to ensure the correct code is deployed to production
* Before a SWAT deploy on repos that have a build process we’d need to re-run the build step as content may have changed
* Before bundling a release we would need to run this to ensure that 3rd parties are not required to install dependencies and run build processes.
* Vagrant instances would need to be aware of any install steps as part of vagrant git-update.
[] Any errors within a build step should be made visible to whatever runs it. Build steps are expected to use error codes.
Note: For sysadmins supporting 3rd party wikis using git, given users can be expected to run `php maintenance/update.php` as part of many installs, it seems reasonable to expect they can also run `npm install && npm run build`
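As a rough illustration of how these requirements could fit together, the sketch below discovers which repos have registered a build script in `package.json`, runs `npm install` and `npm run build` in each, and reports failures through exit codes. It assumes the script is named `build`, which the requirements above leave open, and every function name here is hypothetical; it is not how Scap or CI currently work.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Sequence

# Assumed script name; the RFC does not settle on one.
BUILD_SCRIPT = "build"


def has_build_script(repo: Path, script: str = BUILD_SCRIPT) -> bool:
    """Return True if repo/package.json declares the given npm script."""
    manifest = repo / "package.json"
    if not manifest.is_file():
        return False
    try:
        data = json.loads(manifest.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    scripts = data.get("scripts") if isinstance(data, dict) else None
    return isinstance(scripts, dict) and script in scripts


def run_npm(repo: Path, args: Sequence[str]) -> int:
    """Run npm inside the repo directory and return its exit code."""
    return subprocess.run(["npm", *args], cwd=repo).returncode


def build_all(
    repos: Iterable[Path],
    run: Callable[[Path, Sequence[str]], int] = run_npm,
) -> List[Path]:
    """Build every repo that registered a build script; return those that failed."""
    failed = []
    for repo in repos:
        if not has_build_script(repo):
            continue  # repos without a build step are left untouched
        # A non-zero exit code from either command marks the repo as failed.
        if run(repo, ["install"]) != 0 or run(repo, ["run", BUILD_SCRIPT]) != 0:
            failed.append(repo)
    return failed
```

A deploy or CI wrapper could call `build_all(sorted(Path("extensions").iterdir()))` and abort with a non-zero exit status if the returned list is not empty, so that errors stay visible to whatever runs the build.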
= FAQ
**I’m a sysadmin not a coder. I don’t want a build step as this makes development harder.**
If you are using MediaWiki’s tar releases, you will not notice anything different. When releases are bundled, we’d do the build step for you.
**Won’t https://phabricator.wikimedia.org/T133462 fix this?**
This will be extremely useful and provide a more standard way to load files, but it will not allow anything further than that. This will provide greater freedom into how users build code, but it will not allow things like transpiling TypeScript or transpiling modern JavaScript that would be enabled by allowing the greater freedom of a build step.
**Can’t we just recreate the tooling you are using in PHP?**
For some of it, yes. In fact, we added some extremely powerful and useful PHP tooling for SVG handling inside MediaWiki ResourceLoader. Building this out took time, however, and the approach is costly, as it requires a frontend developer to convince a backend developer to implement said tooling, when there are usually readily available and better maintained options in the npm ecosystem, e.g. https://www.npmjs.com/package/svg-transform-loader.
While possible, something like transpiling ES6 JavaScript to ES5 would be a little crazy to implement in PHP but is available “off the shelf” in the npm ecosystem.
**Introducing a build step introduces more that can go wrong in our pipeline. What if NPM is down?**
Sure, there will definitely be things to work out here and they will be taken care of on a case by case basis. We should be reassured that npm is more resilient these days, but this is not a problem unique to our ecosystem. A build step could be written to fetch a recent stable release of code from another repository if necessary.
**I don't want to introduce foreign code from 3rd parties via npm**
This is happening already in skins/extensions. Having a central build step won't change this.
**What if build scripts pull down malicious packages from npm?**
To begin with, we'll review https://eslint.org/blog/2018/07/postmortem-for-malicious-package-publishes (particularly with regard to package-lock, see T179229). We will need some way to manage this while still getting the benefit of automation; for instance, we could limit outgoing HTTP requests and run our own private mirror of the package managers we need.
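One way such a policy could be checked automatically is sketched below: it walks the `packages` map of a parsed `package-lock.json` (the layout used by lockfileVersion 2 and 3) and reports entries that were not resolved from an approved host or that lack an integrity hash. The mirror host name is a placeholder, not an existing service, and the function name is hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse

# Placeholder host for a private mirror; purely an assumption for this sketch.
ALLOWED_HOST = "npm-mirror.example.org"


def audit_lockfile(lock: Dict[str, Any], allowed_host: str = ALLOWED_HOST) -> List[str]:
    """Return human-readable problems found in a parsed package-lock.json."""
    problems = []
    for path, entry in lock.get("packages", {}).items():
        # The root project (key "") and symlinked packages have no tarball.
        if path == "" or entry.get("link"):
            continue
        resolved = entry.get("resolved")
        host = urlparse(resolved).hostname if resolved else None
        # Deliberately strict: entries without a resolved URL are reported too.
        if host != allowed_host:
            problems.append(f"{path}: resolved from {host or 'unknown source'}")
        if not entry.get("integrity"):
            problems.append(f"{path}: missing integrity hash")
    return problems
```

A CI job could load the lockfile with `json.load` and fail when the returned list is not empty. This complements, rather than replaces, restricting outgoing HTTP requests on the build host.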
**I don't want to run npm jobs on every git checkout!**
It's unlikely you will need to. If vagrant is being used and we had a standard approach, vagrant would be automated to take care of this for you. Built assets, although possibly outdated, are likely to be usable unless incompatible changes have been made in PHP. If you are working on JavaScript code that uses a build step, then you would be using that same build step for development.
One possible compromise to mitigate the pain here could be limiting build steps to skins and extensions (i.e. not core), so that people who want to avoid a build step altogether are not forced to run one.