
MediaWiki-Vendor creates a scenario in which incompatible versions of dependencies can be present
Open, Needs Triage, Public

Description

Problem
It is possible for multiple extensions to require the same package in incompatible versions and for only one of those versions to be present in MediaWiki-Vendor. This works fine until the other extension expects the code to work a certain way (and doesn't have a test for it), and then all of a sudden it breaks, since the package is not at a version that extension expects.

For instance, suppose you have two extensions, A and B. If A requires guzzlehttp/guzzle at ^6.0.0 and B requires guzzlehttp/guzzle at ~6.2.0, Composer will install 6.2.3 instead of the latest version, which is 6.3.3. It does this because B has not declared that it is compatible with anything other than 6.2.x, while A is fine with anything in 6.x.x. If Composer can't resolve the version to something that is compatible with everything, it will throw an error and force you to deal with the problem. This is a huge safeguard against incompatible versions that we are not using, and one we are forcing site administrators to deal with on their own.

This is amplified by MediaWiki-Vendor because the guzzlehttp/guzzle version constraint must be manually specified there, and that constraint may become incompatible (over time) with A and B.
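
To make that concrete, here is a minimal sketch of the two conflicting requirements (the extension package names are hypothetical):

Extension A's composer.json:

{
	"require": {
		"guzzlehttp/guzzle": "^6.0.0"
	}
}

Extension B's composer.json:

{
	"require": {
		"guzzlehttp/guzzle": "~6.2.0"
	}
}

When both constraints are part of a single Composer resolution, Composer installs a 6.2.x release (the intersection of the two ranges) and refuses to install anything if no intersection exists. With MediaWiki-Vendor, neither constraint is consulted; whatever version happens to be pinned in the vendor repo is what every extension gets.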

Proposed Solution
Instead of maintaining a repo of dependencies, WMF could use Composer to require the extensions it needs. This would let extensions define their own dependencies without anyone having to manually update MediaWiki-Vendor. This does not mean WMF needs to run Composer in production; it can simply commit the vendor directory in its instance of MediaWiki.

For instance, if you were to require the extension like:

composer require mediawiki/abuse-filter

you would get the AbuseFilter extension (installed in the extensions folder) as well as all of that extension's dependencies installed in the vendor directory.

WMF could either use a version constraint on the extension, or use dev-master, which will give them the latest master of that extension.

This follows Composer's best practices and removes the need to manually merge and update dependencies in MediaWiki-Vendor.

Event Timeline


Given that that RfC has been closed with "Affected teams (e.g. Release Engineering or MediaWiki Platform Team) are free to create a new RFC at any time, possibly borrowing some of the ideas from this RFC.", I'm not sure merging new proposals in is helpful.

There have been discussions (and I think maybe an RFC) about using Composer to include extensions, and I'm pretty sure it was declined. A couple of extensions do support it, but it's the minority for sure. For example, we've not even gotten all extensions to use extension registration... So having to add support for composer installing too...

Although we could use composer to grab the dependencies, by adding "extensions" for installing in the composer.json on the vendor repo, we would then have duplicate code to check out and push around. And then having composer include the extension into all wikis is in many cases a no go too. Unless, like you say, there's some magic way to get them into the extensions dir... But then we're not using git submodules, and the workflow is somewhat different. Not sure if that's a good or a bad thing.

There is the option of using the composer.json https://github.com/wikimedia/mediawiki/blob/master/composer.local.json-sample which would do some of this magically, but then we're dependent on people having a clone of mw core, on the WMF branches, and *only* the WMF deployed extensions checked out. All in the right places; not just having a clone of the vendor repo.

The repo is used for many reasons, primarily to not run composer on the cluster, but also for keeping track of what versions we're using; sometimes for security (version X of library Y has been reviewed and is known "good"), other times because upstreams have a lot of releases that don't add much. We don't want to update every time, so having it as a manual process works.

And if things don't have specific enough versions defined, we end up pulling in releases from other libraries that change things. Someone then has to review this, which is potentially just noise for us.

There are probably better ways of doing this overall, but I'm not sure what they are. I don't see how having to change (or add) a line in composer.json (well, you don't even actually need to do that, if you just do composer require) and then running composer update is much more work than either adding the extension, or having composer find it magically, then running composer update anyway.

There's only OOjs UI that regularly changes its version (usually weekly), and again, it's done as a manual process, because the appropriate teams can review the changes.

Given that that RfC has been closed with "Affected teams (e.g. Release Engineering or MediaWiki Platform Team) are free to create a new RFC at any time, possibly borrowing some of the ideas from this RFC.", I'm not sure merging new proposals in is helpful.

Agreed.

There have been discussions (and I think maybe an RFC) about using Composer to include extensions, and I'm pretty sure it was declined. A couple of extensions do support it, but it's the minority for sure. For example, we've not even gotten all extensions to use extension registration... So having to add support for composer installing too...

There are three ways you can load an extension with Composer:

  1. If it is on Packagist, then it can be loaded by simply specifying the extension.
  2. If it is not on Packagist, but has a valid composer.json file, then you can specify the VCS repository.
  3. If it is not on Packagist and does not have a valid composer.json, then you can manually specify the package.

Extensions do not have to support Composer in order for it to be used (though, it does make it easier, and I would recommend putting the ones WMF uses on Packagist).
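
For option 2, a minimal sketch of the root composer.json entries involved (the repository URL, and the assumption that the extension's own composer.json declares the name mediawiki/abuse-filter, are illustrative):

{
	"repositories": [
		{
			"type": "vcs",
			"url": "https://github.com/wikimedia/mediawiki-extensions-AbuseFilter"
		}
	],
	"require": {
		"mediawiki/abuse-filter": "dev-master"
	}
}

Option 3 works the same way, except the repository entry is of type "package" and spells out the name, version, and source manually.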

Although we could use composer to grab the dependencies, by adding "extensions" for installing in the composer.json on the vendor repo, we would then have duplicate code to check out and push around.

Why would we have duplicate code?

And then having composer include the extension into all wikis is in many cases a no go too. Unless, like you say, there's some magic way to get them into the extensions dir... But then we're not using git submodules, and the workflow is somewhat different. Not sure if that's a good or a bad thing.

Yes, Composer Installers already does this.
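
For context, composer/installers decides where to put a package based on its declared type. A minimal sketch of what an extension would declare (the package name is illustrative) so that it lands under extensions/ rather than vendor/:

{
	"name": "mediawiki/example-extension",
	"type": "mediawiki-extension",
	"require": {
		"composer/installers": "^1.0"
	}
}

The root project can also override the exact target directory via extra.installer-paths if the default layout doesn't fit.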

There is the option of using the composer.json https://github.com/wikimedia/mediawiki/blob/master/composer.local.json-sample which would do some of this magically, but then we're dependent on people having a clone of mw core, on the WMF branches, and *only* the WMF deployed extensions checked out. All in the right places; not just having a clone of the vendor repo.

They don't have to have the WMF deployed extensions checked out. Composer would specify them and put them in the right place. They would simply check out the wmf branch and run composer install. They would get all of WMF's extensions as well as all of the correct dependencies.

The repo is used for many reasons, primarily to not run composer on the cluster, but also for keeping track of what versions we're using; sometimes for security (version X of library Y has been reviewed and is known "good"), other times because upstreams have a lot of releases that don't add much. We don't want to update every time, so having it as a manual process works.

composer.lock already keeps track of every version number. If you don't want to update the deps every time, but do want to update the extensions, you can run:

composer update --root-reqs

which will only update what is in your root composer.json, not the dependencies of those packages (full disclosure, I wrote the --root-reqs option for this reason).

And if things don't have specific enough versions defined, we end up pulling in releases from other libraries that change things. Someone then has to review this, which is potentially just noise for us.

Again, just don't update their deps. :) But they should define a stable version range.

There are probably better ways of doing this overall, but I'm not sure what they are. I don't see how having to change (or add) a line in composer.json (well, you don't even actually need to do that, if you just do composer require) and then running composer update is much more work than either adding the extension, or having composer find it magically, then running composer update anyway.

Because, as I defined in the problem, if you are an extension maintainer, you have to go update a WMF repo just to get your deps installed properly.

There's only OOjs UI that regularly changes its version (usually weekly), and again, it's done as a manual process, because the appropriate teams can review the changes.

The changes would still be reviewed.

There are probably better ways of doing this overall, but I'm not sure what they are. I don't see how having to change (or add) a line in composer.json (well, you don't even actually need to do that, if you just do composer require) and then running composer update is much more work than either adding the extension, or having composer find it magically, then running composer update anyway.

Because, as I defined in the problem, if you are an extension maintainer, you have to go update a WMF repo just to get your deps installed properly.

No you don't. They only need to be in the vendor repo for WMF deployment. For other jobs, it's just making sure the CI config is right so it runs composer for you.

There's only OOjs UI that regularly changes its version (usually weekly), and again, it's done as a manual process, because the appropriate teams can review the changes.

The changes would still be reviewed.

I didn't say they wouldn't. I'm just saying that's the only team that regularly changes anything in that repo

For other jobs, it's just making sure the CI config is right so it runs composer for you.

How does one update the CI config? It needs to be updated for AbuseFilter and AntiSpoof. :)

For other jobs, it's just making sure the CI config is right so it runs composer for you.

How does one update the CI config? It needs to be updated for AbuseFilter and AntiSpoof. :)

integration/config repo on gerrit, mirrored (because github is easier for browsing) at https://github.com/wikimedia/integration-config

integration/config repo on gerrit, mirrored (because github is easier for browsing) at https://github.com/wikimedia/integration-config

I created T178452: Run `composer install` on Jenkins for AbuseFilter & AntiSpoof since I don't understand this repository. :)

Based on the comment in T178452#3693007, this is still a problem that ought to be resolved.

The primary reasons for our manually curated and reviewed vendor directory were laid out by Chris in T105638#1515362. Composer itself has gained some additional TLS functionality since that critique, but as far as I know there is still no revision signing and validation system to ensure that previously security-reviewed packages have not been tampered with.

I agree that in many ways the deviation from normal Composer usage is annoying. I'm not at all against someone investing time into examining the various attack vectors and attempting to find alternate compensating controls. The current process has been shown to be a fairly minor burden for software development, but process improvements are generally driven by frustration or fresh perspectives.

I may be confused, but this general theme of becoming more aligned with widespread Composer practices seems to be scattered over several different tickets at this point:

The fragmentation of the discussion, in my opinion, does not help to advance towards resolution of the underlying issues. I would really like to see one consolidated RFC created and discussed. The outcome of that discussion might be partial or complete adoption of the suggested changes, but having a holistic discussion rather than talking about several small parts of the same larger problem would help advance the larger community's understanding of the issues.

as far as I know there is still no revision signing and validation system to ensure that previously security reviewed packages have not been tampered with.

What type of validation system are you looking for? If the vendor directory is committed to WMF's repo, any changes would be apparent, but if we would prefer not to, every commit hash is in composer.lock and it should also contain a SHA hash to verify that the package hasn't been modified. But I think it's fine if the vendor directory is committed to the repo (the only real downside to doing that is that developers who are not aware of the process might attempt to modify things in /vendor, but that would be apparent in their patch).
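
For reference, a trimmed sketch of what a single package entry in composer.lock records (the values are illustrative placeholders):

{
	"name": "guzzlehttp/guzzle",
	"version": "6.2.3",
	"source": {
		"type": "git",
		"url": "https://github.com/guzzle/guzzle.git",
		"reference": "<full git commit hash>"
	},
	"dist": {
		"type": "zip",
		"url": "https://api.github.com/repos/guzzle/guzzle/zipball/<full git commit hash>",
		"shasum": "<sha1 of the archive, when the source provides one>"
	}
}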

The fragmentation of the discussion, in my opinion, does not help to advance towards resolution of the underlying issues. I would really like to see one consolidated RFC created and discussed. The outcome of that discussion might be partial or complete adoption of the suggested changes, but having a holistic discussion rather than talking about several small parts of the same larger problem would help advance the larger community's understanding of the issues.

Well, I mean, there are a bunch of different problems with (potentially) different solutions. But overall, we have no policy, which is where I think the fragmentation comes from. We don't have a policy on whether extensions should have valid composer.json files, we don't have a policy on whether they should be on Packagist or not, we don't have a policy on how extensions should define dependencies (certainly they should not be tied to WMF, imho), and we don't have a policy on how WMF (or any other MediaWiki user) should recursively "load" (read: not install) extensions that have dependencies.

I can try and organize this (thanks for your list!) into a coherent issue, but it's difficult to do that without someone immediately closing the issue and pointing to an RFC that is several years old and way out of scope for resolving these issues (i.e. we don't need to install or enable extensions with Composer, just resolve the dependencies).

For now, I feel like the best course of action has been to point out holes in this ship rather than propose a comprehensive solution that is already being rejected in most people's minds (or so I've found). But if you think it's worth it, then I'll create that higher-level issue that explains all of the problems and the comprehensive solution.

as far as I know there is still no revision signing and validation system to ensure that previously security reviewed packages have not been tampered with.

What type of validation system are you looking for? If the vendor directory is committed to WMF's repo, any changes would be apparent, but if we would prefer not to, every commit hash is in composer.lock and it should also contain a SHA hash to verify that the package hasn't been modified. But I think it's fine if the vendor directory is committed to the repo (the only real downside to doing that is that developers who are not aware of the process might attempt to modify things in /vendor, but that would be apparent in their patch).

The task description says:

Instead of maintaining a repo of dependencies. WMF ought to instead use Composer to require the extensions they need. This will let extensions define dependencies without having to manually update MediaWiki-Vendor

I guess I'm confused about what you are actually proposing. Is it the manually curated composer.json in mediawiki/vendor.git that you are trying to find a way to replace? The first sentence reads to me like a proposal to somehow get rid of mediawiki/vendor.git entirely and replace it with some as yet unspecified process to collect the required Composer-managed dependencies via "simply commit the vendor directory in their instance of MediaWiki". I'm not sure that there is an "instance of MediaWiki" to do this for in the Foundation production network. I do not believe that there is any one of the 800+ wikis which actually has all of the extensions enabled. Management of heterogeneous wiki farms was the root problem in T467: RfC: Extension management with Composer that led to its rejection.

Ah. I see. I assume that, even if a wiki in the farm does not have the extension enabled, it is in the filesystem? If not, how is the filesystem managed?

What I'm proposing as a solution (and this ought to be clearer, apologies) is that WMF (and any other MediaWiki project, for that matter) would have a composer.json that would be something like this:

{
	"name": "wikimedia/wiki",
	"type": "project",
	"description": "Wikimedia Wiki",
	"homepage": "https://www.wikimedia.org/",
	"license": "GPL-2.0+",
	"require": {
		"mediawiki/core": "dev-wmf/1.31.0-wmf.3",
		"wikimedia/anti-spoof": "dev-wmf/1.31.0-wmf.3",
		"wikimedia/abuse-filter": "dev-wmf/1.31.0-wmf.3",
		"wikimedia/vector": "dev-wmf/1.31.0-wmf.3"
	}
}

Obviously there would be more deps than this.
Assuming ./html is the web-accessible directory:

  • mediawiki/core would get cloned into ./html/core.
  • extensions would get cloned into ./html/extensions.
  • skins would get cloned into ./html/skins.
  • all other deps would be out of the webroot in ./vendor.

All of these directories can be committed to this wmf project (if we want them to be). There would also need to be copies of the front controllers in ./html that would do a require_once ./core/SAMEFILE.

This is the "wrapper" setup (option 2) as described in T167038.
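
A rough sketch of the extra section that could produce the layout above, using composer/installers path mapping (this assumes core, extensions, and skins declare the corresponding mediawiki-* package types, which is not universally true today):

"extra": {
	"installer-paths": {
		"html/core/": ["type:mediawiki-core"],
		"html/extensions/{$name}/": ["type:mediawiki-extension"],
		"html/skins/{$name}/": ["type:mediawiki-skin"]
	}
}

Anything without one of those types falls through to the default ./vendor directory.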

Doing this would allow extensions (and core) to define their own dependencies without having to manually define and update them. As I mentioned above, you can only update core/extensions with:

composer update --root-reqs

which, in our setup, would not update anything in ./vendor

Also, this does not enable (or even autoload) the extensions, it only loads them into the filesystem and resolves the dependencies.

Also, this does not enable (or even autoload) the extensions, it only loads them into the filesystem and resolves the dependencies.

That part actually depends very much on the extension in question. The original design of Composer usage for loading extensions included enabling the extension. Semantic MediaWiki at least still adds its entry point file to the autoloader to do this. I believe that there are other extensions in the wild which continue to do this as well.

Also, this does not enable (or even autoload) the extensions, it only loads them into the filesystem and resolves the dependencies.

That part actually depends very much on the extension in question. The original design of Composer usage for loading extensions included enabling the extension. Semantic MediaWiki at least still adds its entry point file to the autoloader to do this. I believe that there are other extensions in the wild which continue to do this as well.

Interesting. I suppose it doesn't matter so much if the classes are autoloaded, but it does matter if the extension is auto-enabled (I don't think it should be). But again, we need to have a policy on such things (and enforce that policy on the CI).

If we wanted to, I believe it's possible to hook into Composer's autoloader and then we could filter out anything not explicitly enabled.

It would actually be another unique practice to enable extensions when they are required with Composer. I can't think of a project that does it that way. Symfony, Drupal, WordPress, etc. do not allow enabling plugins with Composer; you must always explicitly enable them.

(ok, Symfony Flex does auto-enable, but you can disable the plugin without removing it from the filesystem)

Ah. I see. I assume that, even if a wiki in the farm does not have the extension enabled, it is in the filesystem? If not, how is the filesystem managed?

The on disk state is identical for all of the Wikimedia wikis (at least in the main wiki farm). The extensions which are added to the wmf release branches as versioned submodules are managed by the make-wmf-branch script. The enabling of any given extension is handled by the Het Deploy configuration files. As I understand it, your proposal would change/augment make-wmf-branch script to do something like editing the composer.json to set the desired versions of each extension and run Composer to add them and their dependencies to the working tree. This would also need to somehow retain the submodule attachment so that SWAT patches and other fixes could be easily applied to the release branch during its lifetime.

This would also require the execution of make-wmf-branch to be performed on a host that has connectivity to the Internet to resolve and fetch the Composer-managed libraries. This is the point at which the question of signed packages and verification comes into the picture. Semver versioning relies on git tags and git tags are alterable. So without some cryptographic integrity check we can't be sure that the libfoo v1.2.3 which was security reviewed for execution on the Foundation production cluster is the same libfoo v1.2.3 that is fetched by a particular Composer run. This is even less assured if the Extensions themselves are using range-based version constraints for their dependencies and we are not using a composer.lock that is somehow propagated from release branch to release branch to constrain the choice of upstream library. These are the class of problems that the current manually curated mediawiki/vendor.git process was designed to address. At some level it may seem like paranoia, but the Wikimedia wikis contain information that is accessible to the application code but not exposed in the user interface to unprivileged users, and that some outside actors would very much like to have (e.g. IPs of authed users which may disclose their physical location). Having untrusted/unreviewed code put live on the wiki farm is a direct threat to the security of that data.

My understanding is that composer support came before extension registration, and amongst some extension developers (mostly enterprise-y ones, maybe) it was seen as a great way to load an extension without having to add anything to LocalSettings.php. Mostly SMW-related extensions, I think (I was working on an SMW company intranet at the time, developing in-house extensions that all loaded themselves with nothing more than an entry in composer.json, and I remember converting some extensions to be like this because it was seen as good practice). So, it was a feature, not a bug! :-)

Earlier, the big problem with Composer and MediaWiki was that you couldn't choose which extensions were enabled; of course that's now changed, with the requirement for wfLoadExtension(). It seems to me that the current limitations on using Composer (aside from within the WMF, where there are various operational complexities) are mostly that extension authors don't add full composer.json files and don't tag versions. As far as I can see, if someone wants to deploy extensions using Composer at the moment, it works. Whether the same system should be used within the WMF, I'm not sure; if it's really not a good idea, then we should do more to stop third parties from doing it too, I reckon.

(I replied before seeing bd808's reply.)

I don't know if there's an easier way to do the WMF management of the vendor files, but it does seem to be a reasonably uncommon way of deploying MediaWiki. I think single-wiki sysadmins are reasonably well served by the status quo, but you just have to look at the huge variation in set-ups of people trying to run multiple wikis to see that there's really not much agreement about the best way to do it. The docs on https://www.mediawiki.org/wiki/Manual:Wiki_family are all over the place, and plenty of people don't even seem to know, e.g., that there's a --wiki flag for maintenance scripts.

It'd be cool to have some more clarity around how to install, upgrade, etc. and I think part of that is about managing the vendor directory. Perhaps the WMF way is the best, and we should promote it as the standard way?

I don't know if there's an easier way to do the WMF management of the vendor files, but it does seem to be a reasonably uncommon way of deploying MediaWiki.

I completely agree that it is uncommon. I would go further and say that it is unique. There may be one or two other MediaWiki farms that try to follow the complete Wikimedia deployment model, if any. This bug, however, is 100% targeted at our unique situation as I read it. The mediawiki/vendor.git repo is an artifact of the Wikimedia Foundation's deployment pipeline. It's not a recommended best practice for general MediaWiki deployments.

I think single-wiki sysadmins are reasonably well served by the status quo, but you just have to look at the huge variation in set-ups of people trying to run multiple wikis to see that there's really not much agreement about the best way to do it. The docs on https://www.mediawiki.org/wiki/Manual:Wiki_family are all over the place, and plenty of people don't even seem to know, e.g., that there's a --wiki flag for maintenance scripts.

It'd be cool to have some more clarity around how to install, upgrade, etc. and I think part of that is about managing the vendor directory. Perhaps the WMF way is the best, and we should promote it as the standard way?

It's how we (the Wikimedia Foundation) run our (the Wikimedia movement's) wiki farm. I would be very scared to call any of MultiVersion and HetDeploy a 'standard'. It's a collection of glue that we use today to serve 800+ wikis that are kind of sort of the same, with a few thousand knobs that get turned this way and that per wiki. The config system is fragile and complex. Anyone starting from scratch would probably come up with a better solution as far as the ability to turn those knobs without breaking things is concerned. Ask Kunal, Reedy, and Chad sometime about all the ways they want to change it. :)

The thing that makes standardizing wiki farm setup and maintenance difficult is that there are a wide range of use cases where a farm might be employed. What fits one may not work at all for another. There are probably some common issues that standard or at least default solutions could be found for, but to start on that work you would first need a pretty good survey of wiki farm requirements. The MediaWiki Farmers user group might be a place to start having those discussions, but I think it is unlikely that those 22 people represent the breadth of wiki farm requirements.

Let me try to reframe this issue in a different way. (I might actually be talking about a different issue; feel free to tell me so and I'll go file a separate task then.)

Scope: The way we manage the vendor repo for WMF production. (Not getting rid of the vendor repo; not 3rd-party wikifarm management practices.)

Problem statement: With meticulous work, we are preventing our dependency management system from managing dependencies.
The fundamental idea of Composer (and dependency management in general) is that you have a root set of packages that you actually care about; the dependency manager tracks that set and can generate the larger set of packages needed to actually get the code to run. By flattening all dependencies into the root set, we completely break that ability. When we undeploy an extension, there is no way to tell which dependencies have become unnecessary; when an extension (or a dependency of an extension) switches from one library to another, we can't tell whether that library became unnecessary; there is no way to tell whether a library is pinned to a specific version for some specific reason or whether that just happened to be the current version when the requirement was added. All the nice tools like composer why break. It adds maintenance burden (e.g. when there is a PHP version update and some library becomes incompatible, we can't tell whether that's a problem to be solved or whether we can just delete it and be fine).

(Also, the manual pinning process is a bit tedious; although as Reedy said that's not a big problem as it's very rarely needed.)

Requirements that the current process fulfills and new ones should too:

  • All changes to vendor libraries must go through version control.
  • It must be possible to require libraries which are not dependencies of an extension. (Something that's merely suggested, for example.)
  • It must be possible to update a library while keeping the dependency graph consistent, but without doing a lot of unnecessary updates.

Possible solutions:

  1. Low-tech approach: just keep track of what our actual root requirements are in a text file, or maybe a shell script with composer require commands and comments (a sketch follows this list). (Life would be so much simpler if composer.json allowed comments. Thanks, Douglas Crockford :/ ) Determining whether a library is needed would be very tedious but at least possible.
  2. Use composer require on extensions, as suggested in the task description. Prevent extensions from going into vendor somehow (e.g. by requiring "type": "mediawiki-extension" for all MediaWiki extensions which have a composer.json file, and adding composer/installers to the root requirements) and gitignore them. Use the lockfile for pinning. There are a couple of problems with this (which might be surmountable):
    • MediaWiki extensions typically do not have useful versioning. The wmf/* branches are short-lived; commit id pinning makes commits unintelligible; proper version tagging would probably be an unwanted extra burden.
    • I don't think Composer supports limited updates - either you limit the update to a whitelist and risk breaking the dependency requirements, or use --with-dependencies and get a bunch of unneeded updates. See composer/composer#6601, although that bug report needs fewer hypotheses and more tests.
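
A sketch of the low-tech approach from option 1, as a commented shell script (the package names, versions, and stated reasons are purely illustrative):

#!/bin/sh
# Root requirements for mediawiki/vendor, one block per reason we need it.

# Required by MediaWiki core.
composer require "psr/log:1.0.2"

# Required by AbuseFilter and AntiSpoof (see T178452).
composer require "wikimedia/equivset:^1.0"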

(Aside: @bd808 is of course right that it would be great to have a task that systematically reviews all our issues with Composer; it probably won't happen though because no one has time for that. So let's not reinvent waterfall development and block sensible small improvements on planning for everything :)
(Aside 2: even if we don't plan to fix / can't fix our problem(s) with composer, it pays to have a clear understanding of why they are problems and what prevents us from fixing them. Large projects tend to have some pull with their upstreams as long as they can clearly communicate and justify their needs; we are not very good at doing that and most of the time we are not even trying.)

As I understand it, your proposal would change/augment make-wmf-branch script to do something like editing the composer.json to set the desired versions of each extension and run Composer to add them and their dependencies to the working tree.

Correct. The "version" can be whatever; typically it's a tag, but afaik we use wmf branches, so it should be consistent with whatever we are doing now.

This would also need to somehow retain the submodule attachment so that SWAT patches and other fixes could be easily applied to the release branch during its lifetime.

This is the part I'm a little fuzzy on, so again, please excuse the ops ignorance. :)
If we are applying patches to the wmf branch within an extension, the deploy could either:

  1. Run composer update EXTENSION to pull the latest on that branch for the extension (and update the commit hash in composer.lock). For an extra level of protection, run composer update --root-reqs EXTENSION, which will only update the extension and not the extension's dependencies.
  2. Or they could cd into the submodule and do a git pull (but composer.lock would be out of sync, if that is OK).

I would prefer the former so that composer.lock is kept in sync.
Keep in mind that the extensions could also be in this (new?) repo as well (as in, copied by Composer and committed), so they could be updated before deploying (and deploying would mean updating all of them at once).
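
A sketch of what option 1 above could look like on the deployment host (the extension package name is illustrative, and this mirrors the --root-reqs behaviour described earlier rather than anything tested here):

# Pull the latest commit of the requested branch for a single extension;
# only that root requirement is touched, and composer.lock records the
# new commit hash for review.
composer update --root-reqs wikimedia/abuse-filter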

This would also require the execution of make-wmf-branch to be performed on a host that has connectivity to the Internet to resolve and fetch the Composer-managed libraries.

Not necessarily. If the "project" repo that contains my example composer.json (above) also has the ./vendor directory committed, then the only thing that is being updated is the extensions. That can be done either by accessing Packagist (over the internet) or by specifying the VCS repo, which can be a path on the local filesystem.

This is the point at which the question of signed packages and verification comes into the picture. Semver versioning relies on git tags and git tags are alterable. So without some cryptographic integrity check we can't be sure that the libfoo v1.2.3 which was security reviewed for execution on the Foundation production cluster is the same libfoo v1.2.3 that is fetched by a particular Composer run.

While git tags are alterable, afaik git commit hashes are not. composer.lock actually ignores the version number and instead uses the commit hash. The file is saying "this version number is this commit hash". If that version number no longer equals the same commit hash, I believe it will throw a warning, but still use the commit hash rather than force an update of composer.lock. Also, if we want an additional check, we can simply commit the vendor directory to the repository. That is technically not necessary, but it would make any changes completely apparent.

This is even less assured if the Extensions themselves are using range-based version constraints for their dependencies and we are not using a composer.lock that is somehow propagated from release branch to release branch to constrain the choice of upstream library.

The choice of upstream library ought to be up to WMF, not the extension author (who ought to specify a range of compatibility). composer.lock maintains the commit hashes of all of the packages, and WMF is not in any way mandated to update an extension's deps. In fact, if extensions did pin a version in composer.json this would create more problems. If one extension requires version 1.0.2 and another requires 1.0.3, Composer will fail and throw an error (since no version can satisfy both). If both specify a range like ^1.0.2 and ^1.0.3, then WMF can choose to install 1.0.3 or 1.0.4, etc., since the extension authors are saying that their extensions ought to work with any version in the range. One way to ensure that extensions really do work with the entire range is to run the Jenkins tests once with a standard composer install and again with composer update --prefer-lowest (which will install the oldest supported version of all of the deps). This will ensure that the extension continues to work with the entire range of supported dependencies.
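
A sketch of how a CI job could exercise both ends of the allowed ranges (this assumes the repository defines a composer test script, as MediaWiki core and many extensions do):

# First pass: the usual install, which picks the highest matching versions.
composer install
composer test

# Second pass: resolve every constraint to its lowest allowed version,
# preferring stable releases, then run the same tests again.
composer update --prefer-lowest --prefer-stable
composer test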

These are the class of problems that the current manually curated mediawiki/vendor.git process was designed to address. At some level it may seem like paranoia, but the Wikimedia wikis contain information that is accessible to the application code but not exposed in the user interface to unprivileged users, and that some outside actors would very much like to have (e.g. IPs of authed users which may disclose their physical location). Having untrusted/unreviewed code put live on the wiki farm is a direct threat to the security of that data.

I completely understand the paranoia. I am not in any way saying we should not review changes. I'm only saying that the "deps" should be the extensions, not the libraries. We can still (and probably should) continue to commit the dependencies (core, extensions, skins, and vendor) to a repo for review before deployment. We can also still (and probably should) continue to update a whitelist of dependencies and review those changes before deployment.

Scope: The way we manage the vendor repo for WMF production. (Not getting rid of the vendor repo; not 3rd-party wikifarm management practices.)

Uhh... right. The vendor repo would be replaced with a WMF wiki "project" repo that would contain the dependencies of the project (core, extensions, skins, libraries).

Problem statement: With meticulous work, we are preventing our dependency management system from managing dependencies.
The fundamental idea of Composer (and dependency management in general) is that you have a root set of packages that you actually care about; the dependency manager tracks that set and can generate the larger set of packages needed to actually get the code to run. By flattening all dependencies into the root set, we completely break that ability. When we undeploy an extension, there is no way to tell which dependencies have become unnecessary; when an extension (or a dependency of an extension) switches from one library to another, we can't tell whether that library became unnecessary; there is no way to tell whether a library is pinned to a specific version for some specific reason or whether that just happened to be the current version when the requirement was added. All the nice tools like composer why break. It adds maintenance burden (e.g. when there is a PHP version update and some library becomes incompatible, we can't tell whether that's a problem to be solved or whether we can just delete it and be fine).

Exactly.

Requirements that the current process fulfills and new ones should too:

  • All changes to vendor libraries must go through version control.
  • It must be possible to require libraries which are not dependencies of an extension. (Something that's merely suggested, for example.)
  • It must be possible to update a library while keeping the dependency graph consistent, but without doing a lot of unnecessary updates.

Those all seem completely valid to me. If they aren't possible right now, then we should attempt to fix them upstream, and if that's not possible then we should use Composer's API (imho).

Possible solutions:

  1. Low-tech approach: just keep track of what our actual root requirements are in a text file, or maybe a shell script with composer require commands and comments. (Life would be so much simpler if composer.json allowed comments. Thanks, Douglas Crockford :/ ) Determining whether a library is needed would be very tedious but at least possible.

Yes. This seems like we would be somewhat reinventing the wheel a bit, but certainly possible to do it that way. I would much rather determine our pain points and work with Composer for a possible solution before we went that route.

  1. Use composer require on extensions, as suggested in the task description. Prevent extensions from going into vendor somehow (e.g. by requiring "type": "mediawiki-extension" for all MediaWiki extensions which have a composer.json file, and adding composer/installers to the root requirements)

That's exactly what I would do. I think we should actually enforce (by policy and with the CI) that extensions have a valid composer.json and the type is set to mediawiki-extension.

and gitignore them.

That's not completely necessary. I prefer it that way (it prevents developers who are not familiar with Composer from changing deps), amongst other reasons, but I think it's fine if we commit the dependencies to the repo (several large projects do this).

Use the lockfile for pinning.

The lockfile should always be used for pinning instead of composer.json. However, this isn't a huge deal in the "project" repo; if a project would like to pin versions in composer.json, that's technically fine since nothing should be depending on the "project". So if everyone would feel more comfortable with the versions continuing to be pinned in composer.json (within the "project" repo), I'm perfectly fine with that. They should not be pinned in core, extensions, or skins.

There are a couple of problems with this (which might be surmountable):

  • MediaWiki extensions typically do not have useful versioning. The wmf/* branches are short-lived; commit id pinning makes commits unintelligible; proper version tagging would probably be an unwanted extra burden.

I was under the impression that we use the wmf branches for deployment? What if we had a script that updated the branch name in composer.json whenever new branches are created?

  • I don't think Composer supports limited updates - either you limit the update to a whitelist and risk breaking the dependency requirements, or use --with-dependencies and get a bunch of unneeded updates. See composer/composer#6601, although that bug report needs fewer hypotheses and more tests.

I left a comment on there, but yeah we do need to give them a failing test so they can see what is going on.

(Aside: @bd808 is of course right that it would be great to have a task that systematically reviews all our issues with Composer; it probably won't happen though because no one has time for that. So let's not reinvent waterfall development and block sensible small improvements on planning for everything :)

Would you like me to create an Epic of sorts? What would it even be called?

(Aside: @bd808 is of course right that it would be great to have a task that systematically reviews all our issues with Composer; it probably won't happen though because no one has time for that. So let's not reinvent waterfall development and block sensible small improvements on planning for everything :)

Invoking the specter of waterfall project planning throws a lot of shade on my statements without helping me understand how you see this proposal as something that is isolated from the majority of issues that I referenced. This proposal reads to me as step N in an N-step process to completely align the Wikimedia Foundation's usage of Composer for MediaWiki with upstream practice. I may be missing a less invasive method, but I think this would require a system much like T89945: Merge to deployed branches instead of cutting a new deployment branch every week, where a "MediaWiki project" repo would replace the current mediawiki/vendor.git repo and also become the root unit of deployment to /srv/mediawiki/php-1.*-wmf.* on production hosts. This in turn presumes T166956: Cannot use Composer's CLI to manage a project's dependencies. It implies adding Jenkins tests to ensure certain things about the composer.json for all extensions deployed on the Wikimedia cluster. It also implies changes to the existing Jenkins test pipeline for all tests which include mediawiki/vendor.git in their setup, as that repo would be removed. Finally, it changes how the deploy branch is structured, how the contents of that branch are selected, and how updates are made to the branch.

What I am saying here is that this is not a trivial change and it does not exist in a state completely detached from most of the related issues I referenced. This is proposing a desired end-state for a complete revamp of the dependency management process, which in turn affects every use case from prepping a deploy server for a scap * run all the way back to CI for every MediaWiki gerrit change.

The task T172927 is linked to this one; it focuses on reconciling mediawiki/vendor's composer.json for tarballs with mediawiki/core's composer.json, so that mediawiki/vendor could be fully dedicated to Wikimedia. There are issues related to libraries used by extensions, similar to what Tgr exposed above in T178137#3695758.

About the per-wiki activation of extensions in a wikifarm, the solution I chose in MediaWiki-extensions-MediaWikiFarm is to create small isolated Composer autoloaders (all in /vendor, in subdirectories composerKEY, e.g. /vendor/composer085b59d), where each small autoloader only loads the MediaWiki extension (and required sub-extensions). During MediaWiki initialisation, each autoloader corresponding to an extension activated on that wiki is called, in addition to the base autoloader. To respect global version constraints, it is first checked that Composer agrees when all extensions are activated. It is a Composer hack, but it works (for now).

The main issue in wikifarms is the auto-activation of the extensions (there is no issue about registering classes, either in classmaps or PSR-4), and it was not well understood when it was raised upstream with Composer in #4109. Afaik (and this is partly an opinion of mine) the current best way to use Composer for dependency management while staying compatible with wikifarms is the way PageForms does it: use composer.json for the "require" section, but not the "autoload" section (more specifically, not the "autoload" -> "files" section, which can execute active code). (For now MediaWikiFarm does not manage this case correctly, which is a new type of activation.)

As you (@dbarratt) suggest above in T178137#3694997, perhaps a Composer hook removing "autoload" -> "files" sections by default for the Composer type "mediawiki-extension" and activating them on a per-wiki basis could be a path to explore. This would mean some API/method instructing MediaWiki to activate a (sub-)autoloader.

What is the current intention with this task?

The use of mediawiki-vendor is optional for third-party site admins. Always has been. If they use tarballs, we transparently make it part of the tarball without requiring any awareness of where it came from. If they use Git, they can choose to use it or not; either way works.

The maintenance of mediawiki-vendor for those who review changes to MediaWiki core and WMF-deployed projects is not optional. This isn't particularly due to any limitations on Composer's side, but was a conscious decision for the purposes of security, availability, and the ability to have deterministic builds that can be reversed. Security here doesn't just mean having the same code each time. Even with a perfect lock file and a secure program to interpret and enforce it, that only helps to schedule or delay your compromise to align with updates to the lock file. Security here also means the ability to review the changes.

@bd808 has outlined this better than I could already, including with references to related tasks and RFCs. This system isn't going away any time soon I imagine. It isn't an oversight, it exists intentionally. We can certainly aim to streamline the work around this, and better document it, but without a clear problem description and resourcing that commits to doing something about it, I'm not sure what the next step is for this task.

Krinkle moved this task from Inbox to Watching on the TechCom board.
dbarratt renamed this task from MediaWiki-Vendor is an unnecessary unique practice that must be manually maintained to MediaWiki-Vendor creates a scenario in which incompatible versions of dependencies can be present.Apr 3 2019, 10:59 PM
dbarratt updated the task description.

@Krinkle I've updated the task description with what the core of the problem is. If it's not a problem, then feel free to ignore, but it does keep me up at night. :)

This just happened to us with some Wikibase extensions.
Apparently none of our CI actually does a composer install now, it seems? Otherwise this should have been caught?

For a while, the WikibaseQualityConstraints extension, as merged into the main branches, had a library compatibility constraint that would not work with other extensions.
Nothing in CI flagged this up, despite these extensions being tested together.
This was eventually caught by devs who happened to pull the latest version of all of the things and try a composer update, which failed.
The fix in the extension was https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseQualityConstraints/+/672990

Without having read all of the comments on this lengthy ticket it feels like it would make a lot more sense to have mediawiki-vendor maintained automatically as part of post merge jobs, or as part of the branch cuts?
CI could then just move back to doing composer install covering cases like this and T179663 and T113360 and generally leading to less developer confusion around how all of this is held together.

Without having read all of the comments on this lengthy ticket it feels like it would make a lot more sense to have mediawiki-vendor maintained automatically as part of post merge jobs, or as part of the branch cuts?

Probably not gonna happen. If we allow post merge jobs to randomly update/install stuff, it defeats the point of the auditability of having the mediawiki-vendor repo etc.

"oh, the bot installed/updated that after patch $x" is not acceptable.

We are not going to have automated processes pull untrusted code and commit it to Wikimedia repos without human oversight, but we could easily have CI test whether the version constraints in mediawiki/vendor are compatible with core and the various extensions. Basically, have a job check out mediawiki/core, mediawiki/vendor, and all Wikimedia extensions, create a composer.local.json that merges in vendor and the extensions, and run composer validate. It will be a little stricter than what's actually needed (there are plenty of Wikimedia extensions Wikibase Repo doesn't actually need to coexist with), but that's probably fine.
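
A rough sketch of the composer.local.json such a job could generate, using the composer-merge-plugin include mechanism that MediaWiki core already supports (the exact paths and globs are an assumption about how the job lays out its checkouts):

{
	"extra": {
		"merge-plugin": {
			"include": [
				"vendor/composer.json",
				"extensions/*/composer.json",
				"skins/*/composer.json"
			]
		}
	}
}

The job would then run composer validate (or a resolution step such as composer update --dry-run) from the core checkout and fail if the merged constraints cannot all be satisfied.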

Probably not gonna happen. If we allow post merge jobs to randomly update/install stuff, it defeats the point of the auditability of having the mediawiki-vendor repo etc.

The composer.lock file prevents this from happening:

Commit your composer.lock file to version control

Committing this file to VC is important because it will cause anyone who sets up the project to use the exact same versions of the dependencies that you are using. Your CI server, production machines, other developers in your team, everything and everyone runs on the same dependencies, which mitigates the potential for bugs affecting only some parts of the deployments. Even if you develop alone, in six months when reinstalling the project you can feel confident the dependencies installed are still working even if your dependencies released many new versions since then. (See note below about using the update command.)

https://getcomposer.org/doc/01-basic-usage.md#installing-with-composer-lock

Without having read all of the comments on this lengthy ticket it feels like it would make a lot more sense to have mediawiki-vendor maintained automatically as part of post merge jobs, or as part of the branch cuts?

Probably not gonna happen. If we allow post merge jobs to randomly update/install stuff, it defeats the point of the auditability of having the mediawiki-vendor repo etc.

"oh, the bot installed/updated that after patch $x" is not acceptable.

There can still be a review process, and IMO this should be part of the train / deployment process / branch cut process.
The current implementation of this "fix" couples the development of a whole bunch of things to various WMF-specific deployment needs.
I'm not saying they are not important requirements, but this could all be less coupled and IMO less painful and less prone to things going wrong inadvertently.

We are not going to have automated processes pull untrusted code and commit it to Wikimedia repos without human oversight, but we could easily have CI test whether the version constraints in mediawiki/vendor are compatible with core and the various extensions. Basically, have a job check out mediawiki/core, mediawiki/vendor, and all Wikimedia extensions, create a composer.local.json that merges in vendor and the extensions, and run composer validate. It will be a little stricter than what's actually needed (there are plenty of Wikimedia extensions Wikibase Repo doesn't actually need to coexist with), but that's probably fine.

I think such a job (or some other fix) is important, otherwise we will end up with something unexpectedly breaking eventually.
I didn't realize until today that we now have 0 jobs between development and production that actually run composer install?
The whole system is currently built around people, and making sure that people don't miss steps, in a very complex system etc.
IMO the management of all of this should be automated.

  • Dev 1 makes a patch in extension 1, which includes some version change...
  • Path 1: The change breaks something (we land in cross-repo dependency hell). Say what is broken, and what other things need fixing.
  • Path 2: The change doesn't break anything, and doesn't require any actually deployed library to change versions. Allow the patch to be merged, do nothing else.
  • Path 3: The change doesn't break anything, and DOES require a deployed library to change versions. Allow the patch to be merged, create the patch for mediawiki-vendor making the version change, and block cutting the branch on review of this. On merge, ping back in the gerrit commit saying that this also needs review?

Without having read all of the comments on this lengthy ticket it feels like it would make a lot more sense to have mediawiki-vendor maintained automatically as part of post merge jobs, or as part of the branch cuts?

Probably not gonna happen. If we allow post merge jobs to randomly update/install stuff, it defeats the point of the auditability of having the mediawiki-vendor repo etc.

"oh, the bot installed/updated that after patch $x" is not acceptable.

There can still be a review process, and IMO this should be part of the train / deployment process / branch cut process.

The automated branch cut process that RelEng keeps highlighting as a problem when people expect manual review because they don't have the knowledge of the codebase to make any judgements on?

I think adding quibble-composer-mysql-php72-noselenium-docker to extension-quibble would suffice to fix this task. If RelEng are OK with the added load, that sounds simple enough?

Probably not gonna happen. If we allow post merge jobs to randomly update/install stuff, it defeats the point of the auditability of having the mediawiki-vendor repo etc.

The composer.lock file prevents this from happening:

Commit your composer.lock file to version control

Committing this file to VC is important because it will cause anyone who sets up the project to use the exact same versions of the dependencies that you are using. Your CI server, production machines, other developers in your team, everything and everyone runs on the same dependencies, which mitigates the potential for bugs affecting only some parts of the deployments. Even if you develop alone, in six months when reinstalling the project you can feel confident the dependencies installed are still working even if your dependencies released many new versions since then. (See note below about using the update command.)

https://getcomposer.org/doc/01-basic-usage.md#installing-with-composer-lock

Sure. But if we/composer just follow what the composer.lock says, that doesn't get towards any sort of "maintained automatically", does it?

Sure. But if we/composer just follow what the composer.lock says, that doesn't get towards any sort of "maintained automatically", does it?

What do you mean? This gets maintained automatically if we were to use composer to load the extensions themselves. This is basically what the Drupal project does.

I think adding quibble-composer-mysql-php72-noselenium-docker to extension-quibble would suffice to fix this task. If RelEng are OK with the added load, that sounds simple enough?

I'm pretty sure this is the correct answer. For most deployed things, CI makes sure it works with mediawiki/vendor, which is what's used in production.

Without having read all of the comments on this lengthy ticket it feels like it would make a lot more sense to have mediawiki-vendor maintained automatically as part of post merge jobs, or as part of the branch cuts?

Probably not gonna happen. If we allow post merge jobs to randomly update/install stuff, it defeats the point of the auditability of having the mediawiki-vendor repo etc.

"oh, the bot installed/updated that after patch $x" is not acceptable.

Agreed. [off-topic] But it certainly would be nice to have automatic tooling that proposed the updates (probably pre-merge, not post-merge) and left it for humans to CR+2.

This just happened to us with some Wikibase extensions.
Apparently none of our CI actually does a composer install now, it seems? Otherwise this should have been caught?

Afaik all or most repos test both vendor and composer (and third-party repos hosted in Gerrit only test with composer). There is no intention not to support or continuously validate that a fresh install from composer, based purely on the local repo's composer.json, works. If an individual repository or job variant no longer runs this version of the job, that's mostly done by accident or as an optimisation and should be uncontroversial to reinstate.

At least for core and for most repos I spot-checked, we do still run at least one job that installs core + repo + dependencies and runs composer instead of vendor.

Sure. But if we/composer just follow what the composer.lock says, that doesn't get towards any sort of "maintained automatically", does it?

What do you mean? This gets maintained automatically if we were to use composer to load the extensions themselves. This is basically what the Drupal project does.

Take a look at:
https://www.drupal.org/docs/develop/using-composer/using-composer-to-install-drupal-and-manage-dependencies#s-create-a-project
and
https://www.drupal.org/docs/develop/using-composer/using-composer-to-install-drupal-and-manage-dependencies#adding-modules

I imagine MediaWiki could take an approach like this and Wikimedia would be implemented like any other site. A system like this would allow the dependencies to be completely managed by Composer.

Agreed. [off-topic] But it certainly would be nice to have automatic tooling that proposed the updates (probably pre-merge, not post-merge) and left it for humans to CR+2.

+1, pre-merge just sounds slightly harder to write some code for than post-merge.
Pre-merge you need to account for changes to the patchset, etc.

At least for core and for most repos I spot-checked, we do still run at least one job that installs core + repo + dependencies and runs composer instead of vendor.

If this was the case (in terms of dependencies for Wikimedia production), then those jobs should have been broken for ~30 minutes today, between https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/672439 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseQualityConstraints/+/672990.
It could be that none of these particular jobs actually got run in that window?

I think adding quibble-composer-mysql-php72-noselenium-docker to extension-quibble would suffice to fix this task. If RelEng are OK with the added load, that sounds simple enough?

Sounds like a solution, though we could probably also create a lighter-weight job that literally just does a composer install with all WMF-deployed things and then stops there?
I had a quick look around the composer docs to see if there is a way to do a dry-run install, but I didn't spot anything.
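
(For what it's worth, Composer does document a --dry-run flag on install and update that reports the operations without executing them; whether it covers this use case is untested here.)

composer update --dry-run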

For https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/672439 , it bumps data-values/number to 0.11.0

The CI jobs use mediawiki/vendor.git to fulfill the dependencies. Looking at the composer.vendor.json.txt file attached to the build (which is the composer.json from mediawiki/vendor.git) it has: "data-values/number": "0.10.1"

So everything ran with 0.10.1 and the change got merged.

From then on, any jobs using composer to ship the dependency would end up being broken because WikibaseQualityConstraints was still using 0.10.1 whereas Wikibase now requests 0.11.0.

I believe the jobs using vendor.git were unaffected.

When vendor.git is used, we might want to add a step to validate that the dependencies are correct? That would surely surface some nice incompatibilities, which would be addressed by bumping the dependency in vendor and then bumping it in all affected extensions/skins. I can imagine a dependency-checker job that would clone all the deployed repositories and vendor, then run composer / the composer merge plugin to ensure we stay compatible.

An alternative is to overhaul the whole system and use a common dependency set for production. Maven has the concept of a Bill of Materials ( https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html ) which centrally manages the dependencies, and then the various plugins/extensions/skins refer to it. That is not an easy task though, and I don't really see how we can envision such a system in composer :\

I think adding quibble-composer-mysql-php72-noselenium-docker to extension-quibble would suffice to fix this task. If RelEng are OK with the added load, that sounds simple enough?

I am not sure what that will solve? Is the intent to run composer + the composer merge plugin to validate the whole dependency tree (including vendor.git)? I guess we can come up with a dedicated job that just runs that composer install step:

quibble --packages-source vendor -c "composer install"

And maybe solely trigger it when composer.json is touched?

I think adding quibble-composer-mysql-php72-noselenium-docker to extension-quibble would suffice to fix this task. If RelEng are OK with the added load, that sounds simple enough?

I am not sure what that will solve? Is the intent to run composer + the composer merge plugin to validate the whole dependency tree (including vendor.git)?

No, it'd ensure that running composer + the composer-merge-plugin for the whole dependency tree *except* vendor won't break the world. The existing vendor jobs should ensure that the vendor repo is sufficient. The intersection of these two states is what we want – passes on vendor, and passes with fresh composer install without conflicts.

I guess we can come with a dedicated job that just run that composer install step:

quibble --packages-source vendor -c "composer install"

And maybe solely trigger it when composer.json is touched?

Could do this instead, as an optimisation, sure.

Tagging releng team per my comment in another ticket

I'm not sure if this should be on the external column of the Release-Engineering-Team board.
This is an issue with how mediawiki-vendor and development are tied together, and it is not owned by us (WMDE / Wikibase) at all.
Our life would be easier without this repository entirely.
The direct result of not tackling this issue is that production etc. will continue to break in unexpected ways due to differences between mediawiki-vendor and the dependencies defined in extensions (not just Wikibase).

thcipriani subscribed.

Tagging releng team per my comment in another ticket

I'm not sure if this should be on the external column of the Release-Engineering-Team board.
This is an issue with how mediawiki-vendor and development are tied together, and it is not owned by us (WMDE / Wikibase) at all.
Our life would be easier without this repository entirely.
The direct result of not tackling this issue is that production etc. will continue to break in unexpected ways due to differences between mediawiki-vendor and the dependencies defined in extensions (not just Wikibase).

Will update on other ticket, sorry for delay: I have thoughts, they are long and complicated