
automatically build and commit mediawiki/vendor (composer)
Closed, Declined · Public

Description

Automatically build and commit mediawiki/vendor (the output of Composer) when running CI for the master branch of mediawiki/core. (This will later need to be extended to all extensions deployed on beta that add dependencies via the composer merge plugin.)

The reason this is needed is that we need a way to deploy to beta once we use Composer during CI (T90303) instead of the vendor repository. Automating this also minimizes the window during which core is broken after a change to vendor or its dependencies (T88211). In a world where we intend to actively split common functionality out into components, making the use and update of those components less painful is a good idea. This task would make T74952 unnecessary.
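
To make the proposal concrete, here is a minimal sketch of such a CI job, assuming a build host with git and Composer on PATH. The repository URLs, branch names, committer identity, and push rights are placeholders for illustration, not an existing Wikimedia CI configuration.

```
#!/usr/bin/env python3
"""Hypothetical CI job: rebuild mediawiki/vendor from core's composer.json."""
import os
import shutil
import subprocess

CORE = "https://gerrit.wikimedia.org/r/mediawiki/core"
# Assumed push-capable remote for the vendor repository.
VENDOR = "ssh://ci-bot@gerrit.wikimedia.org:29418/mediawiki/vendor"

def run(cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

def build_and_commit(workdir="core-build", vendor_repo="vendor-repo"):
    run(["git", "clone", "--branch", "master", CORE, workdir])
    # --no-dev keeps test-only dependencies out of the deployed tree.
    run(["composer", "install", "--no-dev", "--no-interaction"], cwd=workdir)

    run(["git", "clone", VENDOR, vendor_repo])
    # Replace the old tree with the freshly composed one, keeping .git.
    for entry in os.listdir(vendor_repo):
        if entry == ".git":
            continue
        path = os.path.join(vendor_repo, entry)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    shutil.copytree(os.path.join(workdir, "vendor"), vendor_repo,
                    dirs_exist_ok=True)

    run(["git", "add", "-A"], cwd=vendor_repo)
    # A real job would skip the commit/push when nothing changed.
    run(["git", "commit", "-m", "Update vendor from mediawiki/core master"],
        cwd=vendor_repo)
    run(["git", "push", "origin", "HEAD:refs/heads/master"], cwd=vendor_repo)

if __name__ == "__main__":
    build_and_commit()
```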

Event Timeline

JanZerebecki raised the priority of this task from to Medium.
JanZerebecki updated the task description.
JanZerebecki added subscribers: bd808, DanielFriesen, Krinkle and 4 others.

I don't think we should do this.

Automating this defeats the purpose of mediawiki/vendor and compromises our production integrity. There won't be any moment during which mediawiki/core is broken; that is only the case right now because we use mediawiki/vendor on master instead of Composer. We must use vendor when running tests for vendor itself (which runs core tests) and for wmf branches of MediaWiki core and extensions, but nowhere else. I've outlined a working model for this in the past on the tasks you mentioned.

We could automatically draft commits for vendor, but the code must be manually reviewed and approved by someone we trust. Last I checked, that is non-negotiable. Whatever gain it might bring to our workflow is irrelevant.

There are also careful restrictions regarding conflicts with the versions and modules (or their dependencies) used in operations/mediawiki-config, which must be compatible with MediaWiki since they share the same runtime environment.

I created an RFC (T105638) for which this task might be one of the solutions. I'm open to other solutions to the problems outlined in the RFC; maybe most of the following should be discussed further there.

> Automating this defeats the purpose of mediawiki/vendor and compromises our production integrity.

I think it is better to move the review earlier in the chain, so that it is possible to focus on what matters and automate the other parts.

> There won't be any moment during which mediawiki/core is broken; that is only the case right now because we use mediawiki/vendor on master instead of Composer. We must use vendor when running tests for vendor itself (which runs core tests) and for wmf branches of MediaWiki core and extensions, but nowhere else. I've outlined a working model for this in the past on the tasks you mentioned.

Submodule updates on wmf branches are automatic, so there is a moment where core is broken every time something in vendor needs to change; the update.php version check even makes this breakage explicit.
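
For reference, MediaWiki ships a maintenance script, checkComposerLockUpToDate.php, that compares composer.lock against what is actually installed in vendor, and update.php performs this kind of check before running. A rough Python sketch of the comparison involved, with paths assumed relative to a core checkout (the installed.json layout differs between Composer versions):

```
import json

def installed_packages(path="vendor/composer/installed.json"):
    # Newer Composer wraps the package list in {"packages": [...]};
    # older versions write a bare list.
    with open(path) as f:
        data = json.load(f)
    pkgs = data["packages"] if isinstance(data, dict) else data
    return {p["name"]: p["version"] for p in pkgs}

def locked_packages(path="composer.lock"):
    with open(path) as f:
        lock = json.load(f)
    return {p["name"]: p["version"] for p in lock.get("packages", [])}

def check():
    installed = installed_packages()
    stale = False
    for name, version in locked_packages().items():
        have = installed.get(name)
        if have != version:
            print(f"{name}: composer.lock wants {version}, vendor has {have}")
            stale = True
    return not stale

if __name__ == "__main__":
    raise SystemExit(0 if check() else 1)
```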

> There are also careful restrictions regarding conflicts with the versions and modules (or their dependencies) used in operations/mediawiki-config, which must be compatible with MediaWiki since they share the same runtime environment.

Are these restrictions documented somewhere? How are they tested?

>> Automating this defeats the purpose of mediawiki/vendor and compromises our production integrity.

> I think it is better to move the review earlier in the chain, so that it is possible to focus on what matters and automate the other parts.

Jan and I talked about this for a while today. My stance is that there needs to be reasonable certainty that the code that was reviewed is the code that is deployed. In the current mediawiki/vendor repo we get this by deploying exactly what has been reviewed and +2'ed in Gerrit. When Composer is used to pull in a library whose updates have already been reviewed, Composer cannot reasonably be relied on to ensure that the resulting codebase is the same as the one that was reviewed (see https://github.com/composer/composer/issues/1074, https://github.com/Stype/packagistproxy). So the code review needs to come later, not earlier.
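
As the linked issues discuss, Composer by default does not cryptographically verify what it downloads; composer.lock does, however, pin each package to a source commit, which gives an out-of-band check something to anchor on. A minimal sketch of extracting those pins (packages resolved without VCS metadata may lack a source entry):

```
import json

def pinned_sources(lock_path="composer.lock"):
    """Yield (name, version, git URL, commit) for each locked package."""
    with open(lock_path) as f:
        lock = json.load(f)
    for pkg in lock.get("packages", []):
        src = pkg.get("source") or {}
        yield pkg["name"], pkg["version"], src.get("url"), src.get("reference")

if __name__ == "__main__":
    for name, version, url, ref in pinned_sources():
        print(f"{name} {version} {url} {ref}")
```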

So in the case where a library is included in vendor and the library is maintained by MediaWiki developers, the library owner should code review patches to the library for security, performance, architecture, etc., AND the update of that library in mediawiki/vendor.git needs to be reviewed to ensure that it is the same set of changes that were reviewed by the library maintainer. (Even better if the person +2ing the patchset in vendor also reviews for security and performance, just as if they were patching a library not maintained by MediaWiki developers, but it's understandable that might not always be practical.)

So let's make sure the process and review checkpoints are correct, and then we can figure out how to automate the process. In the specific case that Jan brought to me, I'd be happy to see (a verification sketch follows the list):

  1. Library maintainer code reviews patches
  2. Deployment repo is composed with composer on a secure build system
  3. For each library pulled in with composer, the directory is recursively diffed against a known-good version of the repository (pulled down with git, or some other utility that actually has integrity checking).
  4. Composed deployment branch is committed and automatically merged for deployment
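
A minimal sketch of step 3, assuming each package is installed at vendor/<name> and that the installed files mirror the repository layout. Real packages can legitimately differ (e.g. files excluded via .gitattributes export-ignore), so treat this as illustrative rather than production-ready.

```
import json
import os
import subprocess
import tempfile

def verify_package(name, url, reference, vendor_dir="vendor"):
    """Diff vendor/<name> against a fresh checkout of the pinned commit."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(["git", "clone", url, tmp], check=True)
        subprocess.run(["git", "-C", tmp, "checkout", reference], check=True)
        result = subprocess.run(
            ["diff", "-r", "--exclude=.git", tmp,
             os.path.join(vendor_dir, name)])
        return result.returncode == 0

def verify_all(lock_path="composer.lock"):
    with open(lock_path) as f:
        lock = json.load(f)
    for pkg in lock.get("packages", []):
        src = pkg.get("source") or {}
        if src.get("type") != "git":
            print(f"SKIP {pkg['name']} (no git source to verify against)")
            continue
        same = verify_package(pkg["name"], src["url"], src["reference"])
        print(("OK   " if same else "DIFF ") + pkg["name"])

if __name__ == "__main__":
    verify_all()
```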

> (pulled down with git, or some other utility that actually has integrity checking)

Transport integrity, as in HTTPS to GitHub (without key/CA pinning)?

>> (pulled down with git, or some other utility that actually has integrity checking)
>
> Transport integrity, as in HTTPS to GitHub (without key/CA pinning)?

Better would be an SSH connection to Gerrit where the host key fingerprint has been verified. But yes, an HTTPS connection presenting a certificate signed by a valid CA would make MITM-ing this sufficiently expensive that I would be OK with it for the near future.
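
A sketch of what fingerprint pinning could look like for the git transport, assuming the pinned fingerprint has been obtained and verified out of band; the host, port, and fingerprint value below are placeholders:

```
import os
import subprocess
import tempfile

HOST = "gerrit.wikimedia.org"
PORT = "29418"
# Placeholder: the real fingerprint must be verified out of band.
PINNED_FINGERPRINT = "SHA256:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

def clone_with_pinned_hostkey(repo, dest):
    # Fetch the host keys, then refuse to proceed unless one of them
    # matches the fingerprint pinned in advance.
    keys = subprocess.run(["ssh-keyscan", "-p", PORT, HOST],
                          capture_output=True, text=True, check=True).stdout
    with tempfile.NamedTemporaryFile("w", suffix="_known_hosts",
                                     delete=False) as kh:
        kh.write(keys)
        known_hosts = kh.name
    fingerprints = subprocess.run(["ssh-keygen", "-lf", known_hosts],
                                  capture_output=True, text=True,
                                  check=True).stdout
    if PINNED_FINGERPRINT not in fingerprints:
        raise RuntimeError("host key mismatch:\n" + fingerprints)
    env = dict(os.environ, GIT_SSH_COMMAND=(
        f"ssh -o UserKnownHostsFile={known_hosts} "
        "-o StrictHostKeyChecking=yes"))
    subprocess.run(["git", "clone", repo, dest], env=env, check=True)
```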

> So let's make sure the process and review checkpoints are correct, and then we can figure out how to automate the process. In the specific case that Jan brought to me, I'd be happy to see:
>
>   1. Library maintainer code reviews patches
>   2. Deployment repo is composed with composer on a secure build system
>   3. For each library pulled in with composer, the directory is recursively diffed against a known-good version of the repository (pulled down with git, or some other utility that actually has integrity checking).
>   4. Composed deployment branch is committed and automatically merged for deployment

Does step 1 in this process rule out third-party libraries? Today I see nine third-party libraries in the composer.lock of mediawiki/vendor.git.

> Does step 1 in this process rule out third-party libraries? Today I see nine third-party libraries in the composer.lock of mediawiki/vendor.git.

This was specifically for "the case where a library is included in vendor, and the library is maintained by MediaWiki developers."

For third-party libraries, we code review the initial commit to vendor and each subsequent update to the library, so all of the code in vendor has been +2'ed by a MediaWiki developer.

>> Does step 1 in this process rule out third-party libraries? Today I see nine third-party libraries in the composer.lock of mediawiki/vendor.git.
>
> This was specifically for "the case where a library is included in vendor, and the library is maintained by MediaWiki developers."
>
> For third-party libraries, we code review the initial commit to vendor and each subsequent update to the library, so all of the code in vendor has been +2'ed by a MediaWiki developer.

Technically, I don't see how we could treat the two origins as separate concerns programmatically.

I honestly don't see a good way to do it programmatically without hacking up composer, but maybe we can find/build a solution?

Let me go back and reiterate: my main concern is that every commit in core, WMF-deployed extensions, skins, and vendor has been code reviewed by a MediaWiki developer. If we're adding code that had previously been reviewed by a MediaWiki developer, and we're really sure it is the same code that was reviewed, I'm fine with special-casing that and automating some of the busywork of getting it deployed. But it's the exception, not the rule.

So if we can find a way to whitelist some repos in vendor and automatically update them when we deploy, I'm not going to block it on security grounds. But no, I don't see a way to do that using stock Composer, and we shouldn't do it for libraries that we don't maintain.
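
One simple way to express such a whitelist, using vendor namespaces as a stand-in for a real maintained-by-us list; the prefixes below are assumptions for illustration:

```
import json

# Hypothetical whitelist: vendor namespaces whose packages are maintained
# by MediaWiki developers and could be eligible for automated updates.
AUTO_UPDATE_PREFIXES = ("wikimedia/", "mediawiki/")

def partition(lock_path="composer.lock"):
    """Split locked packages into auto-update candidates and the rest."""
    with open(lock_path) as f:
        lock = json.load(f)
    auto, manual = [], []
    for pkg in lock.get("packages", []):
        target = auto if pkg["name"].startswith(AUTO_UPDATE_PREFIXES) else manual
        target.append(pkg["name"])
    return auto, manual

if __name__ == "__main__":
    auto, manual = partition()
    print("automated update candidates:", auto)
    print("still need manual +2 in vendor:", manual)
```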