
Deployment infrastructure for PHP microservices
Closed, ResolvedPublic

Description

There is no service that uses the deployment pipeline and is php-based. PHP support will need to be added for T260330.

This should be pretty straightforward: SRE will provide a base PHP image, and the only thing that needs to be done in addition is to copy the source files into a specific directory in the container and (I guess?) run composer there.

Specifically, this project has an additional complication:

we will need to publish multiple images for each tagged version of the software, each using the same base layer but installing some additional software on top.

For instance we might have:

  • one variant with the php service + imagemagick
  • one variant with the php service + pygmentize
  • one variant with the php service + ploticus

and so on. And we'd need to publish all of them. Can the pipeline be adapted to do that?
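
A minimal sketch of what such a blubberfile might look like, assuming a hypothetical base image name and blubber's v4 variant syntax (package names are the Debian ones; the exact layout is illustrative, not a confirmed design):

version: v4
base: docker-registry.wikimedia.org/php7.2-fpm  # hypothetical SRE-provided base
variants:
  common:
    copies: [local]                  # copy the service source into the image
  imagemagick:
    includes: [common]
    apt: { packages: [imagemagick] }
  pygmentize:
    includes: [common]
    apt: { packages: [python3-pygments] }
  ploticus:
    includes: [common]
    apt: { packages: [ploticus] }

Each leaf variant would then be built and published as its own image, all sharing the common layers.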

Event Timeline

thcipriani added subscribers: mmodell, thcipriani.

Scap3 was built as a general deployment tool, and should be able to deploy PHP code (I believe phabricator is deployed via scap -- although @mmodell might correct me there :))

Docs for scap3 (in varying levels of detail) are on Wikitech and doc.wikimedia.org.

Is the plan to deploy this as a standalone service? Or will this be similar to how php parsoid is currently deployed (cf: T240055)?

@thcipriani this is a service that will have to be deployed to kubernetes. So I think the actual problem is that the deployment pipeline doesn't currently support php.

Joe updated the task description. (Show Details)

Change 623658 had a related patch set uploaded (by Jeena Huneidi; owner: Jeena Huneidi):
[blubber@master] Support PHP microservices

https://gerrit.wikimedia.org/r/623658

daniel removed a project: Patch-For-Review.

I added PHP support to blubber. An example of what would be added to a variant in your blubberfile for PHP services:

php:
  requirements: [composer.json]
  production: true # optional; adds the --no-dev flag to composer install

This will add the composer install RUN command to the Dockerfile.
If there is a need to run composer install in a directory other than the workdir, we'll need to make some more changes.
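
For illustration, the resulting Dockerfile would then contain something roughly like the following (a sketch only; the exact instructions and flags blubber emits may differ):

# Copy the declared requirements, then install production dependencies
COPY composer.json ./
RUN composer install --no-dev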

Naike moved this task from Backlog to In Progress on the MW-on-K8s board.
Naike moved this task from In Progress to In Review on the MW-on-K8s board.

Have we ever run composer in production or as part of an automated build process like this? For MediaWiki we commit all the dependencies to mediawiki/vendor, which allows for human review/auditing of dependencies, makes it trivial to patch dependencies without waiting for upstream, and removes any hard dependencies on Packagist and GitHub.

My first inclination would be to do something similar here, maybe even piggybacking off of mediawiki/vendor given that the dependencies will be a subset. What do other npm services do?

npm services build the whole of their (non-dev) npm dependency graph into the image and use that, yeah; we could do the same for composer without major concern, I think.

Some kind of /deploy repo seems needed, I think, as otherwise we would be deploying unaudited code never seen by a trusted pair of eyes. There'd be no diff to review during production dependency updates or image rebuilds.

As I understand it, there's a halt on that npm approach; one or two services do indeed seem to have slipped away from the deploy-repo approach in the past year without Security realizing it. This is unfortunate, but it also makes it a bad example to follow.

A simpler approach than a deploy repo would be to commit the production dependencies to the repo itself. Given the generally minimal and responsible use of dependencies in the PHP ecosystem, that's probably quite doable as well. The only issue is that Composer makes a mess of that during local development, which one then has to carefully undo or step around when drafting other commits. One might be able to make that simpler with a few gitignore rules, or by having two composer.json files: one for development, using the standard vendor directory that we don't check in, and another for production, with vendor-dir set to lib/ or some such.
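
As a sketch of that two-file option (the dependency shown is a placeholder; vendor-dir is a standard Composer configuration key), the production composer.json might look like:

{
    "require": {
        "monolog/monolog": "^2.0"
    },
    "config": {
        "vendor-dir": "lib"
    }
}

With that in place, composer install writes into lib/, which gets checked in, while development setups keep using an ignored vendor/.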

Some kind of /deploy repo seems needed, I think, as otherwise we would be deploying unaudited code never seen by a trusted pair of eyes. There'd be no diff to review during production dependency updates or image rebuilds.

How feasible is it in this case, though, to audit the code via a trusted pair of eyes? In the nodejs case, probably the pathological one, it's borderline impossible. The dependency tree, even flattened, is usually huge and brings in probably tens (if not hundreds) of thousands of lines of code per project. E.g., off the top of my head, citoid's old deploy repo[1] clocked in at 369055 LoC for javascript files alone (including blank lines and comments, but the size would be staggering even counting those out) for dependent node modules. I expect this to have increased since then, and other projects exhibit similar numbers. This is a known issue, and the npm ecosystem has introduced the npm audit command, which makes this somewhat better by at least reporting known vulnerabilities. But auditing for unknown vulnerabilities is still a herculean task.

I guess if the number of dependencies is small enough (which I would expect for the service in question), it remains doable, but it might not be desirable in the future if those increase substantially. In any case, to increase reproducibility and auditability, dependencies should be version pinned.
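
To illustrate pinning (a sketch; committing composer.lock records the exact resolved versions of the whole dependency tree):

# Resolve dependencies and write exact versions to composer.lock
composer update --no-dev
# Later builds reproduce exactly the versions recorded in the lock file
composer install --no-dev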

As I understand it, there's a halt on that npm approach; one or two services do indeed seem to have slipped away from the deploy-repo approach in the past year without Security realizing it. This is unfortunate, but it also makes it a bad example to follow.

Wait, what? This is the first time I'm hearing of this. When did that halt happen? Has it been communicated? All of the nodejs services on kubernetes have followed the npm install approach for a long time now; what does that mean for them?

A simpler approach than a deploy repo would be to commit the production dependencies to the repo itself. Given the generally minimal and responsible use of dependencies in the PHP ecosystem, that's probably quite doable as well. The only issue is that Composer makes a mess of that during local development, which one then has to carefully undo or step around when drafting other commits. One might be able to make that simpler with a few gitignore rules, or by having two composer.json files: one for development, using the standard vendor directory that we don't check in, and another for production, with vendor-dir set to lib/ or some such.

The vendoring approach described above is also used in the pipeline, e.g. see blubber's repo[2]. And in the larger ecosystem, kubernetes uses the vendoring approach very heavily. It comes, of course, with its own gotchas, as dependencies all have to be tracked and updated (semi-?)manually. From my own experience with deploy repos in WMF, I prefer that approach to deploy repos, as the latter can have reproducibility problems and a rather poor UX (clone the normal repo, run the commands to create the deploy repo, push a large commit that's difficult to audit, merge, then use that repo).
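
For illustration, the vendoring workflow amounts to roughly the following (assuming vendor/ is otherwise gitignored, hence the -f):

# Install only production dependencies into vendor/
composer install --no-dev
# Force-add the vendored tree so it can be reviewed and committed
git add -f vendor/
git commit -m "Update vendored dependencies"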

[1] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/citoid/deploy/+/refs/heads/master/node_modules/
[2] https://gerrit.wikimedia.org/r/plugins/gitiles/blubber/+/refs/heads/master/vendor/

Some kind of /deploy repo seems needed, I think, as otherwise we would be deploying unaudited code never seen by a trusted pair of eyes. There'd be no diff to review during production dependency updates or image rebuilds.

How feasible is it in this case, though, to audit the code via a trusted pair of eyes? In the nodejs case, probably the pathological one, it's borderline impossible. The dependency tree, even flattened, is usually huge and brings in probably tens (if not hundreds) of thousands of lines of code per project. E.g., off the top of my head, citoid's old deploy repo[1] clocked in at 369055 LoC for javascript files alone (including blank lines and comments, but the size would be staggering even counting those out) for dependent node modules. I expect this to have increased since then, and other projects exhibit similar numbers. This is a known issue, and the npm ecosystem has introduced the npm audit command, which makes this somewhat better by at least reporting known vulnerabilities. But auditing for unknown vulnerabilities is still a herculean task.

As I alluded to earlier, all of the Shellbox dependencies have already been audited via their inclusion in mediawiki/vendor. I don't know if the Shellbox service dependencies will forever be a subset of MediaWiki's, but at least for now I think there's no extra work being added if we want to audit all the PHP dependencies. It might just be me, but I find PHP code easier to audit compared to nodejs's explosion of libraries that might've been transpiled or minified or whatever. And PHP doesn't attempt to ship/compile native code into vendor/!

I guess if the number of dependencies is small enough (which I would expect for the service in question), it remains doable, but it might not be desirable in the future if those increase substantially. In any case, to increase reproducibility and auditability, dependencies should be version pinned.

Shellbox's vendor/ has a little under 16K LoC. I can see us adding a few more libraries like something for metrics, but nothing like the 369K you mentioned for citoid.

A simpler approach than a deploy repo would be to commit the production dependencies to the repo itself. Given the generally minimal and responsible use of dependencies in the PHP ecosystem, that's probably quite doable as well. The only issue is that Composer makes a mess of that during local development, which one then has to carefully undo or step around when drafting other commits. One might be able to make that simpler with a few gitignore rules, or by having two composer.json files: one for development, using the standard vendor directory that we don't check in, and another for production, with vendor-dir set to lib/ or some such.

The vendoring approach described above is also used in the pipeline, e.g. see blubber's repo[2]. And in the larger ecosystem, kubernetes uses the vendoring approach very heavily. It comes, of course, with its own gotchas, as dependencies all have to be tracked and updated (semi-?)manually. From my own experience with deploy repos in WMF, I prefer that approach to deploy repos, as the latter can have reproducibility problems and a rather poor UX (clone the normal repo, run the commands to create the deploy repo, push a large commit that's difficult to audit, merge, then use that repo).

This probably works in the short-term but I think it would make any non-Wikimedia deployment of shellbox rather difficult if your PHP version/environment doesn't match exactly what we use.

Specifically regarding pygmentize, we should be using the version that's currently bundled in the mediawiki/extensions/SyntaxHighlight_GeSHi repository, as the lexer list and generated CSS need to stay in sync. I'm not really sure how we'd go about that.

Some kind of /deploy repo seems needed, I think, as otherwise we would be deploying unaudited code never seen by a trusted pair of eyes. There'd be no diff to review during production dependency updates or image rebuilds.

How feasible is it in this case, though, to audit the code via a trusted pair of eyes? In the nodejs case, probably the pathological one, it's borderline impossible. The dependency tree, even flattened, is usually huge and brings in probably tens (if not hundreds) of thousands of lines of code per project. E.g., off the top of my head, citoid's old deploy repo[1] clocked in at 369055 LoC for javascript files alone (including blank lines and comments, but the size would be staggering even counting those out) for dependent node modules. I expect this to have increased since then, and other projects exhibit similar numbers. This is a known issue, and the npm ecosystem has introduced the npm audit command, which makes this somewhat better by at least reporting known vulnerabilities. But auditing for unknown vulnerabilities is still a herculean task.

As I alluded to earlier, all of the Shellbox dependencies have already been audited via their inclusion in mediawiki/vendor. I don't know if the Shellbox service dependencies will forever be a subset of MediaWiki's, but at least for now I think there's no extra work being added if we want to audit all the PHP dependencies. It might just be me, but I find PHP code easier to audit compared to nodejs's explosion of libraries that might've been transpiled or minified or whatever. And PHP doesn't attempt to ship/compile native code into vendor/!

I guess if the number of dependencies is small enough (which I would expect for the service in question), it remains doable, but it might not be desirable in the future if those increase substantially. In any case, to increase reproducibility and auditability, dependencies should be version pinned.

Shellbox's vendor/ has a little under 16K LoC. I can see us adding a few more libraries like something for metrics, but nothing like the 369K you mentioned for citoid.

OK, so still on the order of feasible, at least for now and the foreseeable future. Thanks for sharing that.

A simpler approach than a deploy repo would be to commit the production dependencies to the repo itself. Given the generally minimal and responsible use of dependencies in the PHP ecosystem, that's probably quite doable as well. The only issue is that Composer makes a mess of that during local development, which one then has to carefully undo or step around when drafting other commits. One might be able to make that simpler with a few gitignore rules, or by having two composer.json files: one for development, using the standard vendor directory that we don't check in, and another for production, with vendor-dir set to lib/ or some such.

The vendoring approach described above is also used in the pipeline, e.g. see blubber's repo[2]. And in the larger ecosystem, kubernetes uses the vendoring approach very heavily. It comes, of course, with its own gotchas, as dependencies all have to be tracked and updated (semi-?)manually. From my own experience with deploy repos in WMF, I prefer that approach to deploy repos, as the latter can have reproducibility problems and a rather poor UX (clone the normal repo, run the commands to create the deploy repo, push a large commit that's difficult to audit, merge, then use that repo).

This probably works in the short-term but I think it would make any non-Wikimedia deployment of shellbox rather difficult if your PHP version/environment doesn't match exactly what we use.

Yes, vendoring is not without gotchas. As far as PHP versions go, we can probably solve most of the easy issues in CI by testing across the PHP version matrix we aim to support. Distribution/OS differences aren't going to be easy to catch, though, unless we add tests for those too (and it's probably not worth it until we have to cross that bridge). I don't think other approaches (e.g. a deploy/ repo, or running composer in the pipeline) would catch those either (I might be wrong). Environment diffs are a necessary evil, as deployments will differ per administrative domain (who actually runs the thing, and how). Reproducible environments via containers are the only feasible way to fight that in our case, by releasing the docker containers for others to use, if they wish.
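
One way to sketch such a matrix, assuming hypothetical base image names, that blubber variants may override the base image, and that a test script is defined in composer.json, is a test variant per supported PHP version:

# Hypothetical test variants, one per supported PHP version
variants:
  test-php72:
    base: docker-registry.wikimedia.org/php7.2-cli
    includes: [common]
    entrypoint: [composer, test]
  test-php73:
    base: docker-registry.wikimedia.org/php7.3-cli
    includes: [common]
    entrypoint: [composer, test]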

As I understand it, there's a halt on that npm approach; one or two services do indeed seem to have slipped away from the deploy-repo approach in the past year without Security realizing it. This is unfortunate, but it also makes it a bad example to follow.

Wait, what? This is the first time I'm hearing of this. When did that halt happen? Has it been communicated? All of the nodejs services on kubernetes have followed the npm install approach for a long time now; what does that mean for them?

I'm not certain there's ever been a reasonable, organizationally-accepted policy or set of guidelines around using various npm commands (especially install) at any point along a given extension/app/service's production deployment path. That being said, npm install shouldn't be run on any production hardware as part of a build/deployment step, if that's currently happening. Regarding npm dependency security review: that presents a far greater challenge than it does within php, python, etc. The Security-Team is absolutely not staffed or equipped to manually review hundreds of thousands of lines of often volatile npm dependency code. We currently make a best effort with automated SCA tooling and ad-hoc manual reviews of specific production dependencies. Though this is all outside the scope of this task.

That being said, npm install shouldn't be run on any production hardware as part of a build/deployment step, if that's currently happening.

The ambiguous word "production" here is not incredibly helpful. I assume you mean "within the trusted network with direct access to wiki databases and their associated PII", but that assumption could be incorrect. Projects using PipelineLib can and do execute npm install or npm ci on hardware that is not located in the aforementioned trusted network segment, but to call our CI/CD systems "not production" is perpetuating a fallacy that business could continue as normal without CI/CD systems.

As I understand it, there's a halt on that npm approach; one or two services do indeed seem to have slipped away from the deploy-repo approach in the past year without Security realizing it. This is unfortunate, but it also makes it a bad example to follow.

Wait, what? This is the first time I'm hearing of this. When did that halt happen? Has it been communicated? All of the nodejs services on kubernetes have followed the npm install approach for a long time now; what does that mean for them?

I'm not certain there's ever been a reasonable, organizationally-accepted policy or set of guidelines around using various npm commands (especially install) at any point along a given extension/app/service's production deployment path.

If there has ever been a policy, it has not been communicated, never mind being organizationally accepted (and/or reasonable). That being said, SRE has up to now successfully resisted having npm installed on servers/containers that are responding to requests (end-user or otherwise). However, that approach clearly made zero sense in CI/CD systems; people were circumventing it anyway by running npm install locally on their laptops and uploading the artifacts to gerrit (in /deploy repos). With the advent of the Deployment Pipeline we've simply accepted that, and we now run npm install and npm test in CI/CD systems.

That being said, npm install shouldn't be run on any production hardware as part of a build/deployment step, if that's currently happening.

That's what's happening, and it has been for years now (since 2017). Not as part of a deployment step (SRE would never agree to that), but as part of the build step of a container, with npm test likewise run as part of the test step of a container.

We can go into semantics about what "production hardware" means in your sentence above, as @bd808 points out, but running npm install and npm test are crucial parts of the container building pipeline right now and have been for years. They need to be run in order to have nodejs services in the first place. For what it's worth, pip install is also run for python applications to create the container. As for other languages, golang projects are for now accommodated using Debian-packaged dependencies, or the vendoring approach when that is not feasible. Similarly, I'd expect to accommodate other languages (if/when needed) by utilizing their own package managers.

Regarding npm dependency security review: that presents a far greater challenge than it does within php, python, etc. The Security-Team is absolutely not staffed or equipped to manually review hundreds of thousands of lines of often volatile npm dependency code. We currently make a best effort with automated SCA tooling and ad-hoc manual reviews of specific production dependencies.

That's reasonable.

Though this is all outside the scope of this task.

Agreed. If we need to discuss the points above further, I'd suggest we split them off into another task.

As I understand it, there's a halt on that npm approach; one or two services do indeed seem to have slipped away from the deploy-repo approach in the past year without Security realizing it. This is unfortunate, but it also makes it a bad example to follow.

Wait, what? This is the first time I'm hearing of this. When did that halt happen? Has it been communicated? All of the nodejs services on kubernetes have followed the npm install approach for a long time now; what does that mean for them?

I'm not certain there's ever been a reasonable, organizationally-accepted policy or set of guidelines around using various npm commands (especially install) at any point along a given extension/app/service's production deployment path. That being said, npm install shouldn't be run on any production hardware as part of a build/deployment step, if that's currently happening. Regarding npm dependency security review: that presents a far greater challenge than it does within php, python, etc. The Security-Team is absolutely not staffed or equipped to manually review hundreds of thousands of lines of often volatile npm dependency code. We currently make a best effort with automated SCA tooling and ad-hoc manual reviews of specific production dependencies. Though this is all outside the scope of this task.

Alex and Bryan already clarified some points, but I want to lay the argument out pretty clearly:

  • I trust the security and opsec of our CI more than I trust the opsec of the individual developer laptop. Not because I think our developers are unable to keep their laptops free of malware, but because the attack surface of a personal computer is incredibly large compared to a production, firewalled system.
  • Given that we do no manual security review of dependencies, we should instead set up automated scanning of the dependencies, for which the frozen dependencies file alone is enough (see the sketch after this list). See what GitHub does. If I had to invest time and money in something, that would be it.
  • If we want to ensure we're not vulnerable to sudden breaches of npm et al (not just a dev account takeover; I'm talking about a diffuse compromise of the platform), we need an artifact repository into which we import the dependencies we want to use at build time.
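
A minimal sketch of such an automated scanning step, run in CI against the committed lock files (composer audit checks dependencies against known security advisories but requires a recent Composer, 2.4+; an older setup would need a dedicated advisory scanner run against composer.lock):

# Report known vulnerabilities in the pinned composer dependency set
composer audit
# The npm ecosystem's equivalent, mentioned earlier in this thread
npm audit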

So, if we're really interested in improving the security of our dependency management, deploy repos are not the solution; rather, they're an obsolete approach that gives a false sense of comfort.

In the case of PHP, vendoring dependencies is more manageable and I don't oppose it, but I still think it's not better than automated scanning and reporting plus an artifact repository; quite the contrary.

I also want to stress that this is the wrong task for the above discussions; please open a new one to keep discussing this stuff.

@jeena we already have a .pipeline directory for Shellbox, and it works as intended in creating multiple images. Should we consider this task resolved?