Installing composer modules for deployment
Open, Medium, Public

Description

In Parsoid/JS, all of the npm modules only include JS code and there are no binary dependencies. This let us pre-install these modules and check them into git in the mediawiki/services/parsoid/deploy repository. So, deployment to ruthenium, beta cluster, or production was a matter of simply deploying this repository without any deployment-time npm install involved.

We should figure out how feasible this approach is for Parsoid/PHP, and what the deployment practice is for PHP code with respect to composer modules.

This has a bearing on QA and testing on ruthenium (see T213493: Install PHP7 on scandium), so it would be useful to think about this and resolve it sooner rather than later.

Event Timeline

ssastry created this task. Jan 10 2019, 10:54 PM
Restricted Application added a subscriber: Aklapper. Jan 10 2019, 10:54 PM
ssastry triaged this task as Medium priority. Jan 10 2019, 10:58 PM
ssastry moved this task from Backlog to Deployment on the Parsoid-PHP board. Jan 10 2019, 11:15 PM

For WMF deployments of MediaWiki, we use the mediawiki/vendor repo for the composer libraries in what sounds like much the same way as you use mediawiki/services/parsoid/deploy. When we get to the point of pulling the Parsoid-PHP library into MediaWiki, that's how it'll be done to deploy to Beta and prod.

For the mixed JS/PHP testing use case described in T213493, you could probably just go ahead and include the vendor/ directory in mediawiki/services/parsoid/deploy.

Third parties are generally expected to use composer install (with or without --no-dev), much like they're expected to use npm install for node modules.

Joe added a subscriber: Joe.

Given parsoid/PHP is intended to be used as a library by a MediaWiki extension, I think it should be included in the code we release with scap, and the extension be activated only on scandium for the time being.

My point is: we need a full MediaWiki installation to do Parsoid testing, and it will be updated via scap, so why should we delve into (failure-prone) symlinking of directories from other checkouts?

Updates to parsoid should probably go out via scap in production, as regular deployments as well.

I'd ask the Release-Engineering-Team to advise on the best way forward.

From the release perspective, I think including it in mediawiki/vendor to be released by scap with MediaWiki makes sense unless there is some compelling reason not to.

Great, Thanks! Let me know if there is something we need to do here on the parsing-team end with the Parsoid repo to enable this or if this is already covered by existing tooling.

Parsoid/PHP is currently configured as an extension to let us test the Parsoid API endpoints and all the integration code without having to merge all that code into core. Also, https://www.mediawiki.org/wiki/Manual:External_libraries indicates that we need load_composer_autoloader: true enabled, which it already is.

So, after chatting with folks in the releng IRC channel, it appears we need to get the necessary additional repos into core's vendor repo then.

Tgr added a subscriber: Tgr. Jul 29 2019, 6:44 PM

FWIW, load_composer_autoloader is used in certain third-party setups, and in Vagrant, where Composer is run for each extension separately. It doesn't do anything in production, where Composer is run centrally via composer-merge-plugin (and then that gets snapshotted to mediawiki/vendor).

These are the only two libraries that we will need to add to the vendor repo, both of which are maintained by us:

"wikimedia/wikipeg": "2.0.3",
"wikimedia/zest-css": "1.1.2",

WikiPEG is @tstarling's fork of pegjs which we renamed to WikiPEG. The PHP base classes are the port of the JS code and Tim wrote the PHP code generator which generates PHP code from the grammar file in the Parsoid repo.

@cscott ported zest.js to PHP.

Change 526249 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Bump semver to 1.5.0 to match core's version

https://gerrit.wikimedia.org/r/526249

Change 526262 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] WIP: Add new Parsoid/PHP dependencies

https://gerrit.wikimedia.org/r/526262

Change 526249 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Bump semver to 1.5.0 to match core's version + handle fallout of that

https://gerrit.wikimedia.org/r/526249

Since Parsoid is PHP 7.2+ and includes zest and wikipeg which are also PHP 7.2+, I think the best option is for Parsoid to check in a local copy of zest and wikipeg, either in a deploy repo or directly in the source code. That way we're not adding non-HHVM-compatible code to the production vendor repo (yet! we'd do so once HHVM is retired), and presumably the scap configuration will already take care of installing Parsoid-as-an-extension only on PHP 7.2+ hosts.

(As I understand it, composer supports nested vendor/ directories, so as long as Parsoid-the-extension-as-it-is-released contains a populated vendor/ directory with wikipeg and zest everything will Just Work. Feedback welcome!)

cscott added a comment (edited). Jul 30 2019, 11:39 PM

How's this:

bundled/wikimedia/zest-css -- copy from vendor/wikimedia/zest-css
bundled/wikimedia/wikipeg -- copy from vendor/wikimedia/wikipeg

and add:

	"AutoloadNamespaces": {
		"MWParsoid\\": "extension/src",
		"Wikimedia\\Zest\\": "bundled/wikimedia/zest-css/src/",
		"WikiPEG\\": "bundled/wikimedia/wikipeg/src"
	},

We could leave the composer.json alone, so that standalone mode isn't affected; we'd only use the bundled libraries when running as an extension (for a short transition period).

Alternatively we could leave extension.json alone and remove wikimedia/wikipeg and wikimedia/zest-css from the require clause in composer.json, and add the appropriate phrases to the autoload clause in composer. That has the benefit of running exactly the same code standalone or as an extension, but removes the link to the original source and version of wikipeg and zest, making it a little harder to switch composer.json back once the short-term need for HHVM compatibility is gone.
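For comparison, here is a minimal sketch of what that alternative might look like in Parsoid's composer.json, assuming the same bundled/ paths and namespace prefixes as in the extension.json fragment above. Only the added autoload entries are shown: the require entries for wikimedia/wikipeg and wikimedia/zest-css would be dropped, and Parsoid's existing autoload entries (not shown here) would stay as they are. The prefixes and paths are carried over from the fragment above and have not been checked against the libraries themselves.

```json
{
	"autoload": {
		"psr-4": {
			"Wikimedia\\Zest\\": "bundled/wikimedia/zest-css/src/",
			"WikiPEG\\": "bundled/wikimedia/wikipeg/src/"
		}
	}
}
```

After a change like this, the composer autoloader would need to be regenerated (for example with composer dump-autoload) before the bundled classes resolve.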

Quickly stashing it in the repo seems the simplest way forward for now.

presumably the scap configuration will already take care of installing Parsoid-as-an-extension only on PHP 7.2+ hosts.

How is that achieved with the scap configuration? I took a look at parsoid-deploy/scap/scap.cfg and I couldn't really see how that would be achieved.

Are we still going to be deploying parsoid's /deploy repository? We can bundle the composer dependencies in that repository for now.

As for hooking up parsoid + MW, I was thinking that we'd deploy parsoid using its normal deployment stuff, and then in MediaWiki's CommonSettings.php, we'd check to see if parsoid exists on the filesystem (also maybe guarded by hostname check) and then load the dummy extension.

This would all be temporary, as the real deployment would have the MW glue code in MW itself, and Parsoid plus its dependencies would go in mediawiki/vendor.

cscott added a comment (edited). Jul 31 2019, 9:34 PM

Since Parsoid is currently PHP 7.2+, I think adding parsoid directly to core as a library will probably have to wait until T192166: Drop HHVM support from MediaWiki or T228342: Define criteria for setting explicit PHP support target for MediaWiki. It seems like that's not likely to happen until November-ish, maybe MW 1.35. So it's worth thinking of deploying as an extension as a medium-term strategy that may still be in place for the initial production use of Parsoid, which we're still hoping will happen in the Sept/Oct timeframe. It doesn't seem like packaging as an extension is causing any real trouble at the moment.

@Legoktm the deploy repo adds an extra level of directory hierarchy. That seems like it could cause trouble for an extension deploy; we'd need to add a new extension.json which copied the 'real' one and added an extra directory level to the autoload paths, etc. Seems like a pain compared to just deploying from the main Parsoid repo and adding a few files in bundled/wikimedia. In that case Parsoid is a "real" extension and doesn't require any special handling from either CI or scap compared to any other extension.

EDIT: @Legoktm suggests on IRC that you could leave the extra level of directory hierarchy in the deploy repo and just adjust the path in wfLoadExtension() to point at the 'real' extension.json inside Parsoid/src (which is where the main Parsoid codebase is checked out as a submodule in the deploy repo).

Just a point of process. There have been many suggestions here and on the gerrit patch. I think it would be useful for someone to summarize the options so we can actually pick one that seems reasonable to everyone (Parsing, RelEng, SRE, CPT). Does someone want to take that on, or should I do it?

For everyone's benefit, here is a summary of the three proposals. All options assume Parsoid/PHP will be treated as an extension of MediaWiki for the immediate term.

  1. Merge Parsoid dependencies to the vendor/ repo: gerrit 526262 attempts that.
    • Problem: These dependencies are 7.2+ and merging them will mess up CI for everyone else since they will fail on require 7.2+
    • Solution 1a: Work around problem by fudging php requirements in the wikipeg and zest repos to be php 5.x+ which lets them be merged and not trip up CI for everyone else. Since all those other repos won't use these libraries, there is likely no real issue. As for Parsoid's use, since Parsoid itself requires 7.2+, Parsoid's CI will always get 7.2+ and hence meet the actual php requirement for those repos.

      (See https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/526262#message-0d820a2955b198e2a9cb2f33efe2692db723dbcc for Reedy's suggestion and my +1 of that idea)
    • Solution 1b: Temporarily "inline" the zest and wikipeg repos into the Parsoid repo.

      See T213494#5377746 and James +1s this proposal. See T213494#5378265 where Scott proposes specific changes to Parsoid's extension.json and how to make this happen.
    • Personal preference: If we go with option 1, I prefer 1a to 1b, but could be convinced to go with 1b.
  2. Continue using Parsoid/JS's default deployment mechanism via scap as an "independent service", but also deploy MediaWiki to those same servers and have code in MW's CommonSettings.php check for the presence of Parsoid and, if it is present, load Parsoid as an extension. This could work, and the benefit is that it relies on the existing Parsoid deployment mechanism. It requires us to also deploy MediaWiki to scandium and add the inspection mechanism to core to trigger the Parsoid load.

    See T213494#5382008 and followup for where Kunal suggested this.

Thoughts?

I don't have any huge objections to either option. It sounds like #2 would give you the ability to deploy independently from the train/mediawiki swat deployments at the expense of a little extra work to set up?

Indeed .. it would be a sweet deal to be able to preserve that deployment independence.

Option 2 seems preferable to me as well at this point. It seems like there is a compelling reason to not merge the parsoid PHP dependencies into mediawiki/vendor -- continued need for HHVM support in core. That makes option 2 the easiest path forward here it seems.

cscott added a comment. Aug 1 2019, 7:43 PM

I think there's some longer-term planning considerations here:

  1. deploying parsoid separately from the train is nice -- until we start trying to bring Parsoid into production, which we (optimistically) hope will be sometime in Sept. Once we're running on production servers (or even in beta) the ability to independently deploy Parsoid and bring down unrelated services seems more like a bug than a feature. (aka why I don't like option 2)
  2. We hope to put Parsoid into production use in a manner similar to the current production deployment of PHP 7.2. That is, for a limited subset of traffic, or only for certain clients (mobile content service, etc). And our timeframe for this is (again, optimistically) before we expect to be deprecating HHVM in core. So option 1b would allow us to deploy Parsoid into production as an extension on the production PHP 7.2 servers and use the existing X-Wikimedia-Debug etc mechanisms to steer traffic. (aka why I like option 1b)
  3. In theory wikipeg and zest-css are independent libraries. I don't really like lying about their dependencies, nor pushing versions with misleading dependencies to packagist. So we'd be making forked copies with the faked dependencies and probably installing directly from github, not packagist, and then installing those forked copies into the vendor repo. That seems like a mess. (aka why I don't like option 1a)

I think there's some longer-term planning considerations here:

  1. deploying parsoid separately from the train is nice -- until we start trying to bring Parsoid into production, which we (optimistically) hope will be sometime in Sept. Once we're running on production servers (or even in beta) the ability to independently deploy Parsoid and bring down unrelated services seems more like a bug than a feature. (aka why I don't like option 2)

This is the Parsoid/JS scenario and we haven't had this problem because we test before we deploy. Even when we are past this hhvm/php7.2 support scenario and can move to an integrated model without the fudging, MediaWiki can and will be updated independently of Parsoid. So, we will continue to need pre-deploy QA of whatever Parsoid+M/W combination is going to be running in production.

I can buy the argument that we are trying to move to a more integrated model than we have with the Parsoid/JS setup .. just that I am yet to be convinced that this is any more fragile than what we have now with Parsoid/JS.

  2. We hope to put Parsoid into production use in a manner similar to the current production deployment of PHP 7.2. That is, for a limited subset of traffic, or only for certain clients (mobile content service, etc). And our timeframe for this is (again, optimistically) before we expect to be deprecating HHVM in core. So option 1b would allow us to deploy Parsoid into production as an extension on the production PHP 7.2 servers and use the existing X-Wikimedia-Debug etc mechanisms to steer traffic. (aka why I like option 1b)

That is a good argument for this.

  3. In theory wikipeg and zest-css are independent libraries. I don't really like lying about their dependencies, nor pushing versions with misleading dependencies to packagist. So we'd be making forked copies with the faked dependencies and probably installing directly from github, not packagist, and then installing those forked copies into the vendor repo. That seems like a mess. (aka why I don't like option 1a)

Fair enough. We can dump this option.

Tgr added a comment. Aug 2 2019, 10:49 AM

There are three ways to deploy Parsoid:

  1. The way it's done for libraries: add it to mediawiki/vendor's composer.json, add $IP/vendor/wikimedia/parsoid/extension.json to wmf-config/extension-list and wfLoadExtension( 'Parsoid', "$IP/vendor/wikimedia/parsoid/extension.json" ) to CommonSettings.php in the operations/mediawiki-config repo.
  2. The way it's done for extensions: add it to make-wmf-branch (which adds it as a submodule to mediawiki/extensions in wmf branches), add $IP/extensions/Parsoid/extension.json to extension-list, add wfLoadExtension( 'Parsoid' ) to CommonSettings.php.
  3. The way it's done for services: use scap to deploy it to /srv/parsoid, add /srv/parsoid/extension.json to extension-list, add wfLoadExtension( 'Parsoid', "/srv/parsoid/extension.json" ) to CommonSettings.php.

Parsoid is currently a service, it is transitioning to being a library, and it pulls in MediaWiki dependencies by pretending it's an extension, so none of those choices is completely crazy. They have a major impact on code updates and deployment:

  • With #1, code updates happen via new Parsoid version releases, and version bumps in vendor (which will be deployed via the train). Urgent code updates happen via backporting all this to the appropriate wmf branch of vendor. (This is how we want it to be in the long term; OTOH a big chore for something that's very actively developed and/or going through a major transition, so I'd probably avoid it for now.)
  • With #2, they happen automatically via the train, and manually via SWATting patches backported to the wmf branch(es).
  • With #3, deploys happen manually via scap. You get full control and you don't have to backport everything; OTOH I'm not sure scap supports staggered rollouts, where group0 wikis get the update a day earlier.

That's the fundamental decision IMO, and how to install the modules largely follows from it. If you use scap, you can just put them in the deploy repo. If you use composer to install Parsoid itself, you need to cheat with the PHP dependencies anyway, and there is no reason not to do that for wikipeg/zest as well (it could take the form of dedicated tags, v1.1.0-hhvm or something like that, to avoid breaking things for other downstreams, although I doubt any exist). If you go the extension route, copying or submoduling them into the Parsoid repo and using the MediaWiki PSR-4 autoloader is probably easiest.

cscott added a comment. Aug 3 2019, 3:39 AM

I bet we'd want to go with #3, at least initially. I think we'd prefer not to add Parsoid and its dependencies to mediawiki/vendor's composer.json (#1) because of the HHVM/PHP 7.2 thing. #2 is attractive, and I suspect we might want to do that once we move past testing and start putting Parsoid into production, but I bet we don't want to jump on the deploy train quite yet.

That leaves the question of how to structure the deploy repo; we have a mild hack where we symlink package.json in the root of the deploy repo to src/package.json one level down; I don't think that will quite work for composer.json and PHP's autoloader. We might need to copy composer.json down a level and edit it to add extra directory prefixes in the "autoload" clause?

Tgr added a comment. Aug 5 2019, 1:12 PM

You could symlink Parsoid's composer.json, run composer on the top level and make sure the autoloader file in the toplevel vendor repo is required in the configuration file. Or have a top-level composer file which uses composer-merge-plugin to include Parsoid, and set the vendor-dir option in the top-level composer file to force it to install inside the Parsoid repo (or wherever). Or you could have a top-level composer file and install Parsoid as a library. That will reintroduce the package management overhead and is probably a bad idea, at least initially; the other two options don't have major benefits or drawbacks that I can think of.
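A rough sketch of the second variant (a top-level composer file that uses composer-merge-plugin), assuming the Parsoid checkout lives in src/ inside the deploy repo, as with the package.json symlink mentioned earlier. The src/ paths and the plugin version constraint are illustrative assumptions and are not taken from an actual deploy repo.

```json
{
	"require": {
		"wikimedia/composer-merge-plugin": "^1.4"
	},
	"config": {
		"vendor-dir": "src/vendor"
	},
	"extra": {
		"merge-plugin": {
			"include": [
				"src/composer.json"
			]
		}
	}
}
```

With a layout like this, composer install would be run at the top level of the deploy repo, and the configuration file would then need to require src/vendor/autoload.php.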

Ok, let us pick a simple solution for now that works for scandium's deployment so we can run tests. We have enough time to figure out the production deployment solution, and there will also be better clarity around the state of PHP 7.2 / HHVM support in a few weeks' time.

I am going to remove this as a subtask for scandium deployment to reflect this.

ssastry moved this task from Deployment to Post-Port Work on the Parsoid-PHP board. Nov 5 2019, 5:49 PM
ssastry moved this task from Post-Port Work to Deployment on the Parsoid-PHP board. Dec 8 2019, 3:22 AM