
Support Parsoid/PHP in MediaWiki-Vagrant
Open, Needs Triage, Public

Description

VisualEditor and Parsoid/JS must be provisioned manually on Cloud VPS, because the existing Parsoid/JS vagrant role makes strong assumptions about host names, which breaks for public, wmcloud.org virtual hosts. Meanwhile, Parsoid/PHP is already the production backend for VisualEditor, reduces server complexity, saves resources, and is the ideal configuration for most mw-vagrant use cases, because it's the closest to production.

Write a new vagrant role to provision Parsoid/PHP.

Event Timeline

In the future this will be as simple as enabling one or two MediaWiki extensions with no special configuration, but at the moment, and probably for months to come, it requires monumental funkiness:

// Hack to support testing Parsoid as an extension, while overriding
// the composer library included with core. (T227352)
$wgParsoidInstallDir = '/srv/parsoid'; // for example
if ( is_dir( $wgParsoidInstallDir ) ) {
	AutoLoader::$psr4Namespaces += [
		// Keep this in sync with the "autoload" clause in
		// $wgParsoidInstallDir/composer.json
		'Wikimedia\\Parsoid\\' => "$wgParsoidInstallDir/src",
	];
	wfLoadExtension( 'Parsoid', "$wgParsoidInstallDir/extension.json" );
}

Change 616521 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/vagrant@master] [WIP] Role for Parsoid/PHP

https://gerrit.wikimedia.org/r/616521

Just to note, the WIP patch here is good enough for my immediate task in T257322. Merging is stalled on design and "product" decisions: for example, can we drop Parsoid/JS already, or should the roles coexist for a while?

I'm CC'ing the MediaWiki-Vagrant project, because other people probably want a role that does this.

The patch works flawlessly, but we should consider how to deal with the Puppet code that still requires Parsoid/JS (see T259988).

> In the future this will be as simple as enabling one or two MediaWiki extensions with no special configuration, but at the moment, and probably for months to come, it requires monumental funkiness:

I don't actually view this as terribly funky, fwiw. It's a concession to deploy strategies, mostly; it lets us develop fast-changing code in the Parsoid repo w/o having to try to keep multiple repositories (core, VE, and Parsoid) in sync. For the release version of 1.35 it was easy enough to get rid of the requirement to install Parsoid separately and we'll probably do that (or something like it) again for future release branches; it's just not a great way to do our week-to-week deployments.

But that's not really the point of this bug...

Yes, we need to kill the Parsoid/JS config. I don't think anything is actually using it any more, but IIRC the Parsoid/PHP config used to inherit several bits from the Parsoid/JS config so it's probably a slightly annoying task to separate those out again. Here are some WIP patches to do so:

To be clear, I'd rather not lick the cookie on this, *please* take over these patches and this project.

(On the deployment strategy tangent:)

> I don't actually view this as terribly funky

The new Parsoid extension is brilliant, requires zero configuration, and I've been able to enable it in several different environments without a hitch, as long as I simply clone the service repo directly into my extensions directory—the funkiness as you point out is only caused by the new Parsoid living in an unrelated repo (outside of the mediawiki-extensions- namespace, bringing additional quirks), but with some overlap in config and class loading.

If there's time left to adjust the deployment strategy, it might be simpler to give the extension a new name, like "WikitextHtml", "WikiToWeb", "WikiConvertor", "HtmlConvertor", "RDFa", ... (I love the idea of saying what the extension is used for, but maybe I'm not quite capturing the main activity well?) The name "Parsoid" has never been very self-explanatory anyway, so nothing to lose here but nostalgic charm? "Parse" is a generic term for almost everything happening inside a computer, and "-oid" is entertainingly a nonsense word which makes its subject even more generic, so I feel like the original title must have been meant as a joke.

On a practical level, if Parsoid 1 and 2 are sitting next to each other in the extension directory then no custom sleight of hand need be performed to switch between them. Site admins just flip configuration that disables one extension and loads the other. Supplemental configuration variables can be prepared in earlier commits, and rollback is a trivial git revert (this is an important property for any deployment, since the potential rollback happens at an unpredictable time and its deployer will likely have no special experience with the broken feature).
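To make the "flip configuration" idea concrete, here is a minimal sketch of what such a switch could look like in LocalSettings.php. It is purely illustrative: the extension names `WikitextHtml` and `ParsoidLegacy` and the `$useNewParsoid` flag are hypothetical placeholders and do not exist, and it assumes both code bases are checked out side by side in the extensions directory.

```php
// Sketch only. Both extension names below are hypothetical placeholders;
// nothing is published under either name.
$useNewParsoid = true; // illustrative local flag, not a MediaWiki setting

if ( $useNewParsoid ) {
	// Hypothetical renamed Parsoid/PHP extension.
	wfLoadExtension( 'WikitextHtml' );
} else {
	// Hypothetical "Parsoid 1" packaged as an extension.
	wfLoadExtension( 'ParsoidLegacy' );
}
unset( $useNewParsoid );
```

Under these assumptions, a rollback would amount to reverting the one-line commit that changed the flag.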

> Yes, we need to kill the Parsoid/JS config. I don't think anything is actually using it any more

Fantastic! But should we continue to support it in mw-vagrant until MediaWiki 1.34 is end-of-life? For example, in case anyone needs to recreate an extension interaction bug in the old code. I wonder if it's possible to rename the role (inspired by the current deployment strategy), so that anyone with vagrant role "parsoid" enabled will magically get Parsoid/PHP at some point, but can enable a "parsoid-old" role to provision the legacy service? Or, if the extension gets a new name, then everything is simpler and the new role is named after it.

> Here are some WIP patches to do so:
>
> To be clear, I'd rather not lick the cookie on this, *please* take over these patches and this project.

It's looking really good! I left a few thoughts over there, amazing that such an enormous project is wrapping up already. As a consumer of this code, it's been a magical transition so far and I've been blissfully unaware of the heavy lifting happening behind the scenes.

This might be impossible, but now I'm thinking that an even more ideal migration strategy would be for Parsoid 1 and 2 to be enabled at the same time. To switch the consumers over, you would remove the Parsoid 1 configuration.

Parsoid/PHP is a dependency of MediaWiki core and as such included in any Vagrant install. A clearer description of what use cases the vagrant role is intended to support would help figure out what's left to be done here.

I do not know whether it is possible to pass the MediaWiki version to the Puppet file, so that we can simply use an if statement for the Parsoid configuration.

I'm not sure what if statement that would be. Parsoid/PHP gets automatically installed as part of MediaWiki core if the MediaWiki version is recent enough to support Parsoid/PHP. That's why it was made into a Composer library, to make that kind of dependency management easy.

Do you mean this in CommonSettings.php?

// This is a temporary hack for hooking up Parsoid/PHP with MediaWiki
// This is just the regular check out of parsoid in that week's vendor
$parsoidDir = "$IP/vendor/wikimedia/parsoid";
if ( ( $_SERVER['SERVERGROUP'] ?? null ) === 'parsoid' ) {
	$wgParsoidSettings = [
		'useSelser' => true,
		'linting' => true,
		'nativeGalleryEnabled' => false,  // T214649
	];

	if ( wfHostName() === 'scandium' ) {
		// Scandium has its own special check out of parsoid for testing.
		$parsoidDir = __DIR__ . "/../../parsoid-testing";
		// Override settings specific to round-trip testing on scandium
		require_once "$parsoidDir/tests/RTTestSettings.php";
	}

	wfLoadExtension( 'Parsoid', "$parsoidDir/extension.json" );
}
unset( $parsoidDir );
// End of temporary hack for hooking up Parsoid/PHP with MediaWiki
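For a development setup such as a Vagrant role, the production block above might reduce to something like the following. This is only a sketch under stated assumptions, not the actual role: it assumes the Composer-installed copy under `$IP/vendor/wikimedia/parsoid` ships an `extension.json` (as the production snippet implies), and the settings shown are copied from that snippet rather than being recommended values.

```php
// Sketch only: a LocalSettings.php fragment, not the actual Vagrant role.
// Assumes Parsoid/PHP was installed by Composer as part of MediaWiki core.
$parsoidDir = "$IP/vendor/wikimedia/parsoid";
if ( is_file( "$parsoidDir/extension.json" ) ) {
	// Settings copied from the production snippet; adjust or drop as needed.
	$wgParsoidSettings = [
		'useSelser' => true,
		'linting' => true,
	];
	wfLoadExtension( 'Parsoid', "$parsoidDir/extension.json" );
}
unset( $parsoidDir );
```

The `is_file()` guard stands in for the version check discussed earlier: on a MediaWiki checkout whose vendor directory has no Parsoid library, the block is skipped instead of failing.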

Change 636552 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/vagrant@master] Separate Parsoid/JS and Parsoid/PHP roles

https://gerrit.wikimedia.org/r/636552

Change 636552 merged by jenkins-bot:
[mediawiki/vagrant@master] Separate Parsoid/JS and Parsoid/PHP roles

https://gerrit.wikimedia.org/r/636552