Page MenuHomePhabricator

Uncached VisualEditor w/ Parsoid/PHP (no JS, no RESTBase) for MW 1.35 LTS
Closed, ResolvedPublic

Description

For the next MW LTS release we'd like to have a VisualEditor work "out of the box" with just PHP code on the server. This won't be the setup we use in production (except perhaps for wikitech, see T241961) but will be a reasonable set up for small wikis.

This is a tracker bug for the work required.

FIXME: create subtasks. Probably at least: (1) updates to VirtualRestService code in MediaWiki core to have VE talk to Parsoid via the internal library API instead of a HTTP API, and (2) documentation updates.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 23 2020, 6:55 PM
JTannerWMF moved this task from To Triage to Triaged on the VisualEditor board.Mar 24 2020, 3:04 PM
JTannerWMF added a subscriber: JTannerWMF.

We are assuming the Parsing team is going to work on this

We are assuming the Parsing team is going to work on this

We should talk! :-)

So far, this is what I've been using,

wfLoadExtension( 'Parsoid', 'vendor/wikimedia/parsoid/extension.json' );
$wgEnableRestAPI = true;
$wgParsoidSettings = [];
$wgVirtualRestConfig['modules']['parsoid'] = array(
	'url' => 'http://localhost/rest.php',
	'domain' => 'localhost',
);

So far, this is what I've been using,

wfLoadExtension( 'Parsoid', 'vendor/wikimedia/parsoid/extension.json' );
$wgEnableRestAPI = true;
$wgParsoidSettings = [];
$wgVirtualRestConfig['modules']['parsoid'] = array(
	'url' => 'http://localhost/rest.php',
	'domain' => 'localhost',
);

This still calls the REST API. The plan here is to directly call the library interface provided by src/Parsoid.php if wikis might not want to expose their wiki's Parsoid REST API to the world.

Yes, that's what it says in the description. I was just providing my config in case anyone wanted to play around until this is resolved.

This is @matmarex's configuration: https://github.com/MatmaRex/patchdemo/blob/master/LocalSettings.txt#L15-L27
Very similar.

As @subbu said, the goal is to do this w/o the localhost loopback, probably by adding a new VRS endpoint.
Ideally we'd also make this the "default" configuration for VE, so VE setup can be as simple as possible.

One of the issues flagged by the VE team was stashing/switching support. Needs investigation to see what needs to be done/could be done here. If switching doesn't work reliably w/o RESTBase, we should turn it off in the default configuration, so that whatever we ship in the LTS "just works", even if it doesn't have 100% of features.

One of the issues flagged by the VE team was stashing/switching support. Needs investigation to see what needs to be done/could be done here. If switching doesn't work reliably w/o RESTBase, we should turn it off in the default configuration, so that whatever we ship in the LTS "just works", even if it doesn't have 100% of features.

On officewiki where VE runs with RESTBase, you can switch between VE and source edit versions ... so, at least some of that should already work?

(Sorry, I forgot to comment about that yesterday.)

Everything in VE works without RESTBase and its caching, including switching editors.

The only limitation is that if you save the page after switching from WTE to VE, the diff may be dirty, since the Parsoid document that was generated while switching is not cached, so it's not available to Parsoid for selser (at least I think that's how this works).

We have a config option $wgVisualEditorAllowLossySwitching=false to disable switching in this case and avoid the dirty diffs. I wouldn't mind making that the default if you think that's better, although I feel like on most small wikis dirty diffs are not a big problem (since either all the users know and trust each other and don't review diffs, or there are so few edits that reviewing a couple of annoying diffs is not a big deal). This is documented here: https://www.mediawiki.org/wiki/Extension:VisualEditor#Switching_between_Wikitext_Editing_and_VisualEditor

We could probably solve this limitation as well if there was a way to send the "original" document back to Parsoid, along with the modified document we're asking it to convert to wikitext.

(Sorry, I forgot to comment about that yesterday.)

Everything in VE works without RESTBase and its caching, including switching editors.

Great! That was my impression as well.

We have a config option $wgVisualEditorAllowLossySwitching=false to disable switching in this case and avoid the dirty diffs. I wouldn't mind making that the default if you think that's better, although I feel like on most small wikis dirty diffs are not a big problem (since either all the users know and trust each other and don't review diffs, or there are so few edits that reviewing a couple of annoying diffs is not a big deal).

Nah, I think it is good as it is. I don't think dirty diffs are a problem for small wikis. And those who care can flip the config value and disable switching.

We could probably solve this limitation as well if there was a way to send the "original" document back to Parsoid, along with the modified document we're asking it to convert to wikitext.

Let us solve this if it becomes necessary. :-)

Current docs are for Parsoid configured so that Visual Editor talks to it via a network loopback connection (instead of directly). Configuration info is at https://www.mediawiki.org/wiki/Parsoid/PHP, although it needs a bit of polish.

Plan for LTS is to do direct communication w/ Parsoid instead of looping back through the external REST API.

Florian added a subscriber: Florian.Apr 2 2020, 4:32 PM
Jdforrester-WMF triaged this task as High priority.Jun 2 2020, 6:05 PM

Deadline for this is a few weeks away now.

Current docs are for Parsoid configured so that Visual Editor talks to it via a network loopback connection (instead of directly). Configuration info is at https://www.mediawiki.org/wiki/Parsoid/PHP, although it needs a bit of polish.

Plan for LTS is to do direct communication w/ Parsoid instead of looping back through the external REST API.

I think it would be fine to just keep using the loopback connection approach. It's tried and tested by now. It is a bit annoying that you have to write 20+ lines of code to make HTTP requests instead of 1 line to do a function call, but that can be solved with a helper function or two. I've started leaning towards this even more when I learned that Parsoid messes with the GC before PHP 7.3 (T230861), doing the parsing in a separate request/process ensures that this doesn't interfere with anything else.

Change 610156 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[operations/mediawiki-config@master] Explicitly set visualeditor-enable to 0

https://gerrit.wikimedia.org/r/610156

Change 610157 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/VisualEditor@master] Enable VisualEditor by default

https://gerrit.wikimedia.org/r/610157

Change 610158 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/VisualEditor@master] Zero-configuration VisualEditor + PHP for MediaWiki LTS

https://gerrit.wikimedia.org/r/610158

Change 610146 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] DRY out default ParsoidSettings into a single location

https://gerrit.wikimedia.org/r/610146

Change 610146 merged by jenkins-bot:
[mediawiki/services/parsoid@master] DRY out default ParsoidSettings into a single location

https://gerrit.wikimedia.org/r/610146

Change 610176 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] Always enable the Parsoid extension in new installs

https://gerrit.wikimedia.org/r/610176

There are some patches on gerrit now: https://gerrit.wikimedia.org/r/q/topic:%22Zeroconf+VE%22+(status:open%20OR%20status:merged)

Getting the dependency between VisualEditor and Parsoid was tricky. When @Jdforrester-WMF, @ssastry and I discussed this earlier, our plan was to do a "magical" hook between VE registration and Parsoid, so that Parsoid would be loaded automatically from vendor/wikimedia/parsoid if it wasn't already loaded when VE started up. I tried that approach, but ran into problems (T257375).

The current approach is a little uglier but simpler: we just add a line to the LocalSettings.php generated by WebInstaller to load Parsoid from vendor/wikimedia/parsoid. This automatically-loaded Parsoid isn't actually active unless $wgEnableRestAPI is set.
Then we still add a little bit of magic code in VisualEditor to turn on $wgEnableRestAPI and set up $wgVirtualRestConfig when needed.

Last thing needed is to default visualeditor-enable to 1, so that third-party wikis see VisualEditor as soon as they choose to load the VisualEditor extension.

@Jdforrester-WMF is still on the hook for adding VisualEditor to the default download bundle. He noted that we should also add Extension:TemplateData as well.

Change 610261 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/core@master] Move the Parsoid "extension" into MediaWiki core

https://gerrit.wikimedia.org/r/610261

There are some patches on gerrit now: https://gerrit.wikimedia.org/r/q/topic:%22Zeroconf+VE%22+(status:open%20OR%20status:merged)

Getting the dependency between VisualEditor and Parsoid was tricky. When @Jdforrester-WMF, @ssastry and I discussed this earlier, our plan was to do a "magical" hook between VE registration and Parsoid, so that Parsoid would be loaded automatically from vendor/wikimedia/parsoid if it wasn't already loaded when VE started up. I tried that approach, but ran into problems (T257375).

Continuing to load the Parsoid "extension" from vendor/ is a bad idea, and we need to move it into core sooner or later...might as well do it now. Not including your change today, there hasn't been any substantial changes to extension/ in over a month (in fact, the deprecation changes would've been fixed earlier and likely by someone else had it been in core already!).

My patch copies over the folder and adjusts a minimal amount of things for it to work. My patch has a $wgEnableParsoidRoutes, but that's mostly for a placeholder right now.

The current approach is a little uglier but simpler: we just add a line to the LocalSettings.php generated by WebInstaller to load Parsoid from vendor/wikimedia/parsoid. This automatically-loaded Parsoid isn't actually active unless $wgEnableRestAPI is set.

I believe my patch has alleviated the need for your LocalSettings.php patch (which I wasn't a huge fan of because it will require more work from sysadmins in the future when the "extension" is moved). I think we should have a core setting like $wgEnableBasicVisualEditor (maybe not basic, but auto-configured for ideal defaults...simple? idk) that in Setup.php will set $wgEnableRestAPI = true; and configures the VRS stuff and rest.php routes.

Then we still add a little bit of magic code in VisualEditor to turn on $wgEnableRestAPI and set up $wgVirtualRestConfig when needed.

Extensions shouldn't be changing core settings. I think it is reasonable to ask people to write wfLoadExtension( 'VisualEditor' ); $wgEnableBasicVisualEditor = true; I would be in favor of the installer setting $wgEnableBasicVisualEditor if the VE extension is selected...maybe we auto-select VE too?

Last thing needed is to default visualeditor-enable to 1, so that third-party wikis see VisualEditor as soon as they choose to load the VisualEditor extension.

+1

The current approach is a little uglier but simpler: we just add a line to the LocalSettings.php generated by WebInstaller to load Parsoid from vendor/wikimedia/parsoid. This automatically-loaded Parsoid isn't actually active unless $wgEnableRestAPI is set.

I believe my patch has alleviated the need for your LocalSettings.php patch (which I wasn't a huge fan of because it will require more work from sysadmins in the future when the "extension" is moved). I think we should have a core setting like $wgEnableBasicVisualEditor (maybe not basic, but auto-configured for ideal defaults...simple? idk) that in Setup.php will set $wgEnableRestAPI = true; and configures the VRS stuff and rest.php routes.

Then we still add a little bit of magic code in VisualEditor to turn on $wgEnableRestAPI and set up $wgVirtualRestConfig when needed.

Extensions shouldn't be changing core settings. I think it is reasonable to ask people to write wfLoadExtension( 'VisualEditor' ); $wgEnableBasicVisualEditor = true; I would be in favor of the installer setting $wgEnableBasicVisualEditor if the VE extension is selected...maybe we auto-select VE too?

Let's put the burden on those most able to help themselves; out-of-the-box, VE should have wgEnableBasicVisualEditor (or whatever) as the default behaviour, and people with ultra-special Parsoid set-ups (like Wikimedia) can disable it, instead.

Let's put the burden on those most able to help themselves; out-of-the-box, VE should have wgEnableBasicVisualEditor (or whatever) as the default behaviour, and people with ultra-special Parsoid set-ups (like Wikimedia) can disable it, instead.

I'm on board with that. I'll double check but I think then we can just enable all the rest.php routes unconditionally and have $wgEnableBasicVisualEditor just control $wgEnableRestAPI and $wgVirtualRestConfig...patch coming.

Change 610272 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/core@master] [WIP] Introduce $wgAutoconfigureParsoid

https://gerrit.wikimedia.org/r/610272

Let's put the burden on those most able to help themselves; out-of-the-box, VE should have wgEnableBasicVisualEditor (or whatever) as the default behaviour, and people with ultra-special Parsoid set-ups (like Wikimedia) can disable it, instead.

I'm on board with that. I'll double check but I think then we can just enable all the rest.php routes unconditionally and have $wgEnableBasicVisualEditor just control $wgEnableRestAPI and $wgVirtualRestConfig...patch coming.

If we have $wgAutoconfigureParsoid = true; by default, which sets $wgEnableRestAPI = true; we're effectively enabling the Rest API by default, which leads to: why is the Rest API disabled by default?

cscott added a comment.Jul 8 2020, 1:58 PM

The point is that the rest routes are going away. We *don't* actually want to include them by default in the LTS, because they are marked experimental and will go away.

Similarly, we're not ready to move everything to core, especially a few days before the LTS. That will happen, but let's not rush it. (In particular, part of this is because the REST API is not how Parsoid is going to be brought into core; some of the SiteConfig stuff will be brought into core, but the Parser.php class is going to split into an abstract superclass and Parsoid is going to extend that.)

I agree that having a LocalSettings.php hook isn't ideal, but I think we could protect that with an 'if exists' hook so that lone line sitting in LocalSettings.php won't be harmful long term.

cscott added a comment.Jul 8 2020, 2:10 PM

(Also, it should be noted, Parsoid hasn't been through a security review for exposure on public facing interfaces. The REST API of Parsoid is exposed only on our internal network. We were originally hoping to have Parsoid hooked up directly to avoid the need for REST loopback for LTS, but exposing REST is the compromise we came up with. Again, it's not a supported long-term configuration, but the zeroconf configuration (with the fixes @Legoktm identified to LocalSettingsGenerator) should continue to work in 1.36ish when the REST API is replaced by a direct integration with the new ParserCache and Parser.php refactor, etc.)

cscott added a comment.Jul 8 2020, 2:26 PM

Continuing to load the Parsoid "extension" from vendor/ is a bad idea, and we need to move it into core sooner or later...might as well do it now. Not including your change today, there hasn't been any substantial changes to extension/ in over a month (in fact, the deprecation changes would've been fixed earlier and likely by someone else had it been in core already!).

Not to belabor the point, but the reason extension/ hasn't been changed recently is (a) I've been busy with other work and (b) we're blocked on CPT. Indeed there's a pretty substantial refactoring sitting in gerrit waiting for me to complete it ( https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/605687 ).

cscott added a comment.EditedJul 8 2020, 2:38 PM

Some other alternatives that are less disruptive than moving Parsoid into core and enabling the rest API and Parsoid's REST api by default:

  1. Adding a symlink from extensions/Parsoid to vendor/wikimedia/parsoid in the tarball, and having VisualEditor requires: "Parsoid" in extension.json. I've tested this and it works fine in the installed, but there's a bad side effect: VE breaks in production, because there we have VE installed but Parsoid is *not* installed on the main cluster; we could finesse this by hacking the dependency into the installer somehow, or adding the "requires" only on the 1.35 release branch.
  1. Tweaking the extension registry to allow dynamic-depends (T257375: Extensions can't dynamically load other extensions), then the ugliness is contained in VisualEditor's callback function.
  1. Adding the wfLoadExtension( 'Parsoid', ...) to Setup.php (instead of LocalSettings). Also needs a bit of care to avoid breaking production.

Also, @matmarex's comment should also be noted:

I think it would be fine to just keep using the loopback connection approach. It's tried and tested by now. It is a bit annoying that you have to write 20+ lines of code to make HTTP requests instead of 1 line to do a function call, but that can be solved with a helper function or two. I've started leaning towards this even more when I learned that Parsoid messes with the GC before PHP 7.3 (T230861), doing the parsing in a separate request/process ensures that this doesn't interfere with anything else.

Enabling parsoid routes by default disables GC when processing those routes. That's (marginally) appropriate when you're running VE and Parsoid is being invoked by a loopback (the GC disabling only occurs for the PHP process which is doing the parsoid request), but probably not code we want to unilaterally enable on a public interface for all installers.

Although T257462: Parsoid should not disable GC unless a config var is set.

Some other alternatives that are less disruptive than moving Parsoid into core and enabling the rest API and Parsoid's REST api by default:

  1. Adding a symlink from extensions/Parsoid to vendor/wikimedia/parsoid in the tarball, and having VisualEditor requires: "Parsoid" in extension.json. I've tested this and it works fine in the installed, but there's a bad side effect: VE breaks in production, because there we have VE installed but Parsoid is *not* installed on the main cluster; we could finesse this by hacking the dependency into the installer somehow, or adding the "requires" only on the 1.35 release branch.

It's not technically possible to include symlinks in the tarball, it breaks it for Windows users.

  1. Tweaking the extension registry to allow dynamic-depends (T257375: Extensions can't dynamically load other extensions), then the ugliness is contained in VisualEditor's callback function.

I'd rather not, mostly because I've declined this request from other people, and it breaks the caching model, and I don't want to figure out how to work around that.

  1. Adding the wfLoadExtension( 'Parsoid', ...) to Setup.php (instead of LocalSettings). Also needs a bit of care to avoid breaking production.

I'm not against this, but it doesn't really fix the problem of shipping MediaWiki code in /vendor.

The point is that the rest routes are going away. We *don't* actually want to include them by default in the LTS, because they are marked experimental and will go away.

Ack.

Similarly, we're not ready to move everything to core, especially a few days before the LTS. That will happen, but let's not rush it. (In particular, part of this is because the REST API is not how Parsoid is going to be brought into core; some of the SiteConfig stuff will be brought into core, but the Parser.php class is going to split into an abstract superclass and Parsoid is going to extend that.)

I think tagging everything with @unstable would address that, but I do understand the difficulties trying to do refactors across two repositories and keeping it all in sync. If we ultimately conclude that moving the MediaWiki-integrated code into core/VE isn't going to happen, I'd like to at least see the various registration bits move.

I agree that having a LocalSettings.php hook isn't ideal, but I think we could protect that with an 'if exists' hook so that lone line sitting in LocalSettings.php won't be harmful long term.

I'd rather put the code in Setup.php (or in VisualEditor's callback), and introduce a $wgEnableParsoidLoopback or something rather than having function calls/conditional code in LocalSettings.php. It gives us a much better shot at DWIM in the future with minimal work on sysadmins' side.

Summary of yesterday's sync up with cscott, James, Subbu and myself:

  • This will be a breaking change for wikis that previously used Parsoid/JS, they will need to change their configuration.
  • Our goal is to have wfLoadExtension( 'VisualEditor' ) do all the right things to make it just work out of the box. It is a priority to have a simple configuration that continues working across major upgrades.
  • We're going with the loopback approach for 1.35, but this is not the plan long-term when proper MW core integration happens.
    • This should mitigate any gc_disable() side-effects. We're likely going to have this as-is for rc.0, but we can add a setting to make it configurable depending upon feedback
  • The MWParsoid "extension" will live in VisualEditor for now, potentially with renamed classes to avoid conflicts. This again is not the long-term plan when proper MW core integration happens.
    • The REST API routes and others need to be disablable in configuration (though it can default to on) to avoid exposing them in Wikimedia production.
  • Open question: can we enable the REST API by default? T257628: Set $wgEnableRestAPI = true; by default;

Change 610176 abandoned by C. Scott Ananian:
[mediawiki/core@master] Always enable the Parsoid extension in new installs

Reason:
Abandoned in favor of Ic63ce40f59c4be8f4fdc5f9ac17798353fc86866

https://gerrit.wikimedia.org/r/610176

Change 610261 abandoned by C. Scott Ananian:
[mediawiki/core@master] Move the Parsoid "extension" into MediaWiki core

Reason:
Abandoned in favor of Ic63ce40f59c4be8f4fdc5f9ac17798353fc86866 (same idea, different repo).

https://gerrit.wikimedia.org/r/610261

Change 610156 merged by jenkins-bot:
[operations/mediawiki-config@master] VisualEditor: Explicitly set visualeditor-enable to 0 when non-default

https://gerrit.wikimedia.org/r/610156

Mentioned in SAL (#wikimedia-operations) [2020-07-13T08:57:16Z] <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: T248343 Explicitly set visualeditor-enable to 0 when non-default (duration: 00m 57s)

Change 610157 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Enable VisualEditor by default

https://gerrit.wikimedia.org/r/610157

Change 610272 abandoned by Jforrester:
[mediawiki/core@master] [WIP] Introduce $wgAutoconfigureParsoid

Reason:

https://gerrit.wikimedia.org/r/610272

Change 612389 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump Parsoid to v0.12.0-a20

https://gerrit.wikimedia.org/r/612389

Change 612392 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[operations/mediawiki-config@master] Don't use the 'zeroconf' configuration for VisualEditor

https://gerrit.wikimedia.org/r/612392

Change 612392 merged by jenkins-bot:
[operations/mediawiki-config@master] Don't use the 'zeroconf' configuration for VisualEditor

https://gerrit.wikimedia.org/r/612392

Mentioned in SAL (#wikimedia-operations) [2020-07-13T18:53:25Z] <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: T248343 Don't use the 'zeroconf' configuration for VisualEditor (duration: 00m 55s)

Restricted Application added a project: User-Ryasmeen. · View Herald TranscriptJul 13 2020, 6:59 PM

Change 610158 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Zero-configuration VisualEditor + PHP for MediaWiki LTS

https://gerrit.wikimedia.org/r/610158

Change 612389 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to v0.12.0-a20

https://gerrit.wikimedia.org/r/612389

Doesn't work (yet) in Flow: T260648: Zeroconf Parsoid/PHP seems to be considered as unconfigured by Flow Utils. Perhaps ContentTranslation etc also have similar problems. None of these are bundled with the LTS, though, are they? In some sense this is just a documentation error re: setup.

Nothing using Parsoid other than VE ships in the tarball, that's correct. Would be nice to fix, but not a crisis.