
Flesh out Parsoid's interface / boundary wrt MediaWiki that lets it operate in standalone mode in the face of increasing MediaWiki integration
Open, Medium, Public

Description

This is a placeholder / tracker task to work through different Parsoid boundaries / interfaces.

As a separate library component that MediaWiki depends on, Parsoid has many identities.

1. Parsoid as a MediaWiki-integrated wikitext engine
In this incarnation, various clients (VE, Flow, DiscussionTools, ContentTranslation, etc.) interact with Parsoid as a MediaWiki component for wikitext functionality. Right now, these clients interact with Parsoid via the REST API (and hence HTTP), but the plan is to have them invoke Parsoid directly. To transition clients over, we need to flesh out Parsoid's wikitext handling interface. This is probably blocked on various Parser.php refactoring work that will eventually result in an abstract parsing interface / class. [ TODO: Tag the various phab tasks ]

2. Parsoid as an extensible MediaWiki component
In this incarnation, Parsoid provides hooks for extensions to extend wikitext markup / functionality. This is more or less adequately handled by the Parsoid Extension API.

3. Parsoid as a standalone wikitext engine
In this incarnation (primarily in development mode), Parsoid exists independent of a MediaWiki installation but can make MediaWiki API requests to any network-accessible wiki to get wiki context -- this is similar to how Parsoid/JS operated. This identity obviously requires Parsoid to not have any dependency on any MediaWiki core classes.

While all of this mostly works today, these different modes of operation make conflicting demands of Parsoid. For example, Parsoid as a standalone wikitext engine is somewhat at odds with Parsoid as a MediaWiki-integrated wikitext engine. Parsoid.php and other config classes in src/Config and src/Core provide a library / calling interface into Parsoid to support Identity 1. However, as we expand the capabilities of the MediaWiki-integrated wikitext engine (to take over all the wikitext functionality currently provided by Parser.php and affiliated components in MediaWiki core), the standalone mode is going to come under increasing stress because of the need to avoid any hard dependencies on MediaWiki core classes.

Additionally, Parsoid as an extensible MediaWiki component (Identity 2) puts similar pressure on the standalone mode. Parser extensions have come to rely on a bunch of functionality in MediaWiki core (via explicit dependencies on MediaWiki classes like Parser, ParserOutput, and ParserOptions). So, in order to reduce the burden of making all those extensions compatible with Parsoid (T258838), Parsoid's Extension API would need to provide as much compatible functionality as possible. While one obvious option would be for Parsoid to leverage those existing classes, this would then break Identity 3.

So, this requires us to think through and flesh out a suitable boundary / interface that resolves the conflicting demands of Identities 1 & 2 vs Identity 3.

One solution that @cscott has proposed is to create interfaces and/or abstract base classes for the Parser, ParserOptions and ParserOutput classes that live in the Parsoid repository and which Parsoid & MediaWiki then implement / extend. We could even consider moving some classes out of MediaWiki into the Parsoid repository as long as they don't drag other unrelated code along with them. In addition, we would have to rely on various other tricks: for example, duck typing in some places, and/or stub classes in Parsoid paired with bridging code in MediaWiki that connects those stubs to existing MediaWiki functionality.
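To make the interface-plus-bridge idea concrete, here is a rough sketch in Python (not PHP, and not actual Parsoid or MediaWiki code). Every name in it (`ParserOutputInterface`, `StubParserOutput`, `FakeCoreParserOutput`, `CoreParserOutputBridge`, `tag_page`) is hypothetical and chosen only for illustration. It assumes the shared contract lives on the Parsoid side, a stub satisfies it in standalone mode, and a bridge on the MediaWiki side forwards to an existing core object.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ParserOutputInterface(ABC):
    """Hypothetical contract that would live in the Parsoid repository."""

    @abstractmethod
    def add_category(self, name: str, sort_key: str = "") -> None: ...

    @abstractmethod
    def get_categories(self) -> Dict[str, str]: ...


class StubParserOutput(ParserOutputInterface):
    """Standalone-mode stub: needs nothing from MediaWiki core."""

    def __init__(self) -> None:
        self._categories: Dict[str, str] = {}

    def add_category(self, name: str, sort_key: str = "") -> None:
        self._categories[name] = sort_key

    def get_categories(self) -> Dict[str, str]:
        return dict(self._categories)


class FakeCoreParserOutput:
    """Stand-in for an existing core class with its own method names."""

    def __init__(self) -> None:
        self.cats: Dict[str, str] = {}

    def addCategory(self, category: str, sort: str) -> None:
        self.cats[category] = sort


class CoreParserOutputBridge(ParserOutputInterface):
    """Would live on the MediaWiki side: adapts the core object to the contract."""

    def __init__(self, core_output: FakeCoreParserOutput) -> None:
        self._core = core_output

    def add_category(self, name: str, sort_key: str = "") -> None:
        self._core.addCategory(name, sort_key)

    def get_categories(self) -> Dict[str, str]:
        return dict(self._core.cats)


def tag_page(output: ParserOutputInterface) -> None:
    """Parsoid-side code: only ever sees the interface."""
    output.add_category("Examples", "ex")


# The same Parsoid-side function runs against either implementation.
standalone_output = StubParserOutput()
tag_page(standalone_output)

core_object = FakeCoreParserOutput()
tag_page(CoreParserOutputBridge(core_object))
```

In this sketch the Parsoid-side code never imports anything from the core side, which is the property Identity 3 needs. The cost is that every piece of core functionality extensions want has to be mirrored in the contract.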

Some of these troubles are not entirely unrelated to the troubles we experienced with Parsoid/JS as a separate service. It is just that now, since we are running all of this code in the same process, the boundary / interface can be a bit more arbitrary without having to write API endpoints or suffer performance hits.

This task exists primarily to figure out some of these details, or to figure out at what point we are forced to abandon this separation as being too burdensome. Either way, this is a beneficial exercise since it lets us modularize as much functionality in related classes as possible.

Event Timeline

ssastry triaged this task as Medium priority. Oct 6 2020, 9:15 PM

An alternate way of thinking about this is to layer Parsoid: the integrated-mode layer sits on top of the standalone layer, and not all of the integrated MediaWiki functionality is available in standalone mode. Parser* classes are then only used at the boundary between those two layers.

However, this doesn't resolve the conflict between Parsoid-as-an-extensible-MediaWiki-component and Parsoid-as-standalone-wikitext-engine, because extensions would still be referencing ParserOutput and ParserOptions objects and interfaces, not to mention any other MediaWiki classes they might use.

One option is to have all of the ParserOutput and ParserOptions functionality go through ParsoidExtensionAPI. If an extension references any MediaWiki classes directly, that voids the standalone use warranty for that extension.
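A minimal sketch of what "everything goes through the extension API" could look like from an extension's point of view, again in Python for brevity. The `ExtensionAPI` class, its methods, and the `render_gallery_tag` handler are all made up for this illustration and do not reflect the real ParsoidExtensionAPI surface. The assumption is that options and output are plain dictionaries in standalone mode.

```python
from typing import Any, Dict


class ExtensionAPI:
    """Hypothetical facade: the only object an extension is ever handed."""

    def __init__(self, options: Dict[str, Any], output: Dict[str, Any]) -> None:
        self._options = options  # stands in for ParserOptions-like data
        self._output = output    # stands in for ParserOutput-like data

    def get_option(self, key: str, default: Any = None) -> Any:
        return self._options.get(key, default)

    def set_page_property(self, name: str, value: str) -> None:
        self._output.setdefault("properties", {})[name] = value

    def add_module(self, module: str) -> None:
        self._output.setdefault("modules", []).append(module)


def render_gallery_tag(api: ExtensionAPI, args: Dict[str, str]) -> str:
    """Hypothetical extension tag handler that touches nothing but `api`."""
    width = int(args.get("width", api.get_option("thumb_width", 120)))
    api.add_module("ext.gallery.styles")
    api.set_page_property("has_gallery", "1")
    return f'<div class="gallery" data-width="{width}"></div>'


# Standalone use: no MediaWiki classes anywhere in sight.
standalone_output: Dict[str, Any] = {}
standalone_api = ExtensionAPI({"thumb_width": 180}, standalone_output)
html = render_gallery_tag(standalone_api, {})
```

An integrated-mode facade would expose the same methods but forward them to the real core objects. An extension that reaches past the facade for a core class would only work in that integrated setup.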

As a meta-comment, this phab task seems to be conflating two different things. One of them is abstracting out a number of different interfaces *in core* in a clean way (and even there, we have Parser, ParserOutput, CacheTime, Content, etc.), and the other is abstracting runtime modes *in Parsoid* (standalone, integrated, mocked, API testing, etc.). The former is the 'Parser API'; the latter is the 'Config API' (well, it's the stuff living in the Parsoid\Config namespace right now).
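For the 'Config API' half, the runtime-mode abstraction can be sketched roughly as follows (Python, illustrative only). The class names, the `namespace_id` method, and the simplified siteinfo payload are assumptions for this example, not the actual contents of the Parsoid\Config namespace. The network call is replaced by an injected function so the sketch runs without a wiki.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional


class SiteConfig(ABC):
    """Hypothetical slice of the config contract the parser core relies on."""

    @abstractmethod
    def namespace_id(self, name: str) -> int: ...


class MockSiteConfig(SiteConfig):
    """Mocked mode: canned values, useful for tests."""

    _NAMESPACES = {"": 0, "talk": 1, "template": 10}

    def namespace_id(self, name: str) -> int:
        return self._NAMESPACES[name.lower()]


class ApiSiteConfig(SiteConfig):
    """Standalone mode: asks a wiki for site info via an injected fetcher."""

    def __init__(self, fetch_siteinfo: Callable[[], Dict[str, Any]]) -> None:
        self._fetch = fetch_siteinfo
        self._cache: Optional[Dict[str, int]] = None

    def namespace_id(self, name: str) -> int:
        if self._cache is None:
            # Fetch once, then answer later lookups from the cache.
            info = self._fetch()
            self._cache = {
                ns["name"].lower(): ns["id"]
                for ns in info["namespaces"].values()
            }
        return self._cache[name.lower()]


def fake_fetch() -> Dict[str, Any]:
    """Stands in for an HTTP request to a wiki's API (payload is simplified)."""
    return {
        "namespaces": {
            "0": {"id": 0, "name": ""},
            "10": {"id": 10, "name": "Template"},
        }
    }


# The parser core would be handed one of these and not care which.
for config in (MockSiteConfig(), ApiSiteConfig(fake_fetch)):
    template_ns = config.namespace_id("Template")
```

An integrated-mode implementation would satisfy the same contract by reading from core services directly. It is left out here since it would need MediaWiki itself.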

Also related: the 'zero parsers in core' proposal (T114194), which is an endpoint of the Parser interface issue: once we have a good parser API, in theory only the API needs to live in core, not the parser itself. In that discussion, we also distinguished 'parser' from 'template engine'.