This is a placeholder / tracker task to work through different Parsoid boundaries / interfaces.
As a separate library component that MediaWiki depends on, Parsoid has three distinct identities.
1. Parsoid as a MediaWiki-integrated wikitext engine
In this incarnation, various clients (VE, Flow, DiscussionTools, ContentTranslation, etc.) interact with Parsoid as a MediaWiki component for wikitext functionality. These clients currently interact with Parsoid via the REST API (and hence over HTTP), but the plan is to have them invoke Parsoid directly. To transition these clients, we need to flesh out Parsoid's wikitext handling interface. This is probably blocked on various Parser.php refactoring tasks that will eventually result in an abstract parsing interface / class. [ TODO: Tag the various phab tasks ]
2. Parsoid as an extensible MediaWiki component
In this incarnation, Parsoid provides hooks for extensions to extend wikitext markup / functionality. This is more or less adequately handled by the Parsoid Extension API.
3. Parsoid as a standalone wikitext engine
In this incarnation (used primarily in development mode), Parsoid exists independently of a MediaWiki installation but can make MediaWiki API requests to any network-accessible wiki to get wiki context; this is similar to how Parsoid/JS operated. This identity requires that Parsoid have no dependency on any MediaWiki core classes.
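To make the dependency constraint concrete, here is a minimal sketch (in Python rather than Parsoid's PHP) of the shape this takes: parser-side code sees only an abstract site-config interface, and the standalone implementation fills it from a remote wiki's Action API (`action=query&meta=siteinfo`). All class and method names are hypothetical, loosely modeled on the role of the config classes in src/Config. The HTTP transport is injected so no network access is assumed, and the response parsing assumes the `formatversion=2` JSON shape.

```python
"""Sketch: parser code depends only on an abstract SiteConfig, so the
standalone mode can populate it from a remote wiki's Action API.
All names here are hypothetical illustrations, not Parsoid's actual API."""
from abc import ABC, abstractmethod
import json
from typing import Callable, Dict
from urllib.parse import urlencode


class SiteConfig(ABC):
    """Wiki context the parser needs; no MediaWiki core types appear here."""

    @abstractmethod
    def site_name(self) -> str: ...

    @abstractmethod
    def lang(self) -> str: ...

    @abstractmethod
    def namespace_name(self, ns_id: int) -> str: ...


class ApiSiteConfig(SiteConfig):
    """Standalone-mode implementation backed by api.php?action=query&meta=siteinfo.

    `fetch` (URL -> response body) is injected so the transport can be
    swapped or stubbed; a real setup might use urllib or an HTTP client."""

    def __init__(self, api_url: str, fetch: Callable[[str], str]) -> None:
        params = {
            "action": "query",
            "meta": "siteinfo",
            "siprop": "general|namespaces",
            "format": "json",
            "formatversion": "2",
        }
        data = json.loads(fetch(f"{api_url}?{urlencode(params)}"))["query"]
        self._general = data["general"]
        # Assumes the formatversion=2 shape, where each namespace has a "name" key.
        self._namespaces: Dict[int, str] = {
            int(ns_id): ns["name"] for ns_id, ns in data["namespaces"].items()
        }

    def site_name(self) -> str:
        return self._general["sitename"]

    def lang(self) -> str:
        return self._general["lang"]

    def namespace_name(self, ns_id: int) -> str:
        return self._namespaces[ns_id]


def describe(config: SiteConfig) -> str:
    """Parser-side code: it only ever sees the abstract SiteConfig."""
    return f"{config.site_name()} ({config.lang()}), ns 1 = {config.namespace_name(1)}"


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Canned response standing in for a real network request.
        return json.dumps({"query": {
            "general": {"sitename": "Example Wiki", "lang": "en"},
            "namespaces": {"0": {"id": 0, "name": ""}, "1": {"id": 1, "name": "Talk"}},
        }})

    print(describe(ApiSiteConfig("https://wiki.example.org/w/api.php", fake_fetch)))
    # prints: Example Wiki (en), ns 1 = Talk
```

An integrated mode would supply a different SiteConfig subclass backed by in-process MediaWiki services; `describe` would not change.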
While all of this mostly works today, these different modes of operation make conflicting demands of Parsoid. For example, Parsoid as a standalone wikitext engine is somewhat at odds with Parsoid as a MediaWiki-integrated wikitext engine. Parsoid.php, along with the config classes in src/Config and src/Core, provides a library / calling interface into Parsoid to support Identity 1. However, as we expand the capabilities of the MediaWiki-integrated wikitext engine (to take over all the wikitext functionality currently provided by Parser.php and affiliated components in MediaWiki core), the standalone mode is going to come under increasing stress, because it must avoid any hard dependencies on MediaWiki core classes.
Additionally, Parsoid as an extensible MediaWiki component (Identity 2) puts similar pressure on the standalone mode. Parser extensions have come to rely on a lot of MediaWiki core functionality through explicit dependencies on classes such as Parser, ParserOutput, and ParserOptions. So, to reduce the burden of making all those extensions compatible with Parsoid ( T258838 ), Parsoid's Extension API would need to provide as much compatible functionality as possible. One obvious option would be for Parsoid to use those existing classes directly, but that would break Identity 3.
So, this requires us to think through and flesh out a suitable boundary / interface that resolves the conflicting demands of Identities 1 & 2 vs Identity 3.
One solution that @cscott has proposed is to create interfaces and/or abstract base classes for the Parser, ParserOptions and ParserOutput classes that live in the Parsoid repository and that Parsoid and MediaWiki then implement / extend. We could even consider moving some classes out of MediaWiki into the Parsoid repository, as long as they don't drag unrelated code along with them. In addition, we would have to rely on various other tricks: for example, duck typing in some places, and/or stub classes in Parsoid together with bridging code in MediaWiki that connects those stub classes to existing MediaWiki functionality.
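A minimal sketch of this proposal, again in Python rather than PHP and using only hypothetical names: a ParserOutput-like interface owned by the library, Parsoid's own implementation for standalone use, a stand-in for core's existing class, and a MediaWiki-owned bridge that adapts the interface to it. The `Protocol` at the end illustrates the duck-typing alternative, where no shared base class is needed. This shows the general interface/adapter pattern, not a design that has been settled on.

```python
"""Sketch of library-owned interfaces plus a MediaWiki-side bridge.
All names are hypothetical; this is not Parsoid's or MediaWiki's actual API."""
from abc import ABC, abstractmethod
from typing import Dict, List, Protocol


class ParserOutputInterface(ABC):
    """Would live in the Parsoid repo: the surface extensions may rely on."""

    @abstractmethod
    def add_category(self, name: str, sort_key: str = "") -> None: ...

    @abstractmethod
    def get_categories(self) -> List[str]: ...


class StandaloneParserOutput(ParserOutputInterface):
    """Parsoid's own implementation, used without MediaWiki (Identity 3)."""

    def __init__(self) -> None:
        self._categories: List[str] = []

    def add_category(self, name: str, sort_key: str = "") -> None:
        self._categories.append(name)

    def get_categories(self) -> List[str]:
        return list(self._categories)


class CoreParserOutput:
    """Stand-in for an existing core class with its own, different API."""

    def __init__(self) -> None:
        self.category_links: Dict[str, str] = {}

    def addCategory(self, name: str, sort_key: str) -> None:
        self.category_links[name] = sort_key


class MWParserOutputBridge(ParserOutputInterface):
    """Would live in MediaWiki: forwards interface calls to the core class."""

    def __init__(self, core: CoreParserOutput) -> None:
        self._core = core

    def add_category(self, name: str, sort_key: str = "") -> None:
        self._core.addCategory(name, sort_key)

    def get_categories(self) -> List[str]:
        return list(self._core.category_links)


class HasCategories(Protocol):
    """Duck-typing alternative: any object with this method qualifies (checked statically)."""

    def get_categories(self) -> List[str]: ...


def my_extension_hook(out: ParserOutputInterface) -> None:
    """A hypothetical extension written against the library interface only."""
    out.add_category("Pages using MyExtension")


def count_categories(out: HasCategories) -> int:
    return len(out.get_categories())


if __name__ == "__main__":
    for out in (StandaloneParserOutput(), MWParserOutputBridge(CoreParserOutput())):
        my_extension_hook(out)
        print(type(out).__name__, count_categories(out))
    # prints:
    # StandaloneParserOutput 1
    # MWParserOutputBridge 1
```

The extension hook names only the library-owned interface, so it runs unchanged in either mode; the trade-off is that MediaWiki has to keep the bridge in sync as its core class evolves.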
Some of these troubles are related to the ones we experienced with Parsoid/JS as a separate service. The difference is that now, with all of this code running in the same process, the boundary / interface can be drawn more freely, without having to write API endpoints or suffer performance hits.
This task exists primarily to work out these details, or to determine the point at which this separation becomes too burdensome and we are forced to abandon it. Either way, it is a beneficial exercise, since it lets us modularize as much functionality as possible into related classes.