Page MenuHomePhabricator

Adopt a stable interface policy refactor
Open, MediumPublic

Description

Review of https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/900380 renewed some debate about proxying utils for extensions.

@ssastry summarized,

TLDR of discussion: we want to move PHPUtils and DOMUtils to the src/Core/Utils namespace and make them directly usable. Everything else in src/Utils should be proxied through src/Ext.

@cscott adds,

Core and Ext are the only two namespaces that should be used outside parsoid, and everything in them is @stable unless otherwise marked; everything outside those two namespaces is @internal (and shouldn't be marked otherwise).

Event Timeline

Arlolra triaged this task as Medium priority.Fri, Mar 17, 10:10 PM
Arlolra moved this task from Needs Triage to Tech Debt / Big changes on the Parsoid board.

Core and Ext are the only two namespaces that should be used outside parsoid, and everything in them is @stable unless otherwise marked; everything outside those two namespaces is @internal (and shouldn't be marked otherwise)."

I can understand this guideline for extensions but I will note that this might be hard guideline to follow for core because a lot of things in Parsoid are used in core. Here is a list of all Parsoid classes already used in Core along with use counts:

21 use Wikimedia\Parsoid\Parsoid;

10 use Wikimedia\Parsoid\ParserTests\TestMode as ParserTestMode;
10 use Wikimedia\Parsoid\ParserTests\Test as ParserTest;
 4 use Wikimedia\Parsoid\ParserTests\TestFileReader;
 1 use Wikimedia\Parsoid\ParserTests\StyleTag as ParsoidStyleTag;
 1 use Wikimedia\Parsoid\ParserTests\RawHTML as ParsoidRawHTML;
 1 use Wikimedia\Parsoid\ParserTests\ParserHook as ParsoidParserHook;
 1 use Wikimedia\Parsoid\ParserTests\Article as ParserTestArticle;

16 use Wikimedia\Parsoid\Core\PageBundle;
11 use Wikimedia\Parsoid\Core\ClientError;
10 use Wikimedia\Parsoid\Core\TOCData;
 9 use Wikimedia\Parsoid\Core\ResourceLimitExceededException;
 6 use Wikimedia\Parsoid\Core\SectionMetadata;
 5 use Wikimedia\Parsoid\Core\SelserData;
 4 use Wikimedia\Parsoid\Core\ContentMetadataCollector;
 1 use Wikimedia\Parsoid\Core\ContentMetadataCollectorCompat;

 9 use Wikimedia\Parsoid\Config\SiteConfig;
 7 use Wikimedia\Parsoid\Config\PageConfig;
 5 use Wikimedia\Parsoid\Config\DataAccess;
 3 use Wikimedia\Parsoid\Config\PageContent as IPageContent;
 2 use Wikimedia\Parsoid\Config\PageConfig as IPageConfig;
 1 use Wikimedia\Parsoid\Config\SiteConfig as ISiteConfig;
 1 use Wikimedia\Parsoid\Config\PageConfigFactory;
 1 use Wikimedia\Parsoid\Config\DataAccess as IDataAccess;
 1 use Wikimedia\Parsoid\Config\Api\SiteConfig as ApiSiteConfig;
 1 use Wikimedia\Parsoid\Config\Api\PageConfig as ApiPageConfig;
 1 use Wikimedia\Parsoid\Config\Api\DataAccess as ApiDataAccess;

 9 use Wikimedia\Parsoid\Utils\ContentUtils;
 6 use Wikimedia\Parsoid\Utils\DOMUtils;
 4 use Wikimedia\Parsoid\Utils\DOMCompat;
 2 use Wikimedia\Parsoid\Utils\Timing;
 1 use Wikimedia\Parsoid\Utils\Utils;
 1 use Wikimedia\Parsoid\Utils\ScriptUtils;
 1 use Wikimedia\Parsoid\Utils\DOMDataUtils;

 5 use Wikimedia\Parsoid\DOM\Document;
 1 use Wikimedia\Parsoid\DOM\Element;

 1 use Wikimedia\Parsoid\Mocks\MockMetrics;

 1 use Wikimedia\Parsoid\Ext\ExtensionModule;

One answer is to say many of those namespaces should be embedded in Core and that other utils should all get proxied through Ext/ and that is possible, but let us think this through a bit. We can definitely mandate namespaces as entirely internal-Parsoid that core shouldn't have any access to, for example (Wt2html, Html2wt, Tokens, NodeData, Logger). But, I expect (DOM, Config, Wikitext, Language, ParserTests, Utils) will have classes that will be needed in m/w core in some form.

My interpretation would be that @internal is acceptable to use in core, but maybe we should be a little more precise, because it's not clear that we want to use *all* of this stuff indiscriminately in core -- at the very least we ought to mark the things used in core so that we don't 'accidentally' break core by changing them in Parsoid.

Might be worth distinguishing three categories of Parsoid APIs; I'm going to make up @tags for them:

  • For use by Extensions: @stable-ext
  • For use by core: @internal-core
  • For use by parsertests @internal-tests
  • For use by the REST API @internal-rest

The first category is closest to @stable in core's Stable Interface Policy, since there are third parties involved so deprecation/waiting a release cycle/etc is probably warranted.

The second and third categories are more like @internal in core's Stable Interface Policy -- they can be changed without extensive deprecation etc, but I think we still want a little bit of control/enforcement over what exactly is used here to reduce the possibility of unintentional breakage in production.

The last category is perhaps the hardest, since historically the REST API was part of Parsoid, not core, and might want more internal access to Parsoid that otherwise warranted. Daniel has talked about pulling this API into an extension, though. It seems like this *might* warrant a separate marking, since methods exported for use by the REST API may not be appropriate for general third party use.

Going through subbu's list:

The *\Config\* stuff is obviously meant to be used by core, but probably not meant to be used by extensions except possibly via the MediaWiki service which supplies it. (But maybe not even then.) @internal-core

The *\DOM\* stuff is similarly meant to be used, so should probably just be added to the stable interface policy. @stable

There's a bunch of stuff in ParserTests which is intended to be used *in dev mode only* (ie for running tests) but not for production code. I'm not sure how to enforce that, exactly. We used to keep the parser test stuff in the composer -dev path, but that didn't work with how CI was running tests for some reason. Needs investigation. @internal-tests

The troublesome bits are the stuff in *\Utils\*. I can't tell if that's more incidental use in parser tests (which is /probably/ ok?) or more subtle dependencies between core and Parsoid (perhaps in the REST API?). I think we'd already talked about moving DOMUtils/DOMCompat to Core (or to a separate library). But the other stuff probably needs closer examination.

Just as a quick add-on -- I think the real blocking child task here is to configure CI (phan?) to enforce some of these guidelines. If we want to have fine-grained stability markers ("some methods in Util are @stable and some are @internal") instead of the big-hammer "everything in this namespace is @stable") then we need some tooling support.