(BIKESHED WARNING: ParsoidOptions is very similar to ParserOptions and easily visually confused. Bikeshed a better name. T287184 proposes ParserContext or ContentContext for a similar class.)
ParserOptions contains *most* of the options affecting a wikitext parse. Unfortunately, it has some annoying issues:
- ParserOptions doubles as the cache key for ParserCache, although not all of its fields are part of the cache key! Some fields "affect the output" and are included in the key, while others are expected to be effectively constant for a given wiki (like templateCallback or currentRevisionRecordCallback) and so aren't included in the cache key even though they affect the parse. It would be better to split the "site configuration" aspects from the "parser configuration" aspects.
- Some important properties affecting the parse are not included in ParserOptions but instead are part of the Parser object or even passed as arguments to Parser::parse. Most obvious among these are Title, Revision, and User. (User is even stored twice! There's Parser::$mUser which is used unless it is null, in which case ParserOptions::getUser() is used!)
- Many of the more obscure properties of ParserOptions are actually "pass through" data. The parser itself doesn't need to know these; they are used only by specific parser function implementations or extensions. For example, {{CURRENTDAY}} uses ParserOptions::getTimestamp(), {{PAGEID}} uses ParserOptions::getSpeculativePageId(), etc. These aren't part of the cache key *or* used by the parser itself.
- As alluded to above, currentRevisionRecordCallback and friends are a somewhat orthogonal interface which is about configuring how page titles are mapped into revision records at a site level. It has a little bit to do with Parser configuration, but is mostly out of place in ParserOptions.
- Title, Revision, and User objects are very complicated and can't be effectively extracted (at this time) from the MediaWiki codebase. Parsoid must access only primitive values associated with these objects, for example the title as a prefixed string.
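The first issue above (options that affect the parse but are left out of the cache key) can be illustrated with a minimal sketch. SketchOptions, its two option groups, and its hashing scheme are all hypothetical stand-ins invented for illustration; this is not how core's ParserOptions is actually implemented.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for ParserOptions; NOT the real MediaWiki class.
final class SketchOptions {
    /** @var array<string,mixed> options that vary the rendered output */
    private array $cacheVarying;
    /** @var array<string,mixed> options assumed constant for a given wiki */
    private array $siteConstant;

    public function __construct(array $cacheVarying, array $siteConstant) {
        $this->cacheVarying = $cacheVarying;
        $this->siteConstant = $siteConstant;
    }

    // Both groups are readable during a parse, so both can affect the output.
    public function get(string $name) {
        return $this->cacheVarying[$name] ?? $this->siteConstant[$name] ?? null;
    }

    // Only the cache-varying group contributes to the key (assumed scheme).
    public function cacheKey(): string {
        $opts = $this->cacheVarying;
        ksort($opts); // make the key independent of insertion order
        return md5(serialize($opts));
    }
}

$a = new SketchOptions(['thumbsize' => 5], ['templateCallback' => 'callbackA']);
$b = new SketchOptions(['thumbsize' => 5], ['templateCallback' => 'callbackB']);
// Same key, even though the two objects would fetch templates differently.
var_dump($a->cacheKey() === $b->cacheKey());
```

Keeping the two groups in separate objects, rather than in one class as here, is roughly the "site configuration" versus "parser configuration" split suggested above.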
Parsoid contains a somewhat orthogonal organization of parser-affecting state, which for historical reasons was mostly influenced by the way in which various bits of the state were exported via the action API. In Parsoid we have SiteConfig (site properties, which do not vary per parse), PageConfig (title, user, and revision information; the most direct analog to the legacy ParserOptions), and DataConfig (maps titles and revisions to article text, most similar to the currentRevisionRecordCallback functions from ParserOptions).
It is probably futile to completely replace ParserOptions with PageConfig/SiteConfig/DataConfig. In particular, a lot of legacy code depends on the pass-through nature of ParserOptions to get access to obscure wiki details, and the fact that ParserOptions is tightly coupled to ParserCache as its key complicates a potential replacement further.
This task proposes to expose a subset of ParserOptions to Parsoid using a similar approach to that suggested in T287216 for ParserOutput:
- Create a new ParsoidOptions interface in Parsoid. Initially it can be completely empty. We will add only those methods to ParsoidOptions which allow access to information directly used by Parsoid. Methods which are only pass-through for parser functions/extensions will remain in ParserOptions and not be copied to ParsoidOptions.
- Core's ParserOptions will implement Parsoid's ParsoidOptions. Parsoid will have access to a ParsoidOptions object and can pass it to hooks, to parser functions, etc. When running in integrated mode this will be a full ParserOptions object. When running standalone this will initially be a small wrapper around PageConfig and friends. (Core ParserOptions could actually *extend* ParsoidOptions instead of *implement* it; that is, ParsoidOptions could be an abstract class, not an interface, since ParserOptions does not currently have a superclass in core. But I think it's safer to make ParsoidOptions an interface for now.)
- Future work: move Title and Revision and User from Parser to ParserOptions. These wouldn't be directly exposed to ParsoidOptions, however. Instead we'd move Parsoid's PageConfig methods (which return primitive types, not Title/Revision/User objects) to ParsoidOptions. See below.
interface ParsoidOptions {
    // currently PageConfig::getTitle()
    function getTitlePrefixedText(): string;
    // currently PageConfig::getNs()
    function getTitleNamespace(): int;
    // currently PageConfig::getPageId()
    function getPageId(): int;
    function getRevisionId(): ?int;
}

class ParserOptions implements ParsoidOptions {
    // Parsoid doesn’t have access to a full Title object.
    // That’s a core-only concept.
    function getTitle(): Title {
        return $this->title;
    }
    // These functions aren’t needed by core, since it has
    // direct access to the Title object, but they
    // implement the ParsoidOptions interface to allow
    // Parsoid to get at the Title details.
    function getTitlePrefixedText(): string {
        return $this->title->getPrefixedText();
    }
    function getTitleNamespace(): int {
        return $this->title->getNamespace();
    }
    …
    // Similarly, Parsoid doesn’t have access to a full
    // Revision object; only core does, in ParserOptions.
    function getRevision(): Revision {
        return $this->revision;
    }
    // But these functions from ParsoidOptions are what
    // Parsoid uses (currently PageConfig::getRevisionId())
    function getRevisionId(): ?int {
        return $this->revision ? $this->revision->getId() : null;
    }
}
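For the standalone case, the "small wrapper around PageConfig and friends" might look roughly like the sketch below. StubPageConfig is a hypothetical stand-in that borrows only the four accessor names mentioned above (getTitle, getNs, getPageId, getRevisionId). PageConfigOptions and describePage are also invented for illustration, and the namespace is assumed to be an integer id. The interface is repeated so the sketch is self-contained.

```php
<?php
declare(strict_types=1);

// The proposed interface, repeated here so the sketch stands alone.
interface ParsoidOptions {
    public function getTitlePrefixedText(): string;
    public function getTitleNamespace(): int; // assumption: integer namespace id
    public function getPageId(): int;
    public function getRevisionId(): ?int;
}

// Hypothetical stand-in for Parsoid's PageConfig; primitive values only.
final class StubPageConfig {
    private string $title;
    private int $ns;
    private int $pageId;
    private ?int $revId;

    public function __construct(string $title, int $ns, int $pageId, ?int $revId) {
        $this->title = $title;
        $this->ns = $ns;
        $this->pageId = $pageId;
        $this->revId = $revId;
    }
    public function getTitle(): string { return $this->title; }
    public function getNs(): int { return $this->ns; }
    public function getPageId(): int { return $this->pageId; }
    public function getRevisionId(): ?int { return $this->revId; }
}

// Hypothetical standalone-mode adapter: delegates everything to PageConfig.
final class PageConfigOptions implements ParsoidOptions {
    private StubPageConfig $pageConfig;

    public function __construct(StubPageConfig $pageConfig) {
        $this->pageConfig = $pageConfig;
    }
    public function getTitlePrefixedText(): string { return $this->pageConfig->getTitle(); }
    public function getTitleNamespace(): int { return $this->pageConfig->getNs(); }
    public function getPageId(): int { return $this->pageConfig->getPageId(); }
    public function getRevisionId(): ?int { return $this->pageConfig->getRevisionId(); }
}

// Code written against the interface cannot tell which mode it is running in.
function describePage(ParsoidOptions $opts): string {
    $rev = $opts->getRevisionId();
    return sprintf(
        '%s (ns %d, page %d, rev %s)',
        $opts->getTitlePrefixedText(),
        $opts->getTitleNamespace(),
        $opts->getPageId(),
        $rev === null ? 'none' : (string)$rev
    );
}

echo describePage(new PageConfigOptions(new StubPageConfig('Talk:Main Page', 1, 42, null))), "\n";
```

In integrated mode the same describePage() call would receive core's ParserOptions instead, which is the point of having core implement the Parsoid-side interface.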