Page MenuHomePhabricator

Proposal: Introduce ParserInput object
Open, Needs TriagePublic

Description

Problem: We want to be able to provide rich context to the parser, which is currently done by providing a RevisionRecord. However, RevisionRecords should only exist on "proper" pages, not special pages (T380536). But wikitext needs to be able to be parsed in the context of special pages (or no page at all) (T381982).

Solution: introduce a ParserInput object, to match parserOptions and ParserOutput. ParserInput should be able to replace all kinds of parser input we currently use, namely strings, Content objects, and RevisionRecords. It's conceptually similar to Parsoid's PageConfig object.

Interface draft:

class ParserInput {
    public static function newFromWikitext( string $wikitext, PageReference $page  );
    public static function newFromContent( TextContent $content, PageReference $page  );
    public static function newFromSlots( RevisionSlots $slots, PageReference $page  );
    public static function newFromRevision( RevisionRecord $slots );

    public function getPage(): PageReference;
    public function getPageLanguage(): LanguageCode;

    public function getContent(): TextContent;
    public function getSlots(): RevisionSlots;
    public function getRevision(): ?RevisionRecord;
}

This approach would also allow us to get rid of currentRevisionRecordCallback in parserOptions. Note that we'd probably need to keep speculativePageIdCallback and speculativeRevIdCallback, since we want to record if and when they are used. Incorrect, per discussion. Instead: This might allow us to get rid of speculativePageIdCallback and speculativeRevIdCallback in ParserOptions, but we would still need to record if and when they are acutally used.

Event Timeline

This approach would also allow us to get rid of currentRevisionRecordCallback in parserOptions.

Not really. You'd still need a way for extensions to generate the parser input, given a title.

Response in T380536#10409816 to the effect that this doesn't really solve the issue of the input type for RevisionRenderer (and MCR more generally).

I agree with @Tgr that this doesn't do much about the currentRevisionRecordCallback, which is part of the process used to map a LinkTarget to a "revision" (really maybe a fragment or content?).

The use cases in production are:

  • Default implementation maps {{Foo}} to "the latest revision of link target Foo (on the current wiki)"
  • Flagged revision uses this to map {{Foo}} to "flagged revision XXX of link target Foo (on the current wiki)"
  • Translate uses this to map {{Foo}} to {{Foo/de}} if the current title ends in /de. (This then recurses upwards to match a specific revision of this new title.)
    • If Foo/de doesn't exist, I believe this returns a FakeRevisionRecord for Foo/de so that Foo/de can be added to the page dependencies, since if Foo/de is created later, the output of this page will change.
  • And of course special page transclusion is a thing, so in theory this could return content (a "revision"?) for a special page.
  • There's code in the legacy parser to resolve {{MediaWiki:message-name}} to the result of wfMessage(...)->plain() as well.
  • Redirects are done via this mechanism as well: if the RevisionRecord::hasSlot(SlotRecord::MAIN) then the content corresponding to the main slot is used for Content::getWikitextForTransclusion() as well as Content::getRedirectTarget

This is done via ParserOptions::getTemplateCallback() which defaults to Parser::statelessFetchTemplate. Parser::statelessFetchTemplate calls the onBeforeParserFetchTemplateRevisionRecord hook and then invokes Parser::fetchCurrentRevisionRecordOfTitle. Parser::fetchCurrentRevisionRecordOfTitle calls ParserOptions::getCurrentRevisionRecordCallback() which defaults to Parser::statelessFetchRevisionRecord.

Broadly speaking, we have an extensible transclusion mechanism used to map a LinkTarget to a list of RevisionRecords (specific title and revision id) which are recorded for dependency purposes. The main slot content associated with those RevisionRecord is used to fetch a Content, which is queried for redirect status and for "wikitext for transclusion" and ultimately returned as a [wikitext string, Title] pair. The title is used to index a memoization cache, to identify self-transclusions, and to render a link to the title if the transclusion is not found; it could probably be replaced with a LinkTarget which would work as well for those purposes.

There's some special purpose handling of special page transclusion and message page transclusion that could probably be addressed by a slightly broader definition of RevisionRecord.

I agree with @Tgr that this doesn't do much about the currentRevisionRecordCallback, which is part of the process used to map a LinkTarget to a "revision" (really maybe a fragment or content?).

Right, this is used to load templates. I was confused about that. Silly me.

So maybe this could be used to get rid of speculativePageIdCallback and speculativeRevIdCallback. But we'd still need a way to track whether the outpput actually depends on the page ID or revision ID.

So maybe this could be used to get rid of speculativePageIdCallback and speculativeRevIdCallback.

IIRC those exist because we used to guess the page ID and rev ID when rendering the page before save (as part of the "is this a valid content object that can be saved?" tests), and reused the parser output when the guess was correct. Do we not do that anymore?

IIRC those exist because we used to guess the page ID and rev ID when rendering the page before save (as part of the "is this a valid content object that can be saved?" tests), and reused the parser output when the guess was correct. Do we not do that anymore?

Yes we still do that - my thinking was that they wouldn't have to be callbacks injected into ParserOptions anymore. You'd have a PreviewParserInput class that directly implements the functionality.

ParserOptions is used as a key for ParserCache (in a rather indirect way). Having callbacks in there is odd.

On the other hand, ParserOptions is tracking what options the output actually depends on. We'd probably need a similar mechanism for ParserInput - or at least indicate that certain ParserInput objects producing non-cacheable outputs (because they e.g. depend on speculative IDs).

I'd generally try to avoid classes which are neither services (stateless) nor value objects (pure state). A PreviewParserInput that mixes page content with some DB access logic seems unwieldy. Since (AFAIK) we only ever use one speculative rev / page ID mechanism, we just need to track whether 1) the parse is actually associated with a real page / revision 2) whether it already has a page / rev id, and have a service do the speculating. I don't think either the ParserInput or ParserOptions needs to depend on that service.

I'd generally try to avoid classes which are neither services (stateless) nor value objects (pure state). A PreviewParserInput that mixes page content with some DB access logic seems unwieldy.

I agree that stateful objects should be avoided when possible. However, it seems like PreviewParserInput would be clearer than having callbacks in ParserOptions.

Since (AFAIK) we only ever use one speculative rev / page ID mechanism, we just need to track whether 1) the parse is actually associated with a real page / revision 2) whether it already has a page / rev id, and have a service do the speculating.

This would be true for allmost all parses. What we need to record is whether the speculative ID was actually used during parsing. In the 99.99% of cases where it's not used, we don't have to re-parse the page after saving the revision - we can re-use the rendering we did before saving (from an opportunistic stash, or triggered by an edit filter like AbuseFilter).

I don't think either the ParserInput or ParserOptions needs to depend on that service.

Neither needs to depends on the database. We can unconditionally do the lookup before, so we can avoid the database access in PreviewParserInput . But we'd still need to do the access tracking in a stateful service. We currently do it in ParserOutput (see ParserOutput::SPECULATIVE_FIELDS) via recordOption() called by ParserOptions.