
Make InputBox extension compatible with Parsoid
Open, Medium, Public

Description

Parsoid has its own extension API - see https://www.mediawiki.org/wiki/Parsoid/Extension_API.
In this first phase, we are targeting tag-hook extensions for migration.
The InputBox extension needs an update to work directly with Parsoid.

Known Blockers: Missing Functionality in ParsoidExtensionAPI

  • replaceVariables

Possible blockers - to be discussed

  • getTargetLanguage and getTargetLanguage->getDir
  • getTargetLanguageConverter()->convert
  • getOptions()->getUserLangObj

Should be fine

Event Timeline

Arlolra triaged this task as Medium priority. Feb 25 2021, 5:27 PM
Arlolra moved this task from Backlog to Missing Functionality on the Parsoid board.

Change 745567 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/extensions/InputBox@master] WIP: Support of Parsoid for InputBox

https://gerrit.wikimedia.org/r/745567

Change 745929 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[integration/config@master] Add 'parsoid' to 'InputBox' extension dependencies

https://gerrit.wikimedia.org/r/745929

Change 745929 merged by jenkins-bot:

[integration/config@master] Add 'parsoid' to 'InputBox' extension dependencies

https://gerrit.wikimedia.org/r/745929

Mentioned in SAL (#wikimedia-releng) [2021-12-13T19:12:54Z] <James_F> Zuul: Add 'parsoid' to 'InputBox' extension dependencies for T272943

  • getTargetLanguage and getTargetLanguage->getDir

This should be available in the Parsoid API... ish. We have PageConfig::getPageLanguage() and PageConfig::getPageLanguageDir(). There are some subtle differences between 'target', 'page', and 'interface' language. See T194815: InputBox using interface language in parser hook causing cache pollution, which is probably related; T114640 discusses some details of what "target language" means. I think T194815 says this is probably a bug that should be fixed, although hopefully we don't have to solve T85581: Parsoid page views: need to do something about {{int:}} to do it.

  • getTargetLanguageConverter()->convert

This should generate the same markup that the equivalent -{....}- wikitext markup would, and then we'll convert it in the postprocessor along with everything else. However, I strongly suspect that this is actually a case where *interface* language is required, which is covered by T85581: Parsoid page views: need to do something about {{int:}}. Interface messages are pre-rendered into variants, so a specific conversion should not be necessary.

  • getOptions()->getUserLangObj

Explicitly unsupported. Again, see the references above, this is probably a bug and either the "interface" or "page"/"content" language is what is wanted here. See T267059: Spec for precisely positioned, localized error message in Parsoid for a discussion of the {{int}} aspect; basically we'd be calling WTUtils::createLocalizationFragment here, although it wouldn't be surprising if various parts of that are not implemented yet.

Status after investigation

There are two issues with porting this extension:

  • the one related to language and localization - see Scott's comment above, this requires more discussion, but it's less blocking than the other one
  • the use of replaceVariables.

replaceVariables is a wikitext-to-wikitext transformation that (among other things) resolves variables and templates. The InputBox extension currently works by passing the content of the extension tag through replaceVariables, splitting the result on newlines, and matching the resulting lines against a variable=value pattern. To reproduce that behaviour, we have two possibilities, as discussed with @cscott and @ssastry:

  • implement and expose a "preprocess" method in Parsoid which would mimic replaceVariables. It could build on the idea of a "Pre-save transform (PST)+" (https://www.mediawiki.org/wiki/Pre-save_transforms). PST is a desired feature anyway (see T247110), and building that and then adding what's needed for full pre-processing may be a valid avenue. There are some concerns around exposing a "preprocess" method: it would be a WT2WT transformation, and wikitext strings don't compose well, so its use would require some care. Marking it as "for internal use only" may address these concerns.
  • implement and expose a way for extensions to parse wikitext as a variable/value list (so that the output format is a variable/value list), and use this to feed InputBox.

Both these avenues require significant development that may not be relevant to the critical path we're currently on for Read-Views, so we'll revisit this at a later time.
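
For reference, the current behaviour described above (preprocess the tag content, split on newlines, match variable=value lines) can be sketched roughly as follows. This is an illustrative Python sketch, not the extension's PHP code: `preprocess` is a hypothetical stand-in for replaceVariables that only substitutes `{{NAME}}` placeholders from a dict, whereas the real method also expands templates and parser functions.

```python
import re


def preprocess(wikitext, variables):
    """Hypothetical stand-in for replaceVariables (assumption, not the real API).

    Only replaces {{NAME}} with a value from `variables`; unknown names are
    left untouched.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        wikitext,
    )


def extract_options(tag_body, variables):
    """Turn the body of an <inputbox> tag into a dict of options."""
    options = {}
    for line in preprocess(tag_body, variables).split("\n"):
        name, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line, so it is not an option
        options[name.strip().lower()] = value.strip()
    return options


body = "type=search\nwidth = 30\ndefault={{PAGENAME}}\nnot an option"
print(extract_options(body, {"PAGENAME": "Main Page"}))
```

Because the split happens after preprocessing, a template that expands to several lines can contribute several options. That is why both avenues above need either a wikitext-to-wikitext step or a dedicated variable/value parsing mode.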

  • getTargetLanguageConverter()->convert

This should generate the same markup that the equivalent -{....}- wikitext markup would, and then we'll convert it in the postprocessor along with everything else. However, I strongly suspect that this is actually a case where *interface* language is required, which is covered by T85581: Parsoid page views: need to do something about {{int:}}.

No, it’s used to process content in page language. For example the following wikitext on a Serbian-language wiki/page…

<inputbox>
type=comment
hidden=yes
buttonlabel=-{sr-el:Button; sr-ec:Батн}-
</inputbox>

…produces Button in Latin-script mode and Батн in Cyrillic-script mode. Language conversion is run over only overridden values (which are expected to be in page language), not the default ones (which are in interface language). Which however leads to…

  • wfMessage

This global function is used a lot, mostly without overriding the language, so it implicitly uses user language.
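
To make the buttonlabel example above concrete, here is a toy Python sketch of how explicit -{variant:text; ...}- markup yields a different value per variant. It is an assumption-laden illustration only: the `convert` function and its regex are hypothetical, and it ignores the fallback chains, flags, and automatic script conversion that the real language converter performs.

```python
import re

# Matches explicit language-converter markup such as -{sr-el:Button; sr-ec:Батн}-
VARIANT_MARKUP = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def convert(text, variant):
    """Toy converter (hypothetical): pick the text given for `variant`."""

    def pick(match):
        rules = {}
        for part in match.group(1).split(";"):
            code, sep, value = part.partition(":")
            if sep:
                rules[code.strip()] = value.strip()
        if not rules:
            # -{text}- without rules: emit the text unconverted
            return match.group(1)
        # No rule for this variant: leave the markup as-is in this toy version
        return rules.get(variant, match.group(0))

    return VARIANT_MARKUP.sub(pick, text)


label = "-{sr-el:Button; sr-ec:Батн}-"
print(convert(label, "sr-el"))
print(convert(label, "sr-ec"))
```

In this sketch only values supplied in the tag (page language) pass through `convert`; default labels coming from interface messages would not.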

  • getOptions()->getUserLangObj

Explicitly unsupported. Again, see the references above, this is probably a bug and either the "interface" or "page"/"content" language is what is wanted here. See T267059: Spec for precisely positioned, localized error message in Parsoid for a discussion of the {{int}} aspect; basically we'd be calling WTUtils::createLocalizationFragment here, although it wouldn't be surprising if various parts of that are not implemented yet.

I just had a moment of "wait, this thing is called and is actually not used... anywhere? It's not even assigned?"
Turns out that this call to getUserLangObj exists explicitly to fragment the cache along user languages, so this is the point where I slowly and carefully back off, and we'll discuss that at the next opportunity :D