Page MenuHomePhabricator

Make language variant a parser option
Open, Needs TriagePublic

Description

Currently the ParserCache is split over the language variant via heavy use of globals:

  1. ParserOptions::optionsHash which stringifies the used ParserOptions to be used as the ParserCache key is using global state to retrieve the page language and get a language converter for it. It calls LanguageConverter::getExtraHashOptions on the converter and appends that to the key.
  2. LanguageConverter::getExtraHashOptions for languages that support variants is calling LanguageConverter::getPreferredVariant, which uses global $wgRequest to figure out what variant was requested.

This system magically works because Parser independently calls LanguageConverter::convert which as well uses the getPreferredVariant and global state to figure out the necessary language variant.

The two places need to be kept in sync, and it seems they are not, for example, ParserOptions::getDisableContentConversion is not respected in ParserOptions::optionsHash, unnecessarily splitting the ParserCache on language variant when the variant conversions might have been disabled. There probably are other bugs in this system.

Instead, neither ParserOptions nor Parser should use global state to find out the language variant.

The following needs to happen (not exact plan, a rough straw man)

  1. target variant should become a cache-splitting ParserOptions key. It should be set automatically when ParserOptions::newFromContext is called, and all the calls that require language conversion have to use the correct way of constructing the ParserOptions
  2. Parser should take the target language variant from ParserOptions and not from global state
  3. ParserOptions shouldn't append language variant from global state to the key.
  4. LanguageConverter::convert should be deprecated and replaced with LanguageConverter::convertTo

Related Objects

Event Timeline

Adding Platform Engineering as Platform Team Workboards (Green) was archived and as open tasks should have an active project tag

matmarex added subscribers: liangent, daniel, Winston_Sung and 5 others.
matmarex subscribed.

T44240 and T244219 were older tasks describing basically the same problem, but I prefer the solution proposed here, so I'm merging this way.

See also:

These overlap with this task, at least in so far that the fact that "language variant" relies on global state leaks throuhgout the application, including in Content->getPageLanguage() and thus the onPageContentLanguage hook.

Those tasks work toward containing the leakage to within the ParserOptions::newFromContext call stack.

Yeah, the way LanguageConverter::getPreferredVariant() works breaks all sorts of abstractions, and I'm frankly surprised it works at all.

I'll just mention here that there may be a benefit to making preferred variant a *postprocessing* option (aka, an option to ParserOutput::getText(), though see T293512: ParserOutput::getText() should be removed from ParserOutput) instead of a ParserOption. That is, variant conversion is usually relatively fast, and it doesn't necessarily need to split the cache. The one caveat is that the *current* legacy parser stores result of parsing the in-line -{}- rules in the *current global LanguageConverter object* and so they affect all subsequent uses of that object. LanguageConverter::convertTo() has side effects on that global state (see rMW4e4008c976fda5d3e7e823255365e72a0ac77aee and the long commit message on e7a762fd598afb0b5954a6a6ef98101eb6880357).

Anyway, I'm on board with any effort to try to clean this mess up. Just try to leave the option open to do language conversion post-cache.

Also worth noting: since LanguageConverter maintains persistent parse state, it should really not be a singleton the way that LanguageConverterFactory::getLanguageConverter() makes it, but instead should be instantiated and kept live only for the duration of a particular parse and then discarded. Probably a good step there would be adding a '$forceNew' or '$forceFresh` option to LanguageConverter::getLanguageConverter() and insisting that the parser (and/or anything else which can call convert or convertTo and thus add new rules to LanguageConverter state) start with a fresh language converter.

(Note that ::autoConvert() *claims* that "this function would not parse the conversion rules" but it actually calls recursiveConvertTopLevel() inside its attribute parsing code, so rules embedded inside attribute values will indeed be parsed and affect the global language converter: T346911.)

Also worth noting: since LanguageConverter maintains persistent parse state, it should really not be a singleton the way that LanguageConverterFactory::getLanguageConverter() makes it

Filed as T349064.

See also T341244: ParserOptions and Title::getPageViewLanguage may disagree on the lang/dir where page directionality may change based on the selected variant. That should reflect the variant from the parser options as well.