
Export LanguageConverter enabled status in page info from core
Closed, ResolvedPublic

Description

PHP core enables -{...}- syntax parsing only if $wgDisableLangConversion is false (which it is for all deployed WMF wikis in production) and the page language for the current page has a defined variant. This holds even if all variants for the given page language are disabled via $wgDisabledVariants.

This is somewhat unfortunate, since it makes parsing depend on the current state of the variant tables in PHP core, which can change over time. For example, T45547 (MediaWiki needs a fictitious variant for English for easier variant development work) is complicated by the fact that merely adding a variant for English would change the parsing of enwiki by turning on -{...}- parsing on all English pages, even if the pig latin variant is disabled. It also requires you to know the page language of the current page before tokenizing the wikitext.
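As a rough illustration of the rule described above, here is a small Python model. It is not MediaWiki code: the function name, its parameters, and the variant table are hypothetical, chosen only to show that the decision ignores which variants are disabled.

```python
# Hypothetical model of the enabling rule described in this task.
# None of these names are MediaWiki APIs.

def conversion_syntax_enabled(disable_lang_conversion, page_language,
                              defined_variants, disabled_variants=frozenset()):
    """Return True if -{...}- would be parsed for a page in page_language.

    defined_variants maps a language code to the variant codes defined for it.
    disabled_variants is accepted only to show that the rule never consults it.
    """
    if disable_lang_conversion:
        return False
    return bool(defined_variants.get(page_language))


variants = {"zh": ["zh-hans", "zh-hant"]}
before = conversion_syntax_enabled(False, "en", variants)

# Adding a variant for English, even a disabled one, flips the result.
variants["en"] = ["en", "en-x-pig-latin"]
after = conversion_syntax_enabled(False, "en", variants,
                                  disabled_variants={"en-x-pig-latin"})
```

In this model `before` is false and `after` is true, which is the enwiki parsing change the task wants to avoid.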

The first proposed solution to this problem was to honor only $wgDisableLangConversion, and turn on -{...}- parsing regardless of pagelanguage. This is the cleanest solution, but it made @tstarling quite nervous. It greatly expands the number of pages LanguageConverter is enabled for, and it is generally felt that LanguageConverter is a vulnerable piece of code.

Failing that, we need to export the necessary information to allow Parsoid to determine whether LanguageConverter should be enabled for a page. Specific proposals are discussed below.


Event Timeline

Alternatively, we could dump the table of "languages with variants defined" as part of the per-wiki siteinfo.

Another wrinkle: the PHP parser doesn't take into account "disabled variants" when determining whether to parse -{...}- constructs. So a disabled variant will still enable LanguageConverter. This means you can't just create (say) en-x-pig-latin and leave it disabled, since it will still "turn on" LanguageConverter.

This is a problem that could potentially be fixed on the PHP side, and then the siteinfo would just export the *non-disabled* variants.

Looking at the code, it looks like the real deal here is whether the Language object's ->getConverter() returns a LanguageConverter or a FakeConverter. Any language with more than one variant is almost certainly going to return a non-fake converter, but it's possible for a language with only the one variant to have a non-fake converter, especially if $wgDisabledVariants comes into play.

IMO putting a flag for !$pageLang->getConverter() instanceof FakeConverter into ApiQueryInfo wouldn't be the cleanest thing to do.

If we assume that LanguageConverter::$languagesWithVariants is equivalent to both the set of languages with variants and the set of languages with non-fake converters and that it remains a fairly small set, it'd be easy to add output of variants to meta=siteinfo&siprop=languages (but it would be a breaking change for format=xml) or to add a new siprop for it. You'd then determine if LanguageConverter is enabled by looking for the presence of the list of variants for the language, regardless of whether the list has only one element.
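A minimal sketch of that lookup, in Python for illustration only. The entry shape (a "code" key plus an optional "variants" list) is an assumption about what the extended siteinfo output might look like, not an existing API format, and the function names are hypothetical.

```python
# Assumed shape of a siprop=languages entry extended with a "variants" list.
# The "variants" key is the proposal under discussion, not an existing field.

def languages_with_converter(language_entries):
    """Collect language codes whose entry carries a variants list.

    Presence of the list is what matters, not its length.
    """
    return {entry["code"] for entry in language_entries if "variants" in entry}


def converter_enabled(page_language, enabled_languages):
    """Check a page language against the set built from siteinfo."""
    return page_language in enabled_languages


entries = [
    {"code": "en"},
    {"code": "zh", "variants": ["zh-hans", "zh-hant"]},
    {"code": "xx", "variants": ["xx"]},  # made-up language with a single variant
]
enabled = languages_with_converter(entries)
```

Since the set is derived from per-wiki siteinfo, a client could compute it once and reuse it for every page on that wiki.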

Otherwise we'd probably have to instantiate a Language object for every language during the processing of siprop=languages, which is not at all efficient, so we'd be stuck with adding more junk to ApiQueryInfo.

@Anomie My inclination right now is to (1) fix ->getConverter() to return a FakeConverter if all the variants for a given language are disabled, which solves the problem in T45547 (MediaWiki needs a fictitious variant for English for easier variant development work) where language conversion gets enabled for all English wikis even if pig latin is disabled, and then (2) add a list of non-disabled variants to meta=siteinfo&siprop=languages or some new siprop like meta=siteinfo&siprop=variants.

Then Parsoid still has to fetch the pagelanguage for each page, but it can then check against the cached siteinfo to see if the language has at least one non-disabled variant.

Hm. Preventing conversion when all variants are disabled is a little tricky. I believe $wgDisabledVariants is meant mostly for UI purposes, to hide a specific variant, not to change how the existing wikitext is parsed. I've tweaked things so that the pig latin patch (https://gerrit.wikimedia.org/r/72053) creates a FakeConverter unless pig latin is enabled. This has the same effect (pig latin doesn't show up in Language::getVariants(), etc.) but doesn't try to make $wgDisabledVariants do more than it ought to.

This makes maintenance/language/listVariants.php return reasonable results (i.e., English is listed as a language with variants iff $wgUsePigLatinVariant is true) but still isn't quite the right thing for parser purposes. Instead, meta=siteinfo&siprop=variants should create the language object via Language::factory( ... ) as the listVariants.php script does, then check whether $language->getConverter() instanceof FakeConverter, as @Anomie suggests. That will correctly match the behavior of PHP, which still parses -{...}- constructs even if all variants for the language are disabled via $wgDisabledVariants.
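The check described above can be modeled roughly as follows. This is a Python sketch with stand-in classes that borrow the names used in this discussion. They are not MediaWiki's implementations, and the rule for when a language gets a real converter is simplified to "has any variants defined".

```python
# Stand-in classes that mirror the names used in this discussion.
# They model the check only; they are not MediaWiki's implementations.

class FakeConverter:
    """Placeholder converter for languages without conversion."""


class LanguageConverter:
    """Real converter; holds the variant codes defined for a language."""

    def __init__(self, variants):
        self.variants = list(variants)


class Language:
    """Hypothetical language object with a get_converter() accessor."""

    def __init__(self, code, variants=()):
        self.code = code
        self._variants = tuple(variants)

    def get_converter(self):
        # Simplification: any defined variant yields a real converter,
        # whether or not those variants are disabled.
        if self._variants:
            return LanguageConverter(self._variants)
        return FakeConverter()


def languages_triggering_conversion(languages):
    """List codes of languages whose converter is not a FakeConverter."""
    return sorted(lang.code for lang in languages
                  if not isinstance(lang.get_converter(), FakeConverter))


langs = [
    Language("en"),
    Language("zh", ["zh-hans", "zh-hant"]),
    Language("sr", ["sr-ec", "sr-el"]),
]
exported = languages_triggering_conversion(langs)
```

Testing the converter type, not the variant list, keeps the exported set aligned with what the parser actually does.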

Change 340373 had a related patch set uploaded (by C. Scott Ananian):
[mediawiki/core] T153341: Export list of languages which will trigger language conversion.

https://gerrit.wikimedia.org/r/340373

Change 340373 had a related patch set uploaded (by C. Scott Ananian):
[mediawiki/core] Export list of languages which will trigger language conversion

https://gerrit.wikimedia.org/r/340373

Change 340373 merged by C. Scott Ananian:
[mediawiki/core] Export list of languages which will trigger language conversion

https://gerrit.wikimedia.org/r/340373

Change 348998 had a related patch set uploaded (by C. Scott Ananian):
[mediawiki/services/parsoid@master] Fetch LanguageConverter status as part of wiki configuration.

https://gerrit.wikimedia.org/r/348998

Change 348998 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Fetch LanguageConverter status as part of wiki configuration.

https://gerrit.wikimedia.org/r/348998