Language conversion fixes in extensions using LanguageConverter are leaking conversion rules outside extension tags
Open, Needs Triage | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • Add conversion rules inside extension tag(s), together with text that these rules are expected to convert
  • Add text outside the tag(s) that matches those conversion rules but should not be affected (especially text before the tag)

What happens?:
Conversion rules are applied to text outside the extension tag(s), both before and after the tag.

What should have happened instead?:
Conversion rules should only be applied inside the extension tag.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:
(Listed below, see Special:Version)

Links:

Event Timeline

Note that converting only the text after the extension tag would match the normal behavior, but that is impossible to achieve.
The thing is, extension tags are processed during parsing, but language conversion of the whole content is done at the very end, after tidy or something.
Also, the language converter doesn't support disabling specific types of conversion rules; for the cases mentioned above, any rule that can affect global state should be disabled.
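The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not MediaWiki code: `ToyConverter`, `parse`, `render`, and the `<ext rule="...">` syntax are all hypothetical names invented for the illustration. It assumes one shared rule table and a single conversion pass over the finished output.

```python
# Toy model only: none of these names exist in MediaWiki.
import re


class ToyConverter:
    """Stands in for a converter that keeps one shared rule table."""

    def __init__(self):
        self.rules = {}  # source text -> converted text

    def add_rule(self, src, dst):
        self.rules[src] = dst

    def convert(self, text):
        for src, dst in self.rules.items():
            text = text.replace(src, dst)
        return text


def parse(wikitext, converter):
    """Pass 1: handle hypothetical <ext rule="a=>b">...</ext> tags."""

    def handle_tag(match):
        src, dst = match.group(1).split("=>")
        converter.add_rule(src, dst)  # the rule lands in the shared table
        return match.group(2)         # the tag body is emitted as plain output

    return re.sub(r'<ext rule="([^"]*)">(.*?)</ext>', handle_tag, wikitext)


def render(wikitext):
    converter = ToyConverter()
    html = parse(wikitext, converter)
    # Pass 2: conversion runs once, over the whole output, at the very end.
    return converter.convert(html)


page = 'foo before | <ext rule="foo=>bar">foo inside</ext> | foo after'
result = render(page)
print(result)
```

By the time the conversion pass runs, the tag boundaries are gone and the rule is already in the table, so in this model the text before the tag is converted just like the text after it.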

It currently acts like this:

1. outer text (unexpected: converted because of 4.; should not be converted)

2. tag inner text without conversion enabled (expected)

3. outer text (unexpected: converted because of 4.; should not be converted)

4. tag inner text with tag-wide conversion rules added and conversion enabled (expected)

5. outer text (unexpected: converted because of 4.; should not be converted)

6. conversion rules removed page-wide

7. outer text (not converted: the rules added by 4. were removed by 6.)

8. tag inner text without conversion enabled (expected)

9. outer text (not converted: the rules added by 4. were removed by 6.)

10. tag inner text with tag-wide conversion rules added and conversion enabled (expected)

11. outer text (not converted: the rules added by 4. were removed by 6., and it was not affected by 10.)

Change 763939 had a related patch set uploaded (by Func; author: Func):

[mediawiki/core@master] LanguageConverter: Make reloadTables() method public for purging out manual conversion rules

https://gerrit.wikimedia.org/r/763939

The behavior of TemplateData is fine, and PortableInfobox doesn't have a project tag, right?

PortableInfobox is an extension made by Fandom with ported versions, so it's outside the scope of Wikimedia Phabricator.

Change 763943 had a related patch set uploaded (by Func; author: Func):

[mediawiki/extensions/InputBox@master] Purge out manual conversion rules

https://gerrit.wikimedia.org/r/763943

See also T484: RfC: Scoped language converter, and https://wikimedia.slack.com/archives/C024Z8K9CAU/p1660748317850709 for those with access to WMF Slack. tl;dr: the fundamental issue here is that the LanguageConverter for a given language is a global singleton with persistent state. I don't think just making reloadTables public is a good fix, because it doesn't solve the root cause: now all the rules will be wiped after the extension content is processed, which isn't any better than adding rules after the extension content is processed. (T263082 is mentioned here, but it probably has the same issue with language converter rules "leaking" out, although that might be a Feature rather than a Bug in that case.)
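To make the singleton argument concrete, here is a minimal sketch comparing three strategies. It is a toy model under stated assumptions, not the real LanguageConverter: `ToyConverter`, `shared`, `reload_tables`, `render`, and the strategy names are hypothetical, and the "scoped" variant is only one possible reading of a scoped converter, not the design from the RfC.

```python
# Toy model only: every name below is hypothetical, not a MediaWiki API.
class ToyConverter:
    _shared = {}  # one instance per language, kept for the whole process

    def __init__(self):
        self.manual_rules = {}

    @classmethod
    def shared(cls, lang):
        # Singleton access: every caller gets the same object and state.
        if lang not in cls._shared:
            cls._shared[lang] = cls()
        return cls._shared[lang]

    def add_rule(self, src, dst):
        self.manual_rules[src] = dst

    def reload_tables(self):
        # Stand-in for a purge: drops ALL manual rules, whoever added them.
        self.manual_rules.clear()

    def convert(self, text):
        for src, dst in self.manual_rules.items():
            text = text.replace(src, dst)
        return text


def render(outer, tag_body, strategy):
    ToyConverter._shared.clear()          # fresh state for each demo run
    page = ToyConverter.shared("zh")
    page.add_rule("colour", "color")      # rule set by the page itself

    if strategy == "scoped":
        ext = ToyConverter()              # private instance for the tag
    else:
        ext = ToyConverter.shared("zh")   # same object as `page`
    ext.add_rule("foo", "bar")            # rule meant for the tag only
    inner = ext.convert(tag_body)
    if strategy == "purge":
        ext.reload_tables()               # wipes the page's rule as well

    # Final whole-page pass over the text outside the tag.
    return page.convert(outer), inner


leak = render("foo colour", "foo", "leak")
purge = render("foo colour", "foo", "purge")
scoped = render("foo colour", "foo", "scoped")
print(leak, purge, scoped)
```

In this model, "leak" converts the outer `foo` that the tag's rule was never meant to touch, "purge" stops that but also discards the page's own `colour` rule, and only the private instance leaves the shared state as the page set it.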