Related Objects
- Mentioned In
  - T424159: Parsoid doesn't "guessVariant" at top level on Serbian Wikipedia, and makes sr-ec rendering "unusable" for most pages which are written in Cyrillic (90%+)
  - T423785: Parsoid Read Views to deploy ~2026-04-20 (Language Converter wikis)
  - T422961: LanguageConverter doesn't convert inside <indicator>
- Mentioned Here
  - T191571: LanguageConverter::guessVariant should go away
  - T422961: LanguageConverter doesn't convert inside <indicator>
Event Timeline
We should drop guessVariant and instead decide on a way to set the source language of the wikitext.
100% agreed. This bug is apparently more subtle, though: when I reviewed the code, the new LC implementation *does* call ::guessVariant(), yet some text on Cyrillic/Latin wikis is (correctly) translated by the new implementation but *not* by the old language converter implementation. Compare the infobox on:
old: https://sr.wikipedia.org/w/index.php?title=Bitka_kod_Pantine&useparsoid=0&variant=sr-el
and
new: https://sr.wikipedia.org/wiki/Bitka_kod_Pantine?useparsoid=1&parsoidnewlc=1&variant=sr-el
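For anyone wanting to repeat this comparison on other pages, here is a small Python sketch that builds such URLs from the query parameters used above (`useparsoid`, `parsoidnewlc`, `variant`). The helper name `page_url` is hypothetical, and it always uses the `index.php?title=` form rather than the `/wiki/` path form.

```python
from urllib.parse import urlencode

# Base endpoint taken from the "old" URL above.
BASE = "https://sr.wikipedia.org/w/index.php"


def page_url(title, **params):
    """Build an index.php URL for `title` with the given query parameters.

    Hypothetical helper for illustration; it only assembles a query string
    and does not check that the wiki honours the parameters.
    """
    query = {"title": title, **params}
    return f"{BASE}?{urlencode(query)}"


# Legacy parser output vs. Parsoid with the new language converter.
old_url = page_url("Bitka_kod_Pantine", useparsoid=0, variant="sr-el")
new_url = page_url("Bitka_kod_Pantine", useparsoid=1, parsoidnewlc=1, variant="sr-el")

print(old_url)
print(new_url)
```

Titles in Cyrillic are percent-encoded by `urlencode`, which the wiki should accept in the same way as the raw form.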
Change #1269716 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):
[mediawiki/core@master] LanguageConverter: Allow disabling top-level variant "guess"
Ok, on investigation, Parsoid does invoke guessVariant(), but the legacy implementation invokes it *twice*: once on the overall text of any string to be converted (including the embedded HTML tags and attributes) and then again on the text substrings between tags. That seems to be a bug: if the topmost 'guess' returns false, then nothing on the page will be converted at all. It seems like the intended behavior is for the individual strings, paragraphs, etc. to be the proper subjects of "guessing".
I've added a patch to experimentally allow disabling the top-level "guess" via ?nolcguess=1 on the URL, keeping the lower-level guesses. This lets us perform an apples-to-apples comparison with Parsoid's implementation via visualdiff, unblocking that work. It will also be easier to take this behavior to the community if we can show the difference between the two renderings on specific pages.
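To make the two-level behaviour concrete, here is a toy Python sketch. It is not the MediaWiki code: `needs_conversion` is a hypothetical stand-in that simply counts Cyrillic versus Latin letters, the polarity of the real guessVariant() is not reproduced, and the three-letter table is only enough for the example. The assumption being illustrated is that Latin-script markup can dominate a top-level guess made over tags and attributes.

```python
import re

TAG_RE = re.compile(r"(<[^>]*>)")          # capture tags so split() keeps them
CYRILLIC_RE = re.compile(r"[\u0400-\u04FF]")
LATIN_RE = re.compile(r"[A-Za-z]")

# Deliberately tiny table, just enough for the example string below.
CYR_TO_LAT = {"д": "d", "а": "a", "н": "n"}


def needs_conversion(text):
    """Hypothetical guess: convert only if Cyrillic letters outnumber Latin ones."""
    return len(CYRILLIC_RE.findall(text)) > len(LATIN_RE.findall(text))


def to_latin(text):
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


def convert(html, top_level_guess=True):
    # Top-level guess over the whole string, markup included.
    if top_level_guess and not needs_conversion(html):
        return html  # nothing on the "page" is converted
    out = []
    # With a capturing group, odd indices of split() are the tags themselves.
    for i, part in enumerate(TAG_RE.split(html)):
        if i % 2 == 1 or not needs_conversion(part):
            out.append(part)             # tags and non-Cyrillic runs stay as-is
        else:
            out.append(to_latin(part))   # lower-level guess said "convert"
    return "".join(out)


html = '<span class="infobox-label" title="label">дан</span>'
print(convert(html))                         # unchanged: markup outweighs the text
print(convert(html, top_level_guess=False))  # text between tags becomes "dan"
```

In this sketch, passing `top_level_guess=False` plays the role that ?nolcguess=1 plays on the URL: the per-substring guesses still run, but the whole-string guess can no longer veto them.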
Change #1269716 merged by jenkins-bot:
[mediawiki/core@master] LanguageConverter: Allow disabling top-level variant "guess"
Change #1271038 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):
[mediawiki/core@wmf/1.46.0-wmf.24] LanguageConverter: Allow disabling top-level variant "guess"
Change #1271038 merged by jenkins-bot:
[mediawiki/core@wmf/1.46.0-wmf.24] LanguageConverter: Allow disabling top-level variant "guess"
Mentioned in SAL (#wikimedia-operations) [2026-04-14T20:30:12Z] <cscott@deploy1003> Started scap sync-world: Backport for [[gerrit:1271030|ParsoidLanguageConverter: convert inside <indicator> (T422961)]], [[gerrit:1271038|LanguageConverter: Allow disabling top-level variant "guess" (T419328)]]
Mentioned in SAL (#wikimedia-operations) [2026-04-14T20:32:00Z] <cscott@deploy1003> cscott: Backport for [[gerrit:1271030|ParsoidLanguageConverter: convert inside <indicator> (T422961)]], [[gerrit:1271038|LanguageConverter: Allow disabling top-level variant "guess" (T419328)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2026-04-14T20:40:31Z] <cscott@deploy1003> Finished scap sync-world: Backport for [[gerrit:1271030|ParsoidLanguageConverter: convert inside <indicator> (T422961)]], [[gerrit:1271038|LanguageConverter: Allow disabling top-level variant "guess" (T419328)]] (duration: 10m 18s)
For Serbian Wikipedia, there should be no guessing at all. On Serbian Wikipedia, if the user selects Cyrillic, everything outside lang tags and -{...}- syntax should be transliterated to Cyrillic. If the article is written in sr-Cyrl, transliterating Cyrillic to Cyrillic is a no-op anyway.
@cscott, the ?nolcguess=1 workaround does not work on Serbian Wikipedia. Turn on Parsoid and check out https://sr.wikipedia.org/wiki/Џефри_Сакс?nolcguess=1. It is still messed up.
How about trying these (I think I found another problem: the interface messages are not loaded according to the language variant if nolcguess is activated):
https://sr.wikipedia.org/w/index.php?title=Џефри_Сакс&variant=sr-ec&nolcguess=1&useparsoid=1
https://sr.wikipedia.org/w/index.php?title=Џефри_Сакс&variant=sr-el&nolcguess=1&useparsoid=1
- sr-ec is broken in the same way.
- sr-el does not have that problem, as the troublesome parts of the article are already in Latin script.
Basically, I would say that content detection is totally irrelevant and unnecessary.
- If the user selects sr-ec, display the article as-is. Any Latin text is intentionally Latin, so it can stay that way.
- If the user selects sr-el, convert every Serbian Cyrillic letter to its Latin equivalent, skipping text inside -{...}- and inside <lang> tags.
Simple as that.
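The two rules above could be sketched roughly like this in Python. This is an illustration of the proposal, not how LanguageConverter or Parsoid is implemented: `render`, `to_latin`, and `PROTECTED_RE` are hypothetical names, protected spans are left verbatim (markers included), other HTML tags are not treated specially, and uppercase digraphs are always title-cased (Љ becomes "Lj").

```python
import re

# Serbian Cyrillic -> Latin, lowercase only; uppercase is derived below.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Spans that must never be transliterated: -{...}- and <lang>...</lang>.
PROTECTED_RE = re.compile(r"(-\{.*?\}-|<lang\b[^>]*>.*?</lang>)", re.DOTALL)


def to_latin(text):
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            # Simplification: an uppercase source letter yields title case.
            out.append(latin if ch == lower else latin.capitalize())
        else:
            out.append(ch)
    return "".join(out)


def render(text, variant):
    if variant == "sr-ec":
        return text  # rule 1: show the article as written, no detection
    if variant != "sr-el":
        raise ValueError(f"unsupported variant: {variant}")
    # Rule 2: split() with a capturing group puts protected spans at odd indices.
    parts = PROTECTED_RE.split(text)
    return "".join(p if i % 2 else to_latin(p) for i, p in enumerate(parts))


sample = "Џефри Сакс -{Џефри}- <lang>Београд</lang>"
print(render(sample, "sr-ec"))
print(render(sample, "sr-el"))
```

Note that no script guessing appears anywhere in this sketch: the only inputs are the selected variant and the explicit markup that opts text out of conversion.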
I made some changes on SR Wikipedia, and I came to the conclusion that we can proceed with no detection of content.
I updated the modules Lang, URL, and Citation/CS1 to implement transliteration prevention.
Lang and URL are straightforward, but CS1 may be doing too much. It may need to be reworked so that a citation marked as language=sr* is still left for transliteration. Basically, I am making an educated guess about what to transliterate and what not to, but it is good enough for now.

