Multilingual SVG with non-English default language does not display English
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

Go to https://commons.wikimedia.org/wiki/File%3ASVG_Test_Russian_Default_Language.svg
It is a multilingual SVG file that uses Russian as the default language:

<svg xmlns="http://www.w3.org/2000/svg"
     viewBox="0 0 400 100"
     version="1.1">
  
  <title>SVG Test Russian Default Language</title>

  <rect width="100%" height="100%" fill="pink"/>

  <switch transform="translate(20,80)" font-family="sans-serif" font-size="80">
    <text systemLanguage="en">Moscow</text>
    <text systemLanguage="ru">Москва</text>
    <text>Москва</text>
  </switch>
</svg>

The Commons File: page displays Russian instead of English.
The File: page uses this thumbnail, which should display English but does not:

This synthetic thumbnail forces English

Original (more complicated) test case:

What happens?:
The first URL displays Russian.
The second URL displays English.

The file is detailed (I will make a simpler one). To determine which language is displayed, look at the southwest corner of the map for the blue Moscow River. Its label will be either

  • Москва-река en
  • Москва-река ru

The first line is the explicit systemLanguage="en" clause, and the second is the default clause.

What should have happened instead?:
Both URLs should display English.

Software version (skip for WMF-hosted wikis like Wikipedia):
Thumbor/7.3.2
librsvg 2.44.10

Other information (browser name/version, screenshots, etc.):

When Thumbor/7.3.2 generates a thumbnail, it does not explicitly force English. Instead, Thumbor relies on librsvg's Unix environment to set the language to English. If that environment does not set librsvg's language to en, the SVG's default clause is chosen. For example, if the system language is en-US, the default clause would be selected rather than the systemLanguage="en" clause.

Perhaps the librsvg Unix environment does not set LC_ALL to something reasonable.

Given MediaWiki's parochial assumption of an English default, Thumbor should always treat an unspecified language as en.
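The selection rule behind the en-US example can be sketched in Python. This is a minimal model of the SVG systemLanguage wording (SVG 1.1, with SVG 2's case-insensitive comparison), not librsvg's actual code, and the function names are hypothetical. It shows why a user language of en-US falls through to the default clause of the test file's switch, while en selects "Moscow".

```python
from typing import Optional, Sequence, Tuple

# Hypothetical helpers modelling the SVG systemLanguage test; not librsvg code.


def language_matches(user_lang: str, system_language: str) -> bool:
    """True if user_lang satisfies a comma-separated systemLanguage value.

    A user language matches an entry if it equals the entry, or if it is a
    prefix of the entry followed by "-" (compared case-insensitively).
    So user "en" matches systemLanguage="en-US", but user "en-US" does not
    match systemLanguage="en".
    """
    user = user_lang.strip().lower()
    for entry in system_language.split(","):
        tag = entry.strip().lower()
        if tag == user or tag.startswith(user + "-"):
            return True
    return False


def select_switch_child(
    children: Sequence[Tuple[Optional[str], str]], user_langs: Sequence[str]
) -> Optional[str]:
    """Return the text of the first <switch> child whose test passes.

    Each child is (systemLanguage value or None, text). A child without a
    systemLanguage attribute always passes, so it acts as the default clause.
    """
    for system_language, text in children:
        if system_language is None or any(
            language_matches(u, system_language) for u in user_langs
        ):
            return text
    return None


# The <switch> from SVG_Test_Russian_Default_Language.svg.
MOSCOW = [("en", "Moscow"), ("ru", "Москва"), (None, "Москва")]

as_en = select_switch_child(MOSCOW, ["en"])        # "Moscow"
as_en_us = select_switch_child(MOSCOW, ["en-US"])  # "Москва" (default clause)
as_unset = select_switch_child(MOSCOW, [])         # "Москва" (default clause)
```

Under this reading, only an exact "en" (or a user tag that is a prefix of the clause's tag) reaches the English clause, which is why an unset or region-qualified locale shows the Russian default.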

T337139
T261192
T335361

Event Timeline

This bug should affect all multilingual SVG files that do not use English as the default.

For example,

The File: page looks OK (i.e., the image is in English), but that is because it comes from the cache.
Have Thumbor create an image that is unlikely to be in the cache, and we get the French default:

I need to change "File:Moscow metro map multilingual future draft.svg". If you still need it, please feel free to revert to yesterday's version.

Go ahead and change it. There is a simpler test case now.

Another user has problems:

I suspect that Thumbor incorrectly renders every SVG that has a non-English default and an English clause. For that reason, I am setting the priority to high.

Glrx triaged this task as High priority. (Sep 10 2023, 12:33 PM)

The code fix would be related to T337139 by @hnowlan. I will comment there about the fix.

When a URL does not specify lang, set env to {'LC_ALL': 'en'} so that rsvg-convert knows the language should be English ("en").
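A minimal Python sketch of that proposal, assuming the renderer is invoked as an rsvg-convert subprocess and reads its language from LC_ALL. The helper names rsvg_env and rsvg_command are hypothetical, not Thumbor's actual functions, and whether a bare "en" is accepted depends on how the installed librsvg parses LC_ALL.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical helpers, not Thumbor's actual code.
DEFAULT_RENDER_LANG = "en"  # what MediaWiki assumes when the URL has no lang


def rsvg_env(lang: Optional[str], base_env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy base_env and set LC_ALL, falling back to "en" when lang is missing or empty."""
    env = dict(base_env)
    env["LC_ALL"] = lang if lang else DEFAULT_RENDER_LANG
    return env


def rsvg_command(svg_path: str, width: int) -> List[str]:
    """rsvg-convert command line that writes a PNG of the given width to stdout."""
    return ["rsvg-convert", "-w", str(width), "-f", "png", svg_path]


# Usage (needs rsvg-convert installed):
#   import subprocess
#   png_bytes = subprocess.run(
#       rsvg_command("in.svg", 800), env=rsvg_env(None),
#       check=True, capture_output=True,
#   ).stdout
```

Setting LC_ALL explicitly, rather than relying on whatever the host environment provides, makes the rendering independent of how the container or server happens to be configured.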

So I think this is related to the fix for T310235.

MediaWiki considers English the default, but maybe Thumbor does not. So if the language is set to English, MediaWiki sees that it equals the default and does not send the language along. Thumbor sees no language specified and assumes it is undefined.

So I think SvgHandler::SVG_DEFAULT_RENDER_LANG needs to be changed to "und" in order to match Thumbor's behaviour.
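To illustrate the mismatch, here is a hypothetical thumb_url helper that mimics MediaWiki omitting a lang equal to the default. It assumes the Wikimedia thumbnail layout thumb/<h1>/<h2>/<name>/[lang<code>-]<width>px-<name>.png, with <h1>/<h2> taken from the MD5 of the underscored file name, and ASCII file names (no percent-encoding). The point is that lang="en" and no lang produce the same URL, so the renderer cannot tell them apart.

```python
import hashlib
from typing import Optional

# Hypothetical helper; the URL layout is an assumption based on Wikimedia thumbnail URLs.
SVG_DEFAULT_RENDER_LANG = "en"
THUMB_BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb"


def thumb_url(file_name: str, width: int, lang: Optional[str] = None) -> str:
    """Build a PNG thumbnail URL for an SVG (ASCII file names only; no percent-encoding)."""
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # A lang equal to the default is dropped, exactly like "no lang at all".
    prefix = "" if lang in (None, SVG_DEFAULT_RENDER_LANG) else f"lang{lang}-"
    return f"{THUMB_BASE}/{digest[0]}/{digest[:2]}/{name}/{prefix}{width}px-{name}.png"


F = "SVG Test Russian Default Language.svg"
url_en = thumb_url(F, 800, "en")
url_none = thumb_url(F, 800)      # identical to url_en
url_ru = thumb_url(F, 800, "ru")  # ends in langru-800px-SVG_Test_Russian_Default_Language.svg.png
```

Whatever Thumbor decides a missing lang means therefore also decides what "English" means.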

Another file showing an issue is https://commons.wikimedia.org/wiki/File:QR_Code_V4_structure_example.svg

<switch transform="translate(0,148)">
<text systemLanguage="en">3. Data and error correction keys</text>
<text systemLanguage="de">3. Fehlerkorrigierbare Daten</text>
<text>3.</text>
</switch>

Both "default language" and "English" in the language selector lead to the fallback text being shown.

This is related to T337139, which adds a lot of lang processing but still does not set a default.

In the default (no lang specified) case, Thumbor does not set any language.

Previous versions of Thumbor/librsvg used the LANG environment variable, which is set in the operating environment. The updated Thumbor uses a librsvg that reads LC_ALL (which is not set by default in the operating environment) and has language-tag/locale-string issues. I suspect that the unset LC_ALL and related variables cause librsvg to do something unusual.
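The difference between reading LANG and reading LC_ALL can be modelled with a small Python sketch. It loosely follows the precedence GLib documents for g_get_language_names() (LANGUAGE, then LC_ALL, LC_MESSAGES, LANG); which variables the deployed librsvg actually reads is an assumption here, and the helper names are hypothetical.

```python
from typing import List, Mapping

# Hypothetical helpers; the variable precedence loosely follows GLib's documented
# order for g_get_language_names(), not verified against the deployed librsvg.
LOCALE_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


def posix_locale_to_tag(locale: str) -> str:
    """Turn 'en_US.UTF-8' into 'en-US' by dropping the codeset and any @modifier."""
    base = locale.split(".", 1)[0].split("@", 1)[0]
    return base.replace("_", "-")


def user_languages(env: Mapping[str, str]) -> List[str]:
    """Language tags from the first non-empty locale variable, or [] if none is set."""
    for var in LOCALE_VARS:
        value = env.get(var, "")
        if value:
            tags = [posix_locale_to_tag(part) for part in value.split(":") if part]
            # "C" and "POSIX" do not name a human language.
            return [t for t in tags if t not in ("C", "POSIX")]
    return []
```

With only LANG=en_US.UTF-8 set, the result is en-US, which under the strict SVG matching rule does not select a systemLanguage="en" clause (the en-US case in the description); with nothing set, the list is empty and only the default clause can match.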

So I think SvgHandler::SVG_DEFAULT_RENDER_LANG needs to be changed to "und" in order to match Thumbor's behaviour.

I do not know the implications of such a change.

I think the straightforward fix is the new version of Thumbor should match the old version's behavior. Many files that worked before are broken now. Some are only working because good PNGs are still in the cache.

I would like to see English dropped as the MediaWiki default language, but that needs a larger discussion as it is a breaking change. Many SVG files have no default text clause. Producing no text is not a good user experience.

Re T337139. I want to keep things simple. The URL

should use

even if tlh is not available. With an explicit lang, we should not check whether that lang is available.
That is also the semantics of

  • [[File:SystemLanguage.svg|lang=tlh|...]]

It forces the argument.
There are many files over 256 kB for which not all available languages are collected.

Matching the old behaviour was done in @hnowlan's change 962563, although it's not deployed yet. Per my comment there, I think it's fine for Thumbor to interpret a missing language parameter as "en". If MediaWiki wants to show a thumbnail with no defined language, it can send "und" as the language code. Currently Thumbor would log a warning and fall back to "en" in that case, but we can patch it not to do that.

The Thumbor SVG plugin we are using is owned by WMF. There's no need for us to work around upstream behaviour; we can make it do whatever we want.
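The rules discussed above can be summarised as a small Python function. This is a hypothetical sketch, not the code in change 962563: a missing lang becomes "en", "und" means render with no language (the patched behaviour, rather than today's warning plus fallback to "en"), and any explicit code such as tlh is passed through without checking whether the file contains it.

```python
from typing import Optional

# Hypothetical helper summarising the proposed rules; not Thumbor's actual code.
UNDETERMINED = "und"
DEFAULT_RENDER_LANG = "en"


def resolve_render_lang(url_lang: Optional[str]) -> Optional[str]:
    """Return the language to render with, or None for "no language at all".

    - missing/empty lang -> "en" (MediaWiki's default)
    - "und"              -> None, so only default (unconditional) clauses match
    - anything else      -> passed through unchanged, with no availability check
    """
    if not url_lang:
        return DEFAULT_RENDER_LANG
    if url_lang == UNDETERMINED:
        return None
    return url_lang
```

Keeping the explicit case as a pure pass-through matches the [[File:...|lang=tlh|...]] semantics: the argument is forced, even for large files whose language list was never fully collected.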

Change 962563 has since been deployed.

For the sake of non-deployers: does this mean that the problem is solved?

TheDJ subscribed.

Looks fixed to me.

Great, thank you. I'll convert the file I work with (350 KB, with hundreds of translations). If there are any problems, I'll be back.

A reminder: the language code still has to be known to MediaWiki (not all are), and the handling of codes with suffixes (-postfixes) is also not ideal yet, but both are separate from this issue.

Is it fully working? Consider :c:File:Chronologie constitutions françaises.svg, a wide, horizontal timeline image with inner labels and callouts. Due to the size of the diagram, the callouts are really only visible at max zoom, but there seems to be a problem with that. It is used on five Wikipedias, including these three articles in French, Catalan, and German:

Mousing over the image in the article shows the 'underlying link'; clicking it takes you to 'click-1'; clicking again takes you to 'click-2'.

The first click displays the text in the correct language, but the font is small: just about legible in the labels, but too small for the callouts. Clicking a second time makes everything legible at max zoom, but it's all in English. (The 'click-2' links all have the same URL.)

How do I get the text in the correct language at max zoom? Does the fact that they are all in English have something to do with my IP, or my preference settings? Will the largest images be in the proper language for users in other countries or other settings? Or is a piece of the bug still there?

Notice also that on the fr-wiki article, the link underlying the image points to the Commons domain with the URL query parameter ?uselang=fr, whereas the links on ca and de have an identical base page name, are on the ca.wikipedia and de.wikipedia domains, and have no query parameter.

There's evidently something (a lot of things) I'm missing, because the history of where you click from seems to matter: clicking the links above directly doesn't produce the same behavior as starting from the article and clicking in sequence. So use the links at the bullets above at your own risk, I guess.

Clicking like that takes you to the raw original SVG, which means the browser renders it instead of MediaWiki, and browsers don’t support SVG translations. So yes, this is expected. Translations only work in the thumbnails, and naturally if people make very large drawings, those won’t be very readable unless you have a desktop screen and use the media viewer. This is documented on the SVG help page and has always been like that.

Browsers do support translations. When you click through, the SVG file is loaded into the browser rather than the PNG bitmap. The language displayed is based on the browser's language preferences. If the browser prefers French, then French will be displayed no matter which wiki you clicked from.

There may be a Wikipedia option for adding the uselang parameter. Links to Commons from the French wiki apparently force a French-language user interface. The other wikis take you to Commons and use your default interface language.

The displayed language of the image is set by the lang parameter. The design default on Commons is to display the English language version when lang is not specified. The user must select an alternate language from the render-this-image-in dropdown.

The File page displays the image at 800px. For a more readable PNG in the selected rendering language, click on one of the other PNG resolutions (e.g., 2560).