
Multilingual SVG with non-English default language does not display English
Open, High, Public, BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

Go to https://commons.wikimedia.org/wiki/File%3ASVG_Test_Russian_Default_Language.svg
It is a multilingual SVG file that uses Russian as the default language:

<svg xmlns="http://www.w3.org/2000/svg"
     viewBox="0 0 400 100"
     version="1.1">
  
  <title>SVG Test Russian Default Language</title>

  <rect width="100%" height="100%" fill="pink"/>

  <switch transform="translate(20,80)" font-family="sans-serif" font-size="80">
    <text systemLanguage="en">Moscow</text>
    <text systemLanguage="ru">Москва</text>
    <text>Москва</text>
  </switch>
</svg>

The Commons File: page displays Russian instead of English.
The File: page uses this thumbnail which should display English but it does not:

This synthetic thumbnail forces English

Original (more complicated) test case:

What happens?:
The first URL displays Russian.
The second URL displays English.

The file is detailed (I will make a simpler one). To determine which language is displayed, look at the southwest corner of the map for the blue Moscow River. The label will be either

  • Москва-река en
  • Москва-река ru

The first line is the explicit systemLanguage="en" clause, and the second is the default clause.

What should have happened instead?:
Both URLs should display English.

Software version (skip for WMF-hosted wikis like Wikipedia):
Thumbor/7.3.2
librsvg 2.44.10

Other information (browser name/version, screenshots, etc.):

When Thumbor/7.3.2 generates a thumbnail, it does not explicitly force English. Instead, Thumbor relies on the Unix environment that librsvg runs in to set the language to English. If that environment does not set librsvg's language to en, then the SVG's default clause is chosen. For example, if the system language is en-US, then the default clause would be selected rather than the systemLanguage="en" clause.
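
The selection behavior described above can be illustrated with a small sketch. It is not librsvg's code; it only models the SVG 1.1 matching rule for systemLanguage (a user language matches when it equals a listed tag or is a prefix of it followed by "-"). The names language_matches and pick_switch_child are hypothetical, and the SVG is a trimmed copy of the test file.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Trimmed copy of the test file from the description.
SVG_SOURCE = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 100">
  <switch>
    <text systemLanguage="en">Moscow</text>
    <text systemLanguage="ru">Москва</text>
    <text>Москва</text>
  </switch>
</svg>"""


def language_matches(user_langs, system_language):
    """SVG 1.1 rule: a user language must equal a listed tag,
    or be a prefix of a listed tag followed by '-'."""
    tags = [t.strip().lower() for t in system_language.split(",") if t.strip()]
    for user in user_langs:
        user = user.lower()
        for tag in tags:
            if tag == user or tag.startswith(user + "-"):
                return True
    return False


def pick_switch_child(svg_source, user_langs):
    """Return (systemLanguage attribute, text) of the first matching child.
    A child without systemLanguage always matches (the default clause)."""
    root = ET.fromstring(svg_source)
    switch = root.find(f"{SVG_NS}switch")
    for child in switch:
        attr = child.get("systemLanguage")
        if attr is None or language_matches(user_langs, attr):
            return attr, child.text
    return None


if __name__ == "__main__":
    for langs in (["en"], ["ru"], ["en-US"], []):
        print(langs, "->", pick_switch_child(SVG_SOURCE, langs))
```

Under this model, a user language of en-US or an empty language list both fall through to the default clause, which matches the symptom reported here.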

Perhaps the librsvg Unix environment does not set LC_ALL to something reasonable.

Given that MediaWiki's parochial semantics assume an English default, Thumbor should always treat an unspecified language as en.

T337139
T261192
T335361

Event Timeline

This bug should affect all multilingual SVG files that do not use English as the default.

For example,

The File: page looks OK (i.e., the image is in English), but that is because it comes from the cache.
Have Thumbor create an image that is unlikely to be in the cache, and we get the French default:

I need to change the "File:Moscow metro map multilingual future draft.svg". If you still need it, please feel free to revert to yesterday's version.


Go ahead and change it. There is a simpler test case now.

Another user has problems:

I suspect that Thumbor incorrectly renders every SVG that has a non-English default and an English clause. For that reason, I am setting high priority.

Glrx triaged this task as High priority. Sun, Sep 10, 12:33 PM

The code fix would be related to T337139 by @hnowlan. I will comment there about the fix.

When a URL does not specify lang, set env to {'LC_ALL': 'en'} so that rsvg-convert knows the language should be English ("en").
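
A minimal sketch of that suggestion, assuming the renderer is launched as a subprocess with an explicit environment. The names build_rsvg_env and DEFAULT_LANG are hypothetical and are not taken from Thumbor's code; the value written to LC_ALL is the plain language tag proposed above, and whether librsvg needs a full locale string instead is not settled in this task.

```python
import os

# Assumption: mirrors MediaWiki's English default for an unspecified language.
DEFAULT_LANG = "en"


def build_rsvg_env(requested_lang=None, base_env=None):
    """Return a copy of the environment with LC_ALL set.

    An explicit language is passed through without checking whether the
    SVG offers it; a missing language falls back to DEFAULT_LANG.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = requested_lang or DEFAULT_LANG
    return env


if __name__ == "__main__":
    # The result would be handed to the converter, for example:
    #   subprocess.run(["rsvg-convert", "-o", "out.png", "in.svg"], env=env)
    print(build_rsvg_env(None, {"PATH": "/usr/bin"}))
    print(build_rsvg_env("ru", {"PATH": "/usr/bin"}))
```

Copying the base environment keeps PATH and similar variables intact while overriding only the language setting.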

So I think this is related to the fix for T310235.

MediaWiki considers English the default, but maybe Thumbor does not. So if something is set to English, MediaWiki sees that it equals the default and does not send the language along. Thumbor sees no language specified and assumes undefined.

So I think SvgHandler::SVG_DEFAULT_RENDER_LANG needs to be changed to "und" in order to match Thumbor's behaviour.

Another file showing an issue is https://commons.wikimedia.org/wiki/File:QR_Code_V4_structure_example.svg

<switch transform="translate(0,148)">
<text systemLanguage="en">3. Data and error correction keys</text>
<text systemLanguage="de">3. Fehlerkorrigierbare Daten</text>
<text>3.</text>
</switch>

Both "default language" and "English" in the language selector lead to the fallback text being shown.

So I think this is related to the fix for T310235.

MediaWiki considers English the default, but maybe Thumbor does not. So if something is set to English, MediaWiki sees that it equals the default and does not send the language along. Thumbor sees no language specified and assumes undefined.

Related to T337139 (which adds lots of lang processing but still does not set default)

In the default (no lang specified) case, Thumbor does not set any language.

Previous versions of Thumbor/librsvg used the LANG environment variable (which is set in the operating environment). The updated Thumbor uses a librsvg that reads LC_ALL (which has no default in the operating environment) and has langtag/locale string issues. I suspect the unset LC_ALL and friends cause librsvg to do something unusual.
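
To illustrate the suspicion above, here is a small sketch that contrasts a lookup reading only LC_ALL with the usual POSIX precedence (LC_ALL, then LC_MESSAGES, then LANG). Both functions are hypothetical models; the variables librsvg actually consults may differ.

```python
def read_lc_all_only(env):
    """Model of a program that consults LC_ALL and nothing else."""
    return env.get("LC_ALL") or None


def posix_effective_locale(env):
    """Model of POSIX precedence for the message locale:
    LC_ALL overrides LC_MESSAGES, which overrides LANG."""
    for name in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = env.get(name)
        if value:  # unset or empty values are skipped
            return name, value
    return None, "C"  # nothing set: fall back to the C/POSIX locale


if __name__ == "__main__":
    # Assumed operating environment: only LANG is set.
    env = {"LANG": "en_US.UTF-8"}
    print(read_lc_all_only(env))
    print(posix_effective_locale(env))
```

With only LANG set, the first model finds no language at all, while the second still resolves one, which is consistent with files that rendered correctly before the update and fall back to the default clause now.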

So I think SvgHandler::SVG_DEFAULT_RENDER_LANG needs to be changed to "und" in order to match Thumbor's behaviour.

I do not know the implications of such a change.

I think the straightforward fix is the new version of Thumbor should match the old version's behavior. Many files that worked before are broken now. Some are only working because good PNGs are still in the cache.

I would like to see English dropped as the MediaWiki default language, but that needs a larger discussion as it is a breaking change. Many SVG files have no default text clause. Producing no text is not a good user experience.

Re T337139. I want to keep things simple. The URL

should use

even if tlh is not available. With an explicit lang, we should not check whether that lang is available.
That is also the semantics of

  • [[File:SystemLanguage.svg|lang=tlh|...]]

It forces the argument.
There are many files over 256 kB that do not collect all available languages.