
SVG language "und" is confused with en
Closed, ResolvedPublic

Description

The Commons SVG language selector for default language (und) seems to always return the en result instead of the true "no-language" fallback element. The language-selection behavior on wikis without an exact match for the SVG language suffers from the same issue.

I changed the sample of File:Multilingual_SVG_example.svg to add a "no-language" entry of three hearts and three question marks (because there's no Unicode for stomach). When I tell Commons to render the "default language", the en result is returned. When I go to a Wikipedia site for which no translation of the image exists yet (be.wp), the same en result is still returned.

Steps to reproduce:

<text x="400" y="500">
  <tspan dy="-50">❤❤❤</tspan>
  <tspan x="400" dy="200">???</tspan>
</text>

Actual outcome:

Expected outcome:

  • The default entry without a systemLanguage (❤❤❤???) should be returned

Notes:

Event Timeline

Arthur2e5 updated the task description.

Hi @Arthur2e5, thanks for reporting this. Could you please always follow the scheme at https://www.mediawiki.org/wiki/How_to_report_a_bug and provide: a clear list of steps to reproduce (as a list, step by step, including full links), what you expect to happen, and what happens instead? Thanks a lot!

Aklapper changed the task status from Open to Stalled. Edited Feb 7 2021, 10:00 AM

However, as demonstrated on the commons page, forcing a non-existent language correctly returns the no-language fallback.

As expressed in my previous comment, it's unclear what exactly you expect to happen instead, and why.
As far as I know, und does not stand for Default language; it stands for Undetermined language. The edit in the SVG file did not add any und entry either.

As far as I understand, following https://www.mediawiki.org/wiki/How_to_report_a_bug :

Steps to reproduce:

<text x="400" y="500">
  <tspan dy="-50">❤❤❤</tspan>
  <tspan x="400" dy="200">???</tspan>
</text>

Actual outcome:

Expected outcome:

  • ???
Arthur2e5 updated the task description.

Converted the additional observation to notes. Did some reading in the source, and I think it could be a librsvg bug or a misconfiguration. Either way, this can be worked around in the environment.

  1. The rasterize() part of SvgHandler.php seems to set LANG only when there is one. That makes sense.
  2. librsvg falls back to LC_MESSAGES then LC_ALL [[https://gitlab.gnome.org/GNOME/librsvg/-/blob/2.40.21/rsvg-cond.c#L145 |when there is no LANG]]. That is not really a bug.
  3. librsvg falls back to en when the locale [[https://gitlab.gnome.org/GNOME/librsvg/-/blob/2.40.21/rsvg-cond.c#L148 |is exactly "C"]]. This is the problematic part. It's fixed in their Rust version because they use a locale_config crate that doesn't do this.

The en could have come from 2 or 3, depending on the values of the LC_ env vars. Ideally we should have a value that is not a real language (like en) but also not exactly C. The Debian (and Red Hat) extension value C.UTF-8 seems to be a good choice. (I checked the Debian patch db, and thankfully they didn't change the exact match.) Anyway, could some WMF engineers run locale on the prod machine to check the values?
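The fallback chain in points 1 to 3 can be modelled roughly as below. This is only a sketch of the behaviour as described in this comment, not librsvg's actual code: the function name effective_language is made up for illustration, and treating a completely unset locale the same as "C" is an assumption (it is the usual POSIX default).

```python
def effective_language(env):
    """Model of the described lookup order: LANG, then LC_MESSAGES, then LC_ALL."""
    # Point 1: LANG is only present when MediaWiki was given a language.
    # Point 2: without LANG, the LC_ variables are consulted.
    locale = env.get("LANG") or env.get("LC_MESSAGES") or env.get("LC_ALL")

    # Point 3: a locale that is exactly "C" is turned into "en".
    # Assumption: no locale at all behaves like "C".
    if locale is None or locale == "C":
        return "en"

    # Anything else, including "C.UTF-8", is used as is and therefore
    # does not trigger the English fallback.
    return locale


if __name__ == "__main__":
    print(effective_language({"LANG": "de"}))
    print(effective_language({"LC_ALL": "C"}))
    print(effective_language({"LC_ALL": "C.UTF-8"}))
```

Under this model, a value such as C.UTF-8 avoids the en substitution because the comparison in point 3 is an exact match.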

First, the indicated SVG file, https://commons.wikimedia.org/wiki/File:Multilingual_SVG_example.svg has one switch element with several language clauses. Currently, it does not have a default final clause (i.e., a text element without a systemLanguage attribute). Consequently, if there is not an explicit langtag match, the switch element should not render any text. For the case of "(default language)", no text should be displayed.
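As a rough illustration of that switch logic, here is a minimal Python sketch. It assumes exact langtag matching (real SVG matching also considers prefixes such as en-US), and the two sample documents and the helper names select_switch_child and rendered_text are hypothetical, not taken from the file on Commons.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample: language clauses only, no default clause.
NO_DEFAULT = f"""<svg xmlns="{SVG_NS}"><switch>
  <text systemLanguage="en">Hello</text>
  <text systemLanguage="de">Hallo</text>
</switch></svg>"""

# Hypothetical sample: same clauses plus a final default clause.
WITH_DEFAULT = f"""<svg xmlns="{SVG_NS}"><switch>
  <text systemLanguage="en">Hello</text>
  <text systemLanguage="de">Hallo</text>
  <text>???</text>
</switch></svg>"""


def select_switch_child(switch, lang):
    """Return the first child that applies to lang, or None if none does."""
    for child in switch:
        tags = child.get("systemLanguage")
        if tags is None:
            return child  # default clause: no systemLanguage attribute
        if lang in [tag.strip() for tag in tags.split(",")]:
            return child  # simplified exact match on the langtag
    return None  # no match and no default clause: nothing is rendered


def rendered_text(svg_source, lang):
    """Return the text the switch would show for lang, or None for no text."""
    switch = ET.fromstring(svg_source).find(f"{{{SVG_NS}}}switch")
    chosen = select_switch_child(switch, lang)
    return None if chosen is None else chosen.text


if __name__ == "__main__":
    print(rendered_text(NO_DEFAULT, "tlh"))
    print(rendered_text(WITH_DEFAULT, "tlh"))
```

With no default clause, an unmatched langtag such as und or tlh selects nothing; once a clause without systemLanguage is present, the same request selects that clause.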

First, consider librsvg.

Librsvg handles the switch logic correctly for simple languages (zh-hans is another story).

Librsvg can be tested by requesting PNGs.

Here an explicit agent langtag of en displays English text:
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/langen-512px-Multilingual_SVG_example.svg.png

Here an explicit langtag of de displays German text:
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/langde-512px-Multilingual_SVG_example.svg.png

Explicit langtags that do not match any clause display no text (because there is no default text clause).

An explicit langtag of und displays no text:
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/langund-512px-Multilingual_SVG_example.svg.png

An explicit langtag of tlh (Klingon) displays no text:
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/langtlh-512px-Multilingual_SVG_example.svg.png

If librsvg is not given an explicit lang token, then it defaults to en (English).
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/512px-Multilingual_SVG_example.svg.png

A couple years back there was some discussion of defaulting SVG image inclusions. IIRC, non-English wikis will examine an SVG image inclusion [[File:foo.svg]] for langtags. For example, on the de.wiki, if a de langtag is found in the SVG, then the img src attribute will have langde-. If a de langtag is not found, then it will not add a language token to the attribute. (Language fallbacks were also discussed; I do not remember if they were implemented.) That allowed multilingual SVG files to display an appropriate language without adding an explicit |lang=de to the image inclusion. The en.wiki may have different rules.
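A sketch of that inclusion rule, as recalled above, might look like this. It is not MediaWiki's implementation: thumb_name and its parameters are hypothetical, and only the thumbnail name pattern is taken from the URLs quoted in this comment.

```python
def thumb_name(file_name, width, wiki_lang=None, file_langs=()):
    """Build the thumbnail name for an SVG included on a wiki.

    The lang token is added only when the wiki language has a matching
    langtag in the SVG file; otherwise the generic name is used.
    """
    if wiki_lang and wiki_lang in file_langs:
        token = f"lang{wiki_lang}-"
    else:
        token = ""
    return f"{token}{width}px-{file_name}.png"


if __name__ == "__main__":
    langs = {"en", "de"}  # hypothetical set of langtags found in the file
    print(thumb_name("Multilingual_SVG_example.svg", 512, "de", langs))
    print(thumb_name("Multilingual_SVG_example.svg", 512, "be", langs))
```

Under this rule a wiki whose language is missing from the file falls through to the generic thumbnail name, which is the case where the renderer picks its own default.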

Second, consider what Commons does with the image page (File:Multilingual SVG example.svg).

The Commons page without an explicit lang
https://commons.wikimedia.org/w/index.php?title=File%3AMultilingual_SVG_example.svg
displays the image in English. Looking at the source, its img element src is
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/512px-Multilingual_SVG_example.svg.png

The Commons page with an explicit lang of de
https://commons.wikimedia.org/w/index.php?lang=de&title=File%3AMultilingual_SVG_example.svg
displays the image with German text as expected.
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/langde-512px-Multilingual_SVG_example.svg.png

The Commons page with an explicit lang of en
https://commons.wikimedia.org/w/index.php?lang=en&title=File%3AMultilingual_SVG_example.svg
displays the image with English text as expected.
However, despite the lang of en, the img src is the default:
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/512px-Multilingual_SVG_example.svg.png
rather than an explicit request for English.
The code may be assuming that librsvg will insert the en langtag.
(That choice is probably wrong. A wiki server in Germany probably runs in a different locale.)

When the user selects (default language) in the dropdown box, the page is reloaded with lang=und; the page URL is
https://commons.wikimedia.org/w/index.php?lang=und&title=File%3AMultilingual_SVG_example.svg
and the page displays the image with English text rather than the expected blank text.
Examining the page source shows the img used
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/512px-Multilingual_SVG_example.svg.png
That is, the page did not explicitly request a rendering with langtag und; it asked for the default language rendering (i.e., en).
The whole point of using the und langtag was to select the default clause in the switch element.
There should never be the assumption that en is the default language. Many multilingual SVG files have default languages of French or German.

Likewise, if I request Klingon
https://commons.wikimedia.org/w/index.php?lang=tlh&title=File%3AMultilingual_SVG_example.svg
the page displays English instead of no text.
The img element again uses
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/512px-Multilingual_SVG_example.svg.png
The image page did not request the Klingon version (which would have no text).

The SVG image page is not requesting the correct PNG URL.

So for dummies: (default language) in Commons should not return systemLanguage="en", but the one without any systemLanguage. So if MediaWiki sets e.g. export LANG=InvalidLanguageForFallbackLanguage for (default language) the issue would be fixed, so that's a MediaWiki-issue not a librsvg-issue?

@Glrx: Is that correct?

So for dummies: (default language) in Commons should not return systemLanguage="en", but the one without any systemLanguage.

Yes, that should be the meaning for (default language). It is the meaning that Arthur2e5 expected.

So if Mediawiki sets e.g. export LANG=InvalidLanguageForFallbackLanguage for (default language) the issue would be fixed,

Yes, but that is what MediaWiki is trying to do with the IETF langtag of und (which means "Undetermined" in the IETF registry at https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) just as you are suggesting InvalidLanguageForFallbackLanguage. I often use tlh (Klingon) to see the default clause for a switch element because I do not expect any SVG files to have Klingon text.

so that's a MediaWiki-issue not a librsvg-issue?

Yes. When the user selects (default language), the Commons page is reloaded with HTTP parameter lang=und. When lang is specified in the URL, then the page should issue a PNG URL with that language:

instead of the generic

The generic URL does not pass a language to librsvg, so librsvg will probably set the language to en and produce the English text.

So if MediaWiki issued the language-specific PNG URL, then the display would show the default (unless the SVG file actually had an unlikely systemLanguage="und" clause).

@Glrx: Is that correct?

Yes, but there are other subtle issues.

MW often believes that en is the default langtag. There are plenty of SVG files whose default language is not English.

Another way to achieve the same goal is to have the thumbnailer/librsvg interpret the absence of a language parameter as a request for the default clause. That is, a request for

would set the preferred language to und rather than en.

The IETF langtag should not be passed through Unix locale variables.

MW should not verify requested or available IETF langtags. If MW does not recognize a langtag in an SVG file, then MW does not put that language in the drop down box.

Considering closed (after deploy). Several other issues remain but they have separate tickets.

TheDJ closed this task as Resolved. Edited Oct 24 2023, 11:05 AM
TheDJ claimed this task.

Definitely closing this now.

  1. und and en are split.
  2. When you request a language that is known to the file, that language will be used.
  3. When you request a language that is not known to the file, the 'standard' thumbnail is returned (this is implicit en, as most of MediaWiki has implicit en in situations without an explicit language value).
  4. If the original has explicit en defined, then that will be selected (due to the implicit en) and returned.
  5. If the original does not have an explicit en, text without a language definition is returned.
  6. If you explicitly want the text without a language definition, you can use the explicit language und (undetermined, represented as "default language" in the drop-down menu of the File page).
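The six points above can be condensed into a small decision function. This is a sketch of the rules as listed, not the deployed code: resolved_text_language is a hypothetical name, None stands for "text without a language definition", and a file that actually contains a systemLanguage="und" clause is ignored for simplicity.

```python
def resolved_text_language(requested, file_langs):
    """Return the langtag whose text is shown, or None for the text
    without a language definition.

    requested  -- langtag from the request, or None if there is none
    file_langs -- set of langtags that have explicit text in the SVG file
    """
    # Point 6: und explicitly asks for the text without a language.
    if requested == "und":
        return None
    # Point 2: a language known to the file is used directly.
    if requested in file_langs:
        return requested
    # Point 3: anything else gets the standard thumbnail (implicit en).
    # Point 4: explicit en in the file is selected by the implicit en.
    if "en" in file_langs:
        return "en"
    # Point 5: no explicit en, so the text without a language is shown.
    return None


if __name__ == "__main__":
    print(resolved_text_language("de", {"en", "de"}))
    print(resolved_text_language("tlh", {"en", "de"}))
    print(resolved_text_language("tlh", {"fr", "de"}))
    print(resolved_text_language("und", {"en", "de"}))
```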