Rendering multilingual (systemLanguage) SVG files fails locally after upgrading librsvg from 2.40.21 to 2.44.10
Open, Needs Triage, Public

Description

Setup

  • MediaWiki 1.31.8 (b5f555a) 18:19, 24. Jun. 2020
  • PHP 7.3.19-1~deb10u1 (apache2handler)
  • MariaDB 10.3.23-MariaDB-0+deb10u1
  • librsvg (2.44.10-2.1) upgraded from (2.40.21-0+deb9u1)

Issue
After upgrading from Debian 9 to Debian 10, multilingual SVG files are no longer rendered in other languages. Apart from the upgrade of librsvg, nothing else changed: neither the wiki software, the wiki configuration, nor the Apache configuration.

Configuration

$wgThumbnailScriptPath = "$wgScriptPath/thumb.php";
$wgGenerateThumbnailOnParse = false;
$wgSVGConverterPath = '/usr/bin';
$wgSVGConverters = [
        'rsvg' => '$path/rsvg-convert -w $width -h $height -o $output $input'
        ];
$wgSVGConverter = 'rsvg';
$wgAllowTitlesInSVG = true;

Note that files uploaded prior to the upgrade still render in multiple languages.

Examples
Uploaded prior to the upgrade: https://www.jewiki.net/wiki/Datei:Helpers_industry_earns_lots_of_money_with_family_destruction.svg - working.
Uploaded after the upgrade: https://www.jewiki.net/wiki/Datei:Multilingual_SVG_example.svg - not working.

Event Timeline

Aklapper changed the task status from Open to Stalled. Aug 25 2020, 12:20 PM
Aklapper removed a project: dev-images.

@Kghbln: Could you provide a test case which is a valid SVG file? The test case linked above is invalid SVG: https://validator.w3.org/check?uri=https%3A%2F%2Fwww.jewiki.net%2Fw%2Fimages%2Fb%2Fb5%2FYou_also_can_have_no_children_AND_no_career.svg

(And I don't see how this is related to dev-images, hence removing tag.)

Kghbln changed the task status from Stalled to Open. Aug 25 2020, 1:03 PM

Could you provide a test case which is a valid SVG file? The test case linked above is invalid SVG

Sure: https://www.jewiki.net/wiki/Datei:Multilingual_SVG_example.svg This is the same file Wikimedia Commons uses to demonstrate translations of SVG files. Same result. :(

(And I don't see how this is related to dev-images, hence removing tag.)

I figured this one was about developing integration of image files. Please advise about or add an appropriate tag.

Aklapper renamed this task from Rendering multilingual SVG files fails after upgrading librsvg to Rendering multilingual SVG files fails locally after upgrading librsvg from 2.40.21 to 2.44.10. Aug 25 2020, 4:39 PM
Aklapper renamed this task from Rendering multilingual SVG files fails locally after upgrading librsvg from 2.40.21 to 2.44.10 to Rendering multilingual (systemLanguage) SVG files fails locally after upgrading librsvg from 2.40.21 to 2.44.10. Aug 25 2020, 6:36 PM

Thanks for cross-referencing to an existing issue. Looks like MediaWiki needs to be tweaked. Let's see how it goes.

Observations.

The failure happens for simple, non-hyphenated IETF langtags such as de. Regions are not being set. The system processes some files correctly but fails with other files. That suggests that $LANG=de is not being expanded into a more complex langtag such as de-DE that would then fail to match the available systemLanguage attributes. That files uploaded before the upgrade work sounds like a red herring; all files should be getting the same processing (assuming the cache has been cleared or aged out - ?action=purge did not change the results).

The URLs on the File: page HTML have sensible img elements with sensible src attributes that include the appropriate, dropdown-box-selected, &lang=XX argument.

  • src="/w/thumb.php?f=Helpers_industry_earns_lots_of_money_with_family_destruction.svg&width=600&lang=de"
  • src="/w/thumb.php?f=Multilingual_SVG_example.svg&width=512&lang=de"

Converting src attributes to URLs gives:

First URL gives a PNG with German text. PASS.
Second URL gives a PNG with English text. FAIL.

The image scaler is getting the correct arguments but producing the wrong result.

The SVG are

The first SVG file has multiple switch elements; the second SVG has a single switch element.

Although the number of switch elements could trigger a bug, that mechanism seems unlikely for librsvg. Every switch element is being rendered (no fencepost error), and they should all be processed by the same code, which would have had the language default set before any SVG is processed. Furthermore, all labels on the pig are in German.

Both SVG files look normal. Both SVG files have systemLanguage="de" clauses with German text.

However

That difference suggests missing a default may be the trigger.

Here are some possible mechanisms.

  1. librsvg is confused when a switch element does not have a default clause, so it renders en. That could be tested locally by running librsvg on both files. It could be tested on jewiki by adding a default clause to the switch element.
  2. the image scaler may parse the SVG before passing it to librsvg (why would it want to do that work?) and be confused by the absence of a default clause. Alternatively, the image scaler may gather all switch elements, do a faulty enumeration (miss one of them), not find the relevant language, and decide to render the image in en. That could be tested on jewiki by cutting the pig image to one switch clause and seeing if it still renders correctly.

Further confirmation could be done by adding a default clause to Datei:Multilingual_SVG_example.svg and checking if the other languages become visible.

https://www.jewiki.net/wiki/Datei:You_also_can_have_no_children_AND_no_career.svg is a failing file. Its img element src attribute is appropriate. Its SVG

has a single switch element, but that switch has a default clause.

So it looks like the number of switch elements is the trigger. Or possibly needing at least two systemLanguage attributes for each language. I do not see the allowReorder attribute or the duplicated id being material.

Thank you for your observations. Admittedly I am only the reporter of the issue and have never really worked with SVG files besides uploading and using them as is, without language magic and such.

However it now appears to me that the issue is the contents of the svg and not MediaWiki?

I can add that both files currently no longer switch the language when used within the wiki with the Go button after selecting a specific language. (MW 1.31.14)

About the default clause for switch: I have no idea how to add one given the respective documentation.

Note that the file on commons has apparently been fixed. This version however can no longer be uploaded due to HTML and script warnings, even with $wgVerifyMimeType = false;. Currently I have no idea how to bypass.

However it now appears to me that the issue is the contents of the svg and not MediaWiki?

No. The contents of the SVG files look OK. However, some multilingual SVG files display other languages while others do not. What is different about the working and non-working SVG files? That difference may expose the bug. You proposed that the difference was when the SVG file was uploaded. I looked for differences in the SVG files. It looks like you were right.

The img element src attributes that MediaWiki outputs look reasonable for both the working and non-working SVG files.

The PNG files returned by the thumbnailer / image scaler are wrong. That suggests something within the thumbnailer gets confused. For some SVG files, it produces the right result, and for other files it produces the wrong result. Files are returned that are in the wrong language.

I can add that both files currently no longer switch the language when used within the wiki with the Go button after selecting a specific language. (MW 1.31.14)

Here are the files:

On the current site for the first file, en and de work but es does not:

I had not noticed that es was not working yesterday. The failure mechanism is even stranger. Works for English, sometimes works for Deutsch.

I can make de fail by asking for a different width:

I asked for a different size to avoid any cached copies. The request failure suggests the thumbnailer's local cache has been faking us out. It may hold a lot of old images that were correctly done by the previous thumbnailer. When the thumbnailer makes a new image, it makes it wrong. New thumbnails are not handling the lang parameter correctly. That could be why all the recently uploaded files show the bug: none of them were processed by the old thumbnailer.

On the current site for the second file, only en works:

About the default clause for switch: I have no idea how to add one given the respective documentation.

Note that the file on commons has apparently been fixed. This version however can no longer be uploaded due to HTML and script warnings, even with $wgVerifyMimeType = false;. Currently I have no idea how to bypass.

Commons is running MW 1.37.0-wmf.1 (rMW3d331cd90884) with an older version of librsvg and the thumbnail syntax is different:

The failure to upload sounds like you are trying to upload the HTML description page rather than the actual SVG. Here's the link to the Commons SVG:

But uploading a new SVG file does not seem to be the issue.

You probably have lots of PNG files in the thumbnailer's cache. If those cached PNGs were made before the upgrade, then they processed the language correctly. If they were made after the upgrade, then they are in English no matter what the lang parameter requested. Of course, all files uploaded after the upgrade would appear in English only.

I do not know how to configure MW. The description at

does not address $lang at all. I know that resvg and batik have command line arguments for IETF langtags. IIRC, such an argument is not used but rather the LANG environment variable is set while exec'ing the thumbnailer.

So my guess at this point is the LANG environment variable does not control the language preference for librsvg (2.44.10-2.1).

Wow, you are really digging into things here. Much appreciated. Your observations appear to be plausible to me. I guess once WMF moves to a newer version of "librsvg" ...

So my guess at this point is the LANG environment variable does not control the language preference for librsvg (2.44.10-2.1).

Well, we are now in upstream issue land :(

I guess once WMF moves to a newer version of "librsvg" ...

This made me search for "librsvg bugs" in Google and this is what I got:

Commons: Librsvg bugs. WMF is deliberately not upgrading and provides rationale.

@Aklapper or @Kghbln could you try executing librsvg locally on

We need to simulate what happens in rasterize with a $lang argument at

That is, set LANG environment variable to es and

  • rsvg-convert -w 512 -h 360 -o result.png Multilingual_SVG_example.svg

and see if the result is in Spanish.

Regardless of that result, SVGHandler.php needs an upgrade because several SVG rasterizers take their $lang language preference in a command line argument rather than the LANG environment variable.

$ LANG=es rsvg-convert -w 512 -h 360 -o result.png  Multilingual_SVG_example.svg

Result with rsvg-convert version 2.50.4 (latest):

result.png (360×512 px, 15 KB)

Result with rsvg-convert version 2.44.10 (Debian stable):

result.png (360×512 px, 13 KB)

@AntiCompositeNumber Thanks.

@Kghbln Please try it on your image server. BTW, is your image server Unix? The LANG environment variable would have worked in Windows under librsvg 2.40 but not librsvg 2.44.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster
$ rsvg-convert -v
rsvg-convert version 2.44.10
$ export LANG=es
$ LANG=es rsvg-convert -w 512 -h 360 -o result.png  Multilingual_SVG_example.svg

grafik.png (361×520 px, 61 KB)

I am a bit puzzled that the results differ.

@Kghbln I'm puzzled, too.

Based on https://gitlab.gnome.org/GNOME/librsvg/-/issues/356, LC_MESSAGES may have an impact. Try

$ locale

and

$ LANG=es locale

and

$ echo $LC_MESSAGES

and

$ LANG=es LC_MESSAGES=es rsvg-convert -w 512 -h 360 -o result.png Multilingual_SVG_example.svg

@Kghbln I'm puzzled, too.

The result is unchanged, i.e. English only.

Perhaps the reason is that the es locale is not installed on the server. However in case this is a requirement I guess that we do not really need to continue here since I do not think that people will install locales on their server for the sake of svg rendering if they do not have to. I may be wrong though.

$ locale -a
C
C.UTF-8
en_GB.utf8
en_US.utf8
POSIX
$ locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
$ LANG=es locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=es
LANGUAGE=en_GB:en
LC_CTYPE="es"
LC_NUMERIC="es"
LC_TIME="es"
LC_COLLATE="es"
LC_MONETARY="es"
LC_MESSAGES="es"
LC_PAPER="es"
LC_NAME="es"
LC_ADDRESS="es"
LC_TELEPHONE="es"
LC_MEASUREMENT="es"
LC_IDENTIFICATION="es"
LC_ALL=
$ echo $LC_MESSAGES
$ LANG=es LC_MESSAGES=es rsvg-convert -w 512 -h 360 -o result.png Multilingual_SVG_example.svg

@Kghbln Thanks for running the tests.

I hope the specific locales do not need to be installed. SVG need only do string matching; it does not need to know about currency or date formats.

I do not understand what is happening.

The failure of rsvg-convert on jewiki.net to produce de or es versions and only producing en versions would explain why the thumbnail server only produces en versions. The problem may reside with rsvg-convert.

My understanding from GNOME is that the LANG environment variable sets the librsvg language preference. That puzzles me even more. Your server's default LANG should produce a langtag preference of en-GB (not en). But if that is the preference, then Multilingual_SVG_example.svg should not produce any text. Neither the jewiki.net nor the commons.wikimedia.org copy of that file has a systemLanguage="en-GB" or a default clause. The switch element should not render any text, but it is rendering the systemLanguage="en" clause. Either librsvg is getting an en default from somewhere, or it is violating SVG systemLanguage semantics.

Right now, I only see your explanation as reasonable. The locale printouts show that LANGUAGE=en_GB:en in the default and es cases.

https://superuser.com/questions/392439/lang-and-language-environment-variable-in-debian-based-systems

If librsvg uses the LANGUAGE variable to set preferences, then its :en fallback could explain the en rendering.

So one test would be to have @AntiCompositeNumber check the value of LANGUAGE on his system from each of

$ locale
$ LANG=es locale
$ LANG=als locale
$ LANG=tlh locale

That would identify how LANG affects the locale on WMF servers. I doubt those servers have installed locales for Tosk Albanian (als) or Klingon (tlh).

A test on jewiki.net could try

$ LANGUAGE=es locale
$ LANG=es LANGUAGE=es locale
$ LANG=es LANGUAGE=es rsvg-convert -w 512 -h 360 -o result.png Multilingual_SVG_example.svg

to see if the LANGUAGE variable overrides. Another test would be

$ unset LANGUAGE
$ LANG=es locale
$ LANG=es rsvg-convert -w 512 -h 360 -o result.png Multilingual_SVG_example.svg

If LANGUAGE is not set, then LANG may control just as on the WMF servers.

Arch laptop
$ locale                                                                                                          

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ LANG=es locale                                                                                                  
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=es
LC_CTYPE="es"
LC_NUMERIC="es"
LC_TIME="es"
LC_COLLATE="es"
LC_MONETARY="es"
LC_MESSAGES="es"
LC_PAPER="es"
LC_NAME="es"
LC_ADDRESS="es"
LC_TELEPHONE="es"
LC_MEASUREMENT="es"
LC_IDENTIFICATION="es"
LC_ALL=
$ LANG=als locale                                                                                                 
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=als
LC_CTYPE="als"
LC_NUMERIC="als"
LC_TIME="als"
LC_COLLATE="als"
LC_MONETARY="als"
LC_MESSAGES="als"
LC_PAPER="als"
LC_NAME="als"
LC_ADDRESS="als"
LC_TELEPHONE="als"
LC_MEASUREMENT="als"
LC_IDENTIFICATION="als"
LC_ALL=
$ LANG=tlh locale                                                                                                 
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=tlh
LC_CTYPE="tlh"
LC_NUMERIC="tlh"
LC_TIME="tlh"
LC_COLLATE="tlh"
LC_MONETARY="tlh"
LC_MESSAGES="tlh"
LC_PAPER="tlh"
LC_NAME="tlh"
LC_ADDRESS="tlh"
LC_TELEPHONE="tlh"
LC_MEASUREMENT="tlh"
LC_IDENTIFICATION="tlh"
LC_ALL=
$ cat /etc/os-release                                                                                             
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://www.archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux
Debian Buster docker container
$ docker run -v ~/transfer:/srv -it debian                                                                        
root@11699823e6ea:/# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
root@11699823e6ea:/# LANG=es locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=es
LANGUAGE=
LC_CTYPE="es"
LC_NUMERIC="es"
LC_TIME="es"
LC_COLLATE="es"
LC_MONETARY="es"
LC_MESSAGES="es"
LC_PAPER="es"
LC_NAME="es"
LC_ADDRESS="es"
LC_TELEPHONE="es"
LC_MEASUREMENT="es"
LC_IDENTIFICATION="es"
LC_ALL=
root@11699823e6ea:/# LANG=als locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=als
LANGUAGE=
LC_CTYPE="als"
LC_NUMERIC="als"
LC_TIME="als"
LC_COLLATE="als"
LC_MONETARY="als"
LC_MESSAGES="als"
LC_PAPER="als"
LC_NAME="als"
LC_ADDRESS="als"
LC_TELEPHONE="als"
LC_MEASUREMENT="als"
LC_IDENTIFICATION="als"
LC_ALL=
root@11699823e6ea:/# LANG=tlh locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=tlh
LANGUAGE=
LC_CTYPE="tlh"
LC_NUMERIC="tlh"
LC_TIME="tlh"
LC_COLLATE="tlh"
LC_MONETARY="tlh"
LC_MESSAGES="tlh"
LC_PAPER="tlh"
LC_NAME="tlh"
LC_ADDRESS="tlh"
LC_TELEPHONE="tlh"
LC_MEASUREMENT="tlh"
LC_IDENTIFICATION="tlh"
LC_ALL=
root@11699823e6ea:/# cat /proc/version
Linux version 5.11.16-zen1-1-zen (linux-zen@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.36.1) #1 ZEN SMP PREEMPT Wed, 21 Apr 2021 17:22:09 +0000            
root@11699823e6ea:/# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@11699823e6ea:/# 

@AntiCompositeNumber Thanks. Your systems are not setting LANGUAGE. I'm hoping that LANGUAGE being set on jewiki.net explains the behavior.

Yup, just to confirm that:

$ LANGUAGE=en LANG=es rsvg-convert -w 512 -h 360 -o result-2.png  Multilingual_SVG_example.svg

result-2.png (360×512 px, 15 KB)

$ LANGUAGE=es locale
LANG=en_GB.UTF-8
LANGUAGE=es
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
$ LANG=es LANGUAGE=es locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=es
LANGUAGE=es
LC_CTYPE="es"
LC_NUMERIC="es"
LC_TIME="es"
LC_COLLATE="es"
LC_MONETARY="es"
LC_MESSAGES="es"
LC_PAPER="es"
LC_NAME="es"
LC_ADDRESS="es"
LC_TELEPHONE="es"
LC_MEASUREMENT="es"
LC_IDENTIFICATION="es"
LC_ALL=
$ LANG=es LANGUAGE=es rsvg-convert -w 512 -h 360 -o result.png Multilingual_SVG_example.svg
$ unset LANGUAGE
$ LANG=es locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=es
LANGUAGE=
LC_CTYPE="es"
LC_NUMERIC="es"
LC_TIME="es"
LC_COLLATE="es"
LC_MONETARY="es"
LC_MESSAGES="es"
LC_PAPER="es"
LC_NAME="es"
LC_ADDRESS="es"
LC_TELEPHONE="es"
LC_MEASUREMENT="es"
LC_IDENTIFICATION="es"
LC_ALL=
$ LANG=es rsvg-convert -w 512 -h 360 -o result2.png Multilingual_SVG_example.svg

In both cases I get the en version ("Love ..."). I am using this file:

Well, I was hoping for confirmation rather than another setback.

I'm still betting that the new rsvg-convert is the culprit. Somehow, it believes en is a preferred language.

Out of desperation, I might try

  • LC_ALL=es LC_MESSAGES=es LANG=es LANGUAGE=es rsvg-convert -w 512 -h 360 -o result.png Multilingual_SVG_example.svg
  • LC_ALL= LC_MESSAGES= LANG=es LANGUAGE= rsvg-convert -w 512 -h 360 -o result.png Multilingual_SVG_example.svg

but I would not expect them to work.

Another step would be to find where the system set its LANGUAGE environment variable and comment it out; there may be some other relevant settings nearby.

Revert to librsvg (2.40.x). That should restore simple languages (e.g., de and es but not zh-Hant). That would confirm librsvg (2.44.x) is the culprit and restore previous functionality.

could you try executing librsvg locally

$:acko\> rpm -q librsvg2
librsvg2-2.50.3-1.fc33.x86_64
$:acko\> wget -q https://upload.wikimedia.org/wikipedia/commons/1/1e/Multilingual_SVG_example.svg
$:acko\> LANG=es_ES.utf8 rsvg-convert -w 512 -h 360 -o result.png  Multilingual_SVG_example.svg

and result.png is displayed with Spanish text.

LANG=es_ES.utf8 should morph into the langtag es-ES (Spanish as spoken in Spain). The langtag in the SVG file is just es (generic Spanish). If the user preference is es-ES, then the SVG agent may not display something that is just es.
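
The matching rule behind that remark can be sketched in a few lines of Python. This is only an illustration of the prefix rule in the SVG systemLanguage definition, not code from librsvg or MediaWiki; the function name is hypothetical, and comparing case-insensitively is an assumption made here for convenience.

```python
def system_language_matches(user_prefs, system_language):
    """Sketch of SVG systemLanguage evaluation (hypothetical helper).

    user_prefs: list of langtags the user prefers, e.g. ["es-ES"].
    system_language: the attribute value, a comma-separated list of langtags.
    """
    # Split the attribute value into individual tags; lowercasing is an assumption.
    tags = [t.strip().lower() for t in system_language.split(",") if t.strip()]
    for pref in (p.lower() for p in user_prefs):
        for tag in tags:
            # True on an exact match, or when the preference is a prefix of
            # the attribute tag and the next character in the tag is "-".
            if tag == pref or tag.startswith(pref + "-"):
                return True
    return False
```

Under this rule a preference of es matches systemLanguage="es", and a preference of en matches systemLanguage="en-GB", but a preference of es-ES does not match systemLanguage="es", which is the asymmetry described above.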

I am sorry that I have kept you waiting. I was consumed by other stuff. :(

Out of desperation, I might try ... but I would not expect them to work.

I have, hopefully, good news here. Probably I used the wrong originating file for testing and converting thus messing up the situation even more. With

$ wget -q https://upload.wikimedia.org/wikipedia/commons/1/1e/Multilingual_SVG_example.svg

I now get the following results:

$ LC_ALL=es LC_MESSAGES=es LANG=es LANGUAGE=es rsvg-convert -w 512 -h 360 -o result-0.png Multilingual_SVG_example.svg
$ LC_ALL= LC_MESSAGES= LANG=es LANGUAGE= rsvg-convert -w 512 -h 360 -o result-1.png Multilingual_SVG_example.svg
$ LANG=es LANGUAGE=es rsvg-convert -w 512 -h 360 -o result-2.png Multilingual_SVG_example.svg

These three render in Spanish.

$ LANG=es rsvg-convert -w 512 -h 360 -o result-3.png Multilingual_SVG_example.svg

This one renders in English.

Hopefully this allows for a breakthrough in detecting the issue and a possible solution.

Here's my take.

Somewhere, you have configured jewiki to set LANG=en_GB.UTF-8 and LANGUAGE=en_GB:en. Those are not unreasonable settings, and I would expect other users to configure their systems for their native languages.

Now, LANG is a common setting, and LANGUAGE is a GNU extension. The GNU extension has become popular because it offers sensible fallbacks. Consequently, subroutine libraries now look to both those environment variables when they try to figure out which languages to use.

The older librsvg only looked at LANG.

I expect that the new librsvg looks at LANGUAGE first. If that variable is set, then it uses those languages. If it is not set, then LANG is consulted. MW sets LANG but it leaves LANGUAGE intact.

If you look above, AntiCompositeNumber's two systems did not set LANGUAGE, so the new librsvg used LANG and got the right result. AntiCompositeNumber also did a test that showed LANGUAGE=en took precedence over LANG=es.

When your system executed locale, it showed LANGUAGE was specifying English and was not changed when LANG was changed. As currently configured, your librsvg thumbnailer will only produce English.
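
As a rough model of that guess (not librsvg's actual code, which I have not read), the precedence could be written like this in Python. The function name is hypothetical, and the rule "LANGUAGE wins when set and non-empty, otherwise LANG" is the assumption being illustrated.

```python
def guessed_language_preferences(env):
    """Hypothetical model of the guessed precedence: LANGUAGE first, then LANG.

    env: a mapping of environment variable names to values.
    Returns the raw locale strings in priority order.
    """
    language = env.get("LANGUAGE", "")
    if language:
        # LANGUAGE is a colon-separated priority list, e.g. "en_GB:en".
        return [entry for entry in language.split(":") if entry]
    lang = env.get("LANG", "")
    return [lang] if lang else []
```

With the jewiki settings, {"LANG": "es", "LANGUAGE": "en_GB:en"} yields the English entries, while an empty or unset LANGUAGE yields es. That is consistent with the four rsvg-convert results reported above, but it does not prove this is what the library does.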

I do not know how to explain your results from T261192#7050538, but since your results are different today, I won't bother to try.

The workaround for your system is to comment out the LANGUAGE setting for the server processes. I do not know how that would be done.

@Gilles

The trivial fix for MediaWiki is to set the LANG and LANGUAGE variables at line 351:

Thumbor should do the same at line 54:

That should provide better functionality.
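
To make the idea concrete, here is a minimal Python sketch of the trivial fix. It is not the MediaWiki (PHP) or Thumbor code; the helper names rasterizer_env and rasterize are made up for illustration, and only the rsvg-convert flags already used in this task (-w, -h, -o) are assumed.

```python
import os
import subprocess


def rasterizer_env(lang, base_env=None):
    """Return a copy of the environment with LANG and LANGUAGE both set to lang."""
    env = dict(os.environ if base_env is None else base_env)
    env["LANG"] = lang
    # Overriding LANGUAGE keeps an inherited value such as "en_GB:en"
    # from taking precedence over the requested language.
    env["LANGUAGE"] = lang
    return env


def rasterize(svg_path, png_path, width, height, lang):
    """Run rsvg-convert with the language preference passed via the environment."""
    cmd = ["rsvg-convert", "-w", str(width), "-h", str(height),
           "-o", png_path, svg_path]
    subprocess.run(cmd, env=rasterizer_env(lang), check=True)
```

For example, rasterize("Multilingual_SVG_example.svg", "result.png", 512, 360, "es") would correspond to the LANG=es LANGUAGE=es command line that rendered in Spanish on jewiki.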

The better fix is using rasterizers that take $lang via a command line argument. Passing two or three letter langtags via locale environment variables works, but passing hyphenated langtags such as zh-Hant will probably be trouble. MW needs to use rasterizers that take IETF langtags directly rather than sneaking them in through locale strings.

Thanks again for your elaborate analysis of the situation. Really impressive and appreciated.

I do not know how to explain your results from T261192#7050538, but since your results are different today, I won't bother to try.

Yeah, let's forget about this. If it is an issue, it is perhaps one specific to this server, which has been upgraded all the way from Debian 6. You can never know about side effects.

JoKalliauer subscribed.

@Glrx: Sorry I do not understand language-interaction between MediaWiki and librsvg. Is it correct it is reported upstream at:

The topic is muddled.

In a simple view, this phabricator issue is not an Upstream issue. It may be a misconfiguration of the thumbnail server (it should not have set the LANGUAGE environment variable). Alternatively, it is the failure of MediaWiki to set the LANGUAGE environment variable to es. MediaWiki needs to update its interaction with librsvg.

In a more complicated view, there are two options to fix systemLanguage issues. In the first, librsvg should set its preference with an IETF langtag argument rather than environment variables (a need for an Upstream feature). In the second, MediaWiki should localize an SVG file before passing it to librsvg (a MediaWiki only fix).

The more complicated view may be outside this phabricator issue's charter. Maybe there should be new tickets, or maybe the issues reside in T125710 or T40010.


There are problems upstream in librsvg. The issues you cite have been fixed in librsvg 2.44, but using librsvg 2.44 or later does not mean the MediaWiki-librsvg language interaction will work. I expect simple langtags such as de, en, and es will continue to work. Hyphenated langtags (such as zh-Hans and sr-EL) that displayed the right language but the wrong script in the past may no longer display the right language after upgrading.

librsvg 2.40 is broken: it only matches the first subtag (the part before the first hyphen, e.g. en) of an IETF langtag (e.g., en-GB).
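
A small Python sketch of first-subtag matching as described in the sentence above. It models the description only; it is not taken from the librsvg 2.40 source, and the function names are hypothetical.

```python
def first_subtag(langtag):
    """Return the part of a langtag before the first hyphen, lowercased."""
    return langtag.split("-", 1)[0].lower()


def first_subtag_match(user_pref, system_language_tag):
    """Model of the described behavior: compare only the first subtags."""
    return first_subtag(user_pref) == first_subtag(system_language_tag)
```

Under this model en-GB and en-US are indistinguishable, and zh-Hant matches zh-Hans, which would give the right language but possibly the wrong script.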

Issue #131 (closed) is really about the first subtag match problem: it wanted librsvg to distinguish en-GB from en-US. In testing, the OP tried feeding locale strings such as en_GB to librsvg with no good results. Consequently, the issue wanted Unix locale strings to be translated to IETF langtags and permissive matching. Garden path.

Issue #256 (closed) supposedly fixed the matching problem by using a locale library. At the same time, the locale library would translate some locale strings into IETF langtags. Consequently, when librsvg 2.44 starts, it looks at locale environment variables to guess the user's IETF language preferences.

MediaWiki cannot use librsvg 2.44 and newer.

This phabricator issue, T261192, shows that librsvg 2.44 looks at more than just the LANG environment variable to determine IETF language preferences. MediaWiki needs to set other environment variables such as LANGUAGE to make sure its preference is accepted. That fix is not complete because it still does not get around the problem of MediaWiki stuffing an IETF langtag into a Unix environment variable that expects a Unix locale string. MediaWiki does not do the type conversion, and it is likely that the type conversion is futile. It also does not address the problem on non-Unix systems. librsvg 2.40 works on non-Unix systems because it just grabbed the environment variable LANG as an IETF langtag. librsvg 2.44 consults a library for the current locale; on other operating systems, environment variables do not set the locale.

The issue #256 fix is too permissive for strict SVG agents. I believe setting LANG=en_GB sets the librsvg language preference to en, en-GB rather than the strict en-GB. The permissive setting is a reasonable assumption by librsvg, but it is not the result that MediaWiki expects.

MediaWiki normally would not set LANG=en_GB; instead, it would set LANG=en-GB. I'm not sure how librsvg 2.44 processes such an environment variable, but I do not expect it to do it correctly. There's a type error: on Unix, LANG is supposed to be an opaque Unix locale string type rather than an IETF langtag type. The Unix locale libraries do not expect IETF langtags in the environment variables.

Unix locale strings and IETF langtags do not have a one-to-one mapping. I'm not sure, but librsvg 2.44 may not translate LANG=zh_HANS into the zh-Hans langtag. In addition, SVG allows ill-formed IETF langtags. That means that MediaWiki can use the improper sr-EC and sr-EL langtags instead of the proper sr-Cyrl and sr-Latn langtags. Some subroutine libraries will validate and object to improper langtags: sr-EC (Serbian as spoken in Ecuador) may pass, but sr-EL will fail because there is no EL region. It may be that librsvg 2.44 will object.
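
To illustrate why the mapping is lossy, here is a deliberately naive conversion in Python. It is a sketch with a hypothetical function name, not what librsvg or its locale library does: it just drops the codeset and modifier and swaps the underscore for a hyphen.

```python
def naive_locale_to_langtag(locale_string):
    """Naively turn a Unix locale string into something shaped like a langtag."""
    # Drop the codeset (".UTF-8") and the modifier ("@latin"), if present.
    base = locale_string.split(".", 1)[0].split("@", 1)[0]
    # Locale strings separate language and territory with "_"; langtags use "-".
    return base.replace("_", "-")
```

This turns en_GB.UTF-8 into en-GB, but sr_RS@latin becomes sr-RS, so the script information carried by the modifier is lost, and there is no locale string from which this function would produce sr-EL or sr-EC as MediaWiki means them.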

MediaWiki does its own type checking: in at least one place, acceptable SVG langtags are kicked out if they are not also MediaWiki language identifiers.

Furthermore, SVG Translate (which is not part of MediaWiki) now violates the SVG spec by setting systemLanguage attributes to Unix locale strings. Instead of zh-Hans, it will use zh_Hans; instead of ku-Arab, it will use ku_Arab; instead of sr-Latn, it will use sr_Latn. See T271000. I think that hack gets around the librsvg 2.40 hyphen matching problem: it tricks librsvg 2.40 into displaying the expected text. The workaround will probably fail in librsvg 2.44 and other SVG agents (ku or ku-Arab will not match systemLanguage="ku_Arab").

The more significant GNOME issue is

which seeks a way of setting user preference directly with IETF langtags. That's the feature that MediaWiki wants, and it is a feature that batik and resvg already have. Such a method would avoid any (langtag) to (locale string) to (langtag) conversion losses. MediaWiki and Thumbor would need updates.

MediaWiki should not set IETF language preferences through locale environment variables.

librsvg would like to make it easy for Wikimedia to have localized SVGs. (ref)

So the librsvg developer asked us to point Wikimedia people to librsvg#735 (RFC: meta-issue for localized SVGs) and to say that comments there are welcome. The developer would like to know our opinion on how to do localized SVGs.

Feel free to copy this to GNOME issues.

From GNOME #735:

Historical note: librsvg used to do very simple string-based matching from the languages obtained through Unix environment variables, to the systemLanguage attribute. When it switched to a proper parser for BCP47 language tags, it got more strict.

SVG 1.0, 1.1, and 2.0 require simple string-based matching. See https://svgwg.org/svg2-draft/struct.html#SystemLanguageAttribute which describes systemLanguage evaluation:

  • Evaluates to "true" if one of the language tags indicated by user preferences is a case-insensitive match of one of the language tags given in the value of this parameter, or if one of the language tags indicated by user preferences is a case-insensitive prefix of one of the language tags given in the value of this parameter such that the first tag character following the prefix is "-".
  • Evaluates to "false" otherwise.
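The quoted matching rule can be sketched in a few lines. This is only an illustration of the rule as worded above, not librsvg's implementation; the function name is hypothetical, and it assumes the attribute value is a comma-separated list of langtags.

```python
def system_language_matches(attr_value, user_langs):
    """Evaluate a systemLanguage attribute against user preferences.

    attr_value: the raw attribute string, e.g. "en-GB, fr".
    user_langs: iterable of langtags from the user's preferences.
    Hypothetical helper; simple case-insensitive string matching only.
    """
    tags = [t.strip().lower() for t in attr_value.split(",") if t.strip()]
    prefs = [p.strip().lower() for p in user_langs if p.strip()]
    for pref in prefs:
        for tag in tags:
            # Exact match, or the preference is a prefix of the tag
            # and the next character in the tag is "-".
            if tag == pref or tag.startswith(pref + "-"):
                return True
    return False


if __name__ == "__main__":
    print(system_language_matches("en-GB", ["en"]))       # prefix match
    print(system_language_matches("ku_Arab", ["ku"]))     # underscore: no match
```

Note the direction of the prefix test: a preference of en matches systemLanguage="en-GB", but a preference of en-GB does not match systemLanguage="en". An underscore value such as ku_Arab matches neither ku nor ku-Arab under this rule.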

The best thing would be for librsvg to have a command-line argument that takes an HTTP Accept-Language header string. Looking down the road, such an input is required for SMIL allowReorder processing that is in SVG 2.0. Major browsers support allowReorder processing. See https://svgwg.org/svg2-draft/struct.html#SwitchElement .
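For reference, this is roughly what turning such a header string into an ordered preference list involves. It is a simplified sketch, not librsvg's parser: the function name is hypothetical, and it handles only the tag and q-value parts of the header.

```python
def parse_accept_language(header):
    """Return langtags from an Accept-Language string, best first.

    Hypothetical helper. Entries without a q-value count as q=1;
    entries with q=0 (or an unparsable q) are dropped; ties keep
    their original order.
    """
    entries = []
    for index, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        tag = tag.strip()
        quality = 1.0
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0
        if tag and quality > 0:
            # Sort key: higher quality first, then original position.
            entries.append((-quality, index, tag))
    return [tag for _, _, tag in sorted(entries)]


if __name__ == "__main__":
    print(parse_accept_language("de-CH, de;q=0.8, en;q=0.5"))
```

The resulting list is the kind of user-preference input that the systemLanguage test needs, without any detour through locale strings.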

MediaWiki will specify only one langtag in its calls to librsvg for the foreseeable future. MediaWiki caches its images, and it does not want to build unnecessary PNGs. There is no reason to build and cache an es.PNG if no es langtags are present in the SVG.

librsvg should NEVER treat systemLanguage="es_MX" as if it were systemLanguage="es-MX". Such an interpretation violates SVG 1.0, 1.1, and 2.0 matching rules. Any Commons software (no names, but its initials are SVG Translate) that generates langtags with underscores violates the SVG specification and should be fixed; that's on us. Any Commons files that have underscore langtags are invalid SVG; those bad files are on us. It is pointless to have librsvg fudge an SVG file when Chrome and Firefox will not; that will just confuse users. A Commons robot can find Commons SVG files with underscore langtags and fix them.
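A check for such files could look roughly like the following. This is only a sketch of what a cleanup robot might do, not existing Commons tooling; the function name is hypothetical, and it assumes the SVG parses as well-formed XML.

```python
import xml.etree.ElementTree as ET


def underscore_langtags(svg_text):
    """List systemLanguage tokens that contain an underscore.

    Hypothetical helper. Walks every element in document order and
    splits each systemLanguage value on commas.
    """
    root = ET.fromstring(svg_text)
    found = []
    for elem in root.iter():
        value = elem.get("systemLanguage")
        if value is None:
            continue
        for tag in value.split(","):
            tag = tag.strip()
            if "_" in tag:
                found.append(tag)
    return found


if __name__ == "__main__":
    sample = (
        '<svg xmlns="http://www.w3.org/2000/svg"><switch>'
        '<text systemLanguage="zh_Hans">a</text>'
        '<text systemLanguage="sr-Latn, ku_Arab">b</text>'
        '<text>c</text>'
        '</switch></svg>'
    )
    print(underscore_langtags(sample))
```

Whether the fix is always a plain underscore-to-hyphen replacement is a separate question that a robot operator would have to decide per file.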

CSS is much less important, but SVG 1.1 requires a CSS 2 parser, and that means :lang pseudo selectors, attribute selectors, at-sign rules, and media attributes on the style elements. Commons files are not using these features at present because librsvg 2.40 did not support them. Some CJK Unicode characters need font selection to draw the glyphs correctly; see https://en.wikipedia.org/wiki/CJK_Unified_Ideographs and https://en.wikipedia.org/wiki/File:CJKV_variant_glyphs.png (which shows one Unicode character with 5 variants). CSS rules can be used to select fonts within a switch that has text children:

text:lang(ja), text[systemLanguage|="ja"] { font-family: Japanese Font; }
text:lang(ko), text[systemLanguage|="ko"] { font-family: Korean Font; }
text:lang(zh-Hans), text[systemLanguage|="zh-Hans"] { font-family: Simplified Chinese Font; }
text:lang(zh-Hant), text[systemLanguage|="zh-Hant"] { font-family: Traditional Chinese Font; }

It would be nice if this option were available, but it is not important right now.

librsvg 2.52.x will have a new --accept-language parameter, which will allow specifying the user's preferred languages by passing the HTTP Accept-Language header to librsvg: https://gitlab.gnome.org/GNOME/librsvg/-/issues/356 (Not sure if it will get backported to the 2.50.x series)

The change implies MW and Thumbor need updates so they pass $lang through command line arguments.

The simple view is that SVGHandler.php should generalize pattern variable substitution at line 339 to include $lang.

That way, $wgSVGConverters could include a $lang parameter to set the command line argument.

That, in turn, means that $lang should not default to false but rather to an explicit langtag. That requires some thought. A simple view is the default would be en (English is the default everywhere). Choosing the default to be und (trying to force the default clause) is an interesting option, but it might break some files (no text displays where English displayed before).

If the pattern string has $lang, then there is no reason to set the LANG environment variable at line 351.
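The idea can be sketched as follows. This is an illustration in Python, not the actual SVGHandler.php code (which is PHP): the function name, the DEFAULT_LANG constant, and the choice of en as default are assumptions, and the sample pattern simply adds the --accept-language flag mentioned above to the existing rsvg pattern.

```python
import shlex
from string import Template

# Assumption: "en" as the default; the discussion also floats "und".
DEFAULT_LANG = "en"


def build_command(pattern, path, width, height, src, dst, lang=None):
    """Fill a converter pattern and decide whether LANG is still needed.

    Hypothetical helper. Returns (command string, extra environment).
    If the pattern itself carries $lang, no LANG variable is set.
    """
    lang = lang or DEFAULT_LANG
    values = {
        "path": shlex.quote(path),
        "width": str(int(width)),
        "height": str(int(height)),
        "input": shlex.quote(src),
        "output": shlex.quote(dst),
        "lang": shlex.quote(lang),
    }
    command = Template(pattern).substitute(values)
    # Old behaviour (LANG in the environment) only when $lang is absent.
    env = {} if "$lang" in pattern else {"LANG": lang}
    return command, env


if __name__ == "__main__":
    new_pattern = ("$path/rsvg-convert -w $width -h $height "
                   "--accept-language $lang -o $output $input")
    print(build_command(new_pattern, "/usr/bin", 200, 100,
                        "/tmp/in.svg", "/tmp/out.png", "en-GB"))
```

With the old pattern (no $lang), the same function falls back to returning a LANG entry for the environment, so existing $wgSVGConverters settings would keep their current behaviour.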

Thumbor also needs a change, but that change is much simpler.

brion subscribed.

Claiming this, taking a quick look at current status. Worst case this should be doable with a small patch to librsvg's C API and rsvg-convert.

bvibber subscribed.

unclaiming, if anybody wants this cookie. forgot this was on my plate!