
When requesting undetermined language of an SVG, English language is used instead
Closed, Resolved | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
The File page displays the rendering for the en IETF langtag.

That is, the file page uses this URL:

That URL does not specify the default language. In fact, it selects a rendering for the en langtag.

What should have happened instead?:
The File page should have shown the "other" rendering, that is, the rendering for the und langtag.

That is, the File page should have used this URL:

That suggests MW has rejected the request to display the und langtag. Instead, it uses a default en rendering.

MW should always accept the und langtag.

In addition, explicitly requesting tlh also displays en rather than tlh:

A File page with an explicit lang URL parameter should force the display of that langtag even if the SVG file does not support that langtag.

Event Timeline

Umherirrender subscribed.

It is not about the drop down box. It is about the rendered content of the file/thumb

Umherirrender renamed this task from File page does not display requested language to File is not rendered in requested language. Jun 9 2022, 9:18 PM

A File page with an explicit lang URL parameter should force the display of that langtag even if the SVG file does not support that langtag.

The code suggests that only the languages supported by that file are shown.
See also https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/6ec13b0e2f298d3e6c2f7fc89fe0f14f1b44b986/includes/media/SvgHandler.php#110

When lang is missing, the fallback is the content language (see ImagePage::getLanguageForRendering); on Commons that is en.
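
The behaviour reported in this task can be modelled roughly as below. This is a Python sketch of what the task describes, not the MediaWiki PHP code; the function name, its parameters, and the sample language list are hypothetical.

```python
def language_for_rendering(requested, supported, content_lang="en"):
    """Model of the reported behaviour (hypothetical helper, not MediaWiki code).

    requested:    value of the lang URL parameter, or None if absent.
    supported:    language tags the SVG file declares via systemLanguage.
    content_lang: wiki content language ("en" on Commons, per this task).
    """
    if requested is None:
        # No lang parameter: fall back to the wiki content language.
        return content_lang
    if requested.lower() in {tag.lower() for tag in supported}:
        # The file declares this language, so the request is honoured.
        return requested.lower()
    # Unsupported request (this included "und" before the fix): default is used.
    return content_lang


# Assumed example file that declares en and de only.
print(language_for_rendering("de", ["en", "de"]))
print(language_for_rendering("und", ["en", "de"]))
print(language_for_rendering("tlh", ["en", "de"]))
```

Under this model both und and tlh collapse to en, which matches what the File page was observed to do.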

TheDJ subscribed.

I have a fix for this in the works.
I'm also splitting this up into more specific problems.

TheDJ renamed this task from File is not rendered in requested language to When requesting undefined language of an SVG, English language is used instead. Jun 11 2022, 1:27 PM
TheDJ triaged this task as Low priority.
TheDJ renamed this task from When requesting undefined language of an SVG, English language is used instead to When requesting undetermined language of an SVG, English language is used instead. Jun 11 2022, 1:32 PM

Change 804702 had a related patch set uploaded (by TheDJ; author: TheDJ):

[mediawiki/core@master] Support requesting undetermined SVG language

https://gerrit.wikimedia.org/r/804702

I love this SvgHandler.php change:

	/**
	 * What language to render file in if none selected
	 *
	 * @param File $file Language code
	 * @return string
	 */
	public function getDefaultRenderLanguage( File $file ) {
		return 'und';
	}

And I would lobby to change the Commons default from en renderings to und renderings. That would get rid of parochial English on Commons. SVG files that were originally authored in German or Spanish would show up in German or Spanish.

Image URLs with .../300px... would be rendered in und rather than en.

The und approach is a lot cleaner.

However, it is a breaking change. Unfortunately, there are SVG files out there with no fallback element in their switch, like this:

<switch>
  <text systemLanguage="en">English word</text>
  <text systemLanguage="de">deutsches Wort</text>
</switch>

Rendering that file in und would produce no text at all.
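
To illustrate why, here is a small Python sketch of how a switch picks its child. It uses a simplified reading of SVG systemLanguage matching (exact tag or prefix followed by "-") and is not how rsvg or MediaWiki implement it; the helper name and the fallback variant are hypothetical.

```python
import xml.etree.ElementTree as ET

SWITCH = """<switch>
  <text systemLanguage="en">English word</text>
  <text systemLanguage="de">deutsches Wort</text>
</switch>"""

# Same switch with a last child that has no systemLanguage (acts as fallback).
WITH_FALLBACK = SWITCH.replace("</switch>", "  <text>fallback word</text>\n</switch>")


def pick_switch_child(switch_xml, lang):
    """Return the text of the first child that matches lang, or None."""
    wanted = lang.lower()
    for child in ET.fromstring(switch_xml):
        langs = child.get("systemLanguage")
        if langs is None:
            # No systemLanguage test: this child always matches.
            return child.text
        tags = [t.strip().lower() for t in langs.split(",")]
        if any(t == wanted or t.startswith(wanted + "-") for t in tags):
            return child.text
    return None  # nothing in the switch is rendered


print(pick_switch_child(SWITCH, "de"))
print(pick_switch_child(SWITCH, "und"))
print(pick_switch_child(WITH_FALLBACK, "und"))
```

Only the variant with an unconditional last child yields text for und, which is why files without such a fallback would lose their text under an und default.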

Cached SVG files would need to be rerendered in und.

I also believe there is a lot of MW code that just assumes the default is en.

Unfortunately, the current Commons standard is to default them to English.

Change 804702 merged by jenkins-bot:

[mediawiki/core@master] Support requesting undetermined SVG language

https://gerrit.wikimedia.org/r/804702

Looking at SvgHandler.php around line 570, I see a separate issue:

	public function parseParamString( $str ) {
		$m = false;
		if ( preg_match( '/^lang([a-z]+(?:-[a-z]+)*)-(\d+)px$/i', $str, $m ) ) {
			return [ 'width' => array_pop( $m ), 'lang' => $m[1] ];
		}
		if ( preg_match( '/^(\d+)px$/', $str, $m ) ) {
			return [ 'width' => $m[1], 'lang' => self::SVG_DEFAULT_RENDER_LANG ];
		}
		return false;
	}

It looks like the first preg_match() is case insensitive. Consequently, it will match

  • langzh-hant-300px
  • langzh-Hant-300px
  • LANGzH-HaNt-300PX

To me, only the first example should be a proper match. Accepting mixed-case variants could also defeat caching, because equivalent URLs would no longer be identical. When MW emits an image URL, this portion of the URL should always be lowercase.

Note also that 300PX will not be matched by the second preg_match().
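
The asymmetry can be checked with a Python port of the two patterns. This is a sketch for experimenting with the regexes only; DEFAULT_RENDER_LANG stands in for SVG_DEFAULT_RENDER_LANG and its value "en" is an assumption based on this discussion.

```python
import re

DEFAULT_RENDER_LANG = "en"  # assumed value of SVG_DEFAULT_RENDER_LANG


def parse_param_string(s):
    """Python port of the two preg_match() calls quoted above."""
    # First pattern: case-insensitive, as in the PHP code (/i flag).
    m = re.match(r"^lang([a-z]+(?:-[a-z]+)*)-(\d+)px$", s, re.IGNORECASE)
    if m:
        return {"width": m.group(2), "lang": m.group(1)}
    # Second pattern: case-sensitive, so "300PX" is rejected.
    m = re.match(r"^(\d+)px$", s)
    if m:
        return {"width": m.group(1), "lang": DEFAULT_RENDER_LANG}
    return None  # PHP returns false here


for candidate in ("langzh-hant-300px", "LANGzH-HaNt-300PX", "300px", "300PX"):
    print(candidate, parse_param_string(candidate))
```

Note that the captured lang keeps whatever casing the URL used, so the mixed-case inputs produce distinct lang values for the same language.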

Color me surprised: Thumbor accepts camel case:

Is there a specification for the URL?

If Thumbor is right, then the second preg_match() should be case insensitive. Or, even better, just lowercase $str before doing any matches so that the returned $m[1] is lowercase.

Well, for me, that portion of the URL should be lowercase only; any uppercase characters should be rejected. That would prevent caching equivalent URLs at both the client and the server. That is, [[File:Foo.svg|lang=ZH-Hans|300px|...]] should always emit a lowercase /langzh-hans-300px/.
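
A minimal Python sketch of that policy, assuming the emitter lowercases the langtag and the parser rejects anything else. Both function names are hypothetical and this is not the code in the patch.

```python
import re


def make_param_string(lang, width):
    """Emit the thumbnail parameter with the langtag lowercased."""
    return f"lang{lang.lower()}-{width}px"


def parse_strict(s):
    """Accept only the all-lowercase form; return None for anything else."""
    m = re.fullmatch(r"lang([a-z]+(?:-[a-z]+)*)-(\d+)px", s)
    if m is None:
        return None
    return {"lang": m.group(1), "width": int(m.group(2))}


print(make_param_string("ZH-Hans", 300))
print(parse_strict("langzh-hans-300px"))
print(parse_strict("langzh-Hans-300px"))
```

With one canonical spelling per language and width, equivalent thumbnails map to a single URL and therefore a single cache entry.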

Ah! Looking at another patch, I see that the case-insensitive match has been removed. Great.

TheDJ claimed this task.