
German translation of CC radio button texts in UploadWizard display "<!--$2-->" at the end
Closed, ResolvedPublic

Description

Uploading a file found on the internet with the proper license (CC) lets me select one of the following choices (in German; I don't know whether this is a general problem or language-specific). It looks as if an HTML comment slipped through and/or parameter $2 is not set properly. (This happens in step 2 when uploading a set of files that all share the same license information.)

Nicht alle „Creative Commons“-Lizenzen sind für diese Website geeignet. Es muss sichergestellt werden, dass der Urheberrechtsinhaber eine dieser Lizenzen nutzte.
''Creative Commons'' „Namensnennung – Weitergabe unter gleichen Bedingungen 4.0“ ([creativecommons.org/licenses/by-sa/4.0/deed.de Text der Lizenz]<!--$2-->)
''Creative Commons'' „Namensnennung – Weitergabe unter gleichen Bedingungen 3.0“ ([creativecommons.org/licenses/by-sa/3.0/deed.de Text der Lizenz]<!--$2-->)
''Creative Commons'' „Namensnennung – Weitergabe unter gleichen Bedingungen 2.5“ ([creativecommons.org/licenses/by-sa/2.5/deed.de Text der Lizenz]<!--$2-->)
''Creative Commons'' „Namensnennung 4.0“ ([creativecommons.org/licenses/by/4.0/deed.de Text der Lizenz]<!--$2-->)
''Creative Commons'' „Namensnennung 3.0“ ([creativecommons.org/licenses/by/3.0/deed.de Text der Lizenz]<!--$2-->)
''Creative Commons'' „Namensnennung 2.5“ ([creativecommons.org/licenses/by/2.5/deed.de Text der Lizenz]<!--$2-->)
''Creative Commons'' „CC0 1.0 Universal“ (alle Rechte werden freigegeben, analog der Gemeinfreiheit: [//creativecommons.org/publicdomain/zero/1.0/deed.de Text der Erklärung]<!--$2-->)

Event Timeline

Restricted Application added a project: Multimedia. Mar 3 2018, 10:49 AM
Restricted Application added a subscriber: Aklapper.
Herzi.Pinki updated the task description. Mar 3 2018, 10:51 AM
Aklapper closed this task as Invalid. Mar 3 2018, 3:38 PM
Aklapper added a subscriber: Verdy_p.

Thanks for reporting this. I could reproduce this on https://test.wikipedia.org/wiki/Special:UploadWizard?uselang=de but not in English.
Looks like @Verdy_p's changes to German translations made this problem appear: https://translatewiki.net/w/i.php?title=MediaWiki%3AMwe-upwiz-license-cc-by-sa-4.0%2Fde&type=revision&diff=7889413&oldid=7544561

Closing this task as invalid as this looks like a translation to fix on translatewiki.net, not something in the code base itself.

Aklapper renamed this task from UW: CC radio button texts are messed up to German translation of CC radio button texts in UploadWizard display "<!--$2-->" at the end. Mar 3 2018, 3:39 PM

This is definitely a bug in the beta upload wizard, which does not parse basic wiki syntax correctly (including core HTML comments).
For some unknown reason, it parses the message only partially (to find links between [brackets]) and then HTML-izes everything else, meaning that the basic HTML needed for some translations will never render correctly.
This resource is meant to be valid wikitext, usable on all wikis (not just in this beta version of the upload tool, currently deployed only on Commons).

This new UploadTool version therefore has a bug when converting valid wikicode to HTML, and it bypasses the normal wiki parser by using its own incomplete parser to find links.
It is now IMPOSSIBLE to translate this message and replace the URL while also removing "$2" from the resource: when you save it, TranslateWiki.net immediately marks the resource as fuzzy and exports it only within the optional list of "fuzzy resources", which import tools ignore whenever there is any error (e.g. unclosed brackets or parentheses).

If you choose to use the fuzzy resources, you're exposed to more bugs. As long as these resources are fuzzy on Translatewiki.net, we cannot progress at all and these resources will constantly reappear in Translatewiki.net statistics as still needing to be translated (with the yellow bar saying there's an error due to the missing "$2" which is still required).

If you want strings without $2 to be considered valid (not fuzzy), you absolutely need to tell Translatewiki.net which placeholders are optional (because they can be dropped entirely from the translated string), without completely disabling the "fuzzy checker", which will still validate the format (e.g. unclosed brackets in this message). And you really should add some basic code to your own parser to discard HTML comments (a single regexp replace can do that automatically if you want to keep using your own parser).
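To make the validation issue concrete, here is a minimal sketch of a placeholder check with an "optional" list. It is NOT translatewiki.net's actual validator: the function names, the `optional` parameter and the English source string are all assumptions made up for illustration. It only shows why a hardcoded URL fails the check, why hiding $2 in a comment passes it, and what declaring $2 optional would change.

```javascript
// Hypothetical placeholder check for illustration only; this is NOT
// translatewiki.net's actual validator, and all names are assumptions.

// Collect the distinct $1, $2, ... placeholders that occur in a string.
function findPlaceholders(text) {
  return new Set(text.match(/\$\d+/g) || []);
}

// Return the source placeholders that are absent from the translation,
// skipping any that the project has declared optional.
function missingPlaceholders(source, translation, optional = []) {
  const present = findPlaceholders(translation);
  return [...findPlaceholders(source)].filter(
    (name) => !present.has(name) && !optional.includes(name)
  );
}

// Made-up source string; the two translations mirror the German workaround.
const sourceText = '([$2 legal code])';
const hardcodedUrl = '([//creativecommons.org/licenses/by-sa/4.0/deed.de Text der Lizenz])';
const commentedOut = '([//creativecommons.org/licenses/by-sa/4.0/deed.de Text der Lizenz]<!--$2-->)';

// $2 is reported as missing, so the translation would be flagged.
console.log(missingPlaceholders(sourceText, hardcodedUrl));
// Nothing is missing: the comment still contains the text "$2".
console.log(missingPlaceholders(sourceText, commentedOut));
// Nothing is missing: $2 has been declared optional.
console.log(missingPlaceholders(sourceText, hardcodedUrl, ['$2']));
```

A check like this looks only at the raw text, which is why the comment trick satisfies it even though the comment later shows up in the UI.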

Verdy_p reopened this task as Open. Mar 3 2018, 4:29 PM

I see. To summarize that long explanation:
The translation of the message requires $2 to be present in order to validate on translatewiki.net, your workaround is to put $2 into a comment <!--$2-->, and that comment is displayed in the UI, which is not expected.
(I have no idea what "new UploadTool version" you refer to and I don't have any "own parser code" either here. Really.)

Verdy_p added a comment. Edited Mar 3 2018, 5:37 PM

You necessarily use some form of parsing of the string to change the wiki notation of external links (between [] brackets) into plain HTML links.

You also seem to "HTML-ize" all other characters as if they were plain text (even when they are HTML syntax). This transforms the < and > used for all HTML tags into visible characters. You should not do it this way, and at least you should:

  • replace all HTML comments with empty strings, using a regexp like /<!--.*-->/ (or similar), to discard them. As demonstrated by the first workaround for the problem, your code was discarding HTML comments incorrectly: it removed only the <!-- and --> markers, canceling them separately and leaving their content intact.
  • keep some safe inline HTML tags (notably <sup>...</sup>, needed for some languages, or possibly <br />) and HTML character entities (like &nbsp;), which may be needed to enter characters that are difficult to input (e.g. joiner controls in Indic languages), by leaving them unchanged. Basically we can assume that all HTML present in TranslateWiki.net resources is safe, but you may wonder what would happen if someone used unsafe HTML attributes like scriptable events (onload="javascript:..." and similar). MediaWiki's own parser does this check by default.
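The two ways of handling comments described in the first bullet can be contrasted in a minimal sketch. This is plain JavaScript for illustration, not UploadWizard's or jQueryMsg's actual code, and both function names are hypothetical. The regexp is a lazy variant of the one suggested above, so that two comments in one message are not removed as a single greedy span.

```javascript
// Hypothetical helpers for illustration only; not part of UploadWizard or jQueryMsg.

// Flawed approach: removes the "<!--" and "-->" markers separately,
// so the text inside the comment survives.
function stripCommentDelimitersOnly(message) {
  return message.replace(/<!--/g, '').replace(/-->/g, '');
}

// Suggested approach: removes each comment together with its content.
// [\s\S]*? matches lazily and across newlines.
function stripHtmlComments(message) {
  return message.replace(/<!--[\s\S]*?-->/g, '');
}

const translatedMessage =
  '([//creativecommons.org/licenses/by-sa/4.0/deed.de Text der Lizenz]<!--$2-->)';

// Leaves a stray "$2" before the closing parenthesis.
console.log(stripCommentDelimitersOnly(translatedMessage));
// Leaves only the link and the closing parenthesis.
console.log(stripHtmlComments(translatedMessage));
```

Stripping comments this way says nothing about which other tags are safe to keep; that would still need an allow-list of the kind the second bullet describes.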

If you don't want to change your parser, then ask translators NOT to replace $2 at all, and use a separate translatable entry for the URL itself. Then, in your code, substitute this translated URL wherever there is a $2 in the resource. This way, you don't need to instruct TranslateWiki.net: the $2 placeholder must still be present, otherwise the resource will be fuzzy, and you MUST ignore fuzzy resources exported by TranslateWiki.net.
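As a rough sketch of that alternative: the translated message keeps its $2, the URL lives in a separate entry, and the code substitutes one into the other. The message keys, the `messages` table and the `substitute` helper are all hypothetical names for this illustration; this is not how the .msg() method is actually implemented.

```javascript
// Hypothetical message table for illustration only; the keys and the
// substitute() helper are made up for this sketch.
const messages = {
  'license-text': '([$2 Text der Lizenz])',
  'license-url': '//creativecommons.org/licenses/by-sa/4.0/deed.de'
};

// Replace $1, $2, ... with the matching entry of params (1-based).
// Placeholders without a matching parameter are left untouched.
function substitute(template, params) {
  return template.replace(/\$(\d+)/g, (match, index) => {
    const value = params[Number(index) - 1];
    return value === undefined ? match : value;
  });
}

const licenseName = 'CC BY-SA 4.0';
console.log(substitute(messages['license-text'], [licenseName, messages['license-url']]));
```

With this split the translator never edits the placeholder, so the translation keeps validating, and the URL can be localized independently of the sentence around it.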

So yes there's a bug in your code (or how you have instructed TranslateWiki.net to allow some placeholders to be optional even if it is present in the English resource) and this is not a problem of TranslateWiki.net.

For now your code unconditionally replaces any $2 found in the translated string (e.g. in the "original" English shown on Translatewiki.net) with a static (non-localized) URL, or does not use the English translation from Translatewiki.net at all and just uses its own internal built-in string.

And please do not make your code depend on fuzzy translations: they should ALL be ignored by your app, since they are fuzzy for various reasons other than just this problem. You need to find a way to instruct TranslateWiki.net correctly about how it detects/parses the placeholders in the source text, and how it then requires their presence in the translated items in order to validate them (rather than making them fuzzy when edited and submitted). Some changes may be allowed, such as changing "%.2lf" placeholders for currency amounts in C/C++ into "%.0lf" in languages that do not use decimal places, or into a form with more precision: TranslateWiki.net provides for this through an additional properties file that you need to submit to qualify the set of translatable units for your project. A possible error here is the total absence of [brackets], mismatched bracket pairs, or a violation of other restrictions that you provide to TranslateWiki.net as metadata for each translatable unit.

Probably true, but again, it's not "my code".

"Your" code is the code that "you" want to support, even if it was initially written by someone else. It does not mean it belongs only to you or to the initial author, of course.
We must find a way to solve this recurring problem. There's an unseen bug, and it's definitely not in TranslateWiki.net but in incorrect assumptions about what Translatewiki.net does or does not do, or how it works. The initial developer was not aware of this caveat. This code was never tested correctly before it was deployed on Commons in January
(and the initial solution that worked on Commons before January used HTML comments without problems).
It's not something new, because the problem was already reported multiple times last year: the project was incorrectly prepared for translatability on Translatewiki.net, and it broke Commons only when the new UploadWizard was deployed there without proper tests with translations (it was visibly tested only in English).

So please don't write "your code". It is as much your code as my code if you want to see something supported. Criticize ideas, not people, basically.

Verdy_p added a comment. Edited Mar 3 2018, 6:55 PM

This has never been criticism of people, only statements about the code itself (there's no assumption at all in the adjective "your"; this is not a personal offense but means what it means: what "you" support).

But it was criticism of facts about the code that have been denied repeatedly by those supporting this code (including you), without seeing that there really was a bug, and then constantly rejecting it as invalid. It was also rejected as invalid on TranslateWiki.net. We cannot progress at all if everyone shifts the fault onto others and doesn't want to correct it, even though it has long been reported, repeatedly, and multiple workarounds have been proposed and then rejected, because they repeatedly denied that this was a bug they did not understand, even after it was demonstrated and explained in various ways...

So I also criticize the fact that bugs here are too hastily marked as rejected/invalid: the reviewers doing that do not understand the problem or do not really want to see it. This is a problem of method. There is a large number of bugs here on Phabricator that were hastily marked as "invalid" without being inspected at all, and that later proved to be correct (when a developer finally saw it himself and corrected it... silently... without even reporting that the bug had previously been marked incorrectly as invalid!).

Basically, many developers on Phabricator only trust themselves and don't believe they can be wrong unless the report comes from another developer they trust.
These early rejections of bugs are a waste of time for everyone and irritate those who wanted to report a problem to help improve the code where it does not work as intended.

I know incorrect rejections are irritating & a loss of time, but there's no ill will, just everyone trying their best to keep the backlog manageable.
It happens that we take a look at something and fail to realize exactly what happens and mislabel it as invalid...

I just started to take a quick look, but I'm not sure I understand exactly what is going on just yet.
The relevant code in UploadWizard seems to have remained unchanged since September 2014.
I can reproduce the problem (jQueryMsg is throwing an error during its attempt to parse the message), but I'm not too familiar with how the i18n messages are parsed and I also couldn't find a relevant change in the last few months there.

@Verdy_p do you have any insight on what exactly changed in January that seems to have caused a regression? (links to specific patches, or other bug reports welcome)
You mentioned this has been signaled before, and that a bug was filed on translatewiki as well - any chance you could link to those discussions for more context?

It seems that what changed was in the .msg() method used in that code, which enforced the HTML-ization of the message (possibly as a security measure against injection of malicious active HTML, including JavaScript and events running in the user agent). Here the parameter of msg is directly the message coming from Translatewiki.net.

I've not investigated how other translatable msg() strings that may contain HTML are handled elsewhere, but I suppose that either the translatewiki.net message is not used as is, without first parsing it to avoid the HTML-ization (here the conversion of any < or > into &lt; and &gt;), or that other code just uses this .msg() function to build other MediaWiki-formatted content which is finally converted to HTML by a MediaWiki parser at the end. But here you are building plain HTML.
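The effect being described, where < and > are converted to entities so that an HTML comment becomes visible text, can be reproduced with a minimal sketch. `escapeHtml` is a hypothetical stand-in written for this illustration; it is not the actual .msg() implementation, whose behavior is only being guessed at in this thread.

```javascript
// Hypothetical stand-in for the escaping step; not the real .msg() code.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')   // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

const rawMessage = 'Text der Lizenz<!--$2-->';

// The escaped string is 'Text der Lizenz&lt;!--$2--&gt;'. A browser renders
// that as the literal characters "<!--$2-->" instead of an invisible comment.
console.log(escapeHtml(rawMessage));
```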

That code may not have changed here since 2014, but there are dependencies, and this single line of code is not alone. There may be a missing parameter in the invocation of the .msg() method to select the behavior (does it have to return MediaWiki code, or plain restricted HTML with the plain-text message automatically "HTML-ized"?).

Verdy_p added a comment. Edited Mar 5 2018, 3:04 PM

Initially I thought that the code was running on the server, but I see here that this code is implemented on the client side by loaded javascript, and it does not have any MediaWiki parser builtin.
From what I see, the generation of the content in JavaScript uses jQuery to generate HTML tags, and all other text elements are "HTML-ized" by the JavaScript msg() method (which then seems to take the message from the server by querying it for the content of "Mediawiki:message/*").

In the previous version, these messages were apparently processed by the server, using the MediaWiki parser built into the server, but this is apparently not the case here.

Most probably the ".msg()" JavaScript method was changed in the version deployed in January compared to the previous version: where is that method implemented? Is it really part of jQuery? Or was a patch to jQuery that handles specific parsing of resources (loaded by querying the server) modified?

Resources in "MediaWiki:message/*" have always been supposed to be MediaWiki code (wikitext). The .msg() method seems also to be performing itself the substitution of $n placeholders by one of the additional parameters indicated at end of the .msg() parameters.

I note that there were significant changes last December to the scripts in
https://github.com/wikimedia/mediawiki-extensions-UploadWizard/blob/master/resources/

Interesting code also in:
https://github.com/wikimedia/mediawiki-extensions-UploadWizard/blob/master/resources/mw.Escaper.js

Here again this is a custom parser (a limited version of the MediaWiki syntax parser)


Analyzing further, it seems that the javascript does not actually query the server, it only uses the strings loaded from TranslateWiki.net into:

https://github.com/wikimedia/mediawiki-extensions-UploadWizard/tree/master/i18n

This must then be a bug in the "import tool/bot" used to update these JSON files from Translatewiki.net exports. But I can't locate the source of this import bot (there's no README file in this folder, and no significant comments when the bot updates these JSON files to help see who was doing that, from where, and with which code)

I suspect this is the (private?) code used by "jenkins-bot", which does not correctly parse the Mediawiki resources from TranslateWiki.net to convert them to JSON.

Change 416629 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/UploadWizard@master] mw.UploadWizardLicenseInput: Fix generating language-specific links

https://gerrit.wikimedia.org/r/416629

Turns out UploadWizard was already supposed to generate language-specific license links, but this functionality was broken. My patch above fixes it.

@Verdy_p Please revert all of the messages you changed to just use the '$2' parameter for the link instead of a custom URL. This hack will no longer be needed.

Change 416629 merged by jenkins-bot:
[mediawiki/extensions/UploadWizard@master] mw.UploadWizardLicenseInput: Fix generating language-specific links

https://gerrit.wikimedia.org/r/416629

matmarex closed this task as Resolved. Mar 6 2018, 4:59 PM
matmarex claimed this task.
matmarex removed a project: Patch-For-Review.

I fixed the affected messages as well: https://translatewiki.net/w/i.php?limit=19&title=Special%3AContributions&target=Matma+Rex&start=2018-03-06&end=2018-03-06 These changes should be included in the next localisation export (they usually happen daily).

German was the only language with this workaround.

I suppose my patch to not treat missing $2 as a problem can be reverted now too? Are message docs updated to say that the url is localised?

Verdy_p added a comment. Edited Mar 6 2018, 5:26 PM

Then you should remove completely this now misleading alert in the "/qqq" documentation:

"This message will be used at Commons:Template:Cc-by-sa-2.0. Please replace the untranslated URL with //creativecommons.org/licenses/by-sa/2.0/deed.de if possible. You can also find the translated name of the license at that location. If Creative Commons does not have a translation available, please point it to //creativecommons.org/licenses/by-sa/2.0/ Be careful not to point it to the country version; that is a different license. Each Creative Commons license can refer to a particular country, and if so, it will be referenced in a different message."

All the problems came initially FROM THIS misleading demand !

And also modify the description of "$2":

$2 - the URL "//creativecommons.org/licenses/by-sa/2.0/"

The URL in it is not necessarily this one if your new code now adapts it according to the language and so it should be changed to

$2 - the URL "//creativecommons.org/licenses/by-sa/2.0/" or one of its localized versions

(please add also a link in that documentation line where the localized URLs are maintained, because it will no longer be adapted by translators in Translatewiki.net !)

In summary, the "$2" MUST now be present (and not replaced, as had long been suggested, since this did not work as intended!)

@Verdy_p Done.

(The localised URLs are generated automatically, I updated the example.)

I suppose my patch to not treat missing $2 as a problem can be reverted now too?

I don't know what patch you mean, but possibly.

Thanks. Now, at least, the problem is recognized and correctly solved. (Of course you'll need to maintain the code so that the generated localized URLs are correct for the languages displayed.)

This will sometimes require updating this code when CC adds new localizations for their "deeds.*" pages, and using correct mappings (and fallbacks) from the languages of the local wiki using the wizard (the user's UI language, not necessarily the default content language of the wiki!) to the languages supported in the deed pages of the CC wiki.

However, I'm not sure how the client-side JavaScript supports fallbacks (note that the rules for fallbacks should always be BCP 47 compliant: they apply to the UI language, normally set by the user's preferences on the server, but if the code is running as client-side JavaScript, the preference is normally set by the user's preferences in their browser!).

Verdy_p added a comment. Edited Mar 6 2018, 8:11 PM

Note also that this new UploadWizard requires Javascript.

Clients without JavaScript enabled should still be able to upload images... using the server-side upload wizard (implemented on the MediaWiki server with the MediaWiki hook for its "Special:" page, and using MediaWiki parsers and libraries, which will also use the local server-side MediaWiki fallbacks and the user's local language preferences on that wiki server)!

In that case it will use the messages imported into "Mediawiki:*" pages from TranslateWiki.net, and these messages must also be compatible (i.e. it will be up to the server-side "UploadWizard" extension to adapt the message and replace "$2" with the relevant localized URL and the server-side fallbacks)!

@Verdy_p No, we do not need to handle fallbacks, the Creative Commons site handles them. We can link to https://creativecommons.org/licenses/by-sa/3.0/deed.whatever, and if they don't have a translation for 'whatever', it redirects to a supported language.
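A minimal sketch of generating such a link from the UI language code follows. The helper name `licenseDeedUrl` and its parameters are hypothetical; the actual logic lives in mw.UploadWizardLicenseInput and may differ. The sketch relies on the redirect behavior described above, so it does no fallback handling of its own.

```javascript
// Hypothetical helper for illustration; the real code in
// mw.UploadWizardLicenseInput may build the link differently.
function licenseDeedUrl(licensePath, languageCode) {
  // No fallback table here: per the comment above, the Creative Commons
  // site is assumed to redirect unknown language codes itself.
  return 'https://creativecommons.org/' + licensePath +
    '/deed.' + encodeURIComponent(languageCode);
}

console.log(licenseDeedUrl('licenses/by-sa/3.0', 'de'));
console.log(licenseDeedUrl('publicdomain/zero/1.0', 'fr'));
```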

I do not understand your second comment. UploadWizard has always only been available for users with JavaScript. These messages are only used in UploadWizard and nowhere else. The non-JavaScript upload form does not display these messages.

How do users without JavaScript upload files? There's a special page for that, and it should continue to offer the legacy upload form, which does not require any JavaScript but uses form submissions to the server and input fields for the file, the description, and a way to select the licence. All wikis have this form (or should continue to have it). The JavaScript version is just a helper which can be launched to automate various things or update the form dynamically.

This is why there were these translatable resources on TranslateWiki.net, and why the previous trick (commenting out the "$2" to replace it with the actual URL) was needed and why it worked.

This is what changed on Commons since January, because the legacy upload wizard was entirely server based and did not require Javascript. And this is still the form used on most wikis (including on Wikimedia wikis, notably Wikipedia if they've not disabled local uploads and asked people to upload all images on Commons, which is impossible for some images on English Wikipedia due to policy constraints, notably on licences, as Commons does not accept the "US Fair use" clause for corporate US logos accepted only on that wiki, but also for various other wikis whose files won't fit in Commons, including for the Foundation's wiki, or many wikis that want to host files under specific conditions not suitable for Commons !)

I don't know what patch you mean, but possibly.

9c16a27c0c6f3fd7b5bbaaf1e9fd84615985b4b5

I'll revert that and merge unless someone opposes.

Lofhi added a subscriber: Lofhi. Mar 7 2018, 11:16 PM

@Verdy_p I am sorry but I don't understand what you mean. We are not changing anything related to the non-JS upload page. If you notice any actual issues, please file another bug, otherwise please stop flooding this task.

@Nikerabbit Yeah, looks fine.

Pamputt added a subscriber: Pamputt. Mar 8 2018, 7:09 AM