Many raw HTML messages in WikiEditor cannot be edited in translatewiki
Closed, ResolvedPublic

Description

The WikiEditor extension is among the most important ones for Wikimedia sites' contributors. Its localization, however, is hindered by the fact that it has several dozen raw HTML messages. To see a list, go to https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikiEditor/+/refs/heads/master/extension.json and search for RawHtmlMessages.

These messages cannot be translated through the usual translatewiki interface for security reasons, and require administrator intervention.

It would be nice to get rid of this problem and allow regular translatewiki contributors to translate them.

Event Timeline

We could start by marking a bunch of messages as not raw HTML (and, of course, properly escaping them when displaying). The following 61 messages are marked as raw HTML, but have no HTML markup in English that would be forbidden in wikitext:

Message | Content
wikieditor-toolbar-help-heading-description | Description
wikieditor-toolbar-help-heading-syntax | What you type
wikieditor-toolbar-help-heading-result | What you get
wikieditor-toolbar-help-page-format | Formatting
wikieditor-toolbar-help-page-link | Links
wikieditor-toolbar-help-page-heading | Headings
wikieditor-toolbar-help-page-list | Lists
wikieditor-toolbar-help-page-file | Files
wikieditor-toolbar-help-page-reference | References
wikieditor-toolbar-help-page-discussion | Discussion
wikieditor-toolbar-help-content-italic-description | Italic
wikieditor-toolbar-help-content-italic-syntax | ''Italic text''
wikieditor-toolbar-help-content-italic-result | <em>Italic text</em>
wikieditor-toolbar-help-content-bold-description | Bold
wikieditor-toolbar-help-content-bold-syntax | '''Bold text'''
wikieditor-toolbar-help-content-bold-result | <strong>Bold text</strong>
wikieditor-toolbar-help-content-bolditalic-description | Bold &amp; italic
wikieditor-toolbar-help-content-bolditalic-syntax | '''''Bold &amp; italic text'''''
wikieditor-toolbar-help-content-bolditalic-result | <strong><em>Bold &amp; italic text</em></strong>
wikieditor-toolbar-help-content-ilink-description | Internal link
wikieditor-toolbar-help-content-ilink-syntax | [[Page title]]<br />[[Page title|Link label]]
wikieditor-toolbar-help-content-xlink-description | External link
wikieditor-toolbar-help-content-xlink-syntax | [http://www.example.org Link label]<br />[http://www.example.org]<br />http://www.example.org
wikieditor-toolbar-help-content-heading2-description | 2nd level heading
wikieditor-toolbar-help-content-heading2-syntax | == Heading text ==
wikieditor-toolbar-help-content-heading2-result | <h2>Heading text</h2>
wikieditor-toolbar-help-content-heading3-description | 3rd level heading
wikieditor-toolbar-help-content-heading3-syntax | === Heading text ===
wikieditor-toolbar-help-content-heading3-result | <h3>Heading text</h3>
wikieditor-toolbar-help-content-heading4-description | 4th level heading
wikieditor-toolbar-help-content-heading4-syntax | ==== Heading text ====
wikieditor-toolbar-help-content-heading4-result | <h4>Heading text</h4>
wikieditor-toolbar-help-content-heading5-description | 5th level heading
wikieditor-toolbar-help-content-heading5-syntax | ===== Heading text =====
wikieditor-toolbar-help-content-heading5-result | <h5>Heading text</h5>
wikieditor-toolbar-help-content-ulist-description | Bulleted list
wikieditor-toolbar-help-content-ulist-syntax | * List item<br />* List item
wikieditor-toolbar-help-content-ulist-result | <ul><li>List item</li><li>List item</li></ul>
wikieditor-toolbar-help-content-olist-description | Numbered list
wikieditor-toolbar-help-content-olist-syntax | # List item<br /># List item
wikieditor-toolbar-help-content-olist-result | <ol><li>List item</li><li>List item</li></ol>
wikieditor-toolbar-help-content-file-description | Embedded file
wikieditor-toolbar-help-content-file-syntax | [[$1:Example.png|$2|$3]]
wikieditor-toolbar-help-content-file-caption | Caption text
wikieditor-toolbar-help-content-reference-description | Reference
wikieditor-toolbar-help-content-reference-syntax | Page text.&lt;ref&gt;[http://www.example.org Link text], additional text.&lt;/ref&gt;
wikieditor-toolbar-help-content-named-reference-description | Named reference
wikieditor-toolbar-help-content-named-reference-syntax | Page text.&lt;ref name="test"&gt;[http://www.example.org Link text]&lt;/ref&gt;
wikieditor-toolbar-help-content-rereference-description | Additional use of same reference
wikieditor-toolbar-help-content-rereference-syntax | &lt;ref name="test" /&gt;
wikieditor-toolbar-help-content-showreferences-description | Display references
wikieditor-toolbar-help-content-showreferences-syntax | &lt;references /&gt;
wikieditor-toolbar-help-content-signaturetimestamp-description | Signature with timestamp
wikieditor-toolbar-help-content-signaturetimestamp-syntax | --~~~~
wikieditor-toolbar-help-content-signature-description | Signature
wikieditor-toolbar-help-content-signature-syntax | ~~~
wikieditor-toolbar-help-content-indent-description | Indent
wikieditor-toolbar-help-content-indent-syntax | Normal text<br />:Indented text<br />::Indented text
wikieditor-toolbar-help-content-indent-result | Normal text<dl><dd>Indented text<dl><dd>Indented text</dd></dl></dd></dl>

Only the following 8 messages do have markup that requires raw HTML:

Message | Content
wikieditor-toolbar-help-content-ilink-result | <a href='#'>Page title</a><br /><a href='#'>Link label</a>
wikieditor-toolbar-help-content-xlink-result | <a href='#' class='external'>Link label</a><br /><a href='#' class='external autonumber'>[1]</a><br /><a href='#' class='external'>http://www.example.org</a>
wikieditor-toolbar-help-content-reference-result | Page text.<sup><a href='#'>[1]</a></sup>
wikieditor-toolbar-help-content-named-reference-result | Page text.<sup><a href='#'>[2]</a></sup>
wikieditor-toolbar-help-content-rereference-result | Page text.<sup><a href='#'>[2]</a></sup>
wikieditor-toolbar-help-content-showreferences-result | <ol class='references'><li id='cite_note-test-0'><b><a title='' href='#'>^</a></b> <a rel='nofollow' title='http://www.example.org' class='external text' href='#'>Link text</a>, additional text.</li><li id='cite_note-test-1'><b><a title='' href='#'>^</a></b> <a rel='nofollow' title='http://www.example.org' class='external text' href='#'>Link text</a></li></ol>
wikieditor-toolbar-help-content-signaturetimestamp-result | --<a href='#' title='$1:Username'>Username</a> (<a href='#' title='$2:Username'>talk</a>) 15:54, 10 June 2009 (UTC)
wikieditor-toolbar-help-content-signature-result | <a href='#' title='$1:Username'>Username</a> (<a href='#' title='$2:Username'>talk</a>)
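
The split between the two lists above can be approximated mechanically. The following is a minimal Python sketch, not the check MediaWiki itself performs: the allowlist is a small illustrative subset of the HTML tags that wikitext accepts (an assumption; the real list is longer), and `needs_raw_html` is a hypothetical helper name.

```python
import re

# Assumption: an illustrative subset of the HTML tags that wikitext accepts.
# The real allowlist is maintained by MediaWiki and is longer than this.
WIKITEXT_ALLOWED_TAGS = {
    "b", "br", "dd", "dl", "dt", "em", "h2", "h3", "h4", "h5",
    "i", "li", "ol", "strong", "sub", "sup", "ul",
}

# Matches the name of every opening or closing tag, e.g. "a" in "<a href='#'>".
TAG_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)")


def needs_raw_html(message: str) -> bool:
    """Return True if the message uses at least one tag outside the allowlist."""
    return any(
        name.lower() not in WIKITEXT_ALLOWED_TAGS
        for name in TAG_NAME.findall(message)
    )
```

With this allowlist, `<em>Italic text</em>` and the escaped `&lt;references /&gt;` pass, while any message containing an `<a>` element is flagged as still needing raw HTML.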

Even these 8 messages can be reformatted using "tvars", or a template-like (or parser-function-like) syntax, to hide the raw HTML while still allowing the rest to be translated. This is possible because we can use {{#tag: tagname | attributes = values | ''content''}}, and most of it can be "hidden" in short tvars (like "$1") that won't confuse translators:
they will see things like "translatable part 1 {{$1| fully translatable part 2 }} translatable part 3" and will not be able to change tags or attributes such as
"{{#tag:a|href=#|title=something:Username|Username}}", which are partly masked by tvars as "{{$1:Username|Username}}".
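
To make the masking concrete, here is a minimal Python sketch of how such a tvar could be expanded after translation. The helper name `expand_tvars` is hypothetical, and the strings are the ones quoted in this comment; whether that `#tag` form is accepted by the parser as written is not verified here.

```python
import re


def expand_tvars(template: str, values: list[str]) -> str:
    """Replace $1, $2, ... in template with the matching entry of values."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        # Leave unknown placeholders untouched rather than failing.
        return values[index] if 0 <= index < len(values) else match.group(0)

    return re.sub(r"\$(\d+)", substitute, template)


# The translator only ever sees and edits the masked form:
masked = "{{$1:Username|Username}}"
# The protected part is supplied by the software, not by the translator:
expanded = expand_tvars(masked, ["#tag:a|href=#|title=something"])
```

Here `expanded` is the unmasked string from the comment, `{{#tag:a|href=#|title=something:Username|Username}}`.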

Actually we could unify the ...-syntax and ...-result messages: the HTML

Page text.&lt;ref&gt;[http://www.example.org Link text], additional text.&lt;/ref&gt;

is the result of parsing the wikitext

<nowiki>Page text.<ref>[http://www.example.org Link text], additional text.</ref></nowiki>

while

Page text.<sup><a href='#'>[1]</a></sup>

is the result of replacing the href attribute after parsing the wikitext

Page text.<ref>[http://www.example.org Link text], additional text.</ref>

We could just store and localize

Page text.<ref>[http://www.example.org Link text], additional text.</ref>

and parse it twice: once wrapped in nowiki, and once as-is, replacing the href attribute in the result. In addition to completely avoiding raw HTML, this would also mean half as much work for translators and a guarantee that the example wikitext and the example result really match.
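
A rough Python sketch of the two derivations, under stated assumptions: the actual wikitext parsing is left to MediaWiki and is not reproduced, so `html.escape` stands in for the nowiki'd rendering, and `neutralize_links` takes already-parsed HTML as input. Both function names are hypothetical.

```python
import html
import re


def as_syntax_example(wikitext: str) -> str:
    """Escape the wikitext so it displays literally, like the -syntax messages."""
    return html.escape(wikitext, quote=False)


# Matches href='...' or href="...", whatever the target is.
HREF = re.compile(r"""href=(['"]).*?\1""")


def neutralize_links(parsed_html: str) -> str:
    """Point every link at '#', like the -result messages."""
    return HREF.sub("href='#'", parsed_html)


source = "Page text.<ref>[http://www.example.org Link text], additional text.</ref>"
syntax = as_syntax_example(source)
```

Only `source` would need to be stored and translated; the syntax example and the result example are both derived from it, so they cannot drift apart.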

For the signature, we also need to localize the string Username (it doesn’t appear in the source text, but appears once as link text and twice as link title in both results). The timestamp should be auto-generated, which also means that it will be in content language, not UI language, matching the actual signatures’ behavior.
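
As an illustration of auto-generating the timestamp, here is a small Python sketch that reproduces the format of the English example message ("15:54, 10 June 2009 (UTC)"). It is an assumption-laden stand-in: it only knows English month names and UTC, whereas the real timestamp would come from MediaWiki in the wiki's content language. `signature_timestamp` is a hypothetical name.

```python
from datetime import datetime, timezone

# English month names are spelled out to avoid depending on the process locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def signature_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime like the English signature example."""
    utc = moment.astimezone(timezone.utc)
    return f"{utc:%H:%M}, {utc.day} {MONTHS[utc.month - 1]} {utc.year} (UTC)"


example = signature_timestamp(datetime(2009, 6, 10, 15, 54, tzinfo=timezone.utc))
```

Generating the value at display time means translators no longer have to translate a hard-coded date.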

Note that user names HAVE to be Bidi-isolated. Many users have names in various scripts; remember that SUL is now in effect in Wikimedia, so users with Arabic names appear on English Wikipedia alongside names using Latin, and everything can get mixed up if these names also embed punctuation and other characters with "weak" directions (notably in leading positions, while those in trailing positions will affect the content after the user name).
The same would apply to translatable titles in the internal text of a link, or other attributes like captions of images. Finally, some languages require *additional* markup not present in the English source: for example "sup" elements for abbreviations of ordinal numbers, or even more for complex scripts like "hiero", which is fully unusable as plain text without this markup, or others that require specific layouts.
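
For reference, isolating a user name can be sketched in two ways: with the HTML "bdi" element, or in plain text with the Unicode isolate controls U+2068 (FIRST STRONG ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE). The Python helpers below are hypothetical illustrations and say nothing about where in MediaWiki such isolation should happen.

```python
import html

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_html(user_name: str) -> str:
    """Wrap an escaped user name in a bdi element for HTML output."""
    return f"<bdi>{html.escape(user_name)}</bdi>"


def isolate_plain(user_name: str) -> str:
    """Wrap a user name in Unicode isolate controls for plain-text output."""
    return f"{FSI}{user_name}{PDI}"
```

Either form keeps leading or trailing weak-direction characters in the name from reordering the surrounding text.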

Consider Traditional Mongolian: normally it uses a vertical layout. But when it is inserted inside a document using a horizontal layout, its conversion to horizontal behaves differently in an RTL context and in an LTR context, because the Mongolian glyphs and direction will be rotated upside down (by 180 degrees).

When the script is sinographic, or Hangul, Kana or Bopomofo, this is different: glyphs are normally not rotated; only the baseline changes (and the metric lines differ).

Now insert Latin/Greek/Cyrillic or Arabic in a Mongolian vertical text: this time Latin and Arabic will be rotated differently, or the Latin/Greek/Cyrillic script may not rotate its letters but align them vertically, as in crosswords...

And in all these cases, some punctuation marks and symbols are substituted, but not all.

These complexities of vertical presentation are still not solved in the existing HTML/CSS specs, nor in the Unicode specs for BiDi, which just assume a single horizontal direction, do not even consider the case of boustrophedon or scripts with variable directions like Old Greek and Late Phoenician derivatives, and then just assume that all vertical scripts behave like LTR, i.e. like Sinograms/Kana/Hangul.

So the assumption that a translation should only contain "plain text", and that translators must not insert any additional markup (even if it is required), brings complications that designers who were not trained in multilingual issues and layout constraints often overlook. This goes all the way up to the design of HTML itself (including HTML5), with its old concept of "inline contents" and "block contents", and of CSS (left vs. right, when the actual distinction would be top vs. bottom, changing with context to start vs. end; or the supposed existence of a single vertical or horizontal alignment of baselines).

Translators do the best they can, but sometimes their requests are ignored because some monolingual technicians asked them not to use any markup, or not to alter the existing one, which was tested only for English or Chinese. Users of Arabic/Hebrew/Divehi/N'ko know what this means for them, as do students of paleographic languages: they simply don't use HTML and still publish using 2D-aware document formats, later rendered as PDFs; for communication they use their own proprietary solutions, or what we illegitimately call "hacks", or have to abandon their scripts and talk using other scripts or languages.

I don’t think I assumed anywhere that the messages contain plain text only–or do you mean nowiki’ing the result? But the nowiki’d text should represent wikitext input, and you can’t use any markup to control how wikitext is displayed in the <textarea>, either. Translators are free to include any markup in the snippets, and the nowiki’d text will show editors how to achieve that markup in wikitext.

Also, BIDI isolation should not be done by WikiEditor—it should be added by the parser in core (rMW includes/parser/Parser.php:4624-4691 (at e311106fdec7)); but that’s definitely out of scope for this task.

Your reference in Parser.php just speaks about embedding signatures (posting messages in threads). But user names are used as well outside of talks and the basic wiki syntax for such links is just a standard wikilink, whose text content can be anything, and link target can be to different pages or subpages in two namespaces, or going cross-wikis, possibly to other websites using standard external links (just like titles of books or articles or any webpage, possibly also with additional markup needed, and sometimes using embedded images).

I don't know how the parser could automatically guess and generate the markup, or whether it would just corrupt the content or the behavior by generating explicit "bdi" isolation elements to contain and isolate them. Finally, not all user names generate links; they may just be cited, just like artwork titles (most wikis tend to have a policy of NOT linking every occurrence of the same term in the same page). The same is true for lists of native language names or lists of page names (in a multilingual wiki): we may sort them, but then have to cope with bidi isolation. Thankfully, the former problems with "bdi" elements, which did not exist in HTML4 and initially were not self-embeddable, are solved today in MediaWiki. Without "bdi" we had to choose between inline and block elements, while "bdi" uses a suitable "mixed content" model that makes it more usable today; and we have deprecated the use of "bdo" and the related Bidi override controls in plain text, which are only suitable for some inline contents and hard to manage in context.

Using "bdi" elements is not complex, even in the wikitext of plain articles, but most of its use is within templates that generate a suitable layout in the proper context. I don't see how the MediaWiki parser could be made much smarter: by guesses, with many false positives that will be hard to track and fix? Or by new "magic keywords" and custom syntaxes that will cause new conflicting interpretations or unexpected changes in rendering?

There are more important fixes to add in MediaWiki (notably better support for tables, including rowgroups, colgroups and cols) and a way to more easily integrate flexible layouts (that can adapt to user demands, device capabilities or accessibility requirements, and also allow easier reuse of content, and improve navigation through new axes of traversal and improved semantics of the content). But the next major change will be the integration of Wikifunctions, and the creation of really multilingual sites.

For all this to happen, we need a better separation of content and styling (and better performance for TemplateStyles while preserving the security of CSS, of course by sanitizing it). And all that with lower maintenance costs (not requiring anyone to manually re-edit and review zillions of places in zillions of pages: this is where automation, by MediaWiki content parsers or scheduled background tasks, will be really helpful).

Feel free to create a new ticket about better support for multidirectional content, including user names, and subscribe me there, I’m happy to discuss these issues. But only at the appropriate place, which is not here: this task has a well-defined scope, namely allowing all translators to edit all messages of WikiEditor. Discussing how to get rid of messages that currently require raw HTML is in scope, discussing how to handle multidirectional content in general is clearly not.

We could start by marking a bunch of messages as not raw HTML (and, of course, properly escaping them when displaying). The following 61 messages are marked as raw HTML, but have no HTML markup in English that would be forbidden in wikitext:

Is there any update on when this will be done? Rn this issue is preventing me from translating these parts of MediaWiki Core into Northern Sámi, which will have a massive negative impact on a deadline if no action is taken in the next couple of months.

Yupik triaged this task as High priority. Dec 29 2022, 6:21 PM

As far as I know, there isn't, but Niklas, Raimond, Abijeet and other relevant people can correct me if I'm wrong.

As a workaround, please post the translations to https://translatewiki.net/wiki/Support and I'll submit them. Sorry about the inconvenience :(

(And happy new year!)

Will do. I will wait until I have translated all of them, instead of one at a time :) Perhaps by that time it will even be fixed since after Northern Saami, I still have Skolt and Inari to go.

(Happy New Year! :))

Change 934416 had a related patch set uploaded (by Jon Harald Søby; author: Jon Harald Søby):

[mediawiki/extensions/WikiEditor@master] Stop using autoMsg and use mw.messages directly instead

https://gerrit.wikimedia.org/r/934416

Change 934416 merged by jenkins-bot:

[mediawiki/extensions/WikiEditor@master] Stop using autoMsg and use mw.messages directly instead

https://gerrit.wikimedia.org/r/934416

matmarex assigned this task to jhsoby.

I think this patch will resolve the issue, and all that's needed now is for Translatewiki to update the installed version of WikiEditor. They usually update regularly, so this should happen soon.

Indeed. Thank you very much for the review!