parser tags such as <ref>, <poem>, <timeline> etc. cannot be localized
Open, Normal, Public

Description

Before the Berlin 2011 hackathon I posted a call in several Hebrew Wikipedia-related forums for the most annoying RTL issues. This was the most frequent complaint:

Parser tags such as <ref>, <poem>, <timeline> etc. cannot be localized. This is not a terrible issue for left-to-right languages, but it is a serious one for RTL languages. For example, <ref> is very often used with URLs, and these get jumbled up. It's hard to write them in the first place, and it's even harder to correct them after they are written. Replacing <ref> with something like <הערה> would make adding references to RTL wikis a lot easier. The same is true for the other tags of this kind.
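
One way to see why the Latin tag name behaves badly in RTL text is to look at the Unicode bidirectional class of each character. The following is a minimal Python sketch using only the standard library's unicodedata module; the helper name bidi_classes is made up for illustration and is not part of MediaWiki.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, Unicode bidi class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


latin_tag = "<ref>"
hebrew_tag = "<הערה>"

# Latin letters are strongly left-to-right (L), Hebrew letters are strongly
# right-to-left (R), and the angle brackets are neutral (ON).
print(bidi_classes(latin_tag))
print(bidi_classes(hebrew_tag))
```

The letters of ref are strongly left-to-right while the brackets are neutral, so inside a Hebrew paragraph the tag name forms its own left-to-right run and the brackets take their direction from the surrounding text. With <הערה> the tag name runs in the same direction as the text around it.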

I talked to Victor Vasiliev and Tim Starling about it in the Berlin Hackathon 2011 and they said that it's generally doable.


Version: unspecified
Severity: normal

Details

Reference
bz28980
bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz28980.
Amire80 created this task.May 14 2011, 6:03 PM

<הערה> would be a nightmare to understand for foreign visitors, though :)

I think we should set a convention for having the alternate hook names in the localization file, then the extension would add the alias in addition to its canonical form.
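
A rough sketch of that convention, written in Python rather than MediaWiki's PHP. TagRegistry, render_ref and HEBREW_TAG_ALIASES are hypothetical names for illustration only; the real parser hook API is different.

```python
class TagRegistry:
    """Toy stand-in for a parser's tag-hook table (hypothetical, not MediaWiki's API)."""

    def __init__(self):
        self._hooks = {}

    def register(self, canonical, callback, aliases=()):
        # Register the canonical name first, then every localized alias,
        # all pointing at the same callback.
        for name in (canonical, *aliases):
            key = name.lower()
            if key in self._hooks and self._hooks[key] is not callback:
                raise ValueError(f"tag name already taken: {name}")
            self._hooks[key] = callback

    def lookup(self, name):
        # Tag names are matched case-insensitively; unknown names give None.
        return self._hooks.get(name.lower())


def render_ref(content):
    return f"[footnote: {content}]"


# Hypothetical localization data: canonical tag name -> aliases for one language.
HEBREW_TAG_ALIASES = {"ref": ["הערה"]}

registry = TagRegistry()
registry.register("ref", render_ref, HEBREW_TAG_ALIASES.get("ref", ()))
```

With this shape the canonical <ref> keeps working everywhere, and the alias is only an additional name for the same hook.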

(In reply to comment #0)

I talked to Victor Vasiliev and Tim Starling about it in the Berlin Hackathon
2011 and they said that it's generally doable.

I believe it's been discussed (in relation to ref at least) as doable, but not really wanted: consistent use across sites keeps users from getting confused, and it also matters for third-party tools (e.g. API access and bots, to name a couple).

(In reply to comment #2)

I believe its been discussed (in relation to ref at least) as doable but not
really wanted to be done because of consistent use across sites so uses don't
get confused

Users of the Hebrew Wikipedia get confused with the current situation. Very confused. Try editing a Hebrew Wikipedia article and think what confuses you more - the RTL-ness or the possibility that the <ref> tag would be different.

Actually, people rarely want to edit projects in languages other than their own. Those that are bold enough to edit in Hebrew will also be clever enough to understand that <ref> is <הערה>.

(I am saying "Hebrew" just because it's my home wiki; it's a drop in the bucket compared to Arabic, Farsi and Urdu.)

third party tools (eg: api access/bots to name a couple)

They can use the same localization files.

I can't imagine that localizing these XML tags would be any more confusing than how [[ملف:Example.svg]] makes an image link on ar, or using {{#لو:...}} instead of {{#if:...}}, or even the pseudo-entity &رلم; for &rlm; (These examples are all from ar because it was handy).

Any bots that would be put out by this, are probably already put out by the existing localization.

*** Bug 28977 has been marked as a duplicate of this bug. ***

Maybe integrate AltTag into MW core?

(In reply to comment #7)

Maybe integrate AltTag into MW core?

AltTag does a good job as an extension, but if we did this in core, we'd probably want to do something more integrated than running a regex on the wikitext before parsing imho.

Certainly not what we want for core. There will also be some corner cases such as <source><ref></source> which would fail (and the AltTag regex would be a pain to review, too).
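
To illustrate one way such a corner case can go wrong, here is a small Python sketch of a regex pre-pass. It is not AltTag's actual code; ALIASES and naive_rewrite are hypothetical and only show the general problem with rewriting tag names before parsing.

```python
import re

# Hypothetical alias table: localized tag name -> canonical tag name.
ALIASES = {"הערה": "ref"}


def naive_rewrite(wikitext):
    """Replace localized tag names with canonical ones using a plain regex."""

    def swap(match):
        slash, name, rest = match.group(1), match.group(2), match.group(3)
        return f"<{slash}{ALIASES.get(name, name)}{rest}>"

    # Matches anything shaped like <name ...> or </name>, with no notion of
    # which enclosing tag the match sits in.
    return re.sub(r"<(/?)([^\s<>/]+)([^<>]*)>", swap, wikitext)


article = "טקסט<הערה>מקור</הערה>"
docs = "<source><הערה></source>"

print(naive_rewrite(article))  # the footnote is rewritten, as intended
print(naive_rewrite(docs))     # the literal sample inside <source> is rewritten too
```

The pre-pass cannot tell a real tag from text that is meant to be shown verbatim inside <source>, which is the kind of context only the parser itself knows about.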

(In reply to comment #8)

AltTag does a good job as an extension, but if we did this in core, we'd
probably want to do something more integrated then running a regex on the
wikitext before parsing imho.

(In reply to comment #9)

Certainly not what we want for core. There will be also some cornercases such
as <source><ref></source> which would fail (and AltTag regex would be a pain to
review, too).

So AltTag's regex style is out. Magic words can be localized, why not tags?

(In reply to comment #10)

So AltTag's regex style is out. Magic words can be localized, why not tags?

I believe they can be, but up to this point it's been an undesired function, so it hasn't been done in core.

Niklas and I discussed options. We think we must provide a solution for this. Arguments that it is confusing for users speaking other languages, or that robots and other tooling may have issues with it are less important than proper language support for users. We are of the opinion that we should implement this in regular MediaWiki messages, tagged optional, containing a comma separated list of aliases, with the English language parser tag as default value. Message keys will look like <prefix_chosen_by_developer>-<parsertag>. Assigned to Niklas.
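
A minimal sketch of how such a message could be read, in Python for illustration only. The message key cite-ref, the MESSAGES store and both function names are assumptions, not existing MediaWiki code.

```python
def parse_tag_aliases(message_value, canonical):
    """Split a comma-separated alias message; always keep the canonical name."""
    names = [canonical]
    for part in message_value.split(","):
        alias = part.strip()
        if alias and alias not in names:
            names.append(alias)
    return names


# Hypothetical message store; the key follows the "<prefix>-<parsertag>" shape.
MESSAGES = {
    "en": {"cite-ref": "ref"},
    "he": {"cite-ref": "ref, הערה"},
}


def aliases_for(lang, key, canonical):
    # Fall back to the English value when a language has no translation,
    # which matches the "optional message" idea.
    value = MESSAGES.get(lang, {}).get(key, MESSAGES["en"][key])
    return parse_tag_aliases(value, canonical)


print(aliases_for("he", "cite-ref", "ref"))
print(aliases_for("fr", "cite-ref", "ref"))
```

Keeping the canonical English name at the head of the list means an untranslated or empty message still leaves <ref> working.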

Victor, let us know if you have something prepared already, because it's been assigned to you for four months...

We are of the opinion that we should implement this
in regular MediaWiki messages, tagged optional, containing a comma separated
list of aliases, with the English language parser tag as default value.

This would put localized parser tag names in the hands of the users via the MediaWiki namespace. This seems like a bad idea. A user could cause widespread problems by editing such a message (bug 23287 comes to mind). Beyond that, changes wouldn't be immediately apparent, since every page would need to be re-parsed to see whether there is stuff that would now (or no longer) be a parser tag. That alone seems confusing enough to end users that we should not let them change these parser tag "aliases".

I think the magic word system would make much more sense for this.

We can skip Mediawiki namespace while enjoying the other benefits of doing it that way.

shealen.clare wrote:

This bug has not been touched in at least six months. With this in mind, I've been asked by the bugmeister to bump this bug's priority down from "High". Concerns can be addressed to mah@everybody.org.

Dalba added a comment.Mar 10 2013, 3:50 PM

(In reply to comment #0)

Parser tags such as <ref>, <poem>, <timeline> etc. cannot be localized.

Replacing <ref> with something like <הערה>
would make adding references to RTL wikis a lot easier. The same is true for
the other tags of this kind.

If it wasn't for bug 43685, I would have suggested to use a template instead (something like {{הערה|reftext|refname|refgroup}}). A neat and painless approach I believe.

(In reply to comment #16)

If it wasn't for bug 43685, I would have suggested to use a template instead
(something like {{הערה|reftext|refname|refgroup}}). A neat and painless
approach I believe.

Well, the Hebrew Wikipedia already uses such a template for most references, because it's just much easier to edit articles using it.

I'm coming back to this bug now for a somewhat surprising and upside-down reason: The VisualEditor. It has a nice button for inserting references, and the button, naturally, inserts <ref>. It can also edit existing footnotes only if they use <ref>.

Ideally it would be fine, because the VE is supposed to hide markup from editors, but as long as the VE is not completely stable and a lot of people continue editing in source, this will be a major problem. Localizing at least the <ref> tag can be a nice compromise.

Pingy.

This is being discussed in the Hebrew Wikipedia again in the same context: VisualEditor. VE inserts <ref> tags for footnotes, but the community strongly prefers a template, because using the explicit tag makes source editing hard. A bot is replacing all <ref> tags with that template. This essentially means that in the Hebrew Wikipedia VE cannot edit references as references, only as templates, which makes the VE editing experience confusing: an editor expects to edit a footnote, but gets a template editing dialog. Resolving this bug should make everybody happy.

I'd like to refocus this issue only on <ref> and little else.

In actuality, it's not impossible to have these tags localized. It's already done in LabeledSectionTransclusion. However, I don't actually think that localizing all tags to all languages is important.

In the coming age of VisualEditor localization of tags is supposed to become entirely irrelevant, because ideally they should be used only internally and not typed by editors.

Until that age comes, however, people will do a lot of manual adding, removing and editing of tags in wiki syntax mode. For tags like <poem> and <timeline> it's not actually disastrous and nobody really complains about them (the content of <timeline> is a pain in RTL, but that's an entirely different issue).

For <ref>, however, it's a nightmare in RTL languages. What I am imagining at this point is a way to get <ref> localized to RTL languages (and probably not even all of them) using a mechanism that works with wiki syntax and with VisualEditor and Parsoid, and to be ready to get rid of it in the far future when direct wiki syntax editing becomes unimportant.

I committed an experimental patch for this:
https://gerrit.wikimedia.org/r/#/c/163467/

Ebraminio moved this task from Backlog to Other on the RTL board.Aug 9 2015, 11:34 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 9 2015, 11:34 AM
Amire80 moved this task from Other to MediaWiki-core on the RTL board.Aug 30 2015, 7:29 AM
Huji added a subscriber: Huji.Dec 13 2016, 1:29 AM

@Amire80 I see you had abandoned this back in April. Any chance "some other time" could be December 2016?! ;)

Yes, I want to revive it very soon, at least for <ref>.

This may also need some work to work with Parsoid.

ssastry added a subscriber: ssastry.EditedJun 21 2017, 9:18 PM

This may also need some work to work with Parsoid.

Yes, it won't be a lot of work in Parsoid however. Some minimal code updates. The bigger thing to establish is how extensions specify their localized tags and how this information will be exposed in the API. Given that translations are likely going to be a one-time change (given code breakage risk), I don't think these should be part of system messages. These should be part of the extension config. While that requires wikis to go through devs for establishing translations, I don't think it is an unreasonable burden given the one-time nature of these changes.

Re: API, I propose we just extend the existing *.i18n.alias.php files, which are used for hard-coding special page aliases. I assume that a new variable added there would not automatically get translated by TranslateWiki (pinging @Nikerabbit) so all changes would go through code review, which we want.

Huji awarded a token.Jun 21 2017, 10:31 PM

You don't even need a lot of translations. This is a real problem only for RTL alphabets, where mixing Latin XML-like tags with RTL text is awful. A small number of translations to RTL languages is all that is needed. Translating <ref> to Hindi, Chinese, and Russian can be possible, but I would discourage actually doing it unless there is a good reason to do it.

i18n.alias and i18n.magic files are currently not up for translation in translatewiki.net. I would, however, encourage creating a new file, to keep it simple and obvious which extensions have tags to translate.

Tangential: In the long term my wish is to migrate also these files to JSON format and make them available for translation in translatewiki.net. But because the values are not plain strings, the translatewiki.net side is not currently possible.

Esanders added a comment.EditedJul 12 2017, 3:20 PM

Tangential: In the long term my wish is to migrate also these files to JSON format and make them available for translation in translatewiki.net.

Changing a tag translation or special page name would be a breaking change with respect to existing wiki content, so we shouldn't allow TW users to do it.

cscott added a subscriber: cscott.Jul 12 2017, 3:39 PM

On multilingual wikis (commons, meta, wikimania, etc) you should probably be aware that multiple tag aliases may be in use. Could be exciting! Probably select which one to use based on the page language... which works for everything except the <translate> extension, which can have chunks of text in different languages on the same page...

I think actually using templates to work around the monolingual-ness of <ref> (as hewiki does) is not a terrible idea. The HTML-ish tags (like <b>, <br>) are always going to be English -- why not just establish a convention that you can use {{<b}} and {{>b}} to localize these, with whatever name you like for "b"? Then you can just write multiple templates on multilingual wikis if necessary. (Although that pushes the responsibility onto VE to know the proper language for the article it is editing, and use the proper templates.)

cscott added a comment.EditedJul 12 2017, 3:44 PM

Note that LanguageConverter will also raise issues, since like <translate> it mixes content in multiple languages together on the same page. zhwiki can be expected to fight over whether <ref> gets localized in simplified or traditional characters, etc. A single localization isn't really going to work.

why not just establish a convention that you can use {{<b}} and {{>b}} to localize these, with whatever name you like for "b". Then you can just write multiple templates on multilingual wikis if necessary. (Although that pushes the responsibility onto VE to know the proper language for the article it is editing, and use the proper templates.)

To clarify, I'm using one of many bikeshedable syntaxes for heredoc templates here (T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments)), where Template:b (or :es:Template:negrita or whatever ) is just <b>{{{1}}}</b>.

Nikerabbit added a comment.EditedAug 6 2017, 3:37 PM

Note that LanguageConverter will also raise issues, since like <translate> it mixed content in multiple languages together on the same page.

The only mixing of two languages at a time is filling missing translations with content in the source language (and those pages are not editable anyway). It's against best practice to use non-canonical names for tags/special pages/etc. in text that is going to be translated, or in its translations, exactly because it is not necessarily known whether any non-canonical names will be available where those translations are displayed.

Changing a tag translation or special page name would be a breaking change with respect to existing wiki content, so we shouldn't allow TW users to do it.

How easy it is to break BC is a different question from how easy it is to provide translations. Naturally, we would take appropriate measures to avoid breaking BC, as we have done so far.

This long-standing request has become a more serious issue, as VE doesn't support ref templates (which use {{#tag:ref|ref content}}). Handling them in VE is quite challenging (especially if the template contains more than {{#tag}} and possibly more features), so a native localized <ref> looks like the easiest solution for the RTL/LTR mix caused by <ref>.

During Wikimania 2017 I talked with @Esanders and @Mooeypoo, and they suggested first making sure there is community interest in using a localized ref before engineering invests time in supporting it.
I opened a discussion at the hewiki WP:VP, and the community seems to have great interest in a localized version of the ref tag:
https://he.wikipedia.org/w/index.php?title=%D7%95%D7%99%D7%A7%D7%99%D7%A4%D7%93%D7%99%D7%94:%D7%9E%D7%96%D7%A0%D7%95%D7%9F&oldid=21366986#.D7.AA.D7.92.D7.99.D7.AA_.3C.D7.94.D7.A2.D7.A8.D7.94.3E

Huji added a comment.Aug 13 2017, 8:09 PM

I can assure you, this and T15673 are among the most wanted features in Persian wikis as well.

Liuxinyu970226 rescinded a token.
jeblad added a subscriber: jeblad.Sep 15 2017, 5:42 PM

It is a really bad idea to translate tag functions. Actually I believe it is a really bad idea to translate all such markup and programming constructs without a working translator for those constructs.

I have programmed in localized programming languages for several years, outside Mediawiki, and trying to reuse code really sucks big time.

If some community (like the Hebrew community) wants to shoot itself in the foot with an artillery cannon by reimplementing tag functions as templates, then let it do so, but do not tempt other communities to do the same. It is better to make clean, reusable code (and wikitext) that can be moved between projects.

IKhitron added a comment.EditedSep 15 2017, 5:49 PM

It is a really bad idea to translate tag functions.
If some community (like the Hebrew community) want to shoot themselves in the foot with a artillery cannon by reimplementing tag functions as templates, then let them do so, but do not tempt other communities to do the same. It is better to make clean, reusable code (and wikitext), that can be moved between projects.

If you think that tag translating is a bad idea, and reimplementing as templates is a bad idea, and it is impossible to use the regular tags in rtl texts inline, how do you suggest to manage references usage at all?

Liuxinyu970226 added a comment.EditedSep 17 2017, 2:45 PM

@IKhitron

If you think that tag translating is a bad idea, and reimplementing as templates is a bad idea, and it is impossible to use the regular tags in rtl texts inline, how do you suggest to manage references usage at all?

So you think that

<שפת סימני עריכה לתמליל - על>
<רֹאשׁ>
<כותרת>בָּר</כותרת>
</רֹאשׁ>
<גוּף>
<עמ '>מזון בָּר</עמ '>
</גוּף>
</שפת סימני עריכה לתמליל - על>

is rather better than

<html>
<head>
<title>בָּר</title>
</head>
<body>
<p>בר מזון</p>
</body>
</html>

don't you?!

This comment was removed by IKhitron.

@IKhitron
So you think that
`
<שפת סימני עריכה לתמליל - על>
<רֹאשׁ>

</רֹאשׁ>
</שפת סימני עריכה לתמליל - על>
`
is rather better than
`
<html>
a
</html>
`

Not at all. I think that

<הערה שם=אבג>טקסט1 טקסט2 טקסט3</הערה>

is much better than

<ref name=אבג>טקסט1 טקסט2 טקסט3</ref>

[Thinking out loud] Ideally core wiki syntax should be bidi-neutral: for example, links ([[]]) and templates ({{}}) are bidi-neutral syntax.
Maybe references are so widely used that we can think about how to give them bidi-neutral syntax as well, and maybe also include them as part of the core (they are already leaking into Parsoid, which has some dedicated code for them). But this is probably out of scope for this task.

It is a really bad idea to translate tag functions. Actually I believe it is a really bad idea to translate all such markup and programming constructs without a working translator for those constructs.

I'm not certain I agree; I think if we have reasonable standards and guidelines, translation could be fine.

IMO this is related to the Shadow namespaces/global modules stuff. The way I see it is (to use <ref> as an example):

  1. There is a global template called meta:Template:Ref which expands to <ref>{{{1}}}</ref> (or some similar straightforward thing).
  2. Local wikis can make their own templates which inherit/invoke the global template. For instance hewiki:Template:הערה expands to {{Ref|{{{1}}}}} (or some such)

Then we have a principled way of "translating" tag-based constructs, and analysis tools (editors, bots, etc) can automatically determine that {{הערה}} is equivalent to <ref> by just following the template expansion.

Further, this generalizes straight-forwardly to allow translating template arguments. So long as the "translation" template has a 1-to-1 mapping w/ all arguments to some template in the global namespace, we can automatically recognize it as a translation. Further, this completely decouples the mapping from the core code or parser. You can change the argument or tag translations or even possibly add aliases without writing any code; the semantics follows from the inheritance from the global functions.
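
A minimal sketch of what "following the template expansion" might look like for a tool, in Python. The TEMPLATES table, the wiki prefixes and the fallback to a global meta namespace are assumptions made for illustration; global templates do not exist in this form.

```python
import re

# Hypothetical template bodies, keyed by "wiki:Template name".
TEMPLATES = {
    "meta:Ref": "<ref>{{{1}}}</ref>",
    "hewiki:הערה": "{{Ref|{{{1}}}}}",
}

# A body that is exactly <tag>{{{1}}}</tag>.
TAG_WRAPPER = re.compile(r"^<(\w+)>\{\{\{1\}\}\}</\1>$")
# A body that is exactly {{Other|{{{1}}}}}.
TEMPLATE_CALL = re.compile(r"^\{\{([^|{}]+)\|\{\{\{1\}\}\}\}\}$")


def underlying_tag(wiki, name, max_depth=10):
    """Follow thin wrapper templates until one expands to <tag>{{{1}}}</tag>."""
    for _ in range(max_depth):  # the depth limit guards against template loops
        body = TEMPLATES.get(f"{wiki}:{name}") or TEMPLATES.get(f"meta:{name}")
        if body is None:
            return None
        tag = TAG_WRAPPER.match(body)
        if tag:
            return tag.group(1)
        call = TEMPLATE_CALL.match(body)
        if not call:
            return None  # not a plain 1-to-1 wrapper
        name = call.group(1).strip()
    return None


print(underlying_tag("hewiki", "הערה"))
```

This only recognizes single-argument wrappers. The same idea would need an argument-mapping step to cover named parameters such as name or group.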

Thank you, @cscott. One important extra point to your post: the tags are recognized as references in Visual Editor, the templates aren't.

Thank you, @cscott. One important extra point to your post: the tags are recognized as references in Visual Editor, the templates aren't.

This, in 2017, is indeed the most central point.

The problem is:

  • ref tags are very common
  • editing ref tags in source code in RTL languages is very hard
  • replacing ref tags with templates solves the above point for people who edit in source code, but complicates things in VE: templates are identified as templates and not as refs.

Whatever solves this problem for both wiki syntax and VE editing is good.

@IKhitron Right. But VE has a bunch of stuff awkwardly hard-coded at the moment; there's already a discussion about how to generalize this properly. One of the options is to teach VE about the global templates and inheritance stuff described above, so that VE wouldn't have to have the tags hard-coded. There are some other options, including adding a special semantic marker of some kind to the template that VE could look for. No one is satisfied with the way VE hard-codes things right now.

Good luck with that.

(A year ago I'd be excited about global templates. These days I still think that having them would be a step forward from the current madness. However, my thinking on this evolved: I'd be happier to convert a lot of templates, especially citations and infoboxes, to real features that can be installed and localized like extensions. In this light, converting tags or magic words to templates looks like a regression. I realize that this is quite far fetched thinking, of course.)

@Amire80 we have a bunch of different syntactic constructs, including tags, magic words, parser functions, and templates. The primary argument for solving this via templates is (in my opinion) consistency: we can solve the problem once and be done with it. I've got some other syntax improvements for templates which should make this even nicer.

I published an earlier discussion about semantic tagging in general at T176242: [EPIC] Representing / extracting wiki-specific application-level semantics. That underlies the "but VE only knows X" issue @IKhitron brought up.

It is a really bad idea to translate tag functions. Actually I believe it is a really bad idea to translate all such markup and programming constructs without a working translator for those constructs.

I have programmed in localized programming languages for several years, outside Mediawiki, and trying to reuse code really sucks big time.

If some community (like the Hebrew community) want to shoot themselves in the foot with a artillery cannon by reimplementing tag functions as templates, then let them do so, but do not tempt other communities to do the same. It is better to make clean, reusable code (and wikitext), that can be moved between projects.

In essence, it was already done. In the Hebrew Wikipedia, editors who edit in the wiki syntax, use a template instead of a <ref> tag, and when VE inserts the explicit <ref> tag, it is auto-replaced by a template with a bot. We know that it's suboptimal (for reasons explained in other comments here), but in practice it's less torture for wiki syntax editors, who will remain the majority for the foreseeable future, than using explicit <ref> tags mixed with right-to-left text. That's precisely why we're trying to think of a better solution.

Amire80 moved this task from Untriaged to RTL on the I18n board.Feb 28 2018, 12:35 PM
He7d3r added a subscriber: He7d3r.Mar 7 2018, 1:17 PM