Refactor MessagesXx.php magic words and order the magic words by some logical convention
Closed, ResolvedPublic

Description

MessagesXx.php magic words should be refactored so that the first alias of each magic word is the "most preferred" one. According to T53852#1735780, VisualEditor would then use MagicWord::getSynonym( 0 ) to find the preferred localised form, instead of hardcoding the least preferred version as it currently does.

Usually the local translation is the first option, and the last options are in English.

(MessagesEn.php)

Note to translators:
  Please include the English words as synonyms. 
  This allows people from other wikis to contribute more easily.
eranroz created this task. Oct 20 2015, 3:10 PM
eranroz updated the task description.
eranroz raised the priority of this task from to Needs Triage.
eranroz added a project: Language-Team.
eranroz added a subscriber: eranroz.
Restricted Application added a subscriber: Aklapper. Oct 20 2015, 3:10 PM
eranroz renamed this task from Refactor MessagesXX.php to Refactor MessagesXX.php magic words. Oct 20 2015, 3:10 PM
eranroz set Security to None.

We should post here some statistics on the usage of magic words in different wikis, based on dumps, as suggested in the discussions of T53852 (I'm currently running a script to collect such stats). However, keep in mind that software (and in particular Parsoid) can dictate its own preferences and bias the results.

TAM TAM TAM
And the results (all wikis which have dumps) are

Change 247914 had a related patch set uploaded (by Eranroz):
Sort img keywords by usage

https://gerrit.wikimedia.org/r/247914

eranroz renamed this task from Refactor MessagesXX.php magic words to Refactor MessagesXX.php magic words and order the magic words by some logical convention. Nov 3 2015, 6:19 PM
eranroz updated the task description.
Restricted Application added a subscriber: StudiesWorld. Nov 3 2015, 6:19 PM

So the suggested convention is:

  1. the first alias MUST BE the most preferred alias to the wikis in that lang (and software tools are to use the first alias when adding wikitext)
  2. other aliases MAY be ordered by the usage preference of that wiki.
  3. English aliases SHOULD BE mentioned in all languages. The English aliases MAY be the last aliases
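
As a rough illustration of rules 1 and 3, here is a minimal Python sketch. It assumes a magic word entry is mirrored as a list whose first element is the case-sensitivity flag and whose remaining elements are the aliases, as in the MessagesXx.php arrays. The helper names and the alias "lokal_mini" are hypothetical and not part of MediaWiki.

```python
from typing import List, Sequence, Union

# Hypothetical mirror of one MessagesXx.php entry:
# [case_sensitivity_flag, alias_1, alias_2, ...]
MagicWordEntry = Sequence[Union[int, str]]


def preferred_alias(entry: MagicWordEntry) -> str:
    """Rule 1: the first alias (index 1, after the flag) is the preferred one."""
    names = list(entry[1:])
    if not names:
        raise ValueError("magic word entry has no aliases")
    return str(names[0])


def missing_english_aliases(entry: MagicWordEntry,
                            english_aliases: Sequence[str]) -> List[str]:
    """Rule 3: report English aliases that the localised entry does not list."""
    names = set(entry[1:])
    return [alias for alias in english_aliases if alias not in names]


# "lokal_mini" is an invented placeholder for a localised alias.
local_entry = [1, "lokal_mini", "thumb"]
print(preferred_alias(local_entry))
print(missing_english_aliases(local_entry, ["thumbnail", "thumb"]))
```

Rule 2 (ordering the remaining aliases by usage) is optional, so the sketch does not check it.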
Nemo_bis changed the task status from Open to Stalled. Edited Dec 5 2015, 10:45 AM
Nemo_bis triaged this task as Lowest priority.
Nemo_bis edited projects, added MediaWiki-Internationalization; removed Language-Team.
Nemo_bis added a subscriber: Nemo_bis.

Please point out what's the expected benefit. I see no use case in task description, commit message or code review comments.

Parsoid generates undesired syntax, with English keywords instead of local keywords (see T53852). This is a problem because:

  1. it makes editing wiki syntax harder in RTL languages, due to the RTL/LTR mix
  2. in general, wiki syntax should prefer local keywords, as MediaWiki isn't English-centric.

Parsoid selects the LAST alias because MessagesEn.php lists thumbnail and thumb as aliases, in that order (so instead of fixing MessagesEn.php, Parsoid's solution is to break all other languages). I discussed how to solve T53852 with the Parsoid team, and we agreed that we should fix it in core first.

eranroz changed the task status from Stalled to Open. Dec 5 2015, 11:45 AM
eranroz raised the priority of this task from Lowest to Normal.

Reopening and assigning priority based on the dependent task T53852.

Nemo_bis updated the task description. Dec 5 2015, 11:57 AM
Nemo_bis updated the task description.

Thanks a lot, now things are clearer; I summarised in the task description.

If I understand correctly, the alternative to changing 168 lines would be to change a single line in MessagesEn.php:

'img_thumbnail' => array( 1, 'thumbnail', 'thumb' ),

and then change VisualEditor to use the first option instead of the last. Ok...
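
To make the trade-off concrete, here is a small Python sketch comparing "pick the last alias" with "pick the first alias". The English entry before the change is the one quoted above. The reordered English entry is an assumption about what the one-line change would look like, and the localised entry with "lokal_mini" is invented for illustration.

```python
# Entry as quoted above: [case flag, aliases...]
EN_BEFORE = [1, "thumbnail", "thumb"]
# Assumed result of the one-line change: preferred English alias moved first.
EN_AFTER = [1, "thumb", "thumbnail"]
# Hypothetical localised entry: local alias first, English aliases last.
LOCAL = [1, "lokal_mini", "thumbnail", "thumb"]


def pick_last(entry):
    """Current behaviour described in this task: use the last alias."""
    return entry[-1]


def pick_first(entry):
    """Proposed behaviour: use the first alias (index 0 is the case flag)."""
    return entry[1]


# Picking the last alias works for English but leaks English into other wikis.
print(pick_last(EN_BEFORE), pick_last(LOCAL))
# Picking the first alias gives the preferred form in both cases.
print(pick_first(EN_AFTER), pick_first(LOCAL))
```

Under these assumptions only the "first alias" rule yields the local keyword for the localised entry, which is why the English entry has to be reordered for the rule to hold everywhere.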

Yes, the alternative is just as fine.

eranroz claimed this task. Mar 31 2016, 6:45 PM
eranroz added a subscriber: Language-Team.

Comment summary: an attempt to summarize and consolidate some disparate conversations on this task, to make the corresponding change a little clearer:

Pieces:

  1. Some dialog deep in Patch Set 2 in https://gerrit.wikimedia.org/r/#/c/247914/ :

Quoted portion of @eranroz's original commit message:

Sorting all image keywords by usage according to the convention:

  1. Local first, English last
  2. Most common first, least common last

@siebrand - Oct 22 12:36 AM:

How was this determined?
The bug requests only to put the most preferred first, and more was done here.

@eranroz - Oct 22 1:10 AM:

  1. This was determined based on scanning dumps of all Wikipedias using simple python script based on a regular expression that looks like "\|(ALIAS_A|ALIAS_B)[|\]]" , where ALIAS_A etc is derived from MessagesXX.php
  2. The bug requests to put the most preferred first - DONE. Everything else can be considered as refactoring. (but if there is any concern with the ordering of the rest of the aliases based on usage please explain)
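
The dump scan described in that reply could look roughly like the Python sketch below. This is not the original script. The function names are hypothetical, and the pattern differs slightly from the quoted one: the trailing `[|\]]` is a lookahead here, so a closing pipe can also open the next keyword.

```python
import re
from collections import Counter
from typing import Iterable, List


def count_alias_usage(wikitext: str, aliases: Iterable[str]) -> Counter:
    """Count how often each alias appears as a |keyword| or |keyword] option."""
    alternation = "|".join(re.escape(alias) for alias in aliases)
    # Same shape as the quoted regex, with the terminator as a lookahead.
    pattern = re.compile(r"\|(" + alternation + r")(?=[|\]])")
    return Counter(match.group(1) for match in pattern.finditer(wikitext))


def sort_by_usage(aliases: List[str], counts: Counter) -> List[str]:
    """Order aliases most-used first; ties keep their original order."""
    return sorted(aliases, key=lambda alias: -counts[alias])


sample = ("[[File:A.jpg|thumb|left|caption]] "
          "[[File:B.jpg|thumbnail]] [[File:C.jpg|thumb]]")
usage = count_alias_usage(sample, ["thumbnail", "thumb"])
print(usage)
print(sort_by_usage(["thumbnail", "thumb"], usage))
```

A real run would stream each wiki's dump and take the alias list from the matching MessagesXx.php file. The sample string above is invented.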
  2. wikitech-l thread: Let's make parsoid i18n great again (started 2016-03-31) - as of this writing, the wikitech-l thread involves @eranroz, @Arlolra, @cscott, and @siebrand. On that thread, @cscott wrote:

In my view, by establishing a consistent semantics (first alias preferred) this empowers the local wikis. [...] And the new semantics/ordering should really be documented in the code or release notes somewhere, not just in the gerrit/git history.

Better documentation somewhere makes sense, but where should that be? Let's make sure we aren't creating documentation that's overly vulnerable to bit rot.

Restating some conversation/planning from IRC:

  1. We're currently waiting on @siebrand to re-review (since he has a C-1 outstanding), and @eranroz to rebase the patch and add in-code documentation *somewhere*. (I shared @RobLa's concerns, but I'm not going to condition a C+2 on finding the perfect place.)
  2. The plan is then to Be Bold and C+2 the patch to core.
  3. @Arlolra is writing a Parsoid patch which will use the mediawiki version # to switch between "last first" and "first first" behavior, so that we seamlessly change over when the core patch goes live.
  4. Wikis will undoubtedly complain (they have every time I've touched this code before). If/when they express different preferences, those preferences (prefer English, or prefer a different localized variant) will be expressed via a patch to core reordering the aliases there, maintaining the "preferred keyword is first" semantics.
  5. (Would be nice) According to @Nikerabbit when translation of these keywords is reenabled on translatewiki.net, "making the first version the preferred one should be okay."
  6. (Would be nice) Announce in tech-news that a change is coming, so wikis can check the keyword preferences for themselves.

Ideally, #2 happens before a Tuesday so it's included in this next week's deploy cycle, and we do a Parsoid deploy of #3 concurrently (cherry-picking if needed).
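
The version switch planned in #3 might be sketched as follows. This is an illustration in Python, not the actual Parsoid patch. The function names are hypothetical, and the cutover version is left as a parameter because this task does not state which MediaWiki version carries the core change.

```python
import re
from typing import Sequence, Tuple, Union


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor[.patch]' of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def pick_alias(entry: Sequence[Union[int, str]],
               mw_version: str,
               first_preferred_since: str) -> str:
    """Use the first alias on new-enough cores, the last alias otherwise.

    entry[0] is the case-sensitivity flag; aliases start at index 1.
    first_preferred_since is a placeholder for the (unstated) cutover version.
    """
    names = list(entry[1:])
    if not names:
        raise ValueError("magic word entry has no aliases")
    if parse_version(mw_version) >= parse_version(first_preferred_since):
        return str(names[0])   # "first first" behaviour
    return str(names[-1])      # legacy "last first" behaviour


# Version numbers and the alias "lokal_mini" are made up for the example.
entry = [1, "lokal_mini", "thumbnail", "thumb"]
print(pick_alias(entry, "1.9.0", "2.0"))
print(pick_alias(entry, "2.0.1-wmf.3", "2.0"))
```

Keying the behaviour on the reported core version is what lets the Parsoid deploy and the core deploy land in either order without emitting the wrong keyword.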

siebrand renamed this task from Refactor MessagesXX.php magic words and order the magic words by some logical convention to Refactor MessagesXx.php magic words and order the magic words by some logical convention. Apr 2 2016, 8:16 AM
siebrand updated the task description.
siebrand closed this task as Resolved. Apr 2 2016, 8:39 AM

Patch set merged.

Change 247914 merged by jenkins-bot:
Add prefered magic words first

https://gerrit.wikimedia.org/r/247914

Change 281701 had a related patch set uploaded (by Cscott):
Improve comment to localizers in MessagesEn.php

https://gerrit.wikimedia.org/r/281701

Change 281701 merged by jenkins-bot:
Improve comment to localizers in MessagesEn.php

https://gerrit.wikimedia.org/r/281701

Ricordisamoa awarded a token.
Ricordisamoa added a subscriber: Ricordisamoa.
Ricordisamoa removed a subscriber: gerritbot.