
Image options should be serialized in the wiki's language in rtl locales - localization i18n
Closed, Resolved, Public

Description

When VisualEditor inserts an image, the thumbnail keywords are written in English: "File:", "thumb", "right" (but see Bug 51851).

This should not, theoretically, be a problem, because the VisualEditor is supposed to make these keywords unimportant and hidden from the end-user. However, while the source editor is still being widely used, this is a problem, especially for right to left languages: it is very hard to edit these English keywords when they are mixed with right-to-left text.

These keywords should be inserted in the language of the wiki.


Version: unspecified
Severity: normal

Details

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:01 AM
bzimport set Reference to bz51852.

In most ltr wikis the English version is actually the most commonly used today, which is why we went with serializing new content with the English defaults.

This is not ideal for rtl languages though, as this leads to a mix of ltr and rtl text.

If you can provide us with a list of rtl language codes and check for each if a common offset in the option aliases is a good default, then we can add a list of rtl languages to the serializer and use the localized defaults for those.

We don't actually need to hard-code the list of rtl wikis, as that information is available in the site info returned by the API. It would still be good to check for each current rtl wiki if the first option alias is always a good one.
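A minimal sketch of that idea, assuming the siteinfo result exposes an "rtl" key in its general block for right-to-left wikis and lists each magic word with its aliases in preferred order. The function names and the sample data below are hypothetical and only illustrate the selection logic; they are not Parsoid's actual code, and the sample is not copied from a live wiki.

```python
# Illustrative data only: shaped like a siteinfo result, not copied from a live wiki.
SAMPLE_SITEINFO = {
    "general": {"lang": "he", "rtl": ""},
    "magicwords": [
        {"name": "img_thumbnail", "aliases": ["ממוזער", "thumb", "thumbnail"]},
        {"name": "img_right", "aliases": ["ימין", "right"]},
    ],
}


def is_rtl(siteinfo):
    """Return True if the wiki is right-to-left.

    Assumption: RTL wikis carry an "rtl" key in the general block
    (an empty string or True, depending on the response format).
    """
    general = siteinfo.get("general", {})
    return "rtl" in general and general["rtl"] is not False


def pick_keyword(siteinfo, magic_word, english_default):
    """Pick the keyword to serialize for one image option.

    RTL wikis get the first localized alias; everything else keeps
    the English default, as described in the comments above.
    """
    if not is_rtl(siteinfo):
        return english_default
    for entry in siteinfo.get("magicwords", []):
        if entry.get("name") == magic_word and entry.get("aliases"):
            return entry["aliases"][0]
    # No localized alias known: fall back to the English keyword.
    return english_default


if __name__ == "__main__":
    print(pick_keyword(SAMPLE_SITEINFO, "img_thumbnail", "thumb"))
```

With this shape, no list of RTL language codes has to be maintained by hand; whether the first alias is the right choice still has to be checked per wiki.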

I went over the RTL languages and sampled actual Wikipedia articles, and it seems that the first alias is always the most frequently used.

Is there anything I can do to help move this forward? It's one of the most requested features.

I'm pretty sure our magic words support already has the appropriate localized keyword alias. We just aren't using it (yet).

Change 103082 had a related patch set uploaded by Cscott:
Edited image attributes should override data-parsoid value.

https://gerrit.wikimedia.org/r/103082

The above was merged in January, but this bug isn't fixed AFAICT. Also, we should probably use the local wikitext over the English version in all languages, not just RTL ones.

Yes, that is correct. The patch was merged and then the language-specific parts were reverted because the way it selected the aliases was fragile in LTR languages. This is still on my to-do list for a proper fix; IIRC the blocking issue is we need to export a simple RTL/LTR flag in siteinfo.

(In reply to C. Scott Ananian from comment #8)

> Yes, that is correct. The patch was merged and then the language-specific
> parts were reverted because the way it selected the aliases was fragile in
> LTR languages. This is still on my to-do list for a proper fix; IIRC the
> blocking issue is we need to export a simple RTL/LTR flag in siteinfo.

Would it be easier if we just had Parsoid default to the (first?) localised string over the generic wikitext? This presumably doesn't need any MW API changes…

Change 244254 had a related patch set uploaded (by Eranroz):
Use local keyword for image

https://gerrit.wikimedia.org/r/244254

A note specific to English: this fix will make Parsoid write "thumbnail" instead of "thumb". This side effect is the correct behavior, because aliases are defined in order of preference, and "thumbnail" is the syntax that the current WikiEditor image dialog inserts. If you prefer the shorter "thumb" over "thumbnail" (personally I do too), you should swap the aliases for English (MessagesEn.php in core, and enwiki.json in baseconfig), but this should be a different task.

It's not just English, it's also dewiki IIRC.

To unblock this issue, you need to survey all the wikis (not just enwiki and hewiki) and describe how this change will affect them, then get confirmation from the language team that the change is actually "better" for all these languages.

We were burned before by changing this prematurely; before we're going to make another change we need to be sure we are actually doing the right thing.

@eranroz's code in wikieditor: https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikiEditor/8fa9506e20ce5216240531686fa0bd70fdb2a6d8/WikiEditor.hooks.php#L374

I'm proposing a simple process:

  • Make a table with the various options whose serialization will be changed with this patch, both before and after.
  • Send this in an email to the language team for approval.
    • Alternatively, you could do a grep over the most recent dump for that wiki to count frequency of use for each alias, and verify that the term which would be used by this patch matches the most frequent alias.
  • For any languages where this patch does *not* match that wiki's preferred term, file an issue as a blocker for this bug, and resolve it.
    • Either add special cases to this patch, or export some general siteinfo property (a boolean to "prefer first alias" or "prefer last alias"), or work with the community to swap the aliases in MediaWiki (MessagesEn.php in core; the actual Wikipedias may then need to be checked for customizations that override this).
  • After this is done, everyone will have objectively agreed that the terms used by this patch are preferable, and this patch can be merged.
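The frequency check in the "Alternatively" step could be sketched roughly as below. This is a simplified illustration, not dumpgrepper or any existing tool: the function names are hypothetical, and the pattern counts any pipe-delimited token that equals one of the aliases, so it would also count matching template parameters. A real survey would need to restrict the search to image links.

```python
import re
from collections import Counter


def count_alias_usage(wikitext, aliases):
    """Count how often each alias appears as a pipe-delimited option.

    A token counts when it sits between "|" and either the next "|"
    or the closing "]]" of a link.
    """
    alternatives = "|".join(re.escape(alias) for alias in aliases)
    pattern = re.compile(r"\|\s*(" + alternatives + r")\s*(?=\||\]\])")
    return Counter(match.group(1) for match in pattern.finditer(wikitext))


def first_alias_is_most_used(wikitext, aliases):
    """Check that the alias the patch would emit (the first one)
    is also the most frequent one in the sampled text."""
    counts = count_alias_usage(wikitext, aliases)
    if not counts:
        return False
    return counts.most_common(1)[0][0] == aliases[0]


if __name__ == "__main__":
    sample = (
        "[[File:A.jpg|thumb|Caption]] "
        "[[File:B.jpg|thumbnail|right|Other]] "
        "[[File:C.jpg|thumb]]"
    )
    print(count_alias_usage(sample, ["thumbnail", "thumb"]))
    print(first_alias_is_most_used(sample, ["thumbnail", "thumb"]))
```

In the sample, "thumb" is used more often than "thumbnail", so a wiki whose alias list starts with "thumbnail" would be flagged as a case where the first alias does not match actual usage.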

A good compromise to get this moving would be to do this analysis for the top N wikis (N = 10?) -- the expectation is that only a few wikis will need either their messages tweaked or special case exceptions added for them in the code and we can get this going from there.

General Q: Can the same patch also resolve T104057, a similar issue in ContentTranslation?

> • Alternatively, you could do a grep over the most recent dump for that wiki to count frequency of use for each alias, and verify that the term which would be used by this patch matches the most frequent alias.

Sounds more practical and closer to reality.

Parsoid developers, don't you have some kind of a tool that queries lots of dumps quickly? I remember hearing something like this, but maybe I am just imagining.

> A good compromise to get this moving would be to do this analysis for the top N wikis (N = 10?)

I'd guess that it's more like 50 or so.

> General Q: Can the same patch also resolve T104057, a similar issue in ContentTranslation?

> • Alternatively, you could do a grep over the most recent dump for that wiki to count frequency of use for each alias, and verify that the term which would be used by this patch matches the most frequent alias.
>
> Sounds more practical and closer to reality.
>
> Parsoid developers, don't you have some kind of a tool that queries lots of dumps quickly? I remember hearing something like this, but maybe I am just imagining.

This used to be part of the parsoid repository but is now its own repo @ https://github.com/wikimedia/dumpgrepper

Change 280792 had a related patch set uploaded (by Arlolra):
T53852: Serialize localized image options for rtl languages

https://gerrit.wikimedia.org/r/280792

Change 244254 abandoned by Eranroz:
Use local keyword for image

Reason:
See a better patch: https://gerrit.wikimedia.org/r/#/c/244254/

https://gerrit.wikimedia.org/r/244254

Change 280792 merged by jenkins-bot:
T53852: Serialize localized image options

https://gerrit.wikimedia.org/r/280792