Update list of most often used messages for MediaWiki core at Wikimedia: 2019
Open, Needs Triage, Public

Description

The time has come to update the list of most used MediaWiki messages. The previous update was in 2015 (T65416), and there have been quite a lot of manual tweaks since then.

I invite people who are experienced with localization, especially in new languages and wikis, to express their opinions about how this should be done this time.

My initial proposal:

  1. The number of core MediaWiki messages in this list should be the same as the number required for export. I think that 10% of core (~370) is OK, but I'm fine with any other number.
  2. Messages from extensions should appear there, too, of course. The total shouldn't be higher than 500.
  3. There needs to be a list of messages that will be filtered out even if they are frequently loaded in practice. For example: https://gerrit.wikimedia.org/r/#/c/translatewiki/+/497118/
  4. The sequence of messages in the group must be the same as in the original JSON files. If they are ordered alphabetically or by frequency, it will be inconvenient for translators. I have often seen new translators confused by the mix of out-of-sequence month names, messages about blocking, Exif tags, etc. That's the reason for this manual commit, for example: I witnessed how it made translation much more welcoming for new languages.
  5. Data collection should run in several usage scenarios, and the results should be combined:
    1. Logged-in user editing with wiki syntax.
    2. Logged-in user editing with Visual Editor.
    3. Using Content Translation (ideally, it should be used a lot in new wikis).
    4. Anonymous user editing with wiki syntax.
    5. Anonymous user editing with Visual Editor.
    6. Anonymous desktop reader.
    7. Anonymous mobile web reader.
  6. The data needs to be collected either from all wikis or from several wikis. My proposal:
    1. English Wikipedia
    2. French Wikipedia
    3. English Wikisource
    4. Hindi Wikisource
    5. Wikidata
    6. Commons
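
To make the proposal above more concrete, here is a minimal Python sketch of how the selection step might look once usage counts exist. It is not the existing collection script. The function name `select_most_used`, the shape of the input data, and the sample keys and counts are all assumptions made for illustration. It combines counts across scenarios, drops messages on a filter list, caps the core and total counts, and returns the result in the order of the source JSON files.

```python
from collections import Counter


def select_most_used(usage_by_scenario, source_order, core_keys,
                     blocklist=frozenset(), core_limit=370, total_limit=500):
    """Pick the most used message keys and return them in source-file order.

    Hypothetical helper: every parameter name here is an assumption.
    usage_by_scenario: {scenario name: {message key: load count}}
    source_order: message keys in the order they appear in the JSON files
    core_keys: keys that belong to MediaWiki core (the rest are extensions)
    blocklist: keys to leave out even if they are loaded frequently
    """
    core_limit = min(core_limit, total_limit)

    # Combine the counts from all scenarios (and wikis) into one tally.
    combined = Counter()
    for counts in usage_by_scenario.values():
        combined.update(counts)

    position = {key: index for index, key in enumerate(source_order)}
    core_keys = set(core_keys)

    chosen_core, chosen_ext = [], []
    for key, _count in combined.most_common():
        if key in blocklist or key not in position:
            continue
        if key in core_keys:
            if len(chosen_core) < core_limit:
                chosen_core.append(key)
        else:
            chosen_ext.append(key)

    # Extensions fill whatever room is left under the total cap.
    ext_limit = total_limit - len(chosen_core)
    selected = chosen_core + chosen_ext[:ext_limit]

    # Present the messages in source-file order, not by frequency.
    return sorted(selected, key=position.__getitem__)


# Made-up sample data, for illustration only.
usage = {
    "logged_in_wikitext": {"edit": 90, "january": 5, "savearticle": 70, "tagline": 40},
    "anon_mobile_reader": {"edit": 30, "search": 80, "tagline": 60,
                           "visualeditor-toolbar-savedialog": 50},
}
source_order = ["tagline", "search", "january", "edit", "savearticle",
                "visualeditor-toolbar-savedialog"]
core = {"tagline", "search", "january", "edit", "savearticle"}

result = select_most_used(usage, source_order, core,
                          blocklist={"tagline"}, core_limit=3, total_limit=4)
print(result)
# → ['search', 'edit', 'savearticle', 'visualeditor-toolbar-savedialog']
```

In this sample, "tagline" is dropped by the filter list and "january" falls outside the core cap, while the remaining keys keep the order they have in `source_order`.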

Another possibility is to pick some important messages manually by running some of the above scenarios. This would give a very precise and complete result, but it can also introduce biases, and it's probably too manual. The process should be more easily repeatable and automatic, so that we'll be able to run it every few months and not every five years.

This is just the initial proposal. Please add your thoughts, and ask other relevant people.

Thanks!

Event Timeline

What's the status of this? Can it be done?

As you can see, there is no assignee, no priority, no team board tags. This is a pretty strong sign that nobody is actively working on this.

Thanks for raising the question, @MF-Warburg!

It's not on an official plan of any WMF team, and with the current mess with worldwide quarantines and changing priorities, it's hard to plan anything for the near future.

However, most of it can be done by volunteers. As far as I can tell, the only part that really needs WMF engineers' involvement is data collection about the most frequently used messages. @Nikerabbit, @tstarling, can the old scripts that collect these analytics be reused, or is this something that requires non-trivial coding?