
Update list of most often used messages for MediaWiki core at Wikimedia: 2019
Open, Needs Triage · Public


The time has come to update the list of most used MediaWiki messages. The previous update was in 2015 (T65416), and there have been quite a lot of manual tweaks since then.

I am inviting people who are experienced with localization, especially in new languages and wikis, to express their opinions about how this should be done this time.

My initial proposal:

  1. The number of core MediaWiki messages in this list should be the same as the number required for export. I think that 10% of core (~370) is OK, but I'm fine with any other number.
  2. Messages from extensions should appear there, too, of course. The total shouldn't be higher than 500.
  3. There needs to be a list of messages that will be filtered out even if they are frequently loaded in practice. For example:
  4. The sequence of messages in the group must be the same as it is in the original JSON files. If they are ordered alphabetically or by frequency, it will be inconvenient for translators. I have often seen new translators confused by the mix of out-of-sequence month names, messages about blocking, Exif tags, etc. That's the reason for this manual commit, for example: I witnessed how it made translation much more welcoming for new languages.
  5. Data should be collected in several usage scenarios, and the results combined:
    1. Logged-in user editing with wiki syntax.
    2. Logged-in user editing with Visual Editor.
    3. Using Content Translation (ideally it should be used a lot in new wikis).
    4. Anonymous user editing with wiki syntax.
    5. Anonymous user editing with Visual Editor.
    6. Anonymous desktop reader.
    7. Anonymous mobile web reader.
  6. The data needs to be collected either from all wikis or from several wikis. My proposal:
    1. English Wikipedia
    2. French Wikipedia
    3. English Wikisource
    4. Hindi Wikisource
    5. Wikidata
    6. Commons
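
The selection rules in the list above could be automated roughly as follows. This is only a sketch under assumptions: the task does not say how usage data is recorded, so the input is assumed to be one mapping of message key to load count per (wiki, scenario) sample. The function names, the sample keys, and the numbers are hypothetical and only illustrate combining, filtering, capping, and restoring source-file order.

```python
from collections import Counter
from typing import AbstractSet, Iterable, List, Mapping, Sequence


def combine_counts(samples: Iterable[Mapping[str, int]]) -> Counter:
    """Sum per-message load counts over all (wiki, scenario) samples."""
    total = Counter()
    for sample in samples:
        total.update(sample)  # Counter.update adds counts, it does not overwrite
    return total


def select_messages(
    counts: Mapping[str, int],
    source_order: Sequence[str],
    excluded: AbstractSet[str],
    limit: int,
) -> List[str]:
    """Pick the `limit` most-loaded messages, then restore source-file order."""
    position = {key: i for i, key in enumerate(source_order)}
    # Ignore keys that are not in the source file or are on the exclusion list.
    candidates = [k for k in counts if k in position and k not in excluded]
    # Rank by frequency; ties fall back to source position so runs are repeatable.
    candidates.sort(key=lambda k: (-counts[k], position[k]))
    chosen = candidates[:limit]
    # Translators see the messages in the same sequence as the JSON file.
    return sorted(chosen, key=position.__getitem__)


# Hypothetical samples: one mapping per scenario run on some wiki.
samples = [
    {"edit": 120, "january": 40, "blockip": 3, "search": 90},
    {"search": 200, "january": 10, "history": 70},
]
# Hypothetical key order as it would appear in the source JSON file.
source_order = ["january", "search", "edit", "history", "blockip"]

top = select_messages(
    combine_counts(samples), source_order, excluded={"january"}, limit=3
)
print(top)
```

With real data, `limit` would be the agreed number for core (for example about 370), and the same function could be run again for extension messages with whatever remains of the total budget of 500.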

Another possibility is to pick the important messages manually by running some of the above scenarios. This would give a very precise and complete experience, but it can also introduce biases, and it's probably too manual. The process should be more easily repeatable and automatic, so that we'll be able to run it every few months and not every five years.

This is just the initial proposal. Please add your thoughts, and ask other relevant people.


Event Timeline

Amire80 created this task. Oct 6 2019, 11:53 AM
Restricted Application added a subscriber: Aklapper. Oct 6 2019, 11:53 AM
Amire80 updated the task description. Dec 2 2019, 6:59 AM
Nikerabbit moved this task from Backlog to External on the board. Jan 7 2020, 8:31 AM