The time has come to update the list of most-used MediaWiki messages. The last update was in 2015 (T65416), and quite a few manual tweaks have been made since then.
I invite people who are experienced with localization, especially for new languages and wikis, to share their opinions about how this should be done this time.
My initial proposal:
# The number of //core// MediaWiki messages in this list should equal the number required for export. I think that 10% of core (~370) is OK, but I'm open to other numbers.
# Messages from extensions should appear there too, of course. The total shouldn't exceed 600.
# There needs to be a list of messages that will be filtered out even if they are frequently loaded in practice. Here are some examples:
#- https://gerrit.wikimedia.org/r/#/c/translatewiki/+/497118/
# The messages in the group must appear in the same sequence as in the original JSON files. If they are ordered alphabetically or by frequency, they are inconvenient to translate. I have seen many times how new translators get confused by a mix of out-of-sequence month names, messages about blocking, Exif tags, etc. That's the reason for [[ https://gerrit.wikimedia.org/r/#/c/translatewiki/+/529072/ | this manual commit ]], for example. I witnessed how it made translation much more welcoming for new languages.
# Data should be collected in several usage scenarios, and the results combined:
#- Logged-in user editing with wiki syntax.
#- Logged-in user editing with VisualEditor.
#- Anonymous user editing with wiki syntax.
#- Anonymous user editing with VisualEditor.
#- Anonymous desktop reader.
#- Anonymous mobile web reader.
# The data needs to be collected either from //all// wikis or from several wikis. My proposal:
#- English Wikipedia
#- French Wikipedia
#- English Wikisource
#- Hindi Wikisource
#- Wikidata
#- Commons
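The automatic part of the steps above (combine the scenarios, drop the excluded messages, cap core and extension messages, then restore the source order) could look roughly like this. It is a minimal Python sketch, assuming the load counts are already available per scenario as a mapping from message key to count, and that `source_order` lists message keys in the order they appear in the original JSON files. The function names, the per-scenario normalization, and the tie-breaking rule are hypothetical illustrations, not existing translatewiki.net tooling; the counts in the usage part are invented and `ext-save-dialog` is a made-up extension key.

```python
from collections import defaultdict


def combine_scenarios(scenario_counts):
    """Merge per-scenario load counts into one score per message key.

    Each scenario's counts are divided by that scenario's total, so a
    high-traffic scenario (e.g. anonymous readers) cannot drown out the
    editing scenarios. This normalization is an assumption.
    """
    scores = defaultdict(float)
    for counts in scenario_counts.values():
        total = sum(counts.values())
        if total == 0:
            continue  # skip scenarios that recorded nothing
        for key, count in counts.items():
            scores[key] += count / total
    return dict(scores)


def select_most_used(scores, core_keys, excluded, source_order,
                     core_limit=370, total_limit=600):
    """Pick the most-used messages and return them in source-file order."""
    # Drop excluded messages, then rank by score; ties are broken
    # alphabetically so that repeated runs give the same result.
    ranked = sorted((k for k in scores if k not in excluded),
                    key=lambda k: (-scores[k], k))
    core = [k for k in ranked if k in core_keys][:core_limit]
    # Any key not in core_keys is treated as an extension message.
    room_left = max(0, total_limit - len(core))
    extensions = [k for k in ranked if k not in core_keys][:room_left]
    # Restore the order of the original JSON files; keys not found
    # in source_order are placed at the end.
    position = {k: i for i, k in enumerate(source_order)}
    return sorted(core + extensions,
                  key=lambda k: (position.get(k, len(position)), k))


# Usage with invented counts:
scores = combine_scenarios({
    "anonymous-desktop-reader": {"jan": 50, "search": 100, "blockedtext": 10},
    "logged-in-wikitext-edit": {"savechanges": 30, "search": 20,
                                "jan": 10, "ext-save-dialog": 20},
})
result = select_most_used(
    scores,
    core_keys={"jan", "search", "blockedtext", "savechanges"},
    excluded={"blockedtext"},
    source_order=["jan", "search", "savechanges", "blockedtext",
                  "ext-save-dialog"],
    core_limit=2,
    total_limit=3,
)
print(result)  # ['jan', 'search', 'ext-save-dialog']
```

Normalizing per scenario is only one possible design choice; summing raw counts instead would let the busiest scenario, most likely anonymous reading, dominate the list. Because the output is sorted by `source_order` rather than by score, translators would see month names, blocking messages, and so on grouped as they are in the JSON files.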
Another possibility is to pick some important messages manually by running some of the above scenarios. This would give very precise and complete results, but it can also be biased, and it's probably too labor-intensive. The process should be easy to repeat and automate, so that we can run it every few months rather than every five years.
This is just an initial proposal. Please add your thoughts, and invite other relevant people to comment.
Thanks!