Add DiscordWikiBot to Translatewiki.net
Closed, Resolved (Public)

Description

DiscordWikiBot is a console application that powers WikiBot, a Discord bot used by a number of Wikimedia community servers on Discord. It currently has scattered translations in 9 languages supported by the Discord community server owners (see list). To make the translation process easier both for myself and for translators, I would like to add the project to Translatewiki.net.

Messages could live under the Wikimedia namespace if possible (with the prefix dwb- or discordwikibot-). I would probably prefer less frequent commits (every two weeks?). There are a lot of really small messages, so maybe the threshold should also be higher (25-50%?). Deployments usually happen whenever there’s an update, but I will probably do them once a month anyway for new translations.

I’ll add message documentation and a translators page around the time we add the messages to Translatewiki.net.

  • Name: DiscordWikiBot
  • Logo: none yet
  • Repo: https://github.com/stjohann/DiscordWikiBot (MIT licence)
  • Description: Discord bot for Wikimedia projects and wiki sites
  • File format: Same as MediaWiki’s, except for differences in message syntax. Path to messages is ./DiscordWikiBot/i18n.

Localisation notes
Localisation files use the SmartFormat library’s syntax. Plural support is available, but is done slightly differently ({0:singular|plural}). Gender support is available in the library, but is not used or supported in the bot.

Discord’s Markdown syntax displays line breaks immediately (\n), unlike MediaWiki’s syntax (\n\n). Discord (not the bot) has no RTL support (see feature request).
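
To illustrate the {0:singular|plural} shape mentioned above, here is a minimal Python sketch of a choose-by-number formatter. It is not SmartFormat (a .NET library) and does not reproduce its rules; the name format_choice and the English-style rule (first form for exactly 1, last form otherwise) are assumptions made only for illustration.

```python
import re

# Matches placeholders such as {0:message|messages}: an argument index,
# a colon, then pipe-separated forms.
CHOICE = re.compile(r"\{(\d+):([^{}]*)\}")
# Matches plain positional placeholders such as {0}.
PLAIN = re.compile(r"\{(\d+)\}")


def format_choice(template: str, *args: int) -> str:
    """Expand {N:a|b} choices and {N} placeholders (hypothetical helper)."""

    def pick(match: re.Match) -> str:
        value = args[int(match.group(1))]
        forms = match.group(2).split("|")
        # Assumed rule: first form for exactly 1, last form otherwise.
        return forms[0] if value == 1 else forms[-1]

    expanded = CHOICE.sub(pick, template)
    return PLAIN.sub(lambda m: str(args[int(m.group(1))]), expanded)


print(format_choice("{0} {0:message|messages}", 3))
```

A real implementation would need per-language rules for picking a form, which is where the number of forms per language becomes relevant.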

Possible problems
Existing Serbian localisation is filed under sr.json (Serbian), but possibly should be under sr-ec.json (Serbian Cyrillic).

There are some messages that could count as ‘lego messages’ (bullets/dashes); let me know if it is better to fix them.

Event Timeline

Some observations:

  • Should add mandatory insertable variable validator for {\d+}
  • Should add insertable for {msg:.+}
  • Should add plural form count validator. Needs a custom validator due to different syntax from all others
  • "yes-no": "{0:yes|no}", is problematic because if we want to enforce the correct number of plural forms, this message would fail in languages with more or fewer forms. This ambiguous syntax is not good. Can they use separate messages?
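
The first observation (a mandatory validator for {\d+} variables) can be sketched roughly as follows. This is a standalone Python illustration, not the Translate extension's validator; the function name missing_placeholders is hypothetical, and the check deliberately ignores {0:...} choice placeholders, which the list treats separately.

```python
from __future__ import annotations

import re

# Positional variables such as {0} or {12}, as in the pattern {\d+}.
PLACEHOLDER = re.compile(r"\{\d+\}")


def missing_placeholders(source: str, translation: str) -> set[str]:
    """Return variables used in the source but absent from the translation."""
    return set(PLACEHOLDER.findall(source)) - set(PLACEHOLDER.findall(translation))


# A translation that dropped {1} would be flagged.
print(missing_placeholders("Hello {0}, you have {1} items", "Привет, {0}"))
```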

Aside, why is the translatewiki.net deadline in the message on Thursday?

"yes-no": "{0:yes|no}", is problematic because if we want to enforce correct number of plural forms, this message would fail in languages with more or less forms. This ambiguous syntax is not good. Can they use separate messages?

Sure, I’ll change it. But just FYI, that syntax can also mean a true-false statement, like here, not just a plural form. I guess it is problematic to use it due to the ambiguity.

Aside, why is the translatewiki.net deadline in the message on Thursday?

The logic behind this was that on Thursdays WMF deployments happen, which is how the most recent localisation gets displayed. I believe I’ve asked on IRC about this before, but wasn’t sure about how to proceed, so I left it at that.

Tuesday is when new translations will be picked up by WMF, so reviews should happen by Monday.

Apparently all but one of the i18n files begin with a byte order mark (BOM), which causes json_decode to silently fail to parse them.

Noted about Monday, thank you.

What you describe is, apparently, Visual Studio’s default way of saving UTF-8 files. The one file that is unaffected was probably the one that was submitted by a pull request on GitHub (hr.json). I can fix this later today if that is causing problems.
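
For anyone hitting the same BOM problem, here is a small Python sketch of tolerant loading. The thread concerns PHP's json_decode, so this is only an analogous illustration and not the patch that was written for the Translate extension; the helper name load_messages is hypothetical.

```python
import json


def load_messages(raw: bytes) -> dict:
    """Decode UTF-8 JSON bytes, tolerating a leading byte order mark."""
    # The "utf-8-sig" codec drops a leading BOM if one is present and
    # otherwise behaves like plain UTF-8.
    return json.loads(raw.decode("utf-8-sig"))


with_bom = b'\xef\xbb\xbf{"dwb-hello": "Hello"}'
print(load_messages(with_bom))
```

Decoding the same bytes as plain "utf-8" would leave U+FEFF at the start of the string, and Python's json.loads rejects that with an error rather than failing silently.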

Change 546337 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/Translate@master] SimpleFFS: Strip Byte-Order Mark (BOM)

https://gerrit.wikimedia.org/r/546337

Change 546353 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[translatewiki@master] Add DiscordWikiBot

https://gerrit.wikimedia.org/r/546353

@Nikerabbit: Should I rename sr.json to sr-ec.json or does it not matter? (Expecting to push all the fixes today.)

If you don't rename it, we need to add a codemap. But for consistency you should rename it.

I’ve split the yes-no message everywhere, re-saved the files without a BOM and renamed sr.json to sr-ec.json. (Commit, contains some unrelated changes)

Change 546353 merged by jenkins-bot:
[translatewiki@master] Add DiscordWikiBot

https://gerrit.wikimedia.org/r/546353

Please add information to https://translatewiki.net/wiki/Translating:DiscordWikiBot

We'll add the plural form validator a bit later.

Nikerabbit triaged this task as Medium priority. (Oct 28 2019, 8:39 AM)

Done, thank you. I will fill message documentation now, too.

Translations were exported from Translatewiki.net today to the repo. See commit - https://github.com/stjohann/DiscordWikiBot/commit/eda91b292b16479526f7848ff12a0c12e4392d01

Leaving this open since we still have to add the plural validators.

Change 547155 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[translatewiki@master] Add plural validator for DiscordWikiBot

https://gerrit.wikimedia.org/r/547155

Change 546337 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] SimpleFFS: Strip Byte-Order Mark (BOM)

https://gerrit.wikimedia.org/r/546337

Question: To be honest, I wanted less frequent updates than the current rate of twice a week, since I don’t really get to use them at that rate and I am not sure if there are any re-users. Is it possible to throttle it somehow, perhaps by adjusting the translation completion level needed for export?

I'd imagine the number of updates goes down as the most active languages complete the translation, unless you add new messages of course. Isn't it better to have the work waiting on git so you can use it whenever you need, rather than having the work stay at translatewiki.net where it certainly isn't going to be used?

Change 547155 merged by jenkins-bot:
[translatewiki@master] Add plural validator for DiscordWikiBot

https://gerrit.wikimedia.org/r/547155

We deployed the plural validator for SmartFormat on translatewiki.net yesterday, and then enabled it for this project today.

Works well. See screenshot below.

image.png (504×1 px, 67 KB)

I'd imagine the number of updates goes down as the most active languages complete the translation, unless you add new messages of course. Isn't it better to have the work waiting on git so you can use it whenever you need, rather than having the work stay at translatewiki.net where it certainly isn't going to be used?

I guess I can live with it.

We deployed the plural validator for SmartFormat on translatewiki.net yesterday, and then enabled it for this project today.

Tested for Russian, which I know:

image.png (258×656 px, 12 KB)

Why are 4 forms required here? I certainly can’t come up with 4 forms myself, hmm, and in MediaWiki it is 3 forms for Russian, not 4.

It comes from CLDR:

Category	Resolved String	Minimal Pair Template
one	из 1 книги за 1 день	из {NUMBER} книги за {NUMBER} день
few	из 2 книг за 2 дня	из {NUMBER} книг за {NUMBER} дня
many	из 5 книг за 5 дней	из {NUMBER} книг за {NUMBER} дней
other	из 1,5 книги за 1,5 дня	из {NUMBER} книги за {NUMBER} дня

From http://cldr.unicode.org/index/cldr-spec/plural-rules

Russian in MediaWiki has some particular shortcuts, if I remember correctly. But it also has 4 forms, though nobody uses the fourth form in practice, I think. It's not possible to derive the minimum required and maximum possible number of forms from the CLDR data. We can make the validator less strict, but that means it won't catch issues with too few plural forms provided.
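
As a rough illustration of where the fourth category comes from, the CLDR cardinal rules for Russian can be written out as a small Python sketch. The function name is hypothetical, and the input is assumed to be a non-negative number written as a string with a dot as the decimal separator, so that visible fraction digits are preserved.

```python
def russian_plural_category(number: str) -> str:
    """Return the CLDR cardinal plural category for Russian (sketch)."""
    integer_part, _, fraction = number.partition(".")
    # Any visible fraction digits (CLDR operand v != 0) select "other".
    if fraction:
        return "other"
    i = abs(int(integer_part))
    if i % 10 == 1 and i % 100 != 11:
        return "one"
    if 2 <= i % 10 <= 4 and not 12 <= i % 100 <= 14:
        return "few"
    # Remaining integers: ending in 0 or 5-9, or in 11-14.
    return "many"


for sample in ("1", "2", "5", "1.5"):
    print(sample, russian_plural_category(sample))
```

For integer-only messages, "other" is never selected under these rules, which matches the observation that the fourth form only matters for values like 1.5.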

Ah, so the fourth form is for values like 1.5, got it. Is it possible to mark other as not important, or can it be really useful in some languages? Not sure how to proceed here, really. I guess as long as people can save the edit according to their common sense, it would be alright.

Couple of options here:

  • Degrade it from error to warning
  • Disable this validator for Russian for this project
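
The difference between the two severities can be sketched in Python as follows. This is a hypothetical model, not translatewiki.net's configuration or the Translate extension's API: the names Issue, check_plural_forms and can_save are invented for the example, and the assumption is that only an "error" blocks saving while a "warning" is merely shown.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Choice placeholders such as {0:a|b|c}; the forms are captured as one group.
PLURAL = re.compile(r"\{\d+:([^{}]*)\}")


@dataclass
class Issue:
    severity: str  # "error" or "warning" (assumed levels)
    text: str


def check_plural_forms(translation: str, expected: int, severity: str = "error") -> list[Issue]:
    """Report every choice placeholder whose form count differs from expected."""
    issues = []
    for match in PLURAL.finditer(translation):
        count = len(match.group(1).split("|"))
        if count != expected:
            issues.append(Issue(severity, f"{match.group(0)} has {count} forms, expected {expected}"))
    return issues


def can_save(issues: list[Issue]) -> bool:
    """Assumed policy: only errors prevent the translation from being saved."""
    return all(issue.severity != "error" for issue in issues)


issues = check_plural_forms("{0:книга|книги|книг}", expected=4, severity="warning")
print(issues, can_save(issues))
```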

Well, Russian translations shouldn’t be made anyway since I usually do them myself. Just not sure whether that will add problems in other languages or not (for example, other Slavic languages).

Couple of options here:

  • Degrade it from error to warning
  • Disable this validator for Russian for this project

This is a very belated question to ask in this task, but can we do the first option? I couldn’t save a (Russian) message without filling out the fourth plural form, which would not be used anywhere in the bot anyway, since I don’t have non-integers anywhere, so that is probably inconvenient for some translators into other languages. I would prefer ignoring non-integer plural forms somehow and keeping it as an error, but that is probably too much work.

Change 950074 had a related patch set uploaded (by Saint Johann; author: Saint Johann):

[translatewiki@master] Change DiscordWikiBot config

https://gerrit.wikimedia.org/r/950074

Btw, is there a way to turn off certain localisations? en-gb is probably unneeded since I basically wrote the English locale in British English anyway.

Great, hopefully, the patch above does what’s needed then.

Change 950074 merged by jenkins-bot:

[translatewiki@master] DiscordWikiBot: Disable translations to en-gb

https://gerrit.wikimedia.org/r/950074

Great, hopefully, the patch above does what’s needed then.

Deployed. Going forward translations to en-gb will not be allowed. You can delete the en-gb file that is currently in the DiscordWikiBot repository.

Hey, I’ve tried adding optional parameters to some messages, but that is currently strictly prohibited (use case is at https://translatewiki.net/wiki/Wikimedia:Discordwikibot-comma/qqq — translatewiki.net doesn’t allow ending space, but I need to be able to have some languages enter a version without a comma and some with a comma), is there a way to do that in translatewiki config?

Hey, I’ve tried adding optional parameters to some messages, but that is currently strictly prohibited (use case is at https://translatewiki.net/wiki/Wikimedia:Discordwikibot-comma/qqq — translatewiki.net doesn’t allow ending space, but I need to be able to have some languages enter a version without a comma and some with a comma), is there a way to do that in translatewiki config?

There are a few messages in MediaWiki that have an optional ending space, and the solution there is to use the HTML entity &#32;. I don't know whether HTML entities work in your bot framework; but if they don't, you could theoretically do something like msg( 'comma' ).replace( '&#32;', ' ' ) for that specific message.

Yeah, they don’t. I avoided that sort of decision because it makes people think that HTML entities are OK to use in the messages (which is not true in my case). But if that’s the preferred way and the only way to avoid this, then I guess I kind of have to.

Probably not the only way, but it is a way that is somewhat familiar to Translatewiki's translators (more so than an optional parameter is).
Besides, the presence of one HTML entity doesn't imply the acceptance of other HTML entities – the list of HTML entities that are accepted by MediaWiki messages in general and can be expected to be parsed correctly is actually surprisingly small.

That’s a great link, thank you. I guess I’ll just implement the same in my own locale parser and fix those messages. (Done.)
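
A minimal sketch of that kind of replacement, written in Python rather than the bot's own language. It assumes the entity under discussion is &#32;, the numeric entity for a plain space; the function name is hypothetical and this is not the bot's actual locale parser.

```python
# Assumption: only this one entity is decoded; all others are left as-is.
SPACE_ENTITY = "&#32;"


def decode_space_entity(message: str) -> str:
    """Turn the space entity into a literal space, leaving other entities alone."""
    return message.replace(SPACE_ENTITY, " ")


# A comma separator stored with a protected trailing space.
print(repr(decode_space_entity(",&#32;")))
```

Decoding a single entity keeps the point made earlier in the thread: accepting one entity does not mean other HTML entities are accepted.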

Please open a new task for future changes, thanks!