
Mailman password reminder mail (and other texts) has broken encoding in Czech
Closed, Resolved, Public

Description

The monthly password reminder e-mail sent by Mailman to subscribers has broken encoding in (at least) the Czech version.

The message is received (with Content-Type: text/plain; charset="utf-8" and Content-Transfer-Encoding: base64) as
VGF0byB6cHLvv712YSBqZSB6YXPvv71s77+9bmEgamVkbm91IG3vv71z77+977+9bu+/vSwgYWJ5[…], which decodes to “Tato zpr�va je zas�l�na jednou m�s��n�, aby”[…].
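The quoted base64 fragment can be checked with Python's standard library. When it is decoded as UTF-8, which is what the part header declares, the replacement characters (U+FFFD, encoded as the bytes EF BF BD) turn out to be present in the message as sent. They were therefore introduced before sending, not by the reader's mail client. Here is a minimal sketch:

```python
import base64

# The first 76 base64 characters of the message body quoted above.
payload = "VGF0byB6cHLvv712YSBqZSB6YXPvv71s77+9bmEgamVkbm91IG3vv71z77+977+9bu+/vSwgYWJ5"

# The MIME part declares charset="utf-8", so decode the raw bytes as UTF-8.
text = base64.b64decode(payload).decode("utf-8")

print(text)                  # Tato zpr�va je zas�l�na jednou m�s��n�, aby
print(text.count("\ufffd"))  # 7 replacement characters (U+FFFD)
```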

AFAICT, the problem is that in 711af11b1b87c0cc7732ad93776aa8c8a6d4089b (T261031), the Mailman configuration was changed so that the text encoding for all languages is utf-8. While we all love Unicode and UTF-8, the actual contents of the text files shipped with Mailman were not converted. The configured encoding is used to decode the e-mail template file (/templates/cs/cronpass.txt) to Unicode, but that file is in fact still in iso-8859-2. As a result, the final e-mail text is full of Unicode replacement characters.
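The mechanism can be reproduced in a few lines of Python. This is a sketch and does not reproduce Mailman's actual code path. It assumes that the template bytes are ISO-8859-2 and that they are decoded with the configured charset using `errors="replace"`. The Czech sentence is our reconstruction of the garbled text quoted in the description.

```python
# Reconstructed start of the Czech cronpass.txt text (assumption, inferred from the garbled output).
original = "Tato zpráva je zasílána jednou měsíčně, aby"

# What is actually on disk: the packaged template is still ISO-8859-2.
template_bytes = original.encode("iso-8859-2")

# What the configuration now claims: UTF-8. In this text, bytes such as 0xE1 (á)
# and 0xEC (ě) do not form valid UTF-8 sequences, so each becomes U+FFFD.
as_configured = template_bytes.decode("utf-8", errors="replace")

# Decoding with the file's real charset recovers the text.
as_stored = template_bytes.decode("iso-8859-2")

print(as_configured)          # Tato zpr�va je zas�l�na jednou m�s��n�, aby
print(as_stored == original)  # True
```

Because `errors="replace"` never raises, a mismatch like this one shows up only as unreadable output. That matches a mail that is sent normally but arrives garbled.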

I guess the problem might not be limited to a single template in a single language. (And it is probably the reason why e.g. https://lists.wikimedia.org/mailman/listinfo/wikics-l is broken as well.)
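To find out how far the problem reaches, one could scan the template directory for files that are not valid UTF-8. Here is a minimal sketch. `non_utf8_templates` is a hypothetical helper. The caller supplies the directory because the full template path is not spelled out here.

```python
from pathlib import Path


def non_utf8_templates(root):
    """Return template files under `root` whose bytes are not valid UTF-8.

    With every language's charset set to utf-8, these are the files that
    would be mis-decoded.
    """
    base = Path(root)
    broken = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: probably a legacy charset such as iso-8859-2.
            broken.append(path.relative_to(base))
    return broken
```

If this is run against the packaged templates, the Czech cronpass.txt from the description should appear in the results, together with any other template that is still stored in a non-UTF-8 encoding.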

Event Timeline

Joe triaged this task as High priority. Jan 4 2021, 4:38 PM
Joe added subscribers: herron, Ladsgroup, Joe.

@Ladsgroup @herron Can you take a look? I guess we just need to add Czech to the exception languages?

Oh boy. My suggestion is that, for the sake of uniformity and ease of maintenance, we convert all the files to UTF-8 instead. Would that make sense?

+1 from me. I think it should not be worse than the current state. :-) (Read: It might break something but that thing is probably already broken now.) But if there is a Mailman expert (definitely not me), please stand up!


Those files are provided by the Debian package and are not considered configuration files. This means that every time we update the Debian package from the distro, they would get overwritten until the next Puppet run. I would prefer not to go down that path, or down the path of rebuilding the package ourselves.


Well, in that case, you’d have to revert the encoding change for all languages, I guess… IIANM all languages not having an encoding compatible with UTF-8 are broken with this right now. (I randomly tested a subscription request in Polish and Japanese, both e-mails were broken.)

The problem is that if we revert that (or for example change the pt_BR to iso-8859-1 instead of utf-8), pages like https://lists.wikimedia.org/mailman/listinfo/unblock-pt-l would break. The proper solution is the upgrade to mailman3 (T52864: Upgrade GNU Mailman from 2.1 to Mailman3)

Yes, they would break because they are in the wrong encoding.
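A per-language revert only moves the breakage elsewhere. The sketch below shows why, assuming, as the discussion above suggests, that one charset per language is used both for the packaged templates and for text that list admins entered through the web UI. Both strings and the `render` helper are hypothetical. Only the encodings of the strings matter.

```python
# Hypothetical inputs; only their byte encodings matter.
# A packaged template, shipped in ISO-8859-1:
template_bytes = "Inscrição na lista".encode("iso-8859-1")
# A list description typed by an admin and stored as UTF-8:
description_bytes = "Solicitação de desbloqueio".encode("utf-8")


def render(charset):
    """Decode both byte strings with the single charset configured for the language."""
    return (
        template_bytes.decode(charset, errors="replace"),
        description_bytes.decode(charset, errors="replace"),
    )


for charset in ("iso-8859-1", "utf-8"):
    template, description = render(charset)
    print(f"{charset:<10} | {template} | {description}")

# iso-8859-1 | Inscrição na lista | SolicitaÃ§Ã£o de desbloqueio
# utf-8      | Inscri��o na lista | Solicitação de desbloqueio
```

Whichever charset is chosen, one of the two strings comes out garbled.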

Change 654915 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] mailman: set Czech language to iso-8859-2

https://gerrit.wikimedia.org/r/654915

In my testing, this corrected the issue with broken characters on the listinfo page.

Change 654915 merged by Herron:
[operations/puppet@production] mailman: set Czech language to iso-8859-2

https://gerrit.wikimedia.org/r/654915

Was hoping for some feedback on the above patch, but since it's been a few days I've gone ahead and merged it. The listinfo page in this task description looks to have improved to me, in that copy/pasting a sampling of text into a translator gives back a meaningful result. How does it look to you @Mormegil?

Well, yes, for Czech, the subscription confirmation e-mail now seems to be sent correctly. But as I said above, the problem affects all languages that use a non-UTF-8 encoding in Mailman. E.g. Polish is still broken (“Pro�ba o potwierdzenie zapisania si� na list�”).

And interestingly, the confirmation page at https://lists.wikimedia.org/mailman/subscribe/wikimediacz-l is partially broken for Czech (“WikimediaCZ-l Výsledky přihlášení / ObdrĹželi jsme VaĹĄĂ­ Şådost o přihlĂĄĹĄenĂ­ do konference.”) but correct for Polish (“Została otrzymana prośba o zapisanie się.”). But that is not that important.
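The Czech garbling on the subscribe page looks like the opposite mismatch: UTF-8 bytes displayed as if they were ISO-8859-2. This is an inference from the characters alone and does not come from a trace of Mailman's code. The phrase below is our reading of the start of the garbled Czech sentence.

```python
# Our reading of the garbled "ObdrĹželi jsme Va..." fragment (assumption).
intended = "Obdrželi jsme Vaší"

# UTF-8 encodes ž as C5 BE, š as C5 A1 and í as C3 AD ...
utf8_bytes = intended.encode("utf-8")

# ... and ISO-8859-2 maps C5 to Ĺ, C3 to Ă, A1 to Ą and AD to a soft hyphen,
# while BE happens to be ž again, which is why "Obdrželi" shows up as "ObdrĹželi".
shown = utf8_bytes.decode("iso-8859-2")

print(shown)
```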

That's what I have been saying: if you fix something, it breaks something else. In the current state, it's whack-a-mole.


Does this still need fixing? (Maybe this task should be renamed to be more than just Czech?)

This mailing list is on mm3 now: https://lists.wikimedia.org/postorius/lists/wikics-l.lists.wikimedia.org/

Except for a very few mailing lists, all have now been migrated to Mailman 3, so I'm calling this done.