Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode
Closed, Resolved · Public

Event Timeline

Restricted Application added a subscriber: Aklapper.
Aklapper renamed this task from unreadable mailing list description to Several unreadable mailing list descriptions due to wrong charset encodings, should be Unicode. Aug 22 2020, 10:56 AM

Hi @Aftabuzzaman, thanks for taking the time to report this!

Confirming. This is due to

$:acko\> curl -Is "https://lists.wikimedia.org/mailman/listinfo/wikipedia-bn-admins" | grep Content-Type
Content-Type: text/html; charset=us-ascii

but it should be Content-Type: text/html; charset=utf-8 instead.
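The header check above can also be scripted. The following is a minimal sketch, assuming the Content-Type value has already been fetched (for example with the curl command above); the helper name `charset_from_content_type` is hypothetical and only illustrates how the charset parameter is read out of the header.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() parses the header parameters and lowercases the result.
    return msg.get_content_charset()


# Header values taken from this task: the one observed and the one wanted.
observed = charset_from_content_type("text/html; charset=us-ascii")
wanted = charset_from_content_type("text/html; charset=utf-8")
# A header without a charset parameter yields None.
missing = charset_from_content_type("text/html")

print(observed, wanted, missing)
```

A check like this could be run against each affected listinfo page to see which ones still announce a non-Unicode charset.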

See T42971: mailman's public list index (listinfo) has the wrong encoding in its Content-Type header and T39817: lists.wikimedia.org encoding issues in descriptions for older tickets about similar issues.

I don't know how to change it. Please change it for the mailing lists above, or at least for /wikipedia-bn and /wikipedia-bn-admins.

jijiki triaged this task as Medium priority. Aug 24 2020, 10:23 PM
jijiki added a subscriber: herron.
Aklapper renamed this task from Several unreadable mailing list descriptions due to wrong charset encodings, should be Unicode to Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode. Sep 18 2020, 2:38 PM
Aklapper added a project: I18n.
Aklapper added subscribers: jhsoby, Ladsgroup.

imported comment from T263248:

the problem arose some time between May 2019 and August 2020 – I wish I could be more specific.

This would match the upgrading of the mailman server to the newer Debian distro version in T224586.

In the case of Japanese, the encoding set by the server was euc-jp, not ascii. When I manually override the encoding on the browser side, the text displays correctly until I reload. (Tested on Mozilla Firefox 81.0)
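Why a manual override in the browser helps can be illustrated with a small sketch. It assumes (this is not confirmed in the task) that the description bytes are UTF-8 while the server announces a different charset; the sample text and the helper `decode_as` are hypothetical.

```python
def decode_as(raw, charset):
    """Decode bytes with the given charset, substituting U+FFFD for invalid input."""
    return raw.decode(charset, errors="replace")


text = "日本語の説明"        # hypothetical list description
raw = text.encode("utf-8")   # assumption: the bytes on the wire are UTF-8

# What a browser shows depends on the charset it is told to use.
as_utf8 = decode_as(raw, "utf-8")        # the manual override
as_euc_jp = decode_as(raw, "euc_jp")     # charset announced for Japanese
as_ascii = decode_as(raw, "us-ascii")    # charset announced for other lists

print(as_utf8 == text)
print(as_euc_jp == text)
print(as_ascii == text)
```

The bytes never change; only the label does, which is why the override works until the next reload brings back the server's header.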

Wondering if backporting https://gitlab.com/mailman/mailman/-/commit/761c268bb7c7c7b91d3f962e5ca45c9a8387095f could help here. (And if that's related at all.)

It is very likely unrelated: that commit is for mailman3, which is a completely different world (for example, it uses cfg config files, while mailman2 stores its configuration in a good old .py file).

Change 631952 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] mailman: Make apache serve with utf-8 charset

https://gerrit.wikimedia.org/r/631952

Change 631952 merged by Herron:
[operations/puppet@production] mailman: Make apache serve with utf-8 charset

https://gerrit.wikimedia.org/r/631952

Change 632837 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] mailman: Set default charset in mailman2 configs

https://gerrit.wikimedia.org/r/632837

Change 632837 merged by Herron:
[operations/puppet@production] mailman: Set default charset in mailman2 configs

https://gerrit.wikimedia.org/r/632837

Screenshot_2020-10-18-19-57-52-677_com.google.android.gm.jpg (1×720 px, 402 KB)

Today I received this mail (see screenshot) from the Bengali Wikipedia mailing list. All the Bengali text in it is replaced with ???? symbols. What is the solution?

We are facing the same issue on:

All text is mis-encoded, and emails reach users with (?) symbols.

capture-20201018-171053.png (422×695 px, 28 KB)

I hope this issue can be fixed soon, as it stops us from using the Arabic-language mailing list and everything based on it, from welcome messages to discussions.

In my volunteer capacity, I have been trying to figure out what's going on and have tried several ideas (I even went through the mailman2 code). My latest puppet patch should have fixed it, but it seems it did not. I will try to dig deeper, but the software behind this is REALLY old and we really should upgrade to its newer version (T52864: Upgrade GNU Mailman from 2.1 to Mailman3). I'm a little biased, though.

Hi, we do not need more comments and confirmations that there is a problem. We all know that.
More people willing to investigate the Mailman software would be welcome though. Thanks.

Is the problem larger than the current title and description of this task suggest? The title of this task only mentions mailing list descriptions. Yet the two comments on Oct 18 say that we have a problem in body text of individual emails, at least more recently, too.

It is, but not by much: you get the mojibake emails only if you opt in to digest emails, so regular emails should be fine (I'm subscribed to a couple of mailing lists in Persian and they work just fine).

Change 637852 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] mailman: Set the charset utf-8 as charset of English

https://gerrit.wikimedia.org/r/637852

This should fix it for real. Any puppet review/merge of this would be extremely appreciated. @Dzahn @herron

Change 637852 merged by Herron:
[operations/puppet@production] mailman: Set the charset utf-8 as charset of English

https://gerrit.wikimedia.org/r/637852
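The patch title suggests that the charset attached to a language decides how text is encoded on output. The sketch below models that idea only; the tables `CHARSETS_BEFORE` and `CHARSETS_AFTER` and the function `render` are hypothetical and are not Mailman's actual configuration or API. It shows one plausible way the question marks reported earlier in this task can appear: characters that do not fit the configured charset get replaced.

```python
# Hypothetical per-language charset tables (not Mailman's real data structures).
CHARSETS_BEFORE = {"en": "us-ascii"}
CHARSETS_AFTER = {"en": "utf-8"}


def render(description, lang, charsets):
    """Encode a description with the charset configured for the language."""
    charset = charsets[lang]
    # errors="replace" turns every unencodable character into "?".
    return description.encode(charset, errors="replace")


description = "বাংলা"  # sample Bengali text on a list whose language is "en"

before = render(description, "en", CHARSETS_BEFORE)
after = render(description, "en", CHARSETS_AFTER)

print(before)
print(after.decode("utf-8") == description)
```

With us-ascii the original characters are lost for good, whereas with utf-8 the bytes decode back to the same text.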

Ladsgroup claimed this task.

It's fixed now \o/ https://lists.wikimedia.org/mailman/listinfo/wikipedia-bn

We really should upgrade :D

Change 639598 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] mailman: Set utf-8 charset for all languages

https://gerrit.wikimedia.org/r/639598

Change 639598 merged by Herron:
[operations/puppet@production] mailman: Set utf-8 charset for all languages

https://gerrit.wikimedia.org/r/639598