
mailman's public list index (listinfo) has the wrong encoding in its Content-Type header
Closed, Resolved, Public

Description

$ curl -Is "https://lists.wikimedia.org/mailman/listinfo/wikiuk-l" | grep Content-Type
Content-Type: text/html; charset=utf-8

$ curl -Is "https://lists.wikimedia.org/mailman/listinfo" | grep Content-Type
Content-Type: text/html; charset=us-ascii


As you can see, the index at https://lists.wikimedia.org/mailman/listinfo specifies a charset of us-ascii, while the page for an individual list specifies utf-8. The us-ascii value is wrong; the index should specify utf-8 as well.
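The same check can be scripted. The following is a minimal sketch, not part of the original report: `charset_of` and `fetch_charset` are hypothetical helper names, and only the standard library is used. The offline part parses the two header values quoted above; `fetch_charset` would send a HEAD request much like `curl -I`, and is defined here but not called.

```python
from email.message import Message
from typing import Optional
import urllib.request


def charset_of(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def fetch_charset(url: str) -> Optional[str]:
    """Send a HEAD request (similar to `curl -I`) and return the declared charset."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        # response.headers is an email.message.Message subclass.
        return response.headers.get_content_charset()


# Offline check against the two header values quoted in the report.
print(charset_of("text/html; charset=utf-8"))     # utf-8
print(charset_of("text/html; charset=us-ascii"))  # us-ascii
```

Calling `fetch_charset("https://lists.wikimedia.org/mailman/listinfo")` would report whatever the server declares at the time of the request.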

This bug is related to bug 37817 ("lists.wikimedia.org encoding issues in description").


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=37817

Details

Reference
bz40971

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 12:44 AM
bzimport set Reference to bz40971.
bzimport added a subscriber: Unknown Object (MLST).
Dzahn added a comment. Jan 3 2013, 10:02 PM

per suggestion of thehelpfulone:

in /var/lib/mailman/Mailman/htmlformat.py line 300 there is "charset =".

we changed that from us-ascii to utf-8 and also deleted the matching .pyc bytecode file, but this does not appear to fix it.

Dzahn added a comment. Jan 3 2013, 10:02 PM

so I would say "upstream bug". The Wikimedia-Mailing-lists channel agrees:

kjetilho> yeah, UTF-8 would be the sensible default in just about any distro

Dzahn added a comment. Jan 3 2013, 10:27 PM

ok, so i found Defaults.py which sets a charset per language, like:

add_language('en', _('English (USA)'), 'us-ascii', 'ltr')

changing that to

add_language('en', _('English (USA)'), 'utf-8', 'ltr')

actually makes the listinfo overview page utf-8.
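To illustrate why that one-line change affects the header, here is a simplified stand-in for a per-language charset table. It is a sketch under assumptions, not Mailman's actual code: `LANGUAGES` and `content_type_for` are hypothetical names, the `_()` translation wrapper is omitted, and this `add_language` only mimics the call shape shown above. The 'pl' entry uses the iso-8859-2 charset reported for Polish pages later in this thread.

```python
# Hypothetical registry: language code -> (description, charset, direction).
LANGUAGES = {}


def add_language(code, description, charset, direction="ltr"):
    """Register a language together with the charset its pages are served in."""
    LANGUAGES[code] = (description, charset, direction)


def content_type_for(lang):
    """Build the Content-Type header value a page in this language would carry."""
    charset = LANGUAGES[lang][1]
    return f"text/html; charset={charset}"


add_language("en", "English (USA)", "utf-8", "ltr")
add_language("pl", "Polish", "iso-8859-2", "ltr")

print(content_type_for("en"))  # text/html; charset=utf-8
print(content_type_for("pl"))  # text/html; charset=iso-8859-2
```

In this model the header charset follows the page language, so changing the 'en' entry only changes pages rendered in English.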

but the list descriptions of non-English lists will still appear broken, because they are not utf-8; their encoding depends on the list's language,

and languages and descriptions can be set by list admins.
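A small sketch of the breakage described here, assuming a description stored in iso-8859-2 (the charset this thread reports for Polish pages) and a page header that now says utf-8. The sample description is made up for illustration.

```python
# Hypothetical list description containing Polish letters.
description = "Lista dyskusyjna polskiej społeczności"

# Bytes as they would be stored if the list's language uses iso-8859-2.
raw = description.encode("iso-8859-2")

# A client that trusts "charset=utf-8" decodes the same bytes as UTF-8.
# Bytes that are not valid UTF-8 become U+FFFD replacement characters.
shown = raw.decode("utf-8", errors="replace")

print(shown)
print(raw.decode("iso-8859-2") == description)  # True
```

The plain ASCII letters survive, while each non-ASCII letter is replaced, which is why only non-English descriptions look broken.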

No plans to investigate further on the WM site, candidate for upstreaming.

Looks good now?
$ curl -Is "https://lists.wikimedia.org/mailman/listinfo" | grep Content-Type
Content-Type: text/html; charset=utf-8

Yep, this seems to be fixed. Thanks for marking this as such, Nemo. Bug 37817 remains unresolved, I believe.

saper added a subscriber: saper. Jan 26 2015, 12:33 AM

This works only for English pages....

For Polish we still have "iso-8859-2"

radziecki$ curl -Is "https://lists.wikimedia.org/mailman/listinfo" | grep Content-Type
Content-Type: text/html; charset=utf-8
radziecki$ curl -Is "https://lists.wikimedia.org/mailman/listinfo/wikimediapl-l" | grep Content-Type
Content-Type: text/html; charset=iso-8859-2

... although the messages look UTF-8 now...
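If the page body is in fact UTF-8 while the header still declares iso-8859-2, a client that trusts the header would misread it. The following sketch shows that direction of the mismatch; the sample word is made up, and whether the live page behaves this way is an assumption.

```python
# Hypothetical sample text with Polish letters.
text = "społeczność"

# Body bytes as UTF-8, but decoded using the charset named in the header.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("iso-8859-2")

# Each non-ASCII letter occupies two bytes in UTF-8, and iso-8859-2 maps
# every byte to one character, so the misread text is longer than the original.
print(len(text), len(misread))
print(repr(misread))
```

No error is raised in this direction, so the mismatch shows up only as wrong characters on screen.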

Restricted Application added a subscriber: Matanya. Sep 30 2015, 6:55 PM