
Mailman cannot correctly decode GB2312-superset mails labelled as GB2312 (non-standard behavior)
Open, Medium, Public

Description

Per the WHATWG Encoding Standard (the W3C publishes it as a TR as well; choose the one you like), any MIME text labelled gb2312 should be treated as gbk, and consequently decoded with the combined gb18030/gbk decoder, so that characters from the later supersets are handled properly. Mailman does not seem to do this: when a message labelled gb2312 contains such characters, it fails to decode the body and spits back the raw base64 text, as noticed at zhwp's VPT.
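To make the failure mode concrete, here is a minimal Python sketch. It assumes (this is not confirmed for Mailman's code) that the declared charset label is passed straight to Python's codec of the same name; `try_decode` is a hypothetical helper introduced only for illustration. The character 镕 exists in gbk but not in gb2312, so a strict gb2312 decoder rejects it.

```python
def try_decode(raw: bytes, label: str):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return raw.decode(label)
    except UnicodeDecodeError:
        return None


# A gbk-encoded body that a sender has (mis)labelled as gb2312.
raw = "朱镕基".encode("gbk")

print(try_decode(raw, "gb2312"))  # strict gb2312 cannot decode 镕
print(try_decode(raw, "gbk"))     # the superset decoder can
```

A consumer that falls back to showing the undecoded payload when decoding fails would then display base64 text, which matches the symptom described above.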

This issue needs to be fixed in two steps:

  1. Aliasing. gb2312 should at least be aliased to gbk.
  2. Making a "union" decoder. WHATWG's TR uses a joint gb18030/gbk decoder, or in simpler terms, a gb18030 decoder that also understands the single-byte euro sign (U+20AC) that gbk (cp936) places at 0x80.
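The two steps above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Mailman's actual code: `GBK_LABELS`, `decode_gbk_superset` and `decode_labelled` are hypothetical names, the label set is the one the WHATWG specification lists for GBK, and errors are raised strictly rather than replaced with U+FFFD as a browser would do.

```python
# Step 1: labels that the WHATWG Encoding Standard maps to its GBK encoding.
GBK_LABELS = frozenset({
    "chinese", "csgb2312", "csiso58gb231280", "gb2312", "gb_2312",
    "gb_2312-80", "gbk", "iso-ir-58", "x-gbk",
})


def decode_gbk_superset(data: bytes, errors: str = "strict") -> str:
    """Step 2: decode as gb18030, but map a 0x80 byte in lead position to U+20AC."""
    pieces = []
    start = i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte < 0x80:
            i += 1  # ASCII byte
        elif byte == 0x80:
            # Flush everything before the euro byte, then emit the euro sign.
            pieces.append(data[start:i].decode("gb18030", errors))
            pieces.append("\u20ac")
            i += 1
            start = i
        elif i + 1 < n and 0x30 <= data[i + 1] <= 0x39:
            i += 4  # four-byte gb18030 sequence
        else:
            i += 2  # two-byte sequence (its trail byte may itself be 0x80)
    pieces.append(data[start:].decode("gb18030", errors))
    return "".join(pieces)


def decode_labelled(data: bytes, label: str) -> str:
    """Route every GBK-family label to the superset decoder."""
    if label.strip().lower() in GBK_LABELS:
        return decode_gbk_superset(data)
    return data.decode(label)


body = "朱镕基".encode("gbk") + b" \x80100"
print(decode_labelled(body, "GB2312"))
```

The byte walk is needed because 0x80 is also a legal trail byte in two-byte gbk sequences, so a plain `bytes.replace` on 0x80 would corrupt those characters.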

In addition to the GB encodings, Mailman should probably be checked for other aliasing problems of this kind highlighted in WHATWG's TR; after all, the wild web has so much non-standard behavior that browser makers did end up writing a "how to work with non-standard things" guide.
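As a rough illustration of what such a check might cover, the sketch below lists a few other label remappings from the WHATWG specification, expressed as Python codec names. `WEB_LABEL_OVERRIDES` and `web_codec_name` are hypothetical names, the table is only a small subset, and the Python codecs are approximations: for example, Python's cp1252 rejects a handful of bytes that the WHATWG windows-1252 decoder maps to control characters.

```python
# Subset of WHATWG label remappings, as Python codec names (approximate).
WEB_LABEL_OVERRIDES = {
    "ascii": "cp1252",
    "us-ascii": "cp1252",
    "iso-8859-1": "cp1252",
    "latin1": "cp1252",
    "iso-8859-9": "cp1254",
    "iso-8859-11": "cp874",
    "tis-620": "cp874",
    "ks_c_5601-1987": "cp949",
    "euc-kr": "cp949",
}


def web_codec_name(label: str) -> str:
    """Return the codec a web-style decoder would use for this label."""
    key = label.strip().lower()
    return WEB_LABEL_OVERRIDES.get(key, key)


# Curly quotes sent as windows-1252 bytes but labelled latin1.
print(b"\x93hi\x94".decode(web_codec_name("latin1")))
```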

Event Timeline

Does this seem to be an issue with the OTRS system?

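Look, problems with OTRS should be filed as issues over at OTRS; Phabricator has no say over that anyway. Also, the problem I raised at VPT is about Mailman; Chinese Wikipedia's unblock does not use OTRS.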

Arthur2e5 renamed this task from OTRS cannot correctly decode GB2312-superset mails labelled as GB2312 (non-standard behavior) to Mailman cannot correctly decode GB2312-superset mails labelled as GB2312 (non-standard behavior). Aug 25 2017, 6:25 PM
Arthur2e5 updated the task description.

Could someone clarify whether this is an issue with Mailman, with OTRS, or both? If it's both, we should have dedicated tickets for each as they will require different solutions.