convertNamespaceFromWikitext.php corrupts non-ASCII characters in description templates
Closed, Resolved, Public


Open the wikitext editor on the board description for and you'll see:

{{echo|café blah blahhhh 


{{Pàgina de discussió en wikitext convertida a Flow|archive=Ajuda Discussió:Elena2/Archive 1|date=2015-06-18}}

At you can see that the actual text was meant to be:

café blah blah


So it looks like there's a UTF-8 vs ISO-8859-1 mis-conversion or something like that going on.

Event Timeline

Catrope created this task. Jun 18 2015, 9:54 PM
Catrope raised the priority of this task from to Needs Triage.
Catrope updated the task description.
Catrope added a project: StructuredDiscussions.
Catrope added a subscriber: Catrope.
Restricted Application added a project: Collaboration-Team-Triage. Jun 18 2015, 9:54 PM
Restricted Application added a subscriber: Aklapper.

I can reproduce this on localhost too; it's not just beta being weird.

Confirmed that this is an ISO-8859-1 -> UTF-8 conversion applied to text that is already UTF-8. It only happens in the conversion script, not when editing board descriptions.
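
To illustrate the diagnosis, here is a minimal Python sketch of the same double conversion. It is not the conversion script's code path, only a reproduction of the symptom: the intended text is encoded as UTF-8 and those bytes are then wrongly decoded as ISO-8859-1. All variable names are made up for the example.

```python
# Illustrative only: reproduces the symptom, not the Flow import code.
original = "café blah blah"

# The stored text is already UTF-8.
utf8_bytes = original.encode("utf-8")

# Bug pattern: the UTF-8 bytes are interpreted as ISO-8859-1, so each byte
# of a multi-byte sequence turns into a separate character.
garbled = utf8_bytes.decode("iso-8859-1")

# Undoing the wrong step recovers the text, which supports the diagnosis.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(repaired == original)
```

In this sketch the single "é" comes out as the two characters "Ã©", because its UTF-8 form is the two bytes 0xC3 0xA9.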

Change 219315 had a related patch set uploaded (by Catrope):
Import\Wikitext\ImportSource::extractTemplates(): Explictly specify UTF-8

Change 219315 merged by jenkins-bot:
Import\Wikitext\ImportSource::extractTemplates(): Explictly specify UTF-8

Etonkovidova added a subscriber: Etonkovidova. Edited Jun 19 2015, 6:16 PM

Checked (with Flow/maintenance/convertNamespaceFromWikitext.php) on with
{{echo|café blah blah
йкчйккйч добавлен тест и другие языки としとしとちとちと ി്കതോേ്േ്}} -- European characters, Cyrillic, Hiragana, and Malayalam.
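
A check like this could also be scripted. The sketch below is a hypothetical helper, not part of Flow: `looks_double_encoded` is an assumed name, and the test is only a heuristic that asks whether a string can be turned back into ISO-8859-1 bytes that happen to form valid UTF-8. The sample strings are shortened from the test text above.

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if text looks like UTF-8 bytes decoded as ISO-8859-1."""
    try:
        candidate = text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as ISO-8859-1, or the bytes are not valid UTF-8:
        # the text was not produced by this particular mis-conversion.
        return False
    # Pure ASCII round-trips unchanged, so only a changed result counts.
    return candidate != text


samples = ["café blah blah", "добавлен тест", "としとしと"]
for sample in samples:
    # Correctly decoded text should pass the check.
    assert not looks_double_encoded(sample)
    # Simulate the bug and confirm the helper flags it.
    corrupted = sample.encode("utf-8").decode("iso-8859-1")
    assert looks_double_encoded(corrupted)
```

Because it is a heuristic, text that legitimately contains a sequence such as "Ã©" would be flagged as well.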

DannyH closed this task as Resolved. Jun 22 2015, 4:55 PM
DannyH added a subscriber: DannyH.