
Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" {dove}
Closed, Resolved · Public

Description

Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646"

Names should appear "readable" regardless of language.

Event Timeline

Nuria created this task. Mar 17 2015, 11:15 PM
Nuria raised the priority of this task from to Needs Triage.
Nuria updated the task description.
Nuria added subscribers: Nuria, kevinator.
Restricted Application added a subscriber: Aklapper. Mar 17 2015, 11:15 PM

Ah, I see the bug now. I had to download the file and open it in a text editor.
My JSON viewer plugin in Chrome displayed the characters correctly.

Fhocutt set Security to None.
Fhocutt added a subscriber: Fhocutt.

I'll look into this--looks like I didn't handle encoding properly if this is happening.

@Fhocutt, @Nuria is also looking into this presently. We have run into many encoding issues in the past and she thinks she may be able to solve this one quickly. Encoding issues in Python are not always an easy problem to solve.

Ok, great! If she fixes it I'll see how she did it and know for next time.

Fhocutt removed Fhocutt as the assignee of this task. Mar 18 2015, 9:31 PM
kevinator triaged this task as Normal priority. Mar 19 2015, 2:29 PM

It looks like it may be an issue with default Flask settings, specifically JSON_AS_ASCII. From http://flask.pocoo.org/docs/0.10/config/#builtin-configuration-values:

By default Flask serialize object to ascii-encoded JSON. If this is set to False Flask will not encode to ASCII and output strings as-is and return unicode strings. jsonfiy [jsonify?] will automatically encode it in utf-8 then for transport for instance.

However, adding the line

JSON_AS_ASCII                       : False

to web_config.yaml and then reloading doesn't change the output report for my test UTF-8 cohort, so it's not just that.
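For reference, the escaping in the task title is what Python's standard json module produces by default, and as far as I understand this is the behaviour that Flask's JSON_AS_ASCII setting is meant to control. The sketch below uses only the standard library; the `report` dictionary and its `user` key are made up for illustration and are not the real report format.

```python
import json

# The name from the task title, written with escapes so this file stays ASCII.
name = "\u0623\u0645\u064a\u0646"
report = {"user": name}  # hypothetical report shape, for illustration only

escaped = json.dumps(report)                       # default: ensure_ascii=True
readable = json.dumps(report, ensure_ascii=False)  # characters are kept as-is

print(escaped)                      # the name appears as \uXXXX escape sequences
print(len(escaped), len(readable))  # the escaped form is longer

# Both strings are valid JSON and decode to the same value.
assert json.loads(escaped) == json.loads(readable) == report

# The readable form still has to be encoded explicitly when written out.
payload = readable.encode("utf-8")
```

Both outputs are equivalent JSON, so a parser sees the same name either way; the difference only shows up when a person reads the raw file.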

Change 199814 had a related patch set uploaded (by Fhocutt):
Make non-Latin characters display in json reports - WIP

https://gerrit.wikimedia.org/r/199814

It looks like my JSON browser plugin and Apple's TextEdit are messing up the order of the pipe-delimited fields when there's a right-to-left character on the line. The fields are not consistently "username|id|wiki|x": when there are right-to-left characters on a line, it displays as "id|username|wiki|x" or "wiki|x|ID|username|".
Can you confirm these are bugs in my viewers and not in the data in the file?

Chrome JSON Viewer:

Apple TextEdit

Nuria added a comment. Apr 2 2015, 12:33 AM

In this case it is the mix of right-to-left and left-to-right text that is tricking the browser display. You can verify this with, for example, Russian names written in Cyrillic: the reordering doesn't happen.
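One way to check the data itself rather than a viewer is to split a line programmatically and look at the Unicode bidirectional class of the username. This is a minimal sketch: the two sample lines follow the "username|id|wiki|x" layout mentioned above, but their values are invented, and `bidi_classes` is a helper written for this example.

```python
import unicodedata

def bidi_classes(text):
    """Return the set of Unicode bidirectional classes of the characters in text."""
    return {unicodedata.bidirectional(ch) for ch in text}

# Hypothetical report lines in the "username|id|wiki|x" layout; values are made up.
arabic_line = "\u0623\u0645\u064a\u0646|123|enwiki|x"
cyrillic_line = "\u0418\u0432\u0430\u043d|456|ruwiki|x"

for line in (arabic_line, cyrillic_line):
    # Splitting works on the stored (logical) order, not the displayed order,
    # so the username is always the first field.
    username, user_id, wiki, flag = line.split("|")
    print(sorted(bidi_classes(username)), user_id, wiki, flag)
```

The Arabic letters report class AL (right-to-left) and the Cyrillic letters report L (left-to-right), while the stored field order is the same in both lines. That is consistent with the reordering being a display effect of bidirectional text handling in the viewer.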

Nuria closed this task as Resolved. Apr 2 2015, 2:07 PM

Moving to Done column as this was deployed last week (around April 1st)

Change 203505 had a related patch set uploaded (by Nuria):
Fixing encoding on json responses at the encoder level

https://gerrit.wikimedia.org/r/203505
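The patch itself is not reproduced in this thread, so the following is only a guess at what a fix "at the encoder level" could look like using the standard library, not the actual change. `ReadableJSONEncoder` and the `report` dictionary are hypothetical names introduced for this sketch.

```python
import json

class ReadableJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder that always leaves non-ASCII characters unescaped."""

    def __init__(self, **kwargs):
        # Override whatever the caller passed, so every response serialized
        # with this encoder keeps names readable.
        kwargs["ensure_ascii"] = False
        super().__init__(**kwargs)

report = {"user": "\u0623\u0645\u064a\u0646"}  # made-up report content

text = json.dumps(report, cls=ReadableJSONEncoder)

# The serialized text is a str; encode it as UTF-8 before sending or saving it.
payload = text.encode("utf-8")
assert json.loads(payload.decode("utf-8")) == report
```

Putting the setting in the encoder class means callers do not each have to remember to pass `ensure_ascii=False`.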

Nuria reopened this task as Open. Apr 11 2015, 2:23 AM
Nuria claimed this task.
Nuria moved this task from Done to In Code Review on the Analytics-Kanban board.

Change 204214 had a related patch set uploaded (by Nuria):
Undoing Ide98e20eb54523353153ccd212df5...

https://gerrit.wikimedia.org/r/204214

Change 204214 merged by jenkins-bot:
Undoing Ide98e20eb54523353153ccd212df511a9298bd16

https://gerrit.wikimedia.org/r/204214

@Nuria, I saw that you reopened this task. Quick question: does it still block @T74747, and if not, should I update the task description to indicate that? I want to make sure I'm keeping straight what's deployed and what's not. Thanks!

Nuria added a comment. Apr 20 2015, 6:12 PM

@Cap_swing: T74747 is deployed.

There is a bug in the report when a user name has non-ASCII characters, and that is what this task is about, but the functionality that provides user names in JSON reports is already deployed.

Let me know if this answers your question.

Nuria removed Nuria as the assignee of this task. Apr 22 2015, 3:46 PM
kevinator renamed this task from Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" to Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" {dove}. Apr 27 2015, 3:26 PM
mforns claimed this task. Apr 27 2015, 8:25 PM
mforns moved this task from Paused to In Progress on the Analytics-Kanban board. Apr 29 2015, 3:28 PM
mforns moved this task from In Progress to In Code Review on the Analytics-Kanban board.
mforns moved this task from In Code Review to In Progress on the Analytics-Kanban board.

Change 203505 merged by Mforns:
Fixing encoding on json responses at the encoder level

https://gerrit.wikimedia.org/r/203505

mforns closed this task as Resolved. May 12 2015, 3:42 PM
mforns moved this task from Ready to Deploy to Done on the Analytics-Kanban board.