Please set CVS download as unicode
for example see
http://quarry.wmflabs.org/query/1105
and compare the results on the web output and CVS downloads file
Description
Details
Related Objects
- Mentioned In
- rANQWe9d35c1fa16e: Add UTF-16 CSV output option
Event Timeline
Looks fine to me, can you re-check? I fixed a few Unicode issues in the past, this might have been fixed alongside.
Now I get query and download the cvs file and the result is
روز_آزادی_(Ø¢ÙØ±ÛŒÙ‚ای_جنوبی)
روز_آزادی_مستندات
روز_آزادی_Ù†Ø±Ù…â€ŒØ§ÙØ²Ø§Ø±
روز_Ø¢ÙØ±ÛŒÙ‚ا
روز_آنزاک
روز_آنلاین
روز_استقلال
روز_استقلال_(هند)
روز_استقلال_ایالات_Ù…ØªØØ¯Ù‡_آمریکا
روز_Ø§ÙØªØªØ§ØÛŒÙ‡_کلوزآپ
روز_اژدها
روز_باران_(Ùیلم)
and it should be
روز_آزادی_(آفریقای_جنوبی)
روز_آزادی_مستندات
روز_آزادی_نرمافزار
روز_آفریقا
روز_آنزاک
روز_آنلاین
روز_استقلال
روز_استقلال_(هند)
روز_استقلال_ایالات_متحده_آمریکا
روز_افتتاحیه_کلوزآپ
روز_اژدها
روز_باران_(فیلم)
CSV stores no encoding information, so you should tell Excel the file is UTF-8 when opening it. This seems to be non-trivial (see https://stackoverflow.com/questions/6002256/is-it-possible-to-force-excel-recognize-utf-8-csv-files-automatically )
One option might be to add .xlsx output, which probably supports non-MBCS encodings.
Saving as UTF-8, with 'BOM', with a .txt extension also seems to work (but not with a .csv extension)
Hmm, so perhaps we just need to output a BOM. But that makes me feel all kinds of weird for UTF-8, for unknown reasons :)
After some more testing, Excel seems to understand the CSV if it's in UTF-16le instead of utf-8. That should be easy to add as an extra option (one could call it 'UTF-16 CSV' or 'Excel' ;-))
Change 217962 had a related patch set uploaded (by Merlijn van Deen):
Add UTF-16 CSV output option
Btw, separate from the encoding... what Excel considers to be a "CSV" actually depends on language settings in Windows. For example if you use a German Windows, the delimiter character by default will be a ";" and not a "," , So technically it would then be "semicolon seperated values" ("SSV"). Then if you double click on it Excel just opens it.. otherwise you have to manually specify which character is the delimeter and "import" the file in Excel or change it in Windows settings itself.
I cannot think of a clean way to hack a BOM into the response, or cleanly set the response encoding in the generator. Will add a https://github.com/jmcnamara/XlsxWriter-based xlsx format output.
The response is constructed from a generator that's unaware of outside parameters (encoding / BOM), and once the response is constructed, I'm unaware of a way to modify it (for some code that's aware of them).
And regarding why not unconditionally add BOM:
... because BOM doesn't make sense for UTF-8 encodings (the UTF-8 byte-order is constant, unlike in UTF-16). There may be csv parsers that does not expect a BOM and may have unexpected outputs when BOM is included (BOM in first header); it's usually Microsoft programs that wouldn't function without BOM.
Also considering T76126#2327571 & T76126#800965, why bother to add such a feature if it won't work consistently?
So this why I asked you to add this format, not replace. For me it always worked, so you don't remove anything, there will be at least the same functionality. But you are the boss.
Change 367331 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[analytics/quarry/web@master] output: Add xlsx with the support of xlsxwriter
Looks nice.
I don't know why, but it is opened by 7-Zip, not Excel. I hope it's phabticator problem, not Quarry's.
Mentioned in SAL (#wikimedia-cloud) [2017-07-30T20:10:23Z] <zhuyifei1999_> Yuvi gave me access after I asked about T76126
Change 368597 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[operations/puppet@production] Quarry: Add package 'python-xlsxwriter'
Change 367331 merged by jenkins-bot:
[analytics/quarry/web@master] output: Add xlsx with the support of xlsxwriter
Mentioned in SAL (#wikimedia-cloud) [2017-07-30T21:14:51Z] <zhuyifei1999_> sudo git fetch; sudo git checkout 172eb7e on /srv/quarry T76126
Mentioned in SAL (#wikimedia-cloud) [2017-07-30T21:17:07Z] <zhuyifei1999_> sudo service uwsgi restart T76126
Package python-xlsxwriter was hand installed on quarry-main-01 with sudo apt install python-xlsxwriter.
This should be now fixed; please reopen if you experience issues.
@zhuyifei1999, thank you very much! I have not excel today, so I asked somebody to check. He is absolutely happy.
Change 368597 merged by Rush:
[operations/puppet@production] Quarry: Add package 'python-xlsxwriter'