
Encoding issue for CSV imports
Closed, Invalid

Description

Setup

  • MediaWiki 1.27.1 (4804ae5) 10. Nov. 2016, 18:11 / also tested MW 1.24
  • PHP 5.6.26-0+deb8u1 (apache2handler)
  • MySQL 5.5.52-0+deb8u1
  • Data Transfer 0.6.2 (ca04f78) 16. Jun. 2016, 16:56 / same for current master

Issue
When importing a UTF-8 encoded CSV file, invalid characters are imported, e.g.:

2016-11-20 18:26:31 dtImport Datei:1530_SK_Cham_Knüsel_4.jpg user_id=2 edit_summary=CSV-Import for_pages_that_exist=overwrite text={{Datei
|Name=1530 SK Cham Knüsel 4.jpg
|Beschreibung=Schwingclub Cham-Ennetsee; Eidgenössische Schwing- und Älplerfest in Sion vom 23./24. August 1986, Harry Knüsel bezwingt Ernst Schläpfer; Quelle: 50 Jahre 1961–2011 Schwingklub Cham-Ennetsee
}}

Versions 0.6.1 and lower give me the expected import for the identical import file:

2016-11-20 18:30:31 dtImport Datei:1530_SK_Cham_Knüsel_4.jpg user_id=2 edit_summary=CSV-Import for_pages_that_exist=overwrite text={{Datei
|Name=1530 SK Cham Knüsel 4.jpg
|Beschreibung=Schwingclub Cham-Ennetsee; Eidgenössische Schwing- und Älplerfest in Sion vom 23./24. August 1986, Harry Knüsel bezwingt Ernst Schläpfer; Quelle: 50 Jahre 1961–2011 Schwingklub Cham-Ennetsee
}}

So I guess something broke between 0.6.1 and 0.6.2. I suspect rEDTR8e3beaa3b54f166da8d0405f9f680ede60beacda, since this is the only commit touching CSV between these two versions. Still, I may be wrong in this assessment.

I wonder why I am the first one to detect this.

Event Timeline

Kghbln created this task. Nov 20 2016, 6:41 PM
Kghbln updated the task description. Nov 20 2016, 6:41 PM
Kghbln updated the task description.

Could it be that the encoding is actually UTF-16? I just want to make sure it's not something simple.
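One quick way to check this is to look at the first bytes of the file for a byte order mark (BOM). The following is only a rough sketch in Python, not part of Data Transfer; the function name `sniff_bom` is made up for illustration, and a file without a BOM can still be in any of these encodings, so "unknown" proves nothing either way.

```python
import codecs


def sniff_bom(data: bytes) -> str:
    """Guess an encoding from a leading byte order mark (hypothetical helper)."""
    # UTF-32 must be checked before UTF-16: the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM found: the encoding cannot be determined this way.
    return "unknown (no BOM)"
```

To use it on an export, pass the first few bytes of the file, e.g. `sniff_bom(open(path, "rb").read(4))`, where `path` stands for the CSV file in question.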

Kghbln closed this task as Invalid. Feb 1 2017, 10:59 PM

First of all, I am sorry for not coming back to this issue for such a long time.

Could it be that the encoding is actually UTF-16?

I have now tested with UTF-16, i.e. saving the CSV as UTF-16 and selecting the UTF-16 option on import. This indeed works. Re-saving the UTF-16 encoded file as UTF-8 and then importing it as UTF-8 works as well. Since the import failed only when the .ods was saved directly as UTF-8 and then imported as UTF-8, I presume that the issue may very well lie with LibreOffice Calc, which somehow has a problem saving a CSV as UTF-8. To cut it short: in case of issues, one should try a round trip via UTF-16 or use UTF-16 directly. Thus closing as invalid.

PS: The only weird thing is that earlier versions were somehow more tolerant in such a situation.
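For anyone who wants to do the round trip via UTF-16 outside the spreadsheet software, it can be sketched in a few lines of Python. This is only an illustration under the assumption that the file content decodes cleanly in the source encoding; the helper name `transcode` is made up here and has nothing to do with Data Transfer itself.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode CSV bytes from one encoding to another (hypothetical helper)."""
    # Decoding raises UnicodeDecodeError if the bytes are not valid in the
    # source encoding, which is itself a hint that the file is not what it
    # claims to be.
    text = data.decode(source)
    return text.encode(target)


# Round trip: UTF-8 -> UTF-16 -> UTF-8 should reproduce the original bytes.
original = "Name;Beschreibung\nKnüsel;Älplerfest\n".encode("utf-8")
as_utf16 = transcode(original, "utf-8", "utf-16")
back_to_utf8 = transcode(as_utf16, "utf-16", "utf-8")
```

If the decode step fails on a file exported as "UTF-8", that would support the suspicion that the export rather than the import is at fault.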