
Upload: can't save file names with special characters to ntfs filesystem
Closed, Resolved, Public

Description

Author: donpaolo

Description:
I'm using MediaWiki on Windows Server 2003.

When I upload a file and tell MediaWiki to store it under a file name containing special characters (e.g. accented characters such as à, é, or ñ), the name is stored incorrectly: à becomes an A with a ~ followed by ï (I think).

It looks like a UTF-8 to ISO-8859-1 (or the reverse) conversion problem.

I think it's because NTFS stores file names in the ISO-8859-1 charset, so when MediaWiki passes the file name in UTF-8, NTFS interprets it as an ISO-8859-1 string.
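
The symptom described above matches what happens when the UTF-8 bytes of a name are read as single-byte ISO-8859-1 characters. The following is a minimal sketch of that effect only, not MediaWiki code: the helper name misreadAsLatin1() is hypothetical, and it assumes the mbstring extension is available.

```php
<?php
/**
 * Hypothetical helper: treat each byte of a UTF-8 string as one
 * ISO-8859-1 character and re-encode the result as UTF-8, so the
 * misreading can be inspected or displayed.
 */
function misreadAsLatin1(string $utf8Bytes): string
{
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

$name = "\u{E0}.png";                        // "à.png" encoded as UTF-8

echo bin2hex($name), "\n";                   // c3a02e706e67
echo bin2hex(misreadAsLatin1($name)), "\n";  // c383c2a02e706e67
```

The two bytes of à (c3 a0) come back as two separate characters, Ã (U+00C3) followed by a no-break space (U+00A0), which is the kind of garbling reported here.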

Experimenting on my own with uploading and saving a file from a PHP page, I found a solution:

If the upload ends with the instruction

copy ( $tempfile , $filename ) ;

you should change it into

copy ( $tempfile , utf8_decode ( $filename ) ) ;

That seems to eliminate the problem on Windows Server 2003.


Version: 1.13.x
Severity: normal
OS: Windows Server 2003
Platform: PC

Details

Reference
bz15863

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:17 PM
bzimport set Reference to bz15863.

Confirmed in trunk. Also, where did you find:

copy ( $tempfile , $filename );

Can't find this in trunk :)

Paolo's patch

It's in FileStore.php; trunk just doesn't have the spaces.
I'm attaching it as a patch, but I'm sure utf8_decode would need to be added in other places as well. filerepo/FSRepo also acts directly on the filesystem in several places, as does thumb.php...

Attached:

fran wrote:

The utf8_decode solution wouldn't work - this function only converts to ISO-8859-1, and all filenames in non-Latin scripts would be completely corrupted.
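
A small sketch of that limitation, under stated assumptions: the helper name toLatin1() is hypothetical, and mb_convert_encoding() with an ISO-8859-1 target stands in for utf8_decode() (which is deprecated as of PHP 8.2). It requires the mbstring extension.

```php
<?php
/**
 * Hypothetical helper: convert a UTF-8 file name to ISO-8859-1.
 * Characters with no ISO-8859-1 equivalent are replaced by '?'.
 */
function toLatin1(string $utf8Name): string
{
    mb_substitute_character(0x3F); // substitute '?' for unconvertible characters
    return mb_convert_encoding($utf8Name, 'ISO-8859-1', 'UTF-8');
}

$accented = toLatin1("\u{E9}.png");                        // "é.png"
$cyrillic = toLatin1("\u{424}\u{430}\u{439}\u{43B}.png");  // "Файл.png"

echo bin2hex($accented), "\n"; // e92e706e67 (é survives as the single byte e9)
echo $cyrillic, "\n";          // ????.png  (the original name is lost)
```

Any two Cyrillic names of the same length would collapse to the same string of question marks, so the conversion cannot be reversed.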

NTFS actually uses Unicode internally. The problem lies in PHP, which naïvely assumes all filenames are eight-bit strings... which they are on Unix, but Windows uses wide character strings, and a separate call, _wfopen(), is used to access Unicode filenames on Win32. Until PHP gains proper Unicode support (currently scheduled for right after porcine flight is achieved) the only solution I can think of is for MediaWiki to mangle non-ASCII characters in the filename in a predictable, round-trippable way.
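
One possible shape for such a mangling, as a rough sketch: percent-encode the UTF-8 bytes so the on-disk name is pure ASCII, and decode it again when reading. This is only an illustration of the idea, not what MediaWiki does; mangleFilename() and unmangleFilename() are hypothetical names.

```php
<?php
/**
 * Hypothetical helper: turn a UTF-8 file name into an ASCII-only name
 * by percent-encoding every byte outside A-Z, a-z, 0-9, '-', '_', '.', '~'.
 */
function mangleFilename(string $utf8Name): string
{
    return rawurlencode($utf8Name);
}

/** Hypothetical inverse of mangleFilename(). */
function unmangleFilename(string $mangled): string
{
    return rawurldecode($mangled);
}

$original = "\u{E0}.png";              // "à.png" encoded as UTF-8
$onDisk   = mangleFilename($original);

echo $onDisk, "\n";                    // %C3%A0.png
var_dump(unmangleFilename($onDisk) === $original); // bool(true)
```

Because a literal % is itself encoded (as %25), the mapping stays reversible for any input, at the cost of on-disk names that are harder to read.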

donpaolo wrote:

(In reply to comment #1)
> Confirmed in trunk. Also, where did you find:
>
> copy ( $tempfile , $filename );
>
> Can't find this in trunk :)

I didn't find it; I assumed the file is saved with some instruction similar to that, which could just as well be a rename or something else.

I didn't submit a patch because I don't know MediaWiki's code well.

Duping to bug 1780

*** This bug has been marked as a duplicate of bug 1780 ***