
Unable to migrate images data to Windows boxes because of "special characters"
Open, Needs Triage, Public

Description

Windows boxes (more precisely, their filesystems) do not support many characters in file names. For WikiFundi (https://github.com/openzim/wikifundi/issues/72) we create and configure a MediaWiki with its data on a Linux/ext4 box, but we then need to migrate the content to an exFAT filesystem. This causes problems: for example, we have many file names containing the character "?" (question mark), which cannot be stored on exFAT.
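To get an idea of how many uploads are affected, something like the following could be run on the Linux box before copying. This is only a rough sketch: the function name find_unsafe_names and the ./images default are assumptions for illustration, and the check covers only the printable characters that Windows-style filesystems such as exFAT reject, not control characters.

```shell
#!/bin/sh
# List entries under a directory whose names contain a character that
# Windows-style filesystems (including exFAT) reject: \ : * ? " < > |
# Control characters (0x01-0x1F) are rejected too but are NOT checked here.
find_unsafe_names() {
    # $1: directory to scan; "./images" is an assumed default, adjust it
    # to the wiki's actual upload directory.
    find "${1:-./images}" -name '*[\\:*?"<>|]*' -print
}
```

Calling find_unsafe_names with the upload directory as its argument prints one path per line, and piping the result through wc -l gives a quick count of the names that would fail on the target filesystem.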

The documentation at https://www.mediawiki.org/wiki/Manual:FileBackend.php says that file names are stored in a way that Windows supports ("Use ASCII file names (e.g. base32, IDs, hashes) to avoid Unicode issues in Windows")... but this does not seem to be true, which makes the dataset in general impossible to migrate properly.

What is the solution to this problem? Is there a way to force "normalised filenames" for images?

This bug is related to https://phabricator.wikimedia.org/T3780, and to some extent I suspect that a patch might have introduced the problem here.

Info:

  • MediaWiki: 1.31.0-rc.0
  • PHP: 7.0.30-0+deb9u1 (fpm-fcgi)

Event Timeline

Kelson created this task. Jul 9 2018, 6:27 PM
Kelson updated the task description. Jul 9 2018, 6:28 PM

I think the best and most future-proof solution will be to enable sha1 filenames. Something like:

$wgLocalFileRepo = [
	'class' => 'LocalRepo',
	'name' => 'local',
	'directory' => $wgUploadDirectory,
	'scriptDirUrl' => $wgScriptPath,
	'url' => $wgUploadBaseUrl ? $wgUploadBaseUrl . $wgUploadPath : $wgUploadPath,
	'hashLevels' => $wgHashedUploadDirectory ? 2 : 0,
	'thumbScriptUrl' => $wgThumbnailScriptPath,
	'transformVia404' => !$wgGenerateThumbnailOnParse,
	'deletedDir' => $wgDeletedDirectory,
	'deletedHashLevels' => $wgHashedUploadDirectory ? 3 : 0,
	'storageLayout' => 'sha1' # <-- THIS IS THE MAIN CHANGE
];

To migrate an existing wiki, you should be able to run php maintenance/migrateFileRepoLayout.php --oldlayout=name --newlayout=sha1 before you update the configuration in your LocalSettings.php.

@Legoktm Thank you for your proposed solution. How would that behave when users want to download an image file? Would they get a file name like hash_value.jpg?