Non-alphanumeric characters in wgDBname break file uploads
Open, NormalPublic

Description

The default upload directory contains the database name, and is validated by FileBackendStore::isValidContainerName() which only allows alphanumberic characters, - and _, so any other character in the database name breaks uploads completely and results in an error message saying Could not create directory "mwstore://local-backend/local-public/<path>". The dot character (disallowed, apparently, for Swift compatibility) especially is a common pain point since databases are sometimes named after domain names.

The container name is generated from the DB name in FileBackendStore::fullContainerName(), so invalid characters should probably be escaped there.

Details

Reference
bz44066
bzimport raised the priority of this task from to Low.
bzimport set Reference to bz44066.
bzimport added a subscriber: Unknown Object (MLST).
bzimport created this task.Jan 17 2013, 4:28 PM

Hi grotlek, thanks for reporting and tracking this down!

In current git master (1.21wmf8) this function is located in the file
./includes/filebackend/FileBackendStore.php and you refer to this regex:

return preg_match( '/^[a-z0-9][a-z0-9-_]{0,199}$/i', $container );

grotlek wrote:

Hi Andre,

Yep, that's the one. It looks like it must have moved since 1.19. Sorry I don't have the code near me -- it's in a cold, dark and noisy room -- but from that, it looks like it's doing a case insensitive match anyways, so the only addition I'd suggest is a . character in the second square brackets. (I'm not sure whether I missed spotting that previously -- I was on a "just get it working" mission :-).

Of course, while this fixes my problem, I could not tell you whether:
a) it would have any unwanted side effects or implications elsewhere.
b) it would then make the regex cover all valid scenarios (i.e., do people use other characters?)

I was a bit hesitant from posting this at all, as I couldn't even figure out why it was fiddling with with the database name at all during file upload (it isn't using it in any obvious way -- the uploaded files don't go to a directory containing the DB name, for example). Given this, I'll back out at this point and let you more-knowledgeable folks handle it (if any change is warranted).

  • Grotty.
Ejegg added a comment.Sep 11 2013, 4:08 AM

44949 is another report of this bug. The regex is so strict to catch any container names that would break cloud storage backends that may use a '.' to separate namespaces. I just submitted a patch that leaves the strict regex in FileBackendStore but overrides it in FSFileBackend with a much more relaxed one that allows anything but '/'.

Ejegg added a comment.Sep 11 2013, 4:55 AM

I suppose it would be appropriate to link to that commit: https://gerrit.wikimedia.org/r/83774/

FWIW due to how misleading the resulting error message is I think at least that should be improved, it is really confusing for end users particularly because searching for the error string leads to completely unrelated problems

Tgr renamed this task from Files cannot be uploaded when database name contains certain characters due to regex check in FileBackendStore.php to Non-alphanumeric characters in wgDBname break file uploads.Jan 19 2015, 10:40 PM
Tgr updated the task description. (Show Details)
Tgr added a project: MediaWiki-File-management.
Tgr set Security to None.
Tgr added subscribers: Krenair, Umherirrender, SmartK, aaron.
Tgr added a subscriber: Tgr.Jan 19 2015, 10:43 PM

Replacing dots with underscores would be an easy solution, as long as we can ensure uniqueness.

Steinsplitter moved this task from Incoming to Uploading on the Commons board.
Tgr added a comment.Apr 7 2015, 4:49 PM

Swift container names can contain any UTF-8 character except /. S3 bucket names can only contain lowercase letters, numbers, and hyphens; the current FileBackendStorage container name rules already fail that. So there doesn't seem to be much to lose from making the rules less strict.

Tgr raised the priority of this task from Low to Normal.EditedApr 7 2015, 4:50 PM

Upping priority; at a minimum there should be a sensible error message explaining workarounds.

Tgr claimed this task.Apr 7 2015, 4:50 PM
Legoktm added a subscriber: Legoktm.Jun 7 2015, 3:37 AM

@Tgr any update?

https://scripts.mit.edu/trac/ticket/412 is a downstream ticket I was pinged about.

Change 83774 had a related patch set uploaded (by Legoktm):
filebackend: FS needn't be so strict with names

https://gerrit.wikimedia.org/r/83774

Change 219507 had a related patch set uploaded (by Aaron Schulz):
Allow "." in filebackend container prefixes

https://gerrit.wikimedia.org/r/219507

Change 219507 merged by jenkins-bot:
Allow "." in filebackend container prefixes

https://gerrit.wikimedia.org/r/219507

Restricted Application added subscribers: Steinsplitter, Matanya. · View Herald TranscriptJul 11 2015, 1:28 AM
Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board.Sep 4 2015, 6:32 PM

Change 83774 abandoned by Ejegg:
filebackend: FS needn't be so strict with names

https://gerrit.wikimedia.org/r/83774