Page MenuHomePhabricator

Non-alphanumeric characters in wgDBname break file uploads
Open, MediumPublic

Description

The default upload directory contains the database name, and is validated by FileBackendStore::isValidContainerName() which only allows alphanumberic characters, - and _, so any other character in the database name breaks uploads completely and results in an error message saying Could not create directory "mwstore://local-backend/local-public/<path>". The dot character (disallowed, apparently, for Swift compatibility) especially is a common pain point since databases are sometimes named after domain names.

The container name is generated from the DB name in FileBackendStore::fullContainerName(), so invalid characters should probably be escaped there.

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 1:21 AM
bzimport set Reference to bz44066.
bzimport added a subscriber: Unknown Object (MLST).

Hi grotlek, thanks for reporting and tracking this down!

In current git master (1.21wmf8) this function is located in the file
./includes/filebackend/FileBackendStore.php and you refer to this regex:

return preg_match( '/^[a-z0-9][a-z0-9-_]{0,199}$/i', $container );

grotlek wrote:

Hi Andre,

Yep, that's the one. It looks like it must have moved since 1.19. Sorry I don't have the code near me -- it's in a cold, dark and noisy room -- but from that, it looks like it's doing a case insensitive match anyways, so the only addition I'd suggest is a . character in the second square brackets. (I'm not sure whether I missed spotting that previously -- I was on a "just get it working" mission :-).

Of course, while this fixes my problem, I could not tell you whether:
a) it would have any unwanted side effects or implications elsewhere.
b) it would then make the regex cover all valid scenarios (i.e., do people use other characters?)

I was a bit hesitant from posting this at all, as I couldn't even figure out why it was fiddling with with the database name at all during file upload (it isn't using it in any obvious way -- the uploaded files don't go to a directory containing the DB name, for example). Given this, I'll back out at this point and let you more-knowledgeable folks handle it (if any change is warranted).

  • Grotty.

44949 is another report of this bug. The regex is so strict to catch any container names that would break cloud storage backends that may use a '.' to separate namespaces. I just submitted a patch that leaves the strict regex in FileBackendStore but overrides it in FSFileBackend with a much more relaxed one that allows anything but '/'.

I suppose it would be appropriate to link to that commit: https://gerrit.wikimedia.org/r/83774/

FWIW due to how misleading the resulting error message is I think at least that should be improved, it is really confusing for end users particularly because searching for the error string leads to completely unrelated problems

Tgr renamed this task from Files cannot be uploaded when database name contains certain characters due to regex check in FileBackendStore.php to Non-alphanumeric characters in wgDBname break file uploads.Jan 19 2015, 10:40 PM
Tgr updated the task description. (Show Details)
Tgr added a project: MediaWiki-File-management.
Tgr set Security to None.
Tgr added subscribers: Krenair, Umherirrender, SmartK, aaron.

Replacing dots with underscores would be an easy solution, as long as we can ensure uniqueness.

Swift container names can contain any UTF-8 character except /. S3 bucket names can only contain lowercase letters, numbers, and hyphens; the current FileBackendStorage container name rules already fail that. So there doesn't seem to be much to lose from making the rules less strict.

Tgr raised the priority of this task from Low to Medium.EditedApr 7 2015, 4:50 PM

Upping priority; at a minimum there should be a sensible error message explaining workarounds.

Change 83774 had a related patch set uploaded (by Legoktm):
filebackend: FS needn't be so strict with names

https://gerrit.wikimedia.org/r/83774

Change 219507 had a related patch set uploaded (by Aaron Schulz):
Allow "." in filebackend container prefixes

https://gerrit.wikimedia.org/r/219507

Change 219507 merged by jenkins-bot:
Allow "." in filebackend container prefixes

https://gerrit.wikimedia.org/r/219507

Change 83774 abandoned by Ejegg:
filebackend: FS needn't be so strict with names

https://gerrit.wikimedia.org/r/83774

@Tgr / @Ejegg: Hi, all related patches in Gerrit have been merged or abandoned. Is there more to do in this task? Asking as you are task assignees / patch authors. Thanks in advance!

Aklapper removed Tgr as the assignee of this task.Jun 19 2020, 4:27 PM
Aklapper removed a subscriber: wikibugs-l-list.

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Aklapper assigned this task to Tgr.

@Tgr / @Ejegg: Hi, all related patches in Gerrit have been merged or abandoned. Is there more to do in this task? Asking as you are task assignees / patch authors. Thanks in advance!

No reply from @Tgr / @Ejegg hence resolving.

Tgr removed Tgr as the assignee of this task.

I don't think this has been fixed, the dot character has just been special-cased. T120071: isValidContainerName in /includes/filebackend/FileBackendStore.php not accepting databases starting with "_". for example complains about a starting underscore and T95037: Some valid MySQL database names (with a + plus sign in it) break thumbnails complains about a plus character, and those still produce the error.