
Duplicate archived files (previously deleted files with an identical SHA1) not detected on upload when the file extension is upper case
Closed, Resolved (Public)

Description

When I upload a file with an upper-case file extension whose content is byte-for-byte identical to a previously deleted file, and which therefore has the same SHA1 hash, I do not get any duplicate warning. I observed this at test.wikipedia.org, and the same behavior is reported at commons.wikimedia.org.

This is because MediaWiki looks up archived files by storage key, that is, it compares {hash}.{file-extension} against fa_storage_key. fa_storage_key contains only lower-case file extensions, and an upper-case extension from the upload is not normalized before the lookup, so nothing is found when the uploaded file has an upper-case extension. This should be trivial to fix: look up the SHA1 instead of the storage key.
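The mismatch can be illustrated with a minimal Python sketch. This is not MediaWiki code: the in-memory archive, the helper names, and the fa_sha1 field are assumptions made for illustration, and the hash is kept as plain hex for simplicity.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 of the file content as a hex string."""
    return hashlib.sha1(data).hexdigest()


def archive_row(data: bytes, extension: str) -> dict:
    """Build a hypothetical archive row; the stored extension is lower case."""
    digest = sha1_hex(data)
    return {
        "fa_storage_key": f"{digest}.{extension.lower()}",
        "fa_sha1": digest,  # assumed field holding only the content hash
    }


def find_by_storage_key(archive: list, data: bytes, extension: str) -> list:
    """Lookup as described in the report: the extension is used as given."""
    key = f"{sha1_hex(data)}.{extension}"
    return [row for row in archive if row["fa_storage_key"] == key]


def find_by_sha1(archive: list, data: bytes) -> list:
    """Proposed lookup: compare the content hash only, ignoring the extension."""
    digest = sha1_hex(data)
    return [row for row in archive if row["fa_sha1"] == digest]


content = b"identical file content"
archive = [archive_row(content, "jpg")]  # previously deleted file

# Upper-case extension: the storage-key lookup misses the archived file.
missed = find_by_storage_key(archive, content, "JPG")
# The SHA1 lookup finds it regardless of the extension's case.
found = find_by_sha1(archive, content)
```

Here `missed` is empty while `found` contains the archived row, which is why searching by SHA1 avoids the extension-case problem altogether.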

Event Timeline

bzimport raised the priority of this task from to High. (Nov 22 2014, 3:55 AM)
bzimport set Reference to bz72070.
Krinkle set Security to None.
gerritbot subscribed.

Change 190488 had a related patch set uploaded (by Rillke):
Detect duplicate files by SHA1 search

https://gerrit.wikimedia.org/r/190488

Patch-For-Review

Change 190488 merged by jenkins-bot:
Detect duplicate archived files by SHA1 search on upload

https://gerrit.wikimedia.org/r/190488

The fix will be part of MediaWiki 1.25wmf19; see https://www.mediawiki.org/wiki/MediaWiki_1.25/Roadmap for the timeline.