file_sha1 must be given in lowercase
Open, Needs Triage, Public

Description

I was creating a filter to prevent some images from being re-uploaded, but the filter didn't work because I used SHA1 hashes with uppercase letters. Would it be possible to make edit filters work with uppercase SHA1 hashes?

Filter in question: https://commons.wikimedia.org/wiki/Special:AbuseFilter/206

Event Timeline

contains_any is case-sensitive. For case-insensitive matching, you can apply ucase or lcase to the value before matching.
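
A minimal sketch of that normalization, assuming the filter lists its hashes as literal arguments to contains_any. Wrapping file_sha1 in lcase means the listed hashes must themselves be written in lowercase. The two hashes below are placeholders for illustration, not the ones used in filter 206.

```
/* Only look at uploads */
action == 'upload' &
/* Normalize the hash to lowercase, then compare against lowercase literals */
contains_any(
    lcase(file_sha1),
    'a94a8fe5ccb19ba61c4c0873d391e987982fbbd3',
    'da39a3ee5e6b4b0d3255bfef95601890afd80709'
)
```

If the hashes were pasted in uppercase, using ucase(file_sha1) instead should work the same way. Note that contains_any does substring matching, so only full 40-character hashes behave like an equality check.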

I'm not sure that we should change the case of the returned hash: while your filter would start working, many others could potentially stop working. Anyway, this isn't strictly caused by AbuseFilter: we compute the sha1 in a single line, by taking it from UploadBase and then converting it to hex, with no further changes. Right now I can't determine which function decides its case, but if we wanted to change it, we should probably consider changing either UploadBase or base_convert.
Summing up, I think that the best solution would just be to explicitly state on MWwiki that sha1 is lowercase, and change this filter accordingly. Specifically, a condition such as

file_sha1 irlike "xxx|yyy|..."

should do the trick. I'm not closing this bug in case someone has a better idea.

I think that no assumption should be made about the case of file_sha1, as it is subject to change without notice. Also, in the context of a hexadecimal string, "a" and "A" have exactly the same significance.

I could additionally suggest:

lcase(file_sha1) == 'a94a8fe5ccb19ba61c4c0873d391e987982fbbd3'

or (maybe uppercase is more common/conventional):

ucase(file_sha1) == 'A94A8FE5CCB19BA61C4C0873D391E987982FBBD3'

On a related note, IPv6 addresses are subject to the same considerations.