commonsMedia values are stored as strings that contain only the file name, e.g. Example_en.svg refers to https://commons.wikimedia.org/wiki/File:Example_en.svg. A validator checks that the file name is valid and that the file exists on Commons, but apart from trimming whitespace, no normalization or parsing is applied. As a result, all of the following can exist side by side, even though they refer to the same file on Commons:
- Example en.svg
- example en.svg
This is a problem wherever one specific form of a page title is expected, e.g. spaces for human-readable labels but underscores for links. For example, T99664 ("[Bug] Diff does not show stored capitalisation of first letter") would not have happened if normalization had been in place.
Suggested steps:
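To illustrate why a single stored form helps, here is a minimal Python sketch that assumes values are already stored in the canonical form (spaces, first character capitalized). Both the label form and the link form can then be derived deterministically. The names `to_label`, `to_link` and `COMMONS_FILE_URL` are hypothetical and not part of Wikibase.

```python
from urllib.parse import quote

# Hypothetical constant: base URL for file description pages on Commons.
COMMONS_FILE_URL = "https://commons.wikimedia.org/wiki/File:"


def to_label(stored: str) -> str:
    """Human-readable form: assumes the stored value is already canonical
    (spaces, first character capitalized), so it is returned unchanged."""
    return stored


def to_link(stored: str) -> str:
    """Link form: spaces become underscores, then the name is percent-encoded.
    Note: urllib's quote() may escape a few characters (e.g. parentheses)
    that MediaWiki leaves as-is; the resulting URL still resolves."""
    return COMMONS_FILE_URL + quote(stored.replace(" ", "_"))


print(to_label("Example en.svg"))  # Example en.svg
print(to_link("Example en.svg"))   # https://commons.wikimedia.org/wiki/File:Example_en.svg
```

Without a canonical stored form, each consumer would have to repeat this conversion defensively, and it is easy for one of them (such as a diff view) to forget.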
- Decide which form should be stored in the database. (I suggest storing the human-readable form Example en.svg, with spaces and the first character capitalized, because this is what people see and expect most.)
- Implement a parser that applies this to all new and edited values.
- Optionally, walk through all existing values and normalize them accordingly.
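The parser and the optional migration step could look roughly like the following Python sketch. It assumes the form suggested above and a simplified version of MediaWiki title normalization (underscores to spaces, whitespace runs collapsed, trimming, first character upper-cased). Real MediaWiki rules are more involved, e.g. language-specific first-letter casing and stripping of certain invisible characters. `normalize_file_name`, `normalize_existing` and the statement-ID keys are hypothetical.

```python
import re


def normalize_file_name(value: str) -> str:
    """Normalize a commonsMedia file name to the suggested stored form.

    Simplified rules (an assumption, not the full MediaWiki algorithm):
    underscores become spaces, runs of whitespace collapse to one space,
    leading/trailing whitespace is trimmed, the first character is upper-cased.
    """
    name = value.replace("_", " ")
    name = re.sub(r"\s+", " ", name).strip()
    if not name:
        raise ValueError("empty file name")
    return name[0].upper() + name[1:]


def normalize_existing(values: dict[str, str]) -> dict[str, str]:
    """Migration pass: return only the entries whose stored value would change,
    mapped to their normalized form (keys are hypothetical statement IDs)."""
    changes = {}
    for key, stored in values.items():
        normalized = normalize_file_name(stored)
        if normalized != stored:
            changes[key] = normalized
    return changes


stored = {"Q1$a": "Example en.svg", "Q2$b": "example en.svg", "Q3$c": "Example_en.svg"}
print(normalize_existing(stored))
# {'Q2$b': 'Example en.svg', 'Q3$c': 'Example en.svg'}
```

Returning only the changed entries keeps the migration idempotent: running it a second time on already-normalized data yields an empty result.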