Normalize Wikimedia Commons file names via a ValueParser
Open, Low, Public

Description

commonsMedia values are stored as strings that contain only the file name, e.g. Example_en.svg refers to https://commons.wikimedia.org/wiki/File:Example_en.svg. A validator checks that the file name is valid and exists on Commons, but there is no normalization or parsing beyond whitespace trimming. As a result, all of the following can exist side by side while referring to the same file on Commons:

  • Example_en.svg
  • Example en.svg
  • example en.svg

This is a problem wherever one specific form of a page title is expected, e.g. with spaces for human-readable labels but with underscores for links. For example, T99664 ("[Bug] Diff does not show stored capitalisation of first letter") would not have happened with normalization in place.

Proposal:

  1. Decide which form should be in the database. (Personally, I suggest storing the human-readable form Example en.svg, with spaces and the first character capitalized, because this is what people see and expect most often.)
  2. Implement a parser that applies this to all new and edited values.
  3. Optionally walk through all existing values and normalize them accordingly.
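
A minimal sketch of what step 2 could do, assuming the form suggested in step 1 (spaces, first character capitalized) is the one chosen. It is written in Python for illustration only; the function name normalize_commons_file_name is hypothetical, and the real MediaWiki title rules cover more cases than the three handled here.

```python
import re


def normalize_commons_file_name(value: str) -> str:
    """Return the form suggested in step 1: spaces, first character capitalized.

    Hypothetical helper for illustration; not an existing Wikibase API.
    """
    # Underscores and spaces are interchangeable in page titles.
    name = value.replace("_", " ")
    # Collapse runs of whitespace and trim both ends.
    name = re.sub(r"\s+", " ", name).strip()
    # Capitalize only the first character; leave the rest untouched.
    return name[:1].upper() + name[1:]


# The three variants from the description collapse to a single stored form.
variants = ["Example_en.svg", "Example en.svg", "example en.svg"]
print({normalize_commons_file_name(v) for v in variants})
```

Applying the same function to existing values would cover the optional step 3, since the function is idempotent: normalizing an already normalized name returns it unchanged.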

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 18 2018, 2:37 PM
thiemowmde triaged this task as Low priority. · Sep 18 2018, 2:37 PM
Addshore moved this task from incoming to ready to go on the Wikidata board. · Sep 18 2018, 2:48 PM
Restricted Application added a project: User-Ladsgroup. · Nov 22 2019, 1:25 PM

Hi there,
I would like to fix this bug, but I am unable to access the file itself. I would really like to get as much information as possible.

Hey!

Which file?

I didn't know how to access the code to edit it.