
Undiscoverable log entries
Open, Needs Triage, Public

Description

There is no upload log entry under File:鼈甲 祭器.JPG - this is where the file was initially uploaded. However, the file is listed in the user's upload log as "File:\u9f08\u7532\u3000\u796d\u5668.JPG" (File:鼈甲　祭器.JPG). Note the \u3000, aka IDEOGRAPHIC SPACE. When this title is entered with the ideographic space into Special:Log as the target, MediaWiki normalizes it to File:鼈甲 祭器.JPG (normal space). As a result, it's impossible to find this log entry by page name.
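Because the two titles look nearly identical when rendered, it can help to list the space characters explicitly. The following is only a small Python sketch for inspecting a title; the helper name `describe_whitespace` is made up for illustration and is not part of MediaWiki.

```python
import unicodedata


def describe_whitespace(title: str) -> list[tuple[int, str, str]]:
    """Return (index, escape, Unicode name) for every space separator in title."""
    found = []
    for index, char in enumerate(title):
        # General category "Zs" covers space separators such as U+0020 and U+3000.
        if unicodedata.category(char) == "Zs":
            found.append((index, f"\\u{ord(char):04x}", unicodedata.name(char)))
    return found


# Title as reported by the upload log, and the title after normalization.
logged_title = "File:\u9f08\u7532\u3000\u796d\u5668.JPG"
normalized_title = "File:\u9f08\u7532 \u796d\u5668.JPG"

for label, title in (("log", logged_title), ("normalized", normalized_title)):
    print(label, describe_whitespace(title))
```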

Downstream bug: https://commons.wikimedia.org/wiki/User_talk:Rillke/Discuss/2015#VFC_error_at_File:.E9.BC.88.E7.94.B2_.E7.A5.AD.E5.99.A8.JPG
This could be described as: "The API log reports the title with IDEOGRAPHIC SPACE, and the page was never moved from its version with IDEOGRAPHIC SPACE to a normalized version, but imageinfo normalizes the title to use the normal space \u0020." As a result, the script that uses the upload title from the logs as a key doesn't find the file in the result set.
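One possible client-side workaround is to compare titles through a normalized key instead of using the raw log title. This is a minimal sketch, not the actual fix in the affected script: the character class only approximates the whitespace-collapsing step of MediaWiki's title normalization, the names `title_key` and `match_log_to_imageinfo` are hypothetical, and the imageinfo value is a placeholder rather than real API output.

```python
import re

# Assumption: an approximation of the whitespace characters that MediaWiki
# collapses when normalizing a title; the authoritative list lives in MediaWiki.
_TITLE_WHITESPACE = re.compile(
    r"[ _\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+"
)


def title_key(title: str) -> str:
    """Collapse any run of title whitespace (including U+3000) to one space."""
    return _TITLE_WHITESPACE.sub(" ", title).strip(" ")


def match_log_to_imageinfo(log_titles, imageinfo_by_title):
    """Map each raw log title to its imageinfo entry via the normalized key."""
    by_key = {title_key(t): info for t, info in imageinfo_by_title.items()}
    return {t: by_key.get(title_key(t)) for t in log_titles}


# Hypothetical data: the log title contains U+3000, the result set uses U+0020.
log_title = "File:\u9f08\u7532\u3000\u796d\u5668.JPG"
imageinfo = {"File:\u9f08\u7532 \u796d\u5668.JPG": {"placeholder": True}}

print(imageinfo.get(log_title))                       # raw lookup misses
print(match_log_to_imageinfo([log_title], imageinfo)) # keyed lookup matches
```

Keeping the raw log title as the outer key means the caller can still report the title exactly as the log returned it.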

Event Timeline

Rillke created this task. Aug 1 2015, 9:36 PM
Rillke raised the priority of this task from to Needs Triage.
Rillke updated the task description.
Rillke added a subscriber: Rillke.
Restricted Application added a subscriber: Aklapper. Aug 1 2015, 9:36 PM

I suspect the upload function back in 2006 did not normalize titles the way it does today. Could someone with better SQL-fu than me run a query to find more cases, if they exist?
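As a rough idea of what such a query could look like, here is a sketch run against an in-memory SQLite stand-in. The column names `log_id`, `log_type` and `log_title` follow MediaWiki's `logging` table, but the production database is not SQLite, stores `log_title` without the namespace prefix, and would likely need a different query and careful handling of binary title columns. The rows below are made-up test data.

```python
import sqlite3

IDEOGRAPHIC_SPACE = "\u3000"


def find_titles_with_char(conn, char=IDEOGRAPHIC_SPACE):
    """Return (log_id, log_type, log_title) rows whose title contains char."""
    query = (
        "SELECT log_id, log_type, log_title FROM logging "
        "WHERE instr(log_title, ?) > 0 ORDER BY log_id"
    )
    return conn.execute(query, (char,)).fetchall()


# Stand-in table with made-up rows; only the first title contains U+3000.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging (log_id INTEGER PRIMARY KEY, log_type TEXT, log_title TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?)",
    [
        (1, "upload", "\u9f08\u7532\u3000\u796d\u5668.JPG"),
        (2, "upload", "Example.jpg"),
    ],
)

matches = find_titles_with_char(conn)
for log_id, log_type, log_title in matches:
    print(log_id, log_type, log_title.encode("unicode_escape").decode("ascii"))
```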

jcrespo claimed this task. Aug 14 2015, 2:19 PM
jcrespo added a subscriber: jcrespo.
jcrespo moved this task from Triage to Backlog on the DBA board. Aug 14 2015, 2:22 PM
jcrespo removed jcrespo as the assignee of this task. Aug 18 2015, 3:50 PM

I found 780 entries on the Commons log with the \u3000 character. Please note that I restricted this paste view to people under the NDA and @McZusatz as there may be non-public information here:

{P1895}

I am unsure if this is what was needed. Please review it, as I may have made a mistake (non-Latin characters are harder for me to process) or searched on the wrong field, but at least it contains the original example.

Please advise on how to proceed. Should we convert these log entries? Ping me back if the answer is positive.

It might make more sense to do this with a MW maintenance script.

Marostegui added a subscriber: Marostegui.