Mismatch finder incorrectly capitalizes values
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:
Value is Tt2953050

What should have happened instead?:
Value is tt2953050

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

https://mismatch-finder.toolforge.org/results?ids=Q103372692

I only get this message:

No mismatches have been found for: Encanto (Q103372692)

Did the mismatches already expire?

No, looks like the two existing mismatches for that item ID got reviewed (one as “none”, the other as “external”, but neither is “pending” anymore).

https://mismatch-finder.toolforge.org/results?ids=Q20979182 has the same problem. There are many like this, so using "random mismatches" should give you some examples even if this one is reviewed.

And are we sure that the values got capitalized by Mismatch Finder? The CSV file stored on Toolforge has the Tt capitalized as well:

tools.mismatch-finder@tools-sgebastion-10:~/mismatch-finder-repo$ grep Q20979182 storage/app/mismatch-files/20221024_175148-mismatch-upload.12920334.csv 
"Q20979182$01B4F040-9F10-4D50-B42E-F8BBA1F9C5C2","P345","Tt2091256","2091256","http://en.wikipedia.org/wiki/Captain_Underpants:_The_First_Epic_Movie"

And if I’m reading the code correctly, that should be the original uploaded CSV file (as opposed to something that Mismatch Finder generated again, potentially after already mangling the values) – though I’m not sure about that.
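
For anyone who wants to check an upload file for this pattern without grepping by item ID, here is a minimal sketch. It assumes the five-field row layout seen in the grep output above and refers to fields by position only. The helper name `title_cased_values` and the default column index are made up for illustration and are not part of Mismatch Finder.

```python
import csv
import io

# One row copied from the grep output above; a real check would read the file instead.
SAMPLE = (
    '"Q20979182$01B4F040-9F10-4D50-B42E-F8BBA1F9C5C2","P345","Tt2091256","2091256",'
    '"http://en.wikipedia.org/wiki/Captain_Underpants:_The_First_Epic_Movie"\n'
)


def title_cased_values(csv_text, column=2):
    """Return values in the given column that look like str.title() output.

    A value is flagged when it equals its own title-cased form but differs
    from its lowercase form, i.e. it contains at least one capital letter
    in a position where str.title() would put one.
    """
    hits = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= column:
            continue  # skip short or empty rows
        value = row[column]
        if value == value.title() and value != value.lower():
            hits.append(value)
    return hits


print(title_cased_values(SAMPLE))
```

This is only a heuristic: it would also flag values that are legitimately capitalized, so the hits still need a manual look.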

Lydia_Pintscher claimed this task.
Lydia_Pintscher added a subscriber: Mike_Peel.

Ah good catch! I just checked the original CSV I got from Mike and it does indeed contain the data capitalized. I'm adding him to the ticket for info. I am closing the ticket as I don't think there is anything to do for the dev team here.

Just to note that I've now gotten to the bottom of this in the pywikibot code. If, after getting the property claim for an external-id, I call 'clm.getTarget()', I get lower case. If, however, I call 'clm.getTarget().title()', the first letter is capitalised. To generalise the code, I've added a check for whether a claim is an external-id, and use the appropriate call accordingly. See this diff: https://github.com/mpeel/wikicode/commit/3f2c18ef8c559a02d04b81aa330ad9a843fb3a52
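
A minimal sketch of the behaviour described above, runnable without pywikibot. It assumes that the target of an external-id claim is a plain Python string, so `.title()` resolves to the built-in `str.title()`, while page-like targets expose their own `title()` method. `FakePage` and `target_as_text` are hypothetical names for illustration and are not the code in the linked commit.

```python
class FakePage:
    """Hypothetical stand-in for a page-like claim target with a title() method."""

    def __init__(self, name):
        self._name = name

    def title(self):
        # Returns the stored page title unchanged.
        return self._name


def target_as_text(target, datatype):
    """Return a claim target as text without altering external identifiers.

    `datatype` is assumed to be the claim's data type string, e.g. "external-id".
    """
    if datatype == "external-id":
        # Plain string: use it as-is. Calling .title() here would invoke
        # str.title() and capitalise the first letter.
        return target
    # Page-like object: title() returns the page title.
    return target.title()


# str.title() is what turns the IMDb ID into the capitalised form from the report.
print("tt2953050".title())                           # capitalised by str.title()
print(target_as_text("tt2953050", "external-id"))    # left unchanged
print(target_as_text(FakePage("Encanto"), "wikibase-item"))
```

Branching on the data type rather than on `isinstance(target, str)` mirrors the check described in the comment, but either would avoid the accidental `str.title()` call in this sketch.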