checkimages.py unhandled exception
Closed, ResolvedPublic

Description

it.wiki is the second or third project by number of local uploads, so we love this script ^^
Here's another minor bug:

Checking if [[I bambini sanno scena.png]] is on commons...
Traceback (most recent call last):
  File "/home/.../bottuzzu/core/pwb.py", line 239, in <module>
    if not main():
  File "/home/.../bottuzzu/core/pwb.py", line 233, in main
    run_python_file(filename, argv, argvu, file_package)
  File "/home/.../bottuzzu/core/pwb.py", line 88, in run_python_file
    main_mod.__dict__)
  File "/home/.../bottuzzu/core/scripts/checkimages.py", line 1837, in <module>
    main()
  File "/home/.../bottuzzu/core/scripts/checkimages.py", line 1818, in main
    if not Bot.checkImageOnCommons():
  File "/home/.../bottuzzu/core/scripts/checkimages.py", line 908, in checkImageOnCommons
    hash_found = self.image.latest_file_info.sha1
  File "/home/.../bottuzzu/core/pywikibot/page.py", line 2078, in latest_file_info
    self.site.loadimageinfo(self, history=True)
  File "/home/.../bottuzzu/core/pywikibot/site.py", line 2488, in loadimageinfo
    raise NoPage(page)
pywikibot.exceptions.NoPage: Page [[it:File:I bambini sanno scena.png]] doesn't exist.
<class 'pywikibot.exceptions.NoPage'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

Event Timeline

Vituzzu raised the priority of this task from to Needs Triage.
Vituzzu updated the task description.
Vituzzu added a project: Pywikibot.
Vituzzu subscribed.

How did you get that page? I mean why would it want to check missing files?

I simply ran it with the -duplicates and -commons options. It seems to be unable to skip already-deleted files found in the log.

Okay, I'll take a closer look tomorrow, but there is a comment in the code (the first line is the line that failed):

hash_found = self.image.latest_file_info.sha1
if not hash_found:
    return  # Image deleted, no hash found. Skip the image.

So maybe it previously returned None instead of raising that exception. The quick and easy fix would be to catch that exception and handle it the same way as None. But I want to check whether we introduced a regression somewhere, i.e. whether it originally returned None and only later started raising an exception. Additionally, I don't see why it should be None if the image exists, so I want to check whether a try-except could replace the current check there.
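A minimal sketch of that quick fix, not the merged patch. `NoPage`, `FileInfo`, `FakeFilePage`, and `sha1_or_none` are hypothetical stand-ins so the snippet runs without Pywikibot; in the real script the exception would be `pywikibot.exceptions.NoPage` and the access would be `self.image.latest_file_info.sha1`, as in the traceback.

```python
class NoPage(Exception):
    """Hypothetical stand-in for pywikibot.exceptions.NoPage."""


class FileInfo:
    """Hypothetical container for the latest file revision's metadata."""

    def __init__(self, sha1):
        self.sha1 = sha1


class FakeFilePage:
    """Hypothetical stub that mimics FilePage.latest_file_info."""

    def __init__(self, sha1=None, exists=True):
        self._sha1 = sha1
        self._exists = exists

    @property
    def latest_file_info(self):
        # Like the traceback above: a missing file page raises NoPage.
        if not self._exists:
            raise NoPage("Page doesn't exist.")
        return FileInfo(self._sha1)


def sha1_or_none(image):
    """Return the file's SHA-1, or None if the file page is missing."""
    try:
        return image.latest_file_info.sha1
    except NoPage:
        # Treat a deleted image the same way as "no hash found".
        return None
```

With this helper, the existing `if not hash_found: return` check would skip deleted files without any further change.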

Change 224589 had a related patch set uploaded (by XZise):
[FIX] checkimages: Expect NoPage exception

https://gerrit.wikimedia.org/r/224589

Okay, looking at it in Pywikibot-compat, the script uses getHash(), which doesn't exist in Pywikibot. So it was changed to use getFileSHA1Sum() (see also T75024: Instance of 'FilePage' has no 'getHash' member), which, as far as I can tell, never returned None if the file was missing. Instead it got None for the image info and then did ['sha'], so it should have raised a TypeError.
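To illustrate that last point: subscripting None raises a TypeError in Python, so a missing file would never have produced a None hash. This sketch only assumes that the old code path indexed the image info without checking it first; `sha_from_info` is a hypothetical name, not a Pywikibot function.

```python
def sha_from_info(info):
    # Index the image info directly, without checking for None first.
    return info['sha']


error = None
try:
    sha_from_info(None)  # image info of a missing file
except TypeError as err:
    error = err  # None is not subscriptable
```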

Change 224589 merged by jenkins-bot:
[FIX] checkimages: Expect NoPage exception

https://gerrit.wikimedia.org/r/224589

Change 242520 had a related patch set uploaded (by XZise):
[FIX] checkimages: Expect NoPage exception

https://gerrit.wikimedia.org/r/242520

Change 242520 merged by jenkins-bot:
[FIX] checkimages: Expect NoPage exception

https://gerrit.wikimedia.org/r/242520