
Handle image files where infotext has been created, but no image
Closed, Resolved (Public)

Event Timeline

@Alicia_Fagerving_WMSE Some infotexts were uploaded without an image (I didn't even think pywikibot allowed that in the Commons File: namespace). They belong to images whose files are missing from the museum, or to corrupt files etc. that need some attention. What would be a good way to address those pages for now? A maintenance category or a request for deletion?

I think they qualify for speedy deletion, per https://commons.wikimedia.org/wiki/Commons:Criteria_for_speedy_deletion#File, empty file-namespace, and can be tagged with {{speedydelete|broken file upload}}.

Do you have any idea how many of these have been created (and if there's more to come)? I guess if there's more of them, a maintenance category to keep track of them is a good idea, just like there's https://commons.wikimedia.org/wiki/Category:Image_pages_created_for_Flickr_upload_bot_without_files. But the point of this category is that the pages in it should be removed as fast as possible.

According to this script, 30 files are affected. I think the least painful option is to just add {{speedydelete|broken file upload}} manually then. Will fix. Thanks!

Oh, and I just discovered this: https://commons.wikimedia.org/wiki/Special:AbuseLog?wpSearchUser=COHBot&wpSearchTitle=&wpSearchFilter=135

Should make it easier to notice any new empty pages without having to run scripts :)

Aha, great! Thank you!

I tried to interpret the exception pywikibot.exceptions.PageRelatedError in the script. It gave 30 results, which seemed sane, but I got the wrong file names. Is that due to how exception handling works? See the output in the gist.

The reason I tried that approach was that the more obvious ones didn't work:

a_new_page = pywikibot.FilePage(site, 'File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0154.tif')

a_new_page.fileIsOnCommons()
WARNING: /Users/mos/anaconda/bin/ipython:1: DeprecationWarning: pywikibot.page.FilePage.fileIsOnCommons is deprecated; use fileIsShared instead.
  #!/bin/bash /Users/mos/anaconda/bin/python.app

---------------------------------------------------------------------------
PageRelatedError                          Traceback (most recent call last)
<ipython-input-31-3ddda58e2fce> in <module>()
----> 1 a_new_page.fileIsOnCommons()

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/tools/__init__.py in wrapper(*args, **kwargs)
   1324             depth = get_wrapper_depth(wrapper) + 1
   1325             issue_deprecation_warning(name, instead, depth)
-> 1326             return obj(*args, **kwargs)
   1327 
   1328         def add_docstring(wrapper):

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/page.py in fileIsOnCommons(self)
   2376         @rtype: bool
   2377         """
-> 2378         return self.fileIsShared()
   2379 
   2380     def fileIsShared(self):

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/page.py in fileIsShared(self)
   2392                 u'http://wikitravel.org/upload/shared/')
   2393         else:
-> 2394             return self.fileUrl().startswith(
   2395                 'https://upload.wikimedia.org/wikipedia/commons/')
   2396 

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/page.py in fileUrl(self)
   2367         """Return the URL for the file described on this page."""
   2368         # TODO add scaling option?
-> 2369         return self.latest_file_info.url
   2370 
   2371     @deprecated("fileIsShared")

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/page.py in latest_file_info(self)
   2319         """
   2320         if not len(self._file_revisions):
-> 2321             self.site.loadimageinfo(self, history=True)
   2322         latest_ts = max(self._file_revisions)
   2323         return self._file_revisions[latest_ts]

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/site.py in loadimageinfo(self, page, history)
   2969                 raise PageRelatedError(
   2970                     page,
-> 2971                     u"loadimageinfo: Query on %s returned no imageinfo")
   2972 
   2973         return (pageitem['imageinfo']

PageRelatedError: loadimageinfo: Query on [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0154.tif]] returned no imageinfo

And with the recommended method:

a_new_page.fileIsShared()
---------------------------------------------------------------------------
PageRelatedError                          Traceback (most recent call last)
<ipython-input-32-2e3a11f312ef> in <module>()
----> 1 a_new_page.fileIsShared()

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/page.py in fileIsShared(self)
   2392                 u'http://wikitravel.org/upload/shared/')
   2393         else:
-> 2394             return self.fileUrl().startswith(
   2395                 'https://upload.wikimedia.org/wikipedia/commons/')
   2396 

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/page.py in fileUrl(self)
   2367         """Return the URL for the file described on this page."""
   2368         # TODO add scaling option?
-> 2369         return self.latest_file_info.url
   2370 
   2371     @deprecated("fileIsShared")

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/page.py in latest_file_info(self)
   2319         """
   2320         if not len(self._file_revisions):
-> 2321             self.site.loadimageinfo(self, history=True)
   2322         latest_ts = max(self._file_revisions)
   2323         return self._file_revisions[latest_ts]

/Users/mos/anaconda/lib/python3.5/site-packages/pywikibot/site.py in loadimageinfo(self, page, history)
   2969                 raise PageRelatedError(
   2970                     page,
-> 2971                     u"loadimageinfo: Query on %s returned no imageinfo")
   2972 
   2973         return (pageitem['imageinfo']

PageRelatedError: loadimageinfo: Query on [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0154.tif]] returned no imageinfo

I think fileIsOnCommons() and fileIsShared() will have the same result, so it doesn't really matter which one is used: http://pywikibot.readthedocs.io/en/latest/_modules/pywikibot/page/#FilePage.fileIsOnCommons

I've also tried fileUrl() and it does the same thing -- it raises a PageRelatedError if there's no file. I think it's because all these methods assume that a FilePage has imageinfo, and if it doesn't, an error is raised:
http://pywikibot.readthedocs.io/en/latest/_modules/pywikibot/site/#APISite.loadimageinfo

So, there's no built-in way to handle missing imageinfo gracefully.
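One possible workaround, sketched here under assumptions rather than taken from the script in this task, is to treat the exception itself as the signal that no file exists. The classes `PageRelatedError` and `FakeFilePage` below are hypothetical stand-ins so the sketch runs without a wiki connection; in a real run, `pywikibot.exceptions.PageRelatedError` and `pywikibot.FilePage` objects would take their place. Appending the title of the page that was just tested, inside the loop, keeps file names and errors matched.

```python
class PageRelatedError(Exception):
    """Stand-in for pywikibot.exceptions.PageRelatedError (assumption)."""


class FakeFilePage:
    """Hypothetical stand-in for pywikibot.FilePage, for illustration only."""

    def __init__(self, title, has_file):
        self._title = title
        self._has_file = has_file

    def title(self):
        return self._title

    @property
    def latest_file_info(self):
        # Mimics the behaviour in the tracebacks: no imageinfo -> exception.
        if not self._has_file:
            raise PageRelatedError(
                "loadimageinfo: Query on %s returned no imageinfo" % self._title)
        return object()


def has_file(page, error_cls=PageRelatedError):
    """Return True if the file lookup succeeds, False if it raises."""
    try:
        page.latest_file_info
    except error_cls:
        return False
    return True


def pages_without_files(pages, error_cls=PageRelatedError):
    """Collect the titles of pages whose file lookup fails."""
    bad_images = []
    for page in pages:
        if not has_file(page, error_cls):
            # Record the title of the page just tested, inside the loop.
            bad_images.append(page.title())
    return bad_images


pages = [
    FakeFilePage("File:Example with file.tif", True),
    FakeFilePage("File:Example without file.tif", False),
]
print(pages_without_files(pages))
```

Passing the exception class as a parameter is only a convenience for this sketch; it lets the same helper be tried with the stand-in and with the real pywikibot exception.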

I've run the script now and just printed out the contents of bad_images. It does seem like all of them contain no files. Do you get different results?

30
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0154.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0159.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0161.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0163.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0164.b.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0166.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0167.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0168.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0170.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0173.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0175.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0212.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0213.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0215.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0216.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0217.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0219.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0221.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0223.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0225.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0228.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0230.tif
File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0231.tif
File:Från utgrävningarna vid Xolalpan. 206-233 från familjen Reyes gård, som står på delar av husruinen - SMVK - 0307.a.0208.tif
File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0235.tif
File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0236.tif
File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0237.tif
File:Solpyramiden - SMVK - 0307.a.0005.tif
File:Utgrävningar i Teotihuacan (1932) - SMVK - 0307.d.0020.tif
File:Utgrävningar i Teotihuacan (1932) - SMVK - 0307.g.0108.tif

However, I checked 5 of the images in the bad_images list and none of them has an image uploaded, so the script seems to work. I'll add the deletion template by modifying the script. It would be great if you could have a look at it as a sanity check before I run it.

JuJa has already deleted the pages listed on the abuse log page, since they obviously had no text either. It seems the files in bad_filenames don't get caught by the abuse filter since they have infotext, but no file.

A bonus is that I found that the following file in the printout has been created on Commons under two different names:

File:Från utgrävningarna vid Xolalpan. 206-233 från familjen Reyes gård, som står på delar av husruinen - SMVK - 0307.a.0208.tif

It had already been uploaded without the comma after "Reyes gård". I marked the old one for speedy deletion. That was caused by me not using André's package BatchUploadTools, which does some nifty preprocessing before uploading.

@Alicia_Fagerving_WMSE Would you please have a look at the current script with pywikibot.filePage.save() implemented before I run it?
I haven't throttled the writes for instance. Is that reasonable?

Do you normally throttle right in the script? IIRC when I run a bot without specifying a throttle, pywikibot defaults to whatever is in the user-config file and spaces out the edits anyway. I think the default for writes is 10 seconds.

So anyway, it modifies the page text as expected, and finds the affected pages correctly, so I don't see anything that could go disastrously wrong.
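For reference, the tagging step can be thought of as a plain string transformation that is easy to check before any page is saved. The following is a minimal sketch, not the script under review: the helper name `add_deletion_template` and the sample page text are hypothetical, and only the template string comes from this thread.

```python
SPEEDY_TEMPLATE = "{{speedydelete|broken file upload}}"


def add_deletion_template(text, template=SPEEDY_TEMPLATE):
    """Return the page text with the template prepended, unless already present."""
    if template in text:
        # Already tagged: leave the text unchanged so reruns are harmless.
        return text
    return template + "\n" + text


# Hypothetical page text, for illustration only.
original = "== {{int:filedesc}} ==\n{{Information}}"
tagged = add_deletion_template(original)
print(tagged)
```

With pywikibot, the result would presumably be assigned to `page.text` and written with `page.save(summary=...)`; that wiring is an assumption about how the script works. Skipping pages that already carry the template means a second run does not add it twice.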

There's still no 'normally' for me: this is the first time I run pywikibot directly. In the two earlier batch uploads I ran for Wikimedia, pywikibot was integrated into André's BatchUploadTools. I still don't feel familiar with the conventions and policies on Wikiprojects.

The settings in my user-config.py are:

```
# throttling
put_throttle = 0
maxthrottle = 10
maxlag = 0
```
What confuses me is that put_throttle is 0 in this user-config.py I got from André, while the manual says that the flag `-putthrottle` (not e.g. `-maxthrottle`, which doesn't exist) sets the delay, with a default of 10 seconds:
```
-putthrottle:nn
Set the minimum time (in seconds) the bot will wait between saving pages. The default value is 10.
```
I'll run it now anyway. Thanks for the review!

I don't remember ever editing my throttle settings, so I thought 10 seconds was the default for writes. I'd put 10 for a bot account and 20 for a regular user account. But then it's a tiny batch so in the end it doesn't matter that much (and it'll let you see how it behaves in practice).
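As an illustration of the setting discussed above, a user-config.py fragment that spaces out writes could look like the sketch below. The value 10 is simply the figure mentioned in this thread for a bot account, not a verified recommendation.

```python
# user-config.py (fragment; the value is illustrative, taken from this thread)
# Minimum number of seconds the bot waits between saving pages.
put_throttle = 10
```

The same delay can be set for a single run with the `-putthrottle:nn` flag quoted from the manual above.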

~/Dropbox/wmse/Medelhavsmuseet_2016-08[master !?]$ python check_pages_without_images_smvk-em.py 
Bad image no 1 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0154.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0154.tif]]

Logging in to commons:commons as COHBot@SMVK
Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0154.tif]] saved
Bad image no 2 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0159.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0159.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0159.tif]] saved
Bad image no 3 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0161.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0161.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0161.tif]] saved
Bad image no 4 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0163.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0163.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0163.tif]] saved
Bad image no 5 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0164.b.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0164.b.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0164.b.tif]] saved
Bad image no 6 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0166.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0166.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0166.tif]] saved
Bad image no 7 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0167.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0167.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0167.tif]] saved
Bad image no 8 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0168.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0168.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0168.tif]] saved
Bad image no 9 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0170.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0170.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0170.tif]] saved
Bad image no 10 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0173.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0173.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0173.tif]] saved
Bad image no 11 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0175.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0175.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0175.tif]] saved
Bad image no 12 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0212.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0212.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0212.tif]] saved
Bad image no 13 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0213.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0213.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0213.tif]] saved
Bad image no 14 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0215.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0215.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0215.tif]] saved
Bad image no 15 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0216.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0216.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0216.tif]] saved
Bad image no 16 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0217.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0217.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0217.tif]] saved
Bad image no 17 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0219.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0219.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0219.tif]] saved
Bad image no 18 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0221.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0221.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0221.tif]] saved
Bad image no 19 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0223.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0223.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0223.tif]] saved
Bad image no 20 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0225.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0225.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0225.tif]] saved
Bad image no 21 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0228.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0228.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0228.tif]] saved
Bad image no 22 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0230.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0230.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0230.tif]] saved
Bad image no 23 error: [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0231.tif]]
--- Added deletion template to file [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0231.tif]]

Page [[File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0231.tif]] saved
Bad image no 24 error: [[commons:File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0235.tif]]
--- Added deletion template to file [[commons:File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0235.tif]]

Page [[File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0235.tif]] saved
Bad image no 25 error: [[commons:File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0236.tif]]
--- Added deletion template to file [[commons:File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0236.tif]]

Page [[File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0236.tif]] saved
Bad image no 26 error: [[commons:File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0237.tif]]
--- Added deletion template to file [[commons:File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0237.tif]]

Page [[File:Husruin, Teopancaxco. Barrio, San Sebastian - SMVK - 0307.a.0237.tif]] saved
Bad image no 27 error: [[commons:File:Solpyramiden - SMVK - 0307.a.0005.tif]]
--- Added deletion template to file [[commons:File:Solpyramiden - SMVK - 0307.a.0005.tif]]

Page [[File:Solpyramiden - SMVK - 0307.a.0005.tif]] saved
Bad image no 28 error: [[commons:File:Utgrävningar i Teotihuacan (1932) - SMVK - 0307.d.0020.tif]]
--- Added deletion template to file [[commons:File:Utgrävningar i Teotihuacan (1932) - SMVK - 0307.d.0020.tif]]

Page [[File:Utgrävningar i Teotihuacan (1932) - SMVK - 0307.d.0020.tif]] saved
Bad image no 29 error: [[commons:File:Utgrävningar i Teotihuacan (1932) - SMVK - 0307.g.0108.tif]]
--- Added deletion template to file [[commons:File:Utgrävningar i Teotihuacan (1932) - SMVK - 0307.g.0108.tif]]

Page [[File:Utgrävningar i Teotihuacan (1932) - SMVK - 0307.g.0108.tif]] saved
Total number of bad images: 29 1313