Page MenuHomePhabricator

WMFTimeoutException on non-existent files
Open, MediumPublic

Description

Pywikibot keeps attempting to access non-existent files when running with allimages().

2020-02-14 06:43:34,471 corruptBackwards: DEBUG Skipping check on File:JR久慈 駅 - panoramio (1).jpg
Traceback (most recent call last):
  File "fullscan_reversed.py", line 23, in <module>
    scan.process_file() # Do the work
  File "/home/ccc/Commons-image-corruption-detector/base_scan.py", line 65, in process_file
    for image_page in pwb_wrappers.allimages(reverse=self.reverse):
  File "/usr/local/lib/python3.8/site-packages/pywikibot/data/api.py", line 2808, in __iter__
    self.data = self.request.submit()
  File "/usr/local/lib/python3.8/site-packages/pywikibot/data/api.py", line 1985, in submit
    rawdata, use_get = self._http_request(use_get, uri, body, headers,
  File "/usr/local/lib/python3.8/site-packages/pywikibot/data/api.py", line 1767, in _http_request
    self.wait()
  File "/usr/local/lib/python3.8/site-packages/pywikibot/data/api.py", line 2088, in wait
    raise TimeoutError('Maximum retries attempted without success.')
pywikibot.exceptions.TimeoutError: Maximum retries attempted without success.

@AntiCompositeNumber got pywikibot to fail attempting to access

https://commons.wikimedia.org/w/api.php?gaifrom=%F0%A9%B5%8F&prop=info%7Cimageinfo%7Ccategoryinfo&inprop=protection&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata&iilimit=max&generator=allimages&action=query&indexpageids=&continue=gaicontinue%7C%7Cuserinfo&gaidir=descending&gailimit=500&meta=userinfo&uiprop=blockinfo%7Chasmsg&maxlag=5&format=json&gaicontinue=%EF%BC%90%EF%BC%97%E5%BC%8F%E6%A9%9F%E5%8B%95%E6%94%AF%E6%8F%B4%E6%A9%8B%EF%BC%88%E8%A3%85%E5%82%99%E5%93%81%E5%B1%95%E7%A4%BA%E3%83%BB%E6%96%BD%E8%A8%AD%E5%AD%A6%E6%A0%A1%EF%BC%89_%284%29.jpg

Without the percent-encoding, that's

https://commons.wikimedia.org/w/api.php?gaifrom=𩵏&prop=info|imageinfo|categoryinfo&inprop=protection&iiprop=timestamp|user|comment|url|size|sha1|metadata&iilimit=max&generator=allimages&action=query&indexpageids=&continue=gaicontinue||userinfo&gaidir=descending&gailimit=500&meta=userinfo&uiprop=blockinfo|hasmsg&maxlag=5&format=json&gaicontinue=07式機動支援橋(装備品展示・施設学校)_(4).jpg

Pywikibot is using POST and not GET, but it fails the same. Most recent attempt resulted in

{"error":{"code":"internal_api_error_WMFTimeoutException","info":"[XknCBQpAAEsAAJ7w@woAAACH] Caught exception of type WMFTimeoutException","errorclass":"WMFTimeoutException"},"servedby":"mw1280"}

Could someone with logstash access look into what's going on with the media-wiki end?

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. · View Herald Transcript

The file that got skipped is a red herring, I think. Something about the API query is more expensive than expected. With a gailimit=500, I get a timeout almost every time. Between 100 and 500, I sometimes get a response, sometimes a timeout, and sometimes a PHP out-of-memory error. I can continue through the 500 files in blocks of 100 just fine with about 15s from request to response. That includes a response containing the file that got skipped.

I am missing the error traceback and log entry in front of the wait() statement in line 1767. The Code is

except Exception:
    # for any other error on the http request, wait and retry
    pywikibot.error(traceback.format_exc())
    pywikibot.log('{}, {}'.format(uri, paramstring))
else:
    return data, use_get
self.wait()

and there must be an exception from http.request raised in additon to the APIError

jbond triaged this task as Medium priority.Feb 17 2020, 12:01 PM

The code I used:

#!/bin/env python3
import pywikibot
import logging

logging.basicConfig(filename="test.log", level=logging.DEBUG)
site = pywikibot.Site("commons", "commons")
ai = site.allimages(
    reverse=True,
    start="𩵏",
    prefix="",
)

for page in ai:
    print(page.title())

The debug logging of the failed request is P10426.

Anomie subscribed.

Pywikibot is using POST and not GET, but it fails the same. Most recent attempt resulted in

{"error":{"code":"internal_api_error_WMFTimeoutException","info":"[XknCBQpAAEsAAJ7w@woAAACH] Caught exception of type WMFTimeoutException","errorclass":"WMFTimeoutException"},"servedby":"mw1280"}

Could someone with logstash access look into what's going on with the media-wiki end?

The exception logged by MediaWiki for that error was

2020-02-16 22:29:22 [XknCBQpAAEsAAJ7w@woAAACH] mw1280 commonswiki 1.35.0-wmf.19 exception ERROR: [XknCBQpAAEsAAJ7w@woAAACH] /w/api.php?gaifrom=%F0%A9%B5%8F&prop=info|imageinfo|categoryinfo&inprop=protection&iiprop=timestamp|user|comment|url|size|sha1|metadata&iilimit=max&generator=allimages&action=query&indexpageids=&continue=gaicontinue||userinfo&gaidir=descending&gailimit=500&meta=userinfo&uiprop=blockinfo|hasmsg&maxlag=5&format=json&gaicontinue=%EF%BC%90%EF%BC%97%E5%BC%8F%E6%A9%9F%E5%8B%95%E6%94%AF%E6%8F%B4%E6%A9%8B%EF%BC%88%E8%A3%85%E5%82%99%E5%93%81%E5%B1%95%E7%A4%BA%E3%83%BB%E6%96%BD%E8%A8%AD%E5%AD%A6%E6%A0%A1%EF%BC%89_(4).jpg   WMFTimeoutException from line 39 of /srv/mediawiki/wmf-config/set-time-limit.php: the execution time limit of 60 seconds was exceeded {"exception_id":"XknCBQpAAEsAAJ7w@woAAACH","exception_url":"/w/api.php?gaifrom=%F0%A9%B5%8F&prop=info|imageinfo|categoryinfo&inprop=protection&iiprop=timestamp|user|comment|url|size|sha1|metadata&iilimit=max&generator=allimages&action=query&indexpageids=&continue=gaicontinue||userinfo&gaidir=descending&gailimit=500&meta=userinfo&uiprop=blockinfo|hasmsg&maxlag=5&format=json&gaicontinue=%EF%BC%90%EF%BC%97%E5%BC%8F%E6%A9%9F%E5%8B%95%E6%94%AF%E6%8F%B4%E6%A9%8B%EF%BC%88%E8%A3%85%E5%82%99%E5%93%81%E5%B1%95%E7%A4%BA%E3%83%BB%E6%96%BD%E8%A8%AD%E5%AD%A6%E6%A0%A1%EF%BC%89_(4).jpg","caught_by":"mwe_handler"} 
[Exception WMFTimeoutException] (/srv/mediawiki/wmf-config/set-time-limit.php:39) the execution time limit of 60 seconds was exceeded
  #0 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiResult.php(370): {closure}(integer)
  #1 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiResult.php(371): ApiResult::validateValue(array)
  #2 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiResult.php(411): ApiResult::validateValue(array)
  #3 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiQueryBase.php(496): ApiResult->addValue(array, NULL, array)
  #4 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiQueryImageInfo.php(183): ApiQueryBase->addPageSubItem(integer, array)
  #5 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiQuery.php(255): ApiQueryImageInfo->execute()
  #6 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiMain.php(1586): ApiQuery->execute()
  #7 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiMain.php(522): ApiMain->executeAction()
  #8 /srv/mediawiki/php-1.35.0-wmf.19/includes/api/ApiMain.php(493): ApiMain->executeActionWithErrorHandling()
  #9 /srv/mediawiki/php-1.35.0-wmf.19/api.php(84): ApiMain->execute()
  #10 /srv/mediawiki/w/api.php(3): require(string)
  #11 {main}

Unfortunately that doesn't tell us a whole lot.

When I tried the URL in the description just now, instead of a timeout it gave an error page stating "Allowed memory size of 692060160 bytes exhausted (tried to allocate 40960 bytes)". Overall, handling of metadata in ApiImageInfo is pretty crappy and the whole thing needs a rewrite.

I'm going to untag MediaWiki-Action-API from this task, as the API part is likely a duplicate of one of the tasks linked to T89971.

Tempting to just merge this ticket into T89971 then.

jbond subscribed.

I removed SRE from this task as im not sure there is any relevent work for sre but please re-add with additional information/pointers if i got this wrong