
Getting the referring pages takes up too much memory (due to included image metadata)
Closed, Duplicate (Public)

Description

request and response headers for API call

Using pywikipediabot-core from git (synced today), I try to obtain the pages referring to Template:Monument istoric on Commons, but I get a 500 error code (see the attachment for details on the request and response). The problem is solved by requesting fewer pages at a time.
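For reference, a minimal sketch of a request of this shape, not the exact call from the attachment: a generator=embeddedin query over the template combined with prop=imageinfo (the iiprop list below and the use of the requests library are assumptions; only the template title and the geilimit parameter come from this report).

```
# Hedged sketch: a query of this shape -- pages embedding the template, plus
# image info/metadata -- fails with a 500 when the generator batch is large
# and succeeds when it is smaller. Parameter values below are illustrative.
import requests

API = "https://commons.wikimedia.org/w/api.php"

def embedded_in_with_imageinfo(batch_size):
    """Fetch pages embedding Template:Monument istoric plus their image info."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "embeddedin",
        "geititle": "Template:Monument istoric",
        "geilimit": batch_size,                   # lowering this avoids the 500
        "prop": "imageinfo",
        "iiprop": "timestamp|user|url|metadata",  # metadata is the expensive part
    }
    resp = requests.get(API, params=params)
    resp.raise_for_status()                       # raises on the HTTP 500
    return resp.json()

# 5000 per batch reproduces the failure; a few hundred per batch succeeds.
pages = embedded_in_with_imageinfo(500)
```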

From IRC I found out that the request exhausts the allowed memory on the server side:

[01-Sep-2013 17:15:25] Fatal error: Allowed memory size of 183500800 bytes exhausted (tried to allocate 512 bytes) at /usr/local/apache/common-local/php-1.22wmf14/includes/api/ApiQueryImageInfo.php on line 470

This seems to be a regression, since this used to work back in June with the SVN version of what is now pywikipediabot-compat. I checked the API request there and it hasn't changed.

I believe this issue should be investigated: the request used to work in the past, so this is a performance degradation.


Version: 1.22.0
Severity: normal

Attached:

Details

Reference
bz53663

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:11 AM
bzimport set Reference to bz53663.
bzimport added a subscriber: Unknown Object (MLST).

The problem reported at https://ro.wikipedia.org/wiki/Wikipedia:Cafenea#Eroare might be related (problems saving large pages; the error message is available at http://i.imgur.com/CbVHWu4.jpg).

Has the usage of the template increased since June? If someone went on a major tagging spree, that could have increased the memory usage.

Yes and no. :) Yes, the usage has increased, as it does every September during Wiki Loves Monuments (WLM). But since I noticed the bug on Sept 1st, the increase must have been 1% tops (there were 28.5k files at the beginning of the contest [1] and only 68 photos were uploaded on the first day [2], plus say 1-200 before that).

Since I'm asking for 5k pages at a time, I expect the first 5 requests to be the same now as they were back in June. Is this assumption wrong?

[1] There were 30099 pictures last night and ~1300 uploads during the contest
[2] https://toolserver.org/~superzerocool/wlm/?pais=romania

I've narrowed it down to the inclusion of image metadata; apparently the many small objects involved there use quite a bit of memory. When I locally hack things to just shove in the serialized string (i.e. returning all the same data, just encoded differently), the memory usage is much lower.
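To make that comparison concrete, here is a purely illustrative sketch in Python (the actual issue is on the PHP side, in ApiQueryImageInfo, and the metadata shape below is invented for scale): the same data held as many small parsed structures occupies far more memory than one serialized string of it.

```
# Illustrative only: many small parsed objects vs. one serialized string.
import json
import sys

def deep_size(obj):
    """Rough recursive size estimate for nested dicts/lists/scalars
    (an approximation, good enough to show the trend)."""
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k) + deep_size(v) for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        size += sum(deep_size(v) for v in obj)
    return size

# Fake EXIF-like metadata for 5000 files (hypothetical shape, for scale only).
metadata = [
    {"Model": "Camera %d" % i, "ISOSpeedRatings": 200, "FNumber": "5.6",
     "DateTimeOriginal": "2013:09:01 12:00:00", "GPSLatitude": 45.0 + i * 1e-4}
    for i in range(5000)
]

parsed_bytes = deep_size(metadata)                       # many small objects
serialized_bytes = sys.getsizeof(json.dumps(metadata))   # one flat string

print("parsed objects:        %d bytes" % parsed_bytes)
print("one serialized string: %d bytes" % serialized_bytes)
```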

CCing Bawolff since he's been working with image metadata recently and might have more insights on what may have changed or how to fix it.

As a workaround, you could use a lower geilimit. Gerrit change 83935 (coming with 1.22wmf17, which is scheduled to hit Commons on Monday) will make this happen anyway for people who aren't bots or admins.
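From the client side, the workaround looks roughly like the sketch below: the same query shape as in the description, but with a small geilimit and standard API continuation, so no single response has to carry image metadata for thousands of files at once (the query parameters other than geilimit are assumptions, as above).

```
# Hedged sketch of the workaround: small geilimit plus continuation.
import requests

API = "https://commons.wikimedia.org/w/api.php"

def all_embedding_pages(batch_size=500):
    params = {
        "action": "query",
        "format": "json",
        "generator": "embeddedin",
        "geititle": "Template:Monument istoric",
        "geilimit": batch_size,
        "prop": "imageinfo",
        "iiprop": "metadata",
    }
    while True:
        data = requests.get(API, params=params).json()
        yield from data.get("query", {}).get("pages", {}).values()
        if "continue" not in data:
            break
        params.update(data["continue"])  # standard MediaWiki API continuation

for page in all_embedding_pages():
    pass  # process each page's imageinfo here
```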