
Pywikibot memory leak after accessing BasePage.botMayEdit
Closed, Resolved | Public | Bug Report

Description

BasePage.botMayEdit is annotated with @cache

@cache
def botMayEdit(self) -> bool:

This is an unbounded cache that holds strong references, so every time botMayEdit is called, self is added to the cache as a key and is never evicted. Each cached self in turn references many other objects and keeps them alive. One of my long-running bots used 4 gigabytes of memory before hitting its rlimit and getting killed.
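The retention described above can be reproduced without pywikibot. The following is a minimal sketch, not pywikibot code: Page, bot_may_edit and leak_demo are hypothetical names, and functools.lru_cache(maxsize=None) is used because it is what functools.cache wraps and it also exists on older Python versions.

```python
import functools
import gc
import weakref


class Page:
    """Hypothetical stand-in for BasePage; not pywikibot code."""

    def __init__(self, title):
        self.title = title

    # Unbounded cache on a method: the cache key is built from `self`,
    # so the cache keeps a strong reference to every instance it has seen.
    @functools.lru_cache(maxsize=None)
    def bot_may_edit(self):
        return True


def leak_demo():
    page = Page("Example")
    page.bot_may_edit()          # stores `page` as a cache key
    ref = weakref.ref(page)      # lets us observe whether `page` is freed
    del page                     # drop our only direct reference
    gc.collect()
    alive_before_clear = ref() is not None

    Page.bot_may_edit.cache_clear()  # release the cache's references
    gc.collect()
    alive_after_clear = ref() is not None
    return alive_before_clear, alive_after_clear


print(leak_demo())  # (True, False)
```

The instance survives `del page` only because the cache still references it, and it is freed as soon as the cache is cleared.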

I profiled that script with guppy3 to see just how much is being kept alive by the botMayEdit cache. (guppy3 describes dominos as "The set 'dominated' by the set of objects in x. This is the objects that will become deallocated, directly or indirectly, when the objects in x are deallocated.")

When the bot first starts:

>>> import guppy
>>> hp = guppy.hpy()
>>> hp.heap()
Partition of a set of 270151 objects. Total size = 32191012 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 114602  42  9772353  30   9772353  30 str
     1  10927   4  5288952  16  15061305  47 dict (no owner)
     2  42074  16  4030904  13  19092209  59 tuple
     3  20385   8  1538948   5  20631157  64 bytes
     4   1565   1  1524032   5  22155189  69 type
     5  10142   4  1464520   5  23619709  73 types.CodeType
     6   9655   4  1390320   4  25010029  78 function
     7  27941  10   796952   2  25806981  80 int
     8   1565   1   790760   2  26597741  83 dict of type
     9   5001   2   690752   2  27288493  85 list
<552 more rows. Type e.g. '_.more' to view.>
>>> hp.iso(pywikibot.page.BasePage.botMayEdit).dominos
Partition of a set of 11848 objects. Total size = 1652473 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   2474  21   752160  46    752160  46 dict (no owner)
     1   6271  53   498823  30   1250983  76 str
     2   1145  10    95872   6   1346855  82 list
     3    150   1    64576   4   1411431  85 collections.OrderedDict
     4    146   1    56016   3   1467447  89 dict of pywikibot.page.Page
     5    193   2    42328   3   1509775  91 dict of pywikibot.page.Link
     6    158   1    36656   2   1546431  94 set
     7     85   1    21080   1   1567511  95 dict of pywikibot.page.Claim
     8    265   2    19480   1   1586991  96 tuple
     9     46   0    13040   1   1600031  97 dict of pywikibot.page.ItemPage
<28 more rows. Type e.g. '_.more' to view.>
>>> pywikibot.page.BasePage.botMayEdit.cache_info()
CacheInfo(hits=0, misses=5, maxsize=None, currsize=5)

After running for a few minutes:

>>> hp.heap()
Partition of a set of 1603419 objects. Total size = 208454157 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 230842  14 71763136  34  71763136  34 dict (no owner)
     1 840630  52 60144617  29 131907753  63 str
     2 129444   8 13086752   6 144994505  70 list
     3  36258   2 13024648   6 158019153  76 collections.OrderedDict
     4  19909   1  7455544   4 165474697  79 dict of pywikibot.page.Link
     5  28394   2  7041712   3 172516409  83 dict of pywikibot.page.Claim
     6  52982   3  4815056   2 177331465  85 tuple
     7  12188   1  3108712   1 180440177  87 dict of pywikibot.page.ItemPage
     8   7112   0  2737112   1 183177289  88 dict of pywikibot.page.Page
     9   8177   1  2124904   1 185302193  89 set
<557 more rows. Type e.g. '_.more' to view.>
>>> hp.iso(pywikibot.page.BasePage.botMayEdit).dominos
Partition of a set of 1049673 objects. Total size = 141565142 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 193668  18 53690232  38  53690232  38 dict (no owner)
     1 500122  48 33602128  24  87292360  62 str
     2  35999   3 12842600   9 100134960  71 collections.OrderedDict
     3 115539  11 10840032   8 110974992  78 list
     4  19908   2  7455168   5 118430160  84 dict of pywikibot.page.Link
     5  28394   3  7041712   5 125471872  89 dict of pywikibot.page.Claim
     6  12188   1  3108712   2 128580584  91 dict of pywikibot.page.ItemPage
     7   7111   1  2736736   2 131317320  93 dict of pywikibot.page.Page
     8  28394   3  1817216   1 133134536  94 pywikibot.page.Claim
     9   7567   1  1755544   1 134890080  95 set
<42 more rows. Type e.g. '_.more' to view.>
>>> pywikibot.page.BasePage.botMayEdit.cache_info()
CacheInfo(hits=0, misses=225, maxsize=None, currsize=225)

This has been the case since rPWBC8b952048ab61: [bugfix] cache botMayEdit result / T267770: wikibase_tests fails for TestLoadRevisionsCaching.test_page_text

Event Timeline

I had a similar issue; installing mwparserfromhell and optimizing my script helped me.

Xqt triaged this task as High priority. (May 29 2021, 1:44 PM)
Xqt changed the subtype of this task from "Task" to "Bug Report".

I didn't expect this, but reading the docs it seems cache/lru_cache is meant for functions only, whereas cached_property is made for properties.
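One way to keep the caching without a global cache is to store the result on the instance itself, so the cached value is released together with the object. This is only a sketch under assumptions and not necessarily what the pywikibot patch does; Page, bot_may_edit, _compute_bot_may_edit, the _bot_may_edit attribute and collected_after_use are hypothetical names.

```python
import gc
import weakref


class Page:
    """Hypothetical stand-in for BasePage; not pywikibot code."""

    def __init__(self, title):
        self.title = title
        self.checks = 0  # counts how often the expensive check runs

    def _compute_bot_may_edit(self):
        # Placeholder for the real (expensive) template inspection.
        self.checks += 1
        return True

    def bot_may_edit(self):
        # Cache the result on the instance instead of in a shared cache,
        # so nothing outside the object holds a reference to it.
        if not hasattr(self, "_bot_may_edit"):
            self._bot_may_edit = self._compute_bot_may_edit()
        return self._bot_may_edit


def collected_after_use():
    page = Page("Example")
    page.bot_may_edit()
    page.bot_may_edit()              # second call is served from the attribute
    computed_once = page.checks == 1
    ref = weakref.ref(page)
    del page
    gc.collect()
    return computed_once, ref() is None


print(collected_after_use())  # (True, True)
```

On Python 3.8 and later, functools.cached_property gives the same per-instance storage for attributes that are accessed as properties rather than called as methods.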

Change 697138 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [bugfix] remove lru_cache from botMayEdit method

https://gerrit.wikimedia.org/r/697138

@zhuyifei1999: which Python version are you running?

See also https://bugs.python.org/issue19859

Python 3.5.3 on Toolforge (where it used 4G memory and got killed), Python 3.7.10 local (where I was debugging with guppy)

Change 697138 merged by jenkins-bot:

[pywikibot/core@master] [bugfix] remove lru_cache from botMayEdit method

https://gerrit.wikimedia.org/r/697138