Running that script with the -always option over hundreds of pages, putting page changes asynchronously, leads to an infinite loop, or the script hangs consuming all resources of one CPU (25% of a 4-core machine) and cannot be stopped via Ctrl-C.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
[bugfix] Re-enable stopme() called inside a script | pywikibot/core | master | +40 -28
Event Timeline
c:\Pywikipedia\ssh\pywikibot\core>pwb.py version
Pywikibot: [ssh] pywikibot-core (rPWBC117134afd32d, g4806, 2014/12/20, 15:04:24, OUTDATED)
Release version: 2.0b3
httplib2 version: 0.9
cacerts: C:\Pywikipedia\ssh\pywikibot\core\externals\httplib2\python2\httplib2\cacerts.txt
certificate test: ok
Python: 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
unicode test: ok
PYWIKIBOT2_DIR: Not set
PYWIKIBOT2_DIR_PWB: C:\Pywikipedia\ssh\pywikibot\core
PYWIKIBOT2_NO_USER_CONFIG: Not set
Config base dir: c:\Pywikipedia\ssh\pywikibot\core
This was not due to the solve_disambiguation script. Investigating the matter, I found that ^C was not recognized if the async buffer exceeds max_queue_size. This happened because no page was put to the live wiki. I found another hint in the api cache while restarting the script:
Could not load cache: TypeError("'int' object is not iterable)
Very strange, but clearing the api_cache folder solved the problem. Maybe pywikibot.async_manager returns None?
Another idea: maybe it has something to do with the cache implementation of https://gerrit.wikimedia.org/r/#/c/131017/, which I had fetched into my repository when I started this script.
I found my xqbot/afd-notice.py script hanging several times when putting a page. Ctrl-C doesn't work; there is no CPU consumption in that case. The bug occurs inside SingleSiteBot.userPut() after the summary is printed; there is no async saving in this script.
Reading old article list...
307 articles found
Reading ignoring lists...
155 users found to opt-out
Check for moved pages...
Processing data...
>>> Bauwelt Delmes Heitmann <<< is tagged for deleting
>>> Creator is Delmesheitmann
>>> Benutzer Diskussion:Delmesheitmann <<<
@@ -0,0 +1 @@
+ {{ers:user:xqbot/LD-Hinweis|Bauwelt Delmes Heitmann|angelegte}}
Edit summary: Bot: Benachrichtigung über Löschdiskussion zum Artikel [[Bauwelt Delmes Heitmann]]
Looks like 'async' causes the problem:

if 'async' not in kwargs and self.getOption('always'):
    kwargs['async'] = True
Investigating deeper into this issue I found a problem inside BaseBot class:
BaseBot.userPut invokes BaseBot._save_page with the page as one parameter, page.save as the function, and **kwargs last.
Now BaseBot._save_page tries to call the given function:
func(*args, **kwargs)
which should be equivalent to calling
page.save(*args, **kwargs)
page.save made an async_request with page._save as the function, but page._save itself was never reached, and there was no exception, message, or any other hint of the problem; the task just hangs and couldn't be stopped with Ctrl-C. Killing the task was the only way to stop it.
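The delegation chain above can be sketched with simplified stand-ins (FakePage, FakeBaseBot, and their bodies are illustrative assumptions, not the real pywikibot classes; Python 3 syntax is used here, while the report is about Python 2):

```python
from queue import Queue

page_put_queue = Queue(maxsize=64)  # analogue of config.max_queue_size

def async_request(request, *args, **kwargs):
    # The real async_request ends by enqueueing the work item.
    page_put_queue.put((request, args, kwargs))

class FakePage:
    def __init__(self, title):
        self.title = title

    def _save(self, **kwargs):
        # The actual write; in the bug report this is never reached.
        return 'saved ' + self.title

    def save(self, **kwargs):
        # With async=True, save() only enqueues _save; the write is
        # supposed to happen later in async_manager.
        if kwargs.pop('async', False):
            async_request(self._save, **kwargs)
        else:
            self._save(**kwargs)

class FakeBaseBot:
    def getOption(self, name):
        return True  # pretend -always was given

    def _save_page(self, page, func, *args, **kwargs):
        # func(*args, **kwargs) is equivalent to page.save(*args, **kwargs)
        return func(*args, **kwargs)

    def userPut(self, page, **kwargs):
        if 'async' not in kwargs and self.getOption('always'):
            kwargs['async'] = True
        return self._save_page(page, page.save, **kwargs)

bot = FakeBaseBot()
bot.userPut(FakePage('Example'))
print(page_put_queue.qsize())  # → 1: the save is only queued, not executed
```

The point of the sketch: after userPut() returns, nothing has been written yet; everything depends on a consumer draining page_put_queue.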
Maybe we should use page_put_queue.get_nowait(), or a timeout parameter for page_put_queue.get(), in async_manager?
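A hedged sketch of what such a timeout-based consumer loop could look like (async_manager here is a simplified stand-in, not the real pywikibot implementation; the stop flag and sentinel are assumptions for illustration):

```python
import threading
from queue import Queue, Empty

page_put_queue = Queue(maxsize=64)
_stop = threading.Event()

def async_manager():
    # Instead of a bare page_put_queue.get(), which blocks forever,
    # poll with a timeout so the loop can notice a stop request
    # (and, in a main thread, so Ctrl-C would be serviced).
    while not _stop.is_set():
        try:
            request, args, kwargs = page_put_queue.get(timeout=1)
        except Empty:
            continue
        if request is None:          # sentinel: shut down cleanly
            break
        request(*args, **kwargs)
        page_put_queue.task_done()

results = []
page_put_queue.put((results.append, ('saved',), {}))
page_put_queue.put((None, (), {}))   # sentinel ends the loop

worker = threading.Thread(target=async_manager)
worker.start()
worker.join()
print(results)  # → ['saved']
```

With a timeout the worker periodically returns control to Python, so a stopped flag (or a KeyboardInterrupt in the main thread) can actually take effect instead of being swallowed by an uninterruptible blocking get().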
Indeed there is a problem with async put.
async_request() calls
page_put_queue.put((request, args, kwargs))
in its last line, but that call never returns.
page_put_queue is a Queue object with a maximum queue length of config.max_queue_size, which is 64 by default (and in this case too).
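The blocking behaviour is easy to reproduce with a plain Queue (the default size of 64 from config.max_queue_size is shrunk to 2 here so the full condition is reached immediately):

```python
from queue import Queue, Full

page_put_queue = Queue(maxsize=2)   # stand-in for config.max_queue_size

page_put_queue.put('edit-1')
page_put_queue.put('edit-2')        # the queue is now full

# A plain put() would now block forever, exactly like async_request()
# once async_manager stops draining the queue.  With a timeout the
# condition is at least observable:
try:
    page_put_queue.put('edit-3', timeout=0.1)
    blocked = False
except Full:
    blocked = True
print(blocked)  # → True
```

This matches the symptom above: the last line of async_request() simply never returns once the queue has filled up and nobody removes items.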
I also figured out that async_manager was never reached, even in the cases where the script works correctly, which looks curious.
Now we have a problem inside the Python library (py 2.7.10 here) or an unhandled timeout problem. Any other suggestions?
I propose to deactivate async stuff until the problem is found.
Ah, that might be the problem. async_manager was never reached because the buffer was full from previous puts to the queue, and it seems the queue was never drained. Maybe there is a timeout error when putting the page, but the request wasn't removed from the queue. My problem is that the bug only occurs after a long delay, which might be several days. I will decrease max_queue_size now to provoke the problem.
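A sketch of the suspected failure mode: the consumer never drains the queue (modelled here by simply having no consumer at all, a simplifying assumption), earlier puts fill the bounded queue, and every later put would hang. Decreasing max_queue_size just makes the queue fill after fewer edits:

```python
from queue import Queue, Full

def simulate(max_queue_size, edits):
    """Count how many puts succeed before the producer would freeze,
    given a consumer that never removes anything from the queue."""
    q = Queue(maxsize=max_queue_size)
    accepted = 0
    for edit in range(edits):
        try:
            q.put_nowait(edit)   # non-blocking stand-in for put()
            accepted += 1
        except Full:
            break                # a real blocking put() would hang here
    return accepted

# With the default size of 64, the freeze shows up only after 64 queued
# pages; with a tiny queue it shows up almost immediately.
print(simulate(64, 100))  # → 64
print(simulate(2, 100))   # → 2
```

That is why shrinking max_queue_size is a reasonable way to force the bug to appear within minutes instead of days.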
Change 276428 had a related patch set uploaded (by Xqt):
[bugfix] reset global stopped variable and unfreeze async put.
Hm, there is something wrong with stopme():
I got
Waiting for 1 pages to be put. Estimated time remaining: 0:00:05
but the queue should be empty because there was no page to put!
Could you provide a new summary of the bug, explaining why it was hanging? Which part of this changeset does the 'unfreeze'? It seems like the changeset includes a few other closely related fixes, but they are more cosmetic, making it hard to see the exact 'freeze' bug and its fix.
Change 276428 merged by jenkins-bot:
[bugfix] Re-enable stopme() called inside a script