weblinkchecker.py goes into infinite loop when running on free software directory
Closed, Duplicate, Public

Description

At least 50% of the time, weblinkchecker.py hangs until I kill it.

To reproduce, I populate a user config, then run:

python generate_family_file.py https://directory.fsf.org/wiki/Main_Page
python3 pwb.py weblinkchecker.py -log -max_external_links:15 -start:!

When running with "python3 -m trace --trace", the output ends up in an infinite loop (3 iterations pasted):

weblinkchecker.py(864):                 while threading.activeCount() >= config.max_external_links:
 --- modulename: threading, funcname: active_count
threading.py(1245):     with _active_limbo_lock:
threading.py(1246):         return len(_active) + len(_limbo)
weblinkchecker.py(865):                     time.sleep(config.retry_wait)
weblinkchecker.py(864):                 while threading.activeCount() >= config.max_external_links:
 --- modulename: threading, funcname: active_count
threading.py(1245):     with _active_limbo_lock:
threading.py(1246):         return len(_active) + len(_limbo)
weblinkchecker.py(865):                     time.sleep(config.retry_wait)
weblinkchecker.py(864):                 while threading.activeCount() >= config.max_external_links:
 --- modulename: threading, funcname: active_count
threading.py(1245):     with _active_limbo_lock:
threading.py(1246):         return len(_active) + len(_limbo)
weblinkchecker.py(865):                     time.sleep(config.retry_wait)
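
The traced loop only exits once the thread count drops below the limit, so it spins forever if the worker threads never finish. A minimal sketch of the same polling pattern with a deadline added, so that a stall is reported instead of looping silently. The names `wait_for_free_slot`, `deadline` and `demo` are hypothetical and not part of Pywikibot; the stuck workers are simulated with an event that is never set during the first wait. Note that `threading.active_count()` also counts the main thread.

```python
import threading
import time


def wait_for_free_slot(max_threads, retry_wait=0.05, deadline=1.0):
    """Poll like the traced loop, but give up after `deadline` seconds.

    Returns True if the thread count dropped below `max_threads`,
    False if the deadline passed first.
    """
    start = time.monotonic()
    while threading.active_count() >= max_threads:
        if time.monotonic() - start > deadline:
            return False
        time.sleep(retry_wait)
    return True


def demo():
    # Threads that already exist (at least the main thread) count as well.
    baseline = threading.active_count()
    limit = baseline + 2

    stop = threading.Event()
    # Stand-ins for link checks that never return on their own.
    workers = [threading.Thread(target=stop.wait, daemon=True)
               for _ in range(2)]
    for worker in workers:
        worker.start()

    # Both workers are stuck, so the count stays at the limit.
    while_stuck = wait_for_free_slot(limit, deadline=0.3)

    # Release the workers; the count drops and the wait succeeds.
    stop.set()
    for worker in workers:
        worker.join()
    after_release = wait_for_free_slot(limit, deadline=0.3)
    return while_stuck, after_release
```

Without the deadline check, the first call in `demo()` would never return, which matches the repeating trace above.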

I tried running with -max_external_links:2 and got the same issue.

The log file just stops after processing a seemingly random number of pages. For example, the log for the last run shows normal output, then nothing for over a day until I killed it:
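
Since lowering the limit makes no difference, it might help to see which threads are still alive while the loop is waiting. A small diagnostic sketch using only the standard library; `dump_threads` is a hypothetical helper, not something Pywikibot provides, and it relies on `sys._current_frames()`, which is a CPython implementation detail.

```python
import sys
import threading
import traceback


def dump_threads():
    """Return a text report with the name and current stack of every live thread."""
    frames = sys._current_frames()  # maps thread ident -> current frame
    parts = []
    for thread in threading.enumerate():
        parts.append("Thread %s (daemon=%s)\n" % (thread.name, thread.daemon))
        frame = frames.get(thread.ident)
        if frame is not None:
            # format_stack returns one newline-terminated string per frame
            parts.extend(traceback.format_stack(frame))
    return "".join(parts)
```

Calling `print(dump_threads())` from inside the waiting loop (for example once every few minutes) would show where each remaining thread is blocked.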

2016-10-31 06:19:27             bot.py, 1231 in       current_page: STDOUT   Working on 'Appctl'
2016-10-31 06:20:03             bot.py, 1231 in       current_page: STDOUT   Working on 'Appdirs'
2016-10-31 06:20:04  weblinkchecker.py,  588 in                run: INFO     *[[Appctl]] links to http://sourceforge.net/tracker/?group_id=150275 - 404 Not Found.
2016-11-01 13:20:30             bot.py, 1452 in                run: INFO     
KeyboardInterrupt during WeblinkCheckerRobot bot run...
2016-11-01 13:20:30             bot.py, 1370 in               exit: INFO     
607 pages read
0 pages written
2016-11-01 13:20:30             bot.py, 1379 in               exit: INFO     Execution time: 1 days, 26350 seconds
2016-11-01 13:20:30             bot.py, 1384 in               exit: INFO     Read operation time: 185 seconds
2016-11-01 13:20:30             bot.py, 1392 in               exit: INFO     Script terminated successfully.

I've attached the last few thousand lines before the infinite loop from the tracing run:

It hangs on both Python 2 and Python 3.

I'm running on Debian stretch.
I have the python3-requests package installed, version 2.11.1-1.
Pywikibot is from git, commit 62ba14a1825, Oct 27 2016.