Steps to reproduce
- Run python pwb.py weblinkchecker -start:! -ns:0 on a large wiki (cswiki, with its 400 000 articles, is large enough).
Expected behavior
weblinkchecker.py runs through all pages in the chosen namespace smoothly.
Actual behavior
After some amount of time/pages read (roughly 1 hour), weblinkchecker.py begins to slow down. After another while (roughly 2 hours) its growing RAM usage starts to slow down the whole OS as well, and after about 4 hours RAM usage reaches 100 %. At that point the script stops while processing some page (>>> Some page <<< is the last thing it outputs) and freezes, and the OS keeps slowing down until it freezes too. If I interrupt the script from the keyboard while RAM usage is still below 100 %, it usually says: Waiting for remaining 49 threads to finish, please wait...; at 100 % RAM usage I can do nothing but hard-shutdown the PC, because the whole OS is frozen.
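To verify that the growth is unbounded, and whether it tracks the number of live threads, a small diagnostic can run inside the same process. This is only a sketch using the standard library, assuming Linux (where ru_maxrss is reported in KiB):

```
import resource
import threading
import time

def log_growth(interval=60):
    """Periodically log the live thread count and peak RSS of this process."""
    while True:
        peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux
        print('threads=%d peak_rss=%.1f MiB'
              % (threading.active_count(), peak_kib / 1024))
        time.sleep(interval)

# Start next to the bot's main loop; daemon=True so it never blocks shutdown.
threading.Thread(target=log_growth, daemon=True).start()
```

If the thread count keeps climbing together with RSS, the leak is most likely threads that are spawned but never joined.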
Workarounds I have tried:
- Configure the OS so that RAM usage cannot reach 100 % (the whole OS still slows down after a while).
- Cap the memory weblinkchecker (pywikibot) may use (the OS and RAM stay fine, but weblinkchecker.py still slows down and freezes itself).
- Interrupt the script after a while and restart it from the last article checked. This is currently the best option: I run timeout --signal=SIGINT 20m python pwb.py weblinkchecker -ns:0 -start:"Last article processed in last run" and hope the collected data will not end up broken.

Still, I think there is an issue in weblinkchecker.py itself: perhaps duplicate spawning of threads or duplicate function calls, threads or functions that are not properly terminated after success or failure, or functions or threads exceeding limits they should respect. This issue makes weblinkchecker.py grow slowly until it hits the ceiling and freezes; a sketch of the pattern I suspect follows below.
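For illustration, this is the shape of fix I would expect if the cause is an ever-growing backlog of checker threads: a bounded queue plus a fixed pool of workers that are joined cleanly at the end. It is only a sketch of the suspected pattern, not weblinkchecker.py's actual code; check_link and urls_to_check are hypothetical stand-ins:

```
import queue
import threading

MAX_WORKERS = 50
tasks = queue.Queue(maxsize=200)   # bounded: the producer blocks instead of piling up RAM

def worker():
    while True:
        url = tasks.get()
        if url is None:            # sentinel value shuts the worker down cleanly
            tasks.task_done()
            return
        try:
            check_link(url)        # hypothetical per-link check
        finally:
            tasks.task_done()      # always mark the task done, even after a failure

threads = [threading.Thread(target=worker) for _ in range(MAX_WORKERS)]
for t in threads:
    t.start()

for url in urls_to_check:          # hypothetical iterable of links to verify
    tasks.put(url)                 # blocks while the queue is full

for _ in threads:                  # one sentinel per worker, then join them all
    tasks.put(None)
for t in threads:
    t.join()
```

With a layout like this, memory stays bounded by maxsize and MAX_WORKERS, and a keyboard interrupt only ever has MAX_WORKERS threads to wait for.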
Python 3.6.4, Pywikibot latest master commit, OS: Arch Linux; 80 Mbit/s connection; 1.9 GiB RAM (+ 4 GiB of swap space)