It looks like even

  extract_sections('<!-- -->', site)

never returns (it looks like an infinite loop). When I interrupt the program, the traceback looks like this:
  File "/data/project/archiving/pkgsrc/core/scripts/archivebot.py", line 451, in load_page
    header, threads, footer = extract_sections(text, self.site)
  File "/mnt/nfs/labstore-secondary-tools-project/archiving/pkgsrc/core/pywikibot/textlib.py", line 917, in extract_sections
    last_section_content).group().lstrip()
  File "/mnt/nfs/labstore-secondary-tools-project/archiving/venv/lib/python3.5/re.py", line 173, in search
    return _compile(pattern, flags).search(string)
It points to this code in textlib.py:
  footer = re.search(
      r'(%s)*\Z' % r'|'.join((langlink_pattern, cat_regex.pattern, r'\s+')),
      last_section_content).group().lstrip()
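The regex effectively contains '(\s+)*\Z', a nested quantifier that is prone to catastrophic backtracking: https://www.regular-expressions.info/catastrophic.html. A run of n whitespace characters can be split among the '\s+' repetitions in 2^(n-1) different ways, and whenever the overall match fails at that position (for example, whitespace followed by non-whitespace), the engine tries all of them before moving on.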
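A minimal sketch of the slowdown, assuming that only the r'\s+' alternative is needed to reproduce it (the langlink and category alternatives from textlib.py are left out). The names VULNERABLE, SAFER and time_search are hypothetical and exist only for this illustration. SAFER makes each repetition consume exactly one whitespace character, so there is only one way to split a whitespace run; this is one possible mitigation, not necessarily the fix pywikibot should adopt.

```python
import re
import time

# Simplified stand-in for the footer regex: only the r'\s+' alternative is kept
# (assumption: the langlink/category alternatives are not needed to show the problem).
VULNERABLE = re.compile(r'(\s+)*\Z')

# Hypothetical alternative: each repetition consumes exactly one whitespace
# character, so a whitespace run can be split in only one way and backtracking
# no longer grows exponentially. It matches the same text as VULNERABLE.
SAFER = re.compile(r'(\s)*\Z')


def time_search(pattern, text):
    """Return (matched text, seconds taken) for pattern.search(text)."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match.group(), time.perf_counter() - start


if __name__ == '__main__':
    for n in (14, 16, 18, 20):
        # Whitespace followed by a non-whitespace character makes \Z fail
        # after every possible way of splitting the whitespace run.
        text = ' ' * n + 'x'
        _, t_bad = time_search(VULNERABLE, text)
        _, t_good = time_search(SAFER, text)
        print('n=%d: (\\s+)* took %.4fs, (\\s)* took %.6fs' % (n, t_bad, t_good))
```

Running it should show the time for the (\s+)* pattern roughly doubling with each extra whitespace character, while the (\s)* variant stays negligible. Applied to the real code, the equivalent change would be to pass r'\s' instead of r'\s+' in the joined alternation, since the outer '*' already handles repetition.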
Originally found in https://commons.wikimedia.org/w/index.php?title=Commons:Bar&oldid=347447603 .