
textlib.extract_sections hangs on text containing a long run of whitespace
Closed, Resolved · Public


It looks like even

extract_sections('<!--                                         -->', site)

hangs, as if stuck in an infinite loop. When I interrupt the program, the traceback looks like this:

File "/data/project/archiving/pkgsrc/core/scripts/", line 451, in load_page
  header, threads, footer = extract_sections(text,
File "/mnt/nfs/labstore-secondary-tools-project/archiving/pkgsrc/core/pywikibot/", line 917, in extract_sections
File "/mnt/nfs/labstore-secondary-tools-project/archiving/venv/lib/python3.5/", line 173, in search
  return _compile(pattern, flags).search(string)

pointing to this code segment:

footer =
    r'(%s)*\Z' % r'|'.join((langlink_pattern, cat_regex.pattern, r'\s+')),
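To reproduce the slowdown without pywikibot, the sketch below runs only the whitespace core of that regex, '(\s+)*\Z', against the same kind of HTML comment for increasing numbers of spaces. The langlink and category alternatives are omitted and the search_comment helper is hypothetical, so this approximates the real footer regex rather than copying it. On CPython's re module the printed times should grow roughly exponentially with n.

```python
import re
import time

# Only the whitespace core of the footer regex. The langlink and category
# alternatives are left out for brevity; the hang involves only '\s+'.
CORE = re.compile(r'(\s+)*\Z')


def search_comment(n):
    """Search CORE in an HTML comment containing n spaces.

    Returns the start of the match and the elapsed wall-clock seconds.
    """
    text = '<!--' + ' ' * n + '-->'
    start = time.perf_counter()
    match = CORE.search(text)
    elapsed = time.perf_counter() - start
    # The only successful match is the empty one at the very end of text;
    # every earlier start position fails only after heavy backtracking.
    return match.start(), elapsed


if __name__ == '__main__':
    for n in range(10, 21, 2):
        pos, elapsed = search_comment(n)
        print(f'n={n:2d}  match at {pos:2d}  {elapsed:.4f}s')
```

Keeping n at 20 or below keeps the demo short; a few dozen spaces, as in the comment above, is already far beyond what finishes in practice.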

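The regex effectively contains '(\s+)*$' (with \Z in place of $), a nested quantifier that is prone to catastrophic backtracking. When the overall match fails, as it does here because '-->' follows the spaces, the engine tries every way of dividing a run of n whitespace characters among repetitions of the group, and there are 2^(n-1) such divisions. So the search is not literally infinite, but its running time roughly doubles with each additional space.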
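One possible way around this, sketched here as a hypothesis and not necessarily what change 508473 does, is to make the whitespace alternative consume exactly one character (r'\s' instead of r'\s+'). The repeated group then matches the same overall strings, but a whitespace run can be split in only one way, so a failing attempt backtracks in linear rather than exponential time. LANGLINK and CATEGORY below are simplified, hypothetical stand-ins for pywikibot's langlink_pattern and cat_regex.pattern, and build_footer_regex and find_footer are illustrative helpers, not library functions.

```python
import re

# Hypothetical stand-ins for langlink_pattern and cat_regex.pattern;
# they are NOT pywikibot's real patterns.
LANGLINK = r'\[\[[a-z-]+:[^\]]*\]\]'
CATEGORY = r'\[\[Category:[^\]]*\]\]'


def build_footer_regex(whitespace):
    """Build a footer regex shaped like the reported one.

    `whitespace` is the regex used for the whitespace alternative.
    """
    return re.compile(
        r'(%s)*\Z' % r'|'.join((LANGLINK, CATEGORY, whitespace)))


ORIGINAL = build_footer_regex(r'\s+')  # same shape as the reported code
PATCHED = build_footer_regex(r'\s')    # hypothetical fix: one char per repeat


def find_footer(regex, text):
    """Return the trailing run of langlinks, categories and whitespace.

    The search always succeeds, because the group may repeat zero times
    and \\Z matches at the end of any string.
    """
    return regex.search(text).group()


if __name__ == '__main__':
    page = 'Body text\n\n[[Category:Foo]]\n[[de:Bar]]\n'
    print(repr(find_footer(PATCHED, page)))
    # Quick with PATCHED; the same call with ORIGINAL would not finish
    # in any reasonable time for this many spaces.
    print(repr(find_footer(PATCHED, '<!--' + ' ' * 1000 + '-->')))
```

Because only the whole match (group 0) is used, narrowing r'\s+' to r'\s' does not change which footer is found; it only removes the ambiguity. This assumes the real langlink and category patterns cannot themselves match leading whitespace; if they can, some ambiguity and backtracking may remain.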

Originally found in .

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.

Change 508473 had a related patch set uploaded (by Whym; owner: Whym):
[pywikibot/core@master] textlib: avoid infinite execution of regex

Change 508473 merged by jenkins-bot:
[pywikibot/core@master] textlib: avoid infinite execution of regex