textlib.replaceExcept() doesn't work properly with template exceptions
Closed, Resolved (Public)

Description

There are some problems with textlib.replaceExcept() when 'templates' is used as an exception, e.g. as mentioned in T63024 or T105620. It consumes a lot of CPU time; I saw 12% CPU load on an 8-core processor. I guess the problems are with nested templates.
You may check it with

pwb.py cosmetic_changes -page:Kalle_Svensson -simulate -lang:cs

Event Timeline

Xqt raised the priority of this task from to Medium.
Xqt updated the task description.
Xqt subscribed.

Change 224301 had a related patch set uploaded (by Xqt):
Deactivate removeUselessSpaces due to several bugs

https://gerrit.wikimedia.org/r/224301

After removing 'templates' from the exceptions list, the given sample no longer hangs.

Xqt set Security to None.

I was investigating this problem. The main cause is the new NESTED_TEMPLATE_REGEX, which takes a lot of time resolving nested templates recursively. E.g. version [1] needs 6 seconds to process, whereas [2] needs a sharply higher 149 seconds, both running only cosmetic_changes.removeUselessSpaces(). There was only a small difference [3] between the two versions. As a result, the bug can be solved by going back from the umpteenth attempt at a regex for nested templates to the previous iterative solution using inline markers, which is much faster.

[1] https://de.wikipedia.org/w/index.php?title=Benutzer:Xqt/Labor&oldid=143951267
[2] https://de.wikipedia.org/w/index.php?title=Benutzer:Xqt/Labor&oldid=143951307
[3] https://de.wikipedia.org/w/index.php?title=Benutzer%3AXqt%2FLabor&type=revision&diff=143951307&oldid=143951267
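
For illustration, the "iterative solution by inline markers" can be sketched roughly as below. This is a simplified stand-in, not the actual Pywikibot code: the names mask_templates and unmask_templates, the INNERMOST_TEMPLATE pattern and the NUL marker character are all assumptions made for this example, and triple-brace parameters get no special treatment. The idea is to match only innermost templates, whose bodies contain no braces, swap each for a numbered marker, and repeat until nothing is left to mask.

```python
import re

# Matches only an innermost template: "{{", then text without braces, then "}}".
# The negated character class cannot overlap with the delimiters, so the regex
# engine has nothing ambiguous to backtrack over.
INNERMOST_TEMPLATE = re.compile(r'\{\{[^{}]*\}\}')


def mask_templates(text, marker='\x00'):
    """Replace every template, nested ones included, with a numbered marker.

    Returns the masked text and the list of stored snippets. `marker` is an
    assumed character that must not occur in the wikitext.
    """
    stored = []

    def stash(match):
        stored.append(match.group(0))
        return '{0}{1}{0}'.format(marker, len(stored) - 1)

    # Each pass strips one nesting level; stop when a pass masks nothing.
    while True:
        text, count = INNERMOST_TEMPLATE.subn(stash, text)
        if count == 0:
            return text, stored


def unmask_templates(text, stored, marker='\x00'):
    """Put the stored snippets back until no marker remains."""
    pattern = re.compile('{0}(\\d+){0}'.format(re.escape(marker)))
    while pattern.search(text):
        text = pattern.sub(lambda m: stored[int(m.group(1))], text)
    return text


if __name__ == '__main__':
    masked, stored = mask_templates('foo  {{x|{{y  z}}  }}  bar')
    # Collapse repeated spaces outside templates only.
    masked = re.sub(r' {2,}', ' ', masked)
    print(unmask_templates(masked, stored))
```

Each pass is a plain scan over the text, and the number of passes is bounded by the nesting depth plus one, so the cost should stay modest even for deeply nested templates.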

I don't understand why adding that template at the bottom is making it that slow. And doing modifications on a string should be more complex than just iterating over the text.
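
To make "just iterating over the text" concrete, here is a rough sketch of a single-pass scan that finds top-level templates with a depth counter and no regex at all. The helper name top_level_template_spans is hypothetical and not part of Pywikibot; unbalanced closing braces are simply skipped.

```python
def top_level_template_spans(text):
    """Return (start, end) index pairs of top-level ``{{...}}`` blocks.

    One left-to-right pass with a depth counter. Illustrative helper only,
    not Pywikibot code.
    """
    spans = []
    depth = 0
    start = 0
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == '{{':
            if depth == 0:
                start = i  # remember where the outermost template opens
            depth += 1
            i += 2
        elif pair == '}}' and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                spans.append((start, i))  # outermost template just closed
        else:
            i += 1
    return spans
```

Every character is visited at most once, so the running time grows linearly with the page length regardless of how the templates are nested.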

Anyway, it needs 150 seconds for the second test run. I bet that the page on cs-wiki will also terminate, but take a few minutes.

I aborted cc on cs:Kalle_Svensson after 61937 seconds. It seems that the regex needs exponentially increasing time.

Xqt raised the priority of this task from Medium to High (Jul 13 2015, 3:48 AM).

Yes, you are right. When I run the regex on the text from your test page, regex101 won't even try it and reports catastrophic backtracking.
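
For readers unfamiliar with catastrophic backtracking, the effect can be reproduced with a deliberately pathological toy pattern. To be clear, the pattern below is an assumption chosen for the demonstration and is not the NESTED_TEMPLATE_REGEX from Pywikibot; the helper name failing_match_time is likewise made up. Nested quantifiers let the engine split a run of characters in very many ways, and when the overall match fails it tries them all.

```python
import re
import time

# Toy pattern with nested quantifiers: a run of "a" characters can be divided
# between the inner and outer "+" in a huge number of ways.
PATHOLOGICAL = re.compile(r'(a+)+$')


def failing_match_time(n):
    """Time one match attempt on n "a" characters followed by a "b"."""
    subject = 'a' * n + 'b'
    started = time.perf_counter()
    result = PATHOLOGICAL.match(subject)
    elapsed = time.perf_counter() - started
    assert result is None  # the trailing "b" forces the match to fail
    return elapsed


if __name__ == '__main__':
    # Keep n small: every extra character makes the failure noticeably slower.
    for n in (10, 14, 18, 20):
        print(n, round(failing_match_time(n), 4))
```

On a backtracking engine such as Python's re module, the measured times are expected to climb steeply as n grows, which matches the kind of behaviour described above for pages with many nested templates.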

Xqt raised the priority of this task from High to Unbreak Now! (Jul 16 2015, 3:32 PM).

This breaks various scripts, e.g. when removing templates. Currently it fails on various pages for

pwb.py template.py -remove "Navigationsleiste Oscar Bester fremdsprachiger Film" -summary:"Bot: Entferne gelöschte Navigation Oscar Bester fremdsprachiger Film"

and cc is largely unusable.

Change 226500 had a related patch set uploaded (by Xqt):
[WIP] Bugfix for T105621

https://gerrit.wikimedia.org/r/226500

Change 226531 had a related patch set uploaded (by John Vandenberg):
Reduce complexity of NESTED_TEMPLATE_REGEX

https://gerrit.wikimedia.org/r/226531

Change 224301 abandoned by Xqt:
Deactivate removeUselessSpaces due to several bugs

https://gerrit.wikimedia.org/r/224301

d3e68d3fc rebuild the template regex, to work around sf.net bugs #3603994, #2819291, #3158761 ('new' ids #1575 , 973, 1283)

This task has been "Unbreak Now!" priority for nearly two weeks. Please set an assignee.

Change 226531 merged by jenkins-bot:
Fix NESTED_TEMPLATE_REGEX

https://gerrit.wikimedia.org/r/226531

Xqt lowered the priority of this task from Unbreak Now! to Low (Jul 29 2015, 4:04 AM).

Are there outstanding problems here?

Re-open if there are still issues to be resolved.

Reopened due to problems with nested templates, which caused the bot to be blocked.

Change 299137 had a related patch set uploaded (by Xqt):
Deactivate removeUselessSpaces due to several bugs

https://gerrit.wikimedia.org/r/299137

Change 299137 merged by jenkins-bot:
Deactivate removeUselessSpaces due to several bugs

https://gerrit.wikimedia.org/r/299137

Xqt raised the priority of this task from Low to High (Jul 18 2016, 4:25 AM).

This problem, related to extremely high CPU usage and pywikibot hanging, is fixed.

I have created T140608 to investigate your problem.