
textlib.replaceExcept() doesn't work properly with template exceptions
Closed, ResolvedPublic

Description

There are some problems with textlib.replaceExcept() when 'templates' is used as an exception, e.g. as mentioned in T63024 or T105620. CPU usage is high: I saw 12% on an 8-core processor. I guess there are problems with nested templates.
You may check it with

pwb.py cosmetic_changes -page:Kalle_Svensson -simulate -lang:cs

Details

Related Gerrit Patches:

Event Timeline

Xqt created this task.Jul 12 2015, 7:54 AM
Xqt raised the priority of this task from to Medium.
Xqt updated the task description.
Xqt added a subscriber: Xqt.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.Jul 12 2015, 7:54 AM

Change 224301 had a related patch set uploaded (by Xqt):
Deactivate removeUselessSpaces due to several bugs

https://gerrit.wikimedia.org/r/224301

Xqt added a comment.Jul 12 2015, 8:05 AM

After removing 'templates' from the exceptions list, the given sample no longer hangs.

Xqt updated the task description.Jul 12 2015, 9:05 AM
Xqt set Security to None.
Xqt added a comment.Jul 12 2015, 10:02 AM

I investigated this problem. The main cause is the new NESTED_TEMPLATE_REGEX, which consumes a lot of time resolving nested templates recursively. For example, version [1] needs 6 seconds to process, whereas [2] needs 149 seconds, both running cosmetic_changes.removeUselessSpaces() only. There is only a small difference [3] between the two versions. As a result, the bug can be solved by going back from the umpteenth attempt at using a regex for nested templates to the previous iterative solution with inline markers, which is much faster.

[1] https://de.wikipedia.org/w/index.php?title=Benutzer:Xqt/Labor&oldid=143951267
[2] https://de.wikipedia.org/w/index.php?title=Benutzer:Xqt/Labor&oldid=143951307
[3] https://de.wikipedia.org/w/index.php?title=Benutzer%3AXqt%2FLabor&type=revision&diff=143951307&oldid=143951267
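
For readers unfamiliar with the inline-marker idea mentioned above, here is a minimal sketch of how such an iterative approach can work. It is not the pywikibot implementation: the function names, the `\x00` marker and the innermost-template pattern are assumptions made for illustration. The regex only ever matches templates that contain no braces, so it cannot backtrack across nesting levels; nesting is resolved by looping instead.

```python
import re

# Matches only innermost templates: '{{' ... '}}' with no braces in between.
# Hypothetical pattern for illustration, not pywikibot's NESTED_TEMPLATE_REGEX.
INNERMOST_TEMPLATE = re.compile(r'\{\{[^{}]*\}\}')

# Placeholder format: NUL + index + NUL. Assumes the text contains no NUL bytes.
MARKER = re.compile(r'\x00(\d+)\x00')


def mask_templates(text):
    """Replace every template, nested ones included, by an inline marker."""
    stash = []

    def _stash(match):
        stash.append(match.group())
        return '\x00%d\x00' % (len(stash) - 1)

    # Each pass hides one nesting level; stop when nothing is left to hide.
    while True:
        text, count = INNERMOST_TEMPLATE.subn(_stash, text)
        if count == 0:
            return text, stash


def unmask_templates(text, stash):
    """Put the stashed templates back, outermost level first."""
    while True:
        text, count = MARKER.subn(lambda m: stash[int(m.group(1))], text)
        if count == 0:
            return text


def replace_outside_templates(text, old, new):
    """Apply re.sub(old, new) to everything that is not inside a template."""
    masked, stash = mask_templates(text)
    masked = re.sub(old, new, masked)
    return unmask_templates(masked, stash)
```

With this sketch, `replace_outside_templates('a  b {{x|  y {{z|  w}} }} c  d', r' {2,}', ' ')` collapses the double spaces outside the template and leaves the template body untouched. Known limitations of the sketch: templates containing single braces are not masked, and a replacement pattern that matches the marker characters or digits could corrupt the markers.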

XZise added a subscriber: XZise.Jul 12 2015, 10:24 AM

I don't understand why adding that template at the bottom is making it that slow. And doing modifications on a string should be more complex than just iterating over the text.

Xqt added a comment.Jul 12 2015, 10:34 AM

Anyway, it needs 150 seconds for the second test run. I bet the page on cs-wiki will also terminate, but it will take a few minutes.

Xqt added a comment.Jul 13 2015, 3:48 AM

I aborted cc on cs:Kalle_Svensson after 61937 seconds. It seems the regex needs exponentially increasing time.

Xqt raised the priority of this task from Medium to High.Jul 13 2015, 3:48 AM
XZise added a comment.Jul 13 2015, 8:12 AM

Yes, you are right. When I run the regex on regex101, it won't even try the text from your test page and reports catastrophic backtracking.
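
To illustrate what catastrophic backtracking looks like in Python's `re` module, here is a small timing sketch. The pattern is a textbook toy with a quantifier nested inside another quantifier; it is not NESTED_TEMPLATE_REGEX, and the helper name is made up for this example. Each additional character in the subject should roughly double the time needed to report a failed match.

```python
import re
import time

# Textbook pattern with a quantifier nested inside another quantifier.
# NOT pywikibot's NESTED_TEMPLATE_REGEX; chosen only to show the effect.
TOY_PATTERN = re.compile(r'(a+)+$')


def time_failed_match(n):
    """Return (elapsed seconds, match result) for n 'a's followed by 'b'."""
    subject = 'a' * n + 'b'  # the trailing 'b' makes the overall match fail
    start = time.perf_counter()
    result = TOY_PATTERN.match(subject)
    return time.perf_counter() - start, result


if __name__ == '__main__':
    # Keep n small: the running time grows exponentially with n.
    for n in (10, 14, 18, 20):
        elapsed, result = time_failed_match(n)
        print('n=%2d  matched=%s  %.4f s' % (n, result is not None, elapsed))
```

The engine has to try every way of splitting the run of `a` characters between the inner and outer `+` before it can conclude that no match exists, which is the same kind of blow-up described in this task.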

Xqt raised the priority of this task from High to Unbreak Now!.Jul 16 2015, 3:32 PM

This breaks various scripts, e.g. when removing templates. Currently it fails on various pages for

pwb.py template.py -remove "Navigationsleiste Oscar Bester fremdsprachiger Film" -summary:"Bot: Entferne gelöschte Navigation Oscar Bester fremdsprachiger Film"

and cc is largely unusable.

Change 226500 had a related patch set uploaded (by Xqt):
[WIP] Bugfix for T105621

https://gerrit.wikimedia.org/r/226500

jayvdb updated the task description.Jul 23 2015, 2:16 PM

Change 226531 had a related patch set uploaded (by John Vandenberg):
Reduce complexity of NESTED_TEMPLATE_REGEX

https://gerrit.wikimedia.org/r/226531

Change 224301 abandoned by Xqt:
Deactivate removeUselessSpaces due to several bugs

https://gerrit.wikimedia.org/r/224301

jayvdb added a subscriber: jayvdb.EditedJul 24 2015, 6:28 AM

d3e68d3fc rebuilt the template regex to work around sf.net bugs #3603994, #2819291, #3158761 ('new' ids #1575, 973, 1283)

This task has been "Unbreak Now!" priority for nearly two weeks. Please set an assignee.

Xqt assigned this task to jayvdb.Jul 27 2015, 9:02 AM

Change 226531 merged by jenkins-bot:
Fix NESTED_TEMPLATE_REGEX

https://gerrit.wikimedia.org/r/226531

Xqt lowered the priority of this task from Unbreak Now! to Low.Jul 29 2015, 4:04 AM

Change 226500 abandoned by Xqt:
[WIP] Bugfix for T105621

https://gerrit.wikimedia.org/r/226500

Are there any outstanding problems here?

jayvdb closed this task as Resolved.Oct 23 2015, 12:58 AM

Re-open if there are still issues to be resolved.

Xqt reopened this task as Open.Jul 15 2016, 12:31 PM

Reopened due to problems with nested templates, which caused the bot to be blocked.

Change 299137 had a related patch set uploaded (by Xqt):
Deactivate removeUselessSpaces due to several bugs

https://gerrit.wikimedia.org/r/299137

Change 299137 merged by jenkins-bot:
Deactivate removeUselessSpaces due to several bugs

https://gerrit.wikimedia.org/r/299137

Xqt raised the priority of this task from Low to High.Jul 18 2016, 4:25 AM

This problem, related to extremely high CPU usage and pywikibot hanging, is fixed.

I have created T140608 to investigate your problem.

jayvdb closed this task as Resolved.Jul 18 2016, 12:26 PM