template.py fails removing a template
Closed, Duplicate (Public)

Assigned To

None

Authored By

	Xqt
	Jul 6 2019, 7:19 PM

Description

template.py fails when removing a template: the bot also removes the table that follows it:

https://de.wikipedia.org/w/index.php?title=Flughafen_Tivat&diff=190185925&oldid=189561758&diffmode=source

Probably this is caused by cosmetic_changes.py

Related Objects

Status	Assigned	Task
Open	None	T229723 Insufficient wikitext regex parser functions in textlib (tracking)
Duplicate	None	T227386 template.py fails removing a template
Resolved	Xqt	T106763 Mandatory dependency on mwparserfromhell
Resolved	Ladsgroup	T88069 Add pure python mwparserfromhell to nightlies

Event Timeline

Xqt created this task. Jul 6 2019, 7:19 PM

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Jul 6 2019, 7:19 PM

Xqt triaged this task as High priority. Jul 6 2019, 7:19 PM

Something similar happened to me lately, I think there could be some issue with TemplateMatchBuilder in textlib.py?

In T227386#5311165, @Dvorapa wrote:

Something similar happened to me lately, I think there could be some issue with TemplateMatchBuilder in textlib.py?

Probably yes. I never trust them because there is a restriction on nested templates. Maybe we should use mwparserfromhell or @Dalba's wikitextparser and make it mandatory.

Okay, just tested, this is the issue:

template.py

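# Build a regex that matches the template to be removed
builder = textlib._MultiTemplateMatchBuilder(self.site)
template_regex = builder.pattern(old)
# ... (intervening lines of template.py not shown)
elif self.getOption('remove'):
    # 1. Template on a line of its own, including the trailing newline
    separate_line_regex = re.compile(
        r'^[*#:]* *{0} *\n'.format(template_regex.pattern),
        re.DOTALL | re.MULTILINE)
    replacements.append((separate_line_regex, ''))

    # 2. Template surrounded by spaces, collapsed to a single space
    spaced_regex = re.compile(
        r' +{0} +'.format(template_regex.pattern),
        re.DOTALL)
    replacements.append((spaced_regex, ' '))

    # 3. The bare template itself
    replacements.append((template_regex, ''))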

template.py compiles new regexes from the builder's pattern: it first removes the template plus a newline, then the template plus surrounding spaces, and finally the template itself. This is not a good approach, because the regex from textlib is not designed to be extended like this. With re.DOTALL, the lazy .+? in the parameters group keeps expanding until it can fulfill the mandatory newline/space at the end, so it matches far more than it should:

$ python pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> s=pywikibot.Site('de')
>>> p=pywikibot.Page(s, 'Wikipedia:Spielwiese')
>>> from pywikibot import textlib
>>> builder = textlib._MultiTemplateMatchBuilder(s)
>>> t='Flughafen-Verkehrsaufkommen'
>>> template_regex = builder.pattern(t)
>>> import re
>>> separate_line_regex = re.compile(
...                     r'^[*#:]* *{0} *\n'.format(template_regex.pattern),
...                     re.DOTALL | re.MULTILINE)
>>> spaced_regex = re.compile(
...                     r' +{0} +'.format(template_regex.pattern),
...                     re.DOTALL)
>>> l=str(p.text)
>>> re.search(spaced_regex, l)
>>> re.search(separate_line_regex, l)
<re.Match object; span=(4903, 5663), match='{{Flughafen-Verkehrsaufkommen|iata="TIV"|Legende=>
>>> re.search(template_regex, l)
<re.Match object; span=(4903, 4964), match='{{Flughafen-Verkehrsaufkommen|iata="TIV"|Legende=>
>>> re.search(template_regex, l).group(0)
'{{Flughafen-Verkehrsaufkommen|iata="TIV"|Legende=|width=800}}'
>>> re.search(separate_line_regex, l).group(0)
'{{Flughafen-Verkehrsaufkommen|iata="TIV"|Legende=|width=800}}<!--  ENDE der Grafikdefinition   -->\n<!--                              -->\n\n{| class="wikitable sortable zebra" style="text-align:right;"\n|+ Flughafen Tivat – Verkehrszahlen 2005–2017<ref name="statistics" />\n|-\n! Jahr !! Fluggastaufkommen !! Flugbewegungen\n|-\n| 2017 || 1.129.716 || 6.323\n|-\n| 2016 || 979.432 || 5.985\n|-\n| 2015 || 895.050 || 5.422\n|-\n| 2014 || 910.264 || 5.281\n|-\n| 2013 || 868.343 || 5.198\n|-\n| 2012 || 725.412 || 4.605\n|-\n| 2011 || 647.184 || 4.531\n|-\n| 2010 || 541.870 || 4.017\n|-\n| 2009 || 532.080 || 4.226\n|-\n| 2008 || 570.636 || 4.630\n|-\n| 2007 || 574.011 || 4.079\n|-\n| 2006 || 451.289 || 3.261\n|-\n| 2005 || 377.013 || 2.522\n|}\n\n== Weblinks ==\n{{commonscat|Tivat Airport}}\n'
>>> separate_line_regex
re.compile('^[*#:]* *\\{\\{ *([Vv][Oo][Rr][Ll][Aa][Gg][Ee]:|[Tt][Ee][Mm][Pp][Ll][Aa][Tt][Ee]:|[mM][sS][gG]:)?[Ff]lughafen\\-Verkehrsaufkommen(?P<parameters>\\s*\\|.+?|) *}} *\\n', re.MULTILINE|re.DOTALL)
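The over-match can also be reproduced without Pywikibot. The following is a minimal sketch using only the standard re module. The template name Foo, the sample text and the simplified pattern are assumptions that stand in for the builder output shown above; they are not the real textlib pattern.

```python
import re

# Simplified stand-in for the pattern built by textlib (assumption):
# the parameters group uses a lazy ".+?" exactly like the real one.
template_pattern = r'\{\{ *Foo(?P<parameters>\s*\|.+?|) *}}'

template_regex = re.compile(template_pattern, re.DOTALL)

# Same extension that template.py applies for the "own line" case.
separate_line_regex = re.compile(
    r'^[*#:]* *{0} *\n'.format(template_pattern),
    re.DOTALL | re.MULTILINE)

# The template is followed by a comment, so "}} *\n" cannot match
# directly after it. A table and another template follow.
text = (
    '{{Foo|a=1}}<!-- comment -->\n'
    '{| class="wikitable"\n'
    '| cell\n'
    '|}\n'
    '\n'
    '{{Bar|x}}\n'
)

plain = template_regex.search(text).group(0)
extended = separate_line_regex.search(text).group(0)

print(repr(plain))       # only the template itself
print(extended == text)  # the extended regex swallows everything up to {{Bar|x}}\n
```

In this sketch the lazy group cannot stop at the first }} because no newline follows it, so it expands across the table until it reaches the next }} that is followed by a newline, which belongs to a different template.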

In T227386#5311328, @Xqt wrote:

I never trust them because there is a restriction on nested templates.

Me too

Maybe we should use mwparserfromhell or @Dalba's wikitextparser and make it mandatory.

Maybe in the future, I like the idea of using and cooperating with other py-wiki projects.

The regex from textlib is not prepared to be extended like this. It then tries to fulfill the mandatory newline/space at the end and matches way more than it should.

Okay, this will need a better approach. There is not much we can do on either the template.py or the textlib.py side. The options I see are:
a) prepare a better regex in place in template.py
b) use mwparserfromhell/wikitextparser here instead (adds a mandatory dependency)
c) fix the TemplateMatchBuilder regex for these cases
d) use recursive patterns, (?R), from the PyPI regex library in textlib.py (adds a mandatory dependency)
Are there any other possibilities?
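To illustrate what options b) and d) provide that a plain regex does not, here is a minimal sketch of brace counting with the standard library only. The helper find_template_end is hypothetical: it is not part of Pywikibot, mwparserfromhell or wikitextparser, and it ignores cases a real parser has to handle, such as {{{parameter}}} syntax, nowiki sections and comments.

```python
def find_template_end(text: str, start: int) -> int:
    """Return the index just past the '}}' that closes the template at start.

    Hypothetical helper for illustration: it only counts '{{' and '}}'.
    """
    if not text.startswith('{{', start):
        raise ValueError('no template opens at this position')
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == '{{':
            depth += 1      # entering a (possibly nested) template
            i += 2
        elif pair == '}}':
            depth -= 1      # leaving a template
            i += 2
            if depth == 0:
                return i    # the outermost template is closed
        else:
            i += 1
    raise ValueError('unbalanced braces')


# Sample wikitext (assumption): a nested template followed by a table.
nested = '{{Foo|a={{Bar|x}}|b=2}}\n{| class="wikitable"\n|}\n'
end = find_template_end(nested, 0)
print(repr(nested[:end]))
```

Because the scan stops as soon as the depth returns to zero, the table after the template is never touched, and the nested {{Bar|x}} does not end the match early.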

Probably use the following pattern, where . is replaced with [^}]?
\{\{ *(Vorlage:|Template:|[mM][sS][gG]:)?Flughafen-Verkehrsaufkommen(?P<parameters>\s*\|[^}]+?|) *}}
Nested templates aren't supported there; see the TODO comment.

Maybe? Or better to use NESTED_TEMPLATE_REGEX as suggested?

In T227386#5311416, @Dvorapa wrote:

Maybe? Or better to use NESTED_TEMPLATE_REGEX as suggested?

Replacing . with [^}] causes template_bot_tests.py to fail. There seems to be no trivial solution.
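The trade-off of the [^}] variant can be sketched with the standard re module alone. The template name Foo, the sample texts and the simplified pattern are assumptions for illustration, and this sketch does not claim to reproduce the actual failures in template_bot_tests.py.

```python
import re

# Simplified pattern with "." replaced by "[^}]" (assumption: the real
# textlib pattern also covers namespace prefixes and case variants).
restricted_pattern = r'\{\{ *Foo(?P<parameters>\s*\|[^}]+?|) *}}'
restricted_regex = re.compile(restricted_pattern)
restricted_line_regex = re.compile(
    r'^[*#:]* *{0} *\n'.format(restricted_pattern), re.MULTILINE)

# Case 1: template followed by a comment and a table.
flat = (
    '{{Foo|a=1}}<!-- comment -->\n'
    '{| class="wikitable"\n'
    '| cell\n'
    '|}\n'
    '\n'
    '{{Bar|x}}\n'
)
# The "own line" regex can no longer run past the first "}", so it
# simply does not match; the bare template regex still finds the template.
no_overmatch = restricted_line_regex.search(flat)
bare = restricted_regex.search(flat).group(0)

# Case 2: a nested template inside the parameters.
nested = '{{Foo|a={{Bar|x}}|b=2}}\n'
# The match ends at the inner "}}", leaving "|b=2}}" behind.
truncated = restricted_regex.search(nested).group(0)

print(no_overmatch)
print(repr(bare))
print(repr(truncated))
```

So in this sketch the change cures the over-match but cuts a template short as soon as one of its parameters contains another template, which fits the remark that nested templates are not supported by this pattern.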

Xqt added a parent task: T229723: Insufficient wikitext regex parser functions in textlib (tracking). Aug 3 2019, 9:49 AM

Xqt added a subtask: T106763: Mandatory dependency on mwparserfromhell. Apr 12 2021, 4:41 PM

JJMC89 closed subtask T106763: Mandatory dependency on mwparserfromhell as Resolved. Apr 13 2021, 4:05 PM

Xqt closed this task as a duplicate of T110529: template.py does not recognize nested templates. Mar 15 2022, 8:24 PM
