template.py fails removing a template; the bot removes a following table:
Probably this is caused by cosmetic_changes.py
Status | Assigned | Task
---|---|---
Open | None | T229723 Insufficient wikitext regex parser functions in textlib (tracking)
Duplicate | None | T227386 template.py fails removing a template
Resolved | Xqt | T106763 Mandatory dependency on mwparserfromhell
Resolved | Ladsgroup | T88069 Add pure python mwparserfromhell to nightlies
Something similar happened to me lately, I think there could be some issue with TemplateMatchBuilder in textlib.py?
Probably yes. I never trust them because there is a restriction on nested templates. Maybe we should use mwparserfromhell or @Dalba's wikitextparser and make it mandatory.
Okay, just tested, this is the issue:
    builder = textlib._MultiTemplateMatchBuilder(self.site)
    template_regex = builder.pattern(old)
    # [...]
    elif self.getOption('remove'):
        separate_line_regex = re.compile(
            r'^[*#:]* *{0} *\n'.format(template_regex.pattern),
            re.DOTALL | re.MULTILINE)
        replacements.append((separate_line_regex, ''))
        spaced_regex = re.compile(
            r' +{0} +'.format(template_regex.pattern),
            re.DOTALL)
        replacements.append((spaced_regex, ' '))
        replacements.append((template_regex, ''))
template.py compiles new regexes from the builder's pattern: it first removes template + newline, then template + space, and finally the template itself. This is not a good approach, because the regex from textlib is not designed to be extended like this. With re.DOTALL, the lazy .+? in the parameters group keeps expanding until it can fulfill the mandatory newline/space at the end, so it matches far more than it should:
    $ python pwb.py shell
    Welcome to the Pywikibot interactive shell!
    >>> s=pywikibot.Site('de')
    >>> p=pywikibot.Page(s, 'Wikipedia:Spielwiese')
    >>> from pywikibot import textlib
    >>> builder = textlib._MultiTemplateMatchBuilder(s)
    >>> t='Flughafen-Verkehrsaufkommen'
    >>> template_regex = builder.pattern(t)
    >>> import re
    >>> separate_line_regex = re.compile(
    ...     r'^[*#:]* *{0} *\n'.format(template_regex.pattern),
    ...     re.DOTALL | re.MULTILINE)
    >>> spaced_regex = re.compile(
    ...     r' +{0} +'.format(template_regex.pattern),
    ...     re.DOTALL)
    >>> l=str(p.text)
    >>> re.search(spaced_regex, l)
    >>> re.search(separate_line_regex, l)
    <re.Match object; span=(4903, 5663), match='{{Flughafen-Verkehrsaufkommen|iata="TIV"|Legende=>
    >>> re.search(template_regex, l)
    <re.Match object; span=(4903, 4964), match='{{Flughafen-Verkehrsaufkommen|iata="TIV"|Legende=>
    >>> re.search(template_regex, l).group(0)
    '{{Flughafen-Verkehrsaufkommen|iata="TIV"|Legende=|width=800}}'
    >>> re.search(separate_line_regex, l).group(0)
    '{{Flughafen-Verkehrsaufkommen|iata="TIV"|Legende=|width=800}}<!-- ENDE der Grafikdefinition -->\n<!-- -->\n\n{| class="wikitable sortable zebra" style="text-align:right;"\n|+ Flughafen Tivat – Verkehrszahlen 2005–2017<ref name="statistics" />\n|-\n! Jahr !! Fluggastaufkommen !! Flugbewegungen\n|-\n| 2017 || 1.129.716 || 6.323\n|-\n| 2016 || 979.432 || 5.985\n|-\n| 2015 || 895.050 || 5.422\n|-\n| 2014 || 910.264 || 5.281\n|-\n| 2013 || 868.343 || 5.198\n|-\n| 2012 || 725.412 || 4.605\n|-\n| 2011 || 647.184 || 4.531\n|-\n| 2010 || 541.870 || 4.017\n|-\n| 2009 || 532.080 || 4.226\n|-\n| 2008 || 570.636 || 4.630\n|-\n| 2007 || 574.011 || 4.079\n|-\n| 2006 || 451.289 || 3.261\n|-\n| 2005 || 377.013 || 2.522\n|}\n\n== Weblinks ==\n{{commonscat|Tivat Airport}}\n'
    >>> separate_line_regex
    re.compile('^[*#:]* *\\{\\{ *([Vv][Oo][Rr][Ll][Aa][Gg][Ee]:|[Tt][Ee][Mm][Pp][Ll][Aa][Tt][Ee]:|[mM][sS][gG]:)?[Ff]lughafen\\-Verkehrsaufkommen(?P<parameters>\\s*\\|.+?|) *}} *\\n', re.MULTILINE|re.DOTALL)
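The same overshoot can be reproduced without a wiki, using only the standard library. This is a minimal sketch: the template name Foo, the sample text and the simplified pattern (no namespace prefix, no case handling) are assumptions made for illustration, not what the builder actually emits.

```python
import re

# Simplified stand-in for the builder's pattern (assumption: namespace
# prefix and first-letter case handling are left out for brevity).
template_pattern = r'\{\{ *Foo(?P<parameters>\s*\|.+?|) *}}'
template_regex = re.compile(template_pattern, re.DOTALL)

# Same wrapping as in template.py: the template must be followed by
# optional spaces and a newline.
separate_line_regex = re.compile(
    r'^[*#:]* *{0} *\n'.format(template_pattern),
    re.DOTALL | re.MULTILINE)

# The template is followed by a comment, so it is NOT alone on its line.
text = (
    '{{Foo|a=1}}<!-- comment -->\n'
    '{| class="wikitable"\n'
    '| cell\n'
    '|}\n'
    '{{Bar}}\n'
    'tail\n'
)

plain = template_regex.search(text).group(0)
wrapped = separate_line_regex.search(text).group(0)

print(repr(plain))    # only the template itself
print(repr(wrapped))  # runs on to the next '}}' that is followed by '\n'
```

Because the first `}}` is not followed by a newline, the lazy group keeps consuming text (including the table) until it reaches `{{Bar}}\n`, which is the first place where the required suffix fits.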
Me too
Maybe we should use mwparserfromhell or @Dalba's wikitextparser and make it mandatory.
Maybe in the future, I like the idea of using and cooperating with other py-wiki projects.
The regex from textlib is not prepared to be extended like this. It then tries to fulfill the mandatory newline/space at the end and matches way more than it should.
Okay, this will need a better approach. On both the template.py and textlib.py sides we cannot do much. We can:
a) prepare a better regex in template.py in-place
b) use mwparserfromhell/wikitextparser here instead (adds a mandatory dependency)
c) fix the TemplateMatchBuilder regex for these cases
d) use recursive patterns (?R) from the PyPI regex library in textlib.py (adds a mandatory dependency)
(Any other possibilities?)
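One possible shape for option a) is to stop embedding the builder's pattern in a larger regex: match the template with the unmodified regex and then look at the text around each match. The sketch below is only an illustration under that assumption. `remove_template` is a hypothetical helper, not part of Pywikibot, the pattern is a simplified stand-in for the builder's output, and the "template between spaces" case from template.py is left out.

```python
import re

# Compiled once; used with explicit positions to inspect the text
# around a match without touching the template regex itself.
PREFIX = re.compile(r'[*#:]* *')
SUFFIX = re.compile(r' *\n')


def remove_template(text, template_regex):
    """Remove all matches; drop the whole line if a match is alone on it.

    Hypothetical helper for illustration, not a Pywikibot API.
    """
    result = []
    pos = 0
    for match in template_regex.finditer(text):
        start, end = match.span()
        line_start = text.rfind('\n', 0, start) + 1
        # Only list/indent markers and spaces before the template?
        prefix = PREFIX.fullmatch(text, line_start, start)
        # Only spaces and a newline directly after the template?
        suffix = SUFFIX.match(text, end)
        if prefix and suffix:
            start = line_start
            end = suffix.end()
        result.append(text[pos:start])
        pos = end
    result.append(text[pos:])
    return ''.join(result)


# Simplified stand-in for the builder's pattern (assumption).
template_regex = re.compile(
    r'\{\{ *Foo(?P<parameters>\s*\|.+?|) *}}', re.DOTALL)

text = (
    '{{Foo|a=1}}<!-- c -->\n'
    '{| class="wikitable"\n'
    '| cell\n'
    '|}\n'
    '* {{Foo}}\n'
    'keep {{Foo|b=2}} this\n'
)
cleaned = remove_template(text, template_regex)
print(cleaned)
```

Since the suffix is checked only directly after the end of a finished match, it cannot push the lazy parameter group past the template's own closing braces. Nested templates remain unsupported, because that limitation sits in the template regex itself.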
Probably something like
\{\{ *(Vorlage:|Template:|[mM][sS][gG]:)?Flughafen-Verkehrsaufkommen(?P<parameters>\s*\|[^}]+?|) *}}
for the pattern, where . is replaced with [^}]?
Nested templates aren't supported there, see the TODO comment.
Replacing . with [^}] causes template_bot_tests.py to fail. It seems there is no trivial solution.
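A small comparison shows what the [^}] variant changes. This is a sketch under assumptions: the template names, the sample strings and the simplified patterns are made up for illustration, and it does not claim to reproduce what template_bot_tests.py actually checks.

```python
import re

# Simplified stand-ins for the two parameter patterns (assumption:
# namespace prefix and case handling omitted).
lazy_dot = r'\{\{ *Foo(?P<parameters>\s*\|.+?|) *}}'
no_brace = r'\{\{ *Foo(?P<parameters>\s*\|[^}]+?|) *}}'


def separate_line(pattern):
    """Wrap a template pattern the way template.py does for removal."""
    return re.compile(
        r'^[*#:]* *{0} *\n'.format(pattern),
        re.DOTALL | re.MULTILINE)


# Case 1: template followed by a comment, another template further down.
overshoot_text = '{{Foo|a=1}}<!-- c -->\n{{Bar}}\n'
# Case 2: template alone on its line, with a nested template inside.
nested_text = '{{Foo|a={{Bar}}|b=2}}\n'

# '.' overshoots to the later '}}\n'; '[^}]' cannot cross '}' and gives up.
dot_overshoot = separate_line(lazy_dot).search(overshoot_text)
brace_overshoot = separate_line(no_brace).search(overshoot_text)

# With a nested template, '[^}]' stops at the inner '}}' ...
brace_plain_nested = re.compile(no_brace, re.DOTALL).search(nested_text)
# ... so the wrapped regex no longer matches the line at all,
# whereas the '.' version still matches it.
brace_line_nested = separate_line(no_brace).search(nested_text)
dot_line_nested = separate_line(lazy_dot).search(nested_text)

print(dot_overshoot.group(0) == overshoot_text)
print(brace_overshoot)
print(brace_plain_nested.group(0))
print(brace_line_nested)
print(dot_line_nested.group(0) == nested_text)
```

So [^}] removes the overshoot in the first case, but it changes the behaviour for any parameter value that contains a closing brace, which may be the kind of regression the tests catch.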