
_tag_pattern in textlib.py can not handle self-closed tags
Closed, Resolved · Public

Description

Hello,

When using "subst" or "assubst" with template.py, some template calls are ignored, because "subst" doesn't work between "ref", "gallery" and "nowiki" tags. That exclusion itself works fine: I haven't seen any errors caused by it.

The problem is that the exclusion is too broad. It seems that "<ref />" tags are not interpreted correctly, so many templates that could be substed are not.

Is it possible to improve the code for these exclusions, to get fewer false positives?

Event Timeline

@Hercule: Hi! Steps to reproduce very welcome - is this about Pywikibot, or what is the context?
What is the "improvement needed" in one short sentence, for a more descriptive task summary? Thanks!
Also see https://mediawiki.org/wiki/How_to_report_a_bug

Hello,

Sorry, this is my first bug report.

I use the standard Pywikibot scripts.

For example, with this command line:
python /data/project/herculebot/pywikibot-core/pwb.py template -subst -summary:"[[Catégorie:Modèle à subster|modèles à subster]]" -pt:0 AnnTV AnnCin AnnLit AnnMus -always

For this version of the page:
https://fr.wikipedia.org/w/index.php?title=Cali_(chanteur)&oldid=151930327
there are no replacements.

I then just moved some references:
https://fr.wikipedia.org/w/index.php?title=Cali_(chanteur)&diff=prev&oldid=151940984

After that, the same command line makes some replacements:
https://fr.wikipedia.org/w/index.php?title=Cali_(chanteur)&diff=prev&oldid=151940985

The improvement would be a better detection of the problematic tags in the code of the page. I think that <ref name=xxx/> is interpreted as an opening tag, so the closing one is not detected.

Regards

Aklapper renamed this task from Improvement needed on exclusions for templates substitutions to Better detection of problematic tags (ref, gallery, nowiki) when using subst or assubst for template substitution. Sep 5 2018, 2:50 PM
Aklapper added a project: Pywikibot.

because "subst" doesn't work between "ref", "gallery" and "nowiki" tags

As far as I can see in the code, -subst was never intended to work inside 'ref', 'gallery', 'poem' or 'pagelist' tags.

I would suggest investigating this first:

I think that <ref name=xxx/> is interpreted as an opening tag, so the closing one is not detected.

And btw the doc says:

-subst       Resolves the template by putting its text directly into the
             article. This is done by changing {{...}} or {{msg:...}} into
             {{subst:...}}.
             Substitution is not available inside <ref>...</ref>,
             <gallery>...</gallery>, <poem>...</poem> and <pagelist ... />
             tags.

What to do here?
See also:
https://mediawiki.org/wiki/Special:Code/pywikipedia/9543
https://mediawiki.org/wiki/Special:Code/pywikipedia/7884

@Xqt I think you misunderstood what the OP meant. On the example page, Pywikibot found nothing to change; then the OP moved some references in the text, and after that Pywikibot did find things to change.

Dvorapa renamed this task from Better detection of problematic tags (ref, gallery, nowiki) when using subst or assubst for template substitution to _tag_pattern in textlib.py can not handle self-closed tags after inline flags removal. Sep 5 2018, 4:31 PM
Dvorapa added a subscriber: Dalba.

I think I've found the root cause: https://phabricator.wikimedia.org/diffusion/PWBC/browse/master/pywikibot/textlib.py$261

The regex for 'ref' also matches code like this:

<ref name=etwa /> still matched {{template-for-subst}} still matched as one ref tag <ref name=anders>ref II</ref>

and handles this whole piece as an exception, i.e. excludes it from replacement (since rPWBCa3e28f6eab1e3dc8fabe1d080def51430db9f49f). This is wrong behavior.
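
The over-matching can be illustrated outside Pywikibot with a simplified pattern. This is only a sketch: `naive_ref` is a hypothetical regex written for illustration, not the actual `_tag_pattern` source. It assumes the problem is that anything starting with `<ref ` and ending with `>` is accepted as a start tag, including `<ref ... />`.

```python
import re

# Hypothetical, simplified pattern (NOT the real Pywikibot regex):
# any "<ref ...>" counts as a start tag, even the self-closing "<ref ... />".
naive_ref = re.compile(r'<ref(?:>|\s[^>]*>).*?</ref\s*>',
                       re.IGNORECASE | re.DOTALL)

text = ('<ref name=etwa /> still matched {{template-for-subst}} '
        'still matched as one ref tag <ref name=anders>ref II</ref>')

match = naive_ref.search(text)

# The match starts at the self-closing tag and only ends at the later
# </ref>, so the template in between falls inside the excluded region.
print(match.group(0) == text)
print('{{template-for-subst}}' in match.group(0))
```

Both checks print `True` with this simplified pattern, which is the same symptom as described above: the template sits inside what is treated as a single ref tag.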

Dvorapa renamed this task from _tag_pattern in textlib.py can not handle self-closed tags after inline flags removal to _tag_pattern in textlib.py can not handle self-closed tags. Sep 5 2018, 4:36 PM
Dvorapa triaged this task as High priority.
Dvorapa added a project: Pywikibot-textlib.

Try it yourself:

$ python pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> t="<ref name=etwa /> text <ref> fer </ref>"
>>> import pywikibot.textlib as tl
>>> tl.replaceExcept(t, r"text", r"xext", ['ref'])
'<ref name=etwa /> text <ref> fer </ref>'
>>> tl.replaceExcept(t, r"text", r"xext", [])
'<ref name=etwa /> xext <ref> fer </ref>'

Change 458636 had a related patch set uploaded (by Dalba; owner: dalba):
[pywikibot/core@master] textlib._tag_pattern: Do not mistake self-closing tags with start tag

https://gerrit.wikimedia.org/r/458636

Change 458636 merged by jenkins-bot:
[pywikibot/core@master] textlib._tag_pattern: Do not mistake self-closing tags with start tag

https://gerrit.wikimedia.org/r/458636
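
The merged change itself is not reproduced here. As a rough illustration of the idea in its title (do not mistake a self-closing tag for a start tag), the sketch below uses a hypothetical simplified pattern, `fixed_ref`, and a hypothetical helper, `replace_outside_refs`. Neither is Pywikibot code; the negative lookbehind is just one possible way to reject `/>`.

```python
import re

# Hypothetical simplified pattern, not the merged Pywikibot code:
# a start tag with attributes must not end in "/>", which the
# (?<!/) lookbehind enforces.
fixed_ref = re.compile(r'<ref(?:>|\s[^>]*(?<!/)>).*?</ref\s*>',
                       re.IGNORECASE | re.DOTALL)


def replace_outside_refs(text, old, new):
    """Replace `old` with `new` only outside <ref>...</ref> regions."""
    result = []
    pos = 0
    for m in fixed_ref.finditer(text):
        # Text before the ref region is open to replacement.
        result.append(text[pos:m.start()].replace(old, new))
        # The ref region itself is copied unchanged.
        result.append(m.group(0))
        pos = m.end()
    result.append(text[pos:].replace(old, new))
    return ''.join(result)


t = '<ref name=etwa /> text <ref> fer </ref>'
print(replace_outside_refs(t, 'text', 'xext'))
```

With this sketch the self-closing `<ref name=etwa />` no longer opens an excluded region, so the `text` after it is replaced, while content inside a real `<ref>...</ref>` pair stays untouched.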