extract_templates_and_params parser bugs loading w:en:Main_Page with mwparserfromhell
Closed, Resolved (Public)

Description

Calling extract_templates_and_params with use_mwparserfromhell enabled on the English Wikipedia Main Page returns many entries that are not templates.

For example:

$ PYTHONPATH="." python -c "import pywikibot; pywikibot.config.use_mwparserfromhell=True; print pywikibot.extract_templates_and_params(pywikibot.Page(pywikibot.Site('en', 'wikipedia'), 'Main Page').text)"

produces:

[(u'NUMBEROFARTICLES', {}), (u'#if:{{Main Page banner}}', {u'1': u'\n<table id="mp-banner" style="width: 100%; margin:4px 0 0 0; background:none; border-spacing: 0px;">\n<tr><td class="MainPageBG" style="padding:2px 8px; background-color:#fffaf5; border:1px solid #f2e0ce; color:#000; font-size:100%;">{{Main Page banner}}\n</td></tr>\n</table>\n'}), (u'Main Page banner', {}), (u'Main Page banner', {}), (u"#ifexpr:{{formatnum:{{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}}|R}}>150", {u'1': u"From today's featured article", u'2': u'Featured article <span style="font-size:85%; font-weight:normal;">(Check back later for today\'s.)</span>'}), (u"formatnum:{{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}}", {u'1': u'R'}), (u"PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}", {}), (u'#time:F j, Y', {}), (u"#ifexpr:{{formatnum:{{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}}|R}}>150", {u'1': u"{{Wikipedia:Today's featured article/{{#time:F j, Y}}}}", u'2': u"{{Wikipedia:Today's featured article/{{#time:F j, Y|-1 day}}}}"}), (u"formatnum:{{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}}", {u'1': u'R'}), (u"PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}", {}), (u'#time:F j, Y', {}), (u"Wikipedia:Today's featured article/{{#time:F j, Y}}", {}), (u'#time:F j, Y', {}), (u"Wikipedia:Today's featured article/{{#time:F j, Y|-1 day}}", {}), (u'#time:F j, Y', {u'1': u'-1 day'}), (u'Did you know', {}), (u'In the news', {}), (u'Wikipedia:Selected anniversaries/{{#time:F j}}', {}), (u'#time:F j', {}), (u'#switch:{{CURRENTDAYNAME}}', {u'1': u'Monday', u'2': u'', u'Friday': u'\n<table id="mp-middle" style="width:100%; margin:4px 0 0 0; background:none; border-spacing: 0px;">\n<tr>\n<td class="MainPageBG" style="width:100%; border:1px solid #f2cedd; background:#fff5fa; vertical-align:top; color:#000;">\n<table id="mp-center" style="width:100%; vertical-align:top; background:#fff5fa; 
color:#000;">\n<tr>\n<td style="padding:2px;"><h2 id="mp-tfl-h2" style="margin:3px; background:#f2cedd; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #bfa3af; text-align:left; color:#000; padding:0.2em 0.4em">From today\'s featured list</h2></td>\n</tr><tr>\n<td style="color:#000;"><div id="mp-tfl" style="padding:2px 5px;">{{#ifexist:Wikipedia:Today\'s featured list/{{#time:F j, Y}}|{{Wikipedia:Today\'s featured list/{{#time:F j, Y}}}}|{{TFLempty}}}}</div></td>\n</tr>\n</table>\n</td>\n</tr>\n</table>'}), (u'CURRENTDAYNAME', {}), (u"#ifexist:Wikipedia:Today's featured list/{{#time:F j, Y}}", {u'1': u"{{Wikipedia:Today's featured list/{{#time:F j, Y}}}}", u'2': u'{{TFLempty}}'}), (u'#time:F j, Y', {}), (u"Wikipedia:Today's featured list/{{#time:F j, Y}}", {}), (u'#time:F j, Y', {}), (u'TFLempty', {}), (u'#ifexist:Template:POTD protected/{{#time:Y-m-d}}', {u'1': u"Today's featured picture ", u'2': u' Featured picture&ensp;<span style="font-size:85%; font-weight:normal;">(Check back later for today\'s.)</span>'}), (u'#time:Y-m-d', {}), (u'#ifexist:Template:POTD protected/{{#time:Y-m-d}}', {u'1': u'{{POTD protected/{{#time:Y-m-d}}}}', u'2': u'{{POTD protected/{{#time:Y-m-d|-1 day}}}}'}), (u'#time:Y-m-d', {}), (u'POTD protected/{{#time:Y-m-d}}', {}), (u'#time:Y-m-d', {}), (u'POTD protected/{{#time:Y-m-d|-1 day}}', {}), (u'#time:Y-m-d', {u'1': u'-1 day'}), (u'Other areas of Wikipedia', {}), (u"Wikipedia's sister projects", {}), (u'Wikipedia languages', {}), (u'Main Page interwikis', {}), (u'noexternallanglinks', {})]

Compare with the output when use_mwparserfromhell is disabled:

$ PYTHONPATH="." python -c "import pywikibot; pywikibot.config.use_mwparserfromhell=False; print repr(pywikibot.extract_templates_and_params(pywikibot.Page(pywikibot.Site('en', 'wikipedia'), 'Main Page').text))"
[(u'NUMBEROFARTICLES', {}), (u'Main Page banner', {}), (u'Did you know', {}), (u'In the news', {}), (u'CURRENTDAYNAME', {}), (u'TFLempty', {}), (u'Other areas of Wikipedia', {}), (u"Wikipedia's sister projects", {}), (u'Wikipedia languages', {}), (u'Main Page interwikis', {}), (u'noexternallanglinks', {})]

Version: core-(2.0)
Severity: normal
See Also:
https://github.com/earwig/mwparserfromhell/issues/10

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:37 AM
bzimport set Reference to bz69384.
bzimport added a subscriber: Unknown Object (????).

Note that Page.botMayEdit() uses this method via Page.templatesWithParams() to look for {{nobots}}, and needs to catch an exception when it tries to instantiate a Link using these invalid 'template' names.
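
As a rough illustration of the kind of guard described above, the sketch below drops parsed names that could never be a page title before anything tries to build a link from them. It is not pywikibot's actual code: the helper name plausible_template_names and the sample list are hypothetical, and the check relies only on the characters MediaWiki disallows in titles (# < > [ ] | { }). The 'nobots' entry is added for illustration and does not appear in the Main Page output.

```python
import re

# Characters that MediaWiki does not allow in page titles.
_ILLEGAL_TITLE_CHARS = re.compile(r'[#<>\[\]|{}]')


def plausible_template_names(parsed):
    """Yield (name, params) pairs whose name could be a real page title.

    `parsed` is assumed to have the same shape as the output shown in the
    description: a list of (name, params) tuples.
    """
    for name, params in parsed:
        if _ILLEGAL_TITLE_CHARS.search(name):
            # Parser functions and unexpanded nested calls end up here.
            continue
        yield name, params


# Hypothetical sample built from names that appear in this task.
sample = [
    ('NUMBEROFARTICLES', {}),
    ('#time:F j, Y', {'1': '-1 day'}),
    ("PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}", {}),
    ('Main Page banner', {}),
    ('nobots', {}),
]

kept = [name for name, _ in plausible_template_names(sample)]
print(kept)
```

Magic words such as NUMBEROFARTICLES still pass this check, since they are syntactically valid titles; the guard only prevents the exception, it does not decide what is really a template.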

See comment in

https://git.wikimedia.org/blobdiff/pywikibot%2Fcore.git/7e3772cae04f95cb55b223a198fb6350f73b0639/pywikibot%2Fpage.py

Dalba subscribed.

I cannot reproduce this issue. The result is the same regardless of the boolean value of use_mwparserfromhell. Can anyone else confirm, please?

Xqt triaged this task as Medium priority. Aug 3 2019, 10:07 AM
Xqt subscribed.

Still occurs with mwpfh 0.6. @Earwig: any idea?

PARSER

The following table shows the templates found:

| Entry | mwparserfromhell | regex | wikitextparser | mwpfh with matches |
| --- | --- | --- | --- | --- |
| NUMBEROFARTICLES | X | X | - | X |
| #if:{{Main Page banner}} | X | - | - | - |
| Main Page banner | X | X | X | X |
| Main Page banner | X | - | X | X |
| formatnum:{{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}} | X | - | - | X |
| PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}} | X | - | - | X |
| #time:F j, Y | X | - | - | - |
| formatnum:{{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}} | X | - | - | X |
| PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}} | X | - | - | X |
| #time:F j, Y | X | - | - | - |
| Wikipedia:Today's featured article/{{#time:F j, Y}} | X | - | X | X |
| #time:F j, Y | X | - | - | - |
| Wikipedia:Today's featured article/{{#time:F j, Y !-1 day}} | X | - | X | X |
| #time:F j, Y | X | - | - | - |
| Did you know | X | X | X | X |
| In the news | X | X | X | X |
| Wikipedia:Selected anniversaries/{{#time:F j}} | X | - | X | X |
| #time:F j | X | - | - | - |
| #switch:{{CURRENTDAYNAME}} | X | - | - | - |
| CURRENTDAYNAME | X | X | - | X |
| #ifexist:Wikipedia:Today's featured list/{{#time:F j, Y}} | X | - | - | - |
| #time:F j, Y | X | - | - | - |
| Wikipedia:Today's featured list/{{#time:F j, Y}} | X | - | X | X |
| #time:F j, Y | X | - | - | - |
| TFLempty | X | X | X | X |
| #ifexist:Template:POTD protected/{{#time:Y-m-d}} | X | - | - | - |
| #time:Y-m-d | X | - | - | - |
| #ifexist:Template:POTD protected/{{#time:Y-m-d}} | X | - | - | - |
| #time:Y-m-d | X | - | - | - |
| POTD protected/{{#time:Y-m-d}} | X | - | X | X |
| #time:Y-m-d | X | - | - | - |
| POTD protected/{{#time:Y-m-d !-1 day}} | X | - | X | X |
| #time:Y-m-d | X | - | - | - |
| Other areas of Wikipedia | X | X | X | X |
| Wikipedia's sister projects | X | X | X | X |
| Wikipedia languages | X | X | X | X |
| Main Page interwikis | X | X | X | X |
| noexternallanglinks | X | X | X | X |
| #if:{{Wikipedia:Main_Page/Tomorrow}} | X | - | - | - |
| Wikipedia:Main_Page/Tomorrow | X | X | X | X |

Looks like the best parsing comes with @Dalba's wikitextparser. Any suggestions?

Change 675309 had a related patch set uploaded (by Xqt; author: Xqt):
[pywikibot/core@master] [IMPR] exclude expressions from parsed templated

https://gerrit.wikimedia.org/r/675309

In T71384#6951234, @Xqt wrote:

Still occurs with mwpfh 0.6. @Earwig: any idea?

It's simply that mwparserfromhell doesn't try to ignore "templates" that are actually parser functions or magic words. There's an open issue for it, but I haven't looked at it in a while.

I've removed parser functions from the mwpfh result with the patch given above. Can you review it?
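
A minimal sketch of this kind of filtering, assuming the rule is simply "drop every name that starts with #", which is what the "mwpfh with matches" column in the table suggests. The helper names is_parser_function and drop_parser_functions are hypothetical and are not taken from the merged patch; the sample entries are copied from the output in the description.

```python
def is_parser_function(name):
    """Return True for names such as '#if:...' or '#time:...'."""
    return name.lstrip().startswith('#')


def drop_parser_functions(parsed):
    """Filter a list of (name, params) tuples, keeping non-'#' names."""
    return [(name, params) for name, params in parsed
            if not is_parser_function(name)]


# Sample entries taken from the mwparserfromhell output above.
sample = [
    ('#time:F j, Y', {'1': '-1 day'}),
    ('#time:Y-m-d', {}),
    ("formatnum:{{PAGESIZE:Wikipedia:Today's featured article/"
     "{{#time:F j, Y}}}}", {'1': 'R'}),
    ('Main Page banner', {}),
    ('Did you know', {}),
]

filtered = drop_parser_functions(sample)
print([name for name, _ in filtered])
```

Names such as formatnum:... and magic words such as NUMBEROFARTICLES survive this rule, which is consistent with the table. With mwparserfromhell itself, a predicate like this could presumably be passed as the matches callable of Wikicode.ifilter_templates so that the filtering happens while iterating over the parsed nodes.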

Xqt claimed this task.

Change 675309 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] exclude expressions from parsed template in mwparserfromhell

https://gerrit.wikimedia.org/r/675309