Page MenuHomePhabricator

Parser hook output block level corruption
Closed, DuplicatePublic

Assigned To
Authored By
Feb 15 2007, 10:55 PM
Referenced Files
F3640: unstrip.patch
Nov 21 2014, 9:34 PM
F3639: fuckup.txt
Nov 21 2014, 9:34 PM
F3638: fuckuptest.php
Nov 21 2014, 9:34 PM


The parser seems to apply paragraph and pre-formating rules to the html returned
from parser hooks. But output from parser hooks
should be completely left alone. To see the effect, consider the following
parser hook:

function wfRenderFuckup( $input, $argv, &$parser ) {
'a b

c d

e f


the following in wikitext:


produces this html:

<p>xxa b
</p><p>c d
<pre> e f

Disabling tidy does not have any effect on this. Tested with current HEAD.

This is extremely bad for parser hooks that try to output html or JS code loaded
from some file, where
preformated (human readable) code may be found.

Version: 1.9.x
Severity: normal



Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 9:34 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz8997.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 3230
code snippet with parser hook

bah, bugzilla mangeled the test code. here's an attachment.


Created attachment 3231
more test cases

wow, more fun here.

It seems like full wiki formating for lists and indenting is applied to the
parser hook output. Also, blanks before : are turned into &nbsp; for some
reason. HTML output for this test case:

<p>a: b
a&nbsp;: b

<dl><dd> bla
</dd><dt> bla&nbsp;</dt><dd> blubb
<ul><li> foo
</li><li> bar
<ol><li> foo
</li><li> bar

This is really really broken...


ayg wrote:

Isn't this just because parser-hook text is substituted in on an early pass and
then left to the mercies of the rest of the parser? There should be some option
to <nowiki> it all, presumably. Or . . . you could just add <nowiki> yourself?
(Well, that doesn't stop paragraph breaks, so something intermediate between
<nowiki> and <pre> would be needed.)

Well, as far as I see, not exactly an "early" pass, but *too* early: but the
output of the hook is put pack into the text before the "final block-level
transformation" (or something). Nowiki processing was already applied at this
stage, so the <nowiki> would be passed through verbatim - it also wouldn't work
because nowiki would *escape* any HTML tags.

What would really be needed is to unstrip results of parser hooks only after
*all* transforms are done. But the parser being as convoluted as it is, I have
no idea how to do about this. Also, this behavior has apparently been around for
quite some time - so some parser hooks may even rely on it (would be a bad idea,
but should be checked).

Anyway, Brion told me to "not treat this as an emergency", so I'll set this back
to normal severity (even though I feel the current behavior is severely broken).
Also, people seem to remember that this problem has already be files - I can't
find the report though. If you do, feel free to dupe this.

Hm, from experimenting with an extension I wrote, it seems to me that the right
place to "put back" results from a parser hook would be where the
ParserAfterTidy hook is called (or, if tidy should process it,
ParserBeforeTidy). To achieve this, the strip/unstrip mechanism should probably
be rigged so it can be specified what is unstripped when. I'm not clear on how
to do that, though.

wilson.jim.r wrote:

I will be attaching a Parser patch shortly which hides extension tag output
until after Tidy is finished.

One caveat, this patch causes two additional test failures that previously worked:

Running test Parser hook: static parser hook not inside a comment... FAILED!
Running test Parser hook: static parser hook inside a comment... FAILED!

The former expects "<p>hello, world\n</p>" and gets "<p>\nhello, world\n</p>"
The latter expects "<p><br />\n</p>" and gets "<p>\n</p>"

In both cases the differences are whitespace only.

wilson.jim.r wrote:

Parser patch to resolve whitespace mangling of extension output.

Note comments in bug regarding test failures as a result of applying this

attachment bug-8997-Parser.diff ignored as obsolete

wilson.jim.r wrote:

Previous patch seems to be breaking Cite. Specifically, <ref> tags are not
being replaced by "[1]" links - though they still render properly in the
<references> section. Not sure yet why this is.

wilson.jim.r wrote:

Looks like the parsing stops after putting in the <!-- LINK # --> flags. The
<sup> tags are there for the references, but only <!--LINK #--> can be found inside.

These LINK comments are being generated by the makeLinkHolder() function. Still
not sure why they're not getting resolved.

publicsean wrote:

The problem comes from the doBlockLevels() call in parse() which renders lists
and inserts paragraph tags. Tags shouldn't be unstripped until after lists are
rendered but currently they are. So all that needs to happen is to move this
line of code:
$text = $this->mStripState->unstripGeneral( $text );
to just below the doBlockLevels() call.

The only effect this would have is on gallery tags, but the list functionality
in gallery tags is buggy anyway.

Does anyone else see any problems with this fix?

sergey.chernyshev wrote:

Parser patch to resolve whitespace mangling of extension output.

Sean J's recommendations seem to work fine. I made new patch for this issue for trunk (works fine on current stable 1.10 branch too).

I'll be happy to see it in the core.


sergey.chernyshev wrote:

This seems to be the case in the new (1.12) parser as well.

This patch breaks a lot of parser tests:

16 previously passing test(s) now FAILING! :(

  • Parser hook: empty input
  • Parser hook: empty input using terminated empty elements
  • Parser hook: empty input using terminated empty elements (space before)
  • Parser hook: basic input
  • Parser hook: case insensitive
  • Parser hook: case insensitive, redux
  • Parser hook: nested tags
  • Parser hook: basic arguments
  • Parser hook: argument containing a forward slash
  • Parser hook: empty input using terminated empty elements (bug 2374)
  • Parser hook: basic arguments using terminated empty elements
  • Parser hook: static parser hook not inside a comment
  • Parser hook: static parser hook inside a comment
  • Special page transclusion
  • Special page transclusion twice (bug 5021)
  • Gallery

philipp.spitzer wrote:

A note about the workaround (replace output with marker in the parser hook - back-replace by content in the ParserAfterTidy) as described in causes the final content to be wrapped with <p>. This causes the final output to fail the HTML validation in case the parser hook produces output that is not allowed inside <p>, e.g. <noscript> or <div>.
I did not find a way to prevent this.

Maybe a parameter would be needed where a tag extension can "tell" Mediawiki whether it returns inline or block elements? (or have I overlooked something in the documentation?) Otherwise I cannot find a way how to prevent Mediawiki to place the output in <p>s?

  • This bug has been marked as a duplicate of bug 1319 ***