Recursive tags in extensions.
Open, Public

Description

Author: andy753421

Description:
I made an extension to allow for easier discussions (an example is here
http://moacad.com/wiki/index.php?title=Talk:Changelog) but in doing so I noticed
that the current extractTags function in Parser.php would extract the tag
starting at the beginning of the tag (<tag>) and would stop at the 'first' end
tag (</tag>). For example if someone edited a page and included the text
'<tag><tag>foo</tag>bar</tag>' it would think the tag was only
'<tag><tag>foo</tag>' and would leave off the extra '</tag>' on the end.
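
The behaviour described above can be reproduced outside MediaWiki with a minimal sketch. It assumes plain string search for the first closing tag and is not the actual extractTags code; the helper name extractFirstClose is hypothetical.

```php
<?php
// Illustrative only: mimics "stop at the first closing tag" matching.
// extractFirstClose() is a hypothetical helper, not MediaWiki's extractTags.
function extractFirstClose( string $tag, string $text ): ?array {
    $open = "<$tag>";
    $close = "</$tag>";
    $start = strpos( $text, $open );
    if ( $start === false ) {
        return null; // no opening tag at all
    }
    $contentStart = $start + strlen( $open );
    // The first closing tag after the opening tag wins, nested or not.
    $end = strpos( $text, $close, $contentStart );
    if ( $end === false ) {
        return null; // no closing tag
    }
    return [
        'matched' => substr( $text, $start, $end + strlen( $close ) - $start ),
        'content' => substr( $text, $contentStart, $end - $contentStart ),
        'rest' => substr( $text, $end + strlen( $close ) ),
    ];
}

$result = extractFirstClose( 'tag', '<tag><tag>foo</tag>bar</tag>' );
// $result['matched'] is '<tag><tag>foo</tag>'
// $result['rest'] is 'bar</tag>', the leftover text that stays on the page
```

With this kind of matching the inner opening tag becomes part of the content, and the outer closing tag is left behind as ordinary text.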


Version: 1.4.x
Severity: enhancement

bzimport added projects: MediaWiki-Parser, Parser. Via Conduit, Nov 21 2014, 8:11 PM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz1310.
bzimport created this task. Via Legacy, Jan 11 2005, 4:09 AM
bzimport added a comment. Via Conduit, Jan 11 2005, 4:16 AM

andy753421 wrote:

patch for the Parser.php file, with line numbers

I'm not sure how to include the code for HTML comments with it, so I just put
that in separately with an if tag. If anyone would like to change it, go ahead.
The formatting or code may also be sloppy; this is my first patch, so I don't
know much about the style it should be in.

attachment parser_patch.php ignored as obsolete

bzimport added a comment. Via Conduit, Jan 11 2005, 3:30 PM

andy753421 wrote:

more recent patch

I think this should be more useful. It replaces lines 234-285 in the 1.4beta4
version of Parser.php.

attachment parser_patch.php ignored as obsolete

brion added a comment. Via Conduit, Jul 10 2005, 10:58 PM

Not sure this would be desirable; may have side effects.

Anyway the patch is very out of date...

bzimport added a comment. Via Conduit, Oct 1 2005, 10:38 AM

avarab wrote:

Not a patch, removing patch keyword.

bzimport added a comment. Via Conduit, Jul 18 2007, 6:45 PM

shardsofmetal wrote:

Isn't this bug fixed in 1.9 and maybe earlier with Parser::recursiveTagParse()?

bzimport added a comment. Via Conduit, Sep 22 2007, 3:00 PM

aki.ikgw wrote:

diff file from mediawiki 1.11.0

attachment Parser.php.diff ignored as obsolete

bzimport added a comment. Via Conduit, Oct 1 2007, 9:31 PM

ssanbeg wrote:

*** Bug 11528 has been marked as a duplicate of this bug. ***

IAlex added a comment. Via Conduit, Aug 25 2009, 7:12 PM

*** Bug 20350 has been marked as a duplicate of this bug. ***

IAlex added a comment. Via Conduit, Nov 7 2009, 11:05 AM

*** Bug 21426 has been marked as a duplicate of this bug. ***

Chad added a comment. Via Conduit, Jul 17 2010, 11:00 AM

Removing need-review keyword. Patches are ancient and not very useful, so I've marked them obsolete.

bzimport added a comment. Via Conduit, Mar 18 2011, 5:49 PM

sharon.dagan wrote:

This bug still exists in 1.16.2. Are there any plans to fix it?

MarkAHershberger added a comment. Via Conduit, Mar 18 2011, 8:08 PM

Probably exists in 1.17 (about to be released), too. Can you check out a copy of HEAD from subversion to check?

svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3 wiki

Also, if you're running into this, let us know the particulars of your case.

Bawolff added a comment. Via Conduit, Mar 18 2011, 8:13 PM

/me wonders if the proposed change in this bug is really the desired behaviour. Disallowing nested start <tag>'s seems sane to me.

bzimport added a comment. Via Conduit, Mar 18 2011, 8:53 PM

bugzilla.wikimedia wrote:

(In reply to comment #13)

/me wonders if the proposed change in this bug is really the desired behaviour.
Disallowing nested start <tag>'s seems sane to me.

That depends on semantics and the code handling them.

bzimport added a comment. Via Conduit, Mar 18 2011, 10:07 PM

sharon.dagan wrote:

Working with the latest code from trunk/phase3, as suggested.
My test case extension is a very basic tag hook:

File: Bug1310_TestCase.php

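<?php

// Register the tag hook when the parser is first initialised.
$wgHooks['ParserFirstCallInit'][] = 'onParserFirstCallInit';

function onParserFirstCallInit( &$parser ) {
    // Route <foo>...</foo> to onTag().
    $parser->setHook( 'foo', 'onTag' );
    return true;
}

function onTag( $input, $args, $parser, $frame ) {
    // Log the raw tag content that the parser hands to the extension.
    wfDebug( $input );

    return 'xxx';
}

?>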

In LocalSettings.php, the extension is loaded the normal way.

The input for the test case is:
'<foo>Begin1... <foo>Begin2... ...End2</foo> ...End2</foo>'

The $input that gets into onTag() should be:
'Begin1... <foo>Begin2... ...End2</foo> ...End2'

However,
In wfDebug I get: 'Begin1... <foo>Begin2... ...End2'
And in the browser I get: 'xxx...End1</foo>'

bzimport added a comment. Via Conduit, Mar 18 2011, 10:10 PM

sharon.dagan wrote:

OOPS! (why can't I edit my comment?)

The input for the test case is:
'<foo>Begin1... <foo>Begin2... ...End2</foo> ...End1</foo>'

The $input that gets into onTag() should be:
'Begin1... <foo>Begin2... ...End2</foo> ...End1'

However,
In wfDebug I get: 'Begin1... <foo>Begin2... ...End2'
And in the browser I get: 'xxx...End1</foo>'

Peachey88 added a comment. Via Conduit, Apr 30 2011, 12:10 AM

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

bzimport added a comment. Via Conduit, Aug 24 2011, 3:40 PM

john wrote:

-patch, no non-obsolete patches.

bzimport added a comment. Via Conduit, Oct 9 2011, 3:41 PM

theom3ga wrote:

Same bug here. I'm developing an extension to semantically tag the document, so there are going to be recursive tags, and I'm getting this bug too. The test case is the same as in comment 15.

Any alternative to tag extensions for this?

Bawolff added a comment. Via Conduit, Oct 11 2011, 7:39 PM

Any alternative to tag extensions for this?

Well parser function style things perhaps ({{#foo:...}})

bzimport added a comment. Via Conduit, Jan 28 2012, 3:30 AM

sharon.dagan wrote:

Allows extension tags to be nested

This patch allows tag extensions to be nested. Only the outermost tags are parsed; everything in between is passed to the callback.

Given the wiki text "<foo>123<foo>456</foo>789</foo>", foo's callback will be called with the text "123<foo>456</foo>789".
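
The attached patch is not reproduced here. As a rough illustration of the behaviour it describes, the following sketch counts nesting depth to find the outermost pair. It assumes attribute-free, case-sensitive tags and ignores HTML comments; the helper name extractOutermost is hypothetical and does not come from the patch.

```php
<?php
// Hypothetical helper: returns the content of the outermost <$tag>...</$tag>
// pair by counting nesting depth, or null if the tags are unbalanced.
function extractOutermost( string $tag, string $text ): ?string {
    $open = "<$tag>";
    $close = "</$tag>";
    $start = strpos( $text, $open );
    if ( $start === false ) {
        return null;
    }
    $contentStart = $start + strlen( $open );
    $pos = $contentStart;
    $depth = 1;
    while ( $depth > 0 ) {
        $nextOpen = strpos( $text, $open, $pos );
        $nextClose = strpos( $text, $close, $pos );
        if ( $nextClose === false ) {
            return null; // ran out of closing tags before depth reached zero
        }
        if ( $nextOpen !== false && $nextOpen < $nextClose ) {
            $depth++; // a nested opening tag comes first
            $pos = $nextOpen + strlen( $open );
        } else {
            $depth--; // a closing tag comes first
            $pos = $nextClose + strlen( $close );
        }
    }
    // $pos now sits just past the closing tag that matched the outer opening tag.
    return substr( $text, $contentStart, $pos - strlen( $close ) - $contentStart );
}

$content = extractOutermost( 'foo', '<foo>123<foo>456</foo>789</foo>' );
// $content is '123<foo>456</foo>789'
```

Note that unbalanced input such as '<foo><foo></foo>' yields no match at all under this scheme, whereas first-close matching would have matched it.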

Attached: Bug1310.patch

MarkAHershberger added a comment. Via Conduit, Feb 10 2012, 6:27 PM

Thanks for this patch! We've been in a code slush (not quite a freeze) for a few weeks so we're just getting around to looking at these.

We're also doing a lot of parser work, so I'm not sure how relevant this is, but I'll ask them to take a look.

bzimport added a comment. Via Conduit, Feb 11 2012, 3:33 PM

wicke wrote:

From an implementation standpoint, simply matching up the closest start/end tag is definitely easier than building a stack to enable nested tag pairs. I am also not convinced that nested tag pairs would be a good UI design, as it seems to make the distinction of regular wiki content and input to an extension harder than necessary.

Could you present a compelling use case that demonstrates the need to use the same tags both to delimit the extension inputs and the input itself?

bzimport added a comment. Via Conduit, Feb 11 2012, 3:35 PM

wicke wrote:

The last sentence should naturally end with *and in the input itself*. An edit button would be handy sometimes.

Bawolff added a comment. Via Conduit, Feb 11 2012, 8:15 PM

(In reply to comment #23)

From an implementation standpoint, simply matching up the closest start/end tag
is definitely easier than building a stack to enable nested tag pairs. I am
also not convinced that nested tag pairs would be a good UI design, as it seems
to make the distinction of regular wiki content and input to an extension
harder than necessary.

Could you present a compelling use case that demonstrates the need to use the
same tags both to delimit the extension inputs and the input itself?

There are two examples I could see where this may be wanted

  • <ref> tags so people could do nested ref stuff without {{#tag:ref hackery.
  • <source> tags, for when highlighting XML-ish things that have a <source> inside them (since they would usually have a closing source tag as well, but that's more like accidentally fixing an issue than actually fixing an issue).

But I also tend to agree that it may not be worth the effort.

DanielFriesen added a comment. Via Conduit, Feb 11 2012, 10:20 PM

That as a <source> fix sounds to me more like a hack to fix a non-issue. The <source> tag isn't written to explicitly do anything special with any <source> tags inside of it, so that does not sound like the thing we should be aiming for. (And it sounds like it would break if someone used <source> to document example arguments to the opening <source> tag.)

Switching over to a complete even tag matching could change the behaviour of existing content -- ie: <foo><foo></foo> suddenly having different behaviour -- so I'd reject the patch we have on those grounds alone.

We probably also want to write a test to make sure that <nowiki><nowiki></nowiki> doesn't suddenly start turning everything after it into nowiki content when it was written expecting it to display a "<nowiki>" tag verbatim in the page for documentation.

I think that if we do implement recursive tags, it's going to have to be an explicit opt-in feature for extensions, i.e. only tags with a specific option will be parsed recursively. We can enable it on <ref>, but we may not want to enable it for <source>, and we definitely don't want it on <nowiki>.
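
To make the opt-in idea concrete, here is a minimal sketch under stated assumptions. The 'nested' option, the $tagOptions registry, and both function names are hypothetical; nothing here is an existing MediaWiki setting or API. Tags without the option keep the current first-close behaviour, so existing content is unaffected.

```php
<?php
// Hypothetical registry: tag name => options. The 'nested' flag is an
// assumption for illustration, not an existing MediaWiki option.
$tagOptions = [
    'ref' => [ 'nested' => true ],
    'nowiki' => [ 'nested' => false ],
];

// Returns the tag content, matching nested pairs only when $nested is true.
function extractTagContent( string $tag, string $text, bool $nested ): ?string {
    $open = "<$tag>";
    $close = "</$tag>";
    $start = strpos( $text, $open );
    if ( $start === false ) {
        return null;
    }
    $contentStart = $start + strlen( $open );
    $pos = $contentStart;
    $depth = 1;
    while ( true ) {
        $nextClose = strpos( $text, $close, $pos );
        if ( $nextClose === false ) {
            return null; // unbalanced input
        }
        // Inner opening tags only count when the tag opted in.
        $nextOpen = $nested ? strpos( $text, $open, $pos ) : false;
        if ( $nextOpen !== false && $nextOpen < $nextClose ) {
            $depth++;
            $pos = $nextOpen + strlen( $open );
            continue;
        }
        $depth--;
        if ( $depth === 0 ) {
            return substr( $text, $contentStart, $nextClose - $contentStart );
        }
        $pos = $nextClose + strlen( $close );
    }
}

// Looks up the opt-in flag; unknown tags default to first-close matching.
function extractForTag( string $tag, string $text, array $tagOptions ): ?string {
    $nested = $tagOptions[$tag]['nested'] ?? false;
    return extractTagContent( $tag, $text, $nested );
}

$ref = extractForTag( 'ref', '<ref>a<ref>b</ref>c</ref>', $tagOptions );
$nowiki = extractForTag( 'nowiki', '<nowiki><nowiki></nowiki> after', $tagOptions );
// $ref is 'a<ref>b</ref>c'; $nowiki is '<nowiki>', as with today's matching
```

Defaulting to non-nested matching is the design choice that addresses the backward-compatibility concern: only tags that explicitly ask for it see different behaviour.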

bzimport added a comment. Via Conduit, Feb 11 2012, 11:26 PM

wicke wrote:

I also share Daniel's concerns about changing the behavior of existing content.

In the longer term, I think that it would be desirable to add a fully parsed input mode for extension tag contents. This could take the form of a token stream or a DOM fragment built from those. Extensions could choose between plain-text input and tokens, so this would be opt-in. This is also very close to how this is currently handled in the Parsoid parser (http://mediawiki.org/wiki/Parsoid), although there still are some issues to solve (e.g. an unclosed html comment in the extension content). Nested nowiki is covered by parser tests already, and works as expected.

Are there extensions that need nested extension tags, but otherwise unparsed input?

brion added a comment. Via Conduit, Mar 13 2012, 9:48 PM

*** Bug 35173 has been marked as a duplicate of this bug. ***

Qgil added a comment. Via Conduit, May 17 2014, 12:21 AM

This discussion has been stalled for more than two years. Has anything changed that would help resolve it one way or another?

MZMcBride added a comment. Via Conduit, May 17 2014, 12:26 AM

(In reply to Quim Gil from comment #29)

This discussion has been stalled during more than two years. Has there been
any change helping to its resolution in a way or another?

Had there been, this bug would likely already reflect such a change. I'm not sure what you're asking.
