Page MenuHomePhabricator

parser passes 'UNIQ' tokens to hook, instead of text
Closed, ResolvedPublic

Description

Author: ittayd

Description:
created a parser function:
public function foreach2Hook(&$parser, $text, $pattern, $replacement, $insep =
',', $outsep = ',' ) {

print($text);

}

used it in a page like:
{{ #foreach2: <pre>test</pre> }}

the printed string is a UNIQ..QINU identifier

since the actual content is in $matches, a local variable of strip, i can't
access it


Version: 1.8.x
Severity: normal
OS: Linux
Platform: PC

Details

Reference
bz8451

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 9:30 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz8451.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

I think you want to call $parser->mStripState->unstripBoth( $text )?

ittayd wrote:

shouldn't the parser do it for me? i should get either unprocessed text or
completely processed.

robchur wrote:

You need to look again at the code which is registering the hook callback in the
first place...there are a number of options available to tell the parser what it
should pass to you, and what you're going to pass back to it.

ittayd wrote:

function setHook( $tag, $callback ) - no parameters
function setFunctionHook( $id, $callback, $flags = 0 ) - flags can only be
SFH_NO_HASH

david.sledge wrote:

Getting the same problem with a clean install of MW v1.11.0. And I've got the same question as Ittay: Shouldn't the input the hooked function receives already be processed? Additionally, I've got to ask: is this behavior intended by design? Is the solution, that Simetrical provided, documented in the code documentation, on meta.wikimedia.org, and/or mediawiki.org? And why doesn't setFunctionHook() allow one to specify the level of parsing of the input parameters (i.e. parsed, raw, or something in between)?

bacher wrote:

I hit the same problem while working on setHook using the parser class.

my code:

$wgHooks['ParserBeforeStrip'][] = 'tsRegisterInlineQueries'; // execute inline query

function tsRegisterInlineQueries( ) {

global $wgParser;
$wgParser->setHook( 'ask', 'testIQ' ); 
return true;

}

function testIQ($val, $params, &$parser){

return "blabla";

}

Now the wikimarkup:
<ask>blub</ask> should be replaced by: "blabla" but isn't.

I think i have hunted it down: there is the line in Parser.php in the parse() method:
303: wfRunHooks( 'ParserBeforeStrip', array( &$this, &$text, &$this->mStripState ) );
304: $text = $this->strip( $text, $this->mStripState );

in Line 303 the setHook will be executed, in Line 304 the strip function will be called. the StripState contains all the replacements Strings.

Now in strip the Replacement Strings will change, BUT: the calling parse function will never recognice since the header of the strip function looks like this:

561: function strip( $text, $state, $stripcomments = false , $dontstrip = array () ) {

the state is not given as a pointer. any change to it, will get lost!

changing line 561 to:
561: function strip( $text, &$state, $stripcomments = false , $dontstrip = array () ) {

will solve the problem. This should be a problem everywhere where strip is called from, since the replacement strings will get lost.

the following diff should fix it:
561c561

< function strip( $text, &$state, $stripcomments = false , $dontstrip = array () ) {

function strip( $text, $state, $stripcomments = false , $dontstrip = array () ) {

cheers and have fun joshua bacher

bacher wrote:

The $state variable in the strip function was changed in the following revision from a reference to a variable.

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Parser.php?revision=17880&view=markup

i tested it with MW 1.10 and MW 1.11 and the provided test code does not work like expected. it always shows
up the placeholder indicated by the string \x07UNIQ.

the patch provided is broken. Find a better one here:
http://bacher.bash-it.de/download/parser.patch

cheers josh

ssanbeg wrote:

From the diff, it looks like one reference was dropped in that revision; re-added in r27280

bacher wrote:

the revision does not fix the bug. the changed line
415 $text = $this->strip( $text, &$this->mStripState )

does not solve it. All function calls to strip() need the $state (especially the parse() function for me) variable passed by reference.
If not the general ReplacementArray will not change in the calling function and therefore the placeholder UNIQ will show up instead of
the intended replacement because it is not replaced

david.sledge wrote:

After messing around with extensions a bit, here's the issue with the UNIQ place-holder text. The policy of tags is that any text returned from them be in HTML. Parser functions have both the input and the returned text parsed.

Let's take the following:

{{#rflxsv:<tag>text</tag>}}

where "rflxsv" is a parser function that just takes the first argument and spits it back out, and "tag" is a tag that creates an HTML anchor (like Cite or the DPL/DPL2 extension). Without "UNIQ" placeholder, the end result is that links generated by the tag extension (along other tags that aren't normally allowed in wiki-markup like <form> and <input>) end up being double-parsed/escaped. So instead of getting:

<a href="someURL">text</a>

You instead get:

&lt;a href="someURL"&gt;text&lt;/a&gt;

bacher wrote:

once again. i only tested the behaviour with my whole extension. After just limiting it to the Functions that i gave as a a proof of concept i found out that, with a fresh mw 1.11 installation candidate, the poc fails.

this means: somehow my code breaks the things not the parser class.

cheers josh