
Parser::setFunctionTagHook() produces non-removed UNIQ markers when the tag is defined right after section start === ... === in page wikitext
Open, Low, Public

Description

Author: questpc

Description:
I like Parser::setFunctionTagHook() because it provides greater flexibility: for example, it can protect the output from further wikitext transformation (the hook may return an array, just like a parser function hook, instead of plain text).

However, I am having weird problems with it. Here is the result my parser hook produces:

array(3) {

[0]=>
string(537) "<div class="thumb tright" style="width:400px; height:auto; ">

<div class="thumbinner" style="width:400px; height:auto; ">
<div class="jqgmap_canvas" data-jqgmap="{&quot;lat&quot;:&quot;55.8207542703&quot;,&quot;lng&quot;:&quot;37.6253418334&quot;,&quot;zoom&quot;:14,&quot;markers&quot;:[{&quot;lat&quot;:55.8207542703,&quot;lng&quot;:37.6253418334,&quot;content&quot;:&quot;&lt;p&gt;\u041f\u041d\u0414 7\n&lt;\/p&gt;&quot;}]}" id="jqgmap_canvas0" style="width:400px; height:300px; "></div>
<div class="thumbcaption"></div>
</div>
</div>"

["isHTML"]=>
bool(true)
["noparse"]=>
bool(true)

}

Note that div.jqgmap_canvas has an HTML5 data attribute, which is parsed via jQuery.data() at a later stage.

Here is the resulting ParserOutput:

object(ParserOutput)#182 (32) {

["mText"]=>
string(6425) "<p>Психоневрологический диспансер № 7 СВАО г. Москвы

</p>
<table id="toc" class="toc"><tr><td><div id="toctitle"><h2>Содержание</h2></div>
<ul>
<li class="toclevel-1"><a href="#.7FUNIQ45804c9858a0e381-h-0--QINU.7F_.D0.9F.D0.BE.D0.BB.D0.BD.D0.BE.D0.B5_.D0.BD.D0.B0.D0.B8.D0.BC.D0.B5.D0.BD.D0.BE.D0.B2.D0.B0.D0.BD.D0.B8.D0.B5"><span class="tocnumber">1</span> <span class="toctext">UNIQ45804c9858a0e381-h-0--QINU Полное наименование</span></a></li>
<li class="toclevel-1"><a href="#.7FUNIQ45804c9858a0e381-h-1--QINU.7F_.D0.90.D0.B4.D1.80.D0.B5.D1.81"><span class="tocnumber">2</span> <span class="toctext">UNIQ45804c9858a0e381-h-1--QINU Адрес</span></a></li>
<li class="toclevel-1 tocsection-3"><a href="#.D0.A2.D0.B5.D0.BB.D0.B5.D1.84.D0.BE.D0.BD.D1.8B"><span class="tocnumber">3</span> <span class="toctext">Телефоны</span></a></li>
<li class="toclevel-1 tocsection-4"><a href="#.D0.93.D1.80.D0.B0.D1.84.D0.B8.D0.BA_.D1.80.D0.B0.D0.B1.D0.BE.D1.82.D1.8B"><span class="tocnumber">4</span> <span class="toctext">График работы</span></a></li>
<li class="toclevel-1 tocsection-5"><a href="#.D0.9E.D0.B1.D1.81.D0.BB.D1.83.D0.B6.D0.B8.D0.B2.D0.B0.D0.B5.D0.BC.D1.8B.D0.B5_.D1.80.D0.B0.D0.B9.D0.BE.D0.BD.D1.8B"><span class="tocnumber">5</span> <span class="toctext">Обслуживаемые районы</span></a></li>
<li class="toclevel-1 tocsection-6"><a href="#.D0.A1.D0.BC._.D1.82.D0.B0.D0.BA.D0.B6.D0.B5"><span class="tocnumber">6</span> <span class="toctext">См. также</span></a></li>
</ul>
</td></tr></table>
<h2> <span class="mw-headline" id=".7FUNIQ45804c9858a0e381-h-0--QINU.7F_.D0.9F.D0.BE.D0.BB.D0.BD.D0.BE.D0.B5_.D0.BD.D0.B0.D0.B8.D0.BC.D0.B5.D0.BD.D0.BE.D0.B2.D0.B0.D0.BD.D0.B8.D0.B5">UNIQ45804c9858a0e381-h-0--QINU Полное наименование </span></h2>
<p>Государственное учреждение здравоохранения города Москвы Психоневрологический диспансер №7 Управления здравоохранения Северо-Восточного административного округа.
</p><p><br />
</p>
<h2> <span class="mw-headline" id=".7FUNIQ45804c9858a0e381-h-1--QINU.7F_.D0.90.D0.B4.D1.80.D0.B5.D1.81">UNIQ45804c9858a0e381-h-1--QINU Адрес </span></h2>
<div class="thumb tright" style="width:400px; height:auto;">
<div class="thumbinner" style="width:400px; height:auto;">
<div class="jqgmap_canvas" data-jqgmap="&#123;&quot;lat&quot;:&quot;55.8207542703&quot;,&quot;lng&quot;:&quot;37.6253418334&quot;,&quot;zoom&quot;:14,&quot;markers&quot;:&#91;&#123;&quot;lat&quot;:55.8207542703,&quot;lng&quot;:37.6253418334,&quot;content&quot;:&quot;&lt;p&gt;\u041f\u041d\u0414 7\n&lt;\/p&gt;&quot;}]}" id="jqgmap_canvas0" style="width:400px; height:300px;"></div>
<div class="thumbcaption"></div>
</div>
</div>
<p>129515, г. Москвы, ул. Академика Королева, д.9, корпус 1
</p><p>(вход со двора, в углу дома - отмечен меткой)
</p><p><br />
</p><p>Сайт: <a rel="nofollow" class="external free" href="http://www.mosgorzdrav.ru/pnd7">http://www.mosgorzdrav.ru/pnd7</a>
</p>
<h2><mw:editsection page="Testilka7" section="3">Телефоны</mw:editsection> <span class="mw-headline" id=".D0.A2.D0.B5.D0.BB.D0.B5.D1.84.D0.BE.D0.BD.D1.8B"> Телефоны </span></h2>
<p>Справочная служба (495) 602-1332
</p><p>Регистратура, тел/факс: (495) 602-13-32
</p><p>Главная медсестра: (495) 683-46-19
</p><p>Емайл: pnd7<a href="/index.php?title=%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:@&amp;action=edit&amp;redlink=1" class="new" title="Шаблон:@ (страница не существует)">Шаблон:@mosgorzdrav</a>.ru
</p><p><br />
</p>
<h2><mw:editsection page="Testilka7" section="4">График работы</mw:editsection> <span class="mw-headline" id=".D0.93.D1.80.D0.B0.D1.84.D0.B8.D0.BA_.D1.80.D0.B0.D0.B1.D0.BE.D1.82.D1.8B"> График работы </span></h2>
<ul><li> в будние дни с 8-00 до 20-00,
</li></ul>
<ul><li> в субботу с 9-00 до 16-00.
</li></ul>
<p><br />
</p>
<h2><mw:editsection page="Testilka7" section="5">Обслуживаемые районы</mw:editsection> <span class="mw-headline" id=".D0.9E.D0.B1.D1.81.D0.BB.D1.83.D0.B6.D0.B8.D0.B2.D0.B0.D0.B5.D0.BC.D1.8B.D0.B5_.D1.80.D0.B0.D0.B9.D0.BE.D0.BD.D1.8B"> Обслуживаемые районы </span></h2>
<p><a href="/index.php?title=%D0%90%D0%BB%D0%B5%D0%BA%D1%81%D0%B5%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9&amp;action=edit&amp;redlink=1" class="new" title="Алексеевский (страница не существует)">Алексеевский</a>, <a href="/index.php?title=%D0%91%D1%83%D1%82%D1%8B%D1%80%D1%81%D0%BA%D0%B8%D0%B9&amp;action=edit&amp;redlink=1" class="new" title="Бутырский (страница не существует)">Бутырский</a>, <a href="/index.php?title=%D0%9C%D0%B0%D1%80%D1%8C%D0%B8%D0%BD%D0%B0_%D0%A0%D0%BE%D1%89%D0%B0&amp;action=edit&amp;redlink=1" class="new" title="Марьина Роща (страница не существует)">Марьина Роща</a>, <a href="/index.php?title=%D0%9C%D0%B0%D1%80%D1%84%D0%B8%D0%BD%D0%BE&amp;action=edit&amp;redlink=1" class="new" title="Марфино (страница не существует)">Марфино</a>, <a href="/index.php?title=%D0%9E%D1%81%D1%82%D0%B0%D0%BD%D0%BA%D0%B8%D0%BD%D1%81%D0%BA%D0%B8%D0%B9&amp;action=edit&amp;redlink=1" class="new" title="Останкинский (страница не существует)">Останкинский</a>
</p><p><br />
</p>
<h2><mw:editsection page="Testilka7" section="6">См. также</mw:editsection> <span class="mw-headline" id=".D0.A1.D0.BC._.D1.82.D0.B0.D0.BA.D0.B6.D0.B5"> См. также </span></h2>
<p><a href="/index.php?title=%D0%9A%D0%B0%D0%BA_%D0%BF%D0%BE%D0%BB%D1%83%D1%87%D0%B8%D1%82%D1%8C_%D1%81%D0%BF%D1%80%D0%B0%D0%B2%D0%BA%D1%83_%D0%B8%D0%B7_%D0%9F%D0%9D%D0%94&amp;action=edit&amp;redlink=1" class="new" title="Как получить справку из ПНД (страница не существует)">Как получить справку из ПНД</a>
</p>
<!-- Saved in parser cache with key phase3-wiki_:pcache:idhash:10844-0!*!0!!ru!*!* and timestamp 20120421065844 -->
"

["mLanguageLinks"]=>
array(0) {
}
["mCategories"]=>
array(1) {
  ["WM:Психоневрологический_диспансер"]=>
  string(0) ""
}
["mTitleText"]=>
string(9) "Testilka7"
["mLinks"]=>
array(2) {
  [0]=>
  array(6) {
    ["Алексеевский"]=>
    int(0)
    ["Бутырский"]=>
    int(0)
    ["Марьина_Роща"]=>
    int(0)
    ["Марфино"]=>
    int(0)
    ["Останкинский"]=>
    int(0)
    ["Как_получить_справку_из_ПНД"]=>
    int(0)
  }
  [10]=>
  array(1) {
    ["@"]=>
    int(0)
  }
}
["mTemplates"]=>
array(1) {
  [10]=>
  array(1) {
    ["@"]=>
    int(0)
  }
}
["mTemplateIds"]=>
array(1) {
  [10]=>
  array(1) {
    ["@"]=>
    int(0)
  }
}
["mImages"]=>
array(0) {
}
["mFileSearchOptions"]=>
array(0) {
}
["mExternalLinks"]=>
array(1) {
  ["http://www.mosgorzdrav.ru/pnd7"]=>
  int(1)
}
["mInterwikiLinks"]=>
array(0) {
}
["mNewSection"]=>
bool(false)
["mHideNewSection"]=>
bool(false)
["mNoGallery"]=>
bool(false)
["mHeadItems"]=>
array(0) {
}
["mModules"]=>
array(0) {
}
["mModuleScripts"]=>
array(0) {
}
["mModuleStyles"]=>
array(0) {
}
["mModuleMessages"]=>
array(0) {
}
["mOutputHooks"]=>
array(0) {
}
["mWarnings"]=>
array(0) {
}
["mSections"]=>
array(6) {
  [0]=>
  array(8) {
    ["toclevel"]=>
    int(1)
    ["level"]=>
    string(1) "2"
    ["line"]=>
    string(70) "UNIQ45804c9858a0e381-h-0--QINU Полное наименование"
    ["number"]=>
    string(1) "1"
    ["index"]=>
    string(0) ""
    ["fromtitle"]=>
    bool(false)
    ["byteoffset"]=>
    int(1015)
    ["anchor"]=>
    string(146) ".7FUNIQ45804c9858a0e381-h-0--QINU.7F_.D0.9F.D0.BE.D0.BB.D0.BD.D0.BE.D0.B5_.D0.BD.D0.B0.D0.B8.D0.BC.D0.B5.D0.BD.D0.BE.D0.B2.D0.B0.D0.BD.D0.B8.D0.B5"
  }
  [1]=>
  array(8) {
    ["toclevel"]=>
    int(1)
    ["level"]=>
    string(1) "2"
    ["line"]=>
    string(43) "UNIQ45804c9858a0e381-h-1--QINU Адрес"
    ["number"]=>
    string(1) "2"
    ["index"]=>
    string(0) ""
    ["fromtitle"]=>
    bool(false)
    ["byteoffset"]=>
    int(1015)
    ["anchor"]=>
    string(67) ".7FUNIQ45804c9858a0e381-h-1--QINU.7F_.D0.90.D0.B4.D1.80.D0.B5.D1.81"
  }
  [2]=>
  array(8) {
    ["toclevel"]=>
    int(1)
    ["level"]=>
    string(1) "2"
    ["line"]=>
    string(16) "Телефоны"
    ["number"]=>
    string(1) "3"
    ["index"]=>
    string(1) "3"
    ["fromtitle"]=>
    string(9) "Testilka7"
    ["byteoffset"]=>
    int(1015)
    ["anchor"]=>
    string(48) ".D0.A2.D0.B5.D0.BB.D0.B5.D1.84.D0.BE.D0.BD.D1.8B"
  }
  [3]=>
  array(8) {
    ["toclevel"]=>
    int(1)
    ["level"]=>
    string(1) "2"
    ["line"]=>
    string(25) "График работы"
    ["number"]=>
    string(1) "4"
    ["index"]=>
    string(1) "4"
    ["fromtitle"]=>
    string(9) "Testilka7"
    ["byteoffset"]=>
    int(1015)
    ["anchor"]=>
    string(73) ".D0.93.D1.80.D0.B0.D1.84.D0.B8.D0.BA_.D1.80.D0.B0.D0.B1.D0.BE.D1.82.D1.8B"
  }
  [4]=>
  array(8) {
    ["toclevel"]=>
    int(1)
    ["level"]=>
    string(1) "2"
    ["line"]=>
    string(39) "Обслуживаемые районы"
    ["number"]=>
    string(1) "5"
    ["index"]=>
    string(1) "5"
    ["fromtitle"]=>
    string(9) "Testilka7"
    ["byteoffset"]=>
    int(1015)
    ["anchor"]=>
    string(115) ".D0.9E.D0.B1.D1.81.D0.BB.D1.83.D0.B6.D0.B8.D0.B2.D0.B0.D0.B5.D0.BC.D1.8B.D0.B5_.D1.80.D0.B0.D0.B9.D0.BE.D0.BD.D1.8B"
  }
  [5]=>
  array(8) {
    ["toclevel"]=>
    int(1)
    ["level"]=>
    string(1) "2"
    ["line"]=>
    string(16) "См. также"
    ["number"]=>
    string(1) "6"
    ["index"]=>
    string(1) "6"
    ["fromtitle"]=>
    string(9) "Testilka7"
    ["byteoffset"]=>
    int(1015)
    ["anchor"]=>
    string(44) ".D0.A1.D0.BC._.D1.82.D0.B0.D0.BA.D0.B6.D0.B5"
  }
}
["mEditSectionTokens"]=>
bool(true)
["mProperties"]=>
array(0) {
}
["mTOCHTML"]=>
string(1523) "<table id="toc" class="toc"><tr><td><div id="toctitle"><h2>Содержание</h2></div>

<ul>
<li class="toclevel-1"><a href="#.7FUNIQ45804c9858a0e381-h-0--QINU.7F_.D0.9F.D0.BE.D0.BB.D0.BD.D0.BE.D0.B5_.D0.BD.D0.B0.D0.B8.D0.BC.D0.B5.D0.BD.D0.BE.D0.B2.D0.B0.D0.BD.D0.B8.D0.B5"><span class="tocnumber">1</span> <span class="toctext">UNIQ45804c9858a0e381-h-0--QINU Полное наименование</span></a></li>

<li class="toclevel-1"><a href="#.7FUNIQ45804c9858a0e381-h-1--QINU.7F_.D0.90.D0.B4.D1.80.D0.B5.D1.81"><span class="tocnumber">2</span> <span class="toctext">UNIQ45804c9858a0e381-h-1--QINU Адрес</span></a></li>

<li class="toclevel-1 tocsection-3"><a href="#.D0.A2.D0.B5.D0.BB.D0.B5.D1.84.D0.BE.D0.BD.D1.8B"><span class="tocnumber">3</span> <span class="toctext">Телефоны</span></a></li>

<li class="toclevel-1 tocsection-4"><a href="#.D0.93.D1.80.D0.B0.D1.84.D0.B8.D0.BA_.D1.80.D0.B0.D0.B1.D0.BE.D1.82.D1.8B"><span class="tocnumber">4</span> <span class="toctext">График работы</span></a></li>

<li class="toclevel-1 tocsection-5"><a href="#.D0.9E.D0.B1.D1.81.D0.BB.D1.83.D0.B6.D0.B8.D0.B2.D0.B0.D0.B5.D0.BC.D1.8B.D0.B5_.D1.80.D0.B0.D0.B9.D0.BE.D0.BD.D1.8B"><span class="tocnumber">5</span> <span class="toctext">Обслуживаемые районы</span></a></li>

<li class="toclevel-1 tocsection-6"><a href="#.D0.A1.D0.BC._.D1.82.D0.B0.D0.BA.D0.B6.D0.B5"><span class="tocnumber">6</span> <span class="toctext">См. также</span></a></li>
</ul>
</td></tr></table>
"

["mTimestamp"]=>
string(14) "20120418164637"
["mIndexPolicy":"ParserOutput":private]=>
string(0) ""
["mAccessedOptions":"ParserOutput":private]=>
array(3) {
  ["numberheadings"]=>
  bool(true)
  ["userlang"]=>
  bool(true)
  ["stubthreshold"]=>
  bool(true)
}
["mVersion"]=>
string(5) "1.6.4"
["mCacheTime"]=>
string(14) "20120421065844"
["mCacheExpiry"]=>
NULL
["mContainsOldMagic"]=>
bool(false)

}

This is the source wikitext:

Психоневрологический диспансер № 7 СВАО г. Москвы

== Полное наименование ==

Государственное учреждение здравоохранения города Москвы Психоневрологический диспансер №7 Управления здравоохранения Северо-Восточного административного округа.

== Адрес ==

<jqgmap zoom="14" lat="55.8207542703" lng="37.6253418334" align="right" width="400px" height="300px" caption="" >
55.8207542703, 37.6253418334 ПНД 7
</jqgmap>

129515, г. Москвы, ул. Академика Королева, д.9, корпус 1

(вход со двора, в углу дома - отмечен меткой)

Сайт: http://www.mosgorzdrav.ru/pnd7

== Телефоны ==

Справочная служба (495) 602-1332

Регистратура, тел/факс: (495) 602-13-32

Главная медсестра: (495) 683-46-19

Емайл: pnd7{{@}}mosgorzdrav.ru

== График работы ==

* в будние дни с 8-00 до 20-00,
* в субботу с 9-00 до 16-00.

== Обслуживаемые районы ==

[[Алексеевский]], [[Бутырский]], [[Марьина Роща]], [[Марфино]], [[Останкинский]]

== См. также ==

[[Как получить справку из ПНД]]

[[Категория:WM:Психоневрологический диспансер]]

I am about to switch to a parser function because of that. The only thing that worries me is that I'll have to track pipe characters in function parameter values and decide whether they belong to template parameters or are part of the text. I have an editor which generates the source text of the tag, which will be converted to the source text of a parser function. I guess when the user types a pipe character into a field, I'd have to scan the text and replace "non-templated" pipes with the &#124; entity in the generated source text, right?

I have another question. What's the best way to detect that my tag / function hook was actually used on the page, so that I do not have to load my startup module for pages which do not use it? Currently I use the following code:
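A minimal sketch of the pipe scan described above, under stated assumptions: the function name escapeBarePipes() is hypothetical, and "templated" is approximated by simple depth counting of {{...}} and [[...]] pairs, which is not a full wikitext parser (nowiki sections, HTML comments and unbalanced braces are not handled specially).

```php
<?php
/**
 * Hypothetical helper: replace "|" with "&#124;" unless it sits inside
 * {{...}} or [[...]]. Nesting is tracked with a plain depth counter.
 * Scanning byte by byte is safe for UTF-8 input, because the delimiters
 * are ASCII and never occur inside a multi-byte sequence.
 */
function escapeBarePipes( $text ) {
	$out = '';
	$depth = 0; // current nesting depth of {{...}} / [[...]]
	$len = strlen( $text );
	for ( $i = 0; $i < $len; $i++ ) {
		$pair = substr( $text, $i, 2 );
		if ( $pair === '{{' || $pair === '[[' ) {
			$depth++;
			$out .= $pair;
			$i++; // the second delimiter character is already consumed
		} elseif ( ( $pair === '}}' || $pair === ']]' ) && $depth > 0 ) {
			$depth--;
			$out .= $pair;
			$i++;
		} elseif ( $text[$i] === '|' && $depth === 0 ) {
			$out .= '&#124;'; // bare pipe: would otherwise split parameters
		} else {
			$out .= $text[$i];
		}
	}
	return $out;
}
```

With this sketch, pipes inside {{tpl|x=1}} or [[Page|label]] are left alone, while a pipe in plain text is converted to the entity before the parser function source is generated.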

		# Check whether the current output has embedded jqgmap data.
		# It is not bullet-proof but should be enough for most tasks.
		if ( strpos( $parserOutput->getText(), '<div class="jqgmap_canvas"' ) !== false ) {
			# There are references on the page, load the extension's startup module.
			$out->addModules( 'ext.jqgmap' );
		}

Originally it was called only from the 'BeforePageDisplay' hook. However, I found that when the page hits the local cache, that hook is not fired. So now I also hook into 'OutputPageCheckLastModified'. This used to work in 1.18 / 1.19; in 1.20, however, my startup module is not executed.

Sorry for the long post, I am just having a hard time with both Parser and ResourceLoader. It was much, much simpler back in 1.15.
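One alternative to scanning the output text might be to register the module from inside the hook itself, so that it is recorded in the ParserOutput (the mModules field in the dump above) and travels with the cached output. The sketch below assumes that ParserOutput::addModules() is available in the MediaWiki version in use and that the callback receives ( $parser, $frame, $content, $attributes ), as registered via Parser::setFunctionTagHook(); the class name JqgmapHooksSketch and the simplified HTML are illustrative only, not the extension's real code.

```php
<?php
/**
 * Hypothetical sketch of a <jqgmap> function tag hook.
 */
class JqgmapHooksSketch {
	/**
	 * @param object $parser Parser instance passed by MediaWiki
	 * @param object|null $frame PPFrame instance (unused here)
	 * @param string $content text between the opening and closing tag
	 * @param array $attributes tag attributes
	 * @return array HTML plus flags, in the same shape as the hook result above
	 */
	public static function renderTag( $parser, $frame, $content, $attributes ) {
		// Record the module on the ParserOutput instead of searching
		// the rendered HTML for the canvas div afterwards.
		$parser->getOutput()->addModules( 'ext.jqgmap' );

		// Placeholder markup; the real hook builds the thumb / canvas divs.
		$html = '<div class="jqgmap_canvas"></div>';
		return array( $html, 'isHTML' => true, 'noparse' => true );
	}
}
```

If this assumption holds, the module is requested only for pages where the tag was actually parsed, and no 'BeforePageDisplay' or 'OutputPageCheckLastModified' handler is needed for that purpose.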


Version: 1.18.x
Severity: normal

Details

Reference
bz36140

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 12:26 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz36140.
bzimport added a subscriber: Unknown Object (MLST).

questpc wrote:

Surprisingly, when I switched to a parser function, the UNIQ markers in the parser output did not disappear. After trying different parser function outputs, I remembered that I selectively parse parts of the user input as wikitext. Originally I used the Parser and PPFrame instances supplied to the function hook, stored as static members of my class:

static function parseWikiText( $text ) {

		return self::$parser->recursiveTagParse( $text, self::$frame );

}

However, while italics were parsed fine, wikilinks were removed. It was probably too early a stage of parsing to produce HTML anchors. So I switched to something like this, which I think used to work in 1.15:

static function parse( $text ) {

		return self::$parser->parse( $text, self::$parser->getTitle(), new ParserOptions() )->getText();

}

It worked fine at first sight: links were converted to anchors. However, tests with section headers revealed non-replaced UNIQ markers.

When I replaced it with a "stub" method, the UNIQ markers disappeared:

static function parse( $text ) {

		return $text;

}

Calling Parser::parse() at the function-processing stage incorrectly changed the state of the Parser, so the UNIQ markers were left in the output.

I grepped through the whole svn repository of MediaWiki extensions and found the solution in Extension:Semantic Maps:

$parser = version_compare( $GLOBALS['wgVersion'], '1.18', '<' ) ? $wgParser : clone $wgParser;

Before 1.18 it was not necessary to clone the Parser; since 1.18 it is!

With a cloned Parser, the UNIQ markers did not appear (because the clone is a separate instance) and wikilinks [[...]] were also processed correctly (and stored in the HTML5 data attribute). I cloned the instance supplied to the 'ParserFirstCallInit' hook handler, not $wgParser, but they are probably the same. I also cloned unconditionally, because the extension I am developing targets 1.18+.

If there is a better way of turning wikitext with wikilinks into output HTML inside a parser function, please let me know, because cloning the Parser is probably not the best way (the Parser object is fairly large).

Why didn't I keep the function input parts as wikitext? Because I need them encoded as an HTML5 data attribute, to be displayed by client-side code at a later stage.