
<mw:editsection> markup visible on a specific page on Wikisource when using Parsoid
Open, Needs Triage, Public

Description

On this page: https://en.wikisource.org/w/index.php?title=A_General_History_of_the_Pyrates/Chapter_1&useparsoid=1

The following fragment of markup appears:

<section data-mw-section-id="-1" id="mwBA"><h2 data-mw-anchor="CHAP._I.ofCaptain_Avery,And_his_Crew." data-mw-fallback-anchor="CHAP._I.ofCaptain_Avery.2CAnd_his_Crew." id="CHAP._I.ofCaptain_Avery,And_his_Crew.CHAP._I.ofCaptain_Avery,And_his_Crew."><span id="CHAP._I.ofCaptain_Avery.2CAnd_his_Crew.CHAP._I.ofCaptain_Avery.2CAnd_his_Crew." typeof="mw:FallbackId"></span><span style="font-size:144%;" id="mwKA"><span style="letter-spacing:0.15em;" id="mwKQ">CHA</span>P. I.</span><br id="mwKg"/><span style="text-transform: uppercase;" id="mwKw"><span style="letter-spacing:0.15em;" id="mwLA">of</span></span><br id="mwLQ"/><span style="font-size:207%;" id="mwLg">Captain <i id="mwLw"><span style="text-transform: uppercase;" id="mwMA"><span style="letter-spacing:0.15em;" id="mwMQ">Aver</span>y</span></i>,</span><br id="mwMg"/><span style="font-size:144%;" id="mwMw">And his <span class="smallcaps" style="font-variant:small-caps;" id="mwNA">Crew</span>.</span><mw:editsection page="Page:A_general_history_of_the_pyrates,_from_their_first_rise_and_settlement_in_the_Island_of_Providence,_to_the_present_time_(1724).djvu/53" section="T-1" id="mwNQ">CHAP. I.ofCaptain Avery,And his Crew.</mw:editsection></h2>

I don't really understand what's happening. That looks like a mix of Parsoid markup and old parser markup, and it should not appear in the output (the data-mw-anchor attributes and mw:editsection tags should be replaced by the output pipeline).

Event Timeline

Quickly, I imagine what's going on here is:

  • The content of the page is an extension tag, <pages ... />, that emits a heading (see "MediaWiki:Proofreadpage header template")
  • Parsoid doesn't have a native implementation of the extension, so it asks the legacy parser to parse the tag to HTML
  • The DataAccess::parseWikitext implementation invokes the parser directly, so the OutputTransform pipeline doesn't get run on the legacy parser output, which leaves the mw:editsection tags behind. See https://github.com/wikimedia/mediawiki/blob/master/includes/parser/Parsoid/Config/DataAccess.php#L357-L373
  • When the pipeline is later run on the combined Parsoid output, isParsoidContent is set, so HandleSectionLinks gets skipped and the leftover tags are never replaced.
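
To make the suspected sequence concrete, here is a toy model in Python. It is only a sketch of the hypothesis above, not MediaWiki code: every function name below (legacy_parse, handle_section_links, output_pipeline, parsoid_render) and the post_process_fragment flag are hypothetical stand-ins, and the real HandleSectionLinks stage does more than strip the placeholder.

```python
import re

# Toy model only: all names here are hypothetical stand-ins, not MediaWiki APIs.
EDITSECTION_RE = re.compile(
    r"<mw:editsection\b[^>]*>.*?</mw:editsection>", re.DOTALL
)


def legacy_parse(wikitext: str) -> str:
    """Stand-in for the legacy parser: emits a heading with a raw placeholder."""
    return (
        f'<h2>{wikitext}<mw:editsection page="P" section="T-1">'
        f"{wikitext}</mw:editsection></h2>"
    )


def handle_section_links(html: str) -> str:
    """Stand-in for the stage that consumes placeholders (here it just strips them)."""
    return EDITSECTION_RE.sub("", html)


def output_pipeline(html: str, is_parsoid_content: bool) -> str:
    """Stand-in for the output pipeline: the stage is skipped for Parsoid content."""
    if is_parsoid_content:
        return html
    return handle_section_links(html)


def parsoid_render(extension_body: str, post_process_fragment: bool = False) -> str:
    """Stand-in for Parsoid delegating an extension tag to the legacy parser."""
    fragment = legacy_parse(extension_body)
    if post_process_fragment:
        # Assumption for illustration: run the stage on the fragment before embedding.
        fragment = handle_section_links(fragment)
    return f'<section data-mw-section-id="-1">{fragment}</section>'


# Suspected current behaviour: the placeholder survives both steps.
leaked = output_pipeline(parsoid_render("CHAP. I."), is_parsoid_content=True)

# Contrast case: the fragment is post-processed before being embedded.
clean = output_pipeline(
    parsoid_render("CHAP. I.", post_process_fragment=True),
    is_parsoid_content=True,
)
```

In this model, leaked still contains the mw:editsection placeholder while clean does not. The contrast case is there only to show where the gap sits, not as a proposed patch.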