
Empty elements should be removed when producing the extract
Open, Needs Triage | Public | BUG REPORT

Description

Use case: on the French wiki we have a template that produces an element with a specific ID for targeting. JavaScript in MediaWiki:Common.js detaches this element and appends it to #firstHeading (which is outside #mw-content-text) to create a pseudo-subtitle displayed below the page title.

Initially the template content was: <span id="specific_id">Subtitle text</span>

And it is expected to be placed at the beginning of the article wikitext:

{{Sous-titre}} <-- here

First paragraph

So the resulting HTML is:

<p><span id="specific_id">Subtitle text</span></p> <!-- JavaScript will remove the <span>, resulting in an empty <p> that is invisible -->
<p>First paragraph</p>
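
For illustration, the Common.js step described above could look roughly like the sketch below. This is not the actual code from the French wiki: the function name moveSubtitle and its parameters are hypothetical, and only the IDs specific_id and firstHeading come from this report. It relies on the standard DOM behaviour that appendChild moves a node that is already attached elsewhere, which is what leaves the empty <p> behind.

```javascript
// Hypothetical helper: moves the subtitle element under the page heading.
// `doc` is passed in so the function can be exercised outside a browser.
function moveSubtitle(doc, subtitleId = 'specific_id', headingId = 'firstHeading') {
  const subtitle = doc.getElementById(subtitleId);
  const heading = doc.getElementById(headingId);
  if (!subtitle || !heading) {
    // Nothing to do on pages without the template or without a heading.
    return false;
  }
  // appendChild detaches the node from its current parent before attaching it,
  // so the autogenerated <p> in the content area is left empty.
  heading.appendChild(subtitle);
  return true;
}

// In a browser this would simply be called with the global document.
if (typeof document !== 'undefined') {
  moveSubtitle(document);
}
```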

But an issue appeared later: the content "Subtitle text" got picked for the previews generated by the Popups extension.

So, as a first step I added the noexcerpt class (documented here and also here) to the template: <span id="specific_id" class="noexcerpt">Subtitle text</span>.

As expected, the Popups extension then removed the entire <span> element (implementation here and here), but another issue arose: the autogenerated <p> was still there after the <span> was removed, so the HTML for the extract was:

<p></p>
<p>First paragraph</p>

… and because it's the first <p> that is taken for generating the popup preview, we ended up with a blank result.

So, in the template I replaced the <span> with a <p>: <p id="specific_id" class="noexcerpt">Subtitle text</p>. Thus the page HTML is:

<p id="specific_id" class="noexcerpt">Subtitle text</p> <!-- explicit <p>, instead of being autogenerated -->
<p>First paragraph</p>

… so that noexcerpt removes the whole <p>, leaving only <p>First paragraph</p> which is picked for the popup preview. Finally!


Proposal: Although the original issue is fixed locally, this story made me think of an improvement that could be made in the TextExtracts extension (which is used by the Popups extension):

After the noexcerpt elements are removed (as a reminder, implementation here and here), we could run a pass that removes empty <p> elements. This would remove elements like the one encountered in the above use case.

Note this is already done in mediawiki-services-mobileapps, see the code in summarize.js:

// ...
rmElements(doc.body, '.mw-ref,.reference,.noexcerpt,.nomobile,.noprint,.sortkey');
for (let i = 8, runAgain = true; i > 0 && runAgain; i--) {
	runAgain = rmElements(doc.body, 'span:empty,b:empty,i:empty,p:empty');
}
// ...
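
To make the idea concrete, here is a minimal, self-contained sketch of the same two-step clean-up applied to the use case above. It does not use a real DOM or the TextExtracts code: the node model (el), removeMatching, pruneForExtract and firstParagraphText are all hypothetical names invented for this illustration, and "empty" is approximated as "has no children at all". The bounded loop mirrors the mobileapps snippet: removing an empty element can leave its parent empty, so the pass is repeated until nothing changes or the pass limit is reached.

```javascript
// Hypothetical minimal node model standing in for a DOM:
// children are either nested nodes or plain strings (text).
function el(tag, classes = [], children = []) {
  return { tag, classes, children };
}

// Remove every descendant element for which `match` returns true.
// Returns true if at least one element was removed.
function removeMatching(node, match) {
  let removed = false;
  node.children = node.children.filter((child) => {
    if (typeof child === 'string') return true;
    if (match(child)) {
      removed = true;
      return false;
    }
    if (removeMatching(child, match)) removed = true;
    return true;
  });
  return removed;
}

const EMPTY_TAGS = new Set(['p', 'span', 'b', 'i']);
const isEmpty = (n) => EMPTY_TAGS.has(n.tag) && n.children.length === 0;

// Step 1: drop .noexcerpt elements. Step 2: repeatedly drop empty elements.
function pruneForExtract(root, maxPasses = 8) {
  removeMatching(root, (n) => n.classes.includes('noexcerpt'));
  for (let i = maxPasses, runAgain = true; i > 0 && runAgain; i--) {
    runAgain = removeMatching(root, isEmpty);
  }
  return root;
}

function textOf(node) {
  return node.children.map((c) => (typeof c === 'string' ? c : textOf(c))).join('');
}

// Text of the first top-level <p>, as a stand-in for the preview source.
function firstParagraphText(root) {
  const p = root.children.find((c) => typeof c !== 'string' && c.tag === 'p');
  return p ? textOf(p) : '';
}

// The use case from this report: <p><span class="noexcerpt">…</span></p><p>…</p>
const body = el('body', [], [
  el('p', [], [el('span', ['noexcerpt'], ['Subtitle text'])]),
  el('p', [], ['First paragraph']),
]);
pruneForExtract(body);
console.log(firstParagraphText(body)); // "First paragraph"
```

Without the second step, the first <p> in this model would still exist but be empty, which corresponds to the blank preview described above.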

As an aside, the PhpTags Wiki extension also generates such extracts, using the TextExtracts extension and the ExtractsRemoveClasses config item: see Extractor.php.