
Escaped HTML underneath page title in wikis with the GeoCrumbs extension enabled
Closed, Resolved · Public · Bug Report

Assigned To
Authored By
RolandUnger
Aug 24 2022, 9:11 AM

Description

Steps to replicate the issue (include links if applicable):

What happens?:

  • For the past few days, <span> tags have been visible in some links of the breadcrumb trail, which should not happen. Some process, perhaps tidying (HTML5 parser), converts the angle brackets and quotation marks of HTML tags to SGML entities, for example:
<a href="/wiki/%C3%84gypten" title="Ägypten">&lt;span class=&quot;mw-page-title-main&quot;&gt;Ägypten&lt;/span&gt;</a>

What should have happened instead?:

  • HTML tags should not be converted to SGML entities

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
(Copied from T316108)

Screenshot from 2022-08-24 14-09-01.png (142×695 px, 21 KB)

This must be related to the change introduced by T306440 (hence subscribing @matmarex and @ppelberg). However, I think the blame lies with the GeoCrumbs extension's handling of page titles (it takes them from $parserOutput), so it should probably be fixed there.

Event Timeline

jhsoby renamed this task from Visible HTML tags in breadcrumb trails (Wikivoyage) to Escaped HTML underneath page title in wikis with the GeoCrumbs extension enabled. Aug 24 2022, 1:20 PM
jhsoby updated the task description.

GeoCrumbs uses $parserOutput->getTitleText(); to get the link texts it renders. So perhaps the problem is that the changes in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/821353/ led to the HTML tags being included in what $parserOutput->getTitleText(); outputs?

On mwdebug1001 I have switched enwikivoyage to 1.39.0-wmf.25, confirmed the switch via Special:Version then purged https://en.wikivoyage.org/wiki/Cairo/Downtown . It was still showing escaped content in the bread crumb. But maybe a purge is not sufficient to regenerate it. Could it be an issue that occurred before we updated MediaWiki?

It is back to wmf.26 on mwdebug1001.

@hashar I think it was introduced today, but the way GeoCrumbs works with RDF and stuff is a bit mysterious to me. When I filed the (now-closed as duplicate of this) bug, I checked various pages in enwikivoyage via Special:Random, and didn't see anything off except on mainspace pages that were subpages, but after a while I saw it on all pages. The description for this bug filed by @RolandUnger says "Since a few days" though. I first noticed the problem on the Incubator a few minutes before I filed my bug – I had loaded some pages earlier in the day that did not have the bug (I would have noticed).

Mentioned in SAL (#wikimedia-operations) [2022-08-24T13:49:56Z] <hashar@deploy1002> rebuilt and synchronized wikiversions files: Revert "Group 1 wikis to 1.39.0-wmf.26" # T316085 T314187

I have rolled back the wikis from 1.39.0-wmf.26 to 1.39.0-wmf.25 to be sure. I have purged the two example pages using:

They still show the escaped HTML though.

@hashar I did null edits on [[Cairo]] and [[Cairo/Downtown]] and a few purges, and now the link to Cairo from Cairo/Downtown does not have the HTML tags anymore (on mwdebug1001), but the other links have it. So it seems it's cascading…

Ok, thanks for your tests and thanks for the detailed bug reports. I have raised awareness about this task and it is a blocker to the MediaWiki train.

Cairo/Downtown showed: "Africa > North Africa > Egypt > <span class="mw-page-title-main">Lower Egypt</span> > Cairo > Cairo/Downtown"

after a null edit on Lower_Egypt, it correctly shows "Africa > North Africa > Egypt > Lower Egypt > Cairo > Cairo/Downtown"

ParserOutputs are cached, so it makes sense that a null edit on "Lower Egypt" fixes the display issue on "Cairo/Downtown"
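The caching behaviour described here can be modelled with a small Python sketch. It is only an illustration, not MediaWiki code: the `cache` dictionary, the `parse` helper, and the `wrap_in_span` flag are hypothetical stand-ins for the parser cache and for whatever change started adding the span wrapper.

```python
from dataclasses import dataclass
from typing import Dict, List

SPAN = '<span class="mw-page-title-main">{}</span>'


@dataclass
class CachedOutput:
    """Hypothetical stand-in for a cached ParserOutput."""
    title_text: str  # display title stored at parse time


def parse(page: str, wrap_in_span: bool) -> CachedOutput:
    """Simulate parsing a page; the flag models the code version in effect."""
    return CachedOutput(SPAN.format(page) if wrap_in_span else page)


# Assumed history: only "Lower Egypt" was re-parsed while the wrapper was live.
cache: Dict[str, CachedOutput] = {
    "Egypt": parse("Egypt", wrap_in_span=False),
    "Lower Egypt": parse("Lower Egypt", wrap_in_span=True),
    "Cairo": parse("Cairo", wrap_in_span=False),
}


def trail(pages: List[str]) -> str:
    """Build a breadcrumb from the cached title text of each ancestor."""
    return " > ".join(cache[p].title_text for p in pages)


ancestors = ["Egypt", "Lower Egypt", "Cairo"]
before = trail(ancestors)

# Rolling back the code does not touch existing cache entries.
# A null edit on "Lower Egypt" re-parses that page and replaces its entry.
cache["Lower Egypt"] = parse("Lower Egypt", wrap_in_span=False)
after = trail(ancestors)

print(before)
print(after)
```

In this model, purging or re-parsing the child page changes nothing, because the stale string lives in the ancestor's cache entry.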

GeoCrumbs are cascading, so it makes sense to make a null edit on each page whose title shows escaped span tags. The parser function isin specifies the precursor (parent) of an article, the parent's own isin specifies its precursor, and so on. There is only one exception: if the page title consists of both a basepage name and a subpage name, then the precursor is taken from the basepage name if isin is not specified.
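The cascading lookup described above can be sketched in Python as follows. This is a simplified model, not the extension's code: `parent_of`, `build_trail`, and the `isin` mapping are hypothetical names, the example data is modelled on the Cairo/Downtown trail quoted in this task, and taking everything before the last slash as the basepage is an assumption.

```python
from typing import Dict, List, Optional


def parent_of(page: str, isin: Dict[str, str]) -> Optional[str]:
    """Return the precursor of a page, or None at the top of the trail."""
    if page in isin:
        # An explicit isin declaration wins.
        return isin[page]
    if "/" in page:
        # Subpage without isin: fall back to the basepage (assumed rule).
        return page.rsplit("/", 1)[0]
    return None


def build_trail(page: str, isin: Dict[str, str]) -> List[str]:
    """Walk up the precursor chain and return it root-first."""
    trail = [page]
    seen = {page}
    current = parent_of(page, isin)
    while current is not None and current not in seen:  # guard against cycles
        trail.append(current)
        seen.add(current)
        current = parent_of(current, isin)
    return list(reversed(trail))


# Hypothetical isin declarations for illustration only.
isin = {
    "Cairo": "Lower Egypt",
    "Lower Egypt": "Egypt",
    "Egypt": "North Africa",
    "North Africa": "Africa",
}

result = build_trail("Cairo/Downtown", isin)
print(" > ".join(result))
```

Because every ancestor contributes its own stored title, one stale ancestor is enough to show escaped markup on all of its descendants.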

Unfortunately, I do not know where the wrapping span tag comes from and why it is added only in a few cases. In fact, $parserOutput->getTitleText(); should not return this tag.

GeoCrumbs calls:

$linkText = $parserCache->getTitleText(); 
$linkTarget = Title::newFromText( $linkText );

Note that $parserCache is actually a ParserOutput object and ParserOutput::getTitleText() returns $mTitleText which is documented to be Title text of the chosen language variant, as HTML.

So, unless I am misreading the code badly, the call to Title::newFromText on the link text seems broken to me (you are trying to convert an HTML string to a title).

This will need a fix to the GeoCrumbs extension.
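To make the failure mode concrete, here is a minimal Python model of what happens when an HTML display title is passed to code that expects plain text. The `render_link` function is a hypothetical stand-in for a link renderer that escapes its link text. It is not the GeoCrumbs or MediaWiki implementation.

```python
import html


def render_link(target: str, text: str) -> str:
    """Hypothetical link renderer that treats `text` as plain text and escapes it."""
    return '<a href="/wiki/{}" title="{}">{}</a>'.format(
        target, html.escape(target), html.escape(text)
    )


# A display title as HTML, like the strings quoted in this task.
title_html = '<span class="mw-page-title-main">Lower Egypt</span>'
# The same title as plain text.
title_plain = "Lower Egypt"

broken = render_link("Lower_Egypt", title_html)
fixed = render_link("Lower_Egypt", title_plain)

print(broken)  # the markup shows up as visible, escaped text
print(fixed)
```

The escaped variant has the same shape as the link reported in the task description, which is consistent with an HTML string being handled as plain text somewhere along the way.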

This broken usage came from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GeoCrumbs/+/103656 - the intent seems to have been to render language-converted titles.

Change 826348 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/extensions/GeoCrumbs@master] WIP: Convert page title to variant properly

https://gerrit.wikimedia.org/r/826348

Someone needs to test this -- I am not familiar with the usage and the expectation of what needs to happen wrt language conversion here. Hence WIP. The extension has no tests either.

This broken usage came from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GeoCrumbs/+/103656 - the intent seems to have been to render language-converted titles.

Looking at this commit and the associated task T59925, and updating the testing instructions there to today's reality: the expectation is that the breadcrumbs in top-left corner on https://zh.wikivoyage.org/zh-hans/中国 and https://zh.wikivoyage.org/zh-hant/中国 must be displayed in the correct language variant. I don't speak the language, but I assume that the current rendering is correct:

If the patch produces the same results, I think we're good.

The extension is easy enough to set up locally, and you can use https://www.mediawiki.org/wiki/Manual:$wgUsePigLatinVariant to test language variants in English.

@ssastry's patch works for me when testing with that:

Variant        | Before the bug (reverting f7158c39) | Bug                        | Fix
en             | image.png (2×3 px, 501 KB)          | image.png (2×3 px, 476 KB) | image.png (2×3 px, 473 KB)
en-x-piglatin  | image.png (2×3 px, 475 KB)          | image.png (2×3 px, 500 KB) | image.png (2×3 px, 516 KB)

One thing that worked before, and now does not work, is displaying explicitly defined variant titles in breadcrumbs.

If I edit "Geocrumbs/Subpage" and add -{T|Geocrumbs/Subpage variant title}-:

Before the bug              | Fix
image.png (2×3 px, 511 KB)  | image.png (2×3 px, 475 KB)

However that seems to be basically unused on zh.voy: https://zh.wikivoyage.org/w/index.php?search=insource%3A%2F%5C-%5C%7BT%2F&title=Special%3A搜索&go=前往&ns0=1 and no other Wikivoyage language has variants: https://meta.wikimedia.org/wiki/Wikivoyage#Statistics so it's probably okay.

Change 826348 merged by jenkins-bot:

[mediawiki/extensions/GeoCrumbs@master] Convert page title to variant properly

https://gerrit.wikimedia.org/r/826348

Change 826330 had a related patch set uploaded (by Ladsgroup; author: Subramanya Sastry):

[mediawiki/extensions/GeoCrumbs@wmf/1.39.0-wmf.26] Convert page title to variant properly

https://gerrit.wikimedia.org/r/826330

Change 826330 merged by jenkins-bot:

[mediawiki/extensions/GeoCrumbs@wmf/1.39.0-wmf.26] Convert page title to variant properly

https://gerrit.wikimedia.org/r/826330

Mentioned in SAL (#wikimedia-operations) [2022-08-24T19:23:02Z] <ladsgroup@deploy1002> Synchronized php-1.39.0-wmf.26/extensions/GeoCrumbs/includes/Hooks.php: Backport: [[gerrit:826330|Convert page title to variant properly (T316085)]] (duration: 02m 50s)

Ladsgroup assigned this task to matmarex.

It seems fixed, so I am closing this.

@matmarex, your report at T316085#8182741 is well detailed and can probably be selected for the bug review award of the year.

Thank you everyone!

Change 826507 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/mediawiki-config@master] Revert "group1 wikis to 1.39.0-wmf.26"

https://gerrit.wikimedia.org/r/826507

Change 826507 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "group1 wikis to 1.39.0-wmf.26"

https://gerrit.wikimedia.org/r/826507