Page MenuHomePhabricator

No reference popup shown when ref name contains certain HTML entities
Open, Needs TriagePublicBUG REPORT

Description

The label (e.g. "[9]") of a <ref> tag normally (on en.wiki anyway) if clicked will scroll to the reference text, and if hovered over will either pop up a box with the reference text (if the reference is offscreen) or highlight the reference text (if the reference is onscreen). However if the <ref> tag has a name parameter with any of various character sequences, such as %20, then none of these responses will happen. This is a problem for a reader wanting to see the reference information.

Known problem sequences

Some of these are more frequent than you might expect. There likely are other character sequences that fail in the same way. For example, the %c8%98 represents a Latin letter with diacritical marks, and there are lots of such letters. The basic problem seems to be that the html generated uses different identifiers for the link to the reference and the target. Apparently the name value is used to build the identifier, but the identifier is built in two different ways, one for the link and one for the target. Seems like a sure recipe for bugs.

Note that if one has accidentally used such a name value for a ref tag, it will not be apparent that there is a problem. No one goes around hovering over every reference label in an article. Also, some automated or "semi-automated" edits may change a ref name in a way that starts the ref label failing in this way. All the more reason the software should generate the html identifier in one place and not two so that mismatches won't occur.

List of steps to reproduce:

  • Add lines like these to a page:
The mouse.<ref name="A&nbsp;- Z"> Animals A to Z </ref>
== References ==
* <references />
  • Save the page
  • Hover the cursor over the reference label which follows "The mouse.".

What happens?:

  • Nothing

What should have happened instead?:

  • The reference line with "Animals A to Z" should be highlighted (or a popup with that text shown).

Event Timeline

Aklapper removed a subscriber: Cite.

Cite itself shows no popups. On English Wikipedia that is the "Reference Tooltips" gadget; see https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-gadgets . The problem also happens with the "Reference Previews" extension from https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-betafeatures .
The problem does not happen when using "Navigation popups" on https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-gadgets

Aklapper renamed this task from Dead ref label (no link, popup or highlight) depending on ref name to No reference popup shown when ref name contains certain HTML entities.Dec 25 2021, 11:40 AM

"Navigation popups" has problems with this also, but in its own way. When hovering over the ref label (in the %2E case, at least), it does give a popup, but the popup does not contain the normal contents it is supposed to deliver, namely the text displayed in the reference note. (Instead it just shows the plain popup for the page, which is the text of the intro.) You can see this by hovering over the %2E example label [8], contrasted with hovering over [7], which does not have problem characters.

thiemowmde added a subscriber: thiemowmde.

TL;DR: This bug is caused by a change in core, but needs to be fixed in the Cite codebase. It affects the Reference Previews feature of the Popups extension and all other kinds of reference tooltips, but can't be fixed in these codebases.

Here is a minimized version of the examples from the task description:

<ref name="nature%20phylo">A</ref>
<ref name="Mininova%2E26%2E11%2E2009">B</ref>
<ref name="%c8%98tiri_2019">C</ref>
<ref name="play%21">D</ref>
<ref name="Terry+O%26rsquo%3BN…</ref">E</ref>
<ref name="The Bill moves to 9&nbsp;pm">F</ref>
<references />

Cite generates bogus HTML for these examples. For example, href="#cite_note-nature_phylo-1" and id="cite_note-nature%20phylo-1" don't match,

Cite is hardly to blame. There is only a single method that generates this id. It's used in two places. Both are wikitext, one <li id="…"> and one [[#…]]. Both wikitext snippets are parsed via Parser::recursiveTagParse.

The problem is that the parser behaves very different depending on the $wgFragmentMode, see https://www.mediawiki.org/wiki/Manual:$wgFragmentMode. The fragment in [[#…]] is url-decoded (this happens in Parser::handleInternalLinks2), but the id in <li id="…"> is not. The $wgFragmentMode default changed with MediaWiki 1.37, just a few months ago in September 2021, see https://gerrit.wikimedia.org/r/720427, tracked via T186267.

Note this issue is not related to anything the WMDE-TechWish team did, even if it affects one of our projects. Anybody should feel free to pick this up. Just let us know.

Warning: Simply url-decoding the input doesn't work when it contains something like %2522 that appears double-encoded. Decoding this to %22 to early would still trigger the same issue.

Change 751119 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Add parser & unit test cases for different $wgFragmentMode's

https://gerrit.wikimedia.org/r/751119

Change 751131 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] [POC] Use different id/fragment encoding

https://gerrit.wikimedia.org/r/751131

Change 751149 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Use correct Sanitizer method for id/fragment escaping

https://gerrit.wikimedia.org/r/751149

Change 751119 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Add parser & unit test cases for different $wgFragmentMode's

https://gerrit.wikimedia.org/r/751119

Change 751149 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Use correct Sanitizer method for id/fragment escaping

https://gerrit.wikimedia.org/r/751149