Auto-trimming of internal links breaks anchors if the last character is a space
Open, Needs Triage | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

<span data-mw-comment-start="" id="c-Lofhi-20240125225100-ChezLesDevs_:_nouvelles_fonctionnalités_pour_aider_au_contrôle_de_la_qualité_"></span>
  • Observe that the identifier used in the HTML anchor for the message initiating the discussion is:
<a href="https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:%C3%89chos_de_la_fondation#c-Lofhi-20240125225100-ChezLesDevs_:_nouvelles_fonctionnalités_pour_aider_au_contrôle_de_la_qualité_" class="ext-discussiontools-init-timestamplink">25 janvier 2024 à 23:51 (CET)</a>
  • Observe the trailing whitespace encoded as _;
  • Try to use this anchor to autoscroll on the discussion;

What happens?:
The bug arises from the title of a discussion topic, whose generated identifier ends with a trailing space (encoded as _), such as c-Lofhi-20240125225100-ChezLesDevs_:_nouvelles_fonctionnalités_pour_aider_au_contrôle_de_la_qualité_. Depending on your point of view, you could say that the bug stems from MediaWiki truncating the identifier because it is too long, which leads to choosing an identifier that ends with a space.
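To illustrate how truncation alone can produce such an identifier, here is a minimal sketch. It is not the DiscussionTools implementation: the function name, the character-based limit, and the example heading are all hypothetical, chosen only to show a cut landing right after a space.

```python
def make_comment_id(author: str, timestamp: str, heading: str, limit: int) -> str:
    """Hypothetical ID builder: truncate the heading, then encode spaces as '_'."""
    # Assumption: truncation is a plain cut at `limit` characters, with no trimming.
    truncated = heading[:limit]
    return f"c-{author}-{timestamp}-{truncated}".replace(" ", "_")


# The cut falls right after the space that follows "beta",
# so the resulting identifier ends with an underscore.
cid = make_comment_id("Example", "20240101000000", "Alpha beta gamma", 11)
print(cid)
```

Any heading whose character at the cut position is a space would end up with an identifier like this one.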

But it seems less simple than it looks: you might think the trailing space is stripped from the URL because it is an unsafe character, but it looks more like a JavaScript/Parsoid trimming implementation problem. The request automatically sent by DiscussionTools to Parsoid, after I used the "add a link" button to preview in visual mode, is:

action	"discussiontoolspreview"
format	"json"
formatversion	"2"
uselang	"fr"
type	"topic"
page	"Wikipédia:Le_Bistro/31_janvier_2024"
wikitext	"[[Wikipédia:Échos+de+la+fondation#c-Lofhi-20240125225100-ChezLesDevs+:+nouvelles+fonctionnalités+pour+aider+au+contrôle+de+la+qualité|Wikipédia:Échos+de+la+fondation#c-Lofhi-20240125225100-ChezLesDevs+:+nouvelles+fonctionnalités+pour+aider+au+contrôle+de+la+qualité+]]"
sectiontitle	""
useskin	"vector-2022"

The whitespace is already removed even before the message is submitted. The problem seems even more general, since a whitespace or even _ (treated as whitespace) added after é is also stripped by Parsoid when using source mode.

What should have happened instead?:
Autoscroll to the HTML identifier.

Event Timeline

Lofhi updated the task description.

That link does seem to work for me (I get https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:%C3%89chos_de_la_fondation#c-Lofhi-20240125225100-ChezLesDevs_:_nouvelles_fonctionnalit%C3%A9s_pour_aider_au_contr%C3%B4le_de_la_qualit%C3%A9_ copied, which does indeed include the trailing _) -- following it, I'm scrolled there and the comment is highlighted.

Could you tell me which browser you're using, since that might be a factor in this sort of URL-mangling? (I've tested on macOS with Safari, Firefox, and Chrome.)

The link works by itself. Now, try to paste it in visual mode, or add an internal link using the “add a link” button in source mode. The generated link in the preview has no trailing _, so the identifier is broken. Firefox 122, Windows 11.

Esanders subscribed.

Yes, the VE link inspector changes underscores to spaces, as this is generally safe to do with titles, but as pointed out here it is not safe with hash fragments. We should avoid underscore replacement in hash fragments completely.

No - there are quite a few places in MediaWiki that assume that we can convert _ to ' ' and trim, specifically the widely used mw.Title.
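The effect of that assumption on a fragment can be sketched as follows. The function below is a hypothetical stand-in for "convert _ to ' ' and trim"; it is not the actual mw.Title code, and only the identifier is taken from this report.

```python
def normalize_like_title(text: str) -> str:
    """Hypothetical stand-in: treat '_' as a space, trim, then re-encode spaces."""
    return text.replace("_", " ").strip().replace(" ", "_")


# Identifier from the task description, including its trailing underscore.
element_id = (
    "c-Lofhi-20240125225100-ChezLesDevs_:_nouvelles_fonctionnalités"
    "_pour_aider_au_contrôle_de_la_qualité_"
)

fragment = normalize_like_title(element_id)

# The normalized fragment has lost the trailing underscore, so it no longer
# matches the id attribute of the element it is supposed to scroll to.
print(fragment == element_id)
```

Under this assumption the normalized fragment differs from the element's id by exactly one character, which would be enough to break the autoscroll.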

This leaves us with 2 not very satisfactory options:

  1. Fix everywhere that parses URLs / Titles to allow trailing underscores
  2. Fix our ID generation code to trim trailing underscores

(1) seems like it could be a large undertaking, and would leave behind a bunch of messy code just to support some DT IDs
(2) would break all existing DT IDs that end in an underscore. We'd need to have some normalisation code in a bunch of places (including SQL queries?) or we'd need to run a migration script to fix at least some of these IDs (probably not the ones in notifications, but maybe the ones in dt_items and dt_item_ids)
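As a rough sketch of what option (2) could involve, under stated assumptions: the ID builder trims the truncated heading, and lookups compare identifiers with trailing underscores ignored so that previously generated IDs still resolve. Both function names, the character-based limit, and the comparison strategy are hypothetical and are not taken from the DiscussionTools code or from any patch.

```python
def make_trimmed_id(author: str, timestamp: str, heading: str, limit: int) -> str:
    """Hypothetical ID builder that trims whitespace left behind by truncation."""
    truncated = heading[:limit].rstrip()
    return f"c-{author}-{timestamp}-{truncated}".replace(" ", "_")


def same_comment(stored_id: str, requested_id: str) -> bool:
    """Hypothetical lookup normalisation: ignore trailing underscores on both sides."""
    return stored_id.rstrip("_") == requested_id.rstrip("_")


new_id = make_trimmed_id("Example", "20240101000000", "Alpha beta gamma", 11)
legacy_id = "c-Example-20240101000000-Alpha_beta_"

print(new_id)
print(same_comment(legacy_id, new_id))
```

A comparison like `same_comment` would have to be applied wherever stored IDs are matched, which is the cost described in (2); a migration script would instead rewrite the stored values once.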

Change 995261 had a related patch set uploaded (by Esanders; author: Esanders):

[mediawiki/extensions/DiscussionTools@master] Trim whitespace from truncated heading titles in IDs

https://gerrit.wikimedia.org/r/995261

Since I reported the problem, I've encountered it less than ten times. That's still 10 cases where the anchor is broken...

Today I tried to reference a request to the administrators on frwiki from another request to the administrators, and it doesn't work, and the first one is about to be archived (so moved from the "live" page).