Page MenuHomePhabricator

New vector TOC not escaping hash fragments and so generating invalid URLs
Closed, ResolvedPublic

Description

Hash fragments need to be escaped with Sanitizer::escapeIdForLink.

  1. Go to https://en.wikipedia.beta.wmflabs.org/w/index.php?title=User:ESanders_(WMF)/sandbox&oldid=556241&useskin=vector-2022
  2. Click on the last link in the TOC (Romeo+Juliet)
  3. Observer the link does nothing

In classic vector this works fine: https://en.wikipedia.beta.wmflabs.org/w/index.php?title=User:ESanders_(WMF)/sandbox&oldid=556241&useskin=vector

TODO

QA Results - Beta

ACStatusDetails
1โœ…T315222#8252105

QA Results - Prod

ACStatusDetails
1โœ…T315222#8277896

Event Timeline

Restricted Application added a subscriber: Aklapper. ยท View Herald TranscriptAug 15 2022, 11:58 AM

There are a few dozens test cases in parserTests.txt where the whole table of contents is asserted, checking things like this encoding issue and also deduplication of duplicate headings. I would suggesting making a Vector-2022 version of these tests.

(Removing regression as in web team we reserve this for established features and this one is still in development). Thanks for the report and suggestion Ed!

It's a regression from Vector classic

Change 823274 had a related patch set uploaded (by Func; author: Func):

[mediawiki/skins/Vector@master] ToC: Handle anchors with % in plaintext

https://gerrit.wikimedia.org/r/823274

@cscott would it make sense to do this in the parser or in the render layer? I presume the parser as there is likely other bugs relating to this e.g. API?

Change 823588 had a related patch set uploaded (by Func; author: Func):

[mediawiki/core@master] skins: Expose anchors escaped for links in the section data

https://gerrit.wikimedia.org/r/823588

@cscott would it make sense to do this in the parser or in the render layer? I presume the parser as there is likely other bugs relating to this e.g. API?

I would only suggest introducing this in skin components, not the parser.
Otherwise, there would be plenty of repos that need to be patched, since their code and tests are coupled with the structure of section data.
In the meantime, the percentage sign is very uncommon and a simple regex replace can be implemented when the developer is using API, it doesn't make too much sense. The API return provided just enough information.

@Func but the anchor parameter is broken in this API result, no? https://en.wikipedia.beta.wmflabs.org/wiki/Special:ApiSandbox#action=parse&format=json&oldid=556241&prop=sections

Otherwise, there would be plenty of repos that need to be patched, since their code and tests are coupled with the structure of section data.

Could you give some examples of where this would be problematic?

I agree with @Jdlrobson that the section API looks problematic to me. Like with T306862: Chinese Language Converter is not working in the sidebar table of the contents in Vector-2022 it looks to me like we really ought to providing the proper information in the section JSON. In this case, the section JSON has an "anchor" property, I agree with the bug description that we should be running Sanitizer::escapeId* on it, not forcing all clients of this API to reimplement the particular escaping mechanism used by MediaWiki (some clients may be client-side JS for example).

(As a general note, the JSON returned by the Section API really should be better documented, and the escaping (or not) status of the 'anchor' property should certainly be included in a description of that field. Did the work that was done for T299065: Revise table of contents data format include some documentation of the section data? Parser.php doesn't really have any, but T299065 seems to reference a "Section" class at one point.

Change 823588 abandoned by Func:

[mediawiki/core@master] skins: Expose anchors escaped for links in the section data

Reason:

https://gerrit.wikimedia.org/r/823588

Esanders renamed this task from New vector TOC not escape hash fragments and generating invalid URLs to New vector TOC not escaping hash fragments and so generating invalid URLs.Aug 22 2022, 8:55 PM

C Scott would like web team to take on this work. Need help writing test case and QA. But available for consultation.
Low risk short term, high risk longer term. Worst case is some broken hash fragments in certain products. Might require follow up work.

Olga: this is likely a blocker for deployment, particularly in non-Latin languages.

Change 831147 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/core@master] Parser: Use linkAnchor in section definition, not anchor

https://gerrit.wikimedia.org/r/831147

Change 831147 merged by jenkins-bot:

[mediawiki/core@master] Parser: Use linkAnchor in section definition as well as anchor

https://gerrit.wikimedia.org/r/831147

Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)

This is now ready for web team estimation.

Change 823274 merged by jenkins-bot:

[mediawiki/skins/Vector@master] ToC: Handle anchors with % in plaintext

https://gerrit.wikimedia.org/r/823274

Test Result - Beta

Status: โœ… PASS
Environment: beta
OS: macOS Monterey
Browser: Chrome
Device: MBP
Emulated Device:NA

Test Artifact(s):

QA Steps

Go to https://en.wikipedia.beta.wmflabs.org/w/index.php?title=User:ESanders_(WMF)/sandbox&oldid=556241&useskin=vector-2022
Click on the last link in the TOC (Romeo+Juliet)
โœ… AC1: The page should navigate to the corresponding section.

Vector:

Screen Recording 2022-09-21 at 4.39.33 PM.mov.gif (952ร—1 px, 308 KB)

Vector Legacy:

Screen Recording 2022-09-21 at 4.40.06 PM.mov.gif (952ร—1 px, 385 KB)

Change 834386 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Linker::generateTOC(): distinguish between $anchor and $linkAnchor

https://gerrit.wikimedia.org/r/834386

Change 834386 merged by jenkins-bot:

[mediawiki/core@master] Linker::generateTOC(): distinguish between $anchor and $linkAnchor

https://gerrit.wikimedia.org/r/834386

Edtadros subscribed.

Test Result - Prod

Status: โœ… PASS
Environment: testwiki
OS: macOS Monterey
Browser: Chrome
Device: MBP
Emulated Device:NA

Test Artifact(s):

QA Steps

Go to a page with encodable characters and % in plaintext.
Click on the link in the TOC (Romeo+Juliet)
โœ… AC1: The page should navigate to the corresponding section.

Vector:

Screen Recording 2022-10-01 at 11.13.21 AM.mov.gif (684ร—1 px, 376 KB)

Vector Legacy:

Screen Recording 2022-10-01 at 11.16.19 AM.mov.gif (684ร—1 px, 285 KB)