
Reference anchor and target IDs use inconsistent percent encoding
Closed, Declined (Public)

Description

On enwiki > Nights: Journey of Dreams, the reference link [7] uses percent encoding in its href...

<a href="#cite_note-FOOTNOTESega20074%E2%80%935-8">[7]</a>

...but the `li` it links to doesn't use it in its id...

<li id="cite_note-FOOTNOTESega20074–5-8">

I realize tapping [7] still works, but it would be nice for debugging if you could, say, search for an id and see the 2 matches you expect instead of just one, because the other one uses a different encoding. It confused me at first when I tried to do so :)

This caused bug T183048 in some reference-gathering JS used by the apps, which didn't account for the href # id and the li id using different encodings. Easily fixed, but I thought it would be nice if these IDs were made consistent upstream too :)
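A minimal sketch of the kind of client-side fix described above, assuming bash and UTF-8 IDs. The function names `percent_decode` and `fragment_matches_id` are hypothetical; this is not the apps' actual JS. It compares the `li` id with the href fragment first as written and then percent-decoded. That roughly mirrors how browsers resolve fragments (the HTML Standard looks for an element whose id equals the raw fragment, then retries with the percent-decoded fragment), which is why tapping [7] still works.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not the apps' actual reference-gathering code.

# Percent-decode a string: sed rewrites each %HH as \xHH, then bash's
# printf %b turns those escapes into raw bytes. Assumes the input has no
# literal backslashes (true for these cite_note ids).
percent_decode() {
  printf '%b' "$(printf '%s' "$1" | sed 's/%\([0-9A-Fa-f][0-9A-Fa-f]\)/\\x\1/g')"
}

# Succeeds if the fragment of href identifies the element with this id,
# trying the fragment as written first and then its decoded form.
fragment_matches_id() {
  local href=$1 id=$2
  local frag=${href#*#}   # drop everything up to and including the first '#'
  [ "$frag" = "$id" ] || [ "$(percent_decode "$frag")" = "$id" ]
}

href='#cite_note-FOOTNOTESega20074%E2%80%935-8'
id='cite_note-FOOTNOTESega20074–5-8'
if fragment_matches_id "$href" "$id"; then
  echo "match"      # printed for the ids from this task
else
  echo "no match"
fi
```

Trying the raw fragment first means ids that happen to contain a literal `%HH` sequence still match as written; only the fallback decodes.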

Event Timeline

Mhurd created this task. Dec 16 2017, 12:09 AM
Restricted Application added a subscriber: Aklapper. Dec 16 2017, 12:09 AM
Tgr added a subscriber: Tgr. Edited Dec 16 2017, 8:10 AM

It doesn't, at least not in the desktop or REST API HTML:

$ curl -s 'https://en.wikipedia.org/wiki/Nights:_Journey_of_Dreams' | ack -o '<[^<]*cite_note-FOOTNOTESega20074.{1,9}5-8[^>]*>'
<a href="#cite_note-FOOTNOTESega20074–5-8">
<li id="cite_note-FOOTNOTESega20074–5-8">
$ curl -s 'https://en.wikipedia.org/api/rest_v1/page/html/Nights:_Journey_of_Dreams' | ack -o '<[^<]*cite_note-FOOTNOTESega20074.{1,9}5-8[^>]*>'
<a href="./Nights:_Journey_of_Dreams#cite_note-FOOTNOTESega20074–5-8" style="counter-reset: mw-Ref 7;">
<li about="#cite_note-FOOTNOTESega20074–5-8" id="cite_note-FOOTNOTESega20074–5-8">
<span id="mw-reference-text-cite_note-FOOTNOTESega20074–5-8" class="mw-reference-text">

Might have something to do with the JS library you are using. According to the URL spec, the anchor (fragment) is percent-encoded when the URL is parsed, so maybe the library does that for you automatically.
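To illustrate the spec behaviour mentioned here, below is a rough bash re-implementation of the WHATWG URL Standard's fragment percent-encode set: C0 controls, space, the double quote, <, >, backtick, and every byte of a code point above U+007E in UTF-8. The function name `fragment_encode` is hypothetical. It shows that a literal en dash in the id serializes as `%E2%80%93`, so a library that works with a parsed URL (for example `a.href` in a browser) rather than the raw attribute may see the encoded form even when the markup doesn't contain it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from any Wikimedia codebase): percent-encode a
# fragment using the WHATWG URL Standard's fragment percent-encode set.
fragment_encode() {
  local hex b ch out=''
  # od prints each byte of the (UTF-8) input as two lowercase hex digits;
  # -v stops od from collapsing repeated lines into '*'.
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    b=$((16#$hex))
    # Encode C0 controls, space, ", <, >, backtick, DEL and any non-ASCII byte.
    if (( b <= 0x20 || b == 0x22 || b == 0x3C || b == 0x3E || b == 0x60 || b >= 0x7F )); then
      printf -v ch '%%%02X' "$b"
    else
      printf -v ch '%b' "\\x$hex"   # copy the ASCII byte through unchanged
    fi
    out+=$ch
  done
  printf '%s' "$out"
}

fragment_encode 'cite_note-FOOTNOTESega20074–5-8'; echo
# prints cite_note-FOOTNOTESega20074%E2%80%935-8
```

Working byte by byte is what makes the en dash come out as three escapes: the spec percent-encodes the UTF-8 encoding of each non-ASCII code point.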

Mhurd added a comment. Dec 18 2017, 6:27 PM

@Tgr Hmm, the snippets I posted were from viewing the source of https://en.m.wikipedia.org/wiki/Nights:_Journey_of_Dreams

You are right, the mobile version does have percent-encoded anchors (although in general the browser's DOM viewer is not a reliable way to check this: it shows the markup after it has been corrected for various HTML parsing errors).

$ curl -s 'https://en.m.wikipedia.org/wiki/Nights:_Journey_of_Dreams' | ack -o '<[^<]*cite_note-FOOTNOTESega20074.{1,9}5-8[^>]*>'
<a href="#cite_note-FOOTNOTESega20074%E2%80%935-8">
<li id="cite_note-FOOTNOTESega20074–5-8">

Probably caused by the DOMDocument processing in MobileFrontend then?

Jdlrobson updated the task description. Dec 20 2017, 2:28 AM
Jdlrobson triaged this task as Normal priority. Mar 15 2018, 4:35 PM

This will hopefully be fixed by T188547, which is in the current sprint. If so, we'll be able to close this swiftly.

Sadly, this is not easily fixable (see https://phabricator.wikimedia.org/T188547#4063268).
You'll need to update this in the client. We'll be doing the same in MobileFrontend.

You might be equipped to fix this in the RESTBase layer.

So is this something we should address in the PCS, @bearND?

bearND added a comment. Edited Mar 28 2018, 10:15 PM

@Fjalapeno Searching for FOOTNOTESega20074 in the Parsoid version or the local PCS read-html endpoint (draft version), I see no percent encoding in this href or id value. So I don't expect PCS to have this problem, since it uses Parsoid.
FWIW, the new references endpoint looks clean in that regard, too. I checked even though the issue was really with the link from the content to the reference.
The only place I see this encoding is in the action=mobileview output the iOS app still uses.

bearND closed this task as Declined. Feb 13 2019, 5:20 PM

The plan is to move to PCS anyway.