
Reference anchor and target IDs use inconsistent percent encoding
Closed, Declined (Public)

Description

On enwiki > Nights: Journey of Dreams the reference link [7] uses percent encoding...

<a href="#cite_note-FOOTNOTESega20074%E2%80%935-8">[7]</a>

...but the `li` it links to doesn't...

<li id="cite_note-FOOTNOTESega20074–5-8">

I realize tapping [7] still works, but it would be nice for debugging if you could, say, search for an id and see the two matches you expect instead of just one because the other uses a different encoding. This confused me at first when I tried to do so :)

This caused bug T183048 in some reference-gathering JS used by the apps, which didn't account for the href # id and the li id using different encodings. Easily fixed, but I thought it would be nice if these ids were made consistent upstream too :)

Event Timeline

It doesn't:

$ curl -s 'https://en.wikipedia.org/wiki/Nights:_Journey_of_Dreams' | ack -o '<[^<]*cite_note-FOOTNOTESega20074.{1,9}5-8[^>]*>'
<a href="#cite_note-FOOTNOTESega20074–5-8">
<li id="cite_note-FOOTNOTESega20074–5-8">
$ curl -s 'https://en.wikipedia.org/api/rest_v1/page/html/Nights:_Journey_of_Dreams' | ack -o '<[^<]*cite_note-FOOTNOTESega20074.{1,9}5-8[^>]*>'
<a href="./Nights:_Journey_of_Dreams#cite_note-FOOTNOTESega20074–5-8" style="counter-reset: mw-Ref 7;">
<li about="#cite_note-FOOTNOTESega20074–5-8" id="cite_note-FOOTNOTESega20074–5-8">
<span id="mw-reference-text-cite_note-FOOTNOTESega20074–5-8" class="mw-reference-text">

Might have something to do with the JS library you are using. According to the spec the anchor should be percent-encoded when parsing the URL, so maybe the library automatically does that for you.
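To illustrate the point about URL parsing, here is a minimal sketch, assuming a WHATWG-compliant `URL` implementation (browsers, Node.js). It is not taken from the app code; the variable names are hypothetical. Resolving the raw href from the desktop HTML through the URL parser yields a percent-encoded fragment:

```javascript
// Assumption: a WHATWG-compliant URL implementation is available globally.
const pageUrl = 'https://en.wikipedia.org/wiki/Nights:_Journey_of_Dreams';

// The href as served in the desktop HTML; \u2013 is the en dash.
const rawHref = '#cite_note-FOOTNOTESega20074\u20135-8';

// Resolving the href against the page URL runs the URL parser, which
// percent-encodes non-ASCII code points in the fragment as UTF-8 bytes.
const parsedHash = new URL(rawHref, pageUrl).hash;

console.log(parsedHash);
// prints "#cite_note-FOOTNOTESega20074%E2%80%935-8"
```

So code that reads a parsed URL's `hash` rather than the raw attribute value may see the encoded form even when the served HTML is not encoded.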

You are right, the mobile version does have percent-encoded anchors (although in general the DOM viewer in the browser is not a reliable way of making sure - it shows the source after it has been corrected for various HTML parsing errors).

$ curl -s 'https://en.m.wikipedia.org/wiki/Nights:_Journey_of_Dreams' | ack -o '<[^<]*cite_note-FOOTNOTESega20074.{1,9}5-8[^>]*>'
<a href="#cite_note-FOOTNOTESega20074%E2%80%935-8">
<li id="cite_note-FOOTNOTESega20074–5-8">

Probably caused by the DOMDocument processing in MobileFrontend then?

Jdlrobson triaged this task as Medium priority. Mar 15 2018, 4:35 PM

This will hopefully be fixed by T188547 which is in the current sprint. If so we'll be able to close this swiftly.

Sadly this is not easily fixable (see https://phabricator.wikimedia.org/T188547#4063268)
You'll need to update this in the client. We'll be doing the same in MobileFrontend.
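A client-side workaround could look roughly like the following. This is only a sketch, not the actual fix made in the apps or in MobileFrontend; `fragmentToId` is a hypothetical helper name. It decodes the href fragment before comparing it with an element id, so both encodings resolve to the same value:

```javascript
// Hypothetical helper: returns the element id that an in-page href points to,
// or null if the href has no fragment.
function fragmentToId(href) {
  const hashIndex = href.indexOf('#');
  if (hashIndex === -1) {
    return null;
  }
  const fragment = href.slice(hashIndex + 1);
  try {
    // Turns "%E2%80%93" back into the en dash; leaves plain text unchanged.
    return decodeURIComponent(fragment);
  } catch (e) {
    // Malformed percent sequence: fall back to the raw fragment.
    return fragment;
  }
}

const liId = 'cite_note-FOOTNOTESega20074\u20135-8'; // id of the <li>
console.log(fragmentToId('#cite_note-FOOTNOTESega20074%E2%80%935-8') === liId); // true
console.log(fragmentToId('#cite_note-FOOTNOTESega20074\u20135-8') === liId);    // true
```

In a browser, the returned value could then be passed to `document.getElementById` to find the reference list item.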

Potentially you might be equipped to fix this in the RESTBase layer.

So is this something we should address in the PCS, @bearND?

@Fjalapeno Searching for FOOTNOTESega20074 in the Parsoid version or the local PCS read-html endpoint (draft version) I see no percent encoding used in this href or id value. So I don't expect PCS to have this problem since it uses Parsoid.
FWIW, the new references endpoint looks clean in that regard, too. I checked even though the issue was really with the link from the content to the reference.
The only place I see this encoding is in the action=mobileview output the iOS app still uses.

Plan to move to PCS anyways.