Parsoid's Cite output could break gadgets, bots, user scripts
Closed, Resolved, Public

Description

Parsoid's Cite output differs from the output currently rendered in desktop read views. A number of user scripts, bots, and gadgets likely depend on the structure and classes found in the current output. When we roll out Parsoid's output for read views, we could break them.

Concrete example: Accessibility attributes in Cite output
In this link, @Izno says: I notice that the "up arrow" in VE links pertaining to this discussion do not contain aria-label, which makes them more accessible, as content in CSS is not guaranteed to be read out in accessibility agents. This is a delta from read-mode Cite today. See the DEF reference in my sandbox. There is a similar full span for the other links in the multiple use reference.

The issue at this time seems to be that Cite's JS modules don't apply to Parsoid's HTML. Since Parsoid uses different class names, the JS code that inspects Cite content won't work out of the box. In this specific case, that code looks for .mw-cite-backlink which Parsoid doesn't add.

It is not too difficult to fix up Cite's resource modules to handle Parsoid output, but this is just an example of the larger issue raised earlier.
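
To illustrate the mismatch, here is a minimal sketch of a dual-mode lookup that a gadget or user script might use. The helper name findBacklinkContainers is hypothetical and not part of the Cite extension. The sketch assumes only the standard DOM querySelectorAll method, the legacy .mw-cite-backlink class, and the rel="mw:referencedBy" attribute that Parsoid emits on the backlink node.

```javascript
// Hypothetical helper for a gadget or user script; not part of Cite itself.
// Legacy output marks the backlink wrapper with a class, Parsoid with a rel value.
const BACKLINK_CONTAINER_SELECTOR = '.mw-cite-backlink, [rel~="mw:referencedBy"]';

// Returns the backlink containers under `root` for either kind of output.
// `root` is any object with a DOM-style querySelectorAll (document, an element, ...).
function findBacklinkContainers(root) {
  return Array.from(root.querySelectorAll(BACKLINK_CONTAINER_SELECTOR));
}
```

A script would call findBacklinkContainers(document) instead of querying .mw-cite-backlink directly. If a node ever matches both halves of the selector, querySelectorAll still returns it only once.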

Possible strategies
Given the experience with media output changes in core, it might be prudent to switch the class names over to what the Cite extension uses rather than try to update all the code that probably references these classes. There may also be some impact from minor HTML structure differences that we will have to evaluate.

Here are some possibilities:
(a) Add an html2html processor that adapts Parsoid's Cite output to the current output format expected by gadgets. In parallel, start an effort to get gadgets, bots, etc. to migrate over to Parsoid's native output so we can eventually eliminate this processor.
(b) Figure out what is involved in updating Parsoid's Cite output to be closer to the current output, to minimize the impact on existing gadgets, bots, etc. and reduce the effort needed to migrate over to Parsoid. From an early inspection, a number of clients already have dual-mode code to handle both outputs, but I doubt on-wiki tools and workflows are Parsoid-ready.
(c) Just embrace Parsoid's output and start an effort to migrate gadgets, user scripts, bots, etc. to be compatible with Parsoid output.

Note that there is no easy path available to us at this time. There are a number of clients, tools, and external users that have come to rely on Parsoid HTML. So our ecosystem is split into two sets of users. So, we need to do a quick evaluation of what our best path forward is.

Event Timeline

Looks like this will get resolved once Parsoid is integrated into the rendering pipeline and ResourceLoader adds all the relevant modules, in this case ext.cite.a11y.js and ext.cite.highlighting.js.

Okay, even after integration, this is not resolved. I went to https://en.wikipedia.org/wiki/Hospet?useparsoid=1, and inspected the linked up-arrow in the 2nd reference and it is missing the aria-label. So, need to look at why (and if this is a case of missing modules in the Cite extension code in Parsoid).

Looks like this is related to how resource modules are handled in Parsoid's extension support. We should simply fall back to the extension's ResourceLoader spec in the extension.json file rather than try to duplicate that support in Parsoid. Right now, Parsoid's Cite implementation has the references tag define the modules it needs and adds them. But the Cite implementation that targets the legacy parser does this differently and has additional scripts and modules defined in the extension.json file, and those won't get automatically added by the integration. We need to figure out a way to access those definitions. Alternatively, we have to get back to moving this implementation out of Parsoid's repo into the Cite repo, where we have access to all this naturally. But then we will have to deal with the fact that we can no longer run Cite parser tests in standalone mode, which might be okay.

Unassigning myself for now. Might pick this up later.

Change 937566 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] Cite: Add missing resource modules

https://gerrit.wikimedia.org/r/937566

Change 937566 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Cite: Add missing resource modules

https://gerrit.wikimedia.org/r/937566

Change 938893 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.18.0-a17

https://gerrit.wikimedia.org/r/938893

Change 938893 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.18.0-a17

https://gerrit.wikimedia.org/r/938893

ssastry updated the task description. (Show Details)

@Tacsipacsi @Izno @PerfektesChaos We are beginning to get closer and closer to using Parsoid for desktop read views and this phab task exposed a possible issue with the impact of Parsoid's Cite output on gadgets, bots, user scripts, etc.

You can right now try Parsoid-based read views by appending "?useparsoid=1" to any wiki page. While we are in no way ready to release this for widespread testing yet and while this is just an early preview for you all, if you notice any egregious breakages, please do report.

Do you have any thoughts on the notes I dumped in the phab task description?

ssastry updated the task description. (Show Details)

What's the current status here? It's hard to tell from the backlog exactly what accessibility attributes are missing, and where they are located in the DOM.

If you open any wiki page, go to the references section, and inspect the "^", you will see the difference between core and Parsoid output here. I thought Parsoid integration would fix it, but it was only one piece of it. I also had to update Parsoid's Cite to add the resource module. But now that both those pieces are done, the problem is that the JS code doesn't apply to Parsoid HTML. I used this example to raise the bigger issue with Cite's HTML and the dependence on it in gadgets, bots, and user scripts.

I was hoping for an itemized list of differences. But I'll take https://en.wikipedia.org/wiki/Atlantic_City%E2%80%93Brigantine_Connector arbitrarily (it is today's article of the day on enwiki), and pick the markup for citation #1 in the references list. Legacy is:

<li id="cite_note-exit1-2"><span class="mw-cite-backlink">^ <a href="#cite_ref-exit1_2-0"><span class="cite-accessibility-label">Jump up to: </span><sup><i><b>a</b></i></sup></a> <a href="#cite_ref-exit1_2-1"><sup><i><b>b</b></i></sup></a></span> <span class="reference-text"><style data-mw-deduplicate="TemplateStyles:r1133582631">.mw-parser-output cite.citation{font-style:inherit;word-wrap:break-word}.mw-parser-output .citation q{quotes:"\"""\"""'""'"}.mw-parser-output .citation:target{background-color:rgba(0,127,255,0.133)}.mw-parser-output .id-lock-free a,.mw-parser-output .citation .cs1-lock-free a{background:url("//upload.wikimedia.org/wikipedia/commons/6/65/Lock-green.svg")right 0.1em center/9px no-repeat}.mw-parser-output .id-lock-limited a,.mw-parser-output .id-lock-registration a,.mw-parser-output .citation .cs1-lock-limited a,.mw-parser-output .citation .cs1-lock-registration a{background:url("//upload.wikimedia.org/wikipedia/commons/d/d6/Lock-gray-alt-2.svg")right 0.1em center/9px no-repeat}.mw-parser-output .id-lock-subscription a,.mw-parser-output .citation .cs1-lock-subscription a{background:url("//upload.wikimedia.org/wikipedia/commons/a/aa/Lock-red-alt-2.svg")right 0.1em center/9px no-repeat}.mw-parser-output .cs1-ws-icon a{background:url("//upload.wikimedia.org/wikipedia/commons/4/4c/Wikisource-logo.svg")right 0.1em center/12px no-repeat}.mw-parser-output .cs1-code{color:inherit;background:inherit;border:none;padding:inherit}.mw-parser-output .cs1-hidden-error{display:none;color:#d33}.mw-parser-output .cs1-visible-error{color:#d33}.mw-parser-output .cs1-maint{display:none;color:#3a3;margin-left:0.3em}.mw-parser-output .cs1-format{font-size:95%}.mw-parser-output .cs1-kern-left{padding-left:0.2em}.mw-parser-output .cs1-kern-right{padding-right:0.2em}.mw-parser-output .citation .mw-selflink{font-weight:inherit}</style><cite class="citation web cs1"><a rel="nofollow" class="external text" 
href="https://www.sjta.com/acexpressway/acx_map_exit.asp?exit=3133">"Atlantic City Expressway: Exit 1"</a>. <a href="/wiki/South_Jersey_Transportation_Authority" title="South Jersey Transportation Authority">South Jersey Transportation Authority</a><span class="reference-accessdate">. Retrieved <span class="nowrap">December 21,</span> 2019</span>.</cite><span title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.genre=unknown&amp;rft.btitle=Atlantic+City+Expressway%3A+Exit+1&amp;rft.pub=South+Jersey+Transportation+Authority&amp;rft_id=https%3A%2F%2Fwww.sjta.com%2Facexpressway%2Facx_map_exit.asp%3Fexit%3D3133&amp;rfr_id=info%3Asid%2Fen.wikipedia.org%3AAtlantic+City%E2%80%93Brigantine+Connector" class="Z3988"></span></span>
</li>

and Parsoid is:

<li about="#cite_note-exit1-2" id="cite_note-exit1-2"><span rel="mw:referencedBy" id="mwAck"><a href="./Atlantic_City–Brigantine_Connector#cite_ref-exit1_2-0" id="mwAco"><span class="mw-linkback-text" id="mwAcs">1 </span></a><a href="./Atlantic_City–Brigantine_Connector#cite_ref-exit1_2-1" id="mwAcw"><span class="mw-linkback-text" id="mwAc0">2 </span></a></span> <span id="mw-reference-text-cite_note-exit1-2" class="mw-reference-text"><style data-mw-deduplicate="TemplateStyles:r1133582631" typeof="mw:Extension/templatestyles mw:Transclusion" about="#mwt62" data-mw="{&quot;parts&quot;:[{&quot;template&quot;:{&quot;target&quot;:{&quot;wt&quot;:&quot;Cite web &quot;,&quot;href&quot;:&quot;./Template:Cite_web&quot;},&quot;params&quot;:{&quot;url&quot;:{&quot;wt&quot;:&quot;https://www.sjta.com/acexpressway/acx_map_exit.asp?exit=3133&quot;},&quot;title&quot;:{&quot;wt&quot;:&quot;Atlantic City Expressway: Exit 1&quot;},&quot;publisher&quot;:{&quot;wt&quot;:&quot;[[South Jersey Transportation Authority]]&quot;},&quot;access-date&quot;:{&quot;wt&quot;:&quot;December 21, 2019&quot;}},&quot;i&quot;:0}}]}" id="mwAc4">.mw-parser-output cite.citation{font-style:inherit;word-wrap:break-word}.mw-parser-output .citation q{quotes:"\"""\"""'""'"}.mw-parser-output .citation:target{background-color:rgba(0,127,255,0.133)}.mw-parser-output .id-lock-free a,.mw-parser-output .citation .cs1-lock-free a{background:url("//upload.wikimedia.org/wikipedia/commons/6/65/Lock-green.svg")right 0.1em center/9px no-repeat}.mw-parser-output .id-lock-limited a,.mw-parser-output .id-lock-registration a,.mw-parser-output .citation .cs1-lock-limited a,.mw-parser-output .citation .cs1-lock-registration a{background:url("//upload.wikimedia.org/wikipedia/commons/d/d6/Lock-gray-alt-2.svg")right 0.1em center/9px no-repeat}.mw-parser-output .id-lock-subscription a,.mw-parser-output .citation .cs1-lock-subscription a{background:url("//upload.wikimedia.org/wikipedia/commons/a/aa/Lock-red-alt-2.svg")right 0.1em 
center/9px no-repeat}.mw-parser-output .cs1-ws-icon a{background:url("//upload.wikimedia.org/wikipedia/commons/4/4c/Wikisource-logo.svg")right 0.1em center/12px no-repeat}.mw-parser-output .cs1-code{color:inherit;background:inherit;border:none;padding:inherit}.mw-parser-output .cs1-hidden-error{display:none;color:#d33}.mw-parser-output .cs1-visible-error{color:#d33}.mw-parser-output .cs1-maint{display:none;color:#3a3;margin-left:0.3em}.mw-parser-output .cs1-format{font-size:95%}.mw-parser-output .cs1-kern-left{padding-left:0.2em}.mw-parser-output .cs1-kern-right{padding-right:0.2em}.mw-parser-output .citation .mw-selflink{font-weight:inherit}</style><cite class="citation web cs1" about="#mwt62" id="mwAc8"><a rel="mw:ExtLink nofollow" href="https://www.sjta.com/acexpressway/acx_map_exit.asp?exit=3133" class="external text" id="mwAdA">"Atlantic City Expressway: Exit 1"</a>. <a rel="mw:WikiLink" href="./South_Jersey_Transportation_Authority" title="South Jersey Transportation Authority" id="mwAdE">South Jersey Transportation Authority</a><span class="reference-accessdate" id="mwAdI">. Retrieved <span class="nowrap" id="mwAdM">December 21,</span> 2019</span>.</cite><span title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.genre=unknown&amp;rft.btitle=Atlantic+City+Expressway%3A+Exit+1&amp;rft.pub=South+Jersey+Transportation+Authority&amp;rft_id=https%3A%2F%2Fwww.sjta.com%2Facexpressway%2Facx_map_exit.asp%3Fexit%3D3133&amp;rfr_id=info%3Asid%2Fen.wikipedia.org%3AAtlantic+City%E2%80%93Brigantine+Connector" class="Z3988" about="#mwt62" id="mwAdQ"></span></span></li>

The appearance in plaintext from this tag soup is something like:

1. ^ a b "Atlantic City Expressway: Exit 1". South Jersey Transportation Authority. Retrieved December 21, 2019.

So Parsoid is missing:

  1. <span class="cite-accessibility-label">Jump up to: </span> before the "a b" after the up caret. Worth noting that core puts this label inside the <a> tag for the 'a' link, but leaves the 'b' link alone.
  2. That's it?

In particular, I don't see aria-label in the legacy output either. But citation 5 has the aria-label; it seems like perhaps this is only added if there is exactly one backlink, but not if there is more than one? (A bug in legacy we shouldn't reproduce?)

Citation 5 in legacy is:

<li id="cite_note-6"><span class="mw-cite-backlink"><b><a href="#cite_ref-6" aria-label="Jump up" title="Jump up">^</a></b></span> <span class="reference-text"><link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r1133582631"><cite class="citation map cs1"><a rel="nofollow" class="external text" href="https://www.arcgis.com/apps/webappviewer/index.html?id=02251e521d97454aabadfd8cf168e44d&amp;marker=-8285281.755%2C4776498.5452%2C102100%2CMile%20post%202.3%2C%2CMile%20post%202.3&amp;level=18&amp;showLayers=wms_6584%3Bwms_6584_Natural2020"><i>NJ-GeoWeb</i></a> (Map). <a href="/wiki/New_Jersey_Department_of_Environmental_Protection" title="New Jersey Department of Environmental Protection">New Jersey Department of Environmental Protection</a><span class="reference-accessdate">. Retrieved <span class="nowrap">June 21,</span> 2022</span>.</cite><span title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.genre=unknown&amp;rft.btitle=NJ-GeoWeb&amp;rft.pub=New+Jersey+Department+of+Environmental+Protection&amp;rft_id=https%3A%2F%2Fwww.arcgis.com%2Fapps%2Fwebappviewer%2Findex.html%3Fid%3D02251e521d97454aabadfd8cf168e44d%26marker%3D-8285281.755%252C4776498.5452%252C102100%252CMile%2520post%25202.3%252C%252CMile%2520post%25202.3%26level%3D18%26showLayers%3Dwms_6584%253Bwms_6584_Natural2020&amp;rfr_id=info%3Asid%2Fen.wikipedia.org%3AAtlantic+City%E2%80%93Brigantine+Connector" class="Z3988"></span></span>
</li>

and in Parsoid is:

<li about="#cite_note-6" id="cite_note-6"><a href="./Atlantic_City–Brigantine_Connector#cite_ref-6" rel="mw:referencedBy" id="mwAgI"><span class="mw-linkback-text" id="mwAgM">↑ </span></a> <span id="mw-reference-text-cite_note-6" class="mw-reference-text"><link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r1133582631" about="#mwt50" typeof="mw:Extension/templatestyles" data-mw="{&quot;name&quot;:&quot;templatestyles&quot;,&quot;attrs&quot;:{&quot;src&quot;:&quot;Module:Citation/CS1/styles.css&quot;},&quot;body&quot;:{&quot;extsrc&quot;:&quot;&quot;}}" id="mwAgQ"><cite class="citation map cs1" id="mwAgU"><a rel="mw:ExtLink nofollow" href="https://www.arcgis.com/apps/webappviewer/index.html?id=02251e521d97454aabadfd8cf168e44d&amp;marker=-8285281.755%2C4776498.5452%2C102100%2CMile%20post%202.3%2C%2CMile%20post%202.3&amp;level=18&amp;showLayers=wms_6584%3Bwms_6584_Natural2020" class="external text" id="mwAgY"><i id="mwAgc">NJ-GeoWeb</i></a> (Map). <a rel="mw:WikiLink" href="./New_Jersey_Department_of_Environmental_Protection" title="New Jersey Department of Environmental Protection" id="mwAgg">New Jersey Department of Environmental Protection</a><span class="reference-accessdate" id="mwAgk">. Retrieved <span class="nowrap" id="mwAgo">June 21,</span> 2022</span>.</cite><span title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.genre=unknown&amp;rft.btitle=NJ-GeoWeb&amp;rft.pub=New+Jersey+Department+of+Environmental+Protection&amp;rft_id=https%3A%2F%2Fwww.arcgis.com%2Fapps%2Fwebappviewer%2Findex.html%3Fid%3D02251e521d97454aabadfd8cf168e44d%26marker%3D-8285281.755%252C4776498.5452%252C102100%252CMile%2520post%25202.3%252C%252CMile%2520post%25202.3%26level%3D18%26showLayers%3Dwms_6584%253Bwms_6584_Natural2020&amp;rfr_id=info%3Asid%2Fen.wikipedia.org%3AAtlantic+City%E2%80%93Brigantine+Connector" class="Z3988" id="mwAgs"></span></span></li>

Oddly we have a ↑ character literally in the HTML here, even though (a) we didn't have one in the multiple backlinks HTML, and (b) it is set to display:none. But again, the substantive difference is that legacy has:

<a href="#cite_ref-6" aria-label="Jump up" title="Jump up">^</a>

and Parsoid has

<a href="./Atlantic_City–Brigantine_Connector#cite_ref-6" rel="mw:referencedBy" id="mwAgI"><span class="mw-linkback-text" id="mwAgM">↑ </span></a>

Discussion about ways forward in the next comment, this one is getting too large!

It seems not-great that Parsoid's output differs so much in the "one linkback" versus "multiple linkback" cases. The spec at https://www.mediawiki.org/wiki/Specs/HTML/2.7.0/Extensions/Cite seems to imply this should just be:

<a href="./Main_Page#cite_ref-1" rel="mw:referencedBy">
    <span class="mw-linkback-text">↑ </span>
</a>

versus

<span rel="mw:referencedBy">
  <a href="./Main_Page#cite_ref-three_3-0"><span class="mw-linkback-text">1 </span></a>
  <a href="./Main_Page#cite_ref-three_3-1"><span class="mw-linkback-text">2 </span></a>
</span>

and I do wonder why we didn't just use the latter format consistently, even when the number of linkbacks was one; especially since all of the *actual text* here is being replaced by CSS rules. It seems we were trying to collapse an "unnecessary" span wrapper in the "only one linkback" case by moving the rel attribute to the <a>, but I think this is a false economy in terms of the complexity it adds. Rote span tags are cheap when gzip-compressed, and putting the rel attribute after the href in the "one linkback" case probably makes it *larger* than the separate span because a variable section is separating what would otherwise be a single coding dictionary entry. But I digress.

In any case, the concrete work to be done here seems to be two different tasks for "one linkback" and "more than one linkback". The former should probably be:

<a href="./Main_Page#cite_ref-1" rel="mw:referencedBy" aria-label="Jump up" title="Jump up"><span class="mw-linkback-text">↑</span></a>

(ie adding aria-label and title to the outer <a> tag, and yes there are issues with localization here alas, but that's orthogonal.)
And I'd propose that the latter should be:

<span rel="mw:referencedBy">
    <a href="./Main_Page#cite_ref-three_3-0" aria-label="Jump up to use #1" title="Jump up"><span class="mw-linkback-text">1 </span></a>
  ...

(ie, adding aria-label and title to the <a> tag, just like we did for the "one linkback" case), probably for each of the linkback links instead of just the first. There's another option, which is to add an (ignored) <span class="cite-accessibility-label">Jump up to: </span> just before the list of <a> tags, but that would require cooperation from VE to ignore this, and I think it complicates the machine-processing of citations.

In any case, I don't think that this task necessitates a wholesale reimagining of Parsoid output for citations, just adding some attributes to <a> tags, even though I did gripe about the inconsistency with the one-linkback and multiple-linkback cases. :) I don't see anything having to do with the class names here.
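
As a rough sketch of the attribute additions proposed in this comment, the function below labels the backlink anchors for both the one-linkback and multiple-linkback structures. The function name, the messages object, and the "$1" placeholder are assumptions made for illustration. The English strings come from the proposal, and localization is deliberately left out.

```javascript
// Hypothetical sketch of the proposed aria-label/title additions.
// `container` is the node carrying rel="mw:referencedBy": the <a> itself in the
// one-linkback case, or the wrapping <span> in the multiple-linkback case.
function labelBacklinks(container, messages) {
  const links = container.tagName === 'A'
    ? [container]
    : Array.from(container.querySelectorAll('a'));
  links.forEach((link, index) => {
    // A lone backlink gets the plain label; several get a numbered one.
    const label = links.length === 1
      ? messages.single
      : messages.numbered.replace('$1', String(index + 1));
    link.setAttribute('aria-label', label);
    link.setAttribute('title', messages.single);
  });
  return links.length;
}

// English strings taken from the proposal; "$1" is an assumed placeholder.
const EXAMPLE_MESSAGES = { single: 'Jump up', numbered: 'Jump up to use #$1' };
```

Whether this runs server-side when the HTML is generated or client-side in a resource module is a separate decision; the sketch only shows which node receives which attribute.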

ssastry renamed this task from Parsoid's Cite output is missing accessibility attributes to Parsoid's Cite output could break gadgets, bots, user scripts.Jul 27 2023, 4:16 PM
ssastry updated the task description. (Show Details)

Here are some notes I am dumping here that we are going to use in a sync discussion. The main difference seems to be in the HTML structure emitted for the backlinks section. See below. I also did a global search on all wikis for references to the class names that are only present in legacy output (but not in Parsoid output). While we could ostensibly fix these all on our own, this doesn't help all the other content users. So, at the very least, if we decide not to make any changes to Parsoid's Cite output, we would have to publish docs and guidance for how to adapt selectors and JS code for Parsoid.

non-named refs

Legacy : ol > li > ( span.mw-cite-backlink > a + span.reference-text )
Parsoid: ol > li > ( a[rel=mw:referencedBy] > span.mw-linkback-text + span.mw-reference-text )

named refs

Legacy : ol > li > ( span.mw-cite-backlink > ( sup > a + sup > a + ... ) + span.reference-text )
Parsoid: ol > li > ( span[rel=mw:referencedBy] > ( a > span.mw-linkback-text + a > span.mw-linkback-text + .. ) + span.mw-reference-text )
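
The structures above could be handled in a single pass with selector lists along these lines. This is a sketch under stated assumptions: describeReference and the constant names are hypothetical, and the selectors are derived only from the notes and markup samples in this task, so they should be checked against real pages. Note that an actual CSS selector needs the rel value quoted because of the colon.

```javascript
// Hypothetical selectors derived from the structures listed above.
const BACKLINK_ANCHORS = [
  '.mw-cite-backlink a',               // legacy, named and non-named refs
  'a[rel~="mw:referencedBy"]',         // Parsoid, non-named refs
  'span[rel~="mw:referencedBy"] > a'   // Parsoid, named refs
].join(', ');
const REFERENCE_TEXT = '.reference-text, .mw-reference-text';

// `li` is a reference list item, or any object exposing the DOM-style
// querySelector and querySelectorAll methods.
function describeReference(li) {
  return {
    backlinks: Array.from(li.querySelectorAll(BACKLINK_ANCHORS)),
    text: li.querySelector(REFERENCE_TEXT)
  };
}
```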

JS & CSS (user-scripts & gadgets)

Code references

Briefly summarizing a discussion on CTT's tech forum, there are some specific changes which could move Parsoid and legacy output closer together:

  1. Add a span[rel=mw:referencedBy] wrapper to Parsoid's non-named refs output, which both brings the two cases of Parsoid output closer together and makes Parsoid's non-named refs output structurally more similar to legacy (this is proposed at T328695#9048889 as well)
  2. Add .mw-cite-backlink to the span[rel=mw:referencedBy] wrapper
  3. Rename .mw-reference-text in Parsoid to .reference-text
  4. Rename .reference-text in core to .mw-reference-text (yes, this is the mirror image of the previous proposal)
  5. Add [rel=mw:referencedBy] to the legacy output (@Arlolra believes there are some issues with cut-and-paste of RDF-containing output from legacy pages to VE, and has a proposal on how to deal with it)
  6. Add .mw-linkback-text to the legacy output

This is actually orthogonal to the accessibility task in the original description of this task. If we add the span wrapper in proposal one, however, that job gets a little easier:

  7. Add aria-label and title to the a tag inside the span[rel=mw:referencedBy]

To investigate:

  • Is "non-named refs" vs "named refs" the only difference, or are we also generating different output based on # of references/uses? That is, if we name a ref but only use that name once, do we still generate the "named refs" structure, or do we generate something which looks more like "non-named refs"?

Re proposal 5, "Add [rel=mw:referencedBy] to the legacy output (@Arlolra believes there are some issues with cut-and-paste of RDF-containing output from legacy pages to VE, and has a proposal on how to deal with it)":

This came up in T337438#8880302, but fleshing out the details is still on my TODO list.

I can't remember what our consensus decision was among the proposals above, but I think we agreed that (1) followed by (7) would be a good first step. I think (2) and (6) were uncontroversial and that we hoped (5) was possible if @Arlolra figures out how to make it work.

For completeness, @subbu has proposed another alternative to #3 and #4 above, which I'll call "#3.5": add class="reference-text mw-reference-text" to both legacy and Parsoid output, i.e. have both parsers emit both classes.

I updated the stats in T328695#9125350. Given what I see there, here is an updated proposal:

  • Add the reference-text class to Parsoid's output and mw-reference-text to legacy output. But write up docs to deprecate the reference-text class and encourage everyone to move to mw-reference-text (in keeping with naming conventions).
  • Add the mw-cite-backlink class in Parsoid's output to both [rel=mw:referencedBy] nodes (the span for named ref backlinks and the a for non-named ref backlinks). But any .mw-cite-backlink a CSS selector will then only select named refs in Parsoid's output. An alternative would be for us to add a <span class="mw-cite-backlink"> wrapper around the a[rel=mw:referencedBy] node for the non-named ref case. However, there are only a couple of hits for the .mw-cite-backlink a selector in the non-user namespace, so we could skip the span wrapper for now and adjust that handful of gadget uses instead. That said, adding the wrapper now would make Parsoid's HTML wrappers for named ref backlinks and non-named ref backlinks almost identical (except for where the rel=mw:referencedBy attribute is added).

Both of these are non-breaking changes and will only improve compatibility of CSS and JS code that references any of these selectors.
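
To make the effect of the two additions concrete, here is a client-side sketch that applies the same class changes to an already-rendered page. It is purely illustrative: the proposal is to emit these classes in the generated HTML, and addCompatClasses and COMPAT_CLASSES are hypothetical names. The sketch assumes only the standard querySelectorAll and classList.add DOM methods.

```javascript
// Hypothetical compatibility shim mirroring the two proposed additions.
// Each entry is [selector for existing nodes, class to add to those nodes].
const COMPAT_CLASSES = [
  ['[rel~="mw:referencedBy"]', 'mw-cite-backlink'], // Parsoid backlink nodes
  ['.mw-reference-text', 'reference-text'],         // Parsoid reference text
  ['.reference-text', 'mw-reference-text']          // legacy reference text
];

// Adds the missing classes under `root`; existing classes are left in place.
// classList.add is a no-op when the class is already present, so running
// this more than once is harmless.
function addCompatClasses(root) {
  for (const [selector, className] of COMPAT_CLASSES) {
    for (const node of Array.from(root.querySelectorAll(selector))) {
      node.classList.add(className);
    }
  }
}
```

Because nothing is removed or renamed, selectors written against either output keep matching, which is why these changes are non-breaking.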

I don't see any point in doing #6 proposed in T328695#9127541 since there are zero uses of it on-wiki. We cannot drop this wrapper from Parsoid's output because Cite's default CSS hides it, letting on-wiki Common.css add different styling. I don't intend to implement #5 either. #7 will be automatically handled by the a11y and highlighting JS resource modules in the Cite extension once the '.mw-cite-backlink' changes are implemented for Parsoid HTML.

BREAKING CHANGES FOR LATER (timeline: likely after Parsoid readviews rollout):

  • Drop the reference-text class from Parsoid and legacy. Cannot do this now because of all the copious uses of reference-text in CSS & JS.
  • For the un-named ref backlink case, we could move the rel attribute from a[rel=mw:referencedBy] to a span.mw-cite-backlink wrapper. Doing this now will likely break CSS and JS selectors both on-wiki and off-wiki. That said, there are not many of these on-wiki references right now, and @cscott was wondering if we should just do it now (rather than later) even if it is a breaking change.

Input welcome. I am going to start implementing these changes next week with the goal of deploying these the week after (mediawiki train week of June 3).

Change #1035809 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/extensions/Cite@master] WIP: Add missing classes to Parsoid's HTML

https://gerrit.wikimedia.org/r/1035809

For completeness, here are GitHub (non-HTML) uses of:

There are likely a number of other Parsoid HTML users via the REST API whose code is not on GitHub.

So, that is just to back up this comment from the description: "There are a number of clients, tools, and external users that have come to rely on Parsoid HTML. So our ecosystem is split into two sets of users". Once we consolidate behind Parsoid output after the readviews rollout, we can do a major HTML version bump and clean it up.

Change #1036705 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/extensions/Cite@master] WIP: Add 'reference-text' class to Parsoid's HTML

https://gerrit.wikimedia.org/r/1036705

Change #1036705 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Add 'reference-text' class to Parsoid's HTML

https://gerrit.wikimedia.org/r/1036705

Change #1035809 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Add 'mw-cite-backlink' class to Parsoid's HTML

https://gerrit.wikimedia.org/r/1035809

I filed T378733 for followup post readview rollout.