Page MenuHomePhabricator

enwiktionary: Parsoid's output includes self-links for fragment links unlike legacy output
Open, Needs TriagePublic

Description

Compare Parsoid vs legacy, specifically the link "complete aspect of ngumisi". The HTML for Parsoid includes a full link whereas legacy output doesn't.

The output comes from templates, and it is likely that modules on enwiktionary are generating this output and their self-link checks are broken for Parsoid. https://en.wiktionary.org/wiki/Module:links#L-438--L-451 is a likely culprit that needs closer investiation.

There are two ways of handling this:

  1. Add a CSS class that matches the presentation of the link without removing the link. This matches how wikitext is handled -- self-links are rendered as links but CSS changes their rendering and behavior.
  2. Fix modules on wikis so that their self-link handling can work with Parsoid - this may also be related to other bugs in Parsoid that breaks certain modules (T348722 is an example and this could be related)

While it is useful to investigate a module as in strategy 2. above, strategy 1 is the right solution for this particular issue because it matches wikitext output for selflinks elsewhere. So, if we were go this route, we would add CSS to the wiki to match legacy rendering.

(note - I have struck out some elements of the description as they seem to be predicated on an incorrect understanding of the source of the issue -- @TTO)

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change #1112292 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[integration/visualdiff@master] Use CSS to style selflink-fragments in Parsoid HTML to match legacy

https://gerrit.wikimedia.org/r/1112292

But, the CSS route breaks other pages like https://en.wiktionary.org/wiki/me which has lots of selflink-fragments that are left alone. So, maybe this will require us to look at the modules and fix them instead.

https://en.wiktionary.org/wiki/Module:links#L-291--L-304 might be the code in question that we might want to audit for Parsoid HTML.

Hi @TTO, per your message on my talk page, I just realized I had filed this phab task previously. Is this something you might have thoughts on the best way to fix for enwiktionary?

The code in question has since moved to Module:links#L-438--L-451.

I imagine the difference in behaviour occurs in the get_current_L2 Lua function in Module:pages. It's worth noting that the Parsoid version of the page also adds a spurious Tagalog entries with incorrect language header category to the output, and this functionality is also reliant on detecting the current L2.

Is it possible that this logic simply needs to be updated to work with Parsoid HTML? Is there a way to conditionally detect the use of Parsoid so that a separate Parsoid-capable logic branch can be provided? Is it even possible to rewrite that code under Parsoid? (I sure hope so.)

Please correct me if my understanding is wrong.

(As an aside, it's difficult to debug this, because TemplateSandbox does not use Parsoid even when the relevant user preference is set.)

(As an aside, it's difficult to debug this, because TemplateSandbox does not use Parsoid even when the relevant user preference is set.)

Does it help if you used the "?useparsoid=1" in the query param?

Is it possible that this logic simply needs to be updated to work with Parsoid HTML? Is there a way to conditionally detect the use of Parsoid so that a separate Parsoid-capable logic branch can be provided? Is it even possible to rewrite that code under Parsoid? (I sure hope so.)

I'll need to understand what you are trying to do there before we can think about what the equivalent code for Parsoid would be. I'm a little surprised that you are having to inspect strip markers, etc. But, if you can describe what you are trying to do (or point me to documentation related to that), we can think through this.

(As an aside, it's difficult to debug this, because TemplateSandbox does not use Parsoid even when the relevant user preference is set.)

Does it help if you used the "?useparsoid=1" in the query param?

This doesn't help, because query parameters are lost when you submit the edit form. Even adding <input type=hidden name=useparsoid value=1> to the edit form using DevTools doesn't have any effect.

I'll need to understand what you are trying to do there before we can think about what the equivalent code for Parsoid would be. I'm a little surprised that you are having to inspect strip markers, etc. But, if you can describe what you are trying to do (or point me to documentation related to that), we can think through this.

I think @Theknightwho wrote that code and would be able to offer more insights.

I wrote this code. I give a detailed explanation of the issue below, but the tl;dr is that get_current_L2 relies on a nasty kludge to get the current section number (the number used to determine the section when you edit a section), and this kludge seems to be incompatible with Parsoid.

The solution to this is to add a get_current_section function to Scribunto, which would avoid the need for the kludge.

get_current_L2 (found in Module:pages) calls get_current_section, which works as follows:

  1. Preprocess a new heading with something like frame:preprocess("=foo=").
  2. This generates a heading strip marker in the format \127'"`UNIQ--h-N--QINU`"'\127, where N is the next section number (i.e. the current section is N - 1).
  3. Unfortunately, this increments the heading count (i.e. checking the count also increments the count), so the next time the section count is checked, the real section will be N - 2, even if it's contained in another invoke.
  4. The way around this is to take advantage of nowiki strip markers, because you can call frame:extensionTag("nowiki", offset), and this will store the offset inside a nowiki strip marker. This offset can then be incremented each time, with the new offset being stored in a new nowiki strip marker every time the function is called during the current page parse.
  5. The current offset can be retrieved in the next invoke by unstripping nowiki strip markers one by one, counting backwards until you find the most recent offset. Offsets are stored with a special prefix, which avoids the chance of false-positives, so all you need to do is call frame:extensionTag("nowiki", "") at the start of the function, which will increment the nowiki counter once again, which gives you the value to start counting backwards from, and then you check for the offset by looking for the most recent nowiki strip marker that begins with the correct prefix once unstripped.

This is a (very inefficient) way to pass information between invokes, but it shouldn't be necessary to rely on this roundabout kludge for this particular issue in the first place. In my view, the section number should be freely available via an in-built function.

Change #1112292 abandoned by Subramanya Sastry:

[integration/visualdiff@master] Use CSS to style selflink-fragments in Parsoid HTML to match legacy

Reason:

This is not right

https://gerrit.wikimedia.org/r/1112292

Ya, that hack is definitely incompatible with Parsoid for a number of reasons. For one, with Parsoid's template processing, you will not get strip markers -- Parsoid's template processing model is equivalent to calling Special:ExpandTemplates. In the not too far future, we will roll out selective processing for high-performance updates on edits - T363421 - which means, only a small part of the page will get reprocessed. And, possibly non-sequential processing of templates, as well. We'll have to ponder this and see how to provide that via Scribunto that works with both legacy parser and Parsoid.