
Investigation: Re-use ref tags from templates
Closed, Resolved (Public)

Description

Goal: understand why VE is not able to recognize and process references produced by templates
Output: an explanation of the problem and feasibility assessment of solving it.

Problem statements

I am an editor using visual editor.
I'm trying to identify and work with existing references in an article.
But the numbers in the body of the article don't match the numbers in the <references> section at the bottom of the article.
Because VE apparently gets confused when some <ref> tags are defined in a template rather than in the article itself.

I am an editor using visual editor.
I'm trying to copy and paste an existing citation to reuse it for the first time in the article.
But the reference instance is given an automatic number by VE that already exists on the article page, and VE doesn't warn me before saving.
Because a reference named ":0" already exists (maybe inside of a template transclusion?) and VE assigns the same name ":0" to the new reference. Also, VE doesn't preview the error message, even if one is shown after the page is saved.
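
The collision described above can be sketched in a few lines. This is a minimal illustration, not VE's actual naming code: the function `next_auto_name` and the "lowest unused number" policy are assumptions made for the example. It only shows that an automatic name is safe when the set of known names includes those emitted by templates.

```python
import re

# Automatic names are assumed to look like ":0", ":1", ... as in the report.
AUTO_NAME = re.compile(r"^:(\d+)$")


def next_auto_name(existing_names):
    """Return the lowest ':N' name that is not in existing_names (hypothetical policy)."""
    used = set()
    for name in existing_names:
        match = AUTO_NAME.match(name)
        if match:
            used.add(int(match.group(1)))
    n = 0
    while n in used:
        n += 1
    return f":{n}"


# Names the editor can see: refs written directly in the article body.
visible_names = {"smith2020"}
# Names produced by a template transclusion, currently invisible to the editor.
template_names = {":0"}

# Without the template names, the new ref collides with the existing ":0".
colliding = next_auto_name(visible_names)
# With the template names included, the collision is avoided.
safe = next_auto_name(visible_names | template_names)
```

Here `colliding` is ":0" and `safe` is ":1", which is why cataloguing template refs, even as read-only entries, would be enough to avoid the duplicate name.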

Notes

  • References added within a template (like an infobox) show up in the list now (see resolved ticket T53289), but they are not editable (T52896).
  • This is likely a bigger problem: there are other cards which seem to be related to the invisibility of refs inside a template transclusion.

Outcomes

Two very different issues currently prevent this from working.

  • In the case that the template emits a ref tag as its top-level output node, the data-mw for the template conflicts with the ref's structured data, and the template data wins. This is discussed in T214241 and some ideas were suggested, but it's currently an open problem. The changes will need to take place in Parsoid, to generally support the case where multiple wikitext constructs overlap to create a single DOM element. Once that's solved, we can probably teach VE to understand this additional data.
  • If the template output has a deeper hierarchy and the ref tag appears with at least one level of parent node above it, then the ref attributes are discoverable in Parsoid output. VE is currently able to render these Cite elements correctly but they're ignored by the editor. This is probably intentional and it seems feasible to change VE to recognize the refs as read-only nodes which are still catalogued in the full list of refs, enabling reuse, avoiding name collisions, and other use cases. I assume that any general solution to the first problem (ref and template data-mw colliding) will also make the ref accessible to VE in a similar way as more deeply-nested refs currently are.
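
The two cases above can be told apart mechanically. The following is a rough sketch under stated assumptions: `SAMPLE` is a hand-written, heavily simplified imitation of Parsoid output (not copied from a real page), and the class `RefClassifier` and the labels "clobbered" and "readable" are invented for the example. It does not distinguish refs nested in a template from refs written directly in the article body.

```python
import json
from html.parser import HTMLParser

# Hand-written, simplified sample (an assumption, not real Parsoid output).
# First element: a template whose top-level output node is the ref itself.
# Second element: a template that wraps the ref in a <div>.
SAMPLE = """
<sup about="#mwt1" typeof="mw:Transclusion mw:Extension/ref"
     data-mw='{"parts":[{"template":{"target":{"wt":"RefA"},"params":{},"i":0}}]}'></sup>
<div about="#mwt2" typeof="mw:Transclusion"
     data-mw='{"parts":[{"template":{"target":{"wt":"RefB"},"params":{},"i":0}}]}'>
  <sup about="#mwt3" typeof="mw:Extension/ref"
       data-mw='{"name":"ref","attrs":{"name":"book"}}'></sup>
</div>
"""


class RefClassifier(HTMLParser):
    """Collect (label, ref name) pairs for every element typed as a ref."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        types = (attr.get("typeof") or "").split()
        if "mw:Extension/ref" not in types:
            return
        data_mw = json.loads(attr.get("data-mw") or "{}")
        if "mw:Transclusion" in types:
            # Template data-mw occupies the attribute; ref attrs are not there.
            self.results.append(("clobbered", None))
        else:
            # The ref's own data-mw survived, so its name can be read.
            name = data_mw.get("attrs", {}).get("name")
            self.results.append(("readable", name))


def classify_refs(html):
    parser = RefClassifier()
    parser.feed(html)
    return parser.results


results = classify_refs(SAMPLE)
```

With this sample, `results` is `[("clobbered", None), ("readable", "book")]`: the first case needs a Parsoid change before any name is available, while the second only needs VE to catalogue what is already in the document.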

Event Timeline

Lena_WMDE updated the task description.
awight renamed this task from Investigation: Ref tags and template refs in VE to Investigation: Ref tags inside of templates in VE. Apr 18 2023, 10:22 AM
awight updated the task description.

I started digging into this but wasn't able to fully crack it.

This explains a series of issues:

  • Why references from a template can't be reused via the citation dialog.
  • Why the numbering is sometimes weird.
  • Why the rendering of the reference list at the end of the article is sometimes different between read and edit mode.
  • Why some references say "This reference is defined in a template or other generated block, and for now can only be edited in source mode."
  • Why it's possible to accidentally create conflicting names without any warning in VisualEditor, e.g. name=":0" when name=":0" already appears in a template.

At the moment my gut feeling is: It doesn't make much sense for us to dig into this. We would need weeks if not months to learn what's needed to make the necessary, probably fundamental changes to VE's internals. This is probably better done by the Editing-team.

Not much to add here. I came to similar conclusions. (I was not even able to see a placeholder for my transcluded templates in the data.)

Change 917903 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Fix empty previews in reference reuse dialog

https://gerrit.wikimedia.org/r/917903

All of the findings make sense to me. I just wanted to add that the Parsoid output seems to include all of the information we might need, so the necessary adjustments would be to VisualEditor so that it parses all of the available data. For example, in https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User:Adamw%2Fsandbox%2FCite/585110 there is a ref tag produced by a template and, as you can see, its name {{{name}}} and body {{{body}}} are present in the HTML.

lilients_WMDE renamed this task from Investigation: Ref tags inside of templates in VE to Investigation: Re-use ref tags inside of templates or image captions in VE. May 10 2023, 1:57 PM
lilients_WMDE updated the task description.
lilients_WMDE renamed this task from Investigation: Re-use ref tags inside of templates or image captions in VE to Investigation: Re-use ref tags from templates or image captions in VE. May 10 2023, 2:01 PM
WMDE-Fisch renamed this task from Investigation: Re-use ref tags from templates or image captions in VE to Investigation: Re-use ref tags from templates. May 11 2023, 8:31 AM
WMDE-Fisch updated the task description.

Another finding: VE is able to recognize the ref inside of a transclusion if we add mw:Transclusion to the list of ve.dm.MWReferenceNode.static.allowedRdfaTypes, and also disable ve.dm.MWTransclusionNode self-registration. There's a conflict between the two and it seems that the transclusion node type wins because it gets registered earlier than types defined by extensions. The MWTransclusionNode class even includes a hack to define a trivial matchFunction specifically to force a higher precedence for this type over other node types coming from extensions, so the whole thing feels very deeply baked-in. Defining a trivial ve.dm.MWReferenceNode.static.matchFunction as a counter-hack raises the precedence so that the ref is recognized, but this breaks the name parsing and causes template content to not appear.

My earlier comment was too optimistic about the "name" attribute appearing in output. It's present as part of the DOM element ID, but doesn't appear in structured "data-mw". At my current level of understanding, it's looking extremely challenging to get ref metadata from transclusion output.
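
To illustrate why the element ID is a poor substitute for structured data, here is a small sketch that tries to recover a name from it. The ID pattern `cite_ref-NAME_NUMBER-USE` is an assumption based on how Cite anchors usually look, and `name_from_ref_id` is a hypothetical helper. The recovered value is also the normalized anchor form of the name, so it may not match the original wikitext exactly.

```python
import re

# Assumed ID shape for a named ref: cite_ref-<name>_<number>-<use index>.
# Unnamed refs are assumed to look like cite_ref-<number> and do not match.
NAMED_REF_ID = re.compile(r"^cite_ref-(?P<name>.+)_(?P<number>\d+)-(?P<use>\d+)$")


def name_from_ref_id(element_id):
    """Return the (normalized) ref name embedded in an element ID, or None."""
    match = NAMED_REF_ID.match(element_id)
    return match.group("name") if match else None


named = name_from_ref_id("cite_ref-book_1-0")
with_underscore = name_from_ref_id("cite_ref-my_book_3-1")
unnamed = name_from_ref_id("cite_ref-2")
```

Here `named` is "book", `with_underscore` is "my_book", and `unnamed` is `None`. Because the ID cannot distinguish an underscore in the original name from a normalized space, this kind of scraping is at best a stopgap compared to having the name in "data-mw".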

Now that I see the shortcomings of the incoming Parsoid document, I'm realizing that the required change is actually to have Parsoid preserve some of the RDFa structures inside of template transclusions, and then to have VE recognize the elements while preventing direct editing of them. I've asked the parser team about this in Slack.

This exact discussion appears in T214241: data-mw info is clobbered by template annotations. The issue is slightly different than I understood: the ref wikitext parameters are only destroyed if the template emits a ref at the top level. If the ref is nested in a <div> in the template, for example, then the ref data is present: https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User:Adamw%2Fsandbox%2FRefB

There are still support gaps in this case, and the named ref isn't currently visible to VE, but we can assume feasibility since the data is available.

The case of a ref appearing at the template's top level currently has no solution, although a proof of concept patch exists: https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/607632/ .

awight removed awight as the assignee of this task. Jan 8 2024, 3:45 PM
awight updated the task description.
awight moved this task from Doing to Tech Review on the WMDE-TechWish-Sprint-2024-01-04 board.

Tagging @MSantos, we think this set of related use cases (VE support for named refs produced by templates) might make it worthwhile to look at T214241 again.

WMDE-Fisch claimed this task.

Investigation is done for now. There will be follow-up tickets for possible implementation ideas.