
References that are identical in everything but the name are not merged
Open, Needs Triage (Public)

Description

This bug arises when adding two references to an article that are identical in everything but their name, such as:

<ref name="Name1">Foobar</ref><ref name="Name2">Foobar</ref>

In this circumstance, the references ought to display as one to the reader, since they display exactly the same. However, the software does not merge them, as it does when everything including the name is the same.

This can be resolved by changing the reference names to match, but it should still be fixed, as editors might neglect to do so. It also presents a currently insurmountable issue for uses of the Wikidata module as I explain here, where the reference name is generated and it often would not be advisable to change other instances of the reference supporting separate facts to the generated name.

Event Timeline

I found an example of this bug that I described here. It's a case where the duplicated citations are coming from three different locations: two transcluded templates and the article itself.
References coming from templates that are identical to those in the article could be a significant source of redundant citations, since they don't appear when editing the article text, but only in the rendered page.

Novem_Linguae subscribed.

This ticket is more complicated than it looks. For example:

  • If one of these merged citations were to be edited or deleted in Visual Editor, what would happen to the wikicode?
  • If parsed HTML were to be passed to the Parsoid API and the API asked to convert the HTML to wikitext, how would it know what wikitext to generate?

Perhaps it is better to fix these manually using duplicate citation detection and fixing tools, rather than have the software try to detect these and change the parser output.

Honestly I think this ticket might need to be declined, since it would break the wikicode -> HTML -> wikicode loop by making HTML untranslatable back to wikicode. This seems to go against the design of Parsoid.

(I found this ticket via the reward board)

Also if references are created with Citoid, they could be identical but for their access_date, so any duplication logic would need to know which attributes to consider when comparing references.
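A minimal sketch of the comparison concern raised above, assuming citations are represented as flat dictionaries of string fields. The function names, the dictionary shape, and the choice of `access_date` as the only ignored field are all hypothetical and do not reflect how Citoid or the Cite extension actually store references.

```python
# Fields assumed (for illustration only) not to affect whether two
# citations point at the same source.
IGNORED_FIELDS = frozenset({"access_date"})


def comparison_key(citation, ignored=IGNORED_FIELDS):
    """Build an order-independent key from the fields that matter for comparison."""
    return tuple(
        sorted(
            (field, value.strip())
            for field, value in citation.items()
            if field not in ignored
        )
    )


def same_source(a, b, ignored=IGNORED_FIELDS):
    """Return True if two citations match on every field that is not ignored."""
    return comparison_key(a, ignored) == comparison_key(b, ignored)


first = {"url": "https://example.org/a", "title": "Foobar", "access_date": "2023-06-01"}
second = {"url": "https://example.org/a", "title": "Foobar", "access_date": "2023-06-15"}

print(same_source(first, second))                      # access_date ignored
print(same_source(first, second, ignored=frozenset())) # strict comparison
```

Which fields belong in the ignored set is a policy question rather than a technical one, and the sketch leaves that choice to the caller.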

@Novem_Linguae, thanks for taking a look at this. For your first question, I care a lot more about what happens in the published version for the reader than what happens in VisualEditor. If they display as two different independent citations in VE, and then only get merged in the display output once the edit is published, that's fine.

For your second question, I'm afraid I'm not really sure what you're asking, since you're at a deeper technical level than I get. I'm just describing the desired functionality. The software already knows how to merge references that are identical in everything, including the ref name. So I'd think the easiest implementation would be to tweak whatever code does that, telling it to ignore the reference name when it compares two references to determine whether they are the same. Alternatively, it could pretend in the output that the second ref is named the same as the first, as this affects nothing for the reader except the URL when they click on a footnote number.
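The requested behavior can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Cite extension's actual code: the regular expression and the `number_references` function are hypothetical, and the pattern handles only simple named refs of the form shown in the description (no self-closing reuses, unnamed refs, or nested templates).

```python
import re

# Hypothetical, simplified pattern: matches <ref name="...">content</ref> only.
REF_RE = re.compile(r'<ref name="([^"]*)">(.*?)</ref>', re.DOTALL)


def number_references(wikitext):
    """Assign footnote numbers, keying on ref content and ignoring the ref name.

    Returns (markers, footnotes): the footnote number shown at each ref,
    in order of appearance, and the deduplicated list of footnote texts.
    """
    number_by_content = {}
    markers = []
    footnotes = []
    for _name, content in REF_RE.findall(wikitext):
        key = content.strip()
        if key not in number_by_content:
            footnotes.append(key)
            number_by_content[key] = len(footnotes)
        markers.append(number_by_content[key])
    return markers, footnotes


example = '<ref name="Name1">Foobar</ref><ref name="Name2">Foobar</ref>'
markers, footnotes = number_references(example)
print(markers, footnotes)
```

For the example from the description, both refs receive footnote number 1 and the reference list contains a single "Foobar" entry. The sketch says nothing about how such merged output would be converted back to wikitext, which is the concern raised earlier in the thread.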

@Esanders, if the access date is different, then the two references are not the same in everything but name, and this ticket wouldn't seem to apply. Whether we'd want to merge those would be a question to take up in a separate ticket.

In the past, at enwiki, there have been a couple of editors who objected to this kind of merging. Some of the discussions will be in the archives of [[w:en:WT:CITE]], but if memory serves, it irritated them to see the reference numbers "out of order". That is, they disliked Sentence.[1] Sentence.[2] Sentence.[1] and preferred Sentence.[1] Sentence.[2] Sentence.[3], even though [1] and [3] were the same source.

@Whatamidoing-WMF, I'd be interested to see those discussions if you can find them. But that doesn't seem like a very coherent objection. We already have the ability to reuse references normally, and that's the standard best practice when wanting to use the same reference twice, creating the same 1-2-1 display. Anyone who dislikes that configuration is out of step with consensus (we could run a poll if anyone doubts this and wants it confirmed) and violating DRY. All this ticket is doing is seeking to patch a hole in what is already established software behavior.

@Novem_Linguae, just to quickly follow up, did my message above speak to your queries?

In regards to WAID's comment above, I agree with Sdkb that the wish of a minority of editors to purposely duplicate citations so they can manually control citation numbering isn't a great objection. Seems like those folks are breaking best practices.

In regards to my comment in June 2023, I have since discovered that the wikicode -> HTML -> wikicode loop is not 1:1 and has exceptions, so my original objection isn't great either. I'm neutral on this one now, and will defer to the experts (such as Ed from the Editing Team, who commented above).