
Named refs are considered different when the content is whitespace for refs 2 to n
Closed, Resolved (Public)

Description

It seems that when you have <ref name="Test">...</ref> and {{#tag:ref| |...}}, the second one is treated as having a value (the whitespace character?), so the page shows an error. If you remove the whitespace, the error disappears. Apparently this is a new problem; it worked before.

Error: https://fr.wikipedia.org/w/index.php?title=Embl%C3%A8mes_moraux&oldid=119970458 (see § Notes)
Fix by using <ref>: https://fr.wikipedia.org/w/index.php?title=Embl%C3%A8mes_moraux&diff=next&oldid=119970458

Error: https://fr.wikipedia.org/w/index.php?title=Liste_de_chansons_interpr%C3%A9t%C3%A9es_par_Amiati&oldid=111318825 (see § Notes)
Fix by removing the whitespace character: https://fr.wikipedia.org/w/index.php?title=Liste_de_chansons_interpr%C3%A9t%C3%A9es_par_Amiati&action=historysubmit&type=revision&diff=119975642&oldid=111318825

Event Timeline

NicoV set the priority of this task to Needs Triage.
NicoV updated the task description.
NicoV added projects: Cite, Regression.
NicoV subscribed.

Fix should be as simple as adding a trim() function at https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCite/HEAD/Cite_body.php#L461.

I also suspect the error message shouldn't be repeated umpteen times, as it is in NicoV's first example...

This is a corner-case, but I think that's the expected behavior. (And yes, the error message for duplicate references with different contents is new.) You should be able to also use {{#tag:ref||...}} (with no space between the two |) to get the same behavior as <ref … />.

@matmarex
Yes, removing the space works, that's my fix for the second example I provided ;-)
So it wasn't really correct before; it's just that the error is now visible, whereas it went completely unnoticed before?

I'm not sure, but I think Cite should trim() the value each time rather than keep the entire text as is.
But that may have other effects:

  • for example, <ref></ref> is considered an error while <ref> </ref> is not; with trimming, the latter would become an error too (which also seems like a good thing to me)

Yes, the error report on refs with the same name and different content is new behavior that we recently added to the Cite extension. There were many cases where users mistakenly gave two different refs the same name, and it failed quietly, using the content of only one of the refs.
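
As a rough illustration of the check described above, here is a minimal Python sketch. It is a simplified model and not Cite's actual code: the function name `find_conflicts` and the (name, content) pair representation are assumptions made for this example. It also shows why a single space triggers the error while an empty value does not.

```python
def find_conflicts(refs):
    """Return the names of refs whose content differs from the first definition.

    `refs` is a list of (name, content) pairs in page order (hypothetical
    representation). An empty string models pure reuse, like <ref name="x" />.
    """
    first_seen = {}
    conflicts = []
    for name, content in refs:
        if content == "":
            continue  # pure reuse: nothing to compare
        if name not in first_seen:
            first_seen[name] = content  # first definition wins
        elif first_seen[name] != content:
            conflicts.append(name)  # same name, different content
    return conflicts


# {{#tag:ref| |name=Test}} passes " " as content, so it counts as a definition.
print(find_conflicts([("Test", "Some source"), ("Test", " ")]))
# {{#tag:ref||name=Test}} passes "", so it counts as a reuse.
print(find_conflicts([("Test", "Some source"), ("Test", "")]))
```

In this model the first call reports a conflict for "Test" and the second reports none, which matches the behavior reported in this task.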

If we trim the content, we should test it carefully, as there are many cases where the parser behaves (or may behave) differently with spaces, e.g. "\n*" vs "\n_*" or "\n{|" vs "\n_{|" and so on.

OK!
I think content consisting only of whitespace characters could at least be treated as empty (a ref containing only whitespace is useless).
This would avoid side effects with the parser and would have two advantages:

  • no error is raised in the situation I reported above
  • an error is raised for <ref> </ref>, as already happens for <ref></ref>
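
The proposal above could be sketched like this in Python. This is only an illustration under stated assumptions: `normalize_ref_content` and `classify_ref` are hypothetical names, and the three outcomes are a simplification of what Cite does. The point is that only whitespace-only content is mapped to empty; any other content is passed through untouched, so leading newlines and spaces that matter to the parser are preserved.

```python
def normalize_ref_content(content):
    """Map whitespace-only content to ""; leave all other content unchanged."""
    return "" if content.strip() == "" else content


def classify_ref(name, content):
    """Classify a ref as a definition, a reuse, or an empty-ref error.

    `name` is None for an unnamed ref (hypothetical convention for this sketch).
    """
    content = normalize_ref_content(content)
    if content == "":
        # A named ref without content reuses an earlier definition;
        # an unnamed ref without content is useless and reported as an error.
        return "reuse" if name else "error: empty ref"
    return "define"


print(classify_ref("Test", " "))       # whitespace-only, named
print(classify_ref(None, " "))         # whitespace-only, unnamed
print(classify_ref("a", "\n* item"))   # real content, left untouched
```

Note that Python's `str.strip()` also removes Unicode whitespace, so the exact set of characters treated as "whitespace" here is an assumption of the sketch.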

For information, I've added the detection of references with the same name and different content to WPCleaner (#527), with automatic trimming when the "different content" is only due to trailing whitespace characters.

thiemowmde claimed this task.
thiemowmde subscribed.

I'm not 100% sure I understand the issue exactly. It seems the test case provided in this task's description is incomplete. This is the test case I used:

<ref name="a">a</ref>
{{#tag:ref| |name=a}}

This behavior changed with https://gerrit.wikimedia.org/r/552073, done as part of T237241, and later evaluated via T240548: References with no visible content are reported as empty now. As a result of this, the example articles provided in the task description don't show an error any more.