Page MenuHomePhabricator

Duplicate reference name errors in English Wikipedia caused by MediaWiki templatestyle handling?
Closed, ResolvedPublic

Description

On the English-language Wiipedia site, I've been working through the category "Pages with duplicate reference names" to clean up duplicated reference errors. These errors are caused by two (or more) <ref name="tags with the same name"> that contain different content.

Over the weekend of September 28, 2018, I noticed a large number (about 500) of new entries in the category. Sometimes, this happens because templates that generate named references are edited and start producing duplicated (but different) references. But this time I can't find any such changes that might be to blame.

After asking after the issue, another user suggested this issue is caused by Wikimedia's handling of templatestyles itself, and asked that I report that problem here.

Template:Inflation/fn is one template that seems to be having trouble, and it's talk page has notes about what I've observed. An example effected article is 2010 FIFA World Cup, where subsequent invocations of the template used to work fine (before the weekend) and now generate duplicate reference name errors.

Template:Certification Table Entry is another template with problems, and it's talk page also has notes. The problem with this template seems specific to the use of the Finland region with the "region=Finland" parameter. These issues, too, appeared over the weekend and I can't seem to find any changes to the templates involved in the problem. Affected topics include Call Me Maybe and Jaged Little Pill.

I'm an experienced software developer, but I haven't before reported a bug in Wikimedia's Phabricator; I've got the basics, but I don't know the local customs. If anything is missing or needed, please do let me know and I can do my best to provide it. I'm raising this issue because of the large number of articles affected due to something that changed which seems to interrupt a common-place practice, but doesn't seem to have a cause or remedy available to end-users.

Event Timeline

Mikeblas created this task.Sep 30 2018, 4:01 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 30 2018, 4:01 PM

This is enough to show the error, without involving any templates or LUA code:

{{#tag:ref|<cite>The citation goes here</cite><templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>|name=X}}
{{#tag:ref|<cite>The citation goes here</cite><templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>|name=X}}

References

<references/>

Tgr added a subscriber: Tgr.Sep 30 2018, 10:29 PM

Huh, nice job tracking that down.

I guess TemplateStylesHooks::handleTag could have its own internal "seen" cache and assign a unique marker index to every style, instead of incrementing mMarkerIndex for every <templatestyles> tag encountered?

The number of articles listed in Category:Pages with duplicate reference names continues to grow, I guess as the safe copies of the cache expired.

Anomie added a subscriber: Anomie.

IMO it would probably be better to fix it in Cite, by having it unstrip markers when testing for exact duplication.

Near this line, make the code look like

$stripState = $parser->mStripState;
if (
    isset( $this->mRefs[$group][$key]['text'] ) &&
    $str !== $this->mRefs[$group][$key]['text'] &&
    $stripState->unstripBoth( $str ) !== $stripState->unstripBoth( $this->mRefs[$group][$key]['text'] )
) {
Izno added a subscriber: Izno.Oct 1 2018, 3:46 PM

IMO it would probably be better to fix it in Cite, by having it unstrip markers when testing for exact duplication.
Near this line, make the code look like

$stripState = $parser->mStripState;
if (
    isset( $this->mRefs[$group][$key]['text'] ) &&
    $str !== $this->mRefs[$group][$key]['text'] &&
    $stripState->unstripBoth( $str ) !== $stripState->unstripBoth( $this->mRefs[$group][$key]['text'] )
) {

I would tend to agree the more-appropriate fix is in Cite than in TemplateStyles.

Izno moved this task from Unsorted backlog to Defect backlog on the Cite board.Oct 1 2018, 3:46 PM

Can we please revert the offending change?

Can we please revert the offending change?

https://en.wikipedia.org/wiki/Help_talk:Citation_Style_1 would be the place to make this request.

Change 467109 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/Cite@master] Unstrip <ref> contents before comparing

https://gerrit.wikimedia.org/r/467109

If there was no change to the MediaWiki software that caused a problem then there is no need for this change.

Izno added a comment.Oct 14 2018, 7:45 PM

If there was no change to the MediaWiki software that caused a problem then there is no need for this change.

Huh? No, that's not how that works. Sometimes we find not-great behavior in the software that is illuminated by the intended use of some other function.

Not-great behaviour is not bad behaviour, and the risk involved in changes like this is high. Such changes need to be very carefully considered and tested. The risk to reward ratio is too high.

If there was no change to the MediaWiki software that caused a problem then there is no need for this change.

The change to the MW software was the introduction of Template Styles. There appears to be a bug in its implementation that was brought to light when en.WP started using this new feature in one of its templates. We are asking for that bug to be fixed.

Over the last 12 hours (or so) there has been a significant drop in the number of topics in Category:Pages with duplicate reference names. I don't see here any notes that indicate an intentional fix was made. But I also note that all I know about the mechanisms involved came from researching this issue.

I note that the fix doesn't seem to be particularly complete. For example, Template:Wikidata revenue still seems to be adversely effected and generates duplicate reference name errors for multiple invocations within the same article when those invications were previously safe.

It's certainly true that complicated, flexible software sometimes has unintentional interactions beteen apparently disassociated features. However, it doesn't seem very responsible to make a breaking change and then leave it there without initiating a request to have the newly-identified problem addressed. Or, without offering to roll back the change that revealed the problem. It seems that this issue is exposing either a weak process or evidence of a political problem -- where someone wants to get something fixed by torching it first.

Whatever the root cause, it doesn't seem acceptable to make a breaking change and then ignore the results for two weeks.

Over the last 12 hours (or so) there has been a significant drop in the number of topics in Category:Pages with duplicate reference names. I don't see here any notes that indicate an intentional fix was made. But I also note that all I know about the mechanisms involved came from researching this issue.

The drop comes from editors carefully kludging around this Mediawiki bug. The kludges are designed to be removed when this bug is fixed. Nobody has been ignoring anything. The bug was reported, and a patch was uploaded two weeks later. Compared to some of the more serious phab tasks I follow, that is an amazingly quick turnaround.

I'm sure there's a lot that I don't know about the developmnet process used by this project. For example, I don't understand what "a patch was uploaded" means. I guess it doesn't mean that a fix was actually deployed, because if the fix was live, the kludges wouldn't be necessary.

When I say the results were ignored, I'm refering to the negative aspects of the original breaking change. The negative effects were evident before this issue was opened. If A is dependent on B, and A is changed to reveal a bug in B, it's my opinion that both A and B have a bug because they're incompatible. Sure, B should be fixed, but the change to A should be reverted until the fix in B can be implemented and both the initial change to A and the accomodating change in B are released at the same time. That might seem slow, but it results in better stability -- where kludges aren't required to patch over a regression.

Here, the negative effects caused by a change in A seem to have been ignored until this issue drew attention to them. Or, maybe they weren't known until this issue was opened. Because of inadequate testing, perhaps?

With the kludges in place, how will the effects of the patch be tested? How can site disruption be avoided (or at least, minimzed) in the future?

Please feel free to ask general "how does/should development work" questions in a forum or mailing list, as they are not directly related to the very topic of this task. [offtopic]: In short, patch upload happens in Gerrit. For deployment of code on production servers, see Deployments etc. For testing on servers before hitting production, see Beta cluster.

Change 467109 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] Unstrip <ref> contents before comparing

https://gerrit.wikimedia.org/r/467109

Can we please revert the offending change?

The "offending change" was people on-wiki starting to use <templatestyles> in templates that are used inside <ref> tags (or, more specifically, ref tags created with {{#tag:ref}}).

If there was no change to the MediaWiki software that caused a problem then there is no need for this change.

This is incorrect. It should be possible to use <templatestyles> in templates that are used inside <ref> tags.

The change to the MW software was the introduction of Template Styles. There appears to be a bug in its implementation that was brought to light when en.WP started using this new feature in one of its templates. We are asking for that bug to be fixed.

There's actually no bug in TemplateStyles here, it's a bug in Cite's duplicate handling with respect to strip markers that hide the same text. It could be reproduced without TemplateStyles by something like

{{#tag:ref|<nowiki>abc</nowiki>|name=baz}}
{{#tag:ref|<nowiki>abc</nowiki>|name=baz}}

<references/>

Over the last 12 hours (or so) there has been a significant drop in the number of topics in Category:Pages with duplicate reference names. I don't see here any notes that indicate an intentional fix was made. But I also note that all I know about the mechanisms involved came from researching this issue.

It was most likely a change on the wiki. Perhaps Special:Diff/864009635.

However, it doesn't seem very responsible to make a breaking change and then leave it there without initiating a request to have the newly-identified problem addressed.

This task is that request.

Or, without offering to roll back the change that revealed the problem.

As noted above, there's no change in MediaWiki to roll back. It's the use on-wiki of TemplateStyles within Cite references that is causing this.

I'm sure there's a lot that I don't know about the developmnet process used by this project. For example, I don't understand what "a patch was uploaded" means. I guess it doesn't mean that a fix was actually deployed, because if the fix was live, the kludges wouldn't be necessary.

"A patch was uploaded" means that a developer created a software patch which is intended to fix the problem in the code.

The next step is for another developer to review and test the patch, with the eventual goal being that the patch is "merged" into the code. I just did that, BTW.

After that, the new version of the software has to be deployed. For Wikimedia wikis, this usually happens via the weekly "deployment train", as explained on enwiki's page WP:ITSTHURSDAY. For this task, that deployment should happen this week, October 16–18. If the issue is considered important enough to fix immediately, anyone is welcome to follow the instructions at https://wikitech.wikimedia.org/wiki/SWAT_deploys#How_to_submit_a_patch_for_SWAT to have it deployed sooner.

The Phabricator task is typically marked as "resolved" when the patch is merged, even though it hasn't been deployed to the Wikimedia wikis yet.

it's my opinion that both A and B have a bug because they're incompatible.

You're welcome to your opinion. Others need not agree. In this case, it's not merely "A is incompatible with B" because the problem only arises when someone writes wikitext that uses A and B together in a particular way.

Or, maybe they weren't known until this issue was opened.

That is, indeed, the case.

Because of inadequate testing, perhaps?

You're welcome to work on improving testing within MediaWiki.

With the kludges in place, how will the effects of the patch be tested?

Very easily. Most developers have one or more private installations of MediaWiki that they use for manually testing all sorts of things, and this bug is fairly easy to reproduce.

Anomie closed this task as Resolved.Oct 15 2018, 4:36 PM
Anomie claimed this task.

As noted above, we typically close Phabricator tasks when the fix is applied to the code, even though it hasn't been deployed to the production wikis yet. This fix should be deployed this week, reaching enwiki on Thursday. Anyone who wants it faster is welcome to follow the instructions at https://wikitech.wikimedia.org/wiki/SWAT_deploys#How_to_submit_a_patch_for_SWAT or seek someone to do it for them.

Anomie reassigned this task from Anomie to Tgr.Oct 15 2018, 4:37 PM