What happened:
When an editor copied their VE-updated sandbox back into a live article, a H:CERDK error was displayed in the References section ("named reference defined multiple times"). (permalink)
What should have happened:
All reference names "must be unique". Editors using VE do not control which reference names are created.
Why this happened:
This happened (see caveat below) because only part of the article was copied into the sandbox. While the editor worked in the sandbox with VE, VE assigned a ref name to new content containing a new citation, with no knowledge of the ref names used in the portions of the live article that were not copied. The numeric name VE assigned was therefore unique in the sandbox, but not unique in the live article.
This is partly conjecture, but is based on my best understanding and reconstruction of events, based on reports by the editor, and by examination of article, sandbox, and editor history: the editor worked in a sandbox created empty, then progressively expanded it, interlacing some edits with new content (e.g., here) or new citations (e.g., here) with other edits being copies of source from the live article into the sandbox (e.g., here). I believe the smoking gun is this edit to their sandbox, which ultimately found its way back into the article, causing the collision. (Note: the term 'sandbox' is a convenience term; in fact, it was a user subpage.)
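The suspected mechanism can be illustrated with a minimal sketch. VE's real naming algorithm is not known to me; the rule used here (pick the lowest ":N" not present in the text being edited) is purely an assumption for illustration, as are all function and variable names.

```python
import re

# Matches the number in name=":12" inside an opening <ref ...> tag.
NUMERIC_REF = re.compile(r'''<ref\b[^>]*?\bname\s*=\s*["']?:(\d+)''', re.IGNORECASE)


def numeric_names(wikitext):
    """Return the set of N for every ':N' ref name found in the wikitext."""
    return {int(n) for n in NUMERIC_REF.findall(wikitext)}


def next_numeric_name(visible_wikitext):
    """ASSUMED rule, not VE's confirmed behavior: lowest ':N' unused in the visible text."""
    used = numeric_names(visible_wikitext)
    n = 0
    while n in used:
        n += 1
    return f":{n}"


# Hypothetical article with two sections; only the lead is copied to the sandbox.
article_lead = 'Lead text.<ref name=":0">Smith 2019</ref>'
article_history = 'History.<ref name=":1">Jones 2021</ref>'
sandbox = article_lead

new_name = next_numeric_name(sandbox)
collides = int(new_name[1:]) in numeric_names(article_lead + article_history)
print(new_name, "collides with live article:", collides)
```

Under this assumed rule, the name chosen in the sandbox is unique there but already taken in the uncopied History section, which matches the behavior observed at merge-back.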
Reproduce:
It's difficult to know the exact sequence of events carried out by a student editor, and in any case reproduction involves multiple pieces. One is the "smoking gun" edit (if that reconstruction is correct); but if VE created that reference, that is not by itself a bug, as the numeric reference is unique in the sandbox. And a user "copying sandbox material back to the live article" is not, ipso facto, a problematic procedure; in fact, it is required of editors in the Wiki Ed program. Finally, I don't know VE's algorithm for assigning new numeric names for citations added by a user. Given all that, I can propose the following way to reproduce the problem. It is not definitive without internal knowledge of VE's numeric-id assignment algorithm, so multiple trials may be needed until the problematic behavior occurs:
Possible step sequence to reproduce:
- Find an article having some numeric, VE-style references.
- Create a user sandbox for the article.
- Optional: copy some source content (say, one or two sections to be worked on) from article to sandbox, ensuring that:
  - The copy must not be the entire article.
  - The portion of the article not copied must contain at least one VE numeric named reference (the more the better, to improve the likelihood of a given trial reproducing the bug).
- Using VE, add content in the sandbox, adding at least one citation (adding more than one may improve your chances).
- Observe sandbox for added numeric names: run diffs to see what numeric names were added, and note them.
- Search the live article for citations matching any of those names.
- Did you find any of those ref names in live? NO: go back to step 4 (adding content with VE). YES: continue.
- Match them up, live vs. sandbox. Are they for the same source (author, date, pub, etc.) in every case? YES: go back to step 4. NO: continue.
- Merge back: copy the updated section content over the original content in the article (after ensuring there were no intervening updates by other editors to the working sections; see below).
- Click Preview mode, and scroll down to the References section.
- Note the red message "Cite error: The named reference ":123" was defined multiple times with different content (see the help page)."
This reproduction sequence contains a loop because of the uncertainty about what numeric name VE will assign, and whether that will or won't create a collision.
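The "search and match up" steps in the sequence above could be scripted rather than done by eye. The following is a minimal sketch, not an existing tool: it assumes double-quoted ref names with no other attributes that contain a slash, and it compares ref bodies as whitespace-normalized text. The function names and sample wikitext are hypothetical.

```python
import re

# A full definition: <ref name="X">body</ref>. Self-closed reuses (<ref name="X"/>) are skipped.
REF_DEF = re.compile(
    r'<ref\b[^>/]*?\bname\s*=\s*"([^"]+)"[^>/]*>(.*?)</ref>',
    re.IGNORECASE | re.DOTALL,
)


def definitions(wikitext):
    """Map each ref name to the set of distinct (whitespace-normalized) bodies defined for it."""
    defs = {}
    for name, body in REF_DEF.findall(wikitext):
        defs.setdefault(name, set()).add(" ".join(body.split()))
    return defs


def find_collisions(live_wikitext, sandbox_wikitext):
    """Names defined in both texts whose definitions differ, i.e. likely merge-back collisions."""
    live = definitions(live_wikitext)
    box = definitions(sandbox_wikitext)
    return sorted(name for name in live.keys() & box.keys() if live[name] != box[name])


# Hypothetical sample: the sandbox reused ":1" for a different source.
live = 'Lead.<ref name=":0">Smith 2019</ref> History.<ref name=":1">Jones 2021</ref>'
sandbox = 'Lead.<ref name=":0">Smith 2019</ref> New.<ref name=":1">Lee 2023</ref>'
print(find_collisions(live, sandbox))
```

A name that appears in both texts with identical content (here ":0") is just a copied citation and is not reported; only names with differing content are.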
Note: in the case of two editors working on the same article with VE, even on mutually exclusive sections (as I believe sometimes happens among student editors sharing work on an article), the problem becomes more complex, and is not part of this bug description. At a minimum, collisions become more likely, and more tedious to find and resolve.
As a practical matter: analysis of this type of collision is tedious and time-consuming, and requires a fairly experienced editor to perform. Repairing the collision can be even worse, and if not caught early on, or if involving multiple citations (or god forbid, multiples by two editors) may be so tedious as to be effectively impossible. (See this UTP discussion for an example. In this case, the collision was reported rapidly, involved no intervening edits by any other editor (save two minor bot edits), and involved only a single collision of one numeric reference, i.e., the simplest possible case of this. Still, analysis and especially repair were tedious.)
A sidebar about the assignment to project VE, and setting it as task='bug': I'm not unaware that from a narrow point of view, this might not be considered a 'bug' (especially by VE software developers), as strictly speaking, there is no way for VE to know what's going on in a separate file that is later merged back; or if it is a bug, maybe it's not a VE bug. I understand and sympathize (I've been a developer, and have been on the other side of this) but I try to take a user-first view, and from the point of view of a user editing a valid article containing valid citations using long-established procedures such as "developing in your sandbox" and "merging back", using an approved editor, for it to all go kablooey and generate an error message on the page is certainly not the editor's fault. That, to me, is the definition of a bug: something is not working right someplace, and the editor is not responsible for it. It doesn't matter much to the user who owns the task, or whether it is called a "bug" or something else, or what project board it ends up on. Maybe the locus of the problem lies somewhere in the interstices of insufficiently robust software, loose procedures, missing copy tools, vague documentation, or something as yet not clearly identified, but whatever it is, this is not the editing experience we wish our users to have. On that basis, I've raised it as a "bug", because that's how it looks to a user, and I've attached it to "VE" even if it is blameless in some sense, because that seems to be the "proximate cause" here; I have no objection if someone reassigns the task type and/or the project appropriately.
Possible mitigating factors, workarounds, or solutions:
If an article contains no VE numeric ref names, then regardless of what editor a user employs, there is very low risk of collision at merge-back time, if that editor is the only one editing the article. Even if the article contains no numeric names, if two editors are editing it with VE and each is using their own sandbox for updates, even in strictly separate sections, there is a risk.
These conditions are probably too complex to expect student editors to be able to handle. So, to be safe under current circumstances, only one student editor should edit an article using a sandbox at one time, until the sandbox is merged back. (Switching to wikitext editor is a workaround; then two student editors may edit, with low risk of collision.)
The collision possibility arises from editing only a partial copy of an article in a sandbox, where VE doesn't know the full set of refnames already in use in the article. A workaround is to not make partial copies; that is, instruct VE users who wish to edit in a sandbox using VE to copy the entire article into the sandbox, even if they only want to work on one portion of it. This is safe with respect to VE and will avoid collisions, but may be error-prone for student editors in other ways, and thus is less than optimal, although possibly better than nothing, for some student editors. (Feedback on this point from Wiki Ed experts would be helpful.)
Workaround: don't do anything; the problem doesn't appear to occur all that often, and when it does, get somebody else to fix it. One downside: who ya gonna call? My one experience with one of the easiest manifestations of the problem was far too tedious to ever want to repeat.
Mitigate it with a tool: proposed function: VE-section-copy:
Create a new copy tool (or function within VE); let's call it "VE-Section-Copy". This is designed to copy one or more sections of an article to a sandbox (or other destination page), and to include everything VE needs to know, so that future citations added by an editor using VE in the copied sandbox will never collide with existing ones, even in uncopied parts of the article.
(With my designer hat on, I can't help envisioning a specific implementation for this: copy the desired section wikitext to the destination page, then append a metadata section within hidden text delimiters consisting of a string with every named ref in the article (numeric or not) as a self-closed named ref, e.g.: 'Lorem.<ref name="Foo"/><ref name=":3"/><ref name=":17"/>' etc. VE would require a modification to be able to recognize and read the metadata section with the ref names *as if they were not hidden* and so avoid assigning them, as the editor starts modifying the sandbox section with new citations. On merge-back, ideally the student would know enough to just copy the updated section and skip the metadata, but if they forget and copy the hidden section, too, no harm done: the worst that would happen is that there would be a hidden text section in the article that doesn't belong there (as long as the delimiters were not corrupted); the references tool wouldn't generate any spurious citations. A bot could harvest the hidden metadata later, if no one else does. Okay, sorry for the digression; couldn't help myself.)
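To make the proposal above concrete, here is a rough sketch of what the copy step and the name lookup might look like. Everything in it is hypothetical: no such tool exists, the marker text inside the HTML comment is invented, the "lowest unused :N" naming rule is an assumption about VE, and only double-quoted ref names are handled.

```python
import re

REF_NAME = re.compile(r'<ref\b[^>]*?\bname\s*=\s*"([^"]+)"', re.IGNORECASE)

# Hypothetical delimiters for the hidden metadata section.
META_OPEN = "<!-- VE-SECTION-COPY reserved ref names: "
META_CLOSE = " -->"


def section_copy(article_wikitext, section_wikitext):
    """Return the section plus a hidden list of every ref name used anywhere in the article."""
    names = sorted(set(REF_NAME.findall(article_wikitext)))
    stubs = "".join(f'<ref name="{name}"/>' for name in names)
    return f"{section_wikitext}\n{META_OPEN}{stubs}{META_CLOSE}"


def next_free_numeric_name(sandbox_wikitext):
    """Pick a ':N' name, treating names in the hidden metadata as if they were not hidden.

    ASSUMED naming rule (lowest unused number); VE's actual rule may differ.
    """
    used = set(REF_NAME.findall(sandbox_wikitext))  # scans the comment text too
    n = 0
    while f":{n}" in used:
        n += 1
    return f":{n}"


# Hypothetical article; only the lead section is copied to the sandbox.
article = ('Lead.<ref name="Foo">A</ref> '
           'History.<ref name=":0">B</ref><ref name=":1">C</ref>')
section = 'Lead.<ref name="Foo">A</ref>'

sandbox = section_copy(article, section)
print(sandbox)
print(next_free_numeric_name(section))  # plain partial copy, no metadata
print(next_free_numeric_name(sandbox))  # copy that carries the reserved names
```

The design choice here is that the reader simply ignores the comment delimiters when collecting names, so nothing else about name assignment needs to change.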
Mitigate it after the fact with an analysis/repair tool:
Create a tool able to examine article history and sandbox history, find the last good version before a H:CERDK error, identify the edit that caused the error, and either on its own or interactively with editor assistance, resolve the problem.
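The "find the last good version" part of such a tool might look roughly like the sketch below. It is an illustration only: fetching revisions from the wiki is left out, revisions are assumed to be supplied oldest first as (revision id, wikitext) pairs, the revision ids are made up, and only double-quoted ref names are recognized.

```python
import re

# A full definition: <ref name="X">body</ref>. Self-closed reuses are not definitions.
REF_DEF = re.compile(
    r'<ref\b[^>/]*?\bname\s*=\s*"([^"]+)"[^>/]*>(.*?)</ref>',
    re.IGNORECASE | re.DOTALL,
)


def conflicting_names(wikitext):
    """Names defined more than once with different content within a single revision."""
    seen = {}
    bad = set()
    for name, body in REF_DEF.findall(wikitext):
        content = " ".join(body.split())
        if seen.setdefault(name, content) != content:
            bad.add(name)
    return bad


def first_bad_revision(revisions):
    """Return (last_good_id, first_bad_id, names) for the first conflicting revision, else None."""
    last_good = None
    for rev_id, text in revisions:
        bad = conflicting_names(text)
        if bad:
            return last_good, rev_id, sorted(bad)
        last_good = rev_id
    return None


# Hypothetical history, oldest first; revision 103 is the merge-back that collides.
history = [
    (101, 'A.<ref name=":0">Smith 2019</ref>'),
    (102, 'A.<ref name=":0">Smith 2019</ref> B.<ref name=":1">Jones 2021</ref>'),
    (103, 'A.<ref name=":0">Smith 2019</ref> B.<ref name=":1">Jones 2021</ref> '
          'C.<ref name=":1">Lee 2023</ref>'),
]
print(first_bad_revision(history))
```

Resolving the collision (choosing a new name and updating every reuse of the old one in the merged content) is the harder part and is not attempted here.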
Lower the risk before the fact: use script RefRenamer to remove all VE numeric ref names from the article, before copying sections to a sandbox. Unless editors are working simultaneously in two sandboxes, this lowers the risk of any collision upon merge-back to near zero.
Conclusion:
The upshot of all this is that, at the very least, changes to doc or Help pages may be required in various locations: for starters, to describe the current situation and advise users about possible risks and workarounds, and later, to cover any new tools that might be created. In particular, Wiki Ed procedures and training materials for students and professors may require changes to document best practices regarding the use of sandboxes.