Add TemplateData configuration for how reference names should be generated.
- Mentioned In
- T231754: Request to add a warning for broken/orphaned reference names before saving edit
T169841: Task Request: Change the default ref name produced by visual editor
T215867: Reference auto name uses an existing name
T208981: Reference context doesn't show that the reference is reused, which is confusing if you try to delete and recreate it (converting a re-used basic reference to template citation)
T208781: Visual Editor contravenes long-standing guidelines: numeric reference names
T52568: VisualEditor: Be able to name references manually in the reference dialog
- Mentioned Here
- T52568: VisualEditor: Be able to name references manually in the reference dialog
The ideal system would use TemplateData to specify which parameters the name should be generated from, but where there is no such data we still need a sensible fallback, and the current system has the advantage of using Latin numbers, which work well with other languages.
Another reason to fix this bug: people copy wikitext between articles often enough that enwiki has a bot to look for it. If they copy text containing a <ref name="..."/> (resulting in a broken reference in the copied-to article), a sensibly named reference makes it easier to figure out what is going on. If they copy a reference that includes the full reference body text, a sensible name is also less likely to collide with an existing name.
I ran into this earlier today. name=":0" is too generic and easily causes problems when bringing in content from another page (both in wikitext and through VE) because it is basically guaranteed to conflict if both pages had at least one VE-generated citation.
One small improvement we could make is to use a simple hash. Nothing strong or cryptographic, but something like string-hash.js (5 lines of code).
We can convert the digest number to a string with .toString(36) and produce a short unique string (taking care to check if it already exists, at which point one could add Math.random and hash again).
That takes care of the cross-article conflict problem and is better than starting the count at 0 and using :0 as id (which we currently do – also no idea why there is a colon in the name).
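To make the hash idea concrete, here is a minimal sketch, not the actual Cite code. The hash is a djb2-style function in the spirit of string-hash.js. The names stringHash and generateRefName, the existingNames set, and the choice to keep the ":" prefix (so the name is never purely numeric) are all assumptions made for illustration.

```javascript
// Small non-cryptographic string hash (djb2-style, similar in spirit to string-hash.js).
function stringHash(str) {
  let hash = 5381;
  let i = str.length;
  while (i) {
    hash = (hash * 33) ^ str.charCodeAt(--i);
  }
  // Force an unsigned 32-bit result so toString(36) never yields a minus sign.
  return hash >>> 0;
}

// Hypothetical helper: derive a short name from the reference body and
// re-hash with a random salt while the name is already taken on this page.
function generateRefName(refBody, existingNames) {
  let name = ':' + stringHash(refBody).toString(36);
  while (existingNames.has(name)) {
    name = ':' + stringHash(refBody + Math.random()).toString(36);
  }
  return name;
}
```

The same reference body always produces the same name unless that name is already in use, and two different pages are unlikely to both produce the same name for different references, which is the cross-article problem described above.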
@Anomie, I believe that the colon was chosen because it's in the very small set of (things that can be used) and (characters present on the keyboards of most MediaWiki users). The first requirement explains why it's not all numbers (some non-numeric character is required), and the second explains why it's not a Latin alphabet character.
I agree. We don't (yet) need a system that creates meaningful names, but we need one that doesn't generate likely collisions.
If someone can point me to where the current code lives, I'll write up a patch and submit it.
I think this is modules/ve-cite/ve.dm.MWReferenceNode.js in the Cite extension, see the "Generate a name starting with ':' to distinguish it from normal names" comment
Thanks for the pointer. I'm working on this. For folks who are also looking, that isn't the only place where we expect that reference numbering; see https://phabricator.wikimedia.org/diffusion/ECIT/browse/master/modules/ve-cite/ve.ui.MWReferenceSearchWidget.js;06376669d9c1895d9b312998d0ee331520eea6a1$161-165
While ref tags that take the form of ":0", ":1", ":2" are unique, they are not very informative. One alternative would be a Harvard-style ref tag in the form of the first author's last name + year of publication (e.g., "Smith_2017").
Auto-label them before insertion, but allow them to be changed by pressing the Edit button when the mouse pointer hovers over the newly created citation. This would be done before the changes are saved.
I'm not surprised to find that this has already been raised, but am surprised and disappointed that it's been allowed to remain unresolved for so long.
If a reference uses a citation template, then there are fields which can be used to make a reference name. It doesn't depend on Artificial Intelligence solutions, just a "If LAST1 is present, use it. If that name matches an existing reference, and DATE is present, add the year. If no year, add a running number. etc etc". Even if the flowchart had some "too difficult" end boxes saying "If all else fails use a colon and a number", we could get the vast majority of reference names chosen sensibly, in a way compliant with the spirit of the enwiki guideline which forbids the use of purely numeric reference names. ":0" is not purely numeric, but all arguments against purely numeric names apply to it.
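A rough sketch of that flowchart, to show how little logic is involved. The function name suggestRefName, the params object with last1 and date keys, the underscore separator, and the rule for what happens when the name with the year is also taken are assumptions for illustration, not an agreed design.

```javascript
// Hypothetical sketch of the fallback chain described above.
// params: citation template parameters, e.g. { last1: 'Smith', date: '12 May 2017' }
// existingNames: Set of reference names already used on the page
function suggestRefName(params, existingNames) {
  const last = (params.last1 || '').trim().replace(/\s+/g, '_');

  if (last) {
    // 1. If LAST1 is present, use it.
    if (!existingNames.has(last)) {
      return last;
    }
    // 2. If that name is taken and DATE contains a year, add the year.
    const yearMatch = /\b\d{4}\b/.exec(params.date || '');
    const base = yearMatch ? last + '_' + yearMatch[0] : last;
    if (!existingNames.has(base)) {
      return base;
    }
    // 3. Otherwise add a running number.
    for (let n = 2; ; n++) {
      const candidate = base + '_' + n;
      if (!existingNames.has(candidate)) {
        return candidate;
      }
    }
  }

  // 4. "If all else fails use a colon and a number."
  for (let n = 0; ; n++) {
    const candidate = ':' + n;
    if (!existingNames.has(candidate)) {
      return candidate;
    }
  }
}
```

With an empty page, a Smith citation from 2017 would simply be named "Smith"; a second one would become "Smith_2017", and references with no usable fields would still get the current colon-and-number names.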
While this would be easy to implement for any specific language (e.g. only for English), keep in mind that citation templates are translated to 200+ languages. When this task was filed, we had no way to know that e.g. "nazwisko" in Polish is equivalent to "last" in English.
It seems that since then, someone has invented Citoid and TemplateData :), and as part of these, invented a way for communities to specify a mapping like this – see e.g. https://pl.wikipedia.org/w/index.php?title=Szablon:Cytuj_stronę/opis&action=edit (search for "maps"; this is the "cite web" template).
We could probably use those mappings now, there is some documentation here: https://www.mediawiki.org/wiki/Citoid/Maps_TemplateData
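As a sketch of how such a mapping could be consumed: the snippet below assumes the TemplateData JSON has a maps.citoid object whose author entry is a list of [first, last] parameter-name pairs and whose date entry is a single parameter name. That shape is my reading of the Citoid documentation and should be checked against it. The function name resolveNameFields and the sample Polish parameter names in the usage note are illustrative only.

```javascript
// Hypothetical helper: find which local template parameters hold the first
// author's last name and the date, using the Citoid map in TemplateData.
// Assumption: maps.citoid.author is an array of [firstNameParam, lastNameParam] pairs.
function resolveNameFields(templateData) {
  const citoid = (templateData.maps && templateData.maps.citoid) || {};

  let last = null;
  if (Array.isArray(citoid.author) && Array.isArray(citoid.author[0])) {
    // Second element of the first pair is taken to be the last-name parameter.
    last = citoid.author[0][1] || null;
  }

  const date = typeof citoid.date === 'string' ? citoid.date : null;
  return { last: last, date: date };
}
```

For a template whose map is { author: [["imię", "nazwisko"]], date: "data" }, this would return { last: "nazwisko", date: "data" }, and a template without a Citoid map would return nulls so the caller can fall back to the current naming.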
As for the actual algorithm for generating the name, surely there exists some bot or something already that merges and names identical references? It would be a lot easier if such a thing was out there and if we could borrow that code.
This has been proposed as part of the 2019 Community Wishlist: https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2019/Citations/VisualEditor:_Allow_references_to_be_named It's too early to know whether it will make the top 10 (voting will be open until 30 November 2018), but it's currently among the more popular items, which suggests that solving this problem has widespread community support.
Remember the good ol', "don't let the perfect be the enemy of the good." When I go to https://www.wikipedia.org, I only see ten Wikipedias listed there. If you implement the fix just for those ten, I'm guessing you're fixing a very significant percentage of the problem. Nothing wrong with incremental rollout: I see no reason to hold up an initial fix for a handful of languages, while someone figures out how to say "last1" and "year" in Inuktitut, Kapampangan, Tuvinian and Cherokee.
The discussion above seems to ignore the needs of human editors. When I try to work in the text editor on an article which has multiple multi-used references, created in VE, I need to be able to see which reference is which. Initially I can see that footnote n refers to reference "colon, n minus one"; by the time I've rearranged the text of the article I now have footnote "4" as ref ":3", and so on. See https://en.wikipedia.org/wiki/Kate_Jagoe-Davies as an example.