Come up with a better way to auto-label references
Open, Low · Public · 8 Estimated Story Points

Description

Add TemplateData configuration for how reference names should be generated.

Event Timeline

No progress on this? <ref name=":0"> is not just ugly, it looks like an error code or something.

It's not in the top 100 things we're working on.

Anomie added a subscriber: Anomie. Nov 4 2015, 2:22 PM

Another reason to fix this bug: people copy wikitext between articles often enough that enwiki has a bot to look for it. If they copy text containing only a <ref name="..."/> (resulting in a broken reference in the copied-to article), a sensibly named reference makes it easier to figure out what is going on. If they copy a reference that includes the full reference body text, a sensible name is less likely to collide with an existing one.

I ran into this earlier today. name=":0" is too generic and easily causes problems when bringing in content from another page (both in wikitext and through VE) because it is basically guaranteed to conflict if both pages had at least one VE-generated citation.

One small improvement we could make is to use a simple hash. Nothing strong or cryptographic, but something like string-hash.js (5 lines of code).

We can convert the digest number to a string with .toString(36) and produce a short unique string (taking care to check if it already exists, at which point one could add Math.random and hash again).

That takes care of the cross-article conflict problem and is better than starting the count at 0 and using :0 as id (which we currently do – also no idea why there is a colon in the name).
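A minimal sketch of the hashing idea described above, not the actual Cite or VisualEditor code. The function names, the djb2-style hash, and the `auto-` prefix are all assumptions made for illustration; the prefix is there only so that the generated name is never purely numeric.

```javascript
// Small non-cryptographic string hash (djb2-style with XOR), in the spirit of string-hash.js.
function simpleHash(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = (hash * 33) ^ str.charCodeAt(i);
  }
  // Force an unsigned 32-bit integer so toString(36) never yields a minus sign.
  return hash >>> 0;
}

// Hypothetical helper: derive a short ref name from the reference body,
// re-hashing with a random salt whenever the name is already taken.
function generateRefName(body, existingNames, random = Math.random) {
  let input = body;
  let name;
  do {
    // Hypothetical "auto-" prefix; the digest is rendered in base 36 to keep it short.
    name = 'auto-' + simpleHash(input).toString(36);
    // Only matters if the loop repeats: salt the input and hash again.
    input = body + random();
  } while (existingNames.has(name));
  return name;
}
```

Usage would look like `generateRefName('{{cite web|url=...}}', new Set([':0', ':1']))`. The same body always gives the same name on a page where that name is free, which is what makes cross-article collisions unlikely rather than near-certain.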

also no idea why there is a colon in the name

The Cite extension doesn't support integers as reference names. I don't know if colon specifically was chosen for any particular reason.

nshahquinn-wmf removed nshahquinn-wmf as the assignee of this task. Dec 11 2015, 10:31 PM

I'd love to see Citoid use some sort of reference naming! Auto-generating a name for each reference is a good alternative to setting them manually, and I support it.

@Anomie, I believe that the colon was chosen because it's in the very small set of (things that can be used) and (characters present on the keyboards of most MediaWiki users). The first requirement explains why it's not all numbers (some non-numeric character is required), and the second explains why it's not a Latin alphabet character.

Krinkle removed a subscriber: Krinkle. May 20 2016, 6:08 PM

I ran into this earlier today. name=":0" is too generic and easily causes problems when bringing in content from another page (both in wikitext and through VE) because it is basically guaranteed to conflict if both pages had at least one VE-generated citation.

I agree. We don't (yet) need a system that creates meaningful names, but we need one that doesn't generate likely collisions.

If someone can point me to where the current code lives, I'll write up a patch and submit it.

If someone can point me to where the current code lives

I think this is modules/ve-cite/ve.dm.MWReferenceNode.js in the Cite extension, see the "Generate a name starting with ':' to distinguish it from normal names" comment

I think this is modules/ve-cite/ve.dm.MWReferenceNode.js in the Cite extension, see the "Generate a name starting with ':' to distinguish it from normal names" comment

Thanks for the pointer. I'm working on this. For folks who are also looking, that isn't the only place where we expect that reference numbering: https://phabricator.wikimedia.org/diffusion/ECIT/browse/master/modules/ve-cite/ve.ui.MWReferenceSearchWidget.js;06376669d9c1895d9b312998d0ee331520eea6a1$161-165

Boghog added a subscriber: Boghog. Apr 2 2017, 7:12 PM

While ref tags that take the form of ":0", ":1", ":2" are unique, they are not very informative. One alternative would be a Harvard-style ref tag in the form of the first author's last name + year of publication (e.g., "Smith_2017").

@Boghog I agree + if there were more different publications by Smith from 2017, then Smith_2017a, Smith_2017b...
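The Harvard-style scheme with letter suffixes could be sketched roughly as follows. `harvardRefName` is a hypothetical helper, and the choice to leave the first reference unsuffixed ("Smith_2017") and give later ones "a", "b", and so on is only one reading of the suggestion above.

```javascript
// Hypothetical helper: "Smith_2017" first, then "Smith_2017a", "Smith_2017b", ... on clashes.
function harvardRefName(lastName, year, existingNames) {
  // Replace inner whitespace so multi-word surnames stay a single token.
  const base = `${lastName.trim().replace(/\s+/g, '_')}_${year}`;
  if (!existingNames.has(base)) return base;
  for (let code = 97; code <= 122; code++) { // character codes for 'a'..'z'
    const candidate = base + String.fromCharCode(code);
    if (!existingNames.has(candidate)) return candidate;
  }
  // More than 26 clashes is unlikely; signal it rather than guess a scheme.
  return null;
}
```

For example, `harvardRefName('Smith', 2017, new Set(['Smith_2017']))` yields `Smith_2017a`.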

TheDJ added a subscriber: TheDJ. Apr 4 2017, 8:53 AM
tomasz removed a subscriber: tomasz. Jun 11 2017, 10:59 PM
Krinkle removed a subscriber: Krinkle.
Izno added a project: Cite. Nov 20 2017, 2:21 AM
Izno moved this task from Unsorted backlog to External on the Cite board. Nov 20 2017, 2:26 AM

Auto-label them before insertion, but allow the label to be changed by pressing the Edit button that appears when the mouse pointer hovers over the newly created citation. This would happen before the changes are saved.

PamD added a comment. Nov 5 2018, 9:38 PM

I'm not surprised to find that this has already been raised, but am surprised and disappointed that it's been allowed to remain unresolved for so long.

If a reference uses a citation template, then there are fields which can be used to make a reference name. It doesn't depend on Artificial Intelligence solutions, just a "If LAST1 is present, use it. If that name matches an existing reference, and DATE is present, add the year. If no year, add a running number. etc etc". Even if the flowchart had some "too difficult" end boxes saying "If all else fails use a colon and a number", we could get the vast majority of reference names chosen sensibly, in a way compliant with the spirit of the enwiki guideline which forbids the use of purely numeric reference names. ":0" is not purely numeric, but all arguments against purely numeric names apply to it.
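The flowchart described above could be sketched like this. It is an illustration under stated assumptions, not a proposal for the actual patch: `suggestRefName` is a hypothetical function, `params` is assumed to be a plain object of template parameters, and `last` and `year` are accepted as assumed aliases for the LAST1 and DATE fields mentioned in the comment.

```javascript
// Hypothetical helper following the "LAST1, then year, then running number" flow.
function suggestRefName(params, existingNames) {
  const last = String(params.last1 || params.last || '').trim();
  // Pull a four-digit year out of whatever date text is present.
  const yearMatch = /\b(\d{4})\b/.exec(String(params.date || params.year || ''));
  if (last) {
    // Step 1: use the last name on its own if it is free.
    if (!existingNames.has(last)) return last;
    // Step 2: add the year if there is one.
    let base = last;
    if (yearMatch) {
      base = last + yearMatch[1];
      if (!existingNames.has(base)) return base;
    }
    // Step 3: add a running number.
    for (let n = 2; ; n++) {
      const numbered = `${base}-${n}`;
      if (!existingNames.has(numbered)) return numbered;
    }
  }
  // "Too difficult" end box: fall back to a colon and a number.
  for (let n = 0; ; n++) {
    const fallback = `:${n}`;
    if (!existingNames.has(fallback)) return fallback;
  }
}
```

With an existing "Smith" on the page, `suggestRefName({ last1: 'Smith', date: '5 November 2018' }, new Set(['Smith']))` gives `Smith2018`; a reference with no usable fields still ends up as `:0`, `:1`, and so on.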

While this would be easy to implement for any specific language (e.g. only for English), keep in mind that citation templates are translated to 200+ languages. When this task was filed, we had no way to know that e.g. "nazwisko" in Polish is equivalent to "last" in English.

It seems that since then, someone has invented Citoid and TemplateData :), and as part of these, invented a way for communities to specify a mapping like this – see e.g. https://pl.wikipedia.org/w/index.php?title=Szablon:Cytuj_stronę/opis&action=edit (search for "maps"; this is the "cite web" template).

We could probably use those mappings now, there is some documentation here: https://www.mediawiki.org/wiki/Citoid/Maps_TemplateData
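As a rough sketch of how such a mapping could feed a name generator: the map below is a simplified, hand-written stand-in loosely modelled on the Citoid maps format (Citoid field names as keys, local parameter names as values, authors as first/last pairs). It is not copied from the Polish template; only "nazwisko" comes from this discussion, and the other parameter names, as well as `extractNamingFields`, are assumptions for illustration.

```javascript
// Illustrative, hand-written map; not the real content of any template's TemplateData.
const plCiteWebMap = {
  citoid: {
    title: 'tytuł',
    date: 'data',
    author: [['imię', 'nazwisko'], ['imię2', 'nazwisko2']]
  }
};

// Hypothetical helper: use the map to find the first author's last name and the date
// in a template invocation, whatever the local parameter names are.
function extractNamingFields(maps, templateParams) {
  const citoid = (maps && maps.citoid) || {};
  const firstAuthor = Array.isArray(citoid.author) ? citoid.author[0] : undefined;
  const lastParam = Array.isArray(firstAuthor) ? firstAuthor[1] : undefined;
  const dateParam = typeof citoid.date === 'string' ? citoid.date : undefined;
  return {
    last: lastParam ? templateParams[lastParam] : undefined,
    date: dateParam ? templateParams[dateParam] : undefined
  };
}
```

A call such as `extractNamingFields(plCiteWebMap, { nazwisko: 'Kowalski', data: '2018-11-05' })` would return the last name and date without the caller knowing any Polish parameter names, which is the point of going through the map.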

As for the actual algorithm for generating the name, surely there exists some bot or something already that merges and names identical references? It would be a lot easier if such a thing was out there and if we could borrow that code.

Elitre removed a subscriber: Elitre. Nov 8 2018, 4:42 PM

Such a bot has operated in the past: https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/Polbot_8 I don't know if any bots are currently doing this.

Looks like that also didn't generate the names cleverly. It just used "botgen1", "botgen2" etc., instead of ":0", ":1" etc.

This has been proposed as part of the 2019 Community Wishlist: https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2019/Citations/VisualEditor:_Allow_references_to_be_named It's too early to know whether it will make the top 10 (voting will be open until 30 November 2018), but it's currently among the more popular items, which suggests that solving this problem has widespread community support.

While this would be easy to implement for any specific language (e.g. only for English), keep in mind that citation templates are translated to 200+ languages. When this task was filed, we had no way to know that e.g. "nazwisko" in Polish is equivalent to "last" in English. ...

Remember the good ol', "don't let the perfect be the enemy of the good." When I go to https://www.wikipedia.org, I only see ten Wikipedias listed there. If you implement the fix just for those ten, I'm guessing you're fixing a very significant percentage of the problem. Nothing wrong with incremental rollout: I see no reason to hold up an initial fix for a handful of languages, while someone figures out how to say "last1" and "year" in Inuktitut, Kapampangan, Tuvinian and Cherokee.

Izno added a comment. Dec 8 2018, 12:03 AM

This has been proposed as part of the 2019 Community Wishlist: https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2019/Citations/VisualEditor:_Allow_references_to_be_named It's too early to know whether it will make the top 10 (voting will be open until 30 November 2018), but it's currently among the more popular items, which suggests that solving this problem has widespread community support.

This, along with T52568 ("VisualEditor: Be able to name references manually in the reference dialog"), was in the top 10.

Tgr added a subscriber: Tgr. Dec 25 2018, 7:02 AM
PamD added a comment. Apr 17 2019, 9:03 AM

The discussion above seems to ignore the needs of human editors. When I try to work in the text editor on an article which has multiple multi-used references, created in VE, I need to be able to see which reference is which. Initially I can see that "footnote n refers to reference colon - n - minus - one"; by the time I've rearranged the text of the article I now have footnote "4" as ref ":3", and so on. See https://en.wikipedia.org/wiki/Kate_Jagoe-Davies as an example.

Sophivorus added a subscriber: Sophivorus. Edited Mar 23 2020, 2:03 AM

The current reference naming system is a problem for excerpts (transclusions of part of an article into another article) which are used heavily on the Spanish Wikipedia and may soon start seeing much more use on other wikis. The reason is that often, a part of an article containing a reference named :0 is transcluded into another article that also has a reference named :0, causing a conflict. The current solution is to just rename one of the references, but this is not easy for new users and sometimes it's not even easy for advanced users. This issue would no longer happen (or at least be very rare) if the references were named "semantically".

Valereee added a comment. Edited May 2 2020, 6:02 PM

How is this still languishing after five years? Visual editor adds useless ref names such as ":2". When I need to switch to source, I can't tell which ref is which. The only way I've found to prevent this is to add a ref, switch to source before using it a second time, add a useful ref name like jonesNYT1may2020 or whatever, which besides being tedious when I'm adding more than one ref doesn't fix all the other stupidly named refs added by someone else using VE. For editors who switch back and forth often, or editors who edit primarily in source, these useless ref names are just infuriating. Why are we deciding that the priority on this should be low? It's literally to me the most irritating thing about Visual Editor.

Dvorapa added a comment. Edited May 2 2020, 6:28 PM

This would need some sort of simple hash codes. I'm not sure what they should be based on. Timestamp? URL? Contents generally? Revision ID?

With support of Citoid, we could generate simple acronyms using Zotero or some Zotero plugin like BetterBibTeX, which can handle this quite well.

Mathglot added a comment. Edited May 2 2020, 6:53 PM

This would need some sort of simple hash codes.

I think what you meant was, "one solution might involve some sort of simple hash codes." It's certainly not true that this would need hash codes.

Another solution was proposed by PamD (Nov 5, 2018 1:38PM), and hers is far better, in my opinion, as it is human-friendly, and hash codes are not.

Even a hash code solution would have to deal with what to do about collisions, which could be made as improbable as desired, but not impossible. So, you're going to have to code the collision pathway anyway, and figure out what to do. Or, don't code it, allow the collision, and leave the rare named reference collisions lying around like little unexploded mines, that virtually no user, no matter how advanced, will ever disentangle.

Let's prioritize the users, here. PamD's solution seems better to me. If there are weak points in her design that I'm not seeing, let's identify them, and resolve them.

Dvorapa added a comment. Edited May 2 2020, 7:03 PM

Another solution was proposed by PamD (Nov 5, 2018 1:38PM), and hers is far better, in my opinion, as it is human-friendly, and hash codes are not.

There is no difference between my and PamD's suggestion:

With support of Citoid, we could generate simple acronyms using Zotero or some Zotero plugin like BetterBibTeX, which can handle this quite well.

We both want to use Citation templates data if there is any. Simple hash codes would be needed for references without Citation templates only. I just suggested a way PamD's suggestion could be made possible.

Mathglot added a comment. Edited May 2 2020, 7:29 PM

I see, my apologies in that case; I misread what you were proposing. Thanks for the clarification.

For references without citation templates, could one scan for something resembling date, isbn, doi, or pmid, and use that if possible? And in either case, what happens with collisions?
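One way the scan could work, as a sketch only: try a few regular expressions in priority order and return the first hit. `findIdentifier` is a hypothetical helper, and the patterns are deliberately simple assumptions (for example, ISBNs written with spaces instead of hyphens are not recognised). Collisions would still have to be handled by whatever consumes the result.

```javascript
// Hypothetical helper: look for something identifier-like in free-form reference text.
function findIdentifier(refText) {
  const patterns = [
    // DOI: "10.", a registrant code, a slash, then a suffix; trailing punctuation is trimmed.
    { kind: 'doi', re: /\b(10\.\d{4,9}\/[^\s|}<>"]+)/, clean: (v) => v.replace(/[.,;]+$/, '') },
    // PMID written as "PMID 123", "PMID: 123" or "pmid=123".
    { kind: 'pmid', re: /\bPMID[:\s=]*(\d+)/i, clean: (v) => v },
    // ISBN with digits and hyphens only; hyphens are stripped from the result.
    { kind: 'isbn', re: /\bISBN[:\s=]*([\dX][\dX-]{8,16})/i, clean: (v) => v.replace(/-/g, '').toUpperCase() },
    // Last resort: any plausible four-digit year.
    { kind: 'year', re: /\b(1[5-9]\d{2}|20\d{2})\b/, clean: (v) => v }
  ];
  for (const { kind, re, clean } of patterns) {
    const match = re.exec(refText);
    if (match) return { kind, value: clean(match[1]) };
  }
  return null;
}
```

A caller could then build a name such as `kind + value`, giving something like the "pmid112233" style mentioned elsewhere in this discussion.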

Why was this moved from medium to low priority?

@Valereee because in whatever ticket it was where they did the actual investigation, they found a whole bunch of reasons that doing this would be hard, and so I think it's not getting done, if I recall.

The priority got lowered in 2015, likely because there was no developer assigned to work on it at that point. See https://www.mediawiki.org/wiki/Phabricator/Project_management#Priority_levels

Please move it to highest priority. Develop the fix per PamD and patch it in to the existing visual editor, or just disable the visual editor until this is done. It is unacceptable for the visual editor to generate names that (a) frequently cause name conflicts and (b) go against :en:WP:REFNAME, which discourages this style of ref name.

Who has the authority to upgrade or downgrade a priority?

I agree that this would still be an important improvement to consider. Appropriate naming using the "reuse" button on vis editor would be helpful for edu projects as well. Thank you for flagging me here!

disable the visual editor until this is done.

That's a bit of an overreaction. The way the visual editor is doing things right now isn't ideal, but it's easy enough to fix afterwards.

It is unacceptable for the visual editor to generate names that (a) frequently cause name conflicts

Can you give an example of this happening? I've never seen the visual editor do this.

Who has the authority to upgrade or downgrade a priority?

The Editing Team, who have almost certainly seen the comments here.

Community Tech, who is probably more likely to actually work on this, has also looked at this (T243300: Spike: Investigate Named References in VE [8 hours]), but it hasn't made it onto their list of upcoming projects. We might see them prioritize it sometime this year.

@Barkeep49 I won't pretend to understand why a fix would be hard, but for heaven's sake the fact something might be difficult shouldn't be a reason to downgrade its importance. That just tries to hide the biggest problems by calling them minor. It's like dropping your keys in the street but looking for them on the porch because the porch light makes it easier to search there. We should be prioritizing by actual priority as assessed by the people who are using the tool. I use this tool and honestly this is for me the single biggest frustration I have with Vis Ed. And honestly whoever decided it would be okay to make Vis Ed work like this in the first place must not actually edit. No one who had written an article from scratch would ever have thought this was by ANY measure a reasonable decision.

@Valereee You appear to be confusing "community importance" with "developer priority".

Importance is how impactful a particular bug or feature request is to a project's users. It's important to consider, but prioritizing only by importance would not be an efficient use of development resources.

Development priority refers to how a bug or feature request fits into a volunteer developer's interests or a WMF team's planning. There are vastly more bugs and feature requests than developers with the time and knowledge to fix them, so developers have to prioritize. The priority on a task communicates from the developers to the community (and other developers) what is being worked on now and in the near future. Increasing the priority of a task doesn't change how WMF teams and volunteer developers plan their time.

WMF development teams typically focus on one project at a time, often based on priorities set in WMF Annual Plans and other strategy processes. This often does result in a de-prioritization of smaller fixes and feature requests that don't require a large team to be redirected for a period of time or that don't produce a new thing that can be shown off. That's unfortunate, and I don't like it, but I also don't see it changing anytime soon.

The Community Tech team uses the Community Wishlist to pick up some of those smaller, high-importance feature requests. This task has come up twice in the Wishlist, and CommTech does appear to be looking at picking it up -- they just haven't done it yet.
For clarity, @ifried, is Community Tech planning to work on how VE names references at some point in the near-ish future, or has that decision not been made yet?

The other option is convincing someone to do this as a volunteer. I'm not sure that all-volunteer work is the best way to handle this task, as anything VE-related and TemplateData-related can be fairly involved.

@Barkeep49 I won't pretend to understand why a fix would be hard, but for heaven's sake the fact something might be difficult shouldn't be a reason to downgrade its importance. That just tries to hide the biggest problems by calling them minor.

Difficulty to fix should and does affect priority; teams have to figure out the reward per unit effort and prioritise accordingly. There are whole management frameworks based on this concept (e.g. impact-effort matrix).

It's like dropping your keys in the street but looking for them on the porch because the porch light makes it easier to search there. We should be prioritizing by actual priority as assessed by the people who are using the tool.

That's exactly what teams do, and why users are able to file Phabricator tasks in the first place. The teams then have to balance user requests with other strategic and infrastructural work.

I use this tool and honestly this is for me the single biggest frustration I have with Vis Ed.

You're not alone in that. There are also thousands of visual editor users out there that do not share your frustration, and thousands of other tasks the team have to work on too, many of which have far larger impact than this.

And honestly whoever decided it would be okay to make Vis Ed work like this in the first place must not actually edit. No one who had written an article from scratch would ever have thought this was by ANY measure a reasonable decision.

James Forrester and I both have over 20,000 edits each, so no, not really. How about we focus on the substance of the task, and not on the characteristics of the people involved?

ifried added a comment. Edited Aug 5 2020, 4:09 PM

Hey, @AntiCompositeNumber, thanks for pinging me! Yes, the Community Tech team did receive a wish from the 2019 wishlist to allow named references in VE. However, we haven't conducted an analysis to make a decision yet. We are currently working on two other projects (watchlist expiry and the ebook export improvement project), which are our main focuses right now. When we do conduct an analysis, we'll share our findings with the community. Thanks!

My apologies; I didn't mean to make this personal and shouldn't have said that. I'm just finding it so difficult to understand how anyone who adds sources to articles and uses Visual Editor to do it wouldn't be ridiculously frustrated by Vis Ed's manner of naming sources and consider this more than 'low' priority. I started editing nearly fifteen years ago. I am very comfortable with source editing, and I think I'm probably pretty unusual in that as someone who edited in source for over a decade, I now use Vis Ed for probably 99% of my editing and only switch to source when necessary. When I first encountered these refnames I didn't realize they were from Vis Ed and thought there was some really prolific editor out there who was naming stuff in ways only they could possibly understand. If you're using Vis Ed to add sources how do you work around this problem? Are you adding the source in Vis Ed, then switching to source each time so you don't leave behind a meaningless ref name mess for source editors and switchers to have to deal with, then switching back to Vis Ed until you add the next ref? And if so, don't you find that incredibly frustrating when it would be so much easier if you could just specify what Vis Ed names the reference instead of having to switch to source every time you add a ref for the first time?

@AntiCompositeNumber, and yes, if this is something we need to pay someone to do because it's not interesting for volunteer developers, then of course let's for heaven's sake pay someone to solve a problem that causes high levels of editor frustration.

I'm just finding it so difficult to understand how anyone who adds sources to articles and uses Visual Editor to do it wouldn't be ridiculously frustrated by Vis Ed's manner of naming sources

If you were using the visual editor for all/nearly all of your editing, then you would never see these 'names' anyway, so it's not frustrating at all.

If you're using Vis Ed to add sources how do you work around this problem? Are you adding the source in Vis Ed, then switching to source each time so you don't leave behind a meaningless ref name mess for source editors and switchers to have to deal with, then switching back to Vis Ed until you add the next ref? And if so, don't you find that incredibly frustrating when it would be so much easier if you could just specify what Vis Ed names the reference instead of having to switch to source every time you add a ref for the first time?

I mostly don't worry about it, and if I do, then I switch to a wikitext editor and use find-and-replace to change all ":0" to "something sensible", and then move on. You could also generate them in the 2017 wikitext editor in the first place. It has the same toolbar, with the same magical referencing system.

And now I have a request: If you all want to carry on a non-technical conversation about this problem, could we please do that at https://www.mediawiki.org/wiki/VisualEditor/Feedback ? Phabricator isn't a great place to discuss whether a problem should be solved, what year it should happen in, or who should do the work.

It is unacceptable for the visual editor to generate names that (a) frequently cause name conflicts

Can you give an example of this happening? I've never seen the visual editor do this.

It basically doesn't happen when people create new content. It does happen occasionally, e.g., if someone is careless about a wikitext-based page merge or copying between articles. (In the visual editing mode, such ref names are resolved automagically but a little strangely. @Deskana, do you remember the bug about an unexpected <ref name=":12"/>, and then it becomes <ref name=":122"/>, and <ref name=":1222"/>? Pasting a conflicting refname into the visual mode will trigger the addition of the extra 2 at the end. It was probably meant to turn Smith into Smith2.)
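For illustration, here is one way a paste-time rename could avoid the ":12" to ":122" to ":1222" pattern, assuming that pattern comes from appending a "2" on every clash. This is a sketch, not the VisualEditor code: `resolvePastedName` and the "-N" suffix convention are hypothetical. An explicit separator is used so that digits already in a name (as in ":12" or "Smith_2017") are not mistaken for a counter.

```javascript
// Hypothetical helper: on a clash, add or increment a "-N" counter instead of appending a digit.
function resolvePastedName(name, existingNames) {
  if (!existingNames.has(name)) return name;
  // If the name already ends in "-N", continue counting from N + 1; otherwise start at 2.
  const match = /^(.*)-(\d+)$/.exec(name);
  const base = match ? match[1] : name;
  let counter = match ? Number(match[2]) + 1 : 2;
  while (existingNames.has(`${base}-${counter}`)) counter++;
  return `${base}-${counter}`;
}
```

With this convention a clashing ":12" becomes ":12-2", and a later clash on ":12-2" becomes ":12-3" rather than growing by one digit each time.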

Some years back, there was a bot at enwiki that tried to use ref names to 'rescue' refs across articles. The idea was that <ref name="pmid112233" /> was going to be the same across all articles. He eventually had to stop doing that for short ref names, because not only does <ref name=":0" /> not always refer to the same source across all articles, but <ref name="Lee"/> and <ref name="WHO" /> don't, either. Before he realized what was happening, there were a few messes created.

Pbsouthwood added a comment. Edited Aug 6 2020, 10:48 AM

The generation of meaningless, easily confused ref names happens, it is a frustration to editors trying to maintain the integrity of sourcing, therefore it should be fixed. By all means pay someone to do it if the volunteers don't want to do it for whatever reason. This is one of the fundamental reasons for the WMF's existence, and the annual fundraising.

Just to get a sense of perspective here, how many of those 20 000 edits were fixing problems caused by VE?

Cheers, Peter

And now I have a request: If you all want to carry on a non-technical conversation about this problem, could we please do that at https://www.mediawiki.org/wiki/VisualEditor/Feedback ? Phabricator isn't a great place to discuss whether a problem should be solved, what year it should happen in, or who should do the work.

Sure, but that directs you to a page that directs you to a page that directs you back here, just FYI. :)