
Phabricator should suggest possible duplicates when creating a new task
Open, Low, Public

Authored on Apr 11 2014, 5:18 AM


When you create a new bug report in Bugzilla, you get a list of possible duplicates. I find it useful. It might be even more useful for new users filing what is going to be probably an obvious duplicate.

Upstream ticket:




Event Timeline

flimport raised the priority of this task from to Lowest. Sep 12 2014, 1:23 AM
flimport set Reference to fl74.

aklapper wrote on 2014-04-18 09:46:05 (UTC)

Upstreamed as

qgil wrote on 2014-04-22 06:40:26 (UTC)

Your feedback upstream is welcome:

"There's some discussion of this in T397. In short:

We had a feature like this at Facebook.
But, it seemed to be nearly useless.
It has seemed useless to me in every other system that I've filed a bug and been given duplicates, too.
For example, T397 is a duplicate here, but uses different words for everything -- "consider" instead of "suggest/propose", "dedupe" instead of "duplicates", "bugs" instead of "task": basic title-based search would not have found it. My experience was that this was the norm.

Do you have any data about how effective this feature really is for your install?

I'm not totally opposed to pursuing this, but I worry that it doesn't actually work, or works so poorly that it's worse on the balance. I'd like to see data (or, at least, hear experiences) showing that it works well.

Particularly, the cost of a false positive (a user incorrectly believes they have found a duplicate) is higher than the cost of a false negative (a user incorrectly believes their bug is unique) since mechanically merging issues is easy but mechanically separating them is difficult. Generally, our strategy for dealing with this is currently "make merging cheap and easy"."

aklapper wrote on 2014-04-25 08:42:58 (UTC)

I think the usefulness of proposed duplicates really depends on the search algorithm used in the backend. I don't know enough about either Bugzilla's or Phabricator's, and I cannot judge whether Bugzilla's duplicate search is good or bad because I lack points of comparison.

Slightly off-topic: in my perfect world, which is 10 years away, I'd expect duplicate proposals to prefer existing tickets filed under the same project over tickets in other projects, to stem entered words (duplicat*), to cover spelling differences between English variants ("colour" vs. "color"), and potentially to use an English thesaurus ("delete" and "remove"). This touches on Natural Language Processing (NLP), and while there are tons of research papers on the topic, I have yet to see a real implementation in any bug tracker. Maybe I'm just not aware of one.
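The wishlist in the comment above (same-project preference, stemming, spelling variants, a thesaurus) can be sketched as a toy ranking function. This is a minimal Python illustration under stated assumptions, not how Bugzilla or Phabricator actually search: the suffix list, stopwords, `VARIANTS` table, `Task` record, Jaccard scoring, 0.3 cut-off and 1.5 same-project boost are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for a real stemmer, spelling-variant dictionary
# and thesaurus; none of these lists come from Bugzilla or Phabricator.
SUFFIXES = ("ions", "ion", "ing", "es", "ed", "s", "e")
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "from",
             "and", "is", "are", "not", "when", "should"}
VARIANTS = {"colour": "color", "remove": "delete", "suggest": "propose"}
SAME_PROJECT_BOOST = 1.5  # hypothetical weight favoring tasks in the same project


def stem(word: str) -> str:
    """Crude suffix stripping, e.g. 'duplicates' and 'duplicated' -> 'duplicat'."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Fold variants at the stem level so inflected forms ("removing") map too.
CANONICAL = {stem(k): stem(v) for k, v in VARIANTS.items()}


def terms(text: str) -> set[str]:
    """Lowercase, drop stopwords, stem, then fold spelling variants and synonyms."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = (stem(w) for w in words if w not in STOPWORDS)
    return {CANONICAL.get(s, s) for s in stems}


@dataclass(frozen=True)
class Task:
    id: int
    title: str
    project: str


def suggest_duplicates(title: str, project: str, tasks: list[Task],
                       min_score: float = 0.3,
                       limit: int = 5) -> list[tuple[float, Task]]:
    """Rank existing tasks by Jaccard overlap of normalized title terms."""
    query = terms(title)
    ranked = []
    for task in tasks:
        candidate = terms(task.title)
        union = query | candidate
        if not union:
            continue
        score = len(query & candidate) / len(union)
        if task.project == project:
            score *= SAME_PROJECT_BOOST  # prefer tickets filed under the same project
        if score >= min_score:
            ranked.append((score, task))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    existing = [
        Task(101, "Removing a label colour", "Maniphest"),
        Task(102, "Label colors are wrong in search", "Search"),
        Task(103, "Workboard columns do not scroll", "Maniphest"),
    ]
    for score, task in suggest_duplicates("Delete colors of labels", "Maniphest", existing):
        print(f"T{task.id} ({score:.2f}): {task.title}")
```

Run as a script, this lists T101 first (1.50: a full term match in the same project once "removing"/"delete" and "colour"/"color" are folded together) and T102 second (0.40); T103 shares no terms and is dropped. Hand-written tables only cover the pairs someone thought of: "dedupe" and "duplicates" from the upstream reply would still not match, so a real implementation would need a proper stemmer and thesaurus, plus a cut-off tuned against the false-positive concern raised upstream.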

scfc wrote on 2014-05-07 08:11:36 (UTC)

I find it hard to imagine how to gather meaningful statistics on its use in our Bugzilla, given our rather small audience in general :-).

I don't subscribe to the view that false positives are a hindrance here: users are supposed to search for duplicates anyway, so they could just as well settle on a false duplicate without the convenience of an automatic search.

aklapper wrote on 2014-05-23 07:13:05 (UTC)

I propose not to make this block Day 1 in production. It might be nice to have, but we will survive without it; we might just receive more duplicates that need triaging than Wikimedia Bugzilla does nowadays.
We have no statistics to base this assumption on, nor on how many people look at duplicate proposals or how many consider them useful. Also anecdotal: I am aware of some users who do try to find an existing ticket about the issue they want to report (to avoid creating a duplicate), but they sometimes fail to find the entry in Bugzilla.

qgil wrote on 2014-05-24 17:35:47 (UTC)

I propose to not make this block Day 1


Aklapper removed Aklapper as the assignee of this task. Oct 10 2014, 7:13 PM
Aklapper updated the task description.
Aklapper set Security to None.

Screenshot of how Quora handles this.

Screenshot of how Bugzilla handles this.

Elitre added a subscriber: Elitre. Nov 14 2014, 12:08 PM

Imagine a world where James Forrester spends his day just marking my tasks here as duplicate.

He7d3r added a subscriber: He7d3r. Nov 24 2014, 2:25 AM
Bene awarded a token. May 30 2015, 4:52 PM
Restricted Application removed a subscriber: Mjbmr. Apr 13 2016, 12:11 AM
Restricted Application added a subscriber: TerraCodes. May 23 2016, 6:07 PM

This project has been selected for the Developer-Wishlist voting round and will be added to a MediaWiki page very soon. To the subscribers or the proposer of this task: please help update the task description by adding a brief summary (10-12 lines) of the problem this proposal raises, the topics discussed in the comments, and a proposed solution (if there is one yet). Remember to add a header titled "Description" to your content. Please do so before February 5th, 12:00 pm UTC.

Last update in from Apr 11 2015:

If we did first and got a generic ApplicationSearch endpoint out of it, I'd be open to writing this as an extension CustomField and then disavowing all knowledge of it. The results UI wouldn't be custom, but maybe that's fine. We might need to pay down some infrastructure debt to let installs put this immediately underneath the "Title" field; I think a couple of the fields are still hard-coded.

Tgr added subscribers: greg, Tgr. Feb 28 2017, 1:52 AM

This suggestion got the most votes in the Developer Wishlist (with a fairly comfortable lead). @Qgil @greg, is either of you interested in making this an official WMF goal?

I'm already working on phabricator search stuff so this is not that far out of scope...

Qgil added a comment. Mar 1 2017, 4:46 PM

@mmodell @greg I am assuming the ball is in your court. If you think that sponsoring the development of this feature might help, please let me know. Depending on the cost, we might be able to cover it during this fiscal year (before the end of June).

(I still would like to see the completion of T136213 before jumping on new funded tasks, though).

greg added a comment. Mar 1 2017, 5:42 PM

Thinking through the ways of addressing this. Will post more when we're ready to commit :)

Qgil raised the priority of this task from Lowest to Low. Apr 20 2017, 8:12 AM
Qgil added a comment. Sep 12 2017, 8:32 AM

In relation to T158149: Find an owner for top 10 Developer Wishlist 2017 proposals, I dare to ask: what is the current status? :)

greg added a comment. Sep 12 2017, 5:37 PM

Open, Low :P

But seriously, it's not on RelEng's radar (and our Q2 goals are already fleshed out and too many ;) ). It doesn't look like it's in upstream's current plans either (judging by the lack of updates there).

Cirdan added a subscriber: Cirdan. Jul 2 2018, 6:59 AM
Tbayer added a subscriber: Tbayer.
kostajh added a subscriber: kostajh.