
Phabricator should suggest possible duplicates when creating a new task
Open, Low, Public

Assigned To
None
Authored By
Qgil
Apr 11 2014, 5:18 AM
Referenced Files
F560: Screenshot_2014-10-29_13.09.43.png
Oct 29 2014, 8:11 PM
F562: Screenshot_2014-10-29_13.10.48.png
Oct 29 2014, 8:11 PM
Tokens
"Love" token, awarded by Sj. "Baby Tequila" token, awarded by mmodell. "Like" token, awarded by Jdlrobson. "Like" token, awarded by Ladsgroup. "Love" token, awarded by kostajh. "100" token, awarded by Tbayer. "Like" token, awarded by David_Hedlund. "Like" token, awarded by Liuxinyu970226. "Like" token, awarded by RandomDSdevel. "Like" token, awarded by He7d3r. "Like" token, awarded by Bene. "Baby Tequila" token, awarded by Ragesoss. "Like" token, awarded by Kozuch. "Like" token, awarded by Gryllida. "Mountain of Wealth" token, awarded by Nemo_bis.

Description

When we created a new bug report in Bugzilla, we got a list of possible duplicates. I find it useful. It might be even more useful for new users filing what is going to be probably an obvious duplicate.

Upstream ticket: https://secure.phabricator.com/T4828

Context:

It seems that about 6.57% of tickets are marked as duplicates in Wikimedia Phabricator (see comment from aklapper).

What is needed:

A "language pattern" or similar mechanism to reduce that 6.57 percent, without slowing down the other 93.43 percent of people filing tasks.

Details

Reference
fl74

Related Objects

Event Timeline

Aklapper updated the task description.
Aklapper set Security to None.

Screenshot_2014-10-29_13.09.43.png (880×1 px, 158 KB)

Screenshot of how Quora handles this.

Screenshot_2014-10-29_13.10.48.png (652×2 px, 229 KB)

Screenshot of how Bugzilla handles this.

Imagine a world where James Forrester spends his day just marking my tasks here as duplicate.

Restricted Application removed a subscriber: Mjbmr. Apr 13 2016, 12:11 AM

This project is selected for the Developer-Wishlist voting round and will be added to a MediaWiki page very soon. To the subscribers, or proposer of this task: please help modify the task description: add a brief summary (10-12 lines) of the problem that this proposal raises, topics discussed in the comments, and a proposed solution (if there is any yet). Remember to add a header with a title "Description," to your content. Please do so before February 5th, 12:00 pm UTC.

Last update in https://secure.phabricator.com/T4828#106500 from Apr 11 2015:

If we did https://secure.phabricator.com/T7805 first and got a generic ApplicationSearch endpoint out of it, I'd be open to writing this as an extension CustomField and then disavowing all knowledge of it. The results UI wouldn't be custom, but maybe that's fine. We might need to pay down some infrastructure debt to let installs put this immediately underneath the "Title" field, I think a couple of the fields are still hard-coded.

This suggestion got the most votes in the Developer Wishlist (with a quite confident lead). @Qgil @greg is either of you interested in making this an official WMF goal?

I'm already working on phabricator search stuff so this is not that far out of scope...

@mmodell @greg I am assuming that you have this ball in your court. If you think that sponsoring the development of this feature might help, please let me know. Depending on the cost, we might be able to cover it during this fiscal year (before the end of June).

(I still would like to see the completion of T136213 before jumping on new funded tasks, though).

Thinking through the ways of addressing this. Will post more when we're ready to commit :)

Qgil raised the priority of this task from Lowest to Low. Apr 20 2017, 8:12 AM

Open, Low :P

But seriously, not on RelEng's radar (and our Q2 goals are already fleshed out and too many ;) ). Looks like not on upstream's current plans either (based on lack of updates there).

We really need this. I just merged the seventh duplicate task of T259565, and they were all created within a span of a few hours, some just minutes apart. It would really be good if Phabricator were smarter on this end. (I was about to create a duplicate of this task too, before recalling that I had seen something like this.)

Jdlrobson subscribed.

Is now a good time to reconsider low? The lack of this does create a lot of work for triaging bugs and managing fragmented conversations, in particular for user-facing products.

As long as I see neither a good NLP / AI algorithm for the English language (more relevant to me) nor much research showing that suggesting potential duplicates significantly lowers the number of created duplicates (less relevant to me), this feels low/lowest priority to me. Maybe it's just my usually disappointing personal (anecdotal) experience in GitLab and Bugzilla instances with such "proposals" which makes me reluctant.
An algorithm might be way more successful if it gave way more weight to recently created (or edited?) tickets, I guess?
In any case, regarding WMF I doubt that there are currently resources to tackle such a huge project. :-/ Feels like upstream territory.
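The recency idea above could be sketched roughly as follows. This is only an illustration, not anything Phabricator or upstream provides: the function names, the word-overlap (Jaccard) measure, the 7-day half-life, and the task IDs are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta


def tokenize(title):
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets, in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recency_weight(created, now, half_life_days=7.0):
    """Exponential decay: a task loses half its weight every half_life_days (assumed value)."""
    age_days = max((now - created).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def suggest_duplicates(new_title, existing, now, limit=3):
    """Rank existing (task_id, title, created) tuples by overlap times recency."""
    new_tokens = tokenize(new_title)
    scored = []
    for task_id, title, created in existing:
        score = jaccard(new_tokens, tokenize(title)) * recency_weight(created, now)
        if score > 0:
            scored.append((score, task_id))
    scored.sort(reverse=True)
    return [task_id for _, task_id in scored[:limit]]


if __name__ == "__main__":
    now = datetime(2020, 8, 5)
    # Hypothetical tasks, invented for the example.
    existing = [
        ("T1", "Page history broken on mobile", now - timedelta(days=70)),
        ("T2", "Page history broken on mobile", now - timedelta(days=1)),
        ("T3", "Unrelated login failure", now),
    ]
    print(suggest_duplicates("History page broken on mobile", existing, now))
```

With identical titles, the task filed a day ago outranks the one filed ten weeks ago, and a task with no shared words is not suggested at all. Whether such a ranking would actually reduce duplicates is exactly the open question raised above.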

I just merged the seventh duplicate task of T259565

Looking at the task summaries, I have trouble finding language-based patterns that would have allowed proposing the "right" existing ticket.
I see the root parse three times, I see the word flow in two ticket summaries, four times mobile, and history in five. Hmm. Some cross-matching but still quite mixed for having 10 items in the pool.

T259565: [Regression] Unparsed wikitext in various JavaScript messages
T259696: Footnote in Flow messages in not parsed
T259602: Last edit indicator is broken on Minerva skin
T259601: History box error on Mobile Web for enwiki
T259584: Link to history broken
T259583: Revision History not accessible on mobile
T259581: Mobile page history "footer" showing raw URL
T259575: [regression -wmf.2] Homepage - SE filter "Create a new article" description displays ulr -encoded text not a link
T259580: "flow-wikitext-editor-help-and-preview" message is broken on flow pages on all wikis
T259571: Page history log bug
T259579: "Last modified" footer on mobile unparsed date and user links
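The counts mentioned above can be reproduced with a few lines of Python. This is a minimal sketch that does a case-insensitive substring match on the titles exactly as listed; using "pars" as the stand-in for the root "parse" is an assumption made for the example.

```python
# Task titles copied from the list above.
TITLES = {
    "T259565": "[Regression] Unparsed wikitext in various JavaScript messages",
    "T259696": "Footnote in Flow messages in not parsed",
    "T259602": "Last edit indicator is broken on Minerva skin",
    "T259601": "History box error on Mobile Web for enwiki",
    "T259584": "Link to history broken",
    "T259583": "Revision History not accessible on mobile",
    "T259581": 'Mobile page history "footer" showing raw URL',
    "T259575": '[regression -wmf.2] Homepage - SE filter "Create a new article" '
               "description displays ulr -encoded text not a link",
    "T259580": '"flow-wikitext-editor-help-and-preview" message is broken on flow '
               "pages on all wikis",
    "T259571": "Page history log bug",
    "T259579": '"Last modified" footer on mobile unparsed date and user links',
}


def titles_containing(stem, titles):
    """Return the IDs of tasks whose title contains stem, ignoring case."""
    return [task_id for task_id, title in titles.items() if stem in title.lower()]


if __name__ == "__main__":
    for stem in ("pars", "flow", "mobile", "history"):
        matches = titles_containing(stem, TITLES)
        print(f"{stem}: {len(matches)} titles ({', '.join(matches)})")
```

No single word appears in more than five of the eleven titles, and T259602 and T259575 match none of the four, which illustrates how mixed the wording is.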

Maybe we could do something as simple as showing a list of all the most recently submitted tasks on the submission page? That might catch some things.

Maybe we could do something as simple as showing a list of all the most recently submitted tasks on the submission page? That might catch some things.

I don't think many people want to get a list of 50 tasks in their face and then spend time reading that list every single time.
It might catch a few things.
It will also condition basically everybody to scream and quickly scroll down.

Looking at the last 10000 tickets created, 4.19% of tickets are marked as a duplicate.
Might be biased (too recently created to have been triaged?), so looking at all tickets created since launching Phab, 6.57% of tickets are marked as duplicates.

SELECT t.status,COUNT(t.id) FROM phabricator_maniphest.maniphest_task t WHERE t.id > 249776 GROUP BY t.status;
+-----------+-------------+
| status    | COUNT(t.id) |
+-----------+-------------+
| declined  |         213 |
| duplicate |         419 |
| invalid   |         324 |
| open      |        5158 |
| resolved  |        3800 |
| stalled   |          86 |
+-----------+-------------+

SELECT t.status,COUNT(t.id) FROM phabricator_maniphest.maniphest_task t WHERE t.id > 75682 GROUP BY t.status;
+-----------+-------------+
| status    | COUNT(t.id) |
+-----------+-------------+
| declined  |       12500 |
| duplicate |       12088 |
| invalid   |        9638 |
| open      |       36926 |
| resolved  |      112026 |
| stalled   |         901 |
+-----------+-------------+
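For reference, the two percentages follow directly from the status counts in the query results. A small sketch of the arithmetic, where the helper name `duplicate_share` is made up for the example:

```python
def duplicate_share(counts):
    """Percentage of tasks whose status is 'duplicate' among all counted tasks."""
    total = sum(counts.values())
    return 100.0 * counts["duplicate"] / total


# Status counts copied from the two query results above.
LAST_10000 = {
    "declined": 213, "duplicate": 419, "invalid": 324,
    "open": 5158, "resolved": 3800, "stalled": 86,
}
SINCE_LAUNCH = {
    "declined": 12500, "duplicate": 12088, "invalid": 9638,
    "open": 36926, "resolved": 112026, "stalled": 901,
}

if __name__ == "__main__":
    print(f"last 10000 tasks: {duplicate_share(LAST_10000):.2f}%")    # 4.19%
    print(f"since launch:     {duplicate_share(SINCE_LAUNCH):.2f}%")  # 6.57%
```

The first sample sums to exactly 10000 tasks and the second to 184079, so the shares are 419/10000 and 12088/184079.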

Any changes that would make this possible? ^_^

See my previous comments here; has some situation changed, or have new arguments arisen?

Another root problem: it's difficult, if not impossible, for newcomers to get a "big picture" after landing on a bug reporting form. That of course causes some of these duplicates.

I really like the idea of "minimal forms", but I'd like to reduce this feeling of «Welcome to this form, put your complaint/request here in this box, we'll find duplicates for you».

In the case of forms with at least one Tag, it would make sense if there was a way to visit that Tag. I'm definitely taking it very far, but it's a problem. A side effect would be to help people be curious, discover other things, and help each other "finding many friends on this journey" or stuff like that.