Suggest possible duplicates at toolinfo creation time via UI
Open, Needs Triage, Public, Feature

Description

While discussing T293518: Figure out how to deal with duplicate toolinfo records on IRC, I was struck by what now seems like an obvious way to prevent duplicate records from being created in the first place. The Toolhub UI could search for existing records that are likely duplicates and prompt for confirmation before creating a potential duplicate.

Things we could compare as duplicate checks:

  • name (after stripping toolforge- prefixes)
  • url (after normalizing https://tools.wmflabs.org/<toolname>/ -> https://<toolname>.toolforge.org/)
  • author

Name and author checks could also use some form of string similarity comparison for fuzzy matching.
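
The checks listed above could be sketched roughly as follows. This is only an illustration, not Toolhub code: the record shape (plain dicts with `name`, `url`, and a single `author` string), the function names, and the 0.85 threshold are all assumptions, and `difflib.SequenceMatcher` is just one possible similarity measure. Flagging only on a URL or name match, with author as supporting evidence, is also a choice made for this sketch, since an author match alone would flag every tool by the same person.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlsplit

NAME_PREFIX = "toolforge-"
LEGACY_HOST = "tools.wmflabs.org"
LEGACY_PATH = re.compile(r"^/([^/]+)(/.*)?$")


def normalize_name(name):
    """Lowercase the name and strip a leading toolforge- prefix."""
    name = name.strip().lower()
    if name.startswith(NAME_PREFIX):
        name = name[len(NAME_PREFIX):]
    return name


def normalize_url(url):
    """Map https://tools.wmflabs.org/<toolname>/ to https://<toolname>.toolforge.org/.

    Simplification for this sketch: the scheme is forced to https and any
    query string or fragment is dropped.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    if host == LEGACY_HOST:
        match = LEGACY_PATH.match(path)
        if match:
            host = f"{match.group(1).lower()}.toolforge.org"
            path = match.group(2) or "/"
    return f"https://{host}{path}"


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def possible_duplicates(candidate, existing, threshold=0.85):
    """Return (record, reasons) pairs for existing records that look like duplicates."""
    cand_name = normalize_name(candidate["name"])
    cand_url = normalize_url(candidate["url"])
    cand_author = candidate["author"].strip().lower()
    matches = []
    for record in existing:
        reasons = []
        if similarity(cand_name, normalize_name(record["name"])) >= threshold:
            reasons.append("name")
        if cand_url == normalize_url(record["url"]):
            reasons.append("url")
        # Only report a record when its name or URL matches.
        if not reasons:
            continue
        if similarity(cand_author, record["author"].strip().lower()) >= threshold:
            reasons.append("author")
        matches.append((record, reasons))
    return matches
```

A UI could call something like `possible_duplicates()` with the form contents before submission and ask for confirmation whenever the returned list is not empty.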

Event Timeline

bd808 changed the subtype of this task from "Task" to "Feature Request". Dec 7 2022, 10:34 PM

@Raymond_Ndibe thought of some of this months ago. :)

A little thought about solving the duplicate tool problem (or at least making progress):
Since the url field is mandatory for all tools and is the one direct link from Toolhub to the tool being described, one way of solving this (or at least a subset of the problem) is to create a RepoURL model with a one-to-many relationship to the toolinfo model. Then, whenever a duplicate tool (one that points to the same tool repo) is added, the RepoURL model instance will have two tools pointing to it. In this situation we can do interesting things like:

  • Displaying an "is possible duplicate of" flag on the tool.
  • Adding the tools to a duplicate tools list (or some variations).

And if a "duplicate" tool is not pointing to the same authoritative repository as the other duplicate, are they really duplicates?
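
The grouping idea behind the RepoURL proposal can be sketched in plain Python. This is a hypothetical illustration, not the Toolhub schema: in Toolhub this would presumably be a Django model referenced from toolinfo records, whereas here a dict keyed by the normalized URL stands in for the RepoURL instances. The function names, the `field` parameter, and the example URLs are all made up for the sketch.

```python
from collections import defaultdict


def normalize_repo_url(url):
    """Trim whitespace, a trailing slash, and a .git suffix (a deliberately small set of rules)."""
    url = url.strip().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


def group_possible_duplicates(tools, field="url"):
    """Group tool names by the shared URL and keep only URLs claimed by more than one tool."""
    groups = defaultdict(list)
    for tool in tools:
        value = tool.get(field)
        if value:
            groups[normalize_repo_url(value)].append(tool["name"])
    return {url: names for url, names in groups.items() if len(names) > 1}


tools = [
    {"name": "tool-a", "url": "https://gitlab.example.org/demo/tool.git"},
    {"name": "tool-b", "url": "https://gitlab.example.org/demo/tool/"},
    {"name": "tool-c", "url": "https://gitlab.example.org/demo/unrelated"},
]
duplicates = group_possible_duplicates(tools)
```

Each key in `duplicates` plays the role of a RepoURL instance with more than one tool pointing to it, which is the set of tools a flag or a duplicate tools list would be built from.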

(Maybe this is better titled "better discovery for duplicate tools".)