While discussing T293518: Figure out how to deal with duplicate toolinfo records on IRC, I was struck by what now seems like an obvious way to prevent duplicate records from being created in the first place. The Toolhub UI could search for existing records that are likely duplicates and prompt for confirmation before creating a potential duplicate.
Things we could compare as duplicate checks:
- name (after stripping toolforge- prefixes)
- url (after normalizing https://tools.wmflabs.org/<toolname>/ -> https://<toolname>.toolforge.org/)
- author
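
A minimal sketch of the name and URL normalization described in the list above. The function names `normalize_name` and `normalize_url` are hypothetical and not part of Toolhub. Lowercasing, forcing `https`, and dropping query strings are assumptions made only to produce a comparison key, not a URL to store or display.

```python
from urllib.parse import urlsplit

LEGACY_HOST = "tools.wmflabs.org"  # legacy Toolforge host from the list above


def normalize_name(name: str) -> str:
    """Return a comparison key for a tool name without the toolforge- prefix."""
    key = name.strip().lower()
    prefix = "toolforge-"
    if key.startswith(prefix):
        key = key[len(prefix):]
    return key


def normalize_url(url: str) -> str:
    """Return a comparison key that maps legacy URLs to <toolname>.toolforge.org."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path
    if host == LEGACY_HOST:
        # The first path segment of a legacy URL is the tool name.
        segments = path.lstrip("/").split("/", 1)
        tool = segments[0]
        if tool:
            host = f"{tool.lower()}.toolforge.org"
            path = "/" + (segments[1] if len(segments) > 1 else "")
    # Assumption: scheme and trailing slash are not significant for matching.
    return f"https://{host}{path.rstrip('/')}/"
```

Two records would then be treated as candidate duplicates when their normalized names or normalized URLs are equal.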
The name and author checks could also use some form of string similarity comparison to allow fuzzy matching.
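
One possible shape for the fuzzy comparison, using `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever similarity measure is eventually chosen. The helper names and the 0.85 threshold are assumptions for illustration only and would need tuning against real toolinfo records.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def similar_values(value: str, existing: list[str], threshold: float = 0.85):
    """Return (existing_value, score) pairs scoring at or above the threshold.

    Results are sorted with the best match first. The default threshold is an
    arbitrary assumption, not a tested value.
    """
    scored = [(other, similarity(value, other)) for other in existing]
    matches = [(other, score) for other, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

The same helper could be applied to names and to authors separately. A close author match on its own is probably weak evidence, since one author often maintains several tools, so it may work better as a tie-breaker alongside a name or URL match.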