
Item search for statements ranks disambiguation items too highly
Closed, Resolved (Public)

Description

I often find that a disambiguation item will be the top result when searching for items while editing statements. Disambiguation items should rarely be used in statements, so a disambiguation item is almost never going to be what someone is looking for.

Examples (with the UI in English):

Someone wants to add "located in the administrative territorial entity: Sannat"
The top search results are:

Someone wants to add "stated in: GNIS"
The top search results are:

Other examples where the first result is a disambiguation page: "Naxxar", "Gudja", "Grade 2"

IIRC, Lydia said that the number of descriptions is taken into account when ranking items in the results. There are bots which add descriptions (and sometimes labels) in lots of languages for disambiguation items, so items with lots of descriptions are disproportionately disambiguation pages (and other internal things like templates and categories).

There could perhaps be a negative weighting for items which have P31 (instance of) Q4167410 (Wikimedia disambiguation page), or the ranking could take into account the number of uses by other items (when linking an item to another item, items which are often linked from other items are more likely to be what you're looking for).
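To make those two ideas concrete, here is a minimal sketch of how they might combine into one score. It is only an illustration, not how CirrusSearch or Wikibase actually rank results: the `Candidate` fields, the `rank_score` function, the weights, and the item IDs in the usage example are all hypothetical.

```python
import math
from dataclasses import dataclass, field

INSTANCE_OF = "P31"
DISAMBIGUATION_QID = "Q4167410"  # Wikimedia disambiguation page


@dataclass
class Candidate:
    """A search hit. All fields are assumptions made for this sketch."""
    item_id: str
    text_score: float        # base relevance from matching the typed text
    incoming_links: int = 0  # number of statements on other items pointing here
    statements: dict = field(default_factory=dict)  # property ID -> set of value IDs


def rank_score(c: Candidate, link_weight: float = 0.5,
               disambig_penalty: float = 2.0) -> float:
    """Boost often-linked items and penalise disambiguation items."""
    # log1p dampens the boost so heavily linked items do not dominate.
    score = c.text_score + link_weight * math.log1p(c.incoming_links)
    # Negative weighting for items that are instances of a disambiguation page.
    if DISAMBIGUATION_QID in c.statements.get(INSTANCE_OF, set()):
        score -= disambig_penalty
    return score


def rank(candidates: list) -> list:
    """Return candidates ordered from best to worst score."""
    return sorted(candidates, key=rank_score, reverse=True)


# Hypothetical hits: the disambiguation item has the better text match,
# but the place is linked from 40 other items.
disambig = Candidate("Q-disambig", text_score=3.0,
                     statements={INSTANCE_OF: {DISAMBIGUATION_QID}})
place = Candidate("Q-place", text_score=2.5, incoming_links=40)
ranked_ids = [c.item_id for c in rank([disambig, place])]
```

With these made-up numbers the place ranks first even though its text match is weaker. The disambiguation item is only pushed down, not hidden, so it can still be found when nothing better matches.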

Event Timeline

Is there ever a reason to link to a disambiguation item? Is that reason good enough that we even need disambiguation items in the suggester?

I don't think disambiguation items should be removed from the suggester entirely:

We have some statements using "different from" (e.g. https://www.wikidata.org/wiki/Q16479551) to prevent names being merged with disambiguation pages, and also some statements using "said to be the same as" (e.g. https://www.wikidata.org/wiki/Q421231) to link disambiguation pages with related spellings or meanings. Hiding disambiguation items would mean people can only create these statements if they have the ID.

People sometimes incorrectly merge disambiguation items with non-disambiguation items. Hiding disambiguation items would mean real items would vanish from the search results after such merges. The same thing would happen when a sitelink turns into a disambiguation page and a bot comes along and marks the item as a disambiguation page. Things like that do need fixing, but hiding the items makes it harder to find them and makes it more likely that people will create duplicates.

As I understand it, the same search is used for the search in the top right of the page. People sometimes need to be able to find disambiguation items, e.g. if you have a new disambiguation page and want to see if there's already an item you can add it to, or if a sitelink (like in the scenario above) needs moving to a disambiguation item.

Change 384629 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/CirrusSearch@master] Add generic term_boost function for boosting by matching terms

https://gerrit.wikimedia.org/r/384629

Change 384632 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Enable configurable boosting certain statement values.

https://gerrit.wikimedia.org/r/384632

Change 385193 had a related patch set uploaded (by Thiemo Mättig (WMDE); owner: Thiemo Mättig (WMDE)):
[mediawiki/extensions/Wikibase@master] Don't inject WikibaseRepo as dependency of StatementBoostScoreBuilder

https://gerrit.wikimedia.org/r/385193

Change 384629 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add generic term_boost function for boosting by matching terms

https://gerrit.wikimedia.org/r/384629

Smalyshev triaged this task as Medium priority.

Change 384632 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Enable configurable boosting certain statement values.

https://gerrit.wikimedia.org/r/384632

Change 385193 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Minor cleanups to StatementBoostScoreBuilder and related

https://gerrit.wikimedia.org/r/385193

Change 386462 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Set links weight back to old value

https://gerrit.wikimedia.org/r/386462

Change 386464 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/mediawiki-config@master] Add negative weight to disambig entities

https://gerrit.wikimedia.org/r/386464

Change 386462 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Set links weight back to old value

https://gerrit.wikimedia.org/r/386462

Change 386464 merged by jenkins-bot:
[operations/mediawiki-config@master] Add negative weight to disambig entities

https://gerrit.wikimedia.org/r/386464

Change 386554 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/mediawiki-config@master] Revert "Revert "Add negative weight to disambig entities""

https://gerrit.wikimedia.org/r/386554

Change 386554 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Revert "Add negative weight to disambig entities""

https://gerrit.wikimedia.org/r/386554

Mentioned in SAL (#wikimedia-operations) [2017-11-01T23:09:42Z] <thcipriani@tin> Synchronized wmf-config/Wikibase-production.php: SWAT: [[gerrit:386554|Revert "Revert "Add negative weight to disambig entities""]] T148411 (duration: 00m 51s)