
in suggester, labels are redundantly and incorrectly listed as aliases
Closed, Resolved · Public · 1 Estimated Story Points

Description

I just updated Wikibase (including everything in the build, e.g. quality and property suggester). Now when entering entity id values, labels are redundantly and incorrectly listed as aliases ("also known as...") in the suggester.

I don't know if it is somehow just me, but I suspect it is a bug.

suggestions.png (647×994 px, 49 KB)

Event Timeline

aude raised the priority of this task to Needs Triage.
aude updated the task description.
aude subscribed.

Example API results:

{
  "searchinfo": {
    "search": "c"
  },
  "search": [
    {
      "id": "Q14",
      "url": "http://wikidatawiki/wiki/Q14",
      "label": "Cairo",
      "match": {
        "type": "label",
        "language": "en",
        "text": "Cairo"
      },
      "aliases": [
        "Cairo"
      ]
    },
    {
      "id": "Q155",
      "url": "http://wikidatawiki/wiki/Q155",
      "label": "Calgary",
      "match": {
        "type": "label",
        "language": "en",
        "text": "Calgary"
      },
      "aliases": [
        "Calgary"
      ]
    },

The problem occurs with all suggestions (both properties and values).
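To make the redundancy in the example API results concrete, here is a small check that flags every result whose "aliases" list only repeats the displayed label. This is a minimal sketch; the helper name `redundant_alias_ids` is hypothetical, and the input is just the two entries from the response above, trimmed to the relevant keys.

```python
# Hypothetical helper: flag wbsearchentities results whose "aliases" list
# contains nothing but the label that is already being displayed.
def redundant_alias_ids(results):
    """Return the ids of results whose aliases merely repeat the label."""
    flagged = []
    for entry in results:
        aliases = entry.get("aliases", [])
        if aliases and all(alias == entry.get("label") for alias in aliases):
            flagged.append(entry["id"])
    return flagged


# The two entries from the example response, trimmed to the relevant keys.
example = [
    {"id": "Q14", "label": "Cairo",
     "match": {"type": "label", "language": "en", "text": "Cairo"},
     "aliases": ["Cairo"]},
    {"id": "Q155", "label": "Calgary",
     "match": {"type": "label", "language": "en", "text": "Calgary"},
     "aliases": ["Calgary"]},
]

print(redundant_alias_ids(example))  # ['Q14', 'Q155']
```

Both results are flagged, which matches the observation that every suggestion shows its own label again as an alias.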

Change 221811 had a related patch set uploaded (by Aude):
Add searchentities alias param only for alias search results

https://gerrit.wikimedia.org/r/221811

I still see further issues, but my patch fixes the obvious one.

On test.wikidata, if I search "kitt", it shows "Also known as: kitties, kitty" (i.e. two aliases); my patch doesn't fix that.

kittens.png (661×1 px, 64 KB)

Change 221814 had a related patch set uploaded (by Hoo man):
Add searchentities alias param only for alias search results

https://gerrit.wikimedia.org/r/221814

Change 221811 merged by jenkins-bot:
Add searchentities alias param only for alias search results

https://gerrit.wikimedia.org/r/221811

Change 221814 merged by jenkins-bot:
Add searchentities alias param only for alias search results

https://gerrit.wikimedia.org/r/221814

This issue is fallout from T90692. The suggester uses the contents of "aliases" to indicate why a term was matched (if the label wasn't the match, it must have been an alias). So when we changed the search infrastructure that backs wbsearchentities, we simply added the matched label as the "alias" for backwards compatibility. This will indeed often be redundant.
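The interaction described here can be sketched as follows: the suggester treats any non-empty "aliases" list as the reason for a match, and the compatibility shim copies the matched text into "aliases" whatever the match type was. Both functions and the exact display format are hypothetical stand-ins for the real (JavaScript) suggester and the PHP API code, not their actual implementation.

```python
# Hypothetical sketch of the suggester's display rule: any non-empty
# "aliases" list is shown as the reason the entity matched.
def render_suggestion(entry):
    text = entry["label"]
    aliases = entry.get("aliases")
    if aliases:
        text += " (Also known as: " + ", ".join(aliases) + ")"
    return text


def backcompat_entry(entity_id, label, match):
    """Hypothetical: build a result the way the compatibility shim is
    described, copying the matched text into "aliases" for every match."""
    return {"id": entity_id, "label": label, "match": match,
            "aliases": [match["text"]]}


entry = backcompat_entry("Q14", "Cairo",
                         {"type": "label", "language": "en", "text": "Cairo"})
print(render_suggestion(entry))  # Cairo (Also known as: Cairo)
```

Because a label match is copied into "aliases" as well, every label match ends up with a redundant "Also known as" line.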

The patch above (Ia5eaa1ba74a4f6f) excludes matched labels from "aliases" to avoid the redundancy. However, this only works if the search language and the display language are the same; if they are not, we *need* the matched label somewhere, otherwise it's impossible to see why a given item was found in the search.
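A rough sketch of the idea behind that patch, as its title describes it ("alias param only for alias search results"), shows both its benefit and the cross-language gap. This is not the patch's actual code; the function name is hypothetical, and the German search for "Kairo" is an illustrative example chosen for this sketch.

```python
# Hypothetical sketch of the first patch's idea: only alias matches populate
# "aliases"; label matches never do.
def aliases_for_alias_matches_only(match):
    if match["type"] == "alias":
        return [match["text"]]
    return None  # no "aliases" key for label matches


# Same language: a label match adds nothing, which removes the redundancy.
same_lang = {"type": "label", "language": "en", "text": "Cairo"}

# Different languages (illustrative): the user searched in German while the
# display label is English, so the German label that matched is never shown.
cross_lang = {"type": "label", "language": "de", "text": "Kairo"}

print(aliases_for_alias_matches_only(same_lang))   # None
print(aliases_for_alias_matches_only(cross_lang))  # None: match reason is lost
```

In the cross-language case, the result shows only the English label "Cairo", and nothing explains why a search for "Kairo" returned it.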

Instead, I suggest providing the matched term in "aliases" only if it differs from the display label. Otherwise, the "aliases" key should be omitted from the result entirely (rather than containing an empty array), to avoid the issues mentioned in the description.
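The proposed rule can be sketched in a few lines: add "aliases" only when the matched text differs from the label being displayed, and leave the key out otherwise. The function `build_result` is a hypothetical illustration of the suggestion, not the code that was eventually merged, and "Kairo" is again an illustrative cross-language match.

```python
# Hypothetical sketch of the proposal: expose the matched term via "aliases"
# only when it differs from the display label; otherwise omit the key.
def build_result(entity_id, display_label, match):
    result = {"id": entity_id, "label": display_label, "match": match}
    # Only add "aliases" when the matched text would not already be visible.
    if match["text"] != display_label:
        result["aliases"] = [match["text"]]
    return result


same_lang = build_result("Q14", "Cairo",
                         {"type": "label", "language": "en", "text": "Cairo"})
cross_lang = build_result("Q14", "Cairo",
                          {"type": "label", "language": "de", "text": "Kairo"})

print("aliases" in same_lang)  # False
print(cross_lang["aliases"])   # ['Kairo']
```

Comparing the matched text against the display label, rather than checking the match type, covers both cases: it avoids the redundant entry when the languages agree, and it keeps the explanation when they do not.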

Change 221876 had a related patch set uploaded (by Addshore):
SearchEntities return 'aliases' when not same as label

https://gerrit.wikimedia.org/r/221876

Change 221876 merged by Addshore:
SearchEntities return 'aliases' when not same as label

https://gerrit.wikimedia.org/r/221876

Addshore claimed this task.
Addshore subscribed.

I believe this can now be closed as resolved.
Separate tasks have been opened for the remaining issues (see the description and comments).

Change 222268 had a related patch set uploaded (by Daniel Kinzler):
SearchEntities return 'aliases' when not same as label

https://gerrit.wikimedia.org/r/222268

Change 222268 merged by jenkins-bot:
SearchEntities return 'aliases' when not same as label

https://gerrit.wikimedia.org/r/222268