
Lexical Category lookup - prioritize Items for lexical categories in the Item lookup
Closed, Resolved · Public · 13 Estimated Story Points

Description

As a Lexeme creator I want to have Items for lexical categories prioritized in the entity lookup for lexical categories in order to more easily find the correct Item for the Lexical category I am looking for.

Problem:
We want editors to be able to specify the lexical category of the Lexeme they are creating. To make this easier we want to prioritize the Items for lexical categories.

Screenshots/mockups:
https://www.figma.com/file/XoYktoKEBnC0VIf1SRuxJK/Lexicographical-Data-UI?node-id=493%3A3077

BDD
GIVEN a Lexeme lexical category input on Special:NewLexeme
WHEN typing in the lookup
THEN Items representing lexical categories are boosted in the results

Acceptance criteria:

  • Items representing lexical categories are boosted in the lexical category input
  • Items not representing lexical categories can still be added
  • configuration variables for setting the boosted lexical categories are documented

Notes:

  • This is the same as for language lookup but with lexical category instead.
  • When the user types a query, they see the Items that are lexical categories first, followed by all other Items in the Lookup component.
  • Create a config list of the top ones (https://w.wiki/56qX might be the best thing here)

Lydia is leaning towards using these (via a config?):

Relates to T307441

Event Timeline

Task Breakdown notes:

  • The results from the pre-configured list of lexical categories always appear before the results from the API
  • Ensure that the boosting remains effective for different interface languages
  • Are lexical categories filtered against the search results in the same way that the spelling variants are filtered (i.e. there's a word starting with the search term, in the name)? @Lydia_Pintscher

Potential plan of action:

  1. Create a config for item IDs
  2. On the server side, we can get their labels
  3. Get the item IDs and labels to the client side (via ResourceLoader or otherwise)
  4. Once the list is on the client, we can filter it against the search term, prepend the matching boosted items to the results list, and de-dupe the list.
  5. Add configuration for Beta Wikidata so that the task can be verified.
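Step 4 could look roughly like the following minimal sketch. It is an illustration, not the code from the WikibaseLexeme patches: `LookupItem`, `mergeWithSuggestions` and the injected `matches` predicate are hypothetical names, and the sketch assumes that matching suggestions keep their configured order and that de-duping compares Item IDs.

```typescript
// Hypothetical shape of a lookup entry (suggestion or search API hit).
interface LookupItem {
  id: string;    // Item ID
  label: string; // label in the user's interface language
}

// Filter the configured suggestions against the search term, prepend the
// matches to the API results, and drop API hits that are already shown.
function mergeWithSuggestions(
  suggestions: LookupItem[],
  apiResults: LookupItem[],
  term: string,
  matches: (label: string, term: string) => boolean,
): LookupItem[] {
  // Assumption: matching suggestions keep the order from the config.
  const boosted = suggestions.filter((item) => matches(item.label, term));
  const shownIds = new Set(boosted.map((item) => item.id));
  // De-dupe by Item ID so a suggestion also returned by the API appears once.
  const rest = apiResults.filter((item) => !shownIds.has(item.id));
  return [...boosted, ...rest];
}
```

Taking the matcher as a parameter keeps the merge independent of how suggestions are compared with the search term.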

I'd say it should behave the same way whether or not the lexical categories are prioritized, so that would be matching on the start of words.
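A minimal sketch of "matching on the start of words", assuming case-insensitive matching and that a word starts at the beginning of the label or after whitespace, a hyphen or an opening parenthesis. `matchesWordStart` is a hypothetical helper, not the tokenization actually used for spelling variants.

```typescript
// True if `term` occurs in `label` starting at the beginning of some word.
// Checking every word-start position (rather than splitting into words)
// also lets multi-word terms such as "proper no" match "proper noun".
function matchesWordStart(label: string, term: string): boolean {
  const haystack = label.toLowerCase();
  const needle = term.trim().toLowerCase();
  for (let i = 0; i < haystack.length; i++) {
    // Assumed word separators: whitespace, hyphen, opening parenthesis.
    const atWordStart = i === 0 || /[\s\-(]/.test(haystack[i - 1]);
    if (atWordStart && haystack.startsWith(needle, i)) {
      return true;
    }
  }
  return false;
}
```

Under this rule "nou" matches "proper noun" but "noun" does not match "pronoun", and "supin" does not match the label "verb".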

Change 789612 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseLexeme@master] Add lexical category suggestions to Special:NewLexemeAlpha

https://gerrit.wikimedia.org/r/789612

Hm, but the search results might include one of the suggested/prioritized lexical categories even when our own filtering (without the search API) didn't match it (e.g. it matches an alias while we only check labels, or it matches in another language). What happens in that case?

For example: suppose I search for “supin”, in English. The wbsearchentities result (currently) starts with two items: supine, “a noun-like verb form used in various languages” (i.e. actually a grammatical feature rather than a lexical category, but ignore that for the moment), and verb, which has the alias “supin” in Romanian. Should we show “verb” above “supine”, because the former is one of the suggested lexical categories while the latter isn’t, even though without the API we wouldn’t have any connection between the search term “supin” and the item “verb”?

(Actually… what’s the order of the suggested lexical categories anyways, i.e. in what order do we show the matching items after filtering? The same as the order in the full list from the config?)

We discussed this in a call – as long as the items are included in the results somewhere, the exact position isn’t important. We’ll see what is easiest in the code.

Change 790372 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseLexeme@master] Show suggested lexical category item IDs

https://gerrit.wikimedia.org/r/790372

Change 790398 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[operations/mediawiki-config@master] Configure wgLexemeLexicalCategoryItemIds on Beta Wikidata

https://gerrit.wikimedia.org/r/790398

Change 789612 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Add lexical category suggestions to Special:NewLexemeAlpha

https://gerrit.wikimedia.org/r/789612

Change 790372 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Show suggested lexical category item IDs

https://gerrit.wikimedia.org/r/790372

Change 790398 merged by jenkins-bot:

[operations/mediawiki-config@master] Configure wgLexemeLexicalCategoryItemIds on Beta Wikidata

https://gerrit.wikimedia.org/r/790398

Moving back to doing because I noticed we’re including the suggested items in the search offset, meaning some search results get skipped.
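One way to avoid this, sketched under assumptions: count only the raw hits received from the search API (before de-duping) and use that count as the offset for the next request, e.g. the `continue` parameter of wbsearchentities. `LookupPager`, `nextOffset` and `addApiPage` are hypothetical names; this is not necessarily how the issue was fixed.

```typescript
interface LookupItem {
  id: string;
  label: string;
}

class LookupPager {
  // Raw number of hits received from the search API, before de-duping.
  private apiHitsFetched = 0;
  // What the lookup shows: suggestions first, then de-duped API hits.
  readonly displayed: LookupItem[] = [];

  constructor(suggestions: LookupItem[]) {
    this.displayed.push(...suggestions);
  }

  // Offset for the next search request. Using displayed.length here instead
  // would count the suggestions too and skip that many search results.
  nextOffset(): number {
    return this.apiHitsFetched;
  }

  addApiPage(hits: LookupItem[]): void {
    this.apiHitsFetched += hits.length;
    const shown = new Set(this.displayed.map((item) => item.id));
    for (const hit of hits) {
      if (!shown.has(hit.id)) {
        this.displayed.push(hit);
        shown.add(hit.id);
      }
    }
  }
}
```

Counting raw hits rather than displayed ones also keeps the offset correct when the API returns an Item that is already shown as a suggestion.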

This was fixed as part of the work on T308118 (specifically #184), the task should be good to review again.