
Use aliases in search
Closed, Resolved (Public)

Description

Aliases should help with searching for objects

  • Change the labels database table to be able to store aliases (non-primary-labels) as well as primary ones.
  • Change the SecondaryDataUpdater code to write/update the aliases labels as well as the primary labels.
  • Change the label search code (and API) to return both primary and alias values, ordering primary labels ahead of aliases.
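The ordering requirement in the last step can be sketched as follows. This is only an illustration in Python, not the extension's PHP code: the `LabelRow` class, the `search_labels` function, the substring match and the tie-breaking order are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelRow:
    """Hypothetical in-memory stand-in for one row of the labels table."""
    zid: str
    type: str
    language: str
    label: str
    is_primary: bool


def search_labels(rows, query, language):
    """Return rows whose label contains `query`, primary labels first."""
    needle = query.casefold()
    matches = [
        row for row in rows
        if row.language == language and needle in row.label.casefold()
    ]
    # False sorts before True, so `not is_primary` puts primary labels first.
    # Ties are broken by label and then ZID (an assumption, for determinism).
    return sorted(
        matches,
        key=lambda r: (not r.is_primary, r.label.casefold(), r.zid),
    )


rows = [
    LabelRow("Z200", "Z61", "en", "JS", False),
    LabelRow("Z100", "Z61", "en", "javascript language", False),
    LabelRow("Z200", "Z61", "en", "JavaScript (ES6)", True),
    LabelRow("Z100", "Z61", "en", "JavaScript", True),
]
results = search_labels(rows, "javascript", "en")
for row in results:
    print(row.zid, row.label, "primary" if row.is_primary else "alias")
```

Sorting on a single composite key keeps the "primary ahead of alias" rule in one place, so a different secondary ordering could be swapped in without touching the filter.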

Related Objects

  • Resolved, assigned to Jdforrester-WMF
  • Resolved, assigned to gengh

Event Timeline

DVrandecic created this task.
DVrandecic removed a project: Epic.

Change 713484 had a related patch set uploaded (by Genoveva Galarza; author: Genoveva Galarza):

[mediawiki/extensions/WikiLambda@master] [WIP] Add alias to labels table

https://gerrit.wikimedia.org/r/713484

Some doubts about alias/label uniqueness arose from James's comments on my patch, and I'd like to bring the conversation here so that we can discuss it more comfortably.

The comment is: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/713484/comment/0626bfe7_524c1a2b/

Currently the table is indexed by ["wlzl_type", "wlzl_language", "wlzl_label"], but this index does not impose a uniqueness constraint. For that, I believe we would need to declare the index with unique=true in the JSON schema.

The PHP code does enforce uniqueness, though: it makes sure that primary labels are unique (there cannot be two identical labels for the same type and the same language). However, I'm not sure which uniqueness constraints we want to impose on aliases. Here's a summary with examples:

Label uniqueness:

Primary labels are quite restrictive because they are the identifying string. They must be unique by type, language and label, which means:

  • [wlzl_type, wlzl_language, wlzl_label] must be unique when wlzl_label_primary=true
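One way to picture this rule at the database level is a unique index that applies only to primary rows. The sketch below uses Python with an in-memory SQLite database and SQLite's partial index feature purely for illustration. The table name `labels`, the `zid` column, the index name and the `insert_label` helper are hypothetical, and nothing here assumes that the production database or the extension's schema format supports an equivalent. It covers only the primary-label rule; any rules for aliases would need separate checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE labels (
        zid TEXT NOT NULL,                    -- hypothetical column name
        wlzl_type TEXT NOT NULL,
        wlzl_language TEXT NOT NULL,
        wlzl_label TEXT NOT NULL,
        wlzl_label_primary INTEGER NOT NULL   -- 1 = primary label, 0 = alias
    )
    """
)
# Partial unique index: only rows with wlzl_label_primary = 1 are constrained,
# so aliases may repeat freely as far as this index is concerned.
conn.execute(
    """
    CREATE UNIQUE INDEX primary_label_unique
    ON labels (wlzl_type, wlzl_language, wlzl_label)
    WHERE wlzl_label_primary = 1
    """
)


def insert_label(zid, ztype, language, label, is_primary):
    """Insert one row; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO labels VALUES (?, ?, ?, ?, ?)",
                (zid, ztype, language, label, int(is_primary)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    insert_label("Z100", "Z61", "en", "JavaScript", True),   # first primary label
    insert_label("Z200", "Z61", "en", "JavaScript", True),   # duplicate primary label
    insert_label("Z200", "Z61", "en", "JavaScript", False),  # alias, not constrained
    insert_label("Z300", "Z6", "en", "JavaScript", True),    # different type
]
print(outcomes)
```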

Alias uniqueness:

Aliases are not so restrictive, as they do not identify ZObjects but enrich them. That's why I believe aliases don't need to be unique, but let's think case by case.

  • Can we have two identical aliases for different types?
    • [type=Z6, language=EN, label="javascript", is_primary=false]
    • [type=Z61, language=EN, label="javascript", is_primary=false]
  • Can we have two identical aliases for different languages?
    • [type=Z61, language=EN, label="javascript", is_primary=false]
    • [type=Z61, language=FR, label="javascript", is_primary=false]

Alias and Label clashes:

I'm very confused about this!

  • Can we have an alias that is identical to its primary label? I understand it makes no sense as it adds no information, but does that mean we should forbid it?
    • [zid=100, type=Z61, language=EN, label="javascript", is_primary=false]
    • [zid=100, type=Z61, language=EN, label="javascript", is_primary=true]
  • Can an alias be identical to another object's primary label in the same language? I think this does make sense, and we should allow it:
    • [zid=100, type=Z61, language=EN, label="JavaScript", is_primary=true]
    • [zid=200, type=Z61, language=EN, label="JavaScript (ES6)", is_primary=true]
    • [zid=200, type=Z61, language=EN, label="JavaScript", is_primary=false]

Alias uniqueness:

Aliases are not so restrictive, as they do not identify ZObjects but enrich them. That's why I believe aliases don't need to be unique, but let's think case by case.

  • Can we have two identical aliases for different types?
    • [type=Z6, language=EN, label="javascript", is_primary=false]
    • [type=Z61, language=EN, label="javascript", is_primary=false]

Yes.

  • Can we have two identical aliases for different languages?
    • [type=Z61, language=EN, label="javascript", is_primary=false]
    • [type=Z61, language=FR, label="javascript", is_primary=false]

Yes.

Alias and Label clashes:

I'm very confused about this!

  • Can we have an alias that is identical to its primary label? I understand it makes no sense as it adds no information, but does that mean we should forbid it?
    • [zid=100, type=Z61, language=EN, label="javascript", is_primary=false]
    • [zid=100, type=Z61, language=EN, label="javascript", is_primary=true]

No, I think we want to prevent this.

  • Can an alias be identical to another object's primary label in the same language? I think this does make sense, and we should allow it:
    • [zid=100, type=Z61, language=EN, label="JavaScript", is_primary=true]
    • [zid=200, type=Z61, language=EN, label="JavaScript (ES6)", is_primary=true]
    • [zid=200, type=Z61, language=EN, label="JavaScript", is_primary=false]

No. How would the user know which to select if they're both going to be shown as "JavaScript" and of the same type?
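Taken together, the answers above amount to this: identical aliases are fine across types and across languages, but an alias must not equal a primary label of the same type and language, whether that label belongs to the same object or to another one. A minimal sketch of such a check follows, in Python rather than the extension's PHP. The `LabelRow` class, the `alias_conflict` function and the exact (case-sensitive) string comparison are assumptions made for the example, not the merged implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelRow:
    """Hypothetical in-memory stand-in for one row of the labels table."""
    zid: str
    type: str
    language: str
    label: str
    is_primary: bool


def alias_conflict(existing, zid, ztype, language, alias):
    """Return a reason string if the alias should be rejected, else None."""
    for row in existing:
        if not row.is_primary:
            continue  # identical aliases elsewhere are allowed
        if (row.type, row.language, row.label) != (ztype, language, alias):
            continue  # a different type, language or text never clashes
        if row.zid == zid:
            return "alias duplicates the object's own primary label"
        return "alias duplicates the primary label of " + row.zid
    return None


existing = [
    LabelRow("Z100", "Z61", "en", "JavaScript", True),
    LabelRow("Z200", "Z61", "en", "JavaScript (ES6)", True),
    LabelRow("Z300", "Z6", "en", "js", False),
]

print(alias_conflict(existing, "Z100", "Z61", "en", "JavaScript"))  # own primary label
print(alias_conflict(existing, "Z200", "Z61", "en", "JavaScript"))  # another object's label
print(alias_conflict(existing, "Z200", "Z61", "fr", "JavaScript"))  # other language: allowed
print(alias_conflict(existing, "Z200", "Z61", "en", "js"))          # alias of another type: allowed
```

Whether the comparison should ignore case ("javascript" against "JavaScript") is not settled in this thread, so the sketch leaves it exact.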

DVrandecic raised the priority of this task from Low to High. (Aug 25 2021, 4:17 PM)

Change 713484 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Add aliases to the labels table and return in label search requests

https://gerrit.wikimedia.org/r/713484