
Implement match for any-language label (haslabel:*)
Closed, Resolved, Public

Description

From the user request:

It is currently not possible to find all items that have no labels. With haslabel, for example haslabel:en, one can only find items that have a label in one specific language.

Use case

As a user, I'd like to only add labels to items that don't have any at all.

As a user, I'd like to see the total number of items that do have labels.

As a user of tools like Extension:WikibaseMediaInfo, I'd like to filter a category down to only the items without labels.

It seems like a natural extension of the current functionality.

Event Timeline

EBernhardson moved this task from needs triage to Wikibase Search on the Discovery-Search board.
EBernhardson subscribed.

Seems reasonable enough; it should be relatively easy to implement.

Matching via individual languages might create a pretty big query, though... We could match against labels_all, but that would not work for descriptions.

Matching against an _all field is certainly the only reasonably performant option here. We could create a descriptions_all field if that's needed.
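To illustrate the size difference, here is a minimal sketch that builds Elasticsearch query DSL as plain Python dicts. It assumes per-language fields named `labels.<lang>`, an aggregated field named `labels_all`, and a hypothetical five-language list; it is not the actual WikibaseCirrusSearch code, only an illustration of the two query shapes being compared.

```python
import json

# Hypothetical language list; a real wiki indexes labels in hundreds of languages.
LANGUAGES = ["en", "de", "fr", "es", "ja"]


def any_label_per_language(languages):
    """One exists clause per language field (assumed names labels.<lang>)."""
    return {
        "bool": {
            "should": [{"exists": {"field": f"labels.{lang}"}} for lang in languages],
            # At least one language must have a label.
            "minimum_should_match": 1,
        }
    }


def any_label_all_field(field="labels_all"):
    """A single exists clause against an aggregated _all-style field (assumed name)."""
    return {"exists": {"field": field}}


per_lang = any_label_per_language(LANGUAGES)
single = any_label_all_field()
print(len(per_lang["bool"]["should"]))  # one clause per language
print(json.dumps(single))
```

The per-language query grows linearly with the number of languages, while the _all variant stays a single clause regardless. As noted above, the same trick for descriptions would need an analogous descriptions_all field.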

Change 514415 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseCirrusSearch@master] Implement haslabel:*

https://gerrit.wikimedia.org/r/514415

Change 514415 merged by jenkins-bot:
[mediawiki/extensions/WikibaseCirrusSearch@master] Implement haslabel:*

https://gerrit.wikimedia.org/r/514415

Somehow it doesn't seem to work...

The query: https://commons.wikimedia.org/w/index.php?search=haslabel%3A*&title=Special%3ASearch&go=Go&ns0=1&ns6=1&ns12=1&ns14=1&ns100=1&ns106=1&cirrusDumpQuery=yes

The query itself seems OK, but it returns no results. Negative search works, though. I wonder what the reason could be?
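One point worth keeping in mind when reading "negative search works": if the positive clause matches no documents at all, its negation matches every document, so a working negative search does not by itself show that the clause targets a populated field. The toy evaluator below makes this concrete. It is a simplified in-memory stand-in for an Elasticsearch `exists` query (treating a missing field, `None`, or `[]` as absent), and the field names `unindexed_field` and `labels` are hypothetical; it does not claim to reproduce the actual Commons issue.

```python
def exists(doc, field):
    # Toy stand-in for an exists query: null and empty-list values count as missing.
    value = doc.get(field)
    return value is not None and value != []


def search(docs, field, negate=False):
    # negate=True mimics wrapping the clause in bool/must_not (e.g. -haslabel:*).
    return [d["id"] for d in docs if exists(d, field) != negate]


docs = [
    {"id": "M1", "labels": ["A bridge"]},
    {"id": "M2"},
    {"id": "M3", "labels": ["Sunset"]},
]

# A field no document populates: the positive query finds nothing,
# while the negated query "works" only because it returns everything.
print(search(docs, "unindexed_field"))
print(search(docs, "unindexed_field", negate=True))

# A populated field splits the documents as expected.
print(search(docs, "labels"))
print(search(docs, "labels", negate=True))
```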

Interestingly enough, it seems to work on Wikidata but not on Commons. I wonder why?

I have no clue why it's not working yet, but I think we should have indexed the captions into description fields, not label fields. Label fields are optimized for exact matches, which is not suited for captions.

Right now, from what I understand, we're indexing them as label fields since they are recorded as labels. Should this change? The data model lists MediaInfo as having both labels and descriptions, though I am not sure what this means.

@Cparle could you shed some light on this?
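To illustrate why exact-match fields suit labels better than captions, here is a toy model contrasting a field that only matches the whole normalized string with one that matches individual words. Both functions are hypothetical simplifications, not the actual CirrusSearch analyzers or field mappings.

```python
import re


def exact_match(query, value):
    # Label-style field (simplified): the whole value must equal the query after lowercasing.
    return query.strip().lower() == value.strip().lower()


def word_match(query, value):
    # Description-style field (simplified): every query word must appear as a word in the value.
    tokens = set(re.findall(r"\w+", value.lower()))
    return all(word in tokens for word in re.findall(r"\w+", query.lower()))


caption = "Old stone bridge over the river at sunset"
print(exact_match("bridge", caption))
print(word_match("bridge", caption))
print(exact_match("Eiffel Tower", "eiffel tower"))
```

Short, name-like labels are found well by whole-string matching, while a sentence-like caption is only found by a single word if the field is tokenized.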

Smalyshev renamed this task from "Consider adding haslabel:all" to "Implement match for any-language label (haslabel:*)". Jun 27 2019, 8:42 PM

Change 519514 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseCirrusSearch@master] Fix haslabel:* to use labels_all.plain field

https://gerrit.wikimedia.org/r/519514

Change 519514 merged by jenkins-bot:
[mediawiki/extensions/WikibaseCirrusSearch@master] Fix haslabel:* to use labels_all.plain field

https://gerrit.wikimedia.org/r/519514

Works on test.wikidata; it needs to be verified on main production once T227136 ("Reindexing search index wikidatawiki for eqiad fails") is fixed.