
Investigation: why do statements on Senses and Forms not show up in searches using haswbstatement
Closed, Resolved · Public · 5 Estimated Story Points

Description

Problem:
The search on Wikidata allows searching with the haswbstatement keyword. Such a search finds hits on Lexemes, but only for statements on the main part of the Lexeme. Statements on Forms and Senses don't seem to be accessible, although they would be very useful for Wikifunctions/Abstract Wikipedia. We need to understand why they are not searchable: are they not indexed at all, or are they indexed in a way that keeps them out of the searches we are doing?

Example:

Acceptance criteria:

  • We understand why the statements on Forms and Senses do not show up in searches using haswbstatement.

Open questions:

  • Should the statements on Forms and Senses be included in the Lexeme searches, or should Forms and Senses be treated as their own entities here, so that the search returns the Sense or Form itself?

Event Timeline


We'll have a look at the implementation. Intuition: those are indexed slightly differently and it is probably not too hard to align.

@Lydia_Pintscher how important is it to fix this?

> We'll have a look at the implementation. Intuition: those are indexed slightly differently and it is probably not too hard to align.

That'd be great.

> @Lydia_Pintscher how important is it to fix this?

The Abstract Wikipedia team would like to use it for accessing "Item for this Sense" statements. If I understood correctly, it'd be important for them next quarter, but @DVrandecic would need to confirm.

Gehel set the point value for this task to 5. · Nov 4 2024, 4:38 PM

Yes, that's correct. This will be crucial to have in Q3 of FY24/25 (Q1 of calendar year 2025), in order to be able to find the right Lexemes for a given Item (i.e. to go from Q144 to L1122). The only other option to do so would be through the SPARQL endpoint, which has its own set of issues.
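For illustration, the Item-to-Lexeme lookup described above could be expressed as a search API request along these lines. This is only a sketch that builds the request URL; it assumes that "Item for this Sense" is property P5137 and that Lexemes live in namespace 146 on Wikidata, neither of which is stated in this task. The helper name `build_haswbstatement_search` is hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"
LEXEME_NAMESPACE = 146  # assumption: Lexeme namespace id on Wikidata


def build_haswbstatement_search(property_id, value_id, limit=10):
    """Build a MediaWiki search URL for a haswbstatement query.

    Hypothetical helper: it only assembles the URL and sends no request.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:{property_id}={value_id}",
        "srnamespace": LEXEME_NAMESPACE,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


# Assumption: P5137 is "Item for this Sense"; Q144 is the Item from the comment.
url = build_haswbstatement_search("P5137", "Q144")
print(url)
```

Until statements on Senses are indexed, a query like this would not be expected to return the Lexeme, which is the gap this task investigates.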

Took a look over everything. The short answer to the investigation is that per-form statements are not currently indexed. We do index some per-form data, but not this part. The solution is likely to integrate these statements into the existing statement_keywords field: essentially, serialize the form statements the same way as the entity statements and then append them to the list. Not 100% sure how that fits into the cross-extension implementation yet, but will figure something out.
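A rough sketch of that idea, in Python rather than the extension's own code: serialize the statements on Forms (and Senses) the same way as the Lexeme-level statements and append them to one keyword list. It assumes the Wikibase JSON layout (`claims`, `forms`, `senses`, `mainsnak`) and a simplified `P…=Q…` keyword format that covers only entity-valued statements. The function names and the P1/Q1/P2/Q2 identifiers are placeholders, and the actual patches may work differently.

```python
def statement_keywords(claims):
    """Serialize entity-valued statements as 'P123=Q456' strings (simplified)."""
    keywords = []
    for property_id, statements in claims.items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip 'somevalue' / 'novalue' snaks
            value = snak.get("datavalue", {}).get("value")
            if isinstance(value, dict) and "id" in value:
                keywords.append(f"{property_id}={value['id']}")
    return keywords


def lexeme_statement_keywords(lexeme):
    """Lexeme-level keywords, followed by those of every Form and Sense."""
    keywords = statement_keywords(lexeme.get("claims", {}))
    for sub_entity in lexeme.get("forms", []) + lexeme.get("senses", []):
        keywords.extend(statement_keywords(sub_entity.get("claims", {})))
    return keywords


def item_claim(property_id, item_id):
    """Build a minimal entity-valued claim in Wikibase JSON shape."""
    return {property_id: [{"mainsnak": {
        "snaktype": "value",
        "property": property_id,
        "datavalue": {"type": "wikibase-entityid", "value": {"id": item_id}},
    }}]}


# Illustrative data only; P1/Q1 and P2/Q2 are placeholder identifiers.
lexeme = {
    "claims": item_claim("P1", "Q1"),
    "forms": [{"id": "L1122-F1", "claims": item_claim("P2", "Q2")}],
    "senses": [{"id": "L1122-S1", "claims": item_claim("P5137", "Q144")}],
}
keywords = lexeme_statement_keywords(lexeme)
print(keywords)  # ['P1=Q1', 'P2=Q2', 'P5137=Q144']
```

With the Form and Sense keywords in the same list, a haswbstatement search would match the Lexeme as a whole, which corresponds to the first option in the open question above.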

Change #1094529 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/WikibaseCirrusSearch@master] Allow entity types to decide statement_keywords

https://gerrit.wikimedia.org/r/1094529

I see that senses can also have statements. Should those be similarly indexed?

> I see that senses can also have statements. Should those be similarly indexed?

Yes, this would address the second example in the task description.

Yeah. The statements on Senses are the ones even more important for Abstract Wikipedia here.

Change #1097422 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/WikibaseLexemeCirrusSearch@master] Index statements on forms and senses

https://gerrit.wikimedia.org/r/1097422

Change #1094529 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Allow entity types to decide statement_keywords

https://gerrit.wikimedia.org/r/1094529

Change #1097422 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexemeCirrusSearch@master] Index statements on forms and senses

https://gerrit.wikimedia.org/r/1097422

Once this is deployed, the indexing pipeline will be fixed, but all existing pages need to be re-indexed before the fix is visible from the user's perspective. Once the train rolls forward, we will need to run a reindex on the lexeme namespace.

Excellent work! Checking manually now that the train has rolled out, this definitely looks fixed to me (though as you say, there's data updates needed to roll out):

<3