Write wikibase statements data to search index in MediaInfo
Closed, Resolved · Public

Description

The first step towards finding files based on 'depicts' statements is writing the statements to the search index.

Wikibase can already write statements into the search index.

The properties for which statements are written are whitelisted in $wgWBRepoSettings['searchIndexProperties'].

At the moment the statements data in the search index is only used to de-prioritise 'instance of Wikipedia disambiguation page' (P31=Q4167410) in the search results, during the rescore phase of an autocomplete search query on Wikidata.

Event Timeline

Cparle triaged this task as Normal priority. Apr 16 2018, 2:14 PM
Cparle created this task.
Cparle edited projects, added Multimedia-Team-Working-Board; removed Epic.
Cparle moved this task from To Do to Doing on the Multimedia-Team-Working-Board board.

Change 426944 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/WikibaseMediaInfo@master] Write wikibase statements into search index

https://gerrit.wikimedia.org/r/426944

Cparle renamed this task from Write 'depicts' data to elasticsearch index to Write wikibase statements data to search index in MediaInfo. Apr 16 2018, 3:16 PM
Cparle added a comment. Edited · Apr 16 2018, 3:33 PM

How to test this:

  1. Load Wikibase data into whatever wiki you're using (use the WikibaseImport extension if you need it).
  2. Whitelist the ids of the properties you want to index statements for in $wgWBRepoSettings['searchIndexProperties']. For example, to index statements for Wikibase properties P999 and P888, set the following (e.g. in LocalSettings.php):
     $wgWBRepoSettings['searchIndexProperties'] = [ 'P999', 'P888' ];
  3. Add one or more statements concerning your chosen property to a MediaInfo item (e.g. if 'depicts' is in your property whitelist, add depicts=(something) as a statement).
  4. Check what has been indexed by appending ?action=cirrusDump to the URL of the MediaInfo item or its corresponding File page. You should see a statement_keywords array with one element per statement you added, in the format PXXX=QYYY, where PXXX is the id of your property and QYYY is the id of your item.

Note that WikibaseImport does not preserve property ids, so the property id for 'depicts' will not be the same locally as it is on Wikidata.
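
The check in the last step can also be scripted. What follows is only a minimal Python sketch, assuming the ?action=cirrusDump response is JSON that contains a statement_keywords list somewhere inside it. The helper names and the sample dump (the title and the P999=Q1 / P888=Q2 values) are hypothetical and not taken from a real wiki.

```python
import json


def find_statement_keywords(node):
    """Collect every statement_keywords entry found anywhere in a parsed dump."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "statement_keywords" and isinstance(value, list):
                found.extend(value)
            else:
                found.extend(find_statement_keywords(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_statement_keywords(item))
    return found


def missing_statements(dump_text, expected):
    """Return the expected PXXX=QYYY keywords that are absent from the dump."""
    indexed = set(find_statement_keywords(json.loads(dump_text)))
    return [keyword for keyword in expected if keyword not in indexed]


# Hypothetical dump text; in practice this would be the body of the
# ?action=cirrusDump response for the MediaInfo item or File page.
sample_dump = (
    '[{"_source": {"title": "Example.jpg", '
    '"statement_keywords": ["P999=Q1", "P888=Q2"]}}]'
)

if __name__ == "__main__":
    print(missing_statements(sample_dump, ["P999=Q1", "P888=Q5"]))
```

The recursive search is deliberate: it avoids depending on the exact nesting of the dump, which is not specified in this task.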

Cparle added a comment. Edited · Apr 16 2018, 3:37 PM

Note that if you're using a federated Wikibase instance for Wikibase properties, the name you give the foreign Wikibase repo is prepended to the property id (and likewise to item ids), so you must account for this when whitelisting properties. Here's how I have it set up locally in LocalSettings.php (P737 is the id of the 'depicts' property in my federated Wikibase):

// Federation
unset( $wgWBRepoSettings['entityNamespaces']['item'] );
unset( $wgWBRepoSettings['entityNamespaces']['property'] );
$wgWBRepoSettings['foreignRepositories']['federatedWikibase'] = [
    'repoDatabase' => 'wiki',
    'baseUri' => 'http://127.0.0.1:8080/wiki/Special:EntityData/',
    'supportedEntityTypes' => [ 'item', 'property' ],
    'prefixMapping' => [],
    'entityNamespaces' => [ 'item' => WB_NS_ITEM, 'property' => WB_NS_PROPERTY ],
];
$wgWBRepoSettings['searchIndexProperties'] = [ 'federatedWikibase:P737' ];

And here's the statement_keywords data from the search index for a MediaInfo item:

"statement_keywords":["federatedWikibase:P737=federatedWikibase:Q351"]
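
To make the prefixing rule concrete, here is a small Python sketch that builds the keyword you would expect to find in the index for a given property, item and foreign repo name. It only mirrors the string format shown in the sample above; the function names are hypothetical and this is not how Wikibase itself builds the value.

```python
def prefixed_id(entity_id, repo_name=""):
    """Prepend the foreign repo name to an entity id, if a repo name is given."""
    return f"{repo_name}:{entity_id}" if repo_name else entity_id


def statement_keyword(property_id, item_id, repo_name=""):
    """Build the expected PXXX=QYYY keyword, with repo prefixes when federated."""
    return f"{prefixed_id(property_id, repo_name)}={prefixed_id(item_id, repo_name)}"


if __name__ == "__main__":
    # Federated setup, using the ids from the sample above.
    print(statement_keyword("P737", "Q351", "federatedWikibase"))
    # Local (non-federated) setup: no prefix is added.
    print(statement_keyword("P737", "Q351"))
```

The first call reproduces the sample value "federatedWikibase:P737=federatedWikibase:Q351", and the prefixed property id is the form that has to go into searchIndexProperties.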
Cparle updated the task description. Apr 16 2018, 3:41 PM
Cparle updated the task description.
Ramsey-WMF moved this task from Untriaged to Triaged on the Multimedia board. Apr 18 2018, 6:59 PM
Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Apr 23 2018, 7:18 AM

Change 426944 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Write wikibase statements into search index

https://gerrit.wikimedia.org/r/426944

Note: we'll QA this more when it moves to another server environment (Beta probably)

How to test this:

  • Upload a file, add a 'depicts' statement, and take note of the P-id of 'depicts' and the Q-id of the item you used in the statement.
  • Wait a little while for the search index to update.
  • Add ?action=cirrusDump to the URL of the file page and make sure that P-id=Q-id is in the 'statement_keywords' array.

Then repeat the same check after editing an existing file.

Cparle closed this task as Resolved. Feb 12 2019, 4:49 PM

This has been done for ages and is working on production Wikidata.