Page MenuHomePhabricator

Implement PropertyLabelResolver using normalized storage
Closed, ResolvedPublic

Description

The current implementation, TermPropertyLabelResolver, uses a TermIndex to get all the property labels in a certain language at once, cache that in memcached, and then look up property IDs by label. We should do something similar in the normalized schema – i. e., still pre-fetch everything (and still cache it, using the same key if possible) instead of actually searching for a single text in the term storage – but without implementing TermIndex (because we don’t want to implement full search support yet).

However, the current interfaces for the new term store would only allow us to get all the term IDs for properties (effectively the entire content of wbt_property_terms), then resolve all those terms to a giant array, and then filter for labels in a certain language afterwards. We can’t afford to transfer all that data between PHP and the database, so instead the wbt_property_terms and wbt_term_in_lang parts will have to exchange JOIN conditions somehow, so that we can load all property labels in a certain language with just one query. (It still won’t be as efficient as in wb_terms, but with caching it should do for a while, and we can investigate improvements later.)

Event Timeline

Change 516465 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[mediawiki/extensions/Wikibase@master] Decouple TermPropertyLabelResolver from TermIndex

https://gerrit.wikimedia.org/r/516465

We could change resolveTermIds( array $termIds ): array (from TermIdsResolver) to resolveTermIds( array $termIds, array $types = null, array $languages = null ): array where we restrict the result set to only contain thing of $types and/ or only things in language(s) $languages. I would change resolveGroupedTermIds accordingly.

If we don't want that, we could create new lookup methods for this like resolveAndFilterTermIds or set this option via a cloning setter (like TermIdsResolver::withLanguageFilter()) on TermIdsResolver implementations (which would then apply to all subsequent calls).

@alaa_wmde @Lucas_Werkmeister_WMDE Does that make sense to ^?

Another thing that just came into my mind would be something like resolveTermIds( array $termIds, TermIdsResolverFilter[] $filters ): array and then have LanguageTermIdsResolverFilter and TypeTermIdsResolverFilter… but that's probably harder to integrate with the JOINs.

That makes sense to me – it also matches TermBuffer’s array $termTypes = null, array $languageCodes = null arguments. A separate method doesn’t seem necessary, and a cloning setter seems harder to use to me. And a TermIdsResolverFilter interface doesn’t really work, I think, because you can’t actually inject arbitrary instances of the interface there, the implementation will have to know how to filter for it in the database. (Also, that would open the door to a MatchingPrefixTermIdsResolverFilter, i. e. a kind of search, which we don’t actually want to support as far as I understand.)

Change 516465 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Decouple TermPropertyLabelResolver from TermIndex

https://gerrit.wikimedia.org/r/516465

That makes sense to me – it also matches TermBuffer’s array $termTypes = null, array $languageCodes = null arguments. A separate method doesn’t seem necessary, and a cloning setter seems harder to use to me. And a TermIdsResolverFilter interface doesn’t really work, I think, because you can’t actually inject arbitrary instances of the interface there, the implementation will have to know how to filter for it in the database. (Also, that would open the door to a MatchingPrefixTermIdsResolverFilter, i. e. a kind of search, which we don’t actually want to support as far as I understand.)

Make sense to me too!

Change 516661 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[mediawiki/extensions/Wikibase@master] TermIdsResolver: Allow filtering terms by type and lang

https://gerrit.wikimedia.org/r/516661

I just realized that limiting the type and language in TermIdsResolver wont do the trick, as we then still need to get all wbpt_term_in_lang_ids first.

I will thus need to join wbt_property_terms with wbt_term_in_lang and wbt_text_in_lang when getting the term ids to resolve :/

Yeah, that’s what I meant with this part of the task description:

However, the current interfaces for the new term store would only allow us to get all the term IDs for properties (effectively the entire content of wbt_property_terms), then resolve all those terms to a giant array, and then filter for labels in a certain language afterwards. We can’t afford to transfer all that data between PHP and the database, so instead the wbt_property_terms and wbt_term_in_lang parts will have to exchange JOIN conditions somehow, so that we can load all property labels in a certain language with just one query. (It still won’t be as efficient as in wb_terms, but with caching it should do for a while, and we can investigate improvements later.)

I was thinking of something like ActorMigration::getJoin() or CommentStore::getJoin(), so you can still keep the components somewhat separate.

Rough sketch with Alaa – here, DatabaseTermIdsResolver gets the extra join conditions, and DatabasePropertyTermStore would call that special method if it detects its injected TermIdsResolver is a DatabaseTermIdsResolver.

	// in DatabaseTermIdsResolver

	public function resolveGroupedTermConditions( array $groupedWhatevers ) {
		foreach ( $groupedWhatevers as $groupedWhatever ) {
			list ( $tables, $conditions ) = $groupedWhatever;

			$result = $db->select(
				array_merge( 
					[ 'wbt_term_in_lang', 'wbt_text_in_lang', 'wbt_text' ],
					$tables
				),
				[ 'wbtl_id', 'wbtl_type_id', 'wbxl_language', 'wbx_text' ],
				array_merge( [
					'wbtl_text_in_lang_id=wbxl_id',
					'wbxl_text_id=wbx_id',
				], $conditions ),
				__METHOD__
			);
			$this->preloadTypes( $result );
			foreach ( $result as $row ) {
				foreach ( $groupNamesByTermIds[$row->wbtl_id] as $groupName ) {
					$this->addResultTerms( $groupedTerms[$groupName], $row );
				}
			}
			return $groupedTerms;
		}
	}

	// in DatabasePropertyTermStore

	$this->resolver->resolveGroupedTermConditions( [
		'P1' => [
			// extra tables
			[ 'wbt_property_terms' ],
			// extra conditions
			[
				'wbpt_term_in_lang_id=wbtl_id',
				'wbpt_property_id' => $numericPropertyIds,
			],
		],
	] );

For the unkown types acquiring issue, we thought we would add another interface that DatabaseTypeIdsResolver would implement and DatabasePropertyTermStore would use.

The interface would be called, we thought TypeIdsLookup with a function lookupTypeIds( array $types) : array // returns type ids for known types

Moved back to Ready To Go until sub-tasks are done so that this tasks logic can use them

So the change https://gerrit.wikimedia.org/r/516661 now introduces filtering and is rebased on and adjusted to use TypeIdsLookup and DatabaseTermIdsResolver::resolveTermsViaJoin.

Change 517096 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] WIP. Add CachedDatabasePropertyLabelResolver implementation using new store.

https://gerrit.wikimedia.org/r/517096

Change 516661 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] TermIdsResolver: Allow filtering terms by type and lang

https://gerrit.wikimedia.org/r/516661

Change 517096 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add CachedDatabasePropertyLabelResolver implementation using new store.

https://gerrit.wikimedia.org/r/517096

@Addshore: Patch merged. Do you know if there is more to do in this task, or can this task be closed as resolved?

Change 565258 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Pass term type and language to TermIdsResolver in PrefetchingTermLookup

https://gerrit.wikimedia.org/r/565258

Change 565258 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Pass term type and language to TermIdsResolver in PrefetchingTermLookup

https://gerrit.wikimedia.org/r/565258