Wikidata entity search sometimes case-sensitive, uses wb_terms instead of CirrusSearch
Closed, Resolved, Public

Description

Error

Request ID: W-62LApAAEoAAHHylHQAAABT

message
[W-62LApAAEoAAHHylHQAAABT] /w/api.php?action=wbsearchentities&search=A&format=json&language=en&uselang=en&type=item&useCirrus=0   ErrorException from line 309 of /srv/mediawiki/php-1.33.0-wmf.6/includes/debug/MWDebug.php: PHP Warning: Using wb_terms table for wbsearchentities API action but not using search-related fields of terms table. This results in degraded search experience, please enable the useTermsTableSearchFields setting. [Called from Closure$#10 in /srv/mediawiki/php-1.33.0-wmf.6/extensions/Wikibase/repo/WikibaseRepo.entitytypes.php at line 178]
trace
#0 /srv/mediawiki/php-1.33.0-wmf.6/includes/debug/MWDebug.php(309): MWExceptionHandler::handleError(integer, string, string, integer, array, array)
#1 /srv/mediawiki/php-1.33.0-wmf.6/includes/debug/MWDebug.php(164): MWDebug::sendMessage(string, array, string, integer)
#2 /srv/mediawiki/php-1.33.0-wmf.6/includes/GlobalFunctions.php(1104): MWDebug::warning(string, integer, integer, string)
#3 /srv/mediawiki/php-1.33.0-wmf.6/extensions/Wikibase/repo/WikibaseRepo.entitytypes.php(178): wfLogWarning(string)
#4 /srv/mediawiki/php-1.33.0-wmf.6/extensions/Wikibase/repo/includes/Api/TypeDispatchingEntitySearchHelper.php(49): Closure$#10(WebRequest)
#5 /srv/mediawiki/php-1.33.0-wmf.6/extensions/Wikibase/repo/includes/Api/SearchEntities.php(101): Wikibase\Repo\Api\TypeDispatchingEntitySearchHelper->getRankedSearchResults(string, string, string, integer, boolean)
#6 /srv/mediawiki/php-1.33.0-wmf.6/extensions/Wikibase/repo/includes/Api/SearchEntities.php(211): Wikibase\Repo\Api\SearchEntities->getSearchEntries(array)
#7 /srv/mediawiki/php-1.33.0-wmf.6/includes/api/ApiMain.php(1576): Wikibase\Repo\Api\SearchEntities->execute()
#8 /srv/mediawiki/php-1.33.0-wmf.6/includes/api/ApiMain.php(531): ApiMain->executeAction()
#9 /srv/mediawiki/php-1.33.0-wmf.6/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#10 /srv/mediawiki/php-1.33.0-wmf.6/api.php(87): ApiMain->execute()
#11 /srv/mediawiki/w/api.php(3): include(string)
#12 {main}

Impact

Wikidata entity search is case-sensitive, which it isn’t supposed to be, making it harder to find entities.
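
One way to check whether a given client is affected is to issue the same search with differently cased terms and compare the results. The sketch below only builds the request URLs; it does not send them. The query parameters are copied from the logged request above, except that useCirrus is left out. The helper name build_search_url is made up for illustration.

```python
from urllib.parse import urlencode

# Public Wikidata API endpoint; the path matches the /w/api.php in the log above.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en"):
    """Build a wbsearchentities URL for an item search (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "format": "json",
        "language": language,
        "uselang": language,
        "type": "item",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # With a case-insensitive backend, both requests should return the same
    # entities; differing result lists would point at the fallback path.
    print(build_search_url("berlin"))
    print(build_search_url("Berlin"))
```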

Notes

This seems to be because, for some reason, we’re using wb_terms for search instead of CirrusSearch. Did some config get messed up?

(Contrary to what the error message suggests, we should not enable the useTermsTableSearchFields setting, since we shouldn’t be using the terms table at all here.)

For some reason, this doesn’t always seem to happen. @Nikki and @Lydia_Pintscher can reproduce it; I can’t (but I can see the warnings in Logstash).

Restricted Application added a project: Discovery-Search. Wed, Nov 28, 3:42 PM
Restricted Application added a subscriber: Aklapper.
Lydia_Pintscher triaged this task as Unbreak Now! priority.
Lydia_Pintscher added a subscriber: WMDE-leszek.
Restricted Application added subscribers: Liuxinyu970226, TerraCodes. Wed, Nov 28, 3:47 PM

Context: We used to use the wb_terms table for entity search, and for that purpose it has two extra columns, term_search_key and term_weight. However, we now use CirrusSearch for entity search on Wikidata (and have done so for a while, I think at least a year), so those columns were unnecessarily bloating the database, and in T188993 we wiped them. (The columns still exist because having a different table schema on some wikis would be icky, but they’re neither written to nor read from.)

The warning

Using wb_terms table for wbsearchentities API action but not using search-related fields of terms table. This results in degraded search experience, please enable the useTermsTableSearchFields setting.

indicates that, for some reason, we are now using wb_terms for searching again. Since term_search_key is not available, the search falls back to term_text, which doesn’t work very well: case sensitivity is the most obvious effect, but various other normalizations are now missing as well. Hence the warning.
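
To illustrate why the fallback shows up as case sensitivity, here is a minimal sketch of prefix matching against the raw text versus against a normalized key. It is not the Wikibase implementation: the search_key function is a stand-in for whatever normalization term_search_key used to hold, and the three rows are just sample data.

```python
import unicodedata

# Sample (entity ID, label) rows standing in for wb_terms; illustration only.
TERMS = [
    ("Q64", "Berlin"),
    ("Q1", "universe"),
    ("Q42", "Douglas Adams"),
]


def search_key(text):
    """Hypothetical normalization: Unicode NFC, case folding, trimmed whitespace."""
    return unicodedata.normalize("NFC", text).casefold().strip()


def search_raw(prefix):
    """Prefix match on the stored text as-is (the degraded fallback behaviour)."""
    return [entity_id for entity_id, text in TERMS if text.startswith(prefix)]


def search_normalized(prefix):
    """Prefix match on normalized keys, so casing no longer matters."""
    key = search_key(prefix)
    return [entity_id for entity_id, text in TERMS
            if search_key(text).startswith(key)]


if __name__ == "__main__":
    print(search_raw("berlin"))         # no match: the stored label is "Berlin"
    print(search_normalized("berlin"))  # matches the "Berlin" row
```

In the real table the normalized key would be precomputed and indexed rather than derived per query, which is exactly the data that was wiped.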

Lucas_Werkmeister_WMDE lowered the priority of this task from Unbreak Now! to High. Wed, Nov 28, 4:00 PM

Lowering priority from UBN to High, since Wikidata has been rolled back to wmf.4 for now.

According to Logstash:

  • This only happens on Wikidata, not on Test Wikidata. No idea why.
  • This happened before the wmf.6 deployment as well; we just didn’t notice it. I’m therefore removing this as a train blocker, since IMHO it’s more likely to be due to a wmf-config change.

So Logstash says the errors started around 1 AM today (except for a single event about four hours earlier, apparently?), and SAL says T209402: A/B testing plan for wbsearchentities, context=item was deployed around that time, which sounds very related.

This comment at the bottom of that task sounds even more related (emphasis added):

Test is shipped out. 10% of English item searches will use the newly tuned parameters. Another 10% will use the classic SQL search. Test is intended to run one week, to be disabled on Dec 4.

Change 476303 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[operations/mediawiki-config@master] Disable classic_entity wbsearchentities AB test

https://gerrit.wikimedia.org/r/476303

Change 476303 merged by jenkins-bot:
[operations/mediawiki-config@master] Disable classic_entity wbsearchentities AB test

https://gerrit.wikimedia.org/r/476303

Mentioned in SAL (#wikimedia-operations) [2018-11-28T17:11:12Z] <thcipriani@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:476303|Disable classic_entity wbsearchentities AB test]] T209402 T210618 (duration: 00m 55s)

Lucas_Werkmeister_WMDE closed this task as Resolved. Wed, Nov 28, 5:14 PM
Lucas_Werkmeister_WMDE claimed this task.

Fixed, as far as I can tell. (By which I mean: I could never reproduce the bug anyways – presumably I’m in the wrong group for the AB test, which is apparently not sampled per-request – but Lydia was able to test the fix and now the bug isn’t happening for her anymore.)
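
For what it’s worth, here is a rough sketch of how per-user (rather than per-request) bucketing would explain why some people always hit the bug and others never do. I don’t know how the actual test assigns buckets; the token, the hash and the bucket names below are all assumptions, and only the 10% / 10% split comes from the comment quoted above.

```python
import hashlib


def bucket_for(user_token):
    """Map a stable per-user token to a test bucket (hypothetical scheme).

    The same token always hashes to the same slot, so a user stays in one
    bucket for the whole test instead of being re-sampled on every request.
    """
    digest = hashlib.sha256(user_token.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 100
    if slot < 10:
        return "tuned"        # 10%: newly tuned parameters
    if slot < 20:
        return "classic_sql"  # 10%: classic SQL search
    return "control"          # remaining 80%: regular search


if __name__ == "__main__":
    for token in ("user-a", "user-b", "user-c"):
        # Repeated calls with the same token give the same bucket.
        print(token, bucket_for(token), bucket_for(token))
```

Under a scheme like this, anyone landing in the classic SQL bucket would see the case-sensitive results on every search, and everyone else would never see them.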

Does this mean SQL-based search doesn't work on Wikidata now, or that we need to do some more setup to make it work? Would setting useTermsTableSearchFields be enough? Should we have it enabled in our MediaWiki configs, or does having it disabled also mean we don't have a proper index now and SQL search is unusable for us?

SQL search is intentionally disabled on Wikidata, and all the data that would back it (i.e. the term_search_key and term_weight columns) is not available. If for some reason we want to enable SQL search again, all that data needs to be reconstructed first, but the reason we disabled it in the first place is to reduce database bloat, because wb_terms is such a worryingly large table.

Basically, keeping the ability to fall back to SQL search costs us large amounts of database storage space, which doesn’t seem to be worth it when we really want to be using CirrusSearch anyways. We could discuss that decision, but it’s not a simple config change.

Got it, thanks. I guess we then have to give up on the idea of comparing against SQL for the test in T209402: A/B testing plan for wbsearchentities, context=item, and not do it in other tests either. Not a big deal, I just wasn't aware that it's not an option anymore.