Page MenuHomePhabricator

query taking 10s: TermSqlIndex::getMatchingIDs
Closed, ResolvedPublic

Description

Example:
SlowTimer [10385ms] at runtime/ext_mysql: slow query: SELECT /* Wikibase\TermSqlIndex::getMatchingIDs IP.REM.O.VED */ DISTINCT term_entity_id,term_weight FROM wb_terms WHERE (term_language='pl' AND term_search_key LIKE 'p%' AND term_type='label' AND term_entity_type='item') OR (term_language='pl' AND term_search_key LIKE 'p%' AND term_type='alias' AND term_entity_type='item') LIMIT 5000

About 200 matching "at runtime/ext_mysql: slow query: SELECT /* Wikibase\\TermSqlIndex::getMatchingIDs" in the last 2 days.

Related Objects

StatusSubtypeAssignedTask
Resolved Wikidata-bugs
DeclinedNone
ResolvedSmalyshev
ResolvedLydia_Pintscher
DuplicateSmalyshev
ResolvedSmalyshev
InvalidNone
ResolvedSmalyshev
ResolvedSmalyshev
ResolvedSmalyshev
ResolvedSmalyshev
ResolvedSmalyshev
Resolveddebt
ResolvedSmalyshev
ResolvedSmalyshev
ResolvedSmalyshev
ResolvedSmalyshev
ResolvedSmalyshev
ResolvedSmalyshev
ResolvedSmalyshev
Resolveddcausse
ResolvedSmalyshev
DuplicateNone
DeclinedNone
DeclinedNone
Resolveddaniel
ResolvedLydia_Pintscher
OpenNone
DeclinedNone
ResolvedSmalyshev
Resolveddcausse
Resolveddcausse
ResolvedSmalyshev
Invalid Wikidata-bugs

Event Timeline

JanZerebecki assigned this task to Wikidata-bugs.
JanZerebecki raised the priority of this task from to High.
JanZerebecki updated the task description. (Show Details)
JanZerebecki added a project: Wikidata.
JanZerebecki changed Security from none to None.
JanZerebecki subscribed.

This query modified for pol instead of p still takes 12s on the analytics slave (down from about 16s for p). With 4 letters it goes below 4seconds.

There is an index named tmp1 over term_language, term_type, term_entity_type, term_search_key on that table, which is not in the source (we might want to fix that).

One solution would be to do type-ahead queries via Cirrus (Elastic (Lucene)). We would probably want to implement T78011 for that.

As a stop gap, we could start type ahead queries only after at last 3 (or 2, or 4) characters are present. This should be configurable.

We should also look into avoiding fetching 5000 hits all the time. There gotta be a better way.

This is only used from the wbsearchentities API used for autosuggest. The code currently first gets an exact match on entity id, then exact matches, then in a 3rd query prefix matches if more matches are wanted. Quite many 3 or less letter combinations have enough exact matches. The query in question is a 3rd one. Limiting prefix matches to 4 or more letters is probably sensible for now. Long term I like the cirrus way.

Request that causes this query: https://www.wikidata.org/w/api.php?action=wbsearchentities&search=p&format=json&language=pl&type=item&continue=0

On a related note, we should not use the terms table for uniqueness checks. See T74430

Lydia_Pintscher subscribed.

What are the next steps here? Can we create tasks for them? It's been moved to done in the sprint but not closed.

The investigation was done. All the work needed is now in subtasks.

Ok thanks. Closing this one then.