query taking 10s: TermSqlIndex::getMatchingIDs
Closed, Resolved, Public

Assigned To

Authored By

	JanZerebecki
	Dec 8 2014, 7:43 PM

Description

Example:
SlowTimer [10385ms] at runtime/ext_mysql: slow query: SELECT /* Wikibase\TermSqlIndex::getMatchingIDs IP.REM.O.VED */ DISTINCT term_entity_id,term_weight FROM wb_terms WHERE (term_language='pl' AND term_search_key LIKE 'p%' AND term_type='label' AND term_entity_type='item') OR (term_language='pl' AND term_search_key LIKE 'p%' AND term_type='alias' AND term_entity_type='item') LIMIT 5000

About 200 log entries matching "at runtime/ext_mysql: slow query: SELECT /* Wikibase\\TermSqlIndex::getMatchingIDs" in the last 2 days.
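To count these entries and see how slow they are, one option is to parse the SlowTimer lines directly. A minimal sketch, assuming every relevant line looks like the example above (only that single line of the log format is known here); the function name and the sample lines are hypothetical:

```python
import re

# Matches SlowTimer lines like the example in this task. Only one example of
# the log format is known; the layout of other lines is an assumption.
SLOW_QUERY = re.compile(
    r"SlowTimer \[(?P<ms>\d+)ms\] at runtime/ext_mysql: slow query: "
    r"SELECT /\* (?P<caller>\S+)"
)
TARGET = r"Wikibase\TermSqlIndex::getMatchingIDs"


def slow_durations(lines, caller=TARGET):
    """Return the durations in ms of slow queries issued by `caller`."""
    durations = []
    for line in lines:
        match = SLOW_QUERY.search(line)
        if match and match.group("caller") == caller:
            durations.append(int(match.group("ms")))
    return durations


# Hypothetical sample input: the first line mirrors the example above.
sample = [
    "SlowTimer [10385ms] at runtime/ext_mysql: slow query: SELECT /* "
    "Wikibase\\TermSqlIndex::getMatchingIDs IP.REM.O.VED */ DISTINCT "
    "term_entity_id,term_weight FROM wb_terms WHERE ...",
    "SlowTimer [2100ms] at runtime/ext_mysql: slow query: SELECT /* "
    "Some\\Other::method */ 1",
]
print(slow_durations(sample))  # [10385]
```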

Related Objects

Status	Assigned	Task
Resolved	• Wikidata-bugs	T77898 query taking 10s: TermSqlIndex::getMatchingIDs
Declined	None	T78156 [Task] Cache wbsearchentities responses for short prefixes
Resolved	Smalyshev	T78157 [Story] Use ElasticSearch for entity search on wikidata.org
Resolved	Lydia_Pintscher	T88535 [Task] Investigate preconditions & constraints for using Elastic directly
Duplicate	Smalyshev	T117520 Index Wikidata labels, aliases and descriptions as separate fields in ElasticSearch
Resolved	Smalyshev	T125500 [Epic] Index Wikidata labels and descriptions as separate fields in ElasticSearch
Invalid	None	T132444 [Task] Create repository for WikibaseElastic
Resolved	Smalyshev	T150891 Find a good way to represent multi-lingual text fields in Elastic
Resolved	Smalyshev	T155139 'ContentHandlerForModelID' hook allows creating of handlers that aren't registered
Resolved	Smalyshev	T157604 Make Wikibase register all its content handlers
Resolved	Smalyshev	T157626 ContentHandlerSanityTest assumes every handler can create empty content, but Wikibase can't
Resolved	Smalyshev	T171548 Wikidata Elastic search ignores limit
Resolved	debt	T172422 Add stats marker to wikidata entity search
Resolved	Smalyshev	T172467 Make good prefix search profile for Wikidata entities
Resolved	Smalyshev	T173231 Wikidata Elastic search drops results with matches in different language label
Resolved	Smalyshev	T175741 Set ElasticSearch implementation as default for wbsearchentites on Wikidata
Resolved	Smalyshev	T178851 Use label & description index for fulltext search
Resolved	Smalyshev	T176903 Index wikidata descriptions
Resolved	Smalyshev	T180169 Make list of languages where using stemmed analyzer for Wikibase is beneficial
Resolved	Smalyshev	T182271 Wikidata fulltext search should handle search syntax properly
Resolved	dcausse	T182293 Tune wikidata fulltext search similarity parameters
Resolved	Smalyshev	T181426 Reindex wikidata to enable description index
Duplicate	None	T117522 Create standard "completion suggestion" API for Search
Declined	None	T120089 Add an internal completion or suggestions API to core SearchEngine
Declined	None	T170392 Create gadget that enables the use of the elastic search backend for the entity selector
Resolved	daniel	T170400 Define metrics for search result quality for the entity selector widget on wikidata.
Resolved	Lydia_Pintscher	T170405 Manually evaluate cirrus based entity search on test system
Open	None	T170547 Metrics to evaluate new search for item suggestor
Declined	None	T170549 Provide A/B test for item suggestor
Resolved	Smalyshev	T162292 Reindex wikidata to pick up labels/descriptions mappings
Resolved	dcausse	T160926 Make noop script be able replace whole fields with nested subfields
Resolved	dcausse	T166589 Update wikidata code to take advantage of nested fields noop script
Resolved	Smalyshev	T175199 Index certain statements for Wikidata items
Invalid	• Wikidata-bugs	T85415 [Task] limit prefix match queries of table wb_terms to more than 3 chars

Event Timeline

JanZerebecki created this task. Dec 8 2014, 7:43 PM

JanZerebecki assigned this task to • Wikidata-bugs.

JanZerebecki raised the priority of this task from to High.

JanZerebecki updated the task description.

JanZerebecki added a project: Wikidata.

JanZerebecki changed Security from none to None.

JanZerebecki subscribed.

Tobi_WMDE_SW added a project: Wikidata-Sprint-2014-12-09. Dec 9 2014, 4:32 PM

This query, modified to use the prefix "pol" instead of "p", still takes 12s on the analytics slave (down from about 16s for "p"). With a 4-letter prefix it drops below 4 seconds.

There is an index named tmp1 over term_language, term_type, term_entity_type, term_search_key on that table, which is not in the source (we might want to fix that).
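With tmp1 ordered as (term_language, term_type, term_entity_type, term_search_key), the three equality predicates select one contiguous slice of the index, and `term_search_key LIKE 'p%'` becomes a range scan over every key in that slice that starts with "p" (the OR in the query needs two such ranges, one for label and one for alias). A minimal sketch that models the index as a sorted Python list and uses `bisect` to find the scanned range; the index contents are made up, and this only illustrates why short prefixes touch far more entries, not how MySQL is implemented internally:

```python
from bisect import bisect_left


def next_prefix(prefix):
    """Smallest string greater than every string starting with `prefix`.
    Assumes a non-empty prefix whose last character is not the max code point."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def prefix_scan(index, language, term_type, entity_type, prefix):
    """Return the search keys that a range scan for LIKE 'prefix%' would visit
    in a sorted list modelling the composite index
    (term_language, term_type, term_entity_type, term_search_key)."""
    lo = bisect_left(index, (language, term_type, entity_type, prefix))
    hi = bisect_left(index, (language, term_type, entity_type, next_prefix(prefix)))
    return [entry[3] for entry in index[lo:hi]]


# Hypothetical index contents, kept in sorted order like a B-tree index.
index = sorted(
    [("pl", "label", "item", key)
     for key in ["apple", "paris", "pole", "polo", "poland", "polska", "python", "quartz"]]
    + [("en", "label", "item", "poland"), ("pl", "alias", "item", "pl")]
)

# The scanned range shrinks as the prefix gets longer.
print(len(prefix_scan(index, "pl", "label", "item", "p")))     # 6
print(len(prefix_scan(index, "pl", "label", "item", "pol")))   # 4
print(len(prefix_scan(index, "pl", "label", "item", "pola")))  # 1
```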

JanZerebecki moved this task from Backlog to Done on the Wikidata-Sprint-2014-12-09 board. Dec 9 2014, 4:52 PM

One solution would be to run type-ahead queries via Cirrus (Elasticsearch, which is built on Lucene). We would probably want to implement T78011 for that.

As a stopgap, we could start type-ahead queries only after at least 3 (or 2, or 4) characters have been entered. This should be configurable.
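The stopgap amounts to a guard in front of the prefix query with a configurable threshold. A minimal sketch, assuming a hypothetical config object; the setting name `min_prefix_length` and its default of 3 are illustrative, not an existing Wikibase option:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixSearchConfig:
    # Hypothetical setting name; the task only says the threshold
    # (3, or 2, or 4 characters) should be configurable.
    min_prefix_length: int = 3


def should_run_prefix_query(search: str, config: PrefixSearchConfig) -> bool:
    """Skip the expensive LIKE 'x%' query until enough characters are typed.
    len() counts Unicode code points, not bytes."""
    return len(search.strip()) >= config.min_prefix_length


default = PrefixSearchConfig()
strict = PrefixSearchConfig(min_prefix_length=4)
print(should_run_prefix_query("p", default))    # False
print(should_run_prefix_query("pol", default))  # True
print(should_run_prefix_query("pol", strict))   # False
```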

We should also look into avoiding fetching 5000 hits every time. There has to be a better way.
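One possible direction is to let the database rank and cap the result set, so only as many entity ids as the API will return are fetched. A minimal sketch using an in-memory SQLite table that only borrows the wb_terms column names from the query above; ranking by `MAX(term_weight)` per entity is an assumption, not how TermSqlIndex currently orders results, and the rows are made up. Note that this caps the rows returned to the application; without an index that supports the ordering, the database may still have to examine every matching row to sort them.

```python
import sqlite3


def escape_like(text):
    """Escape LIKE wildcards so user input is matched literally (escape char: backslash)."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def top_prefix_matches(conn, language, prefix, limit):
    """Return at most `limit` item ids whose label or alias starts with
    `prefix`, best weight first, instead of a fixed batch of 5000 rows."""
    rows = conn.execute(
        """
        SELECT term_entity_id, MAX(term_weight) AS weight
        FROM wb_terms
        WHERE term_language = ?
          AND term_entity_type = 'item'
          AND term_type IN ('label', 'alias')
          AND term_search_key LIKE ? ESCAPE '\\'
        GROUP BY term_entity_id
        ORDER BY weight DESC, term_entity_id
        LIMIT ?
        """,
        (language, escape_like(prefix) + "%", limit),
    ).fetchall()
    return [entity_id for entity_id, _ in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_terms (term_entity_id INTEGER, term_language TEXT, "
    "term_type TEXT, term_entity_type TEXT, term_search_key TEXT, term_weight REAL)"
)
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO wb_terms VALUES (?, ?, ?, ?, ?, ?)",
    [
        (36, "pl", "label", "item", "polska", 0.9),
        (36, "pl", "alias", "item", "pl", 0.9),
        (90, "pl", "label", "item", "paryż", 0.8),
        (5, "pl", "label", "item", "pies", 0.3),
        (7, "en", "label", "item", "poland", 1.0),
        (8, "pl", "description", "item", "państwo", 0.95),
    ],
)
print(top_prefix_matches(conn, "pl", "p", 2))   # [36, 90]
print(top_prefix_matches(conn, "pl", "pa", 5))  # [90]
```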

This is only used by the wbsearchentities API, which powers autosuggest. The code currently first looks for an exact match on the entity ID, then for exact term matches, and only then, in a third query, for prefix matches if more results are wanted. Many combinations of 3 or fewer letters already have enough exact matches. The query in question is that third one. Limiting prefix matches to 4 or more letters is probably sensible for now. Long term, I like the Cirrus approach.
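The lookup order described above can be sketched as three stages, with the proposed length limit guarding only the third. A minimal in-memory sketch, assuming a plain dict in place of wb_terms and a set in place of the entity existence check; the function name, the simplified entity-ID pattern and all IDs except the stage order are hypothetical, and this is not the actual Wikibase code:

```python
import re

# Entity IDs such as "Q36" or "P31" (pattern simplified for this sketch).
ENTITY_ID = re.compile(r"^[QP][1-9][0-9]*$", re.IGNORECASE)


def search_entities(search, terms, known_ids, limit, min_prefix_length=4):
    """1) exact entity ID, 2) exact term matches, 3) prefix matches only if
    more results are wanted and the prefix is long enough.

    `terms` maps a normalized search key to entity IDs (stand-in for wb_terms);
    `known_ids` stands in for an entity existence check."""
    query = search.strip()
    key = query.lower()
    results = []

    def add(ids):
        for entity_id in ids:
            if len(results) >= limit:
                return
            if entity_id not in results:
                results.append(entity_id)

    # Stage 1: the input itself is an existing entity ID.
    if ENTITY_ID.match(query) and query.upper() in known_ids:
        add([query.upper()])

    # Stage 2: exact matches on labels/aliases.
    add(terms.get(key, []))

    # Stage 3: the expensive prefix query, limited to longer inputs.
    if len(results) < limit and len(key) >= min_prefix_length:
        for term_key in sorted(terms):
            if term_key != key and term_key.startswith(key):
                add(terms[term_key])

    return results


# Made-up data for illustration.
terms = {"pol": ["Q1001"], "polo": ["Q1002"], "polonia": ["Q1003"],
         "polska": ["Q36"], "paris": ["Q90"]}
known_ids = {"Q1001", "Q1002", "Q1003", "Q36", "Q90"}

print(search_entities("pol", terms, known_ids, limit=5))   # ['Q1001']
print(search_entities("polo", terms, known_ids, limit=5))  # ['Q1002', 'Q1003']
print(search_entities("q36", terms, known_ids, limit=5))   # ['Q36']
```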

Request that causes this query: https://www.wikidata.org/w/api.php?action=wbsearchentities&search=p&format=json&language=pl&type=item&continue=0
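To compare response times for different prefix lengths, the same request can be rebuilt with only the `search` value changed. A small sketch using the standard library's `urlencode`; the helper name is hypothetical, and the parameters are exactly those in the URL above:

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def wbsearchentities_url(search, language, entity_type="item", cont=0):
    """Build a wbsearchentities request with the parameters used in this task."""
    params = {
        "action": "wbsearchentities",
        "search": search,
        "format": "json",
        "language": language,
        "type": entity_type,
        "continue": cont,
    }
    return API_URL + "?" + urlencode(params)


# One-letter vs. longer prefixes.
for prefix in ["p", "pol", "pola"]:
    print(wbsearchentities_url(prefix, "pl"))
```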

Aklapper added a project: Performance Issue. Dec 9 2014, 8:37 PM

On a related note, we should not use the terms table for uniqueness checks. See T74430

daniel added a subtask: T78156: [Task] Cache wbsearchentities responses for short prefixes. Dec 10 2014, 4:28 PM