This ticket serves as the main placeholder for the functionality for looking up entity terms from Elasticsearch rather than SQL.
@Addshore wrote this shortly before going on vacation, and this ticket likely needs a bit more love.
Why
There are many use cases that require an entity's terms, or multiple entities' terms (labels, descriptions and aliases), to be known.
These are currently looked up from the wb_terms table, rather than loading the whole entity JSON.
Currently in SQL these lookups are batched by number of entities, with a batch size of 9:
https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/128707d2aaa9ba2fadd384af6d7033c43070e69a/lib/includes/Store/Sql/TermSqlIndex.php#L611-L613
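To make the batching concrete, here is a minimal Python sketch of splitting a list of entity IDs into batches. It is not the Wikibase implementation (which is PHP, linked above); the function name `batch_entity_ids` is hypothetical, and the batch size of 9 is simply the figure quoted in this ticket.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 9  # batch size quoted in this ticket


def batch_entity_ids(
    entity_ids: Sequence[str], batch_size: int = BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size entity IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(entity_ids), batch_size):
        yield list(entity_ids[start:start + batch_size])
```

Each yielded batch would correspond to one SQL query, so a request for 20 entities results in three queries under this scheme.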
Elasticsearch allows faster lookups than SQL for complex cases, such as when many languages are requested (for fallback) or hundreds of entities are requested (large lookups).
The ongoing changes to the terms storage in SQL will also likely make these lookups slightly slower.
Rate
Tracking was added to see how many of these bulk lookups occur:
https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/128707d2aaa9ba2fadd384af6d7033c43070e69a/lib/includes/Store/Sql/TermSqlIndex.php#L616-L618
The data can be seen on https://grafana.wikimedia.org/d/000000548/wikibase-wb_terms?refresh=30s&orgId=1
This indicates anywhere between 150k and 350k bulk lookup requests per minute.
How
Elasticsearch uses the page ID as the document ID for storage. Wikibase can do a single SQL lookup from entity ID to page ID, which allows documents to be fetched directly by ID in Elasticsearch rather than found by a search using an entity ID as the search term.
According to Discovery-ARCHIVED, this could also mean that a large increase in the rate of these lookups against Elasticsearch would be fine, because the document ID / page ID dictates where the record is stored (needs testing).
Examples of lookups by entity ID and by page ID can be seen in P8373.
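As a rough illustration of fetching by document ID, the sketch below builds a request body for the Elasticsearch multi-get (`_mget`) API from a list of page IDs. It only constructs the body and sends nothing. The field names `labels.<lang>` and `descriptions.<lang>` are assumptions about the index mapping, and `build_mget_request` is a hypothetical helper, not existing Wikibase code.

```python
from typing import Dict, Iterable, List


def build_mget_request(
    page_ids: Iterable[int], languages: Iterable[str]
) -> Dict[str, object]:
    """Build a multi-get body that fetches documents by ID (the page ID)."""
    source_fields: List[str] = []
    for lang in languages:
        # Hypothetical field names; the real index mapping may differ.
        source_fields.append(f"labels.{lang}")
        source_fields.append(f"descriptions.{lang}")
    return {
        "docs": [
            # Document IDs are strings in Elasticsearch, so the page ID is cast.
            {"_id": str(page_id), "_source": source_fields}
            for page_id in page_ids
        ]
    }
```

The body would be posted to `<index>/_mget`, with the page IDs coming from the entity ID to page ID lookup in SQL described above.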
Dealing with stale data
Elasticsearch can be out of date due to maintenance or slow-running jobs.
Our lookup needs to be up to date.
A fallback and check may be needed:
- Look up in ES, and get the revid?
- For entities that have newer revisions, fall back to SQL storage and perform the lookup there
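The check-and-fallback idea above could look roughly like the following Python sketch. Everything here is hypothetical: `es_lookup` and `sql_lookup` stand in for the real storage calls, and the sketch assumes the latest revision ID of each entity is already known from somewhere cheap (where it would come from is an open question in this ticket).

```python
from typing import Callable, Dict, Iterable, Mapping, Optional, Tuple

Terms = Dict[str, str]
EsHit = Tuple[int, Terms]  # (revision ID stored in the index, terms)


def lookup_terms(
    entity_ids: Iterable[str],
    latest_rev_ids: Mapping[str, int],
    es_lookup: Callable[[str], Optional[EsHit]],
    sql_lookup: Callable[[str], Terms],
) -> Dict[str, Terms]:
    """Use the ES result when it is current, otherwise fall back to SQL."""
    result: Dict[str, Terms] = {}
    for entity_id in entity_ids:
        hit = es_lookup(entity_id)
        if hit is not None and hit[0] >= latest_rev_ids[entity_id]:
            # The indexed revision is at least as new as the latest one.
            result[entity_id] = hit[1]
        else:
            # Missing from ES, or a newer revision exists: use SQL.
            result[entity_id] = sql_lookup(entity_id)
    return result
```

A real version would batch both lookups rather than call them per entity; the per-entity loop is only there to keep the comparison easy to read.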
Possible rollout
It would make the most sense to start with the larger, more complex queries, such as lookups that require over 5 languages or over 10 entities.
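A minimal sketch of such a rollout gate, assuming the two thresholds mentioned above are used as-is. The names are hypothetical, and in practice the thresholds would presumably be configurable.

```python
MIN_LANGUAGES = 5   # "over 5 languages" from this ticket
MIN_ENTITIES = 10   # "over 10 entities" from this ticket


def should_use_elasticsearch(num_entities: int, num_languages: int) -> bool:
    """Route only the larger, more complex lookups to Elasticsearch."""
    return num_languages > MIN_LANGUAGES or num_entities > MIN_ENTITIES
```

Lookups below both thresholds would keep using the existing SQL path.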