wb_terms is one of the largest tables controlled by Wikibase, and probably the one that is queried and updated most frequently. This RFC aims to improve the performance of the various operations performed on that table.
There are currently three major use cases covered by wb_terms:
- Uniqueness constraints (items by language+label+description, properties by language+label)
- Finding properties by label or alias / suggesting entities by label or alias prefix
- Looking up labels and descriptions for a given set of entities, for display (no aliases)
I suggest re-implementing these use cases as follows:
- Uniqueness constraints should be re-implemented based on hashes, see T74430
- Lookup by label/alias should be implemented based on Cirrus/Elastic (see also T89733: Allow ContentHandler to expose structured data to the search engine.)
- Label lookup could also be re-implemented based on Elastic, see T143706.
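To illustrate what hash-based uniqueness checks could look like, here is a minimal sketch. It assumes that each term combination that must be unique is reduced to a single hash stored in a column with a unique index. The table `term_uniqueness`, the functions `uniqueness_hash` and `claim_terms`, the choice of SHA-1, and the use of SQLite in place of the production database are all hypothetical stand-ins; T74430 may specify a different scheme.

```python
import hashlib
import sqlite3
from typing import Optional


def uniqueness_hash(entity_type: str, language: str, label: str,
                    description: str = "") -> str:
    """Hash of the term combination that must be unique (hypothetical scheme)."""
    parts = [entity_type, language, label]
    if entity_type == "item":
        # Items: language + label + description must be unique together.
        parts.append(description)
    # Properties: only language + label count, so the description is ignored.
    # NUL separators keep ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha1("\0".join(parts).encode("utf-8")).hexdigest()


def create_schema(conn: sqlite3.Connection) -> None:
    # The primary key on the hash column does the actual enforcement.
    conn.execute(
        "CREATE TABLE term_uniqueness ("
        " hash TEXT PRIMARY KEY,"
        " entity_id TEXT NOT NULL)"
    )


def claim_terms(conn: sqlite3.Connection, entity_type: str, entity_id: str,
                language: str, label: str,
                description: str = "") -> Optional[str]:
    """Register the combination for entity_id.

    Returns None on success, or the ID of the entity that already holds it.
    """
    h = uniqueness_hash(entity_type, language, label, description)
    row = conn.execute(
        "SELECT entity_id FROM term_uniqueness WHERE hash = ?", (h,)
    ).fetchone()
    if row is not None:
        # Re-saving the same entity with unchanged terms is not a conflict.
        return None if row[0] == entity_id else row[0]
    # If another writer claims the hash between the SELECT and this INSERT,
    # the primary key makes the INSERT raise sqlite3.IntegrityError instead
    # of storing a duplicate.
    conn.execute(
        "INSERT INTO term_uniqueness (hash, entity_id) VALUES (?, ?)",
        (h, entity_id),
    )
    return None
```

The check is a single primary-key lookup rather than a search over term rows. A real implementation would also have to delete the old hash when an entity's label or description changes, and decide how to normalize text before hashing; the sketch does neither. The entity IDs in a usage example such as `claim_terms(conn, "item", "Q2", "en", "Berlin", "capital of Germany")` are made up.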
If wb_terms is only needed for label lookups, it can be optimized for that use case: drop aliases, and have separate fields for labels and descriptions, so that all relevant information for a given entity and language is in a single row. This would also allow for a natural primary key, since entity+language would be unique.
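The single-row-per-language layout can be sketched as follows, again with SQLite standing in for the production database. The table `entity_terms` and the helpers `create_label_table`, `set_terms` and `lookup_terms` are hypothetical names; the point of the sketch is that (entity_id, language) serves as the primary key, so an edit touches exactly one row and a display lookup returns one row per entity.

```python
import sqlite3
from typing import Dict, Iterable, Optional, Tuple


def create_label_table(conn: sqlite3.Connection) -> None:
    # One row per entity and language; label and description are separate
    # columns, and aliases are not stored at all.
    conn.execute(
        "CREATE TABLE entity_terms ("
        " entity_id TEXT NOT NULL,"
        " language TEXT NOT NULL,"
        " label TEXT,"
        " description TEXT,"
        " PRIMARY KEY (entity_id, language))"
    )


def set_terms(conn: sqlite3.Connection, entity_id: str, language: str,
              label: Optional[str], description: Optional[str]) -> None:
    # Thanks to the natural key, an edit replaces exactly one row.
    conn.execute(
        "INSERT OR REPLACE INTO entity_terms"
        " (entity_id, language, label, description) VALUES (?, ?, ?, ?)",
        (entity_id, language, label, description),
    )


def lookup_terms(conn: sqlite3.Connection, entity_ids: Iterable[str],
                 language: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {entity_id: (label, description)} for entities that have terms."""
    ids = list(entity_ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT entity_id, label, description FROM entity_terms"
        f" WHERE language = ? AND entity_id IN ({placeholders})",
        [language, *ids],
    ).fetchall()
    return {eid: (label, desc) for eid, label, desc in rows}
```

In MySQL/MariaDB, the `INSERT OR REPLACE` would correspond to `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.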
On the other hand, we may want to keep support for aliases in wb_terms, so that third-party installations can work without Elastic.