Investigate optimzing wb_terms
Open, NormalPublic


wb_terms with 1.4B rows (with average of 140 bytes per row) is one of the biggest tables on that needs clean up in several areas:

  • Drop term_entity_id (and maybe term_entity_type) in favor of term_full_entity_id. We just switched to reading from term_full_entity_id everywhere.
  • Normalize term_entity_type (item=0, property=1, etc.) if we are not dropping it. It's better to be a settings file
  • Normalize term_type (label=0, etc.). It's better to be a settings file
  • Normalize term_lang (en=0, en-gb =1, etc.) in either another table or a settings file

How we are going to normalize (where to store the mapping) and migration plan needs to be determined.

Related Objects

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMon, Feb 26, 5:09 PM
thiemowmde triaged this task as Normal priority.Fri, Mar 2, 2:14 PM
thiemowmde added a subscriber: thiemowmde.
  • The term_entity_type is not needed any more in Wikibase code, as far as I'm aware of. Third parties can extract the entity type from the first letter of term_full_entity_id, if needed.
  • The mapping for term_type should live in code, in my opinion. E.g. as constants in the TermIndex interface.
  • I would not "normalize" term_lang further. It's a language code, it's short, and the allowed values potentially change constantly. If anything, this should be a primary key to the sites table. But the primary key there is also a string, and it's even longer, so there would be no benefit with regards of size.