
Fallback chains produced by LanguageFallbackChainFactory::newFromContext can be very long, leading to poorly performing queries
Open, High, Public

Description

If a user has many Babel language boxes on their user page, their personal language fallback chain becomes very long. When we then (pre)fetch terms based on that personal fallback chain, the queries become large and complex, which leads to timeouts.

For example, https://www.wikidata.org/w/index.php?title=User:Marcus_Cyron&oldid=287931352 results in queries like:

Query: SELECT  term_entity_type,term_type,term_language,term_text,term_weight,term_entity_id  FROM `wb_terms`   WHERE term_entity_type = 'item' AND term_entity_id IN (…)  AND term_language IN ('de','en','la','ru','af','pl','pih','pfl','pdc','pcd','nrm','oc','pt','nb','nn','nl','nds-nl','nds','mwl','pms','ro','rm','stq','wa','vls','vec','tr','szl','sv','sq','lv','sl','sk','sco','scn','sc','roa-tara','mt','lij','lt','cy','es','eo','el','dsq','dsb','da','cs','eu','co','ca','bs','bar','ast','ang','an','et','fi','lmo','ht','gsw','li','lb','ksh','it','is','hu','hsb','fo','hr','grc','gl','ga','fy','frr','fr','zea','pt-br')  AND term_type IN ('label','description')

There are a few things we could do here:

  • Remove personal language fallback chains, as they are hard to get right (especially regarding caching) and are not used consistently.
  • Only take the first n languages into account. That would make the query cheaper, but could lead to strange behavior: for example, a label that exists only in a later language of the chain would no longer be found.
  • Start by trying to load labels in the first n languages; if no label is found in any of them, run a second query for the next batch of n languages. This would sometimes mean two (or even more) queries instead of one, and it would make the logic in TermSqlIndex more complex.
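
To make the third option more concrete, here is a rough sketch of the batching logic. It is written in Python purely for illustration and is not the actual TermSqlIndex code: `fetch_labels_in_batches`, `fetch_terms`, and `batch_size` are hypothetical names, and `fetch_terms` stands in for whatever runs the `wb_terms` query for a given set of entity IDs and languages.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Set, Tuple

# Hypothetical callback: given entity IDs and language codes, return
# {entity_id: {language: label}} for the labels that exist.
FetchTerms = Callable[[Set[str], Sequence[str]], Dict[str, Dict[str, str]]]


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_labels_in_batches(
    entity_ids: Iterable[str],
    fallback_chain: Sequence[str],
    fetch_terms: FetchTerms,
    batch_size: int = 10,  # assumed value of "n"; not taken from the task
) -> Dict[str, Tuple[str, str]]:
    """Resolve one label per entity, querying n languages at a time.

    Later batches are only queried for entities that are still unresolved,
    so the common case stays at a single, small query.
    """
    resolved: Dict[str, Tuple[str, str]] = {}
    pending: Set[str] = set(entity_ids)

    for batch in chunked(list(fallback_chain), batch_size):
        if not pending:
            break  # every entity already has a label
        found = fetch_terms(set(pending), batch)
        for entity_id in list(pending):
            labels = found.get(entity_id, {})
            # Respect fallback order within the batch.
            for language in batch:
                if language in labels:
                    resolved[entity_id] = (language, labels[language])
                    pending.discard(entity_id)
                    break

    return resolved
```

Entities that have no label in any language of the chain are simply missing from the result. The trade-off is the one described above: each query has a short `term_language IN (...)` list, at the cost of extra round trips when the first batch does not cover every entity.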

Event Timeline

hoo raised the priority of this task from to High.
hoo updated the task description.
hoo added subscribers: hoo, Lydia_Pintscher, daniel.
hoo set Security to None.