
Gracefully handle jumbo Wikidata entities and minimize crashes
Closed, ResolvedPublic

Description

Description

When the orchestrator fetches an extremely large Wikidata entity, it can cause out-of-memory or timeout conditions. In the long run, Wikifunctions will be able to handle entities of any size, but in the meantime we need to handle these situations more gracefully. Some possible steps to take:

  1. Determine whether a further increase in memory allocation will take care of things. (See discussion in T400515.)
  2. Provide a feature flag to control whether qualifiers and references are imported: T402357.
  3. The feature flag (now implemented) could be extended to allow skipping statements whose values have "datatype": "external-id". There are many such statements, and they are relatively unlikely to be used. T405991
  4. Provide an entity fetch function that accepts optimization arguments: T382921.
  5. When we retrieve Wikidata JSON for an entity, check its size.
    1. Proposal for how to measure size: count the number of snaks in the entity's statements list.
  6. If the size is over a particular threshold, return an error that indicates we aren't importing it due to size.
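Steps 5 and 6 could be sketched roughly as follows. This is an illustrative sketch only, not the orchestrator's actual code: the function names, the error message, and the `MAX_SNAKS` threshold are all assumptions, and a real limit would need to be tuned empirically. It counts every snak in the entity's statements (main snaks, qualifier snaks, and reference snaks, following the Wikibase JSON layout) and refuses the import when the total exceeds the threshold.

```javascript
// Hypothetical size check for a fetched Wikidata entity, per steps 5–6.
// Counts all snaks in the entity's claims: each statement contributes its
// main snak, its qualifier snaks, and the snaks in each of its references.
function countSnaks( entity ) {
	let count = 0;
	for ( const statements of Object.values( entity.claims || {} ) ) {
		for ( const statement of statements ) {
			if ( statement.mainsnak ) {
				count++;
			}
			for ( const snaks of Object.values( statement.qualifiers || {} ) ) {
				count += snaks.length;
			}
			for ( const reference of statement.references || [] ) {
				for ( const snaks of Object.values( reference.snaks || {} ) ) {
					count += snaks.length;
				}
			}
		}
	}
	return count;
}

// Illustrative threshold; not a value from the task.
const MAX_SNAKS = 5000;

function checkEntitySize( entity ) {
	const size = countSnaks( entity );
	if ( size > MAX_SNAKS ) {
		// Step 6: signal that the entity is too large to import.
		throw new Error(
			'Entity ' + entity.id + ' has ' + size +
			' snaks; not importing (limit ' + MAX_SNAKS + ')'
		);
	}
	return size;
}

// Minimal example: 1 main snak + 1 qualifier snak + 1 reference snak = 3.
const entity = {
	id: 'Q42',
	claims: {
		P31: [ {
			mainsnak: { snaktype: 'value', property: 'P31' },
			qualifiers: { P580: [ { snaktype: 'value', property: 'P580' } ] },
			references: [ { snaks: { P248: [ { snaktype: 'value', property: 'P248' } ] } } ]
		} ]
	}
};

console.log( checkEntitySize( entity ) ); // 3
```

Counting snaks rather than raw JSON bytes has the advantage of measuring the work the orchestrator actually has to do per statement, independent of label or description verbosity.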

Desired behavior/Acceptance criteria (returned value, expected error, performance expectations, etc.)

  • In the near- to medium-term, avoid out of memory and timeout errors caused by fetching large Wikidata entities, and provide an alternate path by which users can get the content they need from those large entities.

Completion checklist

Event Timeline

DMartin-WMF renamed this task from Gracefully handle & return an error when a fetched Wikidata entity is too large to Gracefully handle jumbo Wikidata entities and minimize crashes.Aug 21 2025, 6:10 AM
DMartin-WMF updated the task description.
DMartin-WMF claimed this task.

Closing because most of the listed ideas now have separate Phab tickets (two of which have been completed), as noted in the Description. Ideas 5 and 6 also no longer seem very compelling, mainly because when a fetched Wikidata entity is too large, it usually produces a recognizable error condition that is reported back to the user (for example, "time limit exceeded").