Description
Fetching an extremely large Wikidata entity can cause the orchestrator to run out of memory or time out. In the long run, Wikifunctions will be able to handle entities of any size, but in the meantime we need to handle these situations more gracefully. Some possible steps to take:
- Determine whether a further increase in memory allocation will resolve these failures. (See discussion in T400515.)
- Provide a feature flag to control whether qualifiers and references are imported: T402357.
- The feature flag (now implemented) could be extended to provide a way of skipping statements whose main snaks have "datatype": "external-id". There are many such statements, and they are relatively unlikely to be used. T405991 (See the filtering sketch after this list.)
- Provide an entity fetch function that includes optimization arguments: T382921.
- When we retrieve Wikidata JSON for an entity, check its size.
- Proposal for how to measure size: count the number of snaks in the entity's statements.
- If the count is over a given threshold, return an error indicating that we are not importing the entity due to its size. (See the size-check sketch after this list.)
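
As a rough illustration of the external-id filtering idea above, here is a minimal TypeScript sketch. It assumes the standard Wikibase claims JSON shape; the function name and the choice to filter on the main snak's datatype are illustrative assumptions, not an agreed design:

```typescript
// Minimal shapes for the parts of Wikibase statement JSON used here.
interface Snak {
  datatype?: string;
}

interface Statement {
  mainsnak: Snak;
}

type Claims = Record<string, Statement[]>;

// Drop statements whose main snak carries an external identifier.
// Hypothetical helper; the real flag plumbing belongs to T402357/T405991.
function dropExternalIdStatements(claims: Claims): Claims {
  const filtered: Claims = {};
  for (const [property, statements] of Object.entries(claims)) {
    const kept = statements.filter(
      (statement) => statement.mainsnak.datatype !== 'external-id'
    );
    if (kept.length > 0) {
      filtered[property] = kept;
    }
  }
  return filtered;
}
```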
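
And a minimal sketch of the size check itself, assuming "number of snaks" means main, qualifier, and reference snaks taken together. The threshold value, function names, and error shape are placeholders to be settled as part of this task:

```typescript
// Minimal shapes for the parts of Wikibase entity JSON used here.
interface Snak {
  datatype?: string;
}

interface Reference {
  snaks: Record<string, Snak[]>;
}

interface Statement {
  mainsnak: Snak;
  qualifiers?: Record<string, Snak[]>;
  references?: Reference[];
}

interface Entity {
  id: string;
  claims: Record<string, Statement[]>;
}

// Hypothetical limit; the real value would have to be benchmarked.
const MAX_SNAKS = 5000;

// Count every snak in the entity's statements: the main snak plus
// any qualifier snaks and reference snaks.
function countSnaks(entity: Entity): number {
  let total = 0;
  for (const statements of Object.values(entity.claims)) {
    for (const statement of statements) {
      total += 1; // main snak
      for (const snaks of Object.values(statement.qualifiers ?? {})) {
        total += snaks.length;
      }
      for (const reference of statement.references ?? []) {
        for (const snaks of Object.values(reference.snaks)) {
          total += snaks.length;
        }
      }
    }
  }
  return total;
}

// Reject an oversized entity before doing any further work on it.
function checkEntitySize(entity: Entity): void {
  const count = countSnaks(entity);
  if (count > MAX_SNAKS) {
    throw new Error(
      `Entity ${entity.id} has ${count} snaks, above the import ` +
      `limit of ${MAX_SNAKS}; not importing due to size.`
    );
  }
}
```

One appeal of counting snaks rather than raw JSON bytes is that it roughly tracks the per-statement work the orchestrator does, independent of how verbose individual values are; whether that is the right metric is part of what this task should settle.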
Desired behavior/Acceptance criteria (returned value, expected error, performance expectations, etc.)
- In the near to medium term, avoid out-of-memory and timeout errors caused by fetching large Wikidata entities, and provide an alternate path by which users can get the content they need from those large entities.
Completion checklist
- Before closing this task, review one by one the checklist available here: https://www.mediawiki.org/wiki/Abstract_Wikipedia_team/Definition_of_Done#Back-end_Task/Bug_completion_checklist