Description
zobjectUtils.js convertTableToJson has quite a lot of complexity, and it's used extensively. In particular:
- Every time we need to get the type of an object by rowId, we call convertTableToJson on the sub-zobject: this is called many times
- When we use the function call widget, we convert the metadata to a table and then convert it back to JSON: this is called once, but it's an unnecessary round trip
We should audit the usage of this and limit it to the minimum.
Additionally, we should improve the performance of convertTableToJson so that it's not so costly on the UI.
Root cause
Why are these methods so costly?
Building a JSON representation of the flattened zobject table representation is costly when the object is very large.
I can identify two main reasons:
- Row ids are different from the index that the row takes in the array.
  - As a result, getRowById requires walking the whole table, row by row, searching for an id match.
  - getRowById is a core operation performed again and again, and each call is an O(n) scan.
- Children rows point at their parent, but there's no reverse index.
  - As a result, getChildrenByParentRowId requires walking the whole table, row by row, collecting every row whose parent matches the given id.
  - getChildrenByParentRowId is likewise a core operation performed again and again, with an O(n) scan per call.
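The two scans above can be sketched as follows. This is a simplified illustration, not the actual zobjectUtils.js code; the row shape and helper signatures are assumptions:

```javascript
// Simplified sketch of the flattened zobject table: each row points at its
// parent by id, and ids are NOT array indexes, so direct lookup is impossible.
const table = [
	{ id: 0, key: undefined, value: 'object', parent: undefined },
	{ id: 1, key: 'Z1K1', value: 'Z6', parent: 0 },
	{ id: 2, key: 'Z6K1', value: 'hello', parent: 0 }
];

// O(n): Array.prototype.find walks the table row by row until the id matches.
function getRowById( rows, rowId ) {
	return rows.find( ( row ) => row.id === rowId );
}

// O(n): filter always walks the entire table, even after all children are found.
function getChildrenByParentRowId( rows, parentRowId ) {
	return rows.filter( ( row ) => row.parent === parentRowId );
}

console.log( getRowById( table, 2 ).value ); // 'hello'
console.log( getChildrenByParentRowId( table, 0 ).length ); // 2
```

Because convertTableToJson calls these helpers once per row while rebuilding the tree, the overall cost grows quadratically with the table size.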
Ideas:
- Don't fall back to convertTableToJson so often when we do getZObjectTypeByRowId:
  - getZObjectTypeByRowId is a widely used method; we should simplify it as much as possible and only fall back to building JSON in the cases that require it
  - ✅ done in patch https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/1083205/3
- Don't transform Z22K2/metadata into a flat table, because for the Metadata dialog we transform it back to JSON again
  - Not only does the two-way transformation take a lot of computation time, it also doubles the size in memory and considerably worsens performance
  - ✅ done in patch https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/1085375/5 (very significant improvement in the function evaluator)
- Transform the flat zobject table array into an object indexed by row id
  - This should improve row access: we would read rows directly without having to do zobject.find()
  - This should also improve operations like deletion
  - ❌ tried this; observed no improvement. JavaScript arrays are much more efficient than objects, and the application becomes heavier in memory and slower.
- Create a reverse index from parent -> children rows
  - This should improve performance when finding the children of a row (for example, when deleting a sub-tree)
  - ❌ tried this; the application becomes very heavy in memory.
- Keep row ids and array indexes in sync
  - This would allow row access by simply doing zobject[ rowId ] without having to do zobject.find()
  - But this means that rows cannot be deleted; they would need to be nullified or marked as destroyed
  - Extremely complex, and nullified rows end up adding a lot of load in terms of size (mostly on the Function orchestrator)
- Namespace ZObject tables: T391136: [Stretch] Namespace ZObject tables to remove the extra complexity of deleting Detached objects
- Other mitigation strategies are captured in T390560: Improve the performance in the Wikifunctions front-end, so that function creators and users can use complex and large Objects such as those from Wikidata without difficulty, and its subtasks
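For reference, the two index-based ideas above (both tried and rejected, per the notes) can be sketched like this. Names and row shape are illustrative only, not the actual WikiLambda implementation:

```javascript
// Sketch of an id -> row Map plus a parent -> children reverse index.
// Lookups become O(1), but both structures must be rebuilt or patched on
// every insert and delete, and they add a second copy of the bookkeeping
// in memory, which matches the regressions observed above.
function buildIndexes( rows ) {
	const byId = new Map();
	const childrenByParent = new Map();
	for ( const row of rows ) {
		byId.set( row.id, row );
		if ( row.parent !== undefined ) {
			const siblings = childrenByParent.get( row.parent ) || [];
			siblings.push( row );
			childrenByParent.set( row.parent, siblings );
		}
	}
	return { byId, childrenByParent };
}

const rows = [
	{ id: 0, key: undefined, value: 'object', parent: undefined },
	{ id: 1, key: 'Z1K1', value: 'Z6', parent: 0 },
	{ id: 2, key: 'Z6K1', value: 'hello', parent: 0 }
];
const { byId, childrenByParent } = buildIndexes( rows );
console.log( byId.get( 2 ).value ); // 'hello'
console.log( childrenByParent.get( 0 ).length ); // 2
```

The trade-off is that fast reads are paid for with extra memory and index-maintenance work on every mutation, which is where the observed heaviness came from.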
Test cases
(added 2025-01-23 by @DMartin-WMF)
The UI performance can be stress-tested by fetching sizable Wikidata items (as described below). For example, two large Wikidata items that the UI is currently unacceptably slow to present are Q144 / dog and Q64 / Berlin. I have seen both of these display successfully, after several minutes of waiting and telling Chrome to continue waiting; but more often, after some waiting, Chrome shows me an Out of Memory message.
To observe a quick, acceptable response time, try fetching a small item such as Q302556 / Catholic Encyclopedia. Q42 / Douglas Adams is a moderately large item, which usually shows up quickly for me in collapsed form, but expanding it by clicking the chevron can cause the browser to run out of memory.
Here are rough indications of the sizes of the three problematic items (by number of labels) and typical reported orchestrator processing times. In addition to the labels, these items have many descriptions, aliases, and statements.

| Item | Labels (approx.) | Orchestrator duration |
|------|------------------|-----------------------|
| Q42 / Douglas Adams | 69 | 4134 ms |
| Q144 / dog | 300 | 6483 ms |
| Q64 / Berlin | 270 | 7017 ms |
To fetch and attempt to display a Wikidata item:
- Visit Z6821
- In Try this function / Enter inputs, type in the Q-ID (e.g. "Q42")
- When you see the item title (e.g., "Douglas Adams"), click on it
- Then click Run function
Completion checklist
- Before closing this task, review one by one the checklist available here: https://www.mediawiki.org/wiki/Abstract_Wikipedia_team/Definition_of_Done#Front-end_Task/Bug_Completion_Checklist