Description
Overview:
We can greatly reduce the bandwidth and processing needed for calls that fetch Wikidata entities if we provide parameters allowing those calls to specify
- (1) which content is needed (labels, aliases, descriptions, and/or statements)
- (2) which languages are of interest in labels, aliases, and descriptions
- (3) which properties are of interest in statements
(1) and (2) can be passed through to Wikidata's wbgetentities API, which already provides those filters through its props and languages parameters (as documented here; statements are selected with props=claims). wbgetentities offers no per-property filter, but filtering for (3) can easily be implemented in the orchestrator.
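To illustrate how little work (3) requires, here is a minimal Python sketch of a post-fetch statement filter. It assumes the standard wbgetentities JSON shape, in which each entity's statements sit under "claims", keyed by property ID (e.g. "P31"). The function name filter_statements is hypothetical and this is not the orchestrator's actual code; it only handles top-level statements, so statements nested in lexeme senses or forms would need similar treatment.

```python
from typing import Any


def filter_statements(response: dict[str, Any], properties: list[str]) -> dict[str, Any]:
    """Keep only statements whose property ID is in `properties`.

    `response` is a parsed wbgetentities JSON response. An empty `properties`
    list means "keep every statement", so the response is returned unchanged.
    The input is not mutated; each entity is shallow-copied before filtering.
    """
    if not properties:
        return response
    wanted = set(properties)
    entities = {}
    for entity_id, entity in response.get("entities", {}).items():
        entity = dict(entity)  # shallow copy of this entity
        if "claims" in entity:
            # Top-level statements only; lexeme senses/forms have their own "claims".
            entity["claims"] = {
                pid: statements
                for pid, statements in entity["claims"].items()
                if pid in wanted
            }
        entities[entity_id] = entity
    return {**response, "entities": entities}


if __name__ == "__main__":
    sample = {
        "entities": {
            "Q42": {"id": "Q42", "claims": {"P31": [{"rank": "normal"}], "P569": [{"rank": "normal"}]}}
        },
        "success": 1,
    }
    print(sorted(filter_statements(sample, ["P31"])["entities"]["Q42"]["claims"]))
    # ['P31']
```

Because this filter runs after the fetch, it does not reduce the JSON transferred from Wikidata; its savings come from the smaller ZObject produced from the filtered response.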
Update, 17 August 2025: rather than adding parameters to Z6821/2/4/5/6, as originally proposed here, we could just add them to Z6820/Fetch Wikidata entities (which probably didn't exist when this ticket was first written). Z6820 has been used in only 2 implementations so far, so the impact on existing Wikifunctions would be negligible, and those implementations could easily be updated. Z6820 could then be wrapped in various ways to create single-parameter functions such as "Fetch full Wikidata entities", "Fetch entity statements <entity>", and "Fetch entity labels <entity>", or 2-parameter functions such as "Fetch entity content in languages <entity> <list of languages>" and "Fetch entity statements with properties <entity> <list of property references>".
This would involve 3 new arguments for Z6820: (1) could be a typed list of a new enumeration type whose values are labels, aliases, descriptions, statements, etc.; (2) could be a typed list of Z60/Natural language; and (3) could be a typed list of Z6092/Wikidata property reference. Passing an empty list for any of these would indicate that everything governed by that parameter should be fetched.
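The following Python sketch shows one way arguments (1) and (2) might translate into a wbgetentities request, including the empty-list convention. The EntityContent enumeration, the function names, and the assumption that Z60/Natural language values have already been resolved to language codes such as "en" are all hypothetical. Argument (3) is deliberately absent from the query because it would be applied after the fetch. Omitting props makes Wikidata return its default content set, which is how an empty list for (1) could map to "fetch everything".

```python
from enum import Enum
from urllib.parse import urlencode


class EntityContent(Enum):
    """Hypothetical stand-in for the proposed enumeration type.

    Each value is the matching wbgetentities "props" value; note that
    statements are requested as "claims".
    """
    LABELS = "labels"
    ALIASES = "aliases"
    DESCRIPTIONS = "descriptions"
    STATEMENTS = "claims"


def build_wbgetentities_query(
    entity_ids: list[str],
    content: list[EntityContent],
    languages: list[str],
) -> str:
    """Build a wbgetentities query string from arguments (1) and (2).

    An empty list means "no restriction": the corresponding parameter is
    omitted, so Wikidata falls back to its default of returning everything.
    Argument (3), the property list, is not sent; it is applied after the fetch.
    """
    params = {"action": "wbgetentities", "ids": "|".join(entity_ids), "format": "json"}
    if content:
        params["props"] = "|".join(c.value for c in content)
    if languages:
        # Assumes Z60/Natural language values were already resolved to codes like "en".
        params["languages"] = "|".join(languages)
    return urlencode(params)


def fetch_entity_labels_query(entity_id: str) -> str:
    """Hypothetical single-parameter wrapper, like "Fetch entity labels <entity>"."""
    return build_wbgetentities_query([entity_id], [EntityContent.LABELS], [])


if __name__ == "__main__":
    print(build_wbgetentities_query(["Q42"], [EntityContent.LABELS, EntityContent.DESCRIPTIONS], ["en", "fr"]))
    # action=wbgetentities&ids=Q42&format=json&props=labels%7Cdescriptions&languages=en%7Cfr
```

fetch_entity_labels_query illustrates the wrapping idea: the wrapper fixes some arguments and exposes only the entity, leaving the language list empty so that labels in all languages are returned.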
Benefits of using these parameters:
- The JSON returned from Wikidata can be far smaller, greatly reducing the bandwidth used for fetches.
- The resulting ZObject can also be much smaller, eliminating some orchestrator processing.
- If the resulting ZObject is passed to another Wikifunction that needs to sift through labels/aliases/descriptions, the much-smaller size of those lists will also help with the performance of that sifting.
- The smaller ZObjects will also mean less processing in WikiLambda for presenting them, and will be somewhat easier to browse in the UI.
Concerns
- Getting the benefits requires some user knowledge & effort, in specifying the values of the optimization parameters.
- When a fetch function is called in a composition and it's desirable to pass these parameters to it, the calling function might sometimes need the same parameters added to it as well.
Background:
- Wikidata Items typically have labels in many languages, aliases in many languages, and descriptions in many languages.
- Many calls to our fetch functions will be made for a specific language-generation purpose, in which the target language is known in advance.
- Similarly, many calls will be made in a context in which the value of a particular statement is needed, and the statement property is known in advance.
- Some known performance problems related to Wikidata fetches (such as the problem underlying T378414) are caused by text snippets in many different languages (such as 90+ glosses in L1).
- In addition to helping with current performance problems, these changes will also provide substantial network & processing savings over time.
Originally this ticket contemplated adding these parameters to Z6821/2/4/5/6, but now it makes more sense to add them to Z6820.
Completion checklist
- Before closing this task, review one by one the checklist available here: https://www.mediawiki.org/wiki/Abstract_Wikipedia_team/Definition_of_Done#Back-end_Task/Bug_completion_checklist