Add optimization parameters to Z6820/Fetch Wikidata entities
Closed, ResolvedPublic

Description

Overview:
We can greatly reduce the bandwidth and processing needed for calls that fetch Wikidata entities if we provide parameters that let those calls specify:

  1. which content is needed (labels, aliases, descriptions and/or statements)
  2. which languages are of interest in labels, aliases, and descriptions
  3. which properties are of interest in statements.

(1) and (2) can be passed through to Wikidata's wbgetentities API, which already provides those filters through its props and languages parameters. Filtering for (3) can easily be implemented in the orchestrator.
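The split described above can be sketched as follows. This is only an illustration in Python, not the orchestrator's code: the helper names build_wbgetentities_url and filter_claims are hypothetical, and the entity is assumed to have the usual wbgetentities JSON shape, with statements stored under a "claims" key keyed by property ID.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_wbgetentities_url(entity_ids, props=(), languages=()):
    """Build a wbgetentities URL (hypothetical helper).

    An empty props or languages sequence leaves that filter out,
    so the API applies its default for it.
    """
    params = {
        "action": "wbgetentities",
        "format": "json",
        "ids": "|".join(entity_ids),
    }
    if props:
        params["props"] = "|".join(props)
    if languages:
        params["languages"] = "|".join(languages)
    return WIKIDATA_API + "?" + urlencode(params)


def filter_claims(entity, property_ids=()):
    """Keep only statements for the listed properties (hypothetical helper).

    This is the part wbgetentities does not do for us. An empty
    property list returns the entity unchanged.
    """
    if not property_ids:
        return entity
    wanted = set(property_ids)
    claims = entity.get("claims", {})
    filtered = {pid: stmts for pid, stmts in claims.items() if pid in wanted}
    return {**entity, "claims": filtered}
```

A caller interested only in English labels and the statements for one property would request props "labels" and "claims" with language "en", then pass each returned entity through filter_claims.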

Update, 17 August 2025: rather than adding parameters to Z6821/2/4/5/6, as originally proposed here, we could just add them to Z6820/Fetch Wikidata entities (which probably didn't exist when this ticket was first written). (Z6820 has only been used in 2 implementations so far, so the impact on existing Wikifunctions would be negligible, and those implementations could easily be updated.) Then Z6820 could also be wrapped in various ways to create single-parameter functions like "Fetch full Wikidata entities", "Fetch entity statements <entity>", "Fetch entity labels <entity>", or 2-parameter functions like "Fetch entity content in languages <entity> <list of languages>", "Fetch entity statements with properties <entity> <list of property references>", etc.

This would involve 3 new arguments for Z6820: (1) could be a new enumeration type for labels, aliases, descriptions, etc.; (2) could be a typed list of Z60/Natural language; (3) could be a typed list of Z6092/Wikidata property reference. Passing an empty list for any of these would mean that everything governed by that parameter should be fetched.
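The "empty list means everything" convention for the three arguments could look roughly like this. It is a sketch only: EntityContent and FetchOptions are hypothetical names, plain language codes stand in for Z60 objects, and plain property IDs stand in for Z6092 references.

```python
from dataclasses import dataclass
from enum import Enum


class EntityContent(Enum):
    """Hypothetical enumeration for argument (1)."""
    LABELS = "labels"
    ALIASES = "aliases"
    DESCRIPTIONS = "descriptions"
    STATEMENTS = "statements"


@dataclass(frozen=True)
class FetchOptions:
    """Hypothetical bundle of the three proposed arguments.

    Each field defaults to an empty tuple, which is read as
    "no restriction" for that field.
    """
    content: tuple = ()      # EntityContent values
    languages: tuple = ()    # language codes standing in for Z60
    properties: tuple = ()   # property IDs standing in for Z6092

    def wants(self, kind):
        return not self.content or kind in self.content

    def wants_language(self, code):
        return not self.languages or code in self.languages

    def wants_property(self, property_id):
        return not self.properties or property_id in self.properties
```

With these defaults, FetchOptions() behaves like the current unfiltered fetch, while a wrapper such as "Fetch entity labels" would amount to fixing content to (EntityContent.LABELS,).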

Benefits of using these parameters:

  • The JSON returned from Wikidata can be far smaller, greatly reducing the bandwidth used for fetches.
  • The resulting ZObject can also be much smaller, eliminating some orchestrator processing.
  • If the resulting ZObject is passed to another Wikifunction that needs to sift through labels/aliases/descriptions, the much-smaller size of those lists will also help with the performance of that sifting.
  • The smaller ZObjects will also mean less processing in WikiLambda for presenting them, and will be somewhat easier to browse in the UI.

Concerns:

  • Getting the benefits requires some user knowledge & effort, in specifying the values of the optimization parameters.
  • When a fetch function is called in a composition, and it's desirable to pass these parameters to the fetch function, that might sometimes necessitate adding the parameters to the calling function.

Background:

  • Wikidata Items typically have labels in many languages, aliases in many languages, and descriptions in many languages.
  • Many calls to our fetch functions will be made for a specific language-generation purpose, in which the target language is known in advance.
  • Similarly, many calls will be made in a context in which the value of a particular statement is needed, and the statement property is known in advance.
  • Some known performance problems related to Wikidata fetches (such as the problem underlying T378414) are caused by text snippets in many different languages (such as 90+ glosses in L1).
  • In addition to helping with current performance problems, these changes will also provide substantial network & processing savings over time.

Originally this ticket contemplated adding these parameters to Z6821/2/4/5/6, but now it makes more sense to add them to Z6820.

Event Timeline

I think the most useful of these is the language one. Perhaps we could have a function that doesn't fetch any information for languages not in a list?

DMartin-WMF renamed this task from "Consider adding optimization parameters to Z6821/Fetch Wikidata item" to "Consider adding optimization parameters to Z6821 and other Wikidata fetch functions". Mar 4 2025, 2:22 AM

Community members have been expressing a need for this. See Telegram Abstract Wikipedia channel, April 19; e.g.

The performance [of Z24041 and similar functions] is pretty scary, but the caching should make it bearable until we have more selective fetches from Wikidata. It’s not necessarily impossible, but I think displaying a date like “Saturday April 19th” would probably fail (currently) even when both Saturday and April are cached results. ...
One day we will not have to fetch the entire Wikidata item to get the labels in Z23754.
DMartin-WMF renamed this task from "Consider adding optimization parameters to Z6821 and other Wikidata fetch functions" to "Consider adding optimization parameters to Z6820/Fetch Wikidata entities". Aug 18 2025, 2:18 AM
DMartin-WMF updated the task description.
DMartin-WMF renamed this task from "Consider adding optimization parameters to Z6820/Fetch Wikidata entities" to "Add optimization parameters to Z6820/Fetch Wikidata entities". Oct 24 2025, 5:15 AM

Change #1211757 had a related patch set uploaded (by Cory Massaro; author: Cory Massaro):

[mediawiki/extensions/WikiLambda@master] Update function-schemata sub-module to HEAD (6ab584d)

https://gerrit.wikimedia.org/r/1211757

Change #1211875 had a related patch set uploaded (by Cory Massaro; author: Cory Massaro):

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from 2025-11-18-175356 to 2025-11-26-175208

https://gerrit.wikimedia.org/r/1211875

Change #1211875 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from 2025-11-18-175356 to 2025-11-26-175208

https://gerrit.wikimedia.org/r/1211875

Change #1211757 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Update function-schemata sub-module to HEAD (6ab584d)

https://gerrit.wikimedia.org/r/1211757

Change #1217214 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-12-03-005631 to 2025-12-08-185405

https://gerrit.wikimedia.org/r/1217214

Change #1217214 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-12-03-005631 to 2025-12-08-185405

https://gerrit.wikimedia.org/r/1217214