As part of the work implementing T385540 (allow lua functions to perform transform for data in charts), I'll be enhancing JsonConfig's remote fetching for Data: pages so that the fetched data can be run through a Lua function. This lets transform functions, their maintenance, and their execution live on the same wiki as the Data: pages themselves (Commons in production), while still allowing caching and rate limiting within the cluster.
Conceptual model:
- each JsonConfig-using wiki in the farm is either the 'store' wiki for a given type of data, where the pages in the Data: namespace actually "live", or a remote wiki that fetches the Data: pages through the standard MediaWiki action API and caches them in a shared cache.
- -> extend this so that a request can also specify a transform function to run the data through, by providing a custom query in the API
- client wikis use this rather than trying to run their own transforms
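A rough sketch of the model above, in Python purely for illustration (the real code would be PHP inside JsonConfig). Every class and method name here is made up, the transform is a plain Python function standing in for a Lua module, and keying the shared cache on (title, transform) is an assumption rather than a decided design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreWiki:
    """Hypothetical stand-in for the wiki where Data: pages live."""
    pages: dict        # title -> raw data
    transforms: dict   # name -> callable, standing in for Lua modules
    api_calls: int = 0

    def api_fetch(self, title: str, transform: Optional[str] = None) -> dict:
        # Stand-in for an action API request handled on the store wiki.
        self.api_calls += 1
        data = self.pages[title]
        if transform is not None:
            # The transform runs here, on the store, not on the client.
            data = self.transforms[transform](data)
        return data


@dataclass
class RemoteWiki:
    """Hypothetical client wiki that fetches via the API and caches."""
    store: StoreWiki
    cache: dict        # shared between all remote wikis

    def get_content(self, title: str, transform: Optional[str] = None) -> dict:
        # Assumption: the transform is part of the cache key, so raw and
        # transformed results do not collide.
        key = (title, transform)
        if key not in self.cache:
            self.cache[key] = self.store.api_fetch(title, transform)
        return self.cache[key]


store = StoreWiki(
    pages={"Example.tab": {"rows": [1, 2, 3]}},
    transforms={"double": lambda d: {"rows": [x * 2 for x in d["rows"]]}},
)
shared_cache = {}
wiki_a = RemoteWiki(store, shared_cache)
wiki_b = RemoteWiki(store, shared_cache)
wiki_a.get_content("Example.tab", "double")  # goes to the store
wiki_b.get_content("Example.tab", "double")  # served from the shared cache
```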
Security considerations:
- we're running code, yay! But the same code can already be run by asking the API to parse wikitext containing {{#invoke:}}, in any of several different ways.
- -> any abuse mitigations that apply to parsing (performance counters, rate limiters, etc) may need to be replicated here
Data considerations:
- to track dependencies for cache invalidation, it may be wise to pass through information from a ParserOutput; any additional pages used by the Lua code will appear in the template links there and can be passed through along with the Module: page itself.
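To illustrate the dependency idea, here is a minimal Python sketch of a reverse index from pages to the cached results that depend on them. It is not how JsonConfig or ParserOutput actually store this; the class, its methods, and the page titles are all hypothetical.

```python
from collections import defaultdict


class DependencyIndex:
    """Hypothetical cache that remembers which pages each entry used."""

    def __init__(self):
        self.cache = {}
        self.dependents = defaultdict(set)  # page title -> cache keys

    def store(self, key, value, depends_on):
        # depends_on would come from the transform run: the Module: page
        # itself plus any other pages the Lua code loaded.
        self.cache[key] = value
        for page in depends_on:
            self.dependents[page].add(key)

    def invalidate(self, page):
        # Called when `page` is edited: purge every entry that used it.
        # Purged keys may still be listed under other pages; that is
        # harmless because a later purge of a missing key is a no-op.
        purged = self.dependents.pop(page, set())
        for key in purged:
            self.cache.pop(key, None)
        return purged


index = DependencyIndex()
index.store(
    key=("Example.tab", "Module:Example"),
    value={"rows": [2, 4, 6]},
    depends_on=["Data:Example.tab", "Module:Example", "Data:Other.tab"],
)
```

An edit to any of the three listed pages would then drop the cached transform result, not only an edit to the Data: page that was requested.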
Details:
- where does the API query belong?
  - recommend hooking into the existing action=jsondata
    - example read: https://commons.wikimedia.org/w/api.php?action=jsondata&formatversion=2&format=jsonfm&title=1993_Canadian_federal_election.tab
  - be sure to avoid infinite loopback from misconfigured remote stores!
  - put the filter fetch on JCCache as a pass-through option from JCSingleton::getContent()
- how to pass options?
  - language for string localization can be selected via uselang
  - as a single blob, transform={blah}, with URL-encoded JSON?
  - or as separate parameters, transform=module&transformfunction=func&transformargs={...} etc?
    - we probably still want arbitrary JSON for the args either way, so meh
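To compare the two option-passing shapes side by side, here is a small Python sketch that builds both query strings. The parameter names transform, transformfunction and transformargs are the proposals from the list above, not an existing API; the keys inside the blob ("module", "function", "args") and the module name are assumptions made up for the example.

```python
import json
from urllib.parse import parse_qs, urlencode

# Parameters taken from the existing example read URL.
BASE = {
    "action": "jsondata",
    "format": "json",
    "formatversion": "2",
    "title": "1993_Canadian_federal_election.tab",
}


def blob_query(module: str, function: str, args: dict) -> str:
    """Option 1: everything in one URL-encoded JSON blob."""
    blob = json.dumps(
        {"module": module, "function": function, "args": args},  # assumed keys
        separators=(",", ":"),
    )
    return urlencode({**BASE, "transform": blob})


def split_query(module: str, function: str, args: dict) -> str:
    """Option 2: separate parameters, with JSON only for the args."""
    return urlencode({
        **BASE,
        "transform": module,
        "transformfunction": function,
        "transformargs": json.dumps(args, separators=(",", ":")),
    })


example_args = {"column": "votes", "top": 3}  # hypothetical transform args
print(blob_query("Module:Example", "main", example_args))
print(split_query("Module:Example", "main", example_args))
```

Either way the args end up as URL-encoded JSON, so the split form mostly buys readable module and function names in logs and URLs, while the blob keeps the API surface to a single parameter.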
The next step will be to attach this internally to JCSingleton::getContent(), either by extending it or by creating a side interface that takes the transform options, and have it fetch across wikis as well as run locally.
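One possible shape for the local/remote split, including a guard against the loopback case noted under Details, sketched in Python for illustration only. None of these names exist in JsonConfig; the assumption is simply that the API entry point only ever serves locally stored data and never forwards a request onward.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class LoopbackError(RuntimeError):
    """Raised when a remote store turns out not to be a store at all."""


@dataclass
class Wiki:
    name: str
    is_store: bool
    remote: Optional["Wiki"] = None          # configured store, if any
    pages: dict = field(default_factory=dict)

    def get_content(self, title: str,
                    transform: Optional[Callable[[dict], dict]] = None) -> dict:
        # Internal entry point: run locally on the store, otherwise fetch.
        if self.is_store:
            return self._run_local(title, transform)
        if self.remote is None:
            raise LookupError(f"{self.name} has no remote store configured")
        return self.remote.handle_api_request(title, transform)

    def handle_api_request(self, title: str,
                           transform: Optional[Callable[[dict], dict]] = None) -> dict:
        # API entry point: serve only local data and never forward, so a
        # misconfigured chain fails fast instead of looping.
        if not self.is_store:
            raise LoopbackError(f"{self.name} is not a store for this data")
        return self._run_local(title, transform)

    def _run_local(self, title: str,
                   transform: Optional[Callable[[dict], dict]]) -> dict:
        data = self.pages[title]
        return transform(data) if transform is not None else data


commons = Wiki("commons", is_store=True, pages={"Example.tab": {"rows": [1, 2]}})
client = Wiki("client", is_store=False, remote=commons)
row_count = client.get_content("Example.tab", lambda d: {"n": len(d["rows"])})
```

A wiki whose remote points back at itself, or at another non-store wiki, gets a LoopbackError on the first hop rather than recursing.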





