Page MenuHomePhabricator

Create a Scribunto interface for the Variables extension
Open, LowPublic20 Estimated Story PointsFeature

Description

Variables are often used in Scribunto Modules to keep persistent data between invocations and to allow communication with templates and other wikicode. Currently the only way to do this is to use something like mw.getCurrentFrame():callParserFunction( '#vardefine', { 'foo', 'bar' } ) (or a separate extension like VariablesLua).

I'd like to suggest adding a native Scribunto support to the extension itself to improve usability and performance when using variables from Lua.

The best way to implement this would probably be using metamethods instead of just creating bindings to the already existing functions. Some examples for the suggested syntax can be found below:

  • vars.foo = 'bar' instead of vars.setVarValue( 'foo', 'bar' )
  • vars.foo instead of vars.getVarValue( 'foo' )
  • vars.foo = nil instead of vars.unbindVar( 'foo' )

Additionally it would be useful to have access to the raw variable table ($mVariables).

Event Timeline

Litzsch updated the task description. (Show Details)
Litzsch lowered the priority of this task from Medium to Low.Aug 3 2018, 12:39 PM

@Aklapper: Oh sry I didn't realize the priority was for the assignee, but I'd be happy to work on this if there's a chance of it being added to the Extension.

Thanks for clarifying! Cannot 'judge the chance' so let's better wait for a reply by a maintainer :)

Variables are often used in Scribunto Modules to keep persistent data between invocations and to allow communication with templates and other wikicode. Currently the only way to do this is to use something like mw.getCurrentFrame():callParserFunction( '#vardefine', { 'foo', 'bar' } ) (or a separate extension like VariablesLua).

It is? I'm kind of surprised to her that, since Wikimedia for example decided against activating further parser functions (StringFunctions, Variables, Loops) and decided to go with Lua instead.

I'd like to suggest adding a native Scribunto support to the extension itself to improve usability and performance when using variables from Lua.

This sounds like an interesting idea, if your use case isn't that unusual. (Are there people besided the Liquipedia wiki farm who are using a combination of Scribunto and Variables?) However, I have no knowledge of Scribunto, Lua and conditional Hook registration (ScribuntoExternalLibraries) myself, so I might be unqualified to review your patch.

The best way to implement this would probably be using metamethods instead of just creating bindings to the already existing functions. Some examples for the suggested syntax can be found below:

vars.foo = 'bar' instead of vars.setVarValue( 'foo', 'bar' )
vars.foo instead of vars.getVarValue( 'foo' )
vars.foo = nil instead of vars.unbindVar( 'foo' )

I think there should be something like local vars = mw.ext.Variables.vars needed before?

Additionally it would be useful to have access to the raw variable table ($mVariables).

I don't think the extension should allow to do something via Lua that it doesn't allow otherwise. This would be possible by implementing this proposal. (For example, unsetting variable keys.)

Yugipedia has both Variables and Scribunto installed, though we don't use them in combination currently, and I don't think we have any plans to do so either. I'd be curious to see one of the uses this issue was opened for, since to my knowledge, with careful module design you can completely eliminate the need for Variables. (I have kicked around the idea of porting Extension:Arrays to a module for a few years now, in which case Variables would be handy for saving arraystate between module invocations, but the "correct" way to do it would be to just reimplement the functionality that arrays are being used for, directly in a Lua module, meaning Variables still wouldn't be required.)

By the way, I just took a look at VariablesLua and there are two things that confuse me:

  1. I don't really understand what you're doint with $parser->getPreprocessor()->newFrame(). I don't have enough parser knowledge to definitely judge this, but Variables is using PPFrame to ensure the parser does nothing wrong if parsing a template multiple times. If you do something like this, weird bugs like the ones I fixed in T191574 might occur again. Shoudn't you use something related to mw.getCurrentFrame()?
  2. You use
if ( method_exists( 'ExtVariables', 'pf_varexists' ) ) {
  return [ ExtVariables::pf_varexists( $parser, ...$params ) ];
} else {
  return [ ExtVariables::pfObj_varexists( $parser, $parser->getPreprocessor()->newFrame(), $params ) ];
}

to make your extension work with the version of Variables before and after mentioned change. Given the nature of my change, you might get inconsistent data out of Variables, if you check it this way. You should check for pfObj_varexists first and use the old function as fallback.

Are there people besided the Liquipedia wiki farm who are using a combination of Scribunto and Variables?

It's also done regularly around the Gamepedia wiki farm (e.g. Leaguepedia, Minecraft Wiki, Dota 2 Wiki, …)

I think there should be something like local vars = mw.ext.Variables.vars needed before?

Yes, that would be needed at the top of Modules using the extension to actually load it. The examples I gave were just of a specific implementation (a table with metamethods similar to how frame.args works)

I don't think the extension should allow to do something via Lua that it doesn't allow otherwise. This would be possible by implementing this proposal. (For example, unsetting variable keys.)

  1. Unsetting variables is something I think makes sense in Lua, as there is an actual difference between nil and an empty string (so setting a variable to nil unsets it instead of setting an empty string as the value). This would be more in line with how Lua usually works. Setting something to nil and it storing an empty string just feels wrong imo.
  2. Access to the raw $mVariables table would allow implementing iteration over all variables, which imo would be useful to have but not a necessity. It would allow iteration over all variables and make debugging existing var values easier. It could also possibly be used to improve performance by loading the whole variables table only one time and working with the Lua table from there on when retrieving var values (this would create problems when used in combination with the callParserFunction method though so I'm not sure if the extra performance would be worth it).

with careful module design you can completely eliminate the need for Variables

I've looked around for quite a bit before creating this proposal and couldn't find a way to transfer data between invocations (and even templates), but should there be one it would obviously make this redundant.

I just took a look at VariablesLua and there are two things that confuse me:

  1. Passing a frame from mw.getCurrentFrame() doesn't work since it's not really a frame object. I'm aware of T191574 and am looking for a solution, but for now invokations through #invoke don't get cached anyways. The problem only exists when the #invoke is used in a template and that template gets cached.
  2. It's not my extension. I only fixed some bugs there (and implemented the var_table() function). I just referenced it to show there's a need for such an interface.

with careful module design you can completely eliminate the need for Variables

I've looked around for quite a bit before creating this proposal and couldn't find a way to transfer data between invocations (and even templates), but should there be one it would obviously make this redundant.

It was an intentional design choice on the part of Scribunto to disallow passing data between invocations, since that would have greatly increased the complexity of partial page parsing (e.g. editing a section and previewing; Parsoid also has the explicit goal of allowing for only part of a page to be reparsed on saving, again mainly for section editing). Any mechanism you find to do so which involves only vanilla MediaWiki and standard extensions (i.e. anything installed on WMF wikis) should be reported as a bug and not relied upon to continue working. For that matter, there is probably no guarantee that things such as Variables will continue to work into the indefinite future, but there at least, I am not aware of any plans to introduce breaking changes to MediaWiki.

The idea behind eliminating any need for Variables is to design your modules to work with a single invocation, instead of requiring multiple invocations per page. E.g. if you're currently using a module to build a table by outputting one table row per invocation, and passing data between rows/invocations with Variables, you can rewrite the module to instead generate the entire table in one invocation. Whether you actually follow through with this and to what degree depends on exactly what you're doing (e.g. if you're passing data between an infobox-like template at the top of a page and a navbox-like template at the bottom, you're probably not going to eliminate the need for Variables no matter what you do) and how closely you want to follow the philosophy set forth by the parser and Scribunto of no inter-invocation data passing.

I think there should be something like local vars = mw.ext.Variables.vars needed before?

Yes, that would be needed at the top of Modules using the extension to actually load it. The examples I gave were just of a specific implementation (a table with metamethods similar to how frame.args works)

I thought so, but I wanted to confirm my guess.

I don't think the extension should allow to do something via Lua that it doesn't allow otherwise. This would be possible by implementing this proposal. (For example, unsetting variable keys.)

  1. Unsetting variables is something I think makes sense in Lua, as there is an actual difference between nil and an empty string (so setting a variable to nil unsets it instead of setting an empty string as the value). This would be more in line with how Lua usually works. Setting something to nil and it storing an empty string just feels wrong imo.

Of course, but even though you can use wikitext to change a once defined variable to an empty string, #varexistswould still deliver 1. However, I'm not strictly opposed to allow unsetting Variables by Lua, given the upsides.

I just took a look at VariablesLua and there are two things that confuse me:

  1. Passing a frame from mw.getCurrentFrame() doesn't work since it's not really a frame object. I'm aware of T191574 and am looking for a solution, but for now invokations through #invoke don't get cached anyways. The problem only exists when the #invoke is used in a template and that template gets cached.
  2. It's not my extension. I only fixed some bugs there (and implemented the var_table() function). I just referenced it to show there's a need for such an interface.

I know it's just meant as a reference, but I think of it kind of as a blue print for what you are trying to do. Nevertheless, if you can in some way use similar hooks as if parsing Wikitext, this problem could be avoided. I don't want to introduce any new code that would create a loophole in which the problem handled in T191574 is happening again.

with careful module design you can completely eliminate the need for Variables

I've looked around for quite a bit before creating this proposal and couldn't find a way to transfer data between invocations (and even templates), but should there be one it would obviously make this redundant.

It was an intentional design choice on the part of Scribunto to disallow passing data between invocations, since that would have greatly increased the complexity of partial page parsing (e.g. editing a section and previewing; Parsoid also has the explicit goal of allowing for only part of a page to be reparsed on saving, again mainly for section editing). Any mechanism you find to do so which involves only vanilla MediaWiki and standard extensions (i.e. anything installed on WMF wikis) should be reported as a bug and not relied upon to continue working. For that matter, there is probably no guarantee that things such as Variables will continue to work into the indefinite future, but there at least, I am not aware of any plans to introduce breaking changes to MediaWiki.

Cite is an example of an extension that doesn't always work as it should if you parse only parts of a given page, as there can be named references defined in other sections. (Maybe it's even possible to abuse named refs as Variables.) To make this work without major problems, WMF introduced the setVolatile(); function in PPFrame, which I'm also using to prevent caching issues involving variables.

In general, I think this is a good idea and would like to see you work on it.

I've just pushed a possible patch to Gerrit. There's a small issue with indentation since I had my editor set to use two spaces instead of tabs that I'm not sure how to fix without uploading a separate patch. Is there any way to ammend/change my patch without a separate one?

The Lua interface is very simplistic for now and just provides bindings to the already existing public functions as well as the $mVariables table, though that could certainly be expanded on to be more Lua like.

Additionally I have not found a good solution for T191574 so I would appreciate some help in that regard if someone has an idea how to fix this

I can take a look on the content of your patch next week. Which tool are you using to push patches to gerrit? I recommend git-review. If you wait for the CodeSniffer patch to be merged (and rebase), it will help you to identify code style issues.

Change 452899 had a related patch set uploaded (by MGChecker; owner: Litzsch):
[mediawiki/extensions/Variables@master] Added a Scribunto Lua interface

https://gerrit.wikimedia.org/r/452899

Cite is an example of an extension that doesn't always work as it should if you parse only parts of a given page, as there can be named references defined in other sections.

Parsoid reimplements Cite in its own code to deal with that.

(Maybe it's even possible to abuse named refs as Variables.)

That was T63268: Abuse of Cite extension allows cross-invoke communication.

Change 452899 had a related patch set uploaded (by Litzsch; owner: Litzsch):
[mediawiki/extensions/Variables@master] Added a Scribunto Lua interface

https://gerrit.wikimedia.org/r/452899

I don't think the extension should allow to do something via Lua that it doesn't allow otherwise. This would be possible by implementing this proposal. (For example, unsetting variable keys.)

  1. Unsetting variables is something I think makes sense in Lua, as there is an actual difference between nil and an empty string (so setting a variable to nil unsets it instead of setting an empty string as the value). This would be more in line with how Lua usually works. Setting something to nil and it storing an empty string just feels wrong imo.
  2. Access to the raw $mVariables table would allow implementing iteration over all variables, which imo would be useful to have but not a necessity. It would allow iteration over all variables and make debugging existing var values easier. It could also possibly be used to improve performance by loading the whole variables table only one time and working with the Lua table from there on when retrieving var values (this would create problems when used in combination with the callParserFunction method though so I'm not sure if the extra performance would be worth it).

I put some additional thought into this and while I think unsetting should be no problem, I think you shouldn't be able to get a table of all Variables, (that this is possible in the first case is a bug resolved recently) as this could allow you to confuse extensions that are using variables internally to store their data during a parse. Mutiple unrelated uses of variables ideally shouldn't interfere with each other. However, you can add Lua functions like that get whole arrays of variables of an array of keys you put into the function.

Restricted Application changed the subtype of this task from "Deadline" to "Task". · View Herald TranscriptAug 26 2018, 2:29 PM
MGChecker set the point value for this task to 20.Oct 25 2018, 12:10 AM
MGChecker changed the subtype of this task from "Task" to "Feature Request".Mar 1 2019, 11:35 PM

@Litzsch: Hi, do you plan to improve the patch in Gerrit? Asking as you are set as task assignee. Thanks!

@Litzsch: I am resetting the assignee of this task because there has not been progress lately (please correct me if I am wrong!). Resetting the assignee avoids the impression that somebody is already working on this task. It also allows others to potentially work towards fixing this task. Please claim this task again when you plan to work on it (via Add Action...Assign / Claim in the dropdown menu) - it would be welcome. Thanks for your understanding!

Aklapper renamed this task from Create a Scribunto interface to Create a Scribunto interface for the Variables extension.May 17 2020, 10:10 AM

Having variables kept across Lua invokation is still useful as long as these variables have idempotent values (i.e. values that should be independant of the number of calls, but just depending on the state when starting to parse a wiki page: basically there are things that are constant for all the wiki (like the default language) or depend on properties of the current page (the content language, and the starting time when this page gets parsed and rendered, even if the parsing is incomplete), the user's UI language (not specific to the user but shared and cachable for any usage of the same language).
The frame itself tracks such constants, but the lua calls should not depend on their order of invokation and all calls should work as if they were the first one being evaluated in the same context.
Then initializations can be performed only once, cached, and reused without the associated cost (notably in CPU time, as most often memory space is not the major blocker). This means that lua calls should depend on a small environment (even smaller that what is in the "current frame" which contains the current state of analysis and various synchronisation data for caches and different rendering servers).
If we make these Lua calls more functional, with reduced sets of dependencies (taken from any "current frame" where they should not change over time), it becomes possible to render a page incrementally and update only what is necessary, just by invalidating caches more incrementally, and queuing the rendering and caching of other fragments: we then solve the problem of CPU time, as well as later the resources limits on memory space, because caches will fill the gap). The only price will be the total time to have the page full refreshed, but at least we should be able to have a usable page that can be built incrementally in several steps.
Notably parser hooks (for namespaces, or parser functions) could have their own cache independant of the cache for the fully rendered page: what is needed is to track the dependencies for reused cached fragments that the Mediawiki will use, and purging the Mediawiki page cache no longer need to purge the cache used by each hook.
As well the parser hooks can run on different servers/hosts. The main Mediawiki parser will still be able to compose the elements, but will no longer need to run hooks synchronously in a single pass. We get better scalability, common fragments with the same conditions can be parsed only once and reused (via the cache). It will be up to each cache to detect when it has been specifically flushed or is too old for the new page content changing call parameters and then decide to delegate or not the effective execution of the hook to refresh the cached result that Mediawiki uses

Note however that some hook evualations may need more time than what Mediawiki expects for composing the full page: a hook should be able to return a temporary value with a short expiry time equivalent to the estimated time where the evaluation in the hook will be complete. Hooks must then appropriately choose what to cache in their result and be able to manage the expiration time. Mediawiki should not force these hooks to restart the evaluation when the VM hosting the hook is still working on computing the results (which may take time, such as getting data from an other service, like Wikibase/Wikidata, or a Wikifunction evaluator).

For now the rendering of Mediawiki is just supposed to be monolithic and run in a single pass. That's something to change (and that will become crucially needed for wikifunctions). Mediawiki just must be able to get a representation of the last known stable state of the hook parser: getting such view should not take long: each parser should return a status fast and at least a partial result and the correct expiration time before querying it again: refreshing a mediawiki page will not necessarily force all hooks to be refreshed too.

This should apply to all hooks, including template expansion (that already have now their own cache whose dependencies are computed as a hash of the input parameters and the version of the template: this considerably improved Mediawiki. The same should apply to Lua, Wikibase, or other future evaluation functions for internationalization (results computed in different languages), or for Wikifunctions. We actualyl don't need all pages to be fully rendered with all up to date results in the given limit of 10 seconds. Mediawiki wikis would then become more reactive, would waste less ressources, and we would have less limitations on servier performances, and better choices for scalability by allowing more servers (including with different performances) to contribute to the final computing: it's perfectly acceptable for any page to sjust show partial estimates (showing also an indication that the full result may be available later): we don't need these results in less than 10 seconds, but it is perfectly acceptable if these results take minutes or hours to be computed and the page just shows the user that they can revisit it some minutes or hours later.

"Purging" pages should then be exceptional: Wiklimedia will schedule the various jobs to complete the evaluation and until then will still be able to return temporary values or estimates, or an older view of the last computed results: navigation will be much accelerated. And there will be less time dependencies across servers/services.

I like this idea quite much, as it would be an approach to incorporate Variables into Parsoid core. Exposing this caching would be a new feature, I think, which would rather be related to Variables than a replacement. I propose to bring this idea to the attention of the parsing team.

In particular, it would probably be simpler to start from scratch there.