
RFC: Future of Scribunto
Open, Needs Triage · Public

Description

This bug is a request for comments and an invitation to a discussion regarding Scribunto.

Introducing client-written server-side scripting in a proper programming language was a great step forward for MediaWiki. Now, complicated behaviour can be relatively easily programmed without monstrous chains of {{#if:}}'s, {{#ifeq:}}'s and {{#switch:}}'es. This is instrumental to converting low-quality input made by an inexperienced user to high-quality wiki content.

Yet the idea of allowing server-side scripting in MediaWiki was not immediately accepted. Several years before Scribunto, a proposal to install the Winter extension on Wikimedia sites was rejected. Among the objections were security considerations and an outright "MediaWiki is not a bloody programming language". This proves that what once seemed impracticable can become desirable.

I welcomed Scribunto and installed it on the wiki I run as soon as it became technically possible. It added many new opportunities. However, as time passed, I started to notice limitations. Here are some of them:

  • The Lua version is 5.1, which is really old. I miss the features introduced in later versions (5.4.0 is now available), like metamethods for ipairs and pairs (also missed by the luasandbox developers, so they backported them), yielding across C calls, etc.
  • I am dissatisfied that I have to patch and recompile luasandbox to enable some standard Lua libraries, like coroutine. I would prefer to switch them on in php.ini.
  • I would also like to have a way to connect Lua libraries written in C, like lrexlib for Perl-compatible regular expressions, that is better than patching and rebuilding luasandbox.
  • (UPD) Metatables for userdata will be needed if new libraries are enabled.
  • Some day, perhaps, I would like to have server-side scripts in languages other than Lua--not because they are better, but to avoid rewriting existing libraries in Lua.
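
To make the first wish concrete, here is a minimal sketch of what a pairs metamethod buys a module author. It is not the Scribunto or luasandbox backport; pairs_compat and readonly are hypothetical helper names chosen for illustration, and the helper only emulates in pure Lua what Lua 5.2 and later do natively.

```lua
-- Hypothetical helper: honour a __pairs metamethod even on stock Lua 5.1,
-- where the built-in pairs() ignores it.
local function pairs_compat(t)
  local mt = getmetatable(t)
  if type(mt) == "table" and mt.__pairs then
    return mt.__pairs(t)
  end
  return pairs(t)
end

-- A read-only proxy: the data lives in a hidden table, so the proxy itself
-- is empty and plain pairs() on Lua 5.1 would iterate over nothing.
local function readonly(data)
  return setmetatable({}, {
    __index = data,
    __newindex = function() error("table is read-only", 2) end,
    __pairs = function() return next, data, nil end,
  })
end

local config = readonly({ lang = "lua", version = "5.1" })

-- Collect what iteration through the proxy actually yields.
local seen = {}
for k, v in pairs_compat(config) do
  seen[k] = v
end
```

With the metamethod honoured, iterating the proxy yields the hidden table's keys, while writes to the proxy still raise an error.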

I know that there are objections to any change that would aim to fulfil these wishes:

  • Enabling new features is a security risk. To this I answer that it ought to remain the wiki site admin's responsibility.
  • There is already a significant code base in Lua 5.1 in Wikimedia projects, and 5.2 is not fully backward-compatible. My opinion is that an obsolete code base is a problem and it will only get worse with time. Something should be done about it as early as possible.

To address these issues, perhaps a revision of both Scribunto's and luasandbox's architectures is needed. My proposals below may be based on insufficient information, or be too bold or too vague, but I think they should be considered:

  • The information on page content model in Module: namespace should include the language (first, only Lua, later, perhaps, others) and its version.
  • The same page can be in multiple versions of Lua, like 5.1, 5.2, 5.3, which means that the code is expected to work under any listed version of Lua.
  • The language and version compatibility checks can be automatic.
  • They can be based not only on successful interpretation but also on an automated unit test: an assertion defined by the module author.
  • Scribunto settings should define separate engines for different languages and their versions.
  • There can be several luasandboxes or one with several Lua environments (if the conflict between Lua libraries can be avoided).
  • Perhaps an initialisation script written in Lua, set in php.ini and invoked on PHP startup without the limitations of sandboxed Module: scripts, will be useful; e.g.: luasandbox.ini = "rex = require 'rex_pcre2.so'".
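
The version-tagging and per-version-engine proposals above could be sketched roughly as follows. Everything here is hypothetical: the registry layout, the engine names and pick_engine are placeholders for illustration, not existing Scribunto settings or APIs.

```lua
-- Hypothetical registry: engines the site admin has configured, keyed by
-- language and then by version. The engine names are placeholders.
local engines = {
  lua = { ["5.1"] = "sandbox-lua51", ["5.3"] = "sandbox-lua53" },
}

-- Pick an engine for a module. `meta` stands for the assumed content-model
-- metadata: a language plus the versions the author declares as supported,
-- listed in order of preference.
local function pick_engine(meta, registry)
  local by_version = registry[meta.language]
  if not by_version then
    return nil, "no engine configured for language " .. meta.language
  end
  for _, version in ipairs(meta.versions) do
    if by_version[version] then
      return by_version[version], version
    end
  end
  return nil, "no configured engine matches the declared versions"
end

-- A module declared to work under 5.4, 5.3 and 5.1 on a wiki that only
-- has 5.1 and 5.3 engines configured.
local engine, version = pick_engine(
  { language = "lua", versions = { "5.4", "5.3", "5.1" } }, engines)
```

In this sketch the first declared version that has a configured engine wins, so the module above would run under the 5.3 engine. A real design would also have to decide what happens when a module invokes another module tagged with a different version.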

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 15 2018, 10:48 AM
Anomie added a subscriber: Anomie. · Edited · Mar 15 2018, 11:49 AM

Several years before Scribunto, a proposal to install the Winter extension on Wikimedia sites was rejected.

Do you have a link to that discussion?

Among the objections were security considerations and an outright "MediaWiki is not a bloody programming language". This proves that what once seemed impracticable can become desirable.

Looking at the extension's page, I see "Winter is very dangerous on a publicly editable site, because it potentially allows for arbitrary JavaScript code to be inserted". That is a very serious security issue that would absolutely prevent installation, with "MediaWiki is not a bloody programming language" arguments being irrelevant.

The Lua version is 5.1, which is really old. I miss the features introduced in later versions (5.4.0 is now available), like metamethods for ipairs and pairs (also missed by the luasandbox developers, so they backported them), yielding across C calls, etc.

That is already discussed in T178146: Add support for Lua 5.2 or 5.3 to luasandbox. And we've backported __ipairs and __pairs to 5.1 in Scribunto too, not just LuaSandbox.

I am dissatisfied that I have to patch and recompile luasandbox to enable some standard Lua libraries, like coroutine. I would prefer to switch them on in php.ini.

A request for the coroutines library is already filed as T49799: Scribunto should allow coroutines in Lua. IIRC the main reason it doesn't exist yet is that it hasn't been reviewed for compatibility with the sandboxing requirements of Scribunto.

Other libraries have been excluded for security reasons.

I would also like to have a way to connect Lua libraries written in C, like lrexlib for Perl-compatible regular expressions, that is better than patching and rebuilding luasandbox.

See T52454: Include a RegEx library for Lua for lrexlib. In general, we'd have to do extra work to make loading Lua C libraries possible in LuaSandbox, and they might well break in the sandbox or break the sandbox.

EDIT: See also T63432: Wiki system admins should be able to whitelist installed Lua libraries for the general case.

Some day, perhaps, I would like to have server-side scripts in languages other than Lua--not because they are better, but to avoid rewriting existing libraries in Lua.

Scribunto's architecture was designed to be language-agnostic, with the language in use on the wiki selected in configuration. Although given the lack of any uses of that capability I'm sure there are places that would need changing to make it work.

If you're wanting Lua and another language both active, that would require some refactoring. Some people already complain about having to know wikitext and Lua. But having any "real" language is a major advantage, outweighing that added complexity. Adding a third possible language on a wiki would only increase the complexity. It would have to be carefully considered whether the additional complexity is worth whatever advantages it brings.

See also existing feature request tasks T61101: Support server-side JavaScript and T64309: Support Rexx programming language.

Enabling new features is a security risk. To this I answer that it ought to remain the wiki site admin's responsibility.

On the other hand, the Wikimedia Foundation is not obliged to expend significant extra work to add security holes hidden behind configuration variables when it's more secure to just leave the security holes out.

If you want a non-sandboxed Lua extension, Extension:Lua exists.

There is already a significant code base in Lua 5.1 in Wikimedia projects, and 5.2 is not fully backward-compatible. My opinion is that an obsolete code base is a problem and it will only get worse with time. Something should be done about it as early as possible.

That is one consideration, although we could potentially support both 5.1 and some later version. But that's not the only reason, and in particular you might want to read Tim's comments on https://gerrit.wikimedia.org/r/#/c/139479/.

  • The information on page content model in Module: namespace should include the language (first, only Lua, later, perhaps, others) and its version.
  • The same page can be in multiple versions of Lua, like 5.1, 5.2, 5.3, which means that the code is expected to work under any listed version of Lua.
  • The language and version compatibility checks can be automatic, [...]
  • Scribunto settings should define separate engines for different languages and their versions.
  • There can be several luasandboxes or one with several Lua environments (if the conflict between Lua libraries can be avoided).

If we were going to actively support multiple languages or versions, then yes, we'd have to do something like that. But that's begging the question of whether we actually want to do that.

  • They can be based not only on successful interpretation but also on an automated unit test: an assertion defined by the module author.

That would be an on-save check, much as the current on-save check works.
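
As an illustration only, an on-save check of this kind might be sketched as below. This is not Scribunto's actual on-save validation: check_on_save and the _test convention are hypothetical names, and the sketch runs the module unsandboxed, which a real implementation would not do.

```lua
-- loadstring exists on Lua 5.1; on 5.2+ load() accepts a string directly.
local compile = loadstring or load

-- Hypothetical on-save check: compile the module, run it once, and if the
-- author exported a _test function (an assumed convention), run that too.
-- NOTE: no sandboxing or resource limits here; this is only a sketch.
local function check_on_save(source)
  local chunk, syntax_err = compile(source, "=Module")
  if not chunk then
    return false, "syntax error: " .. tostring(syntax_err)
  end
  local ok, mod = pcall(chunk)
  if not ok then
    return false, "load error: " .. tostring(mod)
  end
  if type(mod) ~= "table" then
    return false, "module must return a table"
  end
  if type(mod._test) == "function" then
    local passed, test_err = pcall(mod._test)
    if not passed then
      return false, "self-test failed: " .. tostring(test_err)
    end
  end
  return true
end

-- A module whose author-defined assertion holds ...
local good_source = [[
local p = {}
function p.double(x) return 2 * x end
function p._test() assert(p.double(2) == 4) end
return p
]]

-- ... and one whose assertion does not.
local bad_source = [[
local p = {}
function p.double(x) return 2 * x end
function p._test() assert(p.double(2) == 5) end
return p
]]
```

Under these assumptions, saving good_source would be accepted and saving bad_source would be rejected with the self-test message, in the same way a syntax error is rejected.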

  • Perhaps an initialisation script written in Lua, set in php.ini and invoked on PHP startup without the limitations of sandboxed Module: scripts, will be useful; e.g.: luasandbox.ini = "rex = require 'rex_pcre2.so'".

I note LuaSandbox currently doesn't even load modules like package. To do that, we'd have to load them and then "unload" them, and the sysadmin would have to hope whatever code is in the startup script didn't copy them somewhere or add other security leaks.
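
The load-then-unload risk can be shown with a small sketch. It uses a plain table as a stand-in for the sandbox's global environment and a made-up danger library; none of these names come from LuaSandbox.

```lua
-- Simulated environment with a "dangerous" library; it stands in for
-- something like package or os. All names here are hypothetical.
local env = {
  danger = { run = function() return "executed" end },
}

-- A startup script that keeps a private reference to the library,
-- whether deliberately or by accident.
local function startup(e)
  local saved = e.danger
  e.helper = function() return saved.run() end
end

startup(env)

-- The "unload" step: remove the library from the environment.
env.danger = nil

-- env.danger is gone, but the library is still reachable through the
-- closure stored in env.helper.
local result = env.helper()
```

Setting the global to nil removes only the name. Any reference captured before the unload step keeps the library alive and callable.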

The first three bugs that you have mentioned were created by myself (I intend to close them in favour of this one). I came to a conclusion that those issues are parts of a bigger picture.

Anomie added a comment. · Edited · Mar 15 2018, 12:26 PM

Only one of the five tasks I mentioned was authored by @alex-mashin, that being T178146: Add support for Lua 5.2 or 5.3 to luasandbox. I also just came across T63432: Wiki system admins should be able to whitelist installed Lua libraries which was authored by you and will edit into my earlier post.

Personally, I doubt it'll do much good to try to lump many feature requests into one task, unless you're trying to get buy-in for you to actually implement all this. And even in that case, it seems unlikely you'd get that buy-in for the T52454/T63432 bit. But I won't try to stop you.

The first three bugs that you have mentioned were created by myself (I intend to close them in favour of this one).

That does not sound like a good idea. Ideally, a task should have a clear scope and not be a bundle of several aspects to perform.

Several years before Scribunto, a proposal to install the Winter extension on Wikimedia sites was rejected.

Do you have a link to that discussion?

Sorry, it was about ten years ago, and I did not take part in it, only watched, and I cannot find it now. It could have been on some mailing list or IRC.

Looking at the extension's page, I see "Winter is very dangerous on a publicly editable site, because it potentially allows for arbitrary JavaScript code to be inserted".

I've seen this warning; I don't know how its author came to that conclusion, or whether it is an easily correctable bug or a fundamental design flaw. I don't remember it being mentioned at that time.

A request for the coroutines library is already filed as T49799: Scribunto should allow coroutines in Lua. IIRC the main reason it doesn't exist yet is that it hasn't been reviewed for compatibility with the sandboxing requirements of Scribunto.

Does it need to be reviewed if coroutines are enabled by a configuration setting switched off by default? A warning in the documentation will do.

Scribunto's architecture was designed to be language-agnostic, with the language in use on the wiki selected in configuration. Although given the lack of any uses of that capability I'm sure there are places that would need changing to make it work.

I suppose this feature was not added to remain unused for ever?

If you're wanting Lua and another language both active, that would require some refactoring. Some people already complain about having to know wikitext and Lua. But having any "real" language is a major advantage, outweighing that added complexity. Adding a third possible language on a wiki would only increase the complexity. It would have to be carefully considered whether the additional complexity is worth whatever advantages it brings.

The advantages will include re-use of code initially written not for wikis and opening a path to unification of server-side and client-side code (if server-side JS is enabled).

On the other hand, the Wikimedia Foundation is not obliged to expend significant extra work to add security holes hidden behind configuration variables when it's more secure to just leave the security holes out.

I presume that Scribunto and luasandbox are developed by an open community, not by the Wikimedia Foundation alone, at least in theory?

If you want a non-sandboxed Lua extension, Extension:Lua exists.

I do value sandboxing. I want a way to configure it with all caution and care.

in particular you might want to read Tim's comments on https://gerrit.wikimedia.org/r/#/c/139479/.

I have read them. His feelings about Lua (its fragmentation and attitude to outside contributions in particular) remind me of what I sometimes feel about MediaWiki and its extensions. I suppose submitting changes upstream is always a pain in an open-source community.

But this is not only Lua's problem; and sticking to one old version of Lua for ever is not its best solution: the result will be a separate, isolated ecosystem of "MediaWiki Lua" lacking the access to most of Lua codebase.

And LuaJIT is no longer supported by luasandbox, as far as I understand?

That would be an on-save check, much as the current on-save check works.

This is encouraging.

I note LuaSandbox currently doesn't even load modules like package. To do that, we'd have to load them and then "unload" them,

Perhaps the globals can be loaded and then selectively set to nil in the initialisation script (not in library.c as it is now)--or not set, if the admin changes it.

and the sysadmin would have to hope whatever code is in the startup script didn't copy them somewhere or add other security leaks.

The sysadmin will control this code and will copy the dangerous modules elsewhere if he so wishes.

alex-mashin updated the task description. · Mar 16 2018, 6:38 AM
Anomie added a subscriber: Legoktm. · Mar 16 2018, 2:48 PM

Several years before Scribunto, a proposal to install the Winter extension on Wikimedia sites was rejected.

Do you have a link to that discussion?

Sorry, it was about ten years ago, and I did not take part in it, only watched, and I cannot find it now. It could have been on some mailing list or IRC.

It doesn't seem to make a very good example to try to support your assertions here, especially if we're asked to rely on your memory of something a decade ago.

A request for the coroutines library is already filed as T49799: Scribunto should allow coroutines in Lua. IIRC the main reason it doesn't exist yet is that it hasn't been reviewed for compatibility with the sandboxing requirements of Scribunto.

Does it need to be reviewed if coroutines are enabled by a configuration setting switched off by default? A warning in the documentation will do.

OTOH, why should we go to the trouble of adding a switch with a "warning: don't flip this switch, we don't know if it'll break things" on it?

Scribunto's architecture was designed to be language-agnostic, with the language in use on the wiki selected in configuration. Although given the lack of any uses of that capability I'm sure there are places that would need changing to make it work.

I suppose this feature was not added to remain unused for ever?

I don't know why it was added. But "it exists, therefore we must use it" isn't a useful argument.

The advantages will include re-use of code initially written not for wikis and opening a path to unification of server-side and client-side code (if server-side JS is enabled).

I'm skeptical about "re-use of code initially written not for wikis", particularly in light of licensing issues with the way most wikis are set up.

I'm even more doubtful of "opening a path to unification of server-side and client-side code", since that same argument was used by the proponents of nodejs services and has yet to produce a single result that I'm aware of. Considering that Scribunto runs in the context of a page parse while client-side scripts run in the context of a page view, this reuse seems even more doubtful.

I presume that Scribunto and luasandbox are developed by an open community, not by the Wikimedia Foundation alone, at least in theory?

In theory, sure. In practice, it's mostly just me working on the code these days, with reviews mostly from @Legoktm. And Lego does the packaging for Debian and other release-oriented things.

But this is not only Lua's problem; and sticking to one old version of Lua for ever is not its best solution: the result will be a separate, isolated ecosystem of "MediaWiki Lua" lacking the access to most of Lua codebase.

To some extent that depends on whether "most of Lua codebase" follows the latest version of Lua in backwards-incompatible ways or not.

It also depends on whether "MediaWiki Lua" actually uses much of anything from "most of Lua codebase". From what I've seen, that does not seem to be the case.

And LuaJIT is no longer supported by luasandbox, as far as I understand?

Correct. It wasn't being tested, and wasn't being used by anyone we knew of. JITs came up as a potential issue for T184156, so we decided to just remove it.

I note LuaSandbox currently doesn't even load modules like package. To do that, we'd have to load them and then "unload" them,

Perhaps the globals can be loaded and then selectively set to nil in the initialisation script (not in library.c as it is now)--or not set, if the admin changes it.

As I said, "To do that, we'd have to load them and then 'unload' them". Which has potential risks and drawbacks.

and the sysadmin would have to hope whatever code is in the startup script didn't copy them somewhere or add other security leaks.

The sysadmin will control this code and will copy the dangerous modules elsewhere if he so wishes.

You're putting a lot of responsibility on this hypothetical sysadmin.

Krinkle updated the task description. · Jan 15 2019, 4:45 AM