
Scribunto external dependencies - roadmap and requirements
Open, Needs Triage, Public

Description

It has been proposed that we allow users to deploy Lua code which is stored in Git to multiple wikis. Let's figure out what that would look like.

We want something lightweight, and yet we want to take steps towards a full solution; we don't want to preclude future improvements. So we need a plan.

Package granularity

When you are running code from one version of a module, and call require() to get a submodule, you should never get code from a different version of the same module. So instead of modules and submodules, we should talk about packages.

Packages should be in a consistent state during any given parse operation.
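That invariant can be pictured as a loader that pins every package to one version for the whole parse. This is only a sketch; `pinnedVersions` and `fetchSource` are hypothetical names invented for illustration, not part of any existing API.

```lua
-- Sketch: a per-parse loader that resolves each package to a single
-- pinned version, so one parse never mixes versions of a package.
local function makeLoader(pinnedVersions, fetchSource)
    local loaded = {}
    return function (name)
        -- "somepackage.util.dates" -> package "somepackage"
        local pkg = name:match("^([^.]+)")
        local version = pinnedVersions[pkg]
            or error("package not deployed: " .. tostring(pkg))
        local key = pkg .. "@" .. version .. "/" .. name
        if loaded[key] == nil then
            -- Every submodule of the package loads from the same
            -- pinned version for the duration of the parse.
            loaded[key] = fetchSource(pkg, version, name)
        end
        return loaded[key]
    end
end
```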

require()

We should encourage code that looks like Lua code elsewhere on the internet. Consider LuaRocks as a source of packaging conventions.

In the Lua ecosystem, require() takes a string in hierarchical dotted form, for example require "busted.modules.files.moonscript". There are no relative paths; the string is always globally unique. The first component is the package name.

LuaRocks packages give an explicit map between module names (the string passed to require) and file names (example). The top level name may be defined as a module, conventionally init.lua (example).
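For reference, the rockspec's module map looks roughly like this. The package and file names below are made up for illustration.

```lua
-- Sketch of the LuaRocks convention: the rockspec maps module names
-- (the strings passed to require) to source files.
build = {
    type = "builtin",
    modules = {
        -- the top-level name, conventionally backed by init.lua
        ["somepackage"] = "src/somepackage/init.lua",
        -- dotted names map to files under the package
        ["somepackage.util.dates"] = "src/somepackage/util/dates.lua",
    },
}
```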

Deployment scope

Packages may require() other packages. An on-wiki module may require() any other on-wiki module. Thus the scope of a set of deployed packages and their versions is the whole wiki. A wiki depends on a package. Different modules can't depend on different versions of a package.

The deployment state must be consistent within the context of a parse operation.

With global templates, you can consider the global repository to be a package with its own recursive package dependencies.

Migration and multi-version deployments

@daniel asked if it's possible to allow callers to specify package versions, like require('package@1.0'). Here is a summary of our discussion.

  • Motivation: suppose a library is going to change, deploying a backwards-incompatible version of the same interface, and there are, say, 20 on-wiki callers. Instead of adding forwards-compatible branches to all 20 callers, it would be easier to explicitly require the old version, and then simultaneously update the calling code and the required version by editing each caller.
  • Mitigation: Discourage backwards-incompatible changes of that kind. Have a deprecation cycle and provide a new interface instead of breaking the existing interface. Remove the old interface when it has no callers. If this is not convenient, make a new package, so instead of require('package@2.0'), you would have require('package2').
  • Con: Specifying the version may increase the effort required for uncomplicated updates. The dashboard proposal (T412317) has updates as one or two clicks, whereas this would potentially require editing many pages.
  • Con: Specifying the version in the caller is unconventional. For example, we don't version the Lua standard library or the mw library, so we are committed to backwards compatibility in those areas.
  • Con: Packages will depend on each other using version constraints in a manifest file, so they don't need to specify the version in require. Multiple packages can be updated in one deployment action. Encouraging this require syntax would complicate moving code from the wiki to packages and back.
  • Con: Some packages in LuaRocks assign globals instead of returning a table of functions from require. This would not be possible if we are loading multiple versions of a library in the context of a parse.
  • Con: Specifying the version complicates sandbox and pilot deployments. If a module explicitly requires an old version, under what circumstances can we render a page using a pilot or test version?
  • Con: Purging of old versions becomes more difficult. If users can specify any version, then we need to cache every version.

We're not going to do this.

Development cycle

Developers must be able to see the effects of their code on a real wiki without making a merge request or deploying the change. Consider two types of development workflow:

  • On-wiki. A developer works on module pages, perhaps under a sandbox hierarchy on a production wiki. When they are satisfied, they want to export the current state of the sandbox as a merge request.
  • Local files. A developer checks out the package from git and edits in a local IDE. They have a local MediaWiki installation. During parse, MediaWiki fetches modules directly from the development filesystem.

It follows that there must be at least three types of repository client:

  • Local sandbox
  • Filesystem
  • Server cache

Package registry

Users should refer to packages by name, not by URL, so that URLs can be updated globally, and so that packages can easily be overridden during local development. So we need some way to find named packages.

Options:

  • Just a prefix, e.g. https://gitlab.wikimedia.org/lua-repos/$name
  • Self hosted LuaRocks server. This can solve version dependencies for us, but it has a lot of baggage to take on. Packages would need a rockspec file, which needs to be renamed and updated at each release.
  • Some simple DIY map, like a YAML file in a git repo.
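The DIY option could be as small as a single file mapping package names to repository URLs. It is shown here as a Lua table for consistency with the rest of this page (it could equally be YAML); the package names and URLs are made up.

```lua
-- Hypothetical registry file: package name -> repository URL.
-- A real registry module would simply `return registry`.
local registry = {
    ["somepackage"] = "https://gitlab.wikimedia.org/lua-repos/somepackage",
    ["datesformat"] = "https://gitlab.wikimedia.org/lua-repos/datesformat",
}
```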

Versions

Tagging packages with a version number before deployment enables the following features:

  • Packages depending on other packages with version constraints.
  • Wikis depending on packages with versions constraints.
  • Human-readable package versions, displayed on the wiki.

We could cope with commit hashes, but version numbers seem nice to have.
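The first two bullets imply some kind of per-package manifest. A sketch of what that might hold follows; the field names and constraint syntax are invented for illustration, not a settled design.

```lua
-- Hypothetical package manifest with version constraints.
local manifest = {
    name = "somepackage",
    version = "1.4.2",
    dependencies = {
        -- depend on another package within a version range
        ["datesformat"] = ">= 2.0, < 3.0",
    },
}
```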

Adding and updating deployed packages

Fetching is triggered via an unauthenticated endpoint and is allowed as long as the specified repository is under a configured prefix. Typically, fetching will be triggered by a GitLab webhook (T412320). Packages are fetched to shared tables and are available globally.

However, to use a package on a wiki, it must be deployed, and this requires a user right. A user edits a wiki page, perhaps via a dashboard special page (T412317), adding the package to the list of deployed packages. Similarly, updating the deployed version of a package is done by editing the version on this wiki page. Deployment is local to the wiki.

Parser integration

The simplest thing is to have no additional #invoke feature. Users can make a proxy module Module:Somepackage:

return require "somepackage"

And then {{#invoke:somepackage|func}} will work.

require() could register a link from the page being parsed to the package name, so that the parser cache can be invalidated when a new version of the package is deployed.
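One way to picture that link registration is to wrap the base require so each parse records the package names a page touches. This is illustrative only; `recordDependency` is a hypothetical hook, not an existing function.

```lua
-- Sketch: record page -> package links during a parse so the parser
-- cache entry can be invalidated when the package is redeployed.
local function instrumentRequire(baseRequire, recordDependency)
    return function (name)
        local pkg = name:match("^([^.]+)")
        if pkg then
            recordDependency(pkg)  -- link from this page to the package
        end
        return baseRequire(name)
    end
end
```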

In the future we might like to have global creation or shadowing of these proxy pages. That would fit in well with a global templates feature. Similarly trans-wiki inclusion of a proxy module page, like {{#invoke:commons:somepackage|func}}, would fit under the global templates banner but is not necessary for the present work.

Caching repository client

For availability and performance, there should be no access to GitLab during parse.

Downloading new versions of packages and inserting them into persistent storage should be done before allowing the deployment page to be changed.

Rollback of a deployment page to a previous version is a local operation as long as the cache has not been purged.

For performance reasons when scaling this up to 1000+ wikis, package files should not be stored as pages. The persistent cache can then be shared across all wikis. There could be a package file viewer giving read access to this cache.
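The cache can be shared across wikis because nothing in the lookup key is wiki-specific. A sketch, with an invented key scheme:

```lua
-- The cache key is wiki-independent: the same (package, version, file)
-- triple resolves to the same entry for every wiki.
local function cacheKey(pkg, version, filePath)
    return table.concat({ "scribunto-pkg", pkg, version, filePath }, ":")
end
```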

Gadget synergies

There is a similar request for gadgets in git (T187749). For example, the caching repository client would be useful for Gadgets. Code that can be shared between Scribunto and Gadgets could be in a separate extension.

Event Timeline

When considering the local files development cycle, would you expect that interacting with the upstream git repository is the typical way to submit a merge request, or that use of a properly configured wiki sandbox would be needed? I think I am trying to reason about two separate aspects here:

  • What authentication requirements will the git code forge side need? I could imagine workflows that demand Wikimedia SUL integration, our more standard Wikimedia Developer account integration, or even a workflow like our OpenStack Horizon plugin for managing Puppet ENC data, where a bot account makes git commits on behalf of other users.
  • What does the on-wiki revision attribution look like? Are all revisions attributed to Wikimedia SUL accounts? Are some revisions attributed to git commit authors external to the wiki's auth system?

@cscott and I chatted a bit about this yesterday.

I don't think developers submitting PRs to the code forge need to be authenticated against their on-wiki / wikimedia accounts (although we could make that a policy to start with if we want to control what code gets submitted). But, I think we want to limit who gets to approve merges of code. It could be limited to trusted editors (those who would have those rights on-wiki). The details of this will need to be worked out especially wrt devs from different wikis collaborating. I think these policy details can be worked out separately from the pull-from-git-forge-to-a-central-db + sync-from-central-db-to-wiki steps.

If we want to import the entire git history of a file as on-wiki revision history, then how git accounts are tied to on-wiki accounts matters. If we are only syncing, and only the sync-time master version of the code shows up as a revision on wiki (which I think is what we want, per the package constraints that Tim lays out in the description), then the on-wiki attribution will be to the bot / admin / editor who syncs code onto the wiki. Any tools that are interested in attribution history will need to look at the git repo history.

Any tools that are interested in attribution history will need to look at the git repo history.

I guess this would still work with the global Terms of Use which state that attribution can be:

ii. Through hyperlink (where possible) or URL to an alternative, stable online copy that is freely accessible, which conforms with the relevant license, and which provides credit to the authors in a manner equivalent to the credit given on the Project Website;

It does seem that any import system could also produce an attributed revision of the wiki page for each revision in the diff being sent to the wiki if we wanted that. It would require some state tracking on-wiki that could be used to compute which revisions were missing, but I think that might be as simple as placing the git commit hash in the revision's comment record in a reparsable manner. That style of tracking would likely have challenges if on-wiki edits over the imported content are allowed. A git centric way to manage that would be a patch chain that is stashed and then rebased on the upstream origin, but MediaWiki's revision system doesn't work that way as far as I understand it.
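That comment-record tracking could be as simple as a fixed, reparsable comment format. As a sketch, assuming a hypothetical sync comment of the form "Sync from git (commit: <hash>)":

```lua
-- Hypothetical: the importer writes the commit hash into the revision
-- comment in a fixed form, and tools parse it back out.
local function extractCommitHash(comment)
    return comment:match("commit: (%x+)")
end
```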

One thing that is not clear to me is how testing would take place when Lua modules are no longer developed on-wiki. While gadgets can be tested while they're sitting on your local filesystem using the localhost import method, I'm not aware of an equivalent way to test a module without saving it on-wiki. API:Scribunto-console can test output from a not-yet-saved module, but it's text-based and cannot display any HTML. Special:TemplateSandbox (and its API:Parse extensions) can display previews, but requires the modules to be saved on-wiki first.

Maybe, a solution could be to extend Special:TemplateSandbox to allow reading in local code using the File System API.

Maybe, a solution could be to extend Special:TemplateSandbox to allow reading in local code using the File System API.

I did have a couple of ideas, in the task description under the heading "Development cycle", not that one though so thanks for that, I'll consider it.

I split this out to T415631: Scribunto sandbox: previewing unreleased changes.

It has been proposed that we allow users to deploy Lua code which is stored in Git to multiple wikis.

Proposed by whom?

Have any on-wiki module developers been consulted about this?

It's already possible to put Lua code in Git and to call it from all the wikis. See T209310, for example.

It's actually done very rarely, however, which shows that adding Git to the process is way too complicated for most people.

It has been proposed that we allow users to deploy Lua code which is stored in Git to multiple wikis.

Proposed by whom?

This is coming from the Content Transform team who asked me to help with the development and implementation. @MSantos is doing product management.

It has been proposed that we allow users to deploy Lua code which is stored in Git to multiple wikis.

Proposed by whom?

@Amire80 you can find more about the rationale behind this in this post.

Have any on-wiki module developers been consulted about this?

User research is ongoing and this project is being considered an experiment to assess feasibility.

It's already possible to put Lua code in Git and to call it from all the wikis. See T209310, for example.

It's actually done very rarely, however, which shows that adding Git to the process is way too complicated for most people.

I didn't understand how similar the linked task is to the scope of the work here. But the intention is not just to allow "Git" to be the source repository, but to encourage an ecosystem that will improve the developer experience and expand to gadgets, user scripts, etc. in the future if the experiment is successful.

Could you elaborate or share more links that can help me understand your point?

It has been proposed that we allow users to deploy Lua code which is stored in Git to multiple wikis.

Proposed by whom?

@Amire80 you can find more about the rationale behind this in this post.

Thanks. Is this related to any point in the WMF annual plan?

Have any on-wiki module developers been consulted about this?

User research is ongoing and this project is being considered an experiment to assess feasibility.

I am very interested in reading more about this research. I've been doing such research informally myself since 2018, and my findings clearly point to the fact that what is actually needed is a way to share the code of modules (and someday, templates) that are stored on wiki pages, like they are now. Not in Git.

It's already possible to put Lua code in Git and to call it from all the wikis. See T209310, for example.

It's actually done very rarely, however, which shows that adding Git to the process is way too complicated for most people.

I didn't understand how similar the linked task is to the scope of the work here. But the intention is not just to allow "Git" to be the source repository, but to encourage an ecosystem that will improve the developer experience and expand to gadgets, user scripts, etc. in the future if the experiment is successful.

Could you elaborate or share more links that can help me understand your point?

The task T209310 is an example of moving the code of an on-wiki Scribunto module to a Git repository. It shows that this is already possible. However, there are almost no other examples of this, so the fact that it's already possible to put Lua code in Git and call it from on-wiki modules didn't cause many module developers to actually use it and move a lot of modules to Git, even common and stable ones. That's because modules (and templates) live in the wiki and Git is not the wiki. Git is the problem and not the solution; an "easier" Git repo for modules will still be a problem, just as much as Gerrit is.

If you try to set up a Git repository, and it will not be successful, you may think that wiki editors aren't really interested in sharing the code of modules. This will be wrong, of course: the reason for the failure of the experiment will be the need to use Git and not the lack of motivation to share the code of modules.

(Also, expanding it to gadgets and user scripts is not really necessary. It's already possible to share gadgets and user scripts that are stored on wiki pages. It's a bit clunky, and it can be improved, but it works. HotCat is a pretty good example. What it really needs to be expanded to is templates.)

The task T209310 is an example of moving the code of an on-wiki Scribunto module to a Git repository. It shows that this is already possible. However, there are almost no other examples of this, so the fact that it's already possible to put Lua code in Git and call it from on-wiki modules didn't cause many module developers to actually use it and move a lot of modules to Git, even common and stable ones.

Is your point really that modules can be shared by putting them in the Scribunto git repository? That's like saying we can have global gadgets today by putting gadget code inside the Gadgets extension, or that we can have a repository of fonts by putting them in ULS. None of that is practical, since the maintainers of Scribunto, Gadgets, and ULS aren't responsible for individual modules, gadgets, or fonts, respectively. Including Module:No_globals in Scribunto made sense because it's extremely generic.

(Also, expanding it to gadgets and user scripts is not really necessary. It's already possible to share gadgets and user scripts that are stored on wiki pages. It's a bit clunky, and it can be improved, but it works. HotCat is a pretty good example.

The hacks used by HotCat cause an extra network request and make no use of ResourceLoader caching or minification. That's not something that can be improved without fundamental new infrastructure for cross-wiki loading.

The task T209310 is an example of moving the code of an on-wiki Scribunto module to a Git repository. It shows that this is already possible. However, there are almost no other examples of this, so the fact that it's already possible to put Lua code in Git and call it from on-wiki modules didn't cause many module developers to actually use it and move a lot of modules to Git, even common and stable ones.

Is your point really that modules can be shared by putting them in the Scribunto git repository? That's like saying we can have global gadgets today by putting gadget code inside the Gadgets extension, or that we can have a repository of fonts by putting them in ULS. None of that is practical, since the maintainers of Scribunto, Gadgets, and ULS aren't responsible for individual modules, gadgets, or fonts, respectively. Including Module:No_globals in Scribunto made sense because it's extremely generic.

That's exactly what I'm saying: It's possible, but not practical.

This task is exploring how to "allow users to deploy Lua code which is stored in Git to multiple wikis", and I'm saying that it's already possible, but almost no one is actually doing it, and adding a new workflow that involves Git differently is not going to work.

(Also, expanding it to gadgets and user scripts is not really necessary. It's already possible to share gadgets and user scripts that are stored on wiki pages. It's a bit clunky, and it can be improved, but it works. HotCat is a pretty good example.

The hacks used by HotCat cause an extra network request and make no use of ResourceLoader caching or minification. That's not something that can be improved without fundamental new infrastructure for cross-wiki loading.

Yes, as I said, it's clunky. But some kind of sharing is possible with gadgets, and it is actually used. My explanation for why it is actually used despite the clunkiness is that it's in the wiki and not in Git.

Don't get me wrong: my opinion is that Git is great and that it's a good solution for most software development projects. But my own opinion doesn't matter much; the opinion of the heavy developers and users of on-wiki Scribunto modules matters much more. You have to convince them, not me, that Git is a good solution, otherwise they won't use it. That's why I asked about user research in one of my previous comments. Those collaborators are your audience.

That's exactly what I'm saying: It's possible, but not practical.

This task is exploring how to "allow users to deploy Lua code which is stored in Git to multiple wikis", and I'm saying that it's already possible, but almost no one is actually doing it, and adding a new workflow that involves Git differently is not going to work.

This is a non-sequitur. The fact that people aren't currently using a non-solution doesn't imply that they won't use an actual solution once one exists.

Being technically possible doesn't mean it's practical. There is no chance that the Scribunto repo would accept code for building infoboxes or location maps or currency converters, even though they're all fine candidates for global modules.

The hacks used by HotCat cause an extra network request and make no use of ResourceLoader caching or minification. That's not something that can be improved without fundamental new infrastructure for cross-wiki loading.

Yes, as I said, it's clunky. But some kind of sharing is possible with gadgets, and it is actually used. My explanation for why it is actually used despite the clunkiness is that it's in the wiki and not in Git.

On the contrary, the fact that people are using a clunky solution seems like a good indicator that there's appetite for a better solution. Besides, syncing gadgets from git is a popular request (T187749) and many gadgets are already developed in git: https://github.com/wikimedia-gadgets.

That's exactly what I'm saying: It's possible, but not practical.

This task is exploring how to "allow users to deploy Lua code which is stored in Git to multiple wikis", and I'm saying that it's already possible, but almost no one is actually doing it, and adding a new workflow that involves Git differently is not going to work.

This is a non-sequitur. The fact that people aren't currently using a non-solution doesn't imply that they won't use an actual solution once one exists.

Being technically possible doesn't mean it's practical. There is no chance that the Scribunto repo would accept code for building infoboxes or location maps or currency converters, even though they're all fine candidates for global modules.

It could also be done in another extension, not necessarily Scribunto. And yet, it's not done. So I have a reason to suspect that Git is the problem. I might be wrong. But I might also be right. I just hope that if this project is done, and no one ends up using it, decision makers will understand that it's not because there is no community demand for module code sharing (there very definitely is), but because Git is probably not the tool that the community wants to use for it.

The hacks used by HotCat cause an extra network request and make no use of ResourceLoader caching or minification. That's not something that can be improved without fundamental new infrastructure for cross-wiki loading.

Yes, as I said, it's clunky. But some kind of sharing is possible with gadgets, and it is actually used. My explanation for why it is actually used despite the clunkiness is that it's in the wiki and not in Git.

On the contrary, the fact that people are using a clunky solution seems like a good indicator that there's appetite for a better solution. Besides, syncing gadgets from git is a popular request (T187749) and many gadgets are already developed in git: https://github.com/wikimedia-gadgets.

OK, maybe. But gadgets and user scripts are still very different from modules and templates, because gadgets and user scripts are mostly for advanced editing and reading UI, whereas modules and templates are embedded into the actual content, so they also require parsing support and not just UI integration. Maybe they can have the same storage location, but they are very different in a lot of other ways, and I'm really not sure that talking about them within the same project is right.