
Make a subset of API query modules available to the Lua scripting environment
Open, Lowest, Public, Feature

Description

"To access the API, I think there would need to be some Lua module acting as a wrapper, and I think it also would need runtime support in the Scribunto implementation to make it use FakeRequest; after all such Lua "API calls" should not actually go over the network. In this wrapper and/or runtime it should be relatively simple to make accessible only a safe (in terms of execution time and other considerations) subset of the API. I would think that simple page-related queries such as the one that would be needed here would be part of that subset." -Lupo

Our conversation was triggered by the removal (T63268) of the ability to access the HTML of [[Special:PrefixIndex]].
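The wrapper idea in the quoted description can be pictured with a minimal sketch. It is a toy model only: `call_api`, `SAFE_MODULES`, and the two handlers are hypothetical names, not part of Scribunto or the MediaWiki API. It shows nothing more than an in-process dispatch (no network request) that refuses any module outside an allow-list.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for API modules; a real wiki would query its database.
_PAGES = ["Help:Foo/de", "Help:Foo/en", "Help:Bar/en"]


def _prefixindex(params: Dict[str, Any]) -> List[str]:
    """Return the titles that start with the requested prefix."""
    prefix = params.get("prefix", "")
    return sorted(title for title in _PAGES if title.startswith(prefix))


def _delete(params: Dict[str, Any]) -> List[str]:
    """A write module that scripts must never be able to reach."""
    raise RuntimeError("write modules must not be reachable from scripts")


ALL_MODULES: Dict[str, Callable[[Dict[str, Any]], List[str]]] = {
    "prefixindex": _prefixindex,
    "delete": _delete,
}

# Assumption: only cheap, read-only modules are placed on the allow-list.
SAFE_MODULES = frozenset({"prefixindex"})


def call_api(module: str, params: Dict[str, Any]) -> List[str]:
    """Dispatch in-process to an allow-listed module; reject everything else."""
    if module not in SAFE_MODULES:
        raise PermissionError(f"API module {module!r} is not available to scripts")
    return ALL_MODULES[module](params)
```

In this sketch, `call_api("prefixindex", {"prefix": "Help:Foo/"})` returns the two matching titles, while `call_api("delete", {})` raises `PermissionError` before the handler is ever invoked.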

Use cases:

  • Iterating over subpages to more efficiently find out which translations exist
  • ...

Event Timeline

Rillke raised the priority of this task from to Lowest.
Rillke updated the task description.
Rillke added a project: Scribunto.
Rillke changed Security from none to None.
Rillke subscribed.
Anomie subscribed.

Rather than trying to identify "safe" API modules, it would probably be better to add needed features directly to Scribunto.

See also T51726.

Rather than trying to identify "safe" API modules, it would probably be better to add needed features directly to Scribunto.

I don't think wrapping is the right approach. action=query tends to be extremely fast, and building an extra layer on top of it misses the point: the whole idea is to let people experiment with it and see what comes out of it (often not much of use, but eventually something does work), which would be a much better way. In the two years since this ticket was filed, we haven't really added much in terms of exposing things like page history/contributions, page enumeration, and many other basic things, so adding features one at a time is clearly not a good way forward. I see messages on various boards asking for it, and we don't do anything to address it.

Just because no one has done the right thing yet doesn't mean calling the API from in the middle of the parser is suddenly less of a bad idea.

Tacsipacsi changed the subtype of this task from "Task" to "Feature Request". Feb 23 2022, 8:12 PM
Tacsipacsi subscribed.

I've created an extension to provide full API access from Lua. It is not safe and not a good approach for big public wikis, but it might be handy for private wikis in the meantime:

https://www.mediawiki.org/wiki/Extension:ScribuntoMediawikiApi

Of course, this is not a solution, just a temporary workaround for those who find this page.

Actually, calling the API from the middle of a parse is a pretty bad idea. You could query the list of subpages, but what would happen if later a subpage was created or deleted? Nothing, because there would be no indication of this connection, so MediaWiki would have no way to know it needs to invalidate the parser cache. A feature directly added to Scribunto could add a row to the appropriate database table to indicate this connection.

You may say it doesn’t matter as long as people are only experimenting, but those experiments can pretty quickly end up being used on millions of pages, at which point parser cache issues are a significant problem – but an invisible one, so people “experimenting” with using these methods on millions of pages probably won’t be aware of it.
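The stale-cache problem described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not MediaWiki's parser cache: `ToyParserCache` and its methods are hypothetical names, and the "dependency row" is just an in-memory mapping from a subpage prefix to the pages rendered from it.

```python
from typing import Dict, Optional, Set


class ToyParserCache:
    """Toy cache: renderings are only invalidated via recorded dependencies."""

    def __init__(self) -> None:
        self._html: Dict[str, str] = {}             # page title -> cached rendering
        self._dependents: Dict[str, Set[str]] = {}  # subpage prefix -> dependent pages

    def store(self, title: str, html: str, depends_on_prefix: Optional[str] = None) -> None:
        """Cache a rendering, optionally recording which subpages it depends on."""
        self._html[title] = html
        if depends_on_prefix is not None:
            self._dependents.setdefault(depends_on_prefix, set()).add(title)

    def get(self, title: str) -> Optional[str]:
        return self._html.get(title)

    def subpage_changed(self, subpage_title: str) -> None:
        """Drop the cached rendering of every page that recorded a dependency."""
        for prefix, titles in self._dependents.items():
            if subpage_title.startswith(prefix):
                for title in titles:
                    self._html.pop(title, None)


cache = ToyParserCache()
# Rendered through an opaque API call: no dependency was recorded.
cache.store("Untracked", "translations: en")
# Rendered through a (hypothetical) built-in feature that records the dependency.
cache.store("Tracked", "translations: en", depends_on_prefix="Doc/")

cache.subpage_changed("Doc/de")  # a new translation subpage is created

stale = cache.get("Untracked")   # still the old rendering
fresh = cache.get("Tracked")     # None: entry was purged and will be re-rendered
```

After the subpage is created, the untracked page keeps serving its old rendering with no sign that anything is wrong, which is the invisible failure mode the comment warns about.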

  • Iterating over subpages to more efficiently find out which translations exist

This has just got less expensive two months ago in T376564: Scribunto should provide a bulk ifexists check. (I’ve started experimenting with taking advantage of it in https://commons.wikimedia.org/wiki/Module:Lang_links, but haven’t finished the work yet.) Do we have any other specific use cases for using the API directly?

Support adding a Lua function "list subpages" ... this is better than the reported former HTML hack.

Oppose "Make a subset of API query modules available to the Lua scripting environment".

Support adding other Lua functions to repair the damage after the crippling of "mw.text.unstrip" (T63268: Abuse of Cite extension allows cross-invoke communication), as well as to provide functionality of the "API" in particular cases where need has been demonstrated.

Strong oppose "mw.text.unstrip" in its current inherently useless form returning an empty string, just use

''

instead.

Support adding a Lua function "list subpages" ... this is better than the reported former HTML hack.

[…]

Support adding other Lua functions […] to provide functionality of the "API" in particular cases where need has been demonstrated.

Did you read my above comment? What is the “demonstrated need” that cannot be solved by T376564: Scribunto should provide a bulk ifexists check?

Support adding other Lua functions to repair the damage after the crippling of "mw.text.unstrip" (T63268: Abuse of Cite extension allows cross-invoke communication)

I don’t understand what you mean by “repairing the damage” – what exactly is the damage (other than disallowing usage that shouldn’t be possible, i.e. cross-#invoke communication), and how could that be repaired? And how is this thing relevant here?

It is relevant and I did read that comment. T376564: Scribunto should provide a bulk ifexists check does not resolve the problem "removal of access the HTML of [[Special:PrefixIndex]]" (see far above, this is what this item is about), it just could make brute-forcing cheaper. The war on cross-#invoke communication is lossy (causes problems for module developers), while the benefit has not been disclosed yet.

Okay, I see your points, but disagree with them.

T376564: Scribunto should provide a bulk ifexists check does not resolve the problem "removal of access the HTML of [[Special:PrefixIndex]]" (see far above, this is what this item is about), it just could make brute-forcing cheaper.

“Brute-forcing” is currently the only solution that ensures proper handling of the parser cache: if a translation is created or deleted, pages listing all translations need to be purged from the cache, otherwise the new translation will not appear / the deleted translation will continue to appear for an indefinite amount of time. Given that translations cannot be shown without having a recognized subpage name (otherwise we wouldn’t know what the link text should be), the list of subpages needs to be filtered anyway, so I don’t think filtering upfront (compiling a list of possible subpages based on the list of known language codes and then using an SQL IN query) is any worse than filtering afterwards (using an SQL LIKE query and then dropping any subpages whose names are not on the list of known language codes).
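The equivalence claimed here between filtering upfront and filtering afterwards can be checked on a toy data set. This is only a sketch: the page titles and language codes are invented, and the two functions stand in for an SQL IN query and an SQL LIKE query respectively rather than running any real query.

```python
from typing import List

# Invented sample data for illustration.
existing_pages = {"Doc/en", "Doc/de", "Doc/fr", "Doc/sandbox", "Other/en"}
known_lang_codes = ["de", "en", "fr", "hu"]


def filter_upfront(base: str) -> List[str]:
    """Build candidate titles from known codes, then check which exist (IN-style)."""
    candidates = [f"{base}/{code}" for code in known_lang_codes]
    return sorted(title for title in candidates if title in existing_pages)


def filter_afterwards(base: str) -> List[str]:
    """List all subpages by prefix (LIKE-style), then drop unknown codes."""
    prefix = base + "/"
    subpages = [title for title in existing_pages if title.startswith(prefix)]
    return sorted(title for title in subpages if title[len(prefix):] in known_lang_codes)


upfront = filter_upfront("Doc")
afterwards = filter_afterwards("Doc")
```

Both approaches yield the same translation list for this data; `Doc/sandbox` is dropped either way because it is not a recognized language code. The upfront variant additionally knows every title it checked, including the ones that do not exist yet.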

The war on cross-#invoke communication is lossy (causes problems for module developers), while the benefit has not been disclosed yet.

Was any actual use of cross-#invoke communication demonstrated? I don’t see anything either in this task or in T63268. The benefit of forbidding it is that partial updates become much easier, for example:

  • if you edit a template in the visual editor, it doesn’t have to update the whole page to make real-time WYSIWYG preview reliable;
  • if you preview a section in the traditional editor, you won’t face even more surprises in the form of other parts of the page (which weren’t visible in the preview as they’re in different sections) changing;
  • in the future, it could happen (AFAIK there are plans for this) that after an edit to a template, only the relevant parts of the transcluding pages need to be updated to make these updates more efficient – it’s much easier to tell what “relevant parts” are if there is no cross-#invoke communication.

Thanks ... that was decently relevant and sane (except VisualEditor). I do NOT advocate for cross-#invoke communication, which is a bad hack, but I do not consider it a major problem either. However, I do advocate for replacement solutions for the problems caused by the removal of cross-#invoke communication (T63268: Abuse of Cite extension allows cross-invoke communication):

Indeed brute-forcing through a list of known langcodes together with T376564: Scribunto should provide a bulk ifexists check helps if the subpages in question are for langcodes. But there are other types of subpages that cannot be picked from a list with known langcodes, thus replacement for HTML of [[Special:PrefixIndex]] is still desirable.

This suffers from mostly the same problems, even if it’s more limited in use:

  • Changing one part of the page (the heading) may affect large technically independent parts of the page (anything in that section), so the entire section needs to be updated both during VE editing and upon future partial updates, even if nothing actually uses the heading text there.
  • While it shouldn’t be possible that a section preview affects a non-edited part of the page, it’s still possible that a section preview is affected by a non-edited part of the page (e.g. if one is editing a ===third-level heading===, and a ==second-level heading== is read out), which also makes things harder to understand and section previews more difficult to generate. (If there is a <ref name="…" /> in the edited section, and the actual definition of the reference is in another section, a warning is generated instead of reading out the reference from elsewhere. What would this feature do in a similar situation? Return a warning? Return an empty string? Both would break things badly.)

Indeed brute-forcing through a list of known langcodes together with T376564: Scribunto should provide a bulk ifexists check helps if the subpages in question are for langcodes. But there are other types of subpages that cannot be picked from a list with known langcodes, thus replacement for HTML of [[Special:PrefixIndex]] is still desirable.

“Presumably was not even expensive” – it was very much expensive in the everyday sense of “expensive”: database queries with large results, dramatically reducing the expiry of the parser cache (from weeks to an hour, meaning the page needs to be reparsed every time it’s loaded by someone if the last load was more than an hour ago). So if it wasn’t marked as expensive, that was a performance problem, as editors could slow down the server. And on top of that, it wasn’t even reliable, as “what links here” and similar queries didn’t use to be updated with changes every time the list changed.

You still haven’t cited any actual use cases for these two features.

frwiki rcu and all Wiktionaries

frwiki rcu

What does “rcu” mean?

and all Wiktionaries

This is not an actual use case, just a project. What would all Wiktionaries use the output of {{Special:PrefixIndex}} or <categorytree> for?

frwiki rcu

What does “rcu” mean?

This is probably about what was discussed in T407880.

So neither of them are use cases for interpreting the output of {{Special:PrefixIndex}} or <categorytree>, which is what I asked for. For reading the previous heading, the use cases are – and were already – clear, only the technical feasibility isn’t.

Why is this not feasible?