
Add ability to generate a list of pages based on prefix to Scribunto/Lua
Open, HighPublic

Description

Looking at https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual, I don't see a way currently to generate a list of pages based on prefix. For example, I wanted to write a module that would take each page listed at https://meta.wikimedia.org/wiki/Special:PrefixIndex/Global_message_delivery/Targets/ and generate output based on iterating over this generated list.

Rather than using a generated list, I was forced to specify each page title. This isn't great, as pages may be added or deleted and I don't want to update such a list by hand.

An equivalent to [[Special:PrefixIndex]] (or the MediaWiki API's list=allpages&apprefix=) inside Scribunto/Lua would be wonderful.
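To illustrate the request only, a hypothetical Scribunto binding might look like the sketch below. The function `mw.site.prefixIndex` and its return shape are invented here; no such API exists today.

```lua
-- Hypothetical sketch only: mw.site.prefixIndex() is an invented name,
-- imagined to return a table of full page titles sharing the given prefix.
local p = {}

function p.listTargets(frame)
    local lines = {}
    for _, title in ipairs(mw.site.prefixIndex('Global message delivery/Targets/')) do
        table.insert(lines, '* [[' .. title .. ']]')
    end
    return table.concat(lines, '\n')
end

return p
```

A module like this would stay current as target pages are created or deleted, instead of requiring the list to be maintained by hand.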


Version: unspecified
Severity: enhancement

Details

Reference
bz47137

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 1:30 AM
bzimport set Reference to bz47137.
bzimport added a subscriber: Unknown Object (MLST).

darklama wrote:

I've provided an iterator solution, but it has the same limitations as
the prefixindex special page:

https://meta.wikimedia.org/wiki/Module:Subpages

I think an equivalent to the MediaWiki API's list=allpages&apprefix=
in iterator form inside Scribunto/Lua would be better though.

Jackmcbarn raised the priority of this task from Low to High.Dec 2 2014, 4:26 AM
Jackmcbarn set Security to None.
Jackmcbarn added a subscriber: Jackmcbarn.

Bumped priority up now that the way users did this before (unstrip) doesn't work anymore.

Bumped priority up now that the way users did this before (unstrip) doesn't work anymore.

I really don't care at all what the priority of this (or any) task is.

That said, it feels a bit strange for the sudden absence of this functionality to be considered high priority. The previous implementation (using transclusion) was pretty clearly a giant fragile hack. I think everyone involved knew that this hack was almost certainly going to break at some point as Special page transclusion was never considered a stable programmatic interface.

Rillke added a subscriber: Rillke.Dec 10 2014, 7:03 PM

Special page transclusion was never considered a stable programmatic interface.

It was the only viable option for achieving our goals with respect to maintenance work, performance and functionality. At Commons, we used it for listing language subpages (/af, /de, /nl, ...), so if it is easier to implement this specific functionality, I'd be happy with that. Perhaps I should open a ticket for this specific request?

And talking about stable interfaces, I don't know how often I have had to change my scripts that use API queries because something changed in incompatible ways. Sometimes I was under the impression that gadgets doing screen scraping needed to be updated less frequently.

onei added a subscriber: onei.Mar 10 2015, 2:08 PM
Rical added a subscriber: Rical.Apr 10 2015, 10:07 AM
Danny_B added a subscriber: Danny_B.Jun 5 2015, 9:44 PM
Rical added a comment.EditedJun 6 2015, 9:25 AM

In a multilingual module, I put translations of argument names, categories and error messages in the submodule "module_name/I18N". Then the main module can change without changing the translations in any language on any wiki. But without knowing the module_name itself, I cannot automate that for arbitrary modules.

The present change could resolve that, giving both "module_name/I18N" and "module_name" at the same time. The change could be helped by a parameter to select a subset of subpage titles, those which contain "I18N" in my case.

I could also request a separate change, "Get the module_name itself". But the present change is more general and can be used for a group of submodules and their data.

Rical added a comment.Jun 6 2015, 9:51 AM

Perhaps each new subpage could record itself in a dedicated table in its "mother page". That could easily help solve any question about trees of pages. For existing pages, a bot could build these tables once.

He7d3r added a comment.Jun 6 2015, 1:48 PM

Perhaps each new subpage could record itself in a dedicated table in its "mother page".

That sounds like the old problem from T17071: Wikibooks/Wikisource needs means to associate separate pages with books.

Rical added a comment.EditedJun 6 2015, 2:40 PM

Sorry, I was not explicit enough. My proposal was only about subpages, such as those under Module: or User: pages. As for pages of books on Wikisource, in the Page: namespace, I don't know whether Wikisource users are interested. Those pages are managed by the special extension Extension:Proofread_Page (https://www.mediawiki.org/wiki/Extension:Proofread_Page), which displays the text of one page of a book alongside an image.

I'm wondering about how this feature would work with the current system of page protection, link tables and the expensive function count.

Every time this new prefixIndex function was used, we would have to have some way of tracking when a page with the prefix was created or deleted. When such a creation or deletion occurred, we would have to update all the transclusions of the page (probably a template) that used it, so presumably every page with the given prefix would have to count the template as a transclusion in the link table.

Now let's say this is a template with millions of transclusions. In this case, anyone creating or deleting a page that has the right prefix would trigger a re-rendering of all of these millions of pages. As things stand, there would be no kind of page protection preventing this, so the person doing the creating or deleting might not have any idea that their action was so expensive. It could also be used maliciously to put unnecessary strain on a site's servers. And while deletion is limited to admins, creation could potentially be done by anonymous users.

The previous workaround forced transclusions to update by simply disabling caching, but that's not an option, as it's even worse from a site-stability perspective. If we did that on a widely-transcluded template, it might actually bring the site down, as the pages would all have to be re-rendered on every page view.

Also, with this function, it would be possible to see whether a given page existed or not. If we treat this like the #ifexist parser function, then we would need to make it an expensive function. In fact, as you can check the existence of many pages at once, presumably we would need to make one prefixIndex call count as many expensive function calls. (As many as there are possible results that could be returned?)

I'm as keen as anyone else to see this feature implemented, but we need to think about how to deal with these questions first.
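To make the accounting question above concrete, here is a plain-Lua sketch (standalone, with invented names; not Scribunto internals) in which a prefix query charges the expensive-function budget one unit per returned title, so a single call cannot sidestep the limit:

```lua
-- Plain-Lua sketch (standalone; not Scribunto internals): charge the
-- expensive-function budget one unit per title a prefix query returns,
-- refusing the query once the budget runs out.
local function makeBudget(limit)
    local used = 0
    return function(cost)
        if used + cost > limit then
            return false  -- over budget: refuse this charge
        end
        used = used + cost
        return true
    end
end

-- `allTitles` stands in for the wiki's page table; `charge` is a budget
-- function as returned by makeBudget().
local function prefixIndex(allTitles, prefix, charge)
    local results = {}
    for _, title in ipairs(allTitles) do
        if title:sub(1, #prefix) == prefix then
            if not charge(1) then
                return nil, 'expensive function budget exceeded'
            end
            results[#results + 1] = title
        end
    end
    return results
end
```

Under this scheme, a query over a huge prefix tree fails fast once the budget is exhausted, rather than returning an unbounded result set.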

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 17 2015, 3:53 AM

I'm a bit confused about the concerns you have here.

  • We already allow transclusion of Special:PrefixIndex.
  • We have a unique index on (page_namespace, page_title), so listing by prefix is cheap.
  • Scribunto/Lua modules are already treated very similarly to templates and rely on the same cache-invalidation infrastructure (links updates, etc.), as I understand it.

Long-term, it would be great if we could rely less on caching. The adoption of Scribunto modules over ParserFunctions templates, the deployment of HHVM, and other changes should get us closer to this goal eventually, I hope.

In my understanding, this function needs to work only when a page is created, renamed or deleted. Then a tree table is updated for all pages up and down the tree, and each of these pages records the part of the table covering all of its subpages.
Later, a Scribunto function gives the module the tree table of the page at no cost. Only accessing the individual pages is expensive.

  • We already allow transclusion of Special:PrefixIndex.

Although Special:PrefixIndex can be transcluded, its contents are stripped, meaning that modules can't parse it.

In my understanding, this function needs to work only when a page is created, renamed or deleted. Then a tree table is updated for all pages up and down the tree, and each of these pages records the part of the table covering all of its subpages.
Later, a Scribunto function gives the module the tree table of the page at no cost. Only accessing the individual pages is expensive.

I don't understand what you mean here. Can you clarify this comment?

  • We already allow transclusion of Special:PrefixIndex.

Although Special:PrefixIndex can be transcluded, its contents are stripped, meaning that modules can't parse it.

Right. I was speaking generally here. That is, users can transclude {{Special:PrefixIndex/Foo}} into wiki pages and subsequent page deletions and creations don't cause the servers to explode. In general, listing pages by prefix is pretty cheap, so I'm not sure there would be a huge problem with the performance of Scribunto/Lua modules if this functionality existed.

The difference is that in Lua someone can write a loop instead of having to be satisfied with just the 200 pages that {{Special:PrefixIndex/Foo}} will give you.
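The range-limited behaviour described above could be mimicked by returning bounded batches plus a continuation offset, as in this plain-Lua sketch (the names and batch size are illustrative; Special:PrefixIndex itself shows up to 200 entries per page):

```lua
-- Plain-Lua sketch of a batched prefix listing: at most BATCH_SIZE titles
-- per call, plus an offset from which the next call should resume.
local BATCH_SIZE = 3  -- Special:PrefixIndex uses 200; kept small here for clarity

local function prefixBatch(sortedTitles, prefix, offset)
    offset = offset or 1
    local results, i = {}, offset
    while i <= #sortedTitles and #results < BATCH_SIZE do
        local title = sortedTitles[i]
        if title:sub(1, #prefix) == prefix then
            results[#results + 1] = title
        end
        i = i + 1
    end
    -- Second return value: where the next call should resume, or nil if done.
    local nextOffset = (i <= #sortedTitles) and i or nil
    return results, nextOffset
end
```

A caller that wants everything must loop over batches explicitly, which keeps each individual call cheap and bounded.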

Danny_B added a subscriber: ori.Jul 19 2015, 5:19 PM
Rical added a comment.Jul 19 2015, 5:35 PM

In T49137#1463321, I tried to describe the implementation that would be cheapest in use, but I am not sure about it because I'm not a systems coder.

Verdy_p added a subscriber: Verdy_p.EditedApr 14 2016, 5:19 PM

In my understanding, this function needs to work only when a page is created, renamed or deleted. Then a tree table is updated for all pages up and down the tree, and each of these pages records the part of the table covering all of its subpages.
Later, a Scribunto function gives the module the tree table of the page at no cost. Only accessing the individual pages is expensive.

In fact this is more complex than that, because the same saved page can generate dozens or hundreds of distinct variants (based on the current user name or the current language in use), and they can all change with the current time if magic words like {{CURRENTMINUTE}} are used, which forces the server-side caches (not just the browser caches) to be given a shorter expiration time (meaning that these pages will be parsed and generated again). MediaWiki sets a minimum expiration time for all pages (to avoid resource attacks), but does not limit the number of languages.

Multiply all these variants by the number of *source* subpages to iterate, and such a page can generate a huge load on the server, with thousands or tens of thousands of pages being flushed from the cache. If a remote user then attempts to load all these pages (without even needing to load them completely or wait for their completion), the server load will suddenly explode (in terms of CPU; not so much in terms of disk I/O, as the wiki source pages are all the same, but still a lot of I/O on the server's frontend cache).

But I agree: we could still allow a Scribunto module to get a list of a limited number of subpages (e.g. 200) within a range (just as when transcluding a PrefixIndex). This would allow creating pages with navigation buttons to get the next or previous range, over which a script could loop.

Rical added a comment.EditedApr 15 2016, 8:37 AM

In my present application, the bindmodules() function tries to require("Module:Author/I18N"), via pcall so that a failure is caught, for all modules and libraries and their alternative versions, like Module:MathRoman02 and their I18N submodules for i18n tables.

This already has an answer in Module:Central.
Another MW answer is useful only if it is not expensive.

I believe I understand that MW does not work like a classic PC file browser for subpages. Could such a structure be an MW answer?
Then a module could ask for the existing subpages of a given one, just one level below, then select some pages, then ask for another level... recursively, but under the control of the module.

hoo added a subscriber: hoo.Jun 22 2016, 10:27 AM
Erutuon added a subscriber: Erutuon.Mar 9 2018, 1:12 AM