
Expensive functions ($wgExpensiveParserFunctionLimit, severe inconsistencies) + batching
Open, Needs Triage, Public

Description

The limit is currently 500 on public wikis and requests to increase it usually get rejected (T160685).

But there are 2 severe problems/inconsistencies with the status quo:

There can be far over 500 links on a wiki page, and they still get their colour (blue or red) according to the existence status of the target page. In order to set the colour, the existence status of the target page must be peeked, and this is presumably internally expensive but free (as in beer) for the wikitext writer. Why cannot the title object "exists" be free too?

Consider a module performing an "ifexist" query 499 times (counting and stopping before 500 is reached and the page is caught in a pillory tracking category) and then displaying 600 links: those 499 and 101 additional ones. All 600 links will be active and have the correct colour. How many expensive fetches are internally carried out? Maybe 1,099, because the results are not cached?
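
Such a counting module might look roughly like this (a minimal sketch; safeExists and the limit constant are names of my own, and mw.title.new(...).exists is the call that consumes one expensive unit):

local checked, limit = 0, 499            -- stay safely below $wgExpensiveParserFunctionLimit (500)

local function safeExists (strpagename)
  if checked >= limit then
    return nil                           -- budget exhausted: existence unknown
  end
  checked = checked + 1
  local t = mw.title.new (strpagename)
  return t ~= nil and t.exists           -- one expensive call per title
end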

The limit can be circumvented by the following trick exploiting "msgnw":

function ifexists (a, strpagename)
  -- a is the frame object; when the target page does not exist,
  -- {{msgnw::Pagename}} preprocesses to the literal link [[:Pagename]],
  -- so a differing result means the page exists
  return ( '[[:' .. strpagename .. ']]' ~= a:preprocess ('{{msgnw::' .. strpagename .. '}}') )
end -- function ifexists

This function is free (as in beer):

https://eo.wiktionary.org/w/index.php?title=Vikivortaro:Provujo&oldid=901560 -- 676 pages tested, ZERO of 500 expensive function calls consumed !!!

The function exploits the fact that "ifexist" queries cost, whereas transclusions (without or even with parsing) are free. I lack deep knowledge of the wiki internals, but given these three tasks:

  • a) check whether a page exists and return a boolean
  • b) check whether a page exists and return its content (up to 2 MiB)
  • c) check whether a page exists, fetch and recursively parse its content, and return the result (up to 2 MiB)

to me, a) seems the cheapest and c) the most expensive. Yet the status quo is that a) costs whereas b) and c) are free.
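
In Scribunto/Lua terms the three tasks correspond roughly to the following (a sketch; 'Pagename' is a placeholder, frame stands for the frame object passed to the module, and the cost comments reflect my understanding of the status quo):

local t = mw.title.new ('Pagename')              -- 'Pagename' is a placeholder
local a = t.exists                               -- a) boolean; counts as one expensive call
local b = t:getContent()                         -- b) raw wikitext or nil; recorded as a transclusion, not counted as expensive
local c = frame:preprocess ('{{:Pagename}}')     -- c) transcluded and parsed; also not counted as expensive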

An "ifexist" function can be even constructed without "msgnw". Is is theoretically still "free" but the tremendous number of recursive junk transclusions make it timeout long before 500 queries are done. This confirms my assumption that transclusions are internally expensive, much more than "ifexist" queries.

It would probably be easy to cripple "msgnw", making it unusable for this purpose, but I propose a much more productive fix (more fetches available for modules at the same internal cost for the servers):

  • separate the limit for "ifexist" queries from other expensive limits
  • set a sane value for it (maybe 1000 on public wikis)
  • cache the results of such "ifexist" queries (title object "exists", "#ifexist", and ordinary links) (number of cacheable results = limit on the number of calls)
  • apply this new limit to all title object "exists", "#ifexist" and ordinary links
  • repeatedly peeking at the same target page (module called multiple times, module testing existence and then displaying links, ...) increases the counter by ONE only (see the caching sketch after this list)
  • when the limit is reached, the page gets caught in a tracking category, the title object "exists" fails, "#ifexist" returns false, and ordinary links are inactive (black and not clickable)
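
The caching and count-once points above amount to memoising the result per title. On the module side the same idea can already be approximated today (a minimal sketch; cachedExists is a helper name of my own, and Scribunto may already deduplicate some of this internally):

local existsCache = {}

local function cachedExists (strpagename)
  if existsCache[strpagename] == nil then
    local t = mw.title.new (strpagename)
    existsCache[strpagename] = (t ~= nil) and t.exists or false   -- expensive only on the first peek
  end
  return existsCache[strpagename]
end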

Another (maybe better) solution:

  • keep a single limit
  • raise it sufficiently so that (almost) no existing wikis break
  • make an "ifexist" query (title object "exists", "#ifexist" and ordinary links) and "pagesincategory" query cost 1 unit, and a transclusion 2 units
  • cache the results of such "ifexist" queries (title object "exists", "#ifexist", and ordinary links) (number of cacheable results = limit on the number of expensive calls)
  • repeated "ifexist" queries for the same target page (module called multiple times, module testing existence and then displaying links, ...) increase the counter by ONE only (the question remains what to do for repeated transclusions, but making them cost again and again (only ONE each time, because the "ifexist" result is cached whereas the transclusion content is not) is not a problem, because repeated transclusions can be avoided by good page design)
  • when the limit is reached, the page gets caught in a tracking category, the title object "exists" fails, "#ifexist" returns false, and ordinary links are inactive (black and not clickable)

This way, pages having few (internally expensive) transclusions could afford far more than 500 "ifexist" or "pagesincategory" queries without raising the total cost for the servers, and more efficient design (a few smart modules instead of many transclusions repetitively evaluating the same tasks) would be promoted.
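
Purely as an illustration of the proposed accounting (not a claim about how the parser internals are or should be written), the weighted single limit could be expressed like this, using the unit costs suggested above:

local budget = { used = 0, limit = 1000 }                        -- the limit value is only an example
local cost = { ifexist = 1, pagesincategory = 1, transclusion = 2 }

local function charge (kind)
  budget.used = budget.used + cost[kind]
  return budget.used <= budget.limit                             -- false: tracking category, call fails
end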

Event Timeline

> There can be far over 500 links on a wiki page, and they still get their colour (blue or red) according to the existence status of the target page. In order to set the colour, the existence status of the target page must be peeked, and this is presumably internally expensive but free (as in beer) for the wikitext writer. Why cannot the title object "exists" be free too?

Traditional wikilinks are nowhere near as expensive because MediaWiki is able to batch the existence checks, so it ends up making only one query, whereas in Lua a query needs to be made for each exists call.

Your msgnw hack has bigger problems because MediaWiki won't register the page in the links table, so if the page being transcluded gets edited, MediaWiki won't know that this page needs to be updated as well.

The only actionable thing I can think of is allowing Lua to batch lookup link existence.
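
Such a batch lookup does not exist today; purely for illustration, a hypothetical Scribunto entry point might look like this (mw.title.batchExists is an invented name):

local pages = { 'Foo', 'Bar', 'Baz' }                -- example titles
local existing = mw.title.batchExists (pages)        -- invented name, NOT a real Scribunto function
for _, name in ipairs (pages) do
  mw.log (name .. ': ' .. tostring (existing[name]))
end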

Hello ... thanks for answering. Allowing Lua to batch "ifexist" calls (an unlimited or at least high number of pages for the cost of one) would be a great thing. This explains why there can be more than 500 links and they still have the right colour, whereas "ifexist" stops working.

The batching doesn't invalidate my astonishment at the fact that "ifexist" costs whereas the (more expensive) transclusion is free.

Taylor renamed this task from "Expensive functions ($wgExpensiveParserFunctionLimit, severe inconsistencies)" to "Expensive functions ($wgExpensiveParserFunctionLimit, severe inconsistencies) + batching". Aug 2 2021, 11:44 PM
Taylor updated the task description.

I agree with @Taylor's proposal. This needs to be fixed.

I don't know whether this batching would be possible for "pagesincategory" queries too. If yes, then it is the preferred fix. I have an alternative idea that I would put into a separate task if batching is possible for "ifexist" only.