Page MenuHomePhabricator

Scribunto's use of dynamic Parser::$scribunto_engine should be rewritten/removed
Open, Needs TriagePublic

Description

There's no guarantee on identity on the specific parser objects used during article pass. With both async fragment rendering as well as the Parsoid/Parser migration (which may cause different parts of the page to be variously rendered using the new Parsoid or legacy parser), the actual Parser objects seen may be different for different instances of an extension tag on the same page.

When PHP 8.2 disallowed dynamic property creation, the following error was given (T324891):

Deprecated: Creation of dynamic property Parser::$scribunto_engine is deprecated in /var/www/wiki/mediawiki/extensions/Scribunto/includes/Hooks.php on line 99

Deprecated: Creation of dynamic property Parser::$scribunto_engine is deprecated in /var/www/wiki/mediawiki/extensions/Scribunto/includes/Scribunto.php on line 62

This was "fixed" by allowing dynamic properties on the Parser class, but this fix should eventually be reverted once the underlying usage in Scribunto is removed.

We can potentially have a short term cache to reuse the scribunto engine between modules "as available" but this can't be guaranteed in all cases: in an async parse or in the case of a context switch between Parsoid and the legacy parser the previous engine object may not be available. Communication between scribunto invocations on the same page should be discouraged, or mediated through a ParserOutput object (which implies that the state should be serializable).

Event Timeline

Since the property is used in a single class (except for tests and a hook handler made unnecessary by my proposed approach), it could be migrated to a static WeakMap as a stopgap solution once we’re on PHP 8: while it wouldn’t solve the issue, at least it would make core not depend on Scribunto (and remove the need for the hook handler, as cloning the Parser won’t duplicate the WeakMap entry). Making the state serializable won’t be easy: LuaStandaloneEngine keeps references to the Lua process and its standard input and output pipes – all three are resources and thus unserializable. (I guess the same mostly applies to LuaSandboxEngine as well, but its references are buried in a PHP extension.)

Or maybe the whole thing is nothing more than a performance optimization, and it’s fine in the long term to keep the WeakMap solution? (Git/SVN history doesn’t help us much, writing in the Parser has been there since the very first version rSVN110208.)

With https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1004257 in place this shouldn't be causing issues in production on PHP 7.4 or 8.1, but it's been added as a blocker – @Reedy, did the work-around not work?

Change #1191146 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/Scribunto@master] Remove use of Parser::scribunto_engine

https://gerrit.wikimedia.org/r/1191146

Change #1194279 had a related patch set uploaded (by Tacsipacsi; author: Tacsipacsi):

[mediawiki/core@master] chore(Parser): Remove no longer used dynamic property

https://gerrit.wikimedia.org/r/1194279

Change #1191146 merged by jenkins-bot:

[mediawiki/extensions/Scribunto@master] Remove use of Parser::$scribunto_engine

https://gerrit.wikimedia.org/r/1191146

The patch sets removes the property on the Parser instance, it does not analyse or handle the possible impact on fragmentation parsing as part of the description:

There's no guarantee on identity on the specific parser objects used during article pass. With both async fragment rendering as well as the Parsoid/Parser migration (which may cause different parts of the page to be variously rendered using the new Parsoid or legacy parser), the actual Parser objects seen may be different for different instances of an extension tag on the same page.

Each engine linked with the parser instance has a title object and the parser instance.
But it seems that each engine is otherwise stateless and using more parser for different #invoke on the same page just creates more engines at the moment, but it should not have a impact on the parser result (maybe less caching and impact on the parse time or parse memory limit).
The extension does not allow to have communication between different #invoke, this only happen by accident and maybe is treated as a security bug.
The merge of parser output, like flags or template links, and the merge of the limit report must be handled by core already.

From my point of view this can be marked as resolved, but not sure if there are other points to look at.

The limits are not only there to be reported, they also have to be enforced, how will that happen? Especially given that caching may have a big impact on it (think of mw.load{Data,JsonData}()). I think this needs careful consideration.

(Also, this task should definitely be closed before https://gerrit.wikimedia.org/r/1194279 is merged. I get your point about easier rollback, but that also means that not even that part of the task is done yet.)

Change #1194279 merged by jenkins-bot:

[mediawiki/core@master] chore(Parser): Remove no longer used dynamic property

https://gerrit.wikimedia.org/r/1194279

Change #1200435 had a related patch set uploaded (by Tacsipacsi; author: Tacsipacsi):

[mediawiki/core@REL1_45] chore(Parser): Remove no longer used dynamic property

https://gerrit.wikimedia.org/r/1200435

Change #1200435 merged by jenkins-bot:

[mediawiki/core@REL1_45] chore(Parser): Remove no longer used dynamic property

https://gerrit.wikimedia.org/r/1200435