Page MenuHomePhabricator

Create a library for extensions using global parser state
Open, Needs TriagePublic

Description

There are some extensions relying on global parser state to do their job by effectively creating pagewide globals. This openly contradicts the goals of the Parsoid team, however, they are popular and widely used nevertheless.

These extensions all want to do similar things:

  • store arrays of some sort of data in the parser
  • be reset onParserClearState
  • be deep cloned whenever the parser is cloned
  • mark the current parser frame as volatile whenever they are accessed, as invalid parsing results are produced otherwise.
    • This is the only point on this list that might actually be hard to do at the moment. I'm still not entirely sure how to do so in general, or if this is even possible with the current MediaWiki core, but it would resolve so many inconsistencies with these extensions.
  • maybe get the values of these information after parsing is finished (Variables does this, and it should be handy for Arrays as well.)
  • hopefully somehow continue to work if the WMF is unifying MediaWiki-Parser and Parsoid. I didn't test it, but I think VisualEditor and these extensions aren't really compatible these days.

I think instead of implementing these functionality over and over, there should be a unified library used by these extensions providing all of this functionality instead, so they don't have to be updated individually when parser behavior changes and bugs can be fixed in a central place.

I currently have something like this in mind to be injected in the different main or Store classes of these extensions, so access would be controlled by these classes:

interface IParserStore {
  // these allow you to access data directly. They are needed especially if you want to save all information somewhere.
  public function getData() : array;

  // these mirror functionality these extension use in some way or another.
  public function getValue( string $key );
  public function setValue( string $key ) : void;
  public function keyExists( string $key ) : bool;
  public function unset( string $key) : bool;
  public function holdsData() : bool;
  
}

Handling this finalized value functionality should be the task of an additional class however, that I didn't really think about yet. Plase note I expect MediaWiki HHVM support to be dropped before this reaches production wikis, so this is using scalar type hints.

Event Timeline

Can you start a complete list of extensions that might benefit from this? In addition to Variables and Arrays, you mentioned Extension:Loops on your de.wp talk page, and IIRC you've also mentioned at least one other extension somewhere? Are there any others you're aware of?

Extension:Cite had to be specifically worked around by the Parsoid team, because of its own global page state; it might be worth looking at how it was handled to see if there are any lessons that can be applied here as well (or vice versa).

Legoktm subscribed.

It would be helpful to document why ParserOutput::get/setExtensionData() doesn't meet what you need.

(Also removing Librarization since that's intended for refactoring code to be independent of MediaWiki, while this requested functionality would inherently be specific to the Parser.)

Can you start a complete list of extensions that might benefit from this? In addition to Variables and Arrays, you mentioned Extension:Loops on your de.wp talk page, and IIRC you've also mentioned at least one other extension somewhere? Are there any others you're aware of?

There is the MediaWiki category I just linked, however I think there might be other ones that maybe aren't even on Gerrit. Loops isn't completely of this type, but half of it's functionality doesn't work withour Variables.

Extension:Cite had to be specifically worked around by the Parsoid team, because of its own global page state; it might be worth looking at how it was handled to see if there are any lessons that can be applied here as well (or vice versa).

Probably, as more complicated bugs in Variables often are only fixable because Cite had similar issues at some point. However, I completely lack knowledge Parsoid. Maybe we should just ask how to handle this contrast, as it's a question they have to ask themselves at some point in the future anyway.

It would be helpful to document why ParserOutput::get/setExtensionData() doesn't meet what you need.

I wasn't aware of this function, I'll take a look on it. Many modern developments aren't really taken in consideration in these extensions, as much of their codebase has been written while MW 1.16 was the current MediaWiki version.

@Legoktm I wasn't aware of this function, and I agree the mentioned extensions should store their data there instead of using a parser member variable.

However, this proposal isn't targeted at the way these extemsions store their data in the Parser, but rather at the way how the objects stored as ExtensionData work internally, to share the code they have in common.

However, this proposal isn't targeted at the way these extemsions store their data in the Parser, but rather at the way how the objects stored as ExtensionData work internally, to share the code they have in common.

I don't really understand the distinction, could you clarify?

Imagine this as a framework to provide functionality extensions of this type often use. For example: Both Variables and Arrays store their data in a simple array in an extension class, which is stored in a Parser attribute afterwards (or with setExtensionData after refactoring, see T203530 for that). I don't want to change much about this path, except for the simple array thing, because all of these extensions have problems with similar things:

  • You can't just get data from them, you have to make sure there's no template caching going on producing invalid data. In other words: At each point where you get data, you have to modify the given PPFrame. If doing it like we do it today becomes impossible at some point (Parsoid), it's much simpler to change the way it works in a central place.
  • To interact, these extensions define public interaction functions just like these, that look always the same. You would just have to define decorators in the extensions if you start from a ParserStore class like that.

This ParserStore would just be injected by DI into the extensions storage class and be used for all this stuff. Currently, it doesn't do too complicated things, but it makes adjustments to changing parser logic much easier.

However, this is only the first step of what I'm planning to do. Variables uses a really interesting feature that allows it to use the var values as they are after parsing is finished in an #var_final parser function. This is something that isn't just useful to Variables as well, but could probably be implemented in similar extensions like Arrays very similarly. Furthermore, it's more complex than the first class I proposed right now, so it might justify the need of a library like that more.

Hopefully, this clarifies my idea a bit.