
Refactor Parser.php to allow alternate parser (Parsoid)
Open, Medium, Public

Description

The Parser.php class has over a hundred public methods, mostly for historical reasons.

We need to clean this up and create a robust API that can be implemented by Parsoid in order to (eventually) allow replacement of the legacy parser.

This is the tracking bug for that work.


Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 589459 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Deprecate unused parser-related hooks

https://gerrit.wikimedia.org/r/589459

Change 589463 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Deprecate InternalParseBeforeLinks hook

https://gerrit.wikimedia.org/r/589463

Change 589459 merged by jenkins-bot:
[mediawiki/core@master] Deprecate infrequently-used parser-related hooks

https://gerrit.wikimedia.org/r/589459

ParserFetchTemplate was deprecated without a replacement, but it is used in a patch that relates to the Language Team's active sprint work: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/573450

Edit: Replacement was suggested in RELEASE_NOTES: BeforeParserFetchTemplateAndTitle

I see two problems with this:

  • We would have to support two hooks simultaneously. This can get messy if there is no feature detection to check which one should be handled.
  • We cannot actually do what we want with BeforeParserFetchTemplateAndTitle, because it does not allow changing the title or dependencies (only the revision of the provided title can be changed), so template tracking would be wrong due to a title/version mismatch.

I think BeforeParserFetchTemplateAndTitle is the correct hook, although maybe we need it to work slightly differently to allow changing the title as well as the revision. I agree with not wanting to support two hooks simultaneously, which is why I deprecated ParserFetchTemplate in the first place. ParserFetchTemplate doesn't let you override the title either; you are just faking it by replacing the text (after we've already fetched the original template in the wrong language) and rewriting the deps.

I think you don't actually need to rewrite the $title; the code in statelessFetchTemplate seems to already handle the case where the $revId returned belongs to a different page than $title (it handles this as a redirect). So I think that should "just work".

But if it doesn't, I can make the second parameter to BeforeParserFetchTemplateAndTitle (with proper capitalization, this is a hook I just added) pass-by-reference, to allow changing the title as well as the ID.

I think that's a better solution, ParserFetchTemplate happens at the "wrong" time (after the 'wrong' template is already fetched) and as I said I'd prefer not to have to support two different ways of doing the same thing, especially as we're about to refactor Parser.php pretty drastically.
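For context, a handler for this hook currently receives the title read-only; the by-reference `$title` below is the hypothetical change being discussed, not the current signature, and the class name and translated-subpage logic are purely illustrative:

```php
// Sketch only. The current BeforeParserFetchTemplateAndTitle handler
// receives ( $parser, $title, &$skip, &$id ) and may change $id (the
// revision) but not $title. The '&' on $title below reflects the
// pass-by-reference proposal above; it is NOT the current signature.
class ExampleTemplateHooks {
	public static function onBeforeParserFetchTemplateAndTitle(
		Parser $parser, Title &$title, bool &$skip, &$id
	) {
		// Hypothetical: redirect the fetch to a translated subpage.
		$translated = Title::newFromText( $title->getPrefixedText() . '/de' );
		if ( $translated && $translated->exists() ) {
			$title = $translated;
			$id = $translated->getLatestRevID();
		}
		return true;
	}
}
```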

(Also note, there was a capitalization change to the BeforeParserFetchTemplateAndTitle hook in 1.35, but it has existed for a long time so backwards-compat shouldn't be a problem. The new hooks mechanism coming in 1.35 should make it even easier to keep backwards-compat...)
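The new hooks mechanism mentioned here registers handlers declaratively in extension.json rather than via $wgHooks; roughly like this (extension and class names are illustrative):

```json
{
	"HookHandlers": {
		"parserHooks": {
			"class": "MediaWiki\\Extension\\Example\\ParserHooks"
		}
	},
	"Hooks": {
		"BeforeParserFetchTemplateAndTitle": "parserHooks"
	}
}
```

Because core resolves the handler through this indirection, it can keep dispatching old-style and new-style registrations side by side, which is what eases backwards compatibility across the capitalization change.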

Change 589459 merged by jenkins-bot:
[mediawiki/core@master] Deprecate infrequently-used parser-related hooks

https://gerrit.wikimedia.org/r/589459

This patch claims the InternalParseBeforeSanitize hook is "infrequently-used". This hook is used, for example, in the Variables extension, which is among the top 10 most-deployed extensions not used by WMF. The InternalParseBeforeLinks hook could be used instead if Variables sanitized its output independently (again, as it did before its author introduced this hook in MW 1.20), but this does not seem to be a sustainable alternative either.

Neither of the suggested alternative hooks will work here, as the Variables extension depends on having all MediaWiki variables replaced before outputting the final value of the variables, while the final values are needed for the correct parsing of other things like links and tables. There is no migration path possible here, except for dropping support for the #var_final parser function altogether.

I'm a bit disappointed to see how this is handled. There is no hint in docs/hooks.txt what the proposed replacements are for these deprecations. No hint on the mediawiki.org pages either. No comment in the code.

As of now, extensions like Wigo3 are broken, with no easy way out. In this particular case it's the ParserPreSaveTransformComplete hook, which was just introduced in 1.35 and deprecated in the same version. o_O? @tstarling?

The release notes currently say:

* The following parser-related hooks have been deprecated:
  […]
  - ParserSectionCreate
    * No replacement; <section> tag wrapping will be done by core in future.
  - ParserPreSaveTransformComplete
    * No replacement; Content::preSaveTransform() provides for customizable PSTs
  - BeforeParserrenderImageGallery
    * No replacement; MediaHandler provides for customizable media rendering

I'm not excited to see "no replacement". What I wish we had done is first get rid of deprecated calls in codebases we actively monitor, and only then hard-deprecate stuff. What are the next steps now? Rewrite code that relies on these hooks? Who is supposed to do this? When? Consuming which budget? A rewrite seems to be what's now necessary to keep the Wigo3 extension running.

I don't own any of the codebases in question, so I have not much reason to care about them. I'm not even paid to look after them. But what I care about is that we try to help volunteers as good as we can. And I think we can do better.

I'm a bit disappointed to see how this is handled. There is no hint in docs/hooks.txt what the proposed replacements are for these deprecations. No hint on the mediawiki.org pages either. No comment in the code.

As of now, extensions like Wigo3 are broken, with no easy way out. In this particular case it's the ParserPreSaveTransformComplete hook, which was just introduced in 1.35 and deprecated in the same version. o_O? @tstarling?

The release notes currently say:

* The following parser-related hooks have been deprecated:
  […]
  - ParserSectionCreate
    * No replacement; <section> tag wrapping will be done by core in future.
  - ParserPreSaveTransformComplete
    * No replacement; Content::preSaveTransform() provides for customizable PSTs
  - BeforeParserrenderImageGallery
    * No replacement; MediaHandler provides for customizable media rendering

I'm not excited to see "no replacement". What I wish we had done is first get rid of deprecated calls in codebases we actively monitor, and only then hard-deprecate stuff.

We have. We monitor Wikimedia production code only. (Well, plus the ReplaceText extension which is in the tarball.)

What are the next steps now? Rewrite code that relies on these hooks? Who is supposed to do this? When? Consuming which budget? A rewrite seems to be what's now necessary to keep the Wigo3 extension running.

Maintainers of said extensions are welcome to try to migrate. In some cases it won't be possible, because they're using a feature we're explicitly removing. This is known and accepted fallout from the Parser replacement work.

Change 598783 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] Un-deprecate the ParserPreSaveTransformComplete hook

https://gerrit.wikimedia.org/r/598783

In the specific case of ParserPreSaveTransformComplete I might have been a bit over-eager. I'd forgotten the discussion we had in https://gerrit.wikimedia.org/r/571109 about this hook. Although it's true that Parsoid doesn't currently support exposing the PST, it probably will have to eventually. The part that makes me most nervous about the hook is actually the fact that it exposes the Parser object, which is likely to be refactored RSN. But if the hook just modifies the $text it's probably "fine".
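For reference, this hook passes the text by reference, so a handler that "just modifies the $text" needs nothing else from the Parser object; a minimal sketch (class name and transformation are illustrative):

```php
// Sketch: ParserPreSaveTransformComplete passes the post-PST wikitext
// by reference, so a handler can adjust $text without touching other
// Parser state — the usage deemed "probably fine" above.
class ExampleHooks {
	public static function onParserPreSaveTransformComplete(
		Parser $parser, string &$text
	) {
		// Illustrative transformation: strip trailing whitespace
		// before the text is saved.
		$text = rtrim( $text );
	}
}
```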

Anyway, https://gerrit.wikimedia.org/r/598783 removes the deprecation.

Can we do the same for the InternalParseBeforeSanitize hook?

We have.

Depends on the definition of "monitoring". I prefer this one. To be fair, it's not as bad as I made it sound. But still.

This is known and accepted fallout […]

Known to whom? Seriously. Neither of the places I got confronted with (the deprecation warning, hooks.txt, release notes) mention any reasoning for the deprecation. That's what I ask you to fix.

By the way, what I really do not like here is that there is no reasonable path forward for extension developers. If I replace hooks with other hooks that are not deprecated yet, there is a high probability that those other hooks will be deprecated a few months later, for example the InternalParseBeforeLinks hook.

This is partly my fault - in that I haven't been upfront about communicating the entire plan. We'll fix that. We have been working on a document that lays out the transition plans to Parsoid and the implications of the change.

The deprecations that @cscott has been doing are one step towards that path, but at this time these are all in the nature of early warnings that these hooks will not have direct equivalents in Parsoid. I clarified some of that in T250963#6080908. At least for the Wikimedia wikis, extensions that rely on the hooks will have to be updated to make them compatible with Parsoid. But for non-Wikimedia / 3rd-party wikis, this is optional. If they wish to, they can continue to operate without Parsoid, and hence the deprecations are not as much of an issue. That said, we figured it is better to simply deprecate functionality that is not used by anyone (based on codesearch). In some cases it may simply be an honest mistake, and we can revert. But, once again, to re-emphasize: independent of whether we deprecate a hook or not, if it doesn't have a Parsoid equivalent, it will not be supported when a wiki switches over to Parsoid.

Our first goal is to effect the Parsoid switchover on Wikimedia wikis and all our efforts around extension APIs are focused on that transition. The common extensions are already Parsoid-compatible. Others like ImageMap are being made Parsoid-compatible ( T94793, https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/585344 ). Non-Wikimedia wikis aren't expected to switch over to Parsoid and might prefer for all the early kinks and bumps to be ironed out anyway.

Anyway, TLDR we'll publish a more fleshed out document outlining the transition path and the impacts on extensions.

The problem with deprecations is that they cannot simply be ignored by extension developers, as they make unit tests fail. Thus, Jenkins awards Verified −1 to every changeset uploaded for an extension that uses them.

Thanks a lot for the longer response! I wonder: It sounds like you do not plan to ever remove these hooks? Why hard-deprecate them then? Shouldn't a soft deprecation (only via @deprecated) be enough then? As said above, the hard deprecations break CI for the affected codebases.

If a hook is not used OR if there is a better replacement, it does make sense to remove it. That is the strategy that @cscott (and @Jdforrester-WMF) have pursued. Based on this feedback about some of those hooks, we can revisit their deprecation. But note that once Parsoid is the default, at some later release we might move the current parser (and along with it all the hooks) out of MediaWiki and into a library / extension. The exact strategy / roadmap is unclear at this point since it is too early, but a lot of these hooks will be removed OR moved out of MediaWiki core at some point.

But, in the interim, yes, we will revisit hard deprecations of hooks where it is unnecessarily disruptive - as I said earlier, I suspect it is just something that was overlooked or didn't show up in code search or production monitoring.

@thiemowmde and @MGChecker: there are ways to mark your codebase to acknowledge your use of a deprecated hook that won't break CI, but they are awkward. I've filed T253768: No easy way to suppress hard-deprecation warnings for hooks for this; you might want to comment there citing your experience.

@cscott, the issue is the SpecialPageFatalTest that core enforces on the special pages provided by an extension: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-noselenium-docker/10570/console. Am I supposed to add some kind of catch-all $this->filterDeprecated( '//' ) to core to fix the browser tests failing in Wigo3?

Adding @Addshore as he wrote the SpecialPageFatalTest. To me it looks like this test should indeed suppress all deprecation warnings. The purpose of this test is to (quoting Addshore) "make sure that special pages do not fatal in their most basic form (anon user viewing the page)." Deprecation warnings don't show up for users. I don't think they should make this test fail.
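In a MediaWikiIntegrationTestCase-based test, expected deprecations can be filtered per test class; the catch-all mooted above would look roughly like this (placement in setUp() is illustrative):

```php
// Sketch: inside a MediaWikiIntegrationTestCase subclass.
// filterDeprecated() takes a regex matched against the deprecation
// message; '//' matches everything, i.e. a catch-all suppression.
protected function setUp(): void {
	parent::setUp();
	$this->filterDeprecated( '//' );
}
```

The awkward part, as noted in T253768, is that this lives in the test code, not in the extension under test, so an extension cannot acknowledge its own deprecated-hook usage this way when a core test like SpecialPageFatalTest is what fails.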

@cscott, the issue is the SpecialPageFatalTest that core enforces on the special pages provided by an extension: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-noselenium-docker/10570/console. Am I supposed to add some kind of catch-all $this->filterDeprecated( '//' ) to core to fix the browser tests failing in Wigo3?

Use $wgDevelopmentWarnings = false; (over-riding DevelopmentSettings.php) if you want to hide pending breakage from yourself.
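That is, a LocalSettings.php fragment along these lines for a development wiki (sketch; DevelopmentSettings.php is what enables the warnings, so the override must come after the include):

```php
// LocalSettings.php (development wiki) — sketch only.
// DevelopmentSettings.php turns on $wgDevelopmentWarnings;
// overriding it afterwards hides hard-deprecation notices.
require_once "$IP/includes/DevelopmentSettings.php";
$wgDevelopmentWarnings = false;
```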

Adding @Addshore as he wrote the SpecialPageFatalTest. To me it looks like this test should indeed suppress all deprecation warnings. The purpose of this test is to (quoting Addshore) "make sure that special pages do not fatal in their most basic form (anon user viewing the page)." Deprecation warnings don't show up for users. I don't think they should make this test fail.

No. This code is explicitly intended to catch exactly this. It's a core concern of CI to catch bad code, including using deprecated codepaths.

Use $wgDevelopmentWarnings = false;

In https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/refs/heads/master/zuul/layout.yaml#7401? How?

It's a core concern of CI to catch bad code, including using deprecated codepaths.

Huh? No. Sorry. This is just not true. Deprecations – even hard ones via wfDeprecated() – never cause any random CI job to fail, except this one bad special page test. This is a bug.

Change 599608 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/core@master] Fix SpecialPageFatalTest failing on unrelated deprecations

https://gerrit.wikimedia.org/r/599608

Change 598783 merged by jenkins-bot:
[mediawiki/core@master] Un-deprecate the ParserPreSaveTransformComplete hook

https://gerrit.wikimedia.org/r/598783

Change 599608 merged by jenkins-bot:
[mediawiki/core@master] Fix SpecialPageFatalTest failing on unrelated deprecations

https://gerrit.wikimedia.org/r/599608

Didn't get this entirely done for 1.35; retargeting for 1.36.

Change 622623 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] Remove Parser::setFunctionTagHook(), deprecated in 1.35

https://gerrit.wikimedia.org/r/622623

Change 622681 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/Description2@master] Remove <metadesc>

https://gerrit.wikimedia.org/r/622681

Change 622681 merged by jenkins-bot:
[mediawiki/extensions/Description2@master] Remove <metadesc>

https://gerrit.wikimedia.org/r/622681

Change 622623 merged by jenkins-bot:
[mediawiki/core@master] Remove Parser::setFunctionTagHook(), deprecated in 1.35

https://gerrit.wikimedia.org/r/622623

Change 634786 had a related patch set uploaded (by Paladox; owner: C. Scott Ananian):
[mediawiki/extensions/Description2@REL1_35] Remove <metadesc>

https://gerrit.wikimedia.org/r/634786

Change 634786 merged by jenkins-bot:
[mediawiki/extensions/Description2@REL1_35] Remove <metadesc>

https://gerrit.wikimedia.org/r/634786

Change 670159 had a related patch set uploaded (by MGChecker; owner: MGChecker):
[mediawiki/core@master] Undeprecate InternalParseBeforeSanitize

https://gerrit.wikimedia.org/r/670159

The attitude I'm seeing here from WMF is rather concerning. Essentially, it's "it's been decided". Well, that's lovely for WMF, but what about the rest of the wikis out there who maybe don't use (or perhaps even want/need) Parsoid, who don't use the Visual Editor, who don't use Flow, etc.? What do we do?

Our wiki has an extension similar to the Variables extension that'll be hit by some of the recent changes whenever we make it up to 1.35 or thereabouts. In our case, we're using a custom preprocessor...you know, the one in $wgParserConf that someone blithely commented that nobody ever changes. We did! What's more, we did it at the suggestion of Tim Starling, apparently. (I wasn't the original developer, but it's in the code comments.)

If your attitude is going to be "we do what's best for WMF, we only monitor WMF sites, and we don't care about non-WMF wikis", then I'd like to suggest that there be an official fork of the code for community development, so that those of us who want to use the original wiki design, more or less as is, can do so without having to worry about massive changes to the internal design being foisted on us, leaving us with the unenviable choice of either spending insane amounts of time (often by volunteers) redeveloping, or sticking to old versions of MW that aren't getting bug fixes, new features, and so forth.

@RobinHood70 I see that you are angry about changes in the parser that will break something your organization has developed, and that causes you extra work and uncertainty about the future. Is that right?

I am not part of the team that develops the parser, but I develop the Translate extension, which is also incompatible with the upcoming parser changes. The parser developers have been really helpful for redesigning it to work with the new version.

Perhaps, if you would like to describe your use case and needs, they can also give you suggestions on how to proceed. Also, I know developers do look into whether code is used or not using the code search tool, so it is a good idea to have your code included in the tool if it is public, per the instructions in https://www.mediawiki.org/wiki/Codesearch. The best way to influence the development of MediaWiki is to participate in the development and discussions. Please assume good faith.

@RobinHood70 I see that you are angry about changes in the parser that will break something your organization has developed, and that causes you extra work and uncertainty about the future. Is that right?

I am not part of the team that develops the parser, but I develop the Translate extension, which is also incompatible with the upcoming parser changes. The parser developers have been really helpful for redesigning it to work with the new version.

...and they offered to do the same for Variables except it was the Variables maintainer who said they didn't want to do anything: T250963#6169439.

As a general principle, we cannot avoid breaking changes or technical upgrades just because there are wikis that use some specific set of features or use them in a particular way.

As for this specific scenario, Parsoid is the future of wikitext processing for MediaWiki. There are no two ways about it. If wikis want to continue to use the legacy parser, we may provide options where we package it as an extension for such wikis to use. But none of these changes will happen overnight, and for many reasons (Wikimedia wikis' use of specific features, extensions and their dependencies, 3rd-party wikis) it takes a long time. Deprecation processes, backward-compatibility support, and refactoring tasks like this are some of the ways we try to minimize impact.

There is, unfortunately, no zero-pain path available for these technical upgrades.

I'm not angry, really, so much as I see this sort of WMF-centric thinking from the developers often, and I think there needs to be some better feedback mechanism than simply trusting wfDeprecated() and the like to tell the developers what's in use and what's not. The reality outside of WMF wikis is that most lag several versions behind the current. Just browsing around, I easily found wikis between 1.25 and 1.33; I found none at 1.34 or above. So, deprecating something in 1.34 and then removing it in 1.35 or 1.36 because nobody complained or was logged as using the feature is not really a good plan for wikis like these. I think, if nothing else, there needs to be some kind of communication of planned deprecations/removals that allows extension developers who may not be at the current version to be made aware of breaking changes in advance and be able to say "Hey, we're still using this. We need a path forward."

In my case in particular, it's the inability to customize the default preprocessor that's the issue. Our custom extension is basically a miniature version of Variables and Scribunto that adds a lot of power to templates (or any other transcluded page) that isn't in the default implementation, much as MGChecker described. The functions ours adds include the following:

  • #define:arg|value. Sets a default value for a parameter if none was provided, so repetitions of {{{var|value}}} are no longer needed.
  • #local:arg|value. Much the same as #define, but overwrites the existing value or creates it if it didn't exist. Used both for local variables and sanitizing input.
  • #preview:arg|value. Sets the argument to a given value in preview mode only. This is safe to save with the template, since it won't affect the actual functioning of the template outside of preview mode.
  • #inherit:arg1...argX: Climbs the frame stack looking for the nearest one with the specified argument. Excellent for things like behaviour switches where they can be set on the page itself (via #local, #define) and then automatically applied to all calls to the template from there on. Also useful to avoid passing arguments as blank {{{arg|}}} type values to subtemplates when in fact they weren't specified at all.
  • #return:arg1...argX: The inverse of #inherit, it passes values back up the stack, allowing templates to act like functions.

All of these are made possible by making a slightly modified version of PPTemplateFrame into the default frame produced by Preprocessor_Hash::newFrame(). (This was the approach suggested by Tim Starling that I mentioned above.) Naturally, this means we have to override the default preprocessor, which we've been doing via the $wgParserConf setting. I gather that's been moved into ParserFactory in recent versions, though I'm a little unclear whether changing the configuration there would change all parsers created from then on, even by MW's calls, or whether it only affects user-created parsers. We would, obviously, need it to affect everything.
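The historical mechanism being described is roughly the following LocalSettings.php fragment (the class name is illustrative, not from the extension in question):

```php
// LocalSettings.php — sketch of the historical $wgParserConf override.
// 'preprocessorClass' pointed the parser at a custom preprocessor;
// MyWikiPreprocessor (illustrative) would subclass Preprocessor_Hash
// and hand out a modified PPTemplateFrame from newFrame().
$wgParserConf['preprocessorClass'] = 'MyWikiPreprocessor';
```

This is exactly the kind of configurability that moves (or disappears) when parser construction is centralized in ParserFactory, which is why the question of whether a ParserFactory-level setting affects all parsers matters here.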

There are data-storage/retrieval parser functions as well, but I haven't had much of a chance to delve into those, so I can't speak to how they might be affected. My impression is that if the above issues are addressed, that will resolve anything with the data storage functions as well.

@RobinHood70 I see that you are angry about changes in the parser that will break something your organization has developed, and that causes you extra work and uncertainty about the future. Is that right?

I am not part of the team that develops the parser, but I develop the Translate extension, which is also incompatible with the upcoming parser changes. The parser developers have been really helpful for redesigning it to work with the new version.

...and they offered to do the same for Variables except it was the Variables maintainer who said they didn't want to do anything: T250963#6169439.

I would like to rebut this statement. In this issue and in the following discussion, we came to the conclusion that it is fundamentally impossible to port the main Variables features to Parsoid, since it depends on pages being parsed in linear order. @ssastry confirmed this just a few posts later, in T250963#6934680.

Since there is no prospect of Parsoid compatibility anyway, it does not appear worthwhile to pretend that working around the current deprecations would help in any way to avert or decelerate the doom of this extension.

If it were possible to remove the deprecation warning, that might be helpful (see T253768), but on the other hand, this warning gives an incentive to visit MediaWiki.org and learn about the state of the extension.

As for this specific scenario, Parsoid is the future of wikitext processing for MediaWiki. There are no two ways about it. If wikis want to continue to use the legacy parser, we may provide options where we package it as an extension for such wikis to use. But none of these changes will happen overnight, and for many reasons (Wikimedia wikis' use of specific features, extensions and their dependencies, 3rd-party wikis) it takes a long time. Deprecation processes, backward-compatibility support, and refactoring tasks like this are some of the ways we try to minimize impact.

There is, unfortunately, no zero-pain path available for these technical upgrades.

I think many users of extensions which might have issues with these deprecations would appreciate it if there were any guarantee that this parser librarization will happen, since this would remove the pressure of migrating "immediately".

For many system administrators, one MediaWiki development cycle of about 6 months is not much time after all. Their ecosystems mostly change on a scale of every few years only.

I have been on the side of "taking out the garbage as fast as possible" in other projects as well, and I understand your point of view. Nevertheless, one should keep in mind that WMF is not a typical stakeholder in the MediaWiki software. Minor stakeholders should be taken into account somewhat as well.

I'm not angry, really, so much as I see this sort of WMF-centric thinking from the developers often, and I think there needs to be some better feedback mechanism than simply trusting wfDeprecated() and the like to tell the developers what's in use and what's not. The reality outside of WMF wikis is that most lag several versions behind the current. Just browsing around, I easily found wikis between 1.25 and 1.33; I found none at 1.34 or above. So, deprecating something in 1.34 and then removing it in 1.35 or 1.36 because nobody complained or was logged as using the feature is not really a good plan for wikis like these. I think, if nothing else, there needs to be some kind of communication of planned deprecations/removals that allows extension developers who may not be at the current version to be made aware of breaking changes in advance and be able to say "Hey, we're still using this. We need a path forward."

So...Parsoid as a project has been going on for what, 8 years now? The Parsing Team has done multiple surveys of callers, put out calls for feedback on the extension API, given tech talks about porting, really, what more are you looking for?

...and they offered to do the same for Variables except it was the Variables maintainer who said they didn't want to do anything: T250963#6169439.

I would like to rebut this statement. In this issue and in the followi, we came to the conclusion that it is fundamentally impossible to port the main Variables features to Parsoid, since it depends on pages being parsed in linear order. @ssastry has confirmed this just a few posts later, in T250963#6934680.

From the same comment you linked:

If this is a real blocker, then, we'll have to introduce some kind of page property OR some kind of extension config option that will force full sequential reparse for any page that has that flag or extension config set.

To me that sounds like the Parsing Team is open to modifying Parsoid to support your use case.

If this is a real blocker, then, we'll have to introduce some kind of page property OR some kind of extension config option that will force full sequential reparse for any page that has that flag or extension config set.

To me that sounds like the Parsing Team is open to modifying Parsoid to support your use case.

I would really appreciate clarification on this point, since right above it, it is stated: "However, if extensions introduce state that depends on the page parsed in linear order, yes, we cannot support that." At the time I read this, it led me to the conclusion that this was more of a theoretical possibility with no real chance of being implemented.

But rereading this again, this might actually be a misunderstanding.

We're not interested in migrating to Parsoid, nor were we aware that this was to become integrated rather than simply yet another extension, so wikis like ours likely ignored any calls for feedback, if they were aware of them at all. I certainly don't recall seeing anything about it in the few things I pay attention to/mailing lists I'm subscribed to, but that could well be what I just mentioned...the assumption that this was an optional component.

Our needs are simply to stay with the legacy parser and have it work as it has since practically the single-digit versions of MediaWiki. As I understand it, the legacy parser will still be an option (and, at least according to one thread I found the default option) in the future, so why are features that at least some developers require being removed from it?

None of my comments above and here is meant to engage with the criticism of us making WMF-centric decisions. That is a separate issue, and I am not going to wade into those waters here. I would like to focus my attention and discussion on the specific issues of wikitext engines in MediaWiki, their future, and the migration paths.

We're not interested in migrating to Parsoid, nor were we aware that this was to become integrated rather than simply yet another extension

FWIW, my understanding is that your view among 3rd parties is a minority viewpoint. We got hammered for years for keeping Parsoid as a separate service and ironically, for making WMF-centric decisions and not offering Parsoid in core and making it difficult for 3rd party wikis to install Parsoid and hence VisualEditor.

Our needs are simply to stay with the legacy parser and have it work as it has since practically the single-digit versions of MediaWiki. As I understand it, the legacy parser will still be an option (and, at least according to one thread I found the default option) in the future, so why are features that at least some developers require being removed from it?

Parsoid will be the default wikitext engine in MediaWiki, and this will enable us to make further changes to wikitext semantics and usability in the longer run. In the long run, the legacy parser is a dead-end path for wikis; we considered the librarization / extension packaging of it and mentioned it above as a possible solution for wikis that might want to keep using it for whatever reason. We haven't actually worked on it or through all the details of it yet.

In terms of rollout timelines, our immediate priority for the rollout of Parsoid as the default is Wikimedia wikis. That will uncover (as it already has) a whole bunch of gaps in our extension support in Parsoid, and we'll work through those. Along the way, we'll continue to deprecate hooks and features that are unsupportable in Parsoid. Once we are done with Wikimedia wikis and things look stable, we'll start moving on this for MediaWiki. If I had to guess, that will probably land in the next LTS, which is probably 2+ years away given that 1.35 just came out last year.

So, there is time to resolve these issues, but I do not want to offer any false hope that MediaWiki + the legacy parser will be a long-term viable solution for 3rd-party wikis. Packaging the legacy parser as a library / extension offers a temporary way out and a medium-term solution at best.

So it doesn't get lost in the other discussion here, one of us will engage with the substance of your preprocessor requirements and respond separately in the coming days / week.

Thank you for the explanation and the offer to engage with our requirements. Perhaps as the migration to Parsoid continues, better solutions will present themselves, but for us, right now the reality is that we've got 1700+ templates affecting over 75k content pages (not to mention those that affect non-content pages, like talk and redirect pages, bringing us to 300k pages overall). Our custom extension has been in place for 12 years now, most of our templates rely on it at this point, and we have only a handful of template coders to maintain them. So, hopefully you can understand that migrating to something that will essentially break all of that isn't just a pain point for us; it's simply not a viable option. While I would hope that this isn't the case, the reality may well have to be that we stop upgrading at whatever the latest version is that supports our needs.

If this is a real blocker, then, we'll have to introduce some kind of page property OR some kind of extension config option that will force full sequential reparse for any page that has that flag or extension config set.

To me that sounds like the Parsing Team is open to modifying Parsoid to support your use case.

I would really appreciate clarification on this point, since right above it is stated "However, if extensions introduce state that depends on the page parsed in linear order, yes, we cannot support that.". At the time I read this, it led me to the conclusion that this is more of a theoretical possibility which has no real chance of being implemented.

But rereading this again, this might actually be a misunderstanding.

Sorry about using imprecise language in my comments across phabricator tasks.

High-performance and other functionality that depends on independent / decoupled parsing of fragments of a page cannot, by definition, support linear in-order parsing (to clarify: that does not mean that the final page output won't be in the right order. It will be.). But, if we provide an opt-out mechanism in Parsoid to force a linear-ordered full reparse of a page, then I suppose that does mean that Parsoid supports such pages (though certain features / functionality would be disabled on those pages as well). Once again, we haven't worked through the feasibility and complexity of providing such support (that is a separate technical investigation), but hopefully that answers your question. And, all things considered, once again so we aren't offering false hopes, we'll lean towards breaking incompatible functionality in the pursuit of higher performance and future-proofing the technology. But we will not go down the route of breaking things simply because we can. We have attempted to minimize impacts to the extent it is practical to do so.
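To make the opt-out proposal concrete, here is a purely hypothetical fragment of how it might surface to an extension. Both the page-property name and the config variable below are invented for illustration; Parsoid has no such handling today:

```php
// Hypothetical: an extension marks the page as needing linear,
// in-order parsing via a page property recorded during the parse.
// 'forceLinearParse' is an invented property name, not a real one.
$parser->getOutput()->setPageProperty( 'forceLinearParse', '1' );

// Alternatively, a site-wide switch in LocalSettings.php
// (also an invented name, shown only to illustrate the idea):
$wgForceLinearParse = true;
```

Either shape would let Parsoid detect, before fragment-level parallel parsing begins, that a page (or wiki) has opted into the slower sequential path.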

So, hopefully you can understand that migrating to something that will essentially break all of that isn't just a pain point for us, it's simply not a viable option.

I can certainly understand that it is difficult for you. But I also hope you can understand that software as complex as MediaWiki, used to run the Wikipedias and other sister projects serving billions of page views, cannot cater to *every* 3rd-party wiki user and their customized needs, especially when yours seems to be a minority position even among 3rd-party wiki users, and especially when the engineering team on our end has at most 5 people. And, while this is not a concern in this instance, in the future please refrain from making charged statements that may not be grounded in practical reality; you might get better support and help that way.

I wasn't trying to be difficult or insulting; I was just presenting the reality that I, personally, have seen. As a user of mostly small- to medium-sized wikis, most of which lag a fair bit behind the current version, I think my view of things is very different from yours where, as you say, you're supporting primarily large wikis with very different needs. All I was trying to say, really, was that while we may not have the page count/page views of the larger wikis, there is nevertheless a set of wikis that will prefer the legacy parser for one reason or another. It may not even be for technical reasons such as ours; it might be simply a matter of processing power, preference, or whatever else. As I said in the beginning, and as has been on display throughout this thread, the attitude here is clearly "this is the direction we're going, get on board", and from the perspective of these smaller wikis that are behind by several versions, that's practically an overnight shift and it comes as a slap in the face.

While forking the entire project might have been a bit of a stretch, I still think it would be best to develop a base parser class, leave the legacy parser as a user of that base class, then let the community that wants/needs it continue development on it to support their needs. Is this in some way fundamentally incompatible with Parsoid?

I appreciate and welcome your input but was mostly countering your assertion that your viewpoint and experience is representative of small and medium-sized wikis or that we are ignoring small and medium-sized 3rd party wikis in general. When smaller wikis are behind by several versions, a proposed shift in a future LTS doesn't really count as "an overnight shift" nor as a "slap in the face". Please be careful with your charged statements.

Anyway, let us please end this thread here. We'll take a look at your specific usecase and see what we can do.

I'm not trying to make "charged statements", and I apologize if I'm coming across that way. I'm just presenting my personal view and the views from a few other wikis that I've dealt with. I'm not trying to say it's representative of anyone else, and I'm not trying to be contentious. I'm just saying that not everybody's at the same place you are.

I've already said that we may simply have to accept the fact that there will come a version at which point we won't be able to upgrade further. While I feel it would be unfortunate, that's a perfectly acceptable reality. I'd just like to put it off for as long as possible, and from my understanding of the code, the ability to specify a custom default preprocessor is still easily backed out at this point. That's really all I'm interested in.

@RobinHood70 Since, in my experience this can be helpful to have everyone aware what is actually talked about: Is your custom preprocessor implementation published somewhere?

I created T282499: Consider whether Parsoid will support forced linear parsing. to have a canonical place for discussions and updates on this specific topic going forward, since this does not seem like the right place to conduct many different discussions.

I appreciate your work reworking the Parser!

Yup, here it is: https://github.com/uesp/uesp-wikimetatemplate

The main code you would want to take a look at is in MetaTemplatePPFrame.php. As you can see, it's a stripped-down version of PPTemplateFrame (a really old one, because most of this was developed around the MW 1.1x days). The only real differences are the $context parts of getNumberedArgument() and getNamedArgument(). MetaTemplateParserStack then makes use of the fact that all frames are now template-like frames to assign argument values during preview as needed. At least that's my understanding of the code at this time. I've only just picked up the entire project and started to modernize/redevelop it as of a few days ago. Previously, the only parts I was responsible for were the special pages and API, and even those were mostly copy-pastes from modules like page props.
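As a rough sketch of what that kind of frame subclass looks like: PPTemplateFrame_Hash and getNamedArgument() are real core names, but the $context fallback below is reconstructed loosely from the description above, not the extension's actual code:

```php
// Illustrative reconstruction only, not MetaTemplate's real source.
// The idea: argument lookup first checks the template call itself,
// then falls back to context inherited from outer frames.
class MetaTemplatePPFrame extends PPTemplateFrame_Hash {
	/** @var array Values inherited from outer frames / preview state */
	private $context = [];

	public function getNamedArgument( $name ) {
		$value = parent::getNamedArgument( $name );
		if ( $value === false && isset( $this->context[$name] ) ) {
			// The template call did not supply this argument
			// directly, so use the inherited context value.
			$value = $this->context[$name];
		}
		return $value;
	}
}
```

The significance for this task is that every frame becomes "template-like" (it carries arguments), which is exactly the kind of cross-frame, order-dependent state that decoupled fragment parsing in Parsoid cannot easily reproduce.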

Re-reading, I realize now that my initial comment came out much more acerbic than I intended, and undoubtedly set the wrong tone for my later comments, so I apologize for that. I also picked up this comment, which I'd somehow missed the first time around.

Also, I know developers do check whether code is used via the code search tool, so it is a good idea to have your code included in the tool if it is public, per the instructions at https://www.mediawiki.org/wiki/Codesearch.

Thank you for that suggestion, Nikerabbit! I don't think we want our old MW 1.1x code on Codesearch at this particular time, but once I'm finished redeveloping it, I'll discuss the idea with the site owner and see what he thinks.

Redirecting this task back to its actual title (refactoring Parser.php): I realized over the past few days that some of the linked dependencies relating to deprecating the "clone" and "resetOutput" functions of Parser.php don't actually block the insertion of a new abstract base class in the hierarchy. The fundamental blocker was code that did "new Parser", because that would break if the Parser class became abstract. But that has been deprecated and removed for some years now. Code that clones an existing Parser will still work as long as it is holding a concrete LegacyParser object, so it does not block our next refactor step.
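The distinction being drawn here, in miniature (stand-in class names, not the real Parser hierarchy):

```php
<?php
// Miniature illustration with invented class names: once a class
// becomes abstract, "new" on it is a fatal error, but cloning an
// instance of a concrete subclass keeps working unchanged.
abstract class BaseParser {
	public array $state = [];
}

class LegacyParser extends BaseParser {
}

// $p = new BaseParser();  // fatal: cannot instantiate abstract class
$p = new LegacyParser();   // fine: concrete subclass
$p->state['title'] = 'Foo';

$q = clone $p;             // cloning the concrete object still works
var_dump( $q instanceof BaseParser ); // bool(true)
```

This is why removing `new Parser` call sites was the real prerequisite: clone sites only ever operate on an already-constructed concrete object, so they survive the base class becoming abstract.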