Page MenuHomePhabricator

Refactor Parser.php to allow alternate parser (Parsoid)
Open, MediumPublic

Description

The Parser.php class has over a hundred public methods, for mostly historical reasons.

We need to clean this up and create a robust API that can be implemented by Parsoid in order to (eventually) allow replacement of the legacy parser.

This is the tracking bug for that work.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 583750 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] Deprecate Parser::enableOOUI()

https://gerrit.wikimedia.org/r/583750

Change 583750 merged by jenkins-bot:
[mediawiki/core@master] Deprecate Parser::enableOOUI()

https://gerrit.wikimedia.org/r/583750

Change 587563 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/TemplateData@master] Don't use deprecated Parser::enableOOUI() hook

https://gerrit.wikimedia.org/r/587563

Change 587565 had a related patch set uploaded (by Reedy; owner: C. Scott Ananian):
[mediawiki/extensions/TemplateData@wmf/1.35.0-wmf.27] Don't use deprecated Parser::enableOOUI() hook

https://gerrit.wikimedia.org/r/587565

Change 587566 had a related patch set uploaded (by Reedy; owner: C. Scott Ananian):
[mediawiki/extensions/TemplateData@wmf/1.35.0-wmf.26] Don't use deprecated Parser::enableOOUI() hook

https://gerrit.wikimedia.org/r/587566

Change 587565 merged by jenkins-bot:
[mediawiki/extensions/TemplateData@wmf/1.35.0-wmf.27] Don't use deprecated Parser::enableOOUI() hook

https://gerrit.wikimedia.org/r/587565

Change 587566 merged by jenkins-bot:
[mediawiki/extensions/TemplateData@wmf/1.35.0-wmf.26] Don't use deprecated Parser::enableOOUI() hook

https://gerrit.wikimedia.org/r/587566

Change 587563 merged by jenkins-bot:
[mediawiki/extensions/TemplateData@master] Don't use deprecated Parser::enableOOUI() hook

https://gerrit.wikimedia.org/r/587563

Mentioned in SAL (#wikimedia-operations) [2020-04-08T18:16:21Z] <reedy@deploy1001> Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: T236809 (duration: 01m 10s)

Mentioned in SAL (#wikimedia-operations) [2020-04-08T18:17:40Z] <reedy@deploy1001> Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: T236809 (duration: 01m 06s)

Change 589338 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] Deprecate old-style accessor/mutation methods of Parser

https://gerrit.wikimedia.org/r/589338

Change 589338 merged by jenkins-bot:
[mediawiki/core@master] Deprecate old-style accessor/mutation methods of Parser

https://gerrit.wikimedia.org/r/589338

Change 589459 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Deprecate unused parser-related hooks

https://gerrit.wikimedia.org/r/589459

Change 589463 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Deprecate InternalParseBeforeLinks hook

https://gerrit.wikimedia.org/r/589463

Change 589459 merged by jenkins-bot:
[mediawiki/core@master] Deprecate infrequently-used parser-related hooks

https://gerrit.wikimedia.org/r/589459

ParserFetchTemplate was deprecated without replacement, but it is used in a patch that relates to Language Team's active sprint work: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/573450

Edit: Replacement was suggested in RELEASE_NOTES: BeforeParserFetchTemplateAndTitle

I see two problems with this:

  • Have to support two hooks simultaneously. Can be messy if there is no feature detection to check which one should be handled.
  • We cannot actually do the thing we want with BeforeParserFetchTemplateAndTitle, because it does not allow changing the title or dependencies (only revision of the provided title can be changed), so template tracking would be wrong due title / version mismatch.

I think BeforeParserFetchTemplateAndTitle is the correct hook, although maybe we need it to work slightly differently to allow changing the title as well as the revision. I agree w/ not wanting to support two hooks simultaneously, which is why I deprecated ParserFetchTemplate in the first place. ParserFetchTemplate doesn't let you override the title either, you are just faking it by replacing the text (after we've already fetched the original template in the wrong language) and rewriting the deps.

I think you don't actually need to rewrite the $title; the code in statelessFetchTemplate seems to already handle the case where the $revId returned belongs to a different page than $title (it handles this as a redirect). So I think that should "just work".

But if it doesn't, I can make the second parameter to BeforeParserFetchTemplateAndTitle (with proper capitalization, this is a hook I just added) pass-by-reference, to allow changing the title as well as the ID.

I think that's a better solution, ParserFetchTemplate happens at the "wrong" time (after the 'wrong' template is already fetched) and as I said I'd prefer not to have to support two different ways of doing the same thing, especially as we're about to refactor Parser.php pretty drastically.

(Also note, there was a capitalization change to the BeforeParserFetchTemplateAndTitle hook in 1.35, but it has existed for a long time so backwards-compat shouldn't be a problem. The new hooks mechanism coming in 1.35 should make it even easier to keep backwards-compat...)

Change 589459 merged by jenkins-bot:
[mediawiki/core@master] Deprecate infrequently-used parser-related hooks

https://gerrit.wikimedia.org/r/589459

This patch claims the InternalParseBeforeSanitize hook would be "infrequently-used". This hook is used for example in the Variables extension, which is among the Top 10 most-deployed extensions that are not used by WMF. The InternalParseBeforeLinks hook can be used instead, if Variables sanitizes its output independently (again, as it did before its author introduced this hook in MW 1.20), but this does not seem to be a sustainable alternative either.

Both alternative hook suggestions won't work here, as the Variables extension is dependent on having all MediaWiki variables replaced before outputting the final value of the variables, while the final values are needed for the correct parsing of other stuff like links and tables. There is no migration path possible here, except for dropping support for the #var_final parser function altogether.

I'm a bit disappointed to see how this is handled. There is no hint in docs/hooks.txt what the proposed replacements are for these deprecations. No hint on the mediawiki.org pages either. No comment in the code.

As of now, extensions like Wigo3 are broken, with no easy way out. In this particular case it's the ParserPreSaveTransformComplete hook, which was just introduced in 1.35 and deprecated in the same version. o_O? @tstarling?

The release notes currently say:

* The following parser-related hooks have been deprecated:
  […]
  - ParserSectionCreate
    * No replacement; <section> tag wrapping will be done by core in future.
  - ParserPreSaveTransformComplete
    * No replacement; Content::preSaveTransform() provides for customizable PSTs
  - BeforeParserrenderImageGallery
    * No replacement; MediaHandler provides for customizable media rendering

I'm not exited to see "no replacement". What I wish we would have done is to first get rid of deprecated calls in codebases we actively monitor, and only then hard-deprecate stuff. What are the next steps now? Rewrite code that relies on these hooks? Who is supposed to do this? When? Consuming which budget? And a rewrite seems what's now necessary to keep the Wigo3 extension running.

I don't own any of the codebases in question, so I have not much reason to care about them. I'm not even paid to look after them. But what I care about is that we try to help volunteers as good as we can. And I think we can do better.

I'm a bit disappointed to see how this is handled. There is no hint in docs/hooks.txt what the proposed replacements are for these deprecations. No hint on the mediawiki.org pages either. No comment in the code.

As of now, extensions like Wigo3 are broken, with no easy way out. In this particular case it's the ParserPreSaveTransformComplete hook, which was just introduced in 1.35 and deprecated in the same version. o_O? @tstarling?

The release notes currently say:

* The following parser-related hooks have been deprecated:
  […]
  - ParserSectionCreate
    * No replacement; <section> tag wrapping will be done by core in future.
  - ParserPreSaveTransformComplete
    * No replacement; Content::preSaveTransform() provides for customizable PSTs
  - BeforeParserrenderImageGallery
    * No replacement; MediaHandler provides for customizable media rendering

I'm not exited to see "no replacement". What I wish we would have done is to first get rid of deprecated calls in codebases we actively monitor, and only then hard-deprecate stuff.

We have. We monitor Wikimedia production code only. (Well, plus the ReplaceText extension which is in the tarball.)

What are the next steps now? Rewrite code that relies on these hooks? Who is supposed to do this? When? Consuming which budget? And a rewrite seems what's now necessary to keep the Wigo3 extension running.

Maintainers of said extensions are welcome to try to migrate. In some cases it won't be possible, because they're using a feature we're explcitly removing. This is known and accepted fallout from the Parser replacement work.

Change 598783 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] Un-deprecate the ParserPreSaveTransformComplete hook

https://gerrit.wikimedia.org/r/598783

In the specific case of ParserPreSaveTransformComplete I might have been a bit over-eager. I'd forgotten the discussion we had in https://gerrit.wikimedia.org/r/571109 about this hook. Although it's true that Parsoid doesn't currently support exposing the PST, it probably will have to eventually. The part that makes me most nervous about the hook is actually the fact that it exposes the Parser object, which is likely to be refactored RSN. But if the hook just modifies the $text it's probably "fine".

Anyway, https://gerrit.wikimedia.org/r/598783 removes the deprecation.

Can we do the same for the InternalParseBeforeSanitize hook?

We have.

Depends on the definition of "monitoring". I prefer this one. To be fair, it's not as bad as I made it sound like. But still.

This is known and accepted fallout […]

Known to whom? Seriously. Neither of the places I got confronted with (the deprecation warning, hooks.txt, release notes) mention any reasoning for the deprecation. That's what I ask you to fix.

By the way, what I really do not like here is that there is no reasonable path forward for extension developers. If I replace hooks by other hooks that are not deprecated yet, there is a high probablity that those other hooks will be deprecated a few months later, for example the InternalParserBeforeLinks hook.

This is partly my fault - in that I haven't been upfront about communicating the entire plan. We'll fix that. We have been working on a document that lays out the transition plans to Parsoid and the implications of the change.

The deprecations that @cscott has been doing is one step towards that path but at this time, these are all in the nature of early warnings that these hooks will not have direct equivalents in Parsoid. I clarified some of that in T250963#6080908. At least for the Wikimedia wikis, extensions that rely on the hooks will have to be updated to make them compatible with Parsoid. But, for non-Wikimedia / 3rd-party wikis, this is optional. If they wish to, they can continue to operate without Parsoid and hence the deprecations are not as much of an issue. That said, we figured it is better to simply deprecate functionality that is not used by anyone (based on codesearch). In some cases, it may simply be an honest mistake and we can revert back. But, once again, to re-emphasize, independent of whether we deprecate a hook or not, if it doesn't have a Parsoid equivalent, it will not be supported when a wiki switches over to Parsoid.

Our first goal is to effect the Parsoid switchover on Wikimedia wikis and all our efforts around extension APIs are focused on that transition. The common extensions are already Parsoid-compatible. Others like ImageMap are being made Parsoid-compatible ( T94793, https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/585344 ). Non-Wikimedia wikis aren't expected to switch over to Parsoid and might prefer for all the early kinks and bumps to be ironed out anyway.

Anyway, TLDR we'll publish a more fleshed out document outlining the transition path and the impacts on extensions.

The problem of deprecations is that they can not just be ignored by extension developers, as they make unit tests fail. Such, jenkins awards -1 Verified to every changeset uploaded for an extension that does use them.

Thanks a lot for the longer response! I wonder: It sounds like you do not plan to ever remove these hooks? Why hard-deprecate them then? Shouldn't a soft deprecation (only via @deprecated) be enough then? As said above, the hard deprecations break CI for the affected codebases.

If a hook is not used OR if there is a better replacement, it does make sense to remove them. That is the strategy that @cscott (and @Jdforrester-WMF) have pursued. Based on this feedback about some of those hooks, we can revisit their deprecation. But, note that once Parsoid is the default, at some later release, we might remove the current parser (and alongwith it all the hooks) out of MediaWiki and into a library / extension. The exact strategy / roadmap is unclear at this point since it is too early, but a lot of these hooks will be removed OR move out of MediaWiki core at some point.

But, in the interim, yes, we will revisit hard deprecations of hooks where it is unnecessarily disruptive - as I said earlier, I suspect it is just something that was overlooked or didn't show up in code search or production monitoring.

@thiemowmde and @MGChecker: there are ways to mark your codebase to acknowledge your use of a deprecated hook that won't break CI, but they are awkward. I've filed T253768: No easy way to suppress hard-deprecation warnings for hooks for this; you might want to comment there citing your experience.

@cscott, the issue is the SpecialPageFatalTest that core enforces on the special pages provided by an extension: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-noselenium-docker/10570/console. Am I supposed to add some kind of catch-all $this->filterDeprecated( '//' ) to core to fix the browser tests failing in Wigo3?

Adding @Addshore as he wrote the SpecialPageFatalTest. To me it looks like this test should indeed suppress all deprecation warnings. The purpose of this test is to (quoting Addshore) "make sure that special pages do not fatal in their most basic form (anon user viewing the page)." Deprecation warnings don't show up for users. I don't think they should make this test fail.

@cscott, the issue is the SpecialPageFatalTest that core enforces on the special pages provided by an extension: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-noselenium-docker/10570/console. Am I supposed to add some kind of catch-all $this->filterDeprecated( '//' ) to core to fix the browser tests failing in Wigo3?

Use $wgDevelopmentWarnings = false; (over-riding DevelopmentSettings.php) if you want to hide pending breakage from yourself.

Adding @Addshore as he wrote the SpecialPageFatalTest. To me it looks like this test should indeed suppress all deprecation warnings. The purpose of this test is to (quoting Addshore) "make sure that special pages do not fatal in their most basic form (anon user viewing the page)." Deprecation warnings don't show up for users. I don't think they should make this test fail.

No. This code is explicitly intended to catch exactly this. It's a core concern of CI to catch bad code, including using deprecated codepaths.

Use $wgDevelopmentWarnings = false;

In https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/refs/heads/master/zuul/layout.yaml#7401? How?

It's a core concern of CI to catch bad code, including using deprecated codepaths.

Huh? No. Sorry. This is just not true. Deprecations – even hard ones via wfDeprecated() – never cause any random CI job to fail, except this one bad special page test. This is a bug.

Change 599608 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/core@master] Fix SpecialPageFatalTest failing on unrelated deprecations

https://gerrit.wikimedia.org/r/599608

Change 598783 merged by jenkins-bot:
[mediawiki/core@master] Un-deprecate the ParserPreSaveTransformComplete hook

https://gerrit.wikimedia.org/r/598783

Change 599608 merged by jenkins-bot:
[mediawiki/core@master] Fix SpecialPageFatalTest failing on unrelated deprecations

https://gerrit.wikimedia.org/r/599608

Didn't get this entirely done for 1.35; retargetting for 1.36.

Change 622623 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] Remove Parser::setFunctionTagHook(), deprecated in 1.35

https://gerrit.wikimedia.org/r/622623

Change 622681 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/Description2@master] Remove <metadesc>

https://gerrit.wikimedia.org/r/622681

Change 622681 merged by jenkins-bot:
[mediawiki/extensions/Description2@master] Remove <metadesc>

https://gerrit.wikimedia.org/r/622681

Change 622623 merged by jenkins-bot:
[mediawiki/core@master] Remove Parser::setFunctionTagHook(), deprecated in 1.35

https://gerrit.wikimedia.org/r/622623

Change 634786 had a related patch set uploaded (by Paladox; owner: C. Scott Ananian):
[mediawiki/extensions/Description2@REL1_35] Remove <metadesc>

https://gerrit.wikimedia.org/r/634786

Change 634786 merged by jenkins-bot:
[mediawiki/extensions/Description2@REL1_35] Remove <metadesc>

https://gerrit.wikimedia.org/r/634786